# Curve fitting with genetic algorithm

Before reading this blog, it would be good to first read and get familiar with *Guessing a Number with Genetic Algorithm*.

In this blog we will fit a line to the equation \(y = 7x + 5\) using a genetic algorithm. So let’s write the equation in Julia:

```
y(x) = 7x + 5
```

Output:

```
y (generic function with 1 method)
```

Now let’s assign `x` to the range 0 to 5 in steps of 0.1:

```
x = 0 : 0.1 : 5
```

Output:

```
0.0:0.1:5.0
```

Now let’s plot the line

```
using Plots
```

```
scatter_plot = scatter(x, y.(x), xlims = (-0.1, 5.5), ylims = (0, 50))
```

There is a feature in Julia which I want you to remember: tuple destructuring. Let’s say we have a tuple `(5, 7)` and we assign it to two variables `m` and `c` as shown below:

```
m, c = (5, 7)
println("m = ", m)
println("c = ", c)
```

Output:

```
m = 5
c = 7
```

Then the first variable takes the first value of the tuple and the second takes the second value. We will be using this technique throughout this blog.
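Destructuring also works on tuples returned from functions, which is handy in genetic algorithm code. Here is a quick sketch (the function `minmax_of` is a hypothetical example, not from this blog):

```julia
# A function that returns a tuple can be destructured right at the call site.
minmax_of(a, b) = (min(a, b), max(a, b))

lo, hi = minmax_of(9, 3)
println("lo = ", lo)  # prints lo = 3
println("hi = ", hi)  # prints hi = 9
```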

For a genetic algorithm, mutation (random change) is very important. Let’s write a mutation function for a two-valued tuple as shown:

```
function mutate(value, mutations = 10)
    [value + rand() - 0.5 for i in 1:mutations]
end

function mutate(mc::Tuple, number_of_mutations = 10)
    m, c = mc
    ms = mutate(m, number_of_mutations)
    cs = mutate(c, number_of_mutations)
    [(ms[i], cs[i]) for i in 1:number_of_mutations]
end
```

Output:

```
mutate (generic function with 4 methods)
```

and we test that function:

```
mutate((1, 2))
```

```
10-element Vector{Tuple{Float64, Float64}}:
(1.1205168123693385, 2.41514037266611)
(0.5079882676843603, 1.8217999723149112)
(1.1581739571074665, 2.459460436379917)
(0.5796738025677814, 1.9702636743774655)
(1.0321606616260794, 1.7632954510809746)
(1.3545156402167784, 1.8782505949943635)
(1.3407759822406362, 2.3597713445300306)
(0.8524954649600578, 2.4966166282430047)
(0.991673117157789, 2.4749204653332812)
(0.9933905544923662, 1.5623361787716572)
```

Now let me explain how it works. First we have a mutation function that takes a single value and mutates it by adding a random number between -0.5 and 0.5 (`rand()` returns a value in \([0, 1)\)), as shown below; to learn more about it you may go to the previous blog.

```
function mutate(value, mutations = 10)
    [value + rand() - 0.5 for i in 1:mutations]
end
```

Now we want to extend this `mutate` function to tuples, so we write a second method for it as shown:

```
function mutate(value, mutations = 10)
    [value + rand() - 0.5 for i in 1:mutations]
end

function mutate(mc::Tuple, number_of_mutations = 10)
end
```

Note that unlike `mutate(value, mutations = 10)`, this `mutate(mc::Tuple, number_of_mutations = 10)` clearly says it needs a tuple. Now imagine I pass a tuple to `mutate`, like `mutate((1, 2))`: Julia will dispatch it to `mutate(mc::Tuple, number_of_mutations = 10)` and not to `mutate(value, mutations = 10)`, because the tuple method is more specific. So now let’s define the tuple `mutate`.
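This method selection is Julia’s multiple dispatch at work: Julia always picks the most specific method that matches the argument types. As a minimal standalone sketch (the function name `describe` is hypothetical, just for illustration):

```julia
# Two methods of the same function: one generic, one specialized for tuples.
describe(x) = "generic value"
describe(x::Tuple) = "tuple of $(length(x)) elements"

println(describe(5))       # the generic method is called
println(describe((1, 2)))  # the Tuple method wins, as it is more specific
```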

Let’s assume the tuple will have only two values, so the first value should be assigned to a variable `m` and the second one to `c` as shown:

```
function mutate(mc::Tuple, number_of_mutations = 10)
    m, c = mc
end
```

Now all we need to do is mutate `m` and `c` `number_of_mutations` times and combine the results. To mutate `m` and `c` we use the following code:

```
ms = mutate(m, number_of_mutations)
cs = mutate(c, number_of_mutations)
```

In the above code, `mutate(m, number_of_mutations)` and `mutate(c, number_of_mutations)` call `mutate(value, mutations = 10)`. We store the mutations of `m` in the variable `ms` (I pronounce it as *emmmss*) and those of `c` in `cs` (*seas*). Let’s plug this code into our function as shown:

```
function mutate(mc::Tuple, number_of_mutations = 10)
    m, c = mc
    ms = mutate(m, number_of_mutations)
    cs = mutate(c, number_of_mutations)
end
```

Now we combine these mutations to form an `Array` of `Tuple`s and return it as shown:

```
function mutate(mc::Tuple, number_of_mutations = 10)
    m, c = mc
    ms = mutate(m, number_of_mutations)
    cs = mutate(c, number_of_mutations)
    [(ms[i], cs[i]) for i in 1:number_of_mutations]
end
```

Now let’s write an error function. Say our algorithm recommends an `m` and a `c`; for a given `x`, the prediction will be `(m * x + c)`. In the training phase the actual value of `y` may differ from our prediction, so we subtract `y` from the prediction to get the error, as shown:

```
∆(m, c, x, y) = (m * x + c) - y
```

Output:

```
∆ (generic function with 1 method)
```

Let’s say our `m` is 5 and `c` is 4; let’s see what our error is when `x` is 10. The prediction is \(5 \times 10 + 4 = 54\) while \(y(10) = 75\), so we expect an error of \(54 - 75 = -21\):

```
∆(5, 4, 10, y(10))
```

Output:

```
-21
```

Now suppose someone has 10 values of \(x\) and the corresponding \(y\) values, and wants us to fit a line to predict unknown points. When we pick values of \(m\) and \(c\), we need to know the total error over all our predictions, so let’s write a `total_error` function as shown:

```
function total_error(m, c, x, y)
    ΣΔ = 0
    for i in 1:length(x)
        ΣΔ += abs(∆(m, c, x[i], y[i]))
    end
    ΣΔ
end
```

Output:

```
total_error (generic function with 1 method)
```

In `total_error`, we have a variable `ΣΔ` to store the total error; we initialize it to zero as shown:

```
function total_error(m, c, x, y)
    ΣΔ = 0
end
```

Now, for each element of `x`:

```
function total_error(m, c, x, y)
    ΣΔ = 0
    for i in 1:length(x)
    end
end
```

we add the absolute value of the error to the total error variable `ΣΔ`:

```
function total_error(m, c, x, y)
    ΣΔ = 0
    for i in 1:length(x)
        ΣΔ += abs(∆(m, c, x[i], y[i]))
    end
end
```

and finally we return the total error variable `ΣΔ`:

```
function total_error(m, c, x, y)
    ΣΔ = 0
    for i in 1:length(x)
        ΣΔ += abs(∆(m, c, x[i], y[i]))
    end
    ΣΔ
end
```

Now let’s check the `total_error` function:

```
total_error(5, 4, [1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

Output:

```
80
```

Seems to work! For each \(x_i\) the error is \(|5x_i + 4 - x_i|\), giving \(8 + 12 + 16 + 20 + 24 = 80\).
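As an aside, the same total can be computed as a broadcasted one-liner. This is just a sketch (the name `total_error_v2` is mine, not from this blog) that behaves the same as `total_error` above:

```julia
# Broadcast the prediction over all of x, subtract y elementwise,
# then sum the absolute errors.
total_error_v2(m, c, x, y) = sum(abs.(m .* x .+ c .- y))

println(total_error_v2(5, 4, [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # prints 80
```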

Let’s now write our `top_survivors` function. This one takes an `Array` of \(m\) and \(c\) `Tuple`s as `mcs` (I spell it as *emm seee’s*), arrays of training inputs namely `x_train` and `y_train`, and selects the `top_percent` of `mcs` that deviate the least from `y_train` for the given `x_train`.

```
function top_survivors(mcs, x_train, y_train, top_percent = 10)
    errors_and_values = []
    for mc in mcs
        m, c = mc
        error = total_error(m, c, x_train, y_train)
        push!(errors_and_values, (error, mc))
    end
    sorted_errors_and_values = sort(errors_and_values)
    end_number = Int(length(mcs) * top_percent / 100)
    sorted_errors_and_values[1:end_number]
end
```

Output:

```
top_survivors (generic function with 2 methods)
```

I would encourage you to read my previous Genetic Algorithm blog to understand the `top_survivors` function.
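One detail worth noting: `sort` works here because Julia orders tuples lexicographically, so the `(error, mc)` pairs get sorted by error first. A small sketch with made-up numbers:

```julia
# Tuples compare element by element, so sorting (error, mc) pairs orders by error.
pairs = [(3.5, (1, 2)), (0.9, (7, 5)), (2.1, (4, 4))]
println(sort(pairs))  # the pair with the smallest error comes first
```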

Now let’s set up the inputs needed for `top_survivors`. Let’s have the initial `mcs` be mutations of `(0, 0)`:

```
x_train = [1, 2, 3, 4, 5]
y_train = y.(x_train)
mcs = mutate((0, 0))
```

Output:

```
10-element Vector{Tuple{Float64, Float64}}:
(0.16087862232389538, 0.018092305432591882)
(0.08312155751992667, -0.022630590705241538)
(-0.32885966826028445, -0.4936484604469795)
(-0.19041016854936288, 0.4707757045419352)
(0.011998626752193209, -0.276041600093212)
(-0.18351337891221542, -0.31274629114220875)
(0.014823442262999142, 0.03761731346306618)
(-0.19617757028312166, -0.11227454061420317)
(0.07142332361410042, -0.2765731664220268)
(0.3720719485797561, 0.44332686459739223)
```

Let’s test it out:

```
top_survivors(mcs, x_train, y_train)
```

Output:

```
1-element Vector{Any}:
(122.2022864483167, (0.3720719485797561, 0.44332686459739223))
```

Looks like our `m` of 0.3720719485797561 and `c` of 0.44332686459739223 are the fittest, with an error of `122.2022864483167`. Now all we need to do is select the top survivors across many generations so that we get a near perfect fit, as shown:

```
generations = 40
top_survivor = nothing
mc = (0, 0)
for i in 1:generations
    mcs = mutate(mc)
    top_survivor = top_survivors(mcs, x_train, y_train)[1]
    _error, mc = top_survivor
end
hm, hc = mc
p = scatter(x_train, y_train)
h(x) = hm * x + hc
plot!(p, x, h.(x))
```

Output: a scatter plot of the training points with the fitted line drawn through them.

The code below is almost the same as the code above, but note the `@gif` macro, which collects the plot images into a GIF file so that we can see the fitting as an animation:

```
generations = 40
top_survivor = nothing
mc = (0, 0)
@gif for i in 1:generations
    mcs = mutate(mc)
    top_survivor = top_survivors(mcs, x_train, y_train)[1]
    _error, mc = top_survivor
    hm, hc = mc
    p = scatter(x_train, y_train)
    h(x) = hm * x + hc
    plot!(p, x, h.(x), ylims = (0, 50))
end
```

Output:

```
┌ Info: Saved animation to
│ fn = /Users/mindaslab/data_science_with_julia/code/tmp.gif
└ @ Plots /Users/mindaslab/.julia/packages/Plots/g581z/src/animation.jl:104
```

See how in 40 generations our algorithm nearly fits a line to `y(x) = 7x + 5`. You may change the 7 and 5 in `y(x)` to anything, and over successive generations our algorithm will fit the line to the bunch of points better and better.

When I tried, these were the final values I got for \(m\) and \(c\):

```
mc
```

Output:

```
(6.838106002778459, 5.334256635688782)
```

On your computer it could vary, but the line will be a near perfect fit for the bunch of points.
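To see how close the fit is, one could compare the fitted line’s predictions against the true `y`; a quick sketch using the values I got above:

```julia
# Compare predictions from the evolved (m, c) against the true line y = 7x + 5.
m, c = (6.838106002778459, 5.334256635688782)
for x in 1:5
    println("x = $x: predicted = $(round(m * x + c, digits = 2)), actual = $(7x + 5)")
end
```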

You can get the code for this blog here: https://gitlab.com/data-science-with-julia/code/-/blob/master/genetic_alogorithms_line_fitting.ipynb.