Guessing a Number with Genetic Algorithm
What is genetics? What makes us change? Well let’s take for instance the corona virus pandemic, there are some humans who are naturally immune to it, some people who are not. Say that we are not progressed so scientifically, then what would have happened? People who are not immune to it will perish, some will have reduced biological function and may produce less kids, finally a humanity will emerge which is totally immune to corona. These happen because there is tiny variations that are set into us during our birth that makes us different from our mom and dad. So producing more offspring means more variations and more chance of surviving an epidemic we have. That’s why all species that had survived has a very strong urge to reproduce. The more the urge more the offspring variations and survival.
Okay let’s code. So lets define a universe where only numbers that are closest to 42 exist. Why 42? Because its the ultimate answer.
So our universe is defines by this number here:
number = 42
Output:
42
This is a number we must guess using evolutionary technique.
Now our universe need to calculate the difference between the ultimate answer and a values that exist in it, so we define a error function as follows:
function ∆(value, number)
number - value
end
Output:
∆ (generic function with 1 method)
Let’s test our error function:
∆(12, 42)
Output:
30
Now let us experiment with random numbers. I want to generate 500 random numbers that vary from -0.5 to +0.5:
rand(500).- 0.5
Output:
500-element Array{Float64,1}:
0.11811737299697067
0.40975918563037483
0.11512815232487483
0.4407493702816716
-0.06317498449812642
0.47123272092330426
-0.10407972969869217
0.10466653256756975
0.316758714503494
0.05275897872543078
-0.0580903153675667
-0.0636241488226077
0.24940396400825926
⋮
0.4043456432783936
-0.4607426587387351
0.4003121999197845
-0.28198845089727187
0.32761734763714667
-0.35388456207140506
0.33622477093610326
-0.3255506303368696
-0.04515216140695988
0.08163723449954996
-0.4425492244863174
0.09909239016396998
Seems to work.
Top Survivors
Next I want to define a function called top_survivors()
which returns the numbers that are most closes to 42 first and least closest to 42 at the last:
function top_survivors(values, number, top_percent = 10)
errors_and_values = [(abs(∆(value, number)), value) for value in values]
sorted_errors_and_values = sort(errors_and_values)
end_number = Int(length(values) * top_percent / 100)
sorted_errors_and_values[1:end_number]
end
Output:
top_survivors (generic function with 2 methods)
Lets see see in detail how top_survivors()
is coded. First let’s start with an empty function definition:
function top_survivors()
end
Now this function should receive values
who’s survival ability will be determined
function top_survivors(values)
end
The survival ability is determined by a number, the values that are closest to the number survive, so we we pass in a number
to the function top_survivors()
:
function top_survivors(values, number)
end
Not all values should survive, say only top 10% of the values
that are closest to the number
should survive, so we have a variable top_percent
and set a default of 10
to it as shown below:
function top_survivors(values, number, top_percent = 10)
end
First we need to find errors, that is the difference between each value
in values
function top_survivors(values, number, top_percent = 10)
[for value in values]
end
compared to number
:
function top_survivors(values, number, top_percent = 10)
[∆(value, number) for value in values]
end
but there is a problem, let’s say number
is 5
and value
is 3
, then the error is 2
, let’s say another value
is 12
hence error ∆(value, number
is now -7
. Since -7 is less than 2, it does not mean that 12 is closer to 5 than 3. What we need is absolute error and we get it using the abs()
function shown below:
function top_survivors(values, number, top_percent = 10)
errors_and_values = [abs(∆(value, number)) for value in values]
end
What are we going to do just with errors, let’s bundle error and corresponding value into a Tuple
using this code (abs(∆(value, number)), value)
as shown below:
function top_survivors(values, number, top_percent = 10)
[(abs(∆(value, number)), value) for value in values]
end
now let us assign it to a variable errors_and_values
:
function top_survivors(values, number, top_percent = 10)
errors_and_values = [(abs(∆(value, number)), value) for value in values]
end
So we have got an Array of Tuples in errors_and_values
where the first element in the tuple is error and second is the value that is associated with the error and it looks something like this:
[
(error_1, value_1), ..... , (error_n, value_n)
]
Now lets sort it so that the least errors and values are at first
function top_survivors(values, number, top_percent = 10)
errors_and_values = [(abs(∆(value, number)), value) for value in values]
sorted_errors_and_values = sort(errors_and_values)
end
and we store it in a variable sorted_errors_and_values
as shown above, now all that is left is to return the top_percent
of errors_and_values
out, those will be our survivors, for that we calculate the end_number
for our array as shown in the last line of the function below:
function top_survivors(values, number, top_percent = 10)
errors_and_values = [(abs(∆(value, number)), value) for value in values]
sorted_errors_and_values = sort(errors_and_values)
end_number = Int(length(values) * top_percent / 100)
end
If values
had 1943 elements then length(value)
will be 1943. This is multiplied by top_percent
by 100, in our case its 0.1 time 1943, that will be 194.3, and we get integer value of it result Int(length(values) * top_percent / 100)
which is 194, this will be stored in end_number
. The top_percent
of errors_and_values
will be sorted_errors_and_values[1:end_number]
and we add this as last line to our function below which means that this will be returned out:
function top_survivors(values, number, top_percent = 10)
errors_and_values = [(abs(∆(value, number)), value) for value in values]
sorted_errors_and_values = sort(errors_and_values)
end_number = Int(length(values) * top_percent / 100)
sorted_errors_and_values[1:end_number]
end
Let’s test top_survivors()
now, let me generate 500 randome numbers from -0.5 to +0.5 and see which is closest to number
survivors = top_survivors(rand(500).- 0.5, number)
Output:
50-element Array{Tuple{Float64,Float64},1}:
(41.50295720311258, 0.49704279688742026)
(41.503072694126054, 0.4969273058739454)
(41.50606492163384, 0.49393507836615624)
(41.509092656735994, 0.49090734326400853)
(41.511967526254274, 0.4880324737457231)
(41.51218378164256, 0.4878162183574357)
(41.51401215336174, 0.4859878466382588)
(41.51538950354761, 0.4846104964523883)
(41.51687210415236, 0.4831278958476404)
(41.520187541041686, 0.4798124589583119)
(41.52202481756689, 0.47797518243311)
(41.52243285351341, 0.47756714648659404)
(41.52250385998039, 0.47749614001960716)
⋮
(41.57023757877993, 0.42976242122007213)
(41.571326533554725, 0.4286734664452725)
(41.57377340302943, 0.4262265969705692)
(41.57432225294512, 0.4256777470548734)
(41.578400139207496, 0.42159986079250356)
(41.579227438079286, 0.4207725619207161)
(41.58325191425531, 0.41674808574469324)
(41.58460955385501, 0.41539044614499)
(41.5857945937749, 0.41420540622509905)
(41.58642578719165, 0.4135742128083526)
(41.589202493410696, 0.4107975065893039)
(41.5916907905559, 0.40830920944409743)
So we see that top_survivors()
returns top 50 closest, the first one being (41.50295720311258, 0.49704279688742026)
that is it has an error of 41.50295720311258 and value of 0.49704279688742026.
Mutations
We know that 0.497 is far far from 42, so how does it inch near 42? The answer is mutation. Say that this number spawns another set of numbers that vary slightly from 0.497, some of it might be more away from 42, but some could be more near. So let’s write a function to mutate a value:
function mutate(value, mutations = 10)
[value + rand() - 0.5 for i in 1:mutations]
end
Output:
mutate (generic function with 2 methods)
Okay this mutate()
function receives a value as first argument, and second argument is number of children it must have which is defined my mutations
which we have set it to a default of 10. In this statement:
[value + rand() - 0.5 for i in 1:mutations]
value
is added with a random number anywhere between -0.5 to 0.5 and is been collected into array and its done mutations
times. So let’s test our mutate function:
mutate(50)
Output:
10-element Array{Float64,1}:
49.70077118627718
50.38719806541724
49.659770263529325
50.49345080427759
49.682957552922375
50.068482872331586
50.418052425411894
49.73656292165122
50.01836131822904
50.030683616700884
So see above how 50 mutates with values slightly varying from it.
I also want to introduce a function called vcat()
that concatenates or joins two arrays as shown:
vcat([1, 2, 3], [4, 5, 6])
Output:
6-element Array{Int64,1}:
1
2
3
4
5
6
Our mutate function mutates only one value, but for our task we have a list of values, so let’s write a function that mutates a list as shown:
function mutate_list(list, mutations = 10)
output = []
for element in list
output = vcat(output, mutate(element, mutations))
end
output
end
Output:
mutate_list (generic function with 2 methods)
So mutate_list
works as follows, first we have a empty function:
function mutate_list()
end
We receive a list
which must be mutated:
function mutate_list(list)
end
Next we have a variable called mutations
that defines the number of mutations that must take place for each element in the list:
function mutate_list(list, mutations = 10)
end
Let’s have an element called output that will collect all mutations
function mutate_list(list, mutations = 10)
output = []
end
We take each value in the list
into a variable called element
:
function mutate_list(list, mutations = 10)
output = []
for element in list
end
end
Now we add mutated values to output
using this statement output = vcat(output, mutate(element, mutations))
as shown below:
function mutate_list(list, mutations = 10)
output = []
for element in list
output = vcat(output, mutate(element, mutations))
end
end
And finally once the operations is done for all elements in the list, we return the output
:
function mutate_list(list, mutations = 10)
output = []
for element in list
output = vcat(output, mutate(element, mutations))
end
output
end
Now let’s test our mutate()
function:
mutate_list((1, 2, 3, 4))
Output:
40-element Array{Any,1}:
1.1981180737202284
0.6826820656478603
1.4263346623043727
1.0445431960263634
0.9986709617577949
0.5041082408398894
1.206211947133361
0.5703313900822442
0.7085107623470692
0.6808788103524246
1.8926516209570248
2.3122314617476407
2.4123045970406345
⋮
2.5112031422146033
3.0098577316693103
3.5139755069699214
3.5552827017499755
3.632721017569514
3.6865403468441524
3.908433918456275
4.330548003821807
3.8199880565807103
4.4004408963825155
3.8622190204543214
4.299693796909806
So we have condition of best survival defined by number = 42
, then we have a function called top_survivors()
which will clean out values it thinks are unfit and have only those that meet the top percent of the criteria, then we have a function mutate()
that will change the numbers. So we have made the bits and pieces of our universe namely:
- Condition for best fit
number = 42
- Way to clean those that are not fitting well
top_survivors()
- A way to change
mutate()
Creating our universe
Now we have to put this all together. So first let us have a initial set of values, a bunch of numbers to start with:
# let there be initial values
initial_values = rand(500)
Brahma the initiator |
First all those values are survivors, so let us define survivors
and assign it to initial_values
:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
Let’s define a variable called generations
that will hold an value of the maximum number of generations allowed in our universe:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
Next for plotting purpose, for us to see what happened visually we sample and store values / numbers that are created in each generation that matches closest to number
, that is 42 in our case, we store that sampled values in top_survivors_sample
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []
Now for each generation:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []
for generation in 1:generations
end
we make the survivors mutate and create offspring:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []
for generation in 1:generations
survivors = mutate_list(survivors)
end
Shiva the god of Destruction |
now the force of destruction comes to play only those that are closet to ideal condition are selected in this line errors_and_values = top_survivors(survivors, number)
others are eliminated:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []
for generation in 1:generations
survivors = mutate_list(survivors)
errors_and_values = top_survivors(survivors, number)
end
Vishnu the God that saves and ensures continuity |
now we salvage only the values for the next generation in the following statement survivors = [value for (error, value) in errors_and_values]
as shown below:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []
for generation in 1:generations
survivors = mutate_list(survivors)
errors_and_values = top_survivors(survivors, number)
survivors = [value for (error, value) in errors_and_values]
end
Saraswathi the God of knowledge and learning |
Now we record the top 10 survivors into our sample array top_survivors_sample
in this statement push!(top_survivors_sample, survivors[1:10])
, this will help us to plot and understand what happened:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []
for generation in 1:generations
survivors = mutate_list(survivors)
errors_and_values = top_survivors(survivors, number)
survivors = [value for (error, value) in errors_and_values]
push!(top_survivors_sample, survivors[1:10])
end
And for the first and every 10th generation we print out the top values using these lines
if (generation == 1) || (generation % 10 == 0)
println(generation, " => ", survivors[1:5])
end
So you can see the completed code below:
# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []
for generation in 1:generations
survivors = mutate_list(survivors)
errors_and_values = top_survivors(survivors, number)
survivors = [value for (error, value) in errors_and_values]
push!(top_survivors_sample, survivors[1:10])
if (generation == 1) || (generation % 10 == 0)
println(generation, " => ", survivors[1:5])
end
end
Output:
1 => [1.489963570249114, 1.4748790114913484, 1.4622739648861949, 1.4510704237615362, 1.4490538800141282]
10 => [5.660855018791918, 5.598888609438815, 5.593011278753805, 5.5762836556003545, 5.566632716837373]
20 => [10.24997392360673, 10.215807815922725, 10.21448723688237, 10.214464268619894, 10.204137257813436]
30 => [14.909055039342904, 14.889731189243328, 14.884161682306756, 14.874322175303265, 14.867230337010271]
40 => [19.52250107015328, 19.50132983259004, 19.48141475284547, 19.47896089764954, 19.478514865080626]
50 => [24.185076343905422, 24.171229043560263, 24.16911898120017, 24.168797157127305, 24.151685651797276]
60 => [28.777004719428835, 28.73967040390202, 28.730165929867656, 28.721490919382102, 28.705793402309293]
70 => [33.38810260130008, 33.3806820730437, 33.35186837385183, 33.34216038995459, 33.32739352190749]
80 => [37.931008629223165, 37.9235457841195, 37.91415226790492, 37.91195802545634, 37.91106132860069]
90 => [41.99992407747299, 42.00008624788804, 41.99991137236084, 41.99979061755105, 42.000233034952544]
100 => [42.000123781083616, 42.000235748738795, 42.000307239484606, 41.99960275074321, 41.99956774943434]
110 => [42.000020774696715, 41.99997835399716, 41.99993951603409, 41.99979201596632, 41.999598390078326]
120 => [42.000244420648414, 41.999688461201856, 42.00033212168877, 42.000419165315776, 41.999043587401665]
130 => [42.00000316583509, 41.99980027149457, 42.00043079801067, 41.99954988980232, 41.99950264181341]
140 => [41.999923715753546, 41.99977377760766, 42.00033523277993, 41.99951474643227, 42.00055740776272]
150 => [41.99999977643734, 42.00037448792328, 42.00045229876902, 41.99954594667108, 42.000492403893965]
160 => [41.999937769558926, 41.999854484705295, 41.999819702244785, 42.00025056814451, 42.00039399008882]
170 => [42.00005547930738, 41.99979529808799, 41.99972233935926, 41.99971873415906, 41.99963575444441]
180 => [42.00003109415735, 42.000200426218896, 41.999755219184195, 42.00043618008025, 41.999501915890455]
190 => [41.99985969909203, 42.00021165635373, 41.99976762837727, 42.00031788049009, 42.000332341607766]
200 => [41.99992962001092, 42.00017098936294, 41.99979094577462, 42.00030728974183, 41.999650626630036]
210 => [42.00020298104202, 41.99975261684113, 41.99975023469254, 41.99957908132207, 41.999445950654746]
220 => [41.99992584076578, 41.999855258167386, 42.000200104286066, 41.999702460834605, 42.00035077159908]
230 => [42.00035209859171, 41.99963549012812, 42.00037216436359, 41.99951880529105, 42.00050803530517]
240 => [41.99984779759352, 41.99974134448669, 42.00031532262816, 41.99962683164088, 42.00052285058658]
250 => [41.999728429745, 42.000393543238104, 42.00051227111797, 41.999461252932434, 42.00064395532183]
260 => [42.00025814042138, 42.00031589894168, 42.00041985364005, 41.99956313819255, 41.9993910261099]
270 => [41.99999510362798, 42.00001999291032, 41.9999579395399, 41.999948699620724, 42.00035888305544]
280 => [42.0000349216614, 41.99994198625842, 41.99991656164299, 42.00016283910495, 41.99974913725434]
290 => [42.000088552147226, 42.00009177413705, 42.000285959629764, 41.999368276803075, 42.00085092034136]
300 => [42.00002153736082, 42.00017318821541, 42.00027218263787, 42.000360859044456, 41.99956805018513]
310 => [41.99982214130841, 41.99967598916952, 42.00033996524177, 41.99959063668834, 42.000454542266226]
320 => [42.00006126086911, 42.00008176129088, 41.99991628417523, 41.99984827263701, 42.0006948512896]
330 => [41.999920931176746, 42.000084474769224, 42.00011049825621, 42.0001780995309, 41.999746240047266]
340 => [42.00016462077862, 41.99973103912161, 41.99972713436191, 42.000314926474644, 41.99954583326124]
350 => [41.999909064720605, 41.99989664326481, 42.00012807380437, 41.99981089152715, 42.00044965487015]
360 => [41.99998035859879, 41.999741519792536, 41.99972834361312, 42.000466246698764, 41.999445378979175]
370 => [41.99998473751336, 42.000118961242094, 42.00019927235091, 41.99967103681347, 42.0003893406721]
380 => [42.00004994314298, 42.000088485492014, 41.99989507939007, 42.00021089501884, 41.999768499453445]
390 => [42.00016741179328, 41.99983192017003, 42.00019597710459, 42.000246488574184, 41.9997500616498]
400 => [42.0000690034192, 42.00032210252985, 42.00044477418595, 41.99950447485576, 41.999409987775174]
410 => [41.99984997793723, 42.00021601004096, 41.999733823597175, 41.99943005559297, 42.000572405921815]
420 => [41.99978319192324, 42.0003312867216, 41.99956076412637, 42.000495777523554, 42.00057076477468]
430 => [41.9999969877506, 42.00015622715756, 41.99978672785641, 42.00025219550749, 41.999745559762715]
440 => [41.999961313124366, 41.99992889660888, 42.00022494455986, 42.00027211260345, 41.99971622361044]
450 => [41.99992660449093, 42.00012955558257, 41.99986339510042, 42.00018388328064, 41.99973756767785]
460 => [41.99980983221668, 41.99979169174335, 41.99960684933666, 41.99945475746643, 42.00056035870122]
470 => [42.000037583752786, 42.000103148938535, 42.000148858716024, 41.999803261589705, 41.99958494029524]
480 => [42.00010180604779, 41.999839312694654, 42.000441302743766, 42.000564805088516, 42.00069368757735]
490 => [41.999991513319785, 42.000069257104975, 41.99971242785191, 42.000289368310895, 41.99933899177168]
500 => [42.00014966355293, 41.99981721794562, 42.00030285516861, 42.000342118257144, 41.99949493404719]
As you can see above as the index progresses, the top 5 values approach 42, sometimes they shoot slightly above 42, but they are limited by the universal number
that determines the survival ability, so those which shoot up too much are also eliminate.
Plotting what we had done
Let’s plot and learn from stored values in top_survivors_sample
. First let’s have a range for plotting that spans from 1 to generations
:
plotting_range = 1 : generations
Output:
1:500
Next let’s plot the target, which is nothing but number
, we create a array having values that equal number
and is generations
long using this code fill(number, generations)
, fill()
creates an array of generations
long and fills it with number
, we pass it as second argument to plot()
as shown:
using Plots
progress_plot = plot(plotting_range, fill(number, generations), label = "number to guess")
Output:
plot()
by default does a line plot, so we get a straight line at 42 as shown above. We store the plot in a variable named progress_plot
.
In the code below, for the 1st and every 10th generation, we append progress_plot
with ten values of i
that is the generation value using this code fill(i, 10)
for x axis and on the y axis we plot the corresponding value of top survivors for that generation top_survivors_sample[i]
as shown below:
for i in plotting_range
if (i == 1) || (i % 10 == 0)
plot!(progress_plot, fill(i, 10), top_survivors_sample[i], label = "gen $i", seriestype = :scatter)
end
end
progress_plot
Output:
So as we see above, some where around the 100th generation we are reaching the targeted 42, why don’t you raise the value of number
in the notebook 71 and see when are we reaching the target, can you say why? Can you increase the number to -7658 and see if we are able to reach it? If yes why and if not why not?
Fine grained plot
top_survivors_sample
is an array, but since the plot dimensions above are so huge we are unable to see it, so let’s do a fine grained plot, so we plot the number
first:
fine_grain_plot = plot(plotting_range, fill(number, generations), label = "number to guess")
Output:
Next just from generation 90 to 110 we plot the top_survivors_sample
as shown below:
for i in 90:110
if (i % 5 == 0)
plot!(fine_grain_plot, fill(i, 10), top_survivors_sample[i], label = "gen $i", seriestype = :scatter)
end
end
fine_grain_plot
Output:
Let’s enlarge the plot so that we can see better, for that let’s zoom in, we set x-limits xlims
from 85 to 125:
plot!(fine_grain_plot, xlims = (85, 125))
Output:
So you can see how the generations vary in value, and how they hover near 42.
The Jupyter notebook for this blog is here https://gitlab.com/data-science-with-julia/code/-/blob/master/genetic_algorithm_to_guess_number.ipynb.
Lakshmi - Goddess of comfort, wealth and progress you gain by your experience |