What is genetics? What makes us change? Well let’s take for instance the corona virus pandemic, there are some humans who are naturally immune to it, some people who are not. Say that we are not progressed so scientifically, then what would have happened? People who are not immune to it will perish, some will have reduced biological function and may produce less kids, finally a humanity will emerge which is totally immune to corona. These happen because there is tiny variations that are set into us during our birth that makes us different from our mom and dad. So producing more offspring means more variations and more chance of surviving an epidemic we have. That’s why all species that had survived has a very strong urge to reproduce. The more the urge more the offspring variations and survival.

Okay let’s code. So lets define a universe where only numbers that are closest to 42 exist. Why 42? Because its the ultimate answer.

So our universe is defines by this number here:

number = 42

Output:

This is a number we must guess using evolutionary technique.

Now our universe need to calculate the difference between the ultimate answer and a values that exist in it, so we define a error function as follows:

function ∆(value, number)
    number - value
end

Output:

∆ (generic function with 1 method)

Let’s test our error function:

∆(12, 42)

Output:

Now let us experiment with random numbers. I want to generate 500 random numbers that vary from -0.5 to +0.5:

rand(500).- 0.5

Output:

500-element Array{Float64,1}:
  0.11811737299697067
  0.40975918563037483
  0.11512815232487483
  0.4407493702816716
 -0.06317498449812642
  0.47123272092330426
 -0.10407972969869217
  0.10466653256756975
  0.316758714503494
  0.05275897872543078
 -0.0580903153675667
 -0.0636241488226077
  0.24940396400825926
  ⋮
  0.4043456432783936
 -0.4607426587387351
  0.4003121999197845
 -0.28198845089727187
  0.32761734763714667
 -0.35388456207140506
  0.33622477093610326
 -0.3255506303368696
 -0.04515216140695988
  0.08163723449954996
 -0.4425492244863174
  0.09909239016396998

Seems to work.

Top Survivors

Next I want to define a function called top_survivors() which returns the numbers that are most closes to 42 first and least closest to 42 at the last:

function top_survivors(values, number, top_percent = 10)
    errors_and_values = [(abs(∆(value, number)), value) for value in values]
    sorted_errors_and_values = sort(errors_and_values)
    end_number = Int(length(values) * top_percent / 100)
    sorted_errors_and_values[1:end_number]
end

Output:

top_survivors (generic function with 2 methods)

Lets see see in detail how top_survivors() is coded. First let’s start with an empty function definition:

function top_survivors()
end

Now this function should receive values who’s survival ability will be determined

function top_survivors(values)
end

The survival ability is determined by a number, the values that are closest to the number survive, so we we pass in a number to the function top_survivors():

function top_survivors(values, number)
end

Not all values should survive, say only top 10% of the values that are closest to the number should survive, so we have a variable top_percent and set a default of 10 to it as shown below:

function top_survivors(values, number, top_percent = 10)
end

First we need to find errors, that is the difference between each value in values

function top_survivors(values, number, top_percent = 10)
    [for value in values]
end

compared to number:

function top_survivors(values, number, top_percent = 10)
    [∆(value, number) for value in values]
end

but there is a problem, let’s say number is 5 and value is 3, then the error is 2, let’s say another value is 12 hence error ∆(value, number is now -7. Since -7 is less than 2, it does not mean that 12 is closer to 5 than 3. What we need is absolute error and we get it using the abs() function shown below:

function top_survivors(values, number, top_percent = 10)
    errors_and_values = [abs(∆(value, number)) for value in values]
end

What are we going to do just with errors, let’s bundle error and corresponding value into a Tuple using this code (abs(∆(value, number)), value) as shown below:

function top_survivors(values, number, top_percent = 10)
    [(abs(∆(value, number)), value) for value in values]
end

now let us assign it to a variable errors_and_values:

function top_survivors(values, number, top_percent = 10)
    errors_and_values = [(abs(∆(value, number)), value) for value in values]
end

So we have got an Array of Tuples in errors_and_values where the first element in the tuple is error and second is the value that is associated with the error and it looks something like this:

[
  (error_1, value_1), ..... , (error_n, value_n)
]

Now lets sort it so that the least errors and values are at first

function top_survivors(values, number, top_percent = 10)
    errors_and_values = [(abs(∆(value, number)), value) for value in values]
    sorted_errors_and_values = sort(errors_and_values)
end

and we store it in a variable sorted_errors_and_values as shown above, now all that is left is to return the top_percent of errors_and_values out, those will be our survivors, for that we calculate the end_number for our array as shown in the last line of the function below:

function top_survivors(values, number, top_percent = 10)
    errors_and_values = [(abs(∆(value, number)), value) for value in values]
    sorted_errors_and_values = sort(errors_and_values)
    end_number = Int(length(values) * top_percent / 100)
end

If values had 1943 elements then length(value) will be 1943. This is multiplied by top_percent by 100, in our case its 0.1 time 1943, that will be 194.3, and we get integer value of it result Int(length(values) * top_percent / 100) which is 194, this will be stored in end_number. The top_percent of errors_and_values will be sorted_errors_and_values[1:end_number] and we add this as last line to our function below which means that this will be returned out:

function top_survivors(values, number, top_percent = 10)
    errors_and_values = [(abs(∆(value, number)), value) for value in values]
    sorted_errors_and_values = sort(errors_and_values)
    end_number = Int(length(values) * top_percent / 100)
    sorted_errors_and_values[1:end_number]
end

Let’s test top_survivors() now, let me generate 500 randome numbers from -0.5 to +0.5 and see which is closest to number

survivors = top_survivors(rand(500).- 0.5, number)

Output:

50-element Array{Tuple{Float64,Float64},1}:
 (41.50295720311258, 0.49704279688742026)
 (41.503072694126054, 0.4969273058739454)
 (41.50606492163384, 0.49393507836615624)
 (41.509092656735994, 0.49090734326400853)
 (41.511967526254274, 0.4880324737457231)
 (41.51218378164256, 0.4878162183574357)
 (41.51401215336174, 0.4859878466382588)
 (41.51538950354761, 0.4846104964523883)
 (41.51687210415236, 0.4831278958476404)
 (41.520187541041686, 0.4798124589583119)
 (41.52202481756689, 0.47797518243311)
 (41.52243285351341, 0.47756714648659404)
 (41.52250385998039, 0.47749614001960716)
 ⋮
 (41.57023757877993, 0.42976242122007213)
 (41.571326533554725, 0.4286734664452725)
 (41.57377340302943, 0.4262265969705692)
 (41.57432225294512, 0.4256777470548734)
 (41.578400139207496, 0.42159986079250356)
 (41.579227438079286, 0.4207725619207161)
 (41.58325191425531, 0.41674808574469324)
 (41.58460955385501, 0.41539044614499)
 (41.5857945937749, 0.41420540622509905)
 (41.58642578719165, 0.4135742128083526)
 (41.589202493410696, 0.4107975065893039)
 (41.5916907905559, 0.40830920944409743)

So we see that top_survivors() returns top 50 closest, the first one being (41.50295720311258, 0.49704279688742026) that is it has an error of 41.50295720311258 and value of 0.49704279688742026.

Mutations

We know that 0.497 is far far from 42, so how does it inch near 42? The answer is mutation. Say that this number spawns another set of numbers that vary slightly from 0.497, some of it might be more away from 42, but some could be more near. So let’s write a function to mutate a value:

function mutate(value, mutations = 10)
    [value + rand() - 0.5 for i in 1:mutations]
end

Output:

mutate (generic function with 2 methods)

Okay this mutate() function receives a value as first argument, and second argument is number of children it must have which is defined my mutations which we have set it to a default of 10. In this statement:

[value + rand() - 0.5 for i in 1:mutations]

value is added with a random number anywhere between -0.5 to 0.5 and is been collected into array and its done mutations times. So let’s test our mutate function:

mutate(50)

Output:

10-element Array{Float64,1}:
70077118627718
38719806541724
659770263529325
49345080427759
682957552922375
068482872331586
418052425411894
73656292165122
01836131822904
030683616700884

So see above how 50 mutates with values slightly varying from it.

I also want to introduce a function called vcat() that concatenates or joins two arrays as shown:

vcat([1, 2, 3], [4, 5, 6])

Output:

6-element Array{Int64,1}:
 1
 2
 3
 4
 5
 6

Our mutate function mutates only one value, but for our task we have a list of values, so let’s write a function that mutates a list as shown:

function mutate_list(list, mutations = 10)
    output = []
    for element in list
        output = vcat(output, mutate(element, mutations))
    end
    output
end

Output:

mutate_list (generic function with 2 methods)

So mutate_list works as follows, first we have a empty function:

function mutate_list()
end

We receive a list which must be mutated:

function mutate_list(list)
end

Next we have a variable called mutations that defines the number of mutations that must take place for each element in the list:

function mutate_list(list, mutations = 10)
end

Let’s have an element called output that will collect all mutations

function mutate_list(list, mutations = 10)
    output = []
end

We take each value in the list into a variable called element:

function mutate_list(list, mutations = 10)
    output = []
    for element in list
    end
end

Now we add mutated values to output using this statement output = vcat(output, mutate(element, mutations)) as shown below:

function mutate_list(list, mutations = 10)
    output = []
    for element in list
        output = vcat(output, mutate(element, mutations))
    end
end

And finally once the operations is done for all elements in the list, we return the output:

function mutate_list(list, mutations = 10)
    output = []
    for element in list
        output = vcat(output, mutate(element, mutations))
    end
    output
end

Now let’s test our mutate() function:

mutate_list((1, 2, 3, 4))

Output:

40-element Array{Any,1}:
1981180737202284
6826820656478603
4263346623043727
0445431960263634
9986709617577949
5041082408398894
206211947133361
5703313900822442
7085107623470692
6808788103524246
8926516209570248
3122314617476407
4123045970406345
 ⋮
5112031422146033
0098577316693103
5139755069699214
5552827017499755
632721017569514
6865403468441524
908433918456275
330548003821807
8199880565807103
4004408963825155
8622190204543214
299693796909806

So we have condition of best survival defined by number = 42, then we have a function called top_survivors() which will clean out values it thinks are unfit and have only those that meet the top percent of the criteria, then we have a function mutate() that will change the numbers. So we have made the bits and pieces of our universe namely:

Condition for best fit number = 42
Way to clean those that are not fitting welltop_survivors()
A way to change mutate()

Creating our universe

Now we have to put this all together. So first let us have a initial set of values, a bunch of numbers to start with:

# let there be initial values
initial_values = rand(500)


Brahma the initiator

First all those values are survivors, so let us define survivors and assign it to initial_values:

# let there be initial values
initial_values = rand(500)
survivors = initial_values

Let’s define a variable called generations that will hold an value of the maximum number of generations allowed in our universe:

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500

Next for plotting purpose, for us to see what happened visually we sample and store values / numbers that are created in each generation that matches closest to number, that is 42 in our case, we store that sampled values in top_survivors_sample

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []

Now for each generation:

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []

for generation in 1:generations
end

we make the survivors mutate and create offspring:

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []

for generation in 1:generations
    survivors = mutate_list(survivors)
end


Shiva the god of Destruction

now the force of destruction comes to play only those that are closet to ideal condition are selected in this line errors_and_values = top_survivors(survivors, number) others are eliminated:

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []

for generation in 1:generations
    survivors = mutate_list(survivors)
    errors_and_values = top_survivors(survivors, number)
end


Vishnu the God that saves and ensures continuity

now we salvage only the values for the next generation in the following statement survivors = [value for (error, value) in errors_and_values] as shown below:

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []

for generation in 1:generations
    survivors = mutate_list(survivors)
    errors_and_values = top_survivors(survivors, number)
    survivors = [value for (error, value) in errors_and_values]
end


Saraswathi the God of knowledge and learning

Now we record the top 10 survivors into our sample array top_survivors_sample in this statement push!(top_survivors_sample, survivors[1:10]), this will help us to plot and understand what happened:

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []

for generation in 1:generations
    survivors = mutate_list(survivors)
    errors_and_values = top_survivors(survivors, number)
    survivors = [value for (error, value) in errors_and_values]
    push!(top_survivors_sample, survivors[1:10])
end

And for the first and every 10th generation we print out the top values using these lines

if (generation == 1) || (generation % 10 == 0)
    println(generation, " => ", survivors[1:5])
end

So you can see the completed code below:

# let there be initial values
initial_values = rand(500)
survivors = initial_values
generations = 500
top_survivors_sample = []

for generation in 1:generations
    survivors = mutate_list(survivors)
    errors_and_values = top_survivors(survivors, number)
    survivors = [value for (error, value) in errors_and_values]
    push!(top_survivors_sample, survivors[1:10])

    if (generation == 1) || (generation % 10 == 0)
        println(generation, " => ", survivors[1:5])
    end
end

Output:

=> [1.489963570249114, 1.4748790114913484, 1.4622739648861949, 1.4510704237615362, 1.4490538800141282]
=> [5.660855018791918, 5.598888609438815, 5.593011278753805, 5.5762836556003545, 5.566632716837373]
=> [10.24997392360673, 10.215807815922725, 10.21448723688237, 10.214464268619894, 10.204137257813436]
=> [14.909055039342904, 14.889731189243328, 14.884161682306756, 14.874322175303265, 14.867230337010271]
=> [19.52250107015328, 19.50132983259004, 19.48141475284547, 19.47896089764954, 19.478514865080626]
=> [24.185076343905422, 24.171229043560263, 24.16911898120017, 24.168797157127305, 24.151685651797276]
=> [28.777004719428835, 28.73967040390202, 28.730165929867656, 28.721490919382102, 28.705793402309293]
=> [33.38810260130008, 33.3806820730437, 33.35186837385183, 33.34216038995459, 33.32739352190749]
=> [37.931008629223165, 37.9235457841195, 37.91415226790492, 37.91195802545634, 37.91106132860069]
=> [41.99992407747299, 42.00008624788804, 41.99991137236084, 41.99979061755105, 42.000233034952544]
=> [42.000123781083616, 42.000235748738795, 42.000307239484606, 41.99960275074321, 41.99956774943434]
=> [42.000020774696715, 41.99997835399716, 41.99993951603409, 41.99979201596632, 41.999598390078326]
=> [42.000244420648414, 41.999688461201856, 42.00033212168877, 42.000419165315776, 41.999043587401665]
=> [42.00000316583509, 41.99980027149457, 42.00043079801067, 41.99954988980232, 41.99950264181341]
=> [41.999923715753546, 41.99977377760766, 42.00033523277993, 41.99951474643227, 42.00055740776272]
=> [41.99999977643734, 42.00037448792328, 42.00045229876902, 41.99954594667108, 42.000492403893965]
=> [41.999937769558926, 41.999854484705295, 41.999819702244785, 42.00025056814451, 42.00039399008882]
=> [42.00005547930738, 41.99979529808799, 41.99972233935926, 41.99971873415906, 41.99963575444441]
=> [42.00003109415735, 42.000200426218896, 41.999755219184195, 42.00043618008025, 41.999501915890455]
=> [41.99985969909203, 42.00021165635373, 41.99976762837727, 42.00031788049009, 42.000332341607766]
=> [41.99992962001092, 42.00017098936294, 41.99979094577462, 42.00030728974183, 41.999650626630036]
=> [42.00020298104202, 41.99975261684113, 41.99975023469254, 41.99957908132207, 41.999445950654746]
=> [41.99992584076578, 41.999855258167386, 42.000200104286066, 41.999702460834605, 42.00035077159908]
=> [42.00035209859171, 41.99963549012812, 42.00037216436359, 41.99951880529105, 42.00050803530517]
=> [41.99984779759352, 41.99974134448669, 42.00031532262816, 41.99962683164088, 42.00052285058658]
=> [41.999728429745, 42.000393543238104, 42.00051227111797, 41.999461252932434, 42.00064395532183]
=> [42.00025814042138, 42.00031589894168, 42.00041985364005, 41.99956313819255, 41.9993910261099]
=> [41.99999510362798, 42.00001999291032, 41.9999579395399, 41.999948699620724, 42.00035888305544]
=> [42.0000349216614, 41.99994198625842, 41.99991656164299, 42.00016283910495, 41.99974913725434]
=> [42.000088552147226, 42.00009177413705, 42.000285959629764, 41.999368276803075, 42.00085092034136]
=> [42.00002153736082, 42.00017318821541, 42.00027218263787, 42.000360859044456, 41.99956805018513]
=> [41.99982214130841, 41.99967598916952, 42.00033996524177, 41.99959063668834, 42.000454542266226]
=> [42.00006126086911, 42.00008176129088, 41.99991628417523, 41.99984827263701, 42.0006948512896]
=> [41.999920931176746, 42.000084474769224, 42.00011049825621, 42.0001780995309, 41.999746240047266]
=> [42.00016462077862, 41.99973103912161, 41.99972713436191, 42.000314926474644, 41.99954583326124]
=> [41.999909064720605, 41.99989664326481, 42.00012807380437, 41.99981089152715, 42.00044965487015]
=> [41.99998035859879, 41.999741519792536, 41.99972834361312, 42.000466246698764, 41.999445378979175]
=> [41.99998473751336, 42.000118961242094, 42.00019927235091, 41.99967103681347, 42.0003893406721]
=> [42.00004994314298, 42.000088485492014, 41.99989507939007, 42.00021089501884, 41.999768499453445]
=> [42.00016741179328, 41.99983192017003, 42.00019597710459, 42.000246488574184, 41.9997500616498]
=> [42.0000690034192, 42.00032210252985, 42.00044477418595, 41.99950447485576, 41.999409987775174]
=> [41.99984997793723, 42.00021601004096, 41.999733823597175, 41.99943005559297, 42.000572405921815]
=> [41.99978319192324, 42.0003312867216, 41.99956076412637, 42.000495777523554, 42.00057076477468]
=> [41.9999969877506, 42.00015622715756, 41.99978672785641, 42.00025219550749, 41.999745559762715]
=> [41.999961313124366, 41.99992889660888, 42.00022494455986, 42.00027211260345, 41.99971622361044]
=> [41.99992660449093, 42.00012955558257, 41.99986339510042, 42.00018388328064, 41.99973756767785]
=> [41.99980983221668, 41.99979169174335, 41.99960684933666, 41.99945475746643, 42.00056035870122]
=> [42.000037583752786, 42.000103148938535, 42.000148858716024, 41.999803261589705, 41.99958494029524]
=> [42.00010180604779, 41.999839312694654, 42.000441302743766, 42.000564805088516, 42.00069368757735]
=> [41.999991513319785, 42.000069257104975, 41.99971242785191, 42.000289368310895, 41.99933899177168]
=> [42.00014966355293, 41.99981721794562, 42.00030285516861, 42.000342118257144, 41.99949493404719]

As you can see above as the index progresses, the top 5 values approach 42, sometimes they shoot slightly above 42, but they are limited by the universal number that determines the survival ability, so those which shoot up too much are also eliminate.

Plotting what we had done

Let’s plot and learn from stored values in top_survivors_sample. First let’s have a range for plotting that spans from 1 to generations:

plotting_range = 1 : generations

Output:

1:500

Next let’s plot the target, which is nothing but number, we create a array having values that equal number and is generations long using this code fill(number, generations), fill() creates an array of generations long and fills it with number, we pass it as second argument to plot() as shown:

using Plots
progress_plot = plot(plotting_range, fill(number, generations), label = "number to guess")

Output:

svg

plot() by default does a line plot, so we get a straight line at 42 as shown above. We store the plot in a variable named progress_plot.

In the code below, for the 1st and every 10th generation, we append progress_plot with ten values of i that is the generation value using this code fill(i, 10) for x axis and on the y axis we plot the corresponding value of top survivors for that generation top_survivors_sample[i] as shown below:

for i in plotting_range
    if (i == 1) || (i % 10 == 0)
        plot!(progress_plot, fill(i, 10), top_survivors_sample[i], label = "gen $i", seriestype = :scatter)
    end
end

progress_plot

Output:

svg

So as we see above, some where around the 100th generation we are reaching the targeted 42, why don’t you raise the value of number in the notebook 71 and see when are we reaching the target, can you say why? Can you increase the number to -7658 and see if we are able to reach it? If yes why and if not why not?

Fine grained plot

top_survivors_sample is an array, but since the plot dimensions above are so huge we are unable to see it, so let’s do a fine grained plot, so we plot the number first:

fine_grain_plot = plot(plotting_range, fill(number, generations), label = "number to guess")

Output:

svg

Next just from generation 90 to 110 we plot the top_survivors_sample as shown below:

for i in 90:110
    if (i % 5 == 0)
        plot!(fine_grain_plot, fill(i, 10), top_survivors_sample[i], label = "gen $i", seriestype = :scatter)
    end
end

fine_grain_plot

Output:

svg

Let’s enlarge the plot so that we can see better, for that let’s zoom in, we set x-limits xlims from 85 to 125:

plot!(fine_grain_plot, xlims = (85, 125))

Output:

svg

So you can see how the generations vary in value, and how they hover near 42.

The Jupyter notebook for this blog is here https://gitlab.com/data-science-with-julia/code/-/blob/master/genetic_algorithm_to_guess_number.ipynb.


Lakshmi - Goddess of comfort, wealth and progress you gain by your experience