Evolution works!

Source: Wikimedia

Hamlet: Do you see yonder cloud that’s almost in shape of a camel?
Polonius: By the mass, and ’tis like a camel, indeed.
Hamlet: Methinks it is like a weasel.
from Hamlet by William Shakespeare

The best way to see how evolution works, is to watch it in action! You can watch the evolution of cars live in this application (but be careful, it’s addictive): BoxCar 2D

It is fascinating to see how those cars get better and better over time, sometimes finding very impressive solutions:

To understand how evolution works even better, let us create an artificial evolution in R!

The famous evolutionary biologist Richard Dawkins gave in his book “The Blind Watchmaker” the following thought experiment:

I don’t know who it was first pointed out that, given enough time, a monkey bashing away at random on a typewriter could produce all the works of Shakespeare. The operative phrase is, of course, given enough time. Let us limit the task facing our monkey somewhat. Suppose that he has to produce, not the complete works of Shakespeare but just the short sentence ‘Methinks it is like a weasel’, and we shall make it relatively easy by giving him a typewriter with a restricted keyboard, one with just the 26 (capital) letters, and a space bar. How long will he take to write this one little sentence?

We are now going to put this idea into practice! The following outline is from the Wikipedia article on the weasel program (Weasel program):

  1. Start with a random string of 28 characters.
  2. Make 100 copies of the string (reproduce).
  3. For each character in each of the 100 copies, with a probability of 5%, replace (mutate) the character with a new random character.
  4. Compare each new string with the target string “METHINKS IT IS LIKE A WEASEL”, and give each a score (the number of letters in the string that are correct and in the correct position).
  5. If any of the new strings has a perfect score (28), halt. Otherwise, take the highest scoring string, and go to step 2.

So let us first define some variables and helper functions for reproduction, mutation and fitness calculation:

target <- unlist(strsplit("METHINKS IT IS LIKE A WEASEL" , "")) # assign target string to "target"
pop_sz <- 100 # assign population size 100 to "pop_sz"
mt_rt <- 0.05 # assign mutation rate 5% to "mt_rt"

reproduce <- function(string) {
  # input: vector "string"
  # output: matrix with "pop_sz" columns, where each column is vector "string"
  matrix(string, nrow = length(string), ncol = pop_sz)
}

mutate <- function(pop) {
  # input: matrix of population "pop"
  # output: matrix of population where each character, with a probability of mt_rt per cent (= 5%), is replaced with a new random character
  mt_pos <- runif(length(pop)) <= mt_rt
  pop[mt_pos] <- sample(c(LETTERS, " "), sum(mt_pos), replace = TRUE)
  pop
}

fitness <- function(pop) {
  # input: matrix of population "pop"
  # output: vector of the number of letters that are correct (= equal to target) for each column
  colSums(pop == target)
}

After that we are going through all five steps listed above:

# 1. Start with a random string of 28 characters.
set.seed(70)
start <- sample(c(LETTERS, " "), length(target), replace = TRUE)

# 2. Make 100 copies of this string (reproduce).
pop <- reproduce(start)

# 3. For each character in each of the 100 copies, with a probability of 5%, replace (mutate) the character with a new random character.
pop <- mutate(pop)

# 4. Compare each new string with the target "METHINKS IT IS LIKE A WEASEL", and give each a score (the number of letters in the string that are correct and in the correct position).
score <- fitness(pop)

# 5. If any of the new strings has a perfect score (28), halt. Otherwise, take the highest scoring string, and go to step 2.
highscorer <- pop[ , which.max(score)] # assign string to "highscorer" which has max. score in the population
gen_no <- 1 #assign 1 to generation counter "gen_no"

while (max(score) < length(target)) {
  cat("No. of generations: ", gen_no, ", best so far: ", highscorer, " with score: ", max(score), "\n", sep = "")
  pop <- reproduce(highscorer)           # 2. select the highest scoring string for reproduction
  pop <- mutate(pop)                     # 3. mutation
  score <- fitness(pop)                  # 4. fitness calculation
  highscorer <- pop[ , which.max(score)] # assign string to "highscorer" which has max. score in the population
  gen_no <- gen_no + 1                   # increment generation counter
}
## No. of generations: 1, best so far: BZRDXXINEIMYQVJWBFZKFCVUPFYL with score: 2
## No. of generations: 2, best so far: BZRDXNINEIMYQVJWBFZKFCVUPFYL with score: 3
## No. of generations: 3, best so far: BZRDXNINEIMYQVJWBFZKACVEPFYR with score: 4
## No. of generations: 4, best so far: BZRDININEIMYQBJWBFZKACVEPFYR with score: 5
## No. of generations: 5, best so far: BZRDININEIMYIBJWBFZKACVEPFYR with score: 6
## No. of generations: 6, best so far: BZRDININEIMYIBJLBFZKACVEPFYR with score: 7
## No. of generations: 7, best so far: BRRDININEIMYIBJLOFZKACVEPFYL with score: 8
## No. of generations: 8, best so far: BRRDININEIMYIZJLOFZKACVEAFYL with score: 9
## No. of generations: 9, best so far: BRRDINKNEIMYIZJLOFZKAT EAFYL with score: 10
## No. of generations: 10, best so far: BRRDINKNEIMYIZJLOFZKATVEASYL with score: 11
## No. of generations: 11, best so far: BRRDINKNEIMYIZJLOFEKATVEASYL with score: 12
## No. of generations: 12, best so far: BRRUINKNEIMYIZJLOFEKATVEASEL with score: 13
## No. of generations: 13, best so far: BERUINKNEIMYIZJLOFEKATVEASEL with score: 14
## No. of generations: 14, best so far: BERHINKNEIMYIZJLVFEKATVEASEL with score: 15
## No. of generations: 15, best so far: BERHINKNEIMQIZJLVFE ATVEASEL with score: 16
## No. of generations: 16, best so far: BERHINKNEIMQIZ LVFE ATVEASEL with score: 17
## No. of generations: 17, best so far: BERHINKNEIMQIZ LVFE ATVEASEL with score: 17
## No. of generations: 18, best so far: BERHINKNEIMQIZ LVFE ATVEASEL with score: 17
## No. of generations: 19, best so far: TERHINKNEIMQIZ LVFE ATDEASEL with score: 17
## No. of generations: 20, best so far: TERHINKNEIMQIZ LVFE ATDEASEL with score: 17
## No. of generations: 21, best so far: TERHINKNJISQIZ LVFE ATDEASEL with score: 17
## No. of generations: 22, best so far: TERHINKNJISQIZ LVFE A DEASEL with score: 18
## No. of generations: 23, best so far: TERHINKNJISQIZ LVFE A DEASEL with score: 18
## No. of generations: 24, best so far: TERHINKNJITQIZ LVFE A YEASEL with score: 19
## No. of generations: 25, best so far: TERHINKNJITQIZ LPFE A YEASEL with score: 19
## No. of generations: 26, best so far: TERHINKN ITQIZ LPFE A YEASEL with score: 20
## No. of generations: 27, best so far: MERHINKN ITQIZ LPFE A YEASEL with score: 21
## No. of generations: 28, best so far: MERHINKN IT IZ LPFE A YEASEL with score: 22
## No. of generations: 29, best so far: MERHINKN IT IS LPFE A YEASEL with score: 23
## No. of generations: 30, best so far: MERHINKN IT IS LPFE A YEASEL with score: 23
## No. of generations: 31, best so far: MERHINKN IT IS LPFE A YEASEL with score: 23
## No. of generations: 32, best so far: MERHINKN IT IS LAFE A WEASEL with score: 24
## No. of generations: 33, best so far: METHINKN IT IS LAFE A WEASEL with score: 25
## No. of generations: 34, best so far: METHINKN IT IS LAFE A WEASEL with score: 25
## No. of generations: 35, best so far: METHINKN IT IS LAFE A WEASEL with score: 25
## No. of generations: 36, best so far: METHINKN IT IS LAFE A WEASEL with score: 25
## No. of generations: 37, best so far: METHINKN IT IS LAFE A WEASEL with score: 25
## No. of generations: 38, best so far: METHINKU IT IS LIFE A WEASEL with score: 26
## No. of generations: 39, best so far: METHINKU IT IS LIFE A WEASEL with score: 26
## No. of generations: 40, best so far: METHINKU IT IS LIFE A WEASEL with score: 26
## No. of generations: 41, best so far: METHINKU IT IS LIKE A WEASEL with score: 27
## No. of generations: 42, best so far: METHINKU IT IS LIKE A WEASEL with score: 27
## No. of generations: 43, best so far: METHINKU IT IS LIKE A WEASEL with score: 27
## No. of generations: 44, best so far: METHINKU IT IS LIKE A WEASEL with score: 27
## No. of generations: 45, best so far: METHINKU IT IS LIKE A WEASEL with score: 27

cat("Mission accomplished in ", gen_no, " generations: ", highscorer, sep = "")
## Mission accomplished in 46 generations: METHINKS IT IS LIKE A WEASEL

As you can see, the algorithm arrived at the target phrase pretty quickly. Now, you can try to tweak different parameter setting, like the population size or the mutation rate, and see what happens. You can of course also change the target phrase.

A minority of (often very religious) people reject the fact of evolution because they miss a crucial step: selection based on fitness. Selection gives evolution direction towards solutions that are better able to solve a certain problem. It is the exact opposite of pure randomness which many people still suspect behind evolution.

To see the difference the only thing we have to do is to comment out the line
pop <- reproduce(highscorer) which selects the highest scoring string for reproduction. We can see that without selection there is no improvement to be seen and the algorithm would run “forever”:

## No. of generations: 1, best so far: UJGGZYOEDJMRADTQUXFWAVWPBGFX with score: 2
## No. of generations: 2, best so far: UHGGZQOEDJERAD QBXFSBRWPBGFX with score: 2
## No. of generations: 3, best so far: UNGDZYOEDSERADTQIXFSBVWPAGFX with score: 3
## No. of generations: 4, best so far: UHGGZQNEDJERAG QBXFSBRWPBGWX with score: 2
## No. of generations: 5, best so far: IDGGTJOELJERAETQBDFSBVWEBGFX with score: 2
## No. of generations: 6, best so far: IDGGTJOELJERNETQBDFSBVWEBGFX with score: 2
## No. of generations: 7, best so far: FNJGZYOESJERERTQGXGSBVWEBSFX with score: 3
## No. of generations: 8, best so far: UJGWZBOERJMUAQTQUXFVAVWKKSFX with score: 3
## No. of generations: 9, best so far: VETGRYOEYVVSAOTQBKOSTVPPGGFM with score: 3
## No. of generations: 10, best so far: VETGRYOEYVVSAOTQBKOSTVPPGGFM with score: 3
## No. of generations: 11, best so far: VETGRYOEYVVSAKTQBKOSTVPPGGFM with score: 3
## No. of generations: 12, best so far: IETGRYOTYVVDAKTQBKOCTVPPGGFM with score: 3
## No. of generations: 13, best so far:  TTVVZOKDJERADELYXFKWGWXKGYO with score: 3
## No. of generations: 14, best so far: UNGWCYOZDEWRAD WKXKSBVWECGFX with score: 3
## No. of generations: 15, best so far: UNGWCYOZDEWRBD WKXKSBVWECGFX with score: 3
## No. of generations: 16, best so far: UNGSCYOZDEWRBD WKXKSAVCECGFX with score: 3
## No. of generations: 17, best so far: MXKGZYOMSJ RIOTQBLJSBVNPAGDL with score: 4
## No. of generations: 18, best so far: MXKGZYOMSJ RIOTQBLJSBVNPAGDL with score: 4
## No. of generations: 19, best so far: MXKGZYOMZJ RIOTQBLJSVVNPAGDL with score: 4
## No. of generations: 20, best so far:  TTVVJGKDDERADELYJXKRGWEKGYU with score: 4
## No. of generations: 21, best so far:  TTVVJGKDDERADELYDXBRGWEKGYU with score: 4
## No. of generations: 22, best so far:  TTWVJGKDQERADELYDXBRGWEKGYU with score: 4
## No. of generations: 23, best so far: MXKGOYOMCJ RIOTQBLJIVVAPAJDG with score: 3
## No. of generations: 24, best so far: MXKGOYOMCJ RIOTQBLJIVVAPAJDG with score: 3
## No. of generations: 25, best so far: MXKGOYOMCJ RIOTQBLJIVVAPAJDG with score: 3
## No. of generations: 26, best so far: MXKGOYOMCJ RIOTQBLJIVVAPAJDG with score: 3
## No. of generations: 27, best so far: TNTUXYKJPJNDAITLAJTYBAWPMGGB with score: 4
## No. of generations: 28, best so far: MXKGOYOMCJ RIOTLBLJIVVAPAJDX with score: 4
## No. of generations: 29, best so far: MXKGOYOMCJ RIOTLBLJIVVAJAJDX with score: 4
## No. of generations: 30, best so far: TUTUYYKNPJNDAITLAJTYBAAPMOGB with score: 3
## No. of generations: 31, best so far:  NGAFULYDZELWD QDPRSMPWYAPZH with score: 3
## No. of generations: 32, best so far: HKUOZSJSXDERS TLBHASAVGPBEJT with score: 3
## No. of generations: 33, best so far:  NGAFULYDTELWD QDPRSMPWYAPZH with score: 3
## No. of generations: 34, best so far: HKUYMSJAXDERS TLBHA AVGPBEJT with score: 3
## No. of generations: 35, best so far: HKUYMSJAXDSRS TLBHA AVGPBEJT with score: 3
## No. of generations: 36, best so far: HKXYMSJYXDSRS TLBHA AVGPNEJT with score: 3
## No. of generations: 37, best so far: KNGABULYDTELWD QDORSFPWYAPZH with score: 3
## No. of generations: 38, best so far: LLCIZN EOISJ DHFIEGPXNWYMYOX with score: 4
## No. of generations: 39, best so far: LLCIZN EOISJ DHFIEXPXNWYMYOX with score: 4
## No. of generations: 40, best so far: MZN KMIESQRRILELIIILFIGRYRZZ with score: 4
## No. of generations: 41, best so far: ITQXZEKK SENLSCJXAKQ EKNCNUJ with score: 3
## No. of generations: 42, best so far: MELBV VEUBRKXSNHWGILBU JVLZX with score: 3
## No. of generations: 43, best so far: DZNAKMIEOQRRILELIVILKIGVYRZZ with score: 3
## No. of generations: 44, best so far: DZNAKMIEOQRRILELIVILKIGVYRZZ with score: 3
## No. of generations: 45, best so far: LRPDILXMGCWDD ZQD BKANWHMKFI with score: 3
## No. of generations: 46, best so far: KEGAMRLYDAELDDUXLORSFPWOAPLH with score: 3
## No. of generations: 47, best so far: KEGAMRLYDAELDDUXLORSFPWOAPLH with score: 3
## No. of generations: 48, best so far: KEGAMRLYDAELDZUXLORHFPWOAPLH with score: 3
## No. of generations: 49, best so far: KEGAMRLYDAEWDZUXLORHFPWOAPLH with score: 3
## No. of generations: 50, best so far: KEGAMRLYDAEWDZDXLORHFPWOAPLH with score: 3

If this was how evolution really worked it wouldn’t work at all.

Because evolution is a very powerful optimization method there are also real world applications of so called genetic algorithms (GA). In the following example we want to find the global optimum of the so called Rastrigin function. What makes this task especially difficult for this popular test problem is the large number of local minima, as can be seen when plotting the function:

library(GA)
## Loading required package: foreach
## Loading required package: iterators
## Package 'GA' version 3.2
## Type 'citation("GA")' for citing this R package in publications.
## 
## Attaching package: 'GA'
## The following object is masked from 'package:utils':
## 
##     de
Rastrigin <- function(x1, x2) {
  20 + x1^2 + x2^2 - 10*(cos(2*pi*x1) + cos(2*pi*x2))
}

x1 <- x2 <- seq(-5.12, 5.12, by = 0.1)
f <- outer(x1, x2, Rastrigin)
persp3D(x1, x2, f, theta = 50, phi = 20)

filled.contour(x1, x2, f, color.palette = bl2gr.colors)

To find the global minimum (spoiler: it is at (0,0)) we use the GA package (because GA only maximizes we use the minus sign in front of the fitness function):

set.seed(70)
GA <- ga(type = "real-valued", 
         fitness =  function(x) -Rastrigin(x[1], x[2]),
         lower = c(-5.12, -5.12), upper = c(5.12, 5.12), 
         maxiter = 1000)
summary(GA)
## -- Genetic Algorithm ------------------- 
## 
## GA settings: 
## Type                  =  real-valued 
## Population size       =  50 
## Number of generations =  1000 
## Elitism               =  2 
## Crossover probability =  0.8 
## Mutation probability  =  0.1 
## Search domain = 
##          x1    x2
## lower -5.12 -5.12
## upper  5.12  5.12
## 
## GA results: 
## Iterations             = 1000
## Fitness function value = -3.630204e-07 
## Solution = 
##               x1           x2
## [1,] 2.81408e-05 3.221658e-05

plot(GA)

filled.contour(x1, x2, f, color.palette = bl2gr.colors, plot.axes = {
  axis(1); axis(2); points(GA@solution[ , 1], GA@solution[ , 2], pch = 3, cex = 2, col = "white", lwd = 2) 
  }
)

Quite impressive, isn’t it! Evolution just works!

In an upcoming post we will use evolutionary methods to find a nice functional form for some noisy data with a method called symbolic regression or genetic programming – so stay tuned!

Update
The post is now online: Symbolic Regression, Genetic Programming… or if Kepler had R

6 thoughts on “Evolution works!”

  1. I don’t think your example proves “evolution just works”. In fact you are closer to the truth of how “evolution works” when you give us the example without scoring; which is to say without the target sentence. Now you have to explore the entire 28^26 word space which means, as you say, forever. Because Darwinian evolution is non-teleological. The target sentence is a telos. The target sentence is equivalent to using an existing species as a target in natural evolution. This cannot happen because evolution has no direction. It can bring you back to the primordial soup just as easy as towards a functioning organism. Again, this is because there is no target out there waiting to be achieved. But you might say, “it is not so much a species, rather a function.” You are aware of the concept of irreducible complexity. This throws in a monkey wrench because the function only becomes significant–capable of replication–when it is essentially complete. It is no use saying that 10% of an eye means 10% sight because that’s not how things with function work. Any mechanical example will suffice.

    1. DWSWesVirginny: But in evolution there is a scoring: fitness through the survival and reproduction of the individual. In this toy example, fitness is resemblance to the target sentence. In the real world, fitness is surviving long enough (predators, sickness, the environment) and being able to mate and produce descendants.
      Evolution has no set, predefined or arbitrary target; nature didn´t try to make humans (or sharks, or ants, or sequoyas) it just happened that those living structures were able to get going and reach our present day. Many of the unsuccesful brethen of their antecessors did not leave a species behind (or their species were a dead end, literally) because they failed at the fiteness scoring function.
      Evolution does not bring you backwards to the primordial soup; in fact, is an ascending stair of complexity, although simple life forms can coexist with complex ones (as bacteria, algae and viruses do life and thrive among mammals and the rest of the living entities). We can discuss what complex means (longer DNA? body with more parts and functions? A more varied set of behaviors?) but whatever definition we choose, you´ll see that complexity increases: tress are more complex that unicellular algae, mice are more complex than amoeba, and monekys are more complex than worms. Note that I do not say that ‘complex’ means ‘more evolved’. Anything that has survived up to today is equally ‘evolved’. They managed to survive through immemorial time, sometimes keeping thinks simple (bacteria, amoeba) sometimes going with very convoluted structures (like the human brain). But evlution can work through simplicity.
      If there is a catastrophic event that wipes out half of the ‘complex’ lifeforms on planet earth it just means that simpler/smaller lifeform now have the chance to evolve and occupy the niches left vacant by the deceased species. It has happened serveral times during the existanc of our planet.
      Mechanical examples are not adequate because mechanical devices are dependent on its creator; they do not evolve. And yes, 10% of an eye is better than no eyes at all, as you might know if you have ever meet an almost blind person, who enjoys a better control of his environment that a completely blind person. The fossil record has documented evolution of eyes, for example, in trilobites.
      In evolution, randomness drives change, but fitness drives selection. Your argument about irreductible complexity does not hold, because in real life replication is already there, so the function is complete (or complete enough); the issue is about the variation of the replicated entity. Asking for the function to be complete is like asking for evolution to arrive at a very specific design, which evolution does not do. It just improves on existing designs, creating many failures but giving chances to a few succesfull ones that start a new batch of variation.
      Until the sun explodes and consumes us all.

      1. I want to thank the author for this very interesting contribution.
        Miguel, I fully agree with your comment.

      2. I’d also add that there could be some kind of destiny to arrive.
        Evolution isn’t totally random to a point where a being can evolve.

        We start from a point where a program (a genetic one for living beings) is more probable if it’s short than if it’s long one.
        Say you have a triangle living being and a 5 vertex living being allocated at random.
        The program you have to do to create the triangle being is way shorter and easier than the 5 vertex living being.
        So living beings with a low algorithmical complexity are probably to appear than a complex one. And higher complexity beings appear only when low algorithmic complexity beings stacks to form another.
        I summarised here comparing living beings with polygons

Leave a Reply to Joël Courtheyn Cancel reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.