Source code to this article may be found here.
There are numerous resources on the Internet, that provide description of the theory of Genetic Algorithms and theoretical explanation thereof. I, however, have found a bit more then none giving a real example (I may have not searched that good, though). Therefore, I decided to try and implement the theory into a live example. While there are thousands of areas where GA may be applied, I decided to choose trivial quadratic equation solution process for the sake of simplicity of the implementation. The equation is y = 13x^2 - 5x - 12. It has two roots (points where its graph crosses the X axis) at x = -0.7875184 and at x = 1.17213378. Although, this particular GA implementation has no application in real life (you can solve that equation on paper in several seconds), it demonstrates GA at its finest.
There are numerous resources on the Internet, that provide description of the theory of Genetic Algorithms and theoretical explanation thereof. I, however, have found a bit more then none giving a real example (I may have not searched that good, though). Therefore, I decided to try and implement the theory into a live example. While there are thousands of areas where GA may be applied, I decided to choose trivial quadratic equation solution process for the sake of simplicity of the implementation. The equation is y = 13x^2 - 5x - 12. It has two roots (points where its graph crosses the X axis) at x = -0.7875184 and at x = 1.17213378. Although, this particular GA implementation has no application in real life (you can solve that equation on paper in several seconds), it demonstrates GA at its finest.
Data Encoding
Classic implementation of GA implies encoding of data as bit strings (chromosomes where each gene is represented by one or more bits), however, I decided to change it a bit (which is not prohibited) and use real numbers instead of the bit strings. Each chromosome includes two genes, two possible solutions, one for each root of the equation.
typedef struct _CHROMO_
{
double value1, value2;
double fitness;
}chromo_t;
The third member of the chromosome is fitness and is used only to store the fitness of the solution represented by this chromosome. The genes (values) are initiated to random values when the initial population is created:
#define POPULATION 2000
chromot_t population[POPULATION];
for(int i = 0; i < POPULATION; i++)
{
population[i].value1 = (double)(rand() % 10) + 5.0;
population[i].value2 = -(double)(rand() % 10) - 5.0;
}
Fitness Function
Fitness function is used to estimate the suitability of the current solution. In this particular case, the fitness function is the the sum of values produced by the equation when fed the values from the chromosome. This means that the lower the value returned by the fitness function the closed we get to the proper solution.
double fitness(chromo_t* ch)
{
return fabs(13.0 * pow(ch->value1, 2.0) - 5.0 * ch->value1 - 12.0) + fabs(13.0 * pow(ch->value2, 2.0) - 5.0 * ch->value2 - 12.0);
}
While you iterate through the population calculating fitness for each phenotype, you should save the indices of two solutions (the amount may actually vary from case to case) as those would be used to produce the next generation.
Crossover
Crossover operator is the crucial part of any GA implementation. It is a function that takes two (or more) best phenotypes and uses them to produce a new generation. Suggestion is to crossover that pair to produce one (or more - depends on the implementation) child and then crossover the best phenotype with the rest of the population, thus, entirely replacing the old generation with the new one. In this case, the crossover function is rather trivial:
void cross(chromo_t* chMom, chromo_t* chDad, chromo_t* chChild)
{
static int mutation = 0; // Which operation to use for crossover
switch(mutation)
{
case 0:
chChild->value1 = chMom->value1 + chDad->value1 * MUTATION_RATE;
chChild->value2 = chMom->value2 + chDad->value2 * MUTATION_RATE;
break;
case 1:
chChild->value1 = chMom->value1 + chDad->value1 * MUTATION_RATE;
chChild->value2 = chMom->value2 - chDad->value2 * MUTATION_RATE;
break;
case 3:
chChild->value1 = chMom->value1 - chDad->value1 * MUTATION_RATE;
chChild->value2 = chMom->value2 + chDad->value2 * MUTATION_RATE;
break;
case 4:
chChild->value1 = chMom->value1 - chDad->value1 * MUTATION_RATE;
chChild->value2 = chMom->value2 - chDad->value2 * MUTATION_RATE;
break;
case 5:
chChild->value1 = chDad->value1 - chMom->value1 * MUTATION_RATE;
chChild->value2 = chDad->value2 - chMom->value2 * MUTATION_RATE;
break;
case 6:
chChild->value1 = chDad->value1 + chMom->value1 * MUTATION_RATE;
chChild->value2 = chDad->value2 - chMom->value2 * MUTATION_RATE;
break;
case 7:
chChild->value1 = chDad->value1 - chMom->value1 * MUTATION_RATE;
chChild->value2 = chDad->value2 + chMom->value2 * MUTATION_RATE;
break;
}
mutation++;
if(mutation > 7)
mutation = 0;
}
where the MUTATION_RATE is the speed at which solutions evolve (in this example set to 0.0001). The higher the MUTATION_RATE, the faster you may get to the proper solution, however, it may be less accurate. The best would be to reduce it while approaching the solution.
As you may have noticed, crossover and mutation are combined in a single function, while most of the time you would want to separate them and use mutation only in case of stagnation (inability of the phenotypes to evolve into a proper solution).
Stop Condition
Not much can be said here. You should stop the execution once you reach desired precision or once you realize that there is no solution as it is totally possible that certain quadratic equation has no roots, instead, it has its minimum or maximum. It is important to mention, that the precision and speed of certain GA implementation depends on the mutation rate and amount of phenotypes in population. In fact, you may implement another GA in order to find optimal values for the current one :).
Testing the Implemented Algorithm
This specific implementation approaches the roots of the aforementioned quadratic equation in a bit less then 23 seconds. Generated solutions are precise enough for most needs.
Let us take a look at the evolution of the solutions generated by the described implementation of GA.
This graph shows how solutions change over the execution time. |
The fitness curve looks quite similar:
Fitness curve |
As you may see, GA is capable of finding the right direction in a very fast manner.
Conclusion
While this specific implementation is trivial and, to be honest, quite lame, I hope it shows that GAs are a very powerful tool. It is sage to say, that solving a quadratic equation is one of the simplest tasks where GA may be successfully applied. Genetic Algorithms of different kinds may be used for selection of Artificial Neural Network topology, different algorithm optimizations, etc. The success of certain GA only depends on the correctness of the chromosome and crossover implementations. Let me reiterate - it is always possible to implement another GA in order to get proper implementation of the current one.
Hope this post was helpful. See you at the next.