When & How to Solve Problems with Genetic Algorithms

Genetic algorithms are a class of algorithms designed to explore a large search space and find optimal solutions by mimicking evolution and natural selection. Potential solutions are randomly generated, evaluated, and bred with one another in the hope of producing better solutions.

Basic Steps

The process of using genetic algorithms goes like this:

  1. Determine the problem and goal
  2. Break the solution down into bite-sized properties (genomes)
  3. Build a population by randomizing said properties
  4. Evaluate each unit in the population
  5. Selectively breed (pick genomes from each parent)
  6. Rinse and repeat
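In code, that loop might look like this minimal skeleton (a sketch, not a production GA; `random_individual`, `fitness`, and `breed` are hypothetical stand-ins you supply for your own problem):

```python
import random

def genetic_algorithm(random_individual, fitness, breed,
                      pop_size=50, generations=100):
    # Step 3: build a population by randomizing properties
    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        # Step 4: evaluate each unit in the population, best first
        ranked = sorted(population, key=fitness, reverse=True)
        # Step 5: selectively breed, picking parents from the fittest
        elites = ranked[:max(2, pop_size // 5)]
        population = [breed(random.choice(elites), random.choice(elites))
                      for _ in range(pop_size)]
    # Step 6 is the loop itself: rinse and repeat
    return max(population, key=fitness)
```

Everything interesting lives in the three callbacks; the loop never changes from problem to problem.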

When to Use Genetic Algorithms

GAs are not good for all kinds of problems. They’re best for problems where there is a clear way to evaluate fitness. If your search space is not well constrained or your evaluation process is computationally expensive, GAs may not find solutions in a sane amount of time. In my experience, they’re most helpful when there is a decent algorithm in place, but the “knobs” just need to be tweaked.

For Example…

On a recent project, we had an algorithm that we knew would work, but we needed to find the best possible settings for the tunable parameters.

We were using pressure data from inside a testing lung to parse out breath waves and calculate derived values such as breath rate, tidal volumes, and peak pressures. We had many recordings of pressure data and knew the expected results. The goal was simple: find the set of variables that would get us as close as possible to the expected results. Our properties were a handful of settings for our existing algorithm: low/high thresholds, smoothing factors, averaging window sizes, etc.

A simpler illustration

To walk through how it works, let’s use something a little simpler: a cannon.

Like the old Cannon Fodder game, we have a cannon that has two variables: powder and elevation. Our goal is to shoot as far as possible. (Let’s pretend we don’t know about physics and don’t know that 45 degrees is the answer.)

Our goal: maximum distance
Our genomes: powder P (0-100) and elevation E (-90 to 90 degrees)
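As a fitness function for the cannon, we can borrow the standard projectile-range formula. (This model is an assumption for illustration; the GA itself only ever sees the output, which is the point of pretending we don't know physics.)

```python
import math

def distance(powder, elevation):
    """Fitness: how far the cannonball lands.
    Hypothetical model: muzzle velocity is proportional to powder."""
    if elevation <= 0:
        return 0.0  # fired flat or into the ground
    g = 9.81
    velocity = powder  # assume 1 unit of powder -> 1 m/s
    # Range of a projectile launched from ground level
    return velocity ** 2 * math.sin(2 * math.radians(elevation)) / g
```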

Initial population

We start by generating a sample population and evaluating its members. Normally, this sample would be much larger, but for our example, we’ll keep it small:

Powder   Elevation   Distance
     5         -90          0
   100          30         80
    75          44         65
    25          90         35
    33          70         20
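Generating and scoring a random starting population like this might look as follows (a sketch; the `distance` fitness uses a hypothetical projectile model and is inlined so the snippet stands alone):

```python
import math
import random

def distance(powder, elevation):
    # Hypothetical fitness: projectile range, with muzzle velocity = powder
    if elevation <= 0:
        return 0.0
    return powder ** 2 * math.sin(2 * math.radians(elevation)) / 9.81

def random_cannon():
    # Randomize each genome within its allowed range
    return (random.uniform(0, 100), random.uniform(-90, 90))

population = [random_cannon() for _ in range(5)]
scored = [(p, e, distance(p, e)) for p, e in population]
```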

Given these results, we can use an elitist strategy to select the top X percent of the population to “reproduce.” Once selected, we use crossover/recombination to blend the best of the population.

Crossover and mutation

Our “Elites” include:

Powder   Elevation   Distance
   100          30         80
    75          44         65
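Elitist selection is just a sort and a slice (a sketch, assuming the population is a list of `(powder, elevation, distance)` tuples like the table above):

```python
def select_elites(scored, fraction=0.4):
    """Elitist selection: keep the top `fraction` of the population by fitness."""
    ranked = sorted(scored, key=lambda row: row[2], reverse=True)
    keep = max(1, int(len(ranked) * fraction))
    return ranked[:keep]

# The five shots from the initial population, as (powder, elevation, distance)
population = [(5, -90, 0), (100, 30, 80), (75, 44, 65), (25, 90, 35), (33, 70, 20)]
elites = select_elites(population)  # the top 40% of five shots is the best two
```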

You may use more than two “parents,” but to keep things simple, I’ll just use two. We mix and match values from the parents (just like in biology). We also mutate a percentage of the values to introduce some randomness into our genomes.

The amount of mutation can greatly affect your results and should be tweaked based on your domain and the problem you are trying to solve. To keep our gene pool from becoming too focused, we’ll also include a couple of non-elites from the previous generation.

After crossover:

Powder   Elevation
   100          30
   100          75
    75          30
    75          44

With non-elites for diversity:

Powder   Elevation
   100          30
   100          75
    75          30
    75          44
    25          90

After mutation:

Powder   Elevation
   100          30
    92          75
    75          30
    95          44
    25          80

Keeping the non-elites and mutating some of the values keeps us from getting stuck in local optima.
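The crossover and mutation steps above can be sketched as follows (the mutation rate and nudge size are assumptions; as noted, tweak them for your own domain):

```python
import random

def crossover(parent_a, parent_b):
    # Pick each genome from one parent or the other, just like in biology
    return tuple(random.choice(pair) for pair in zip(parent_a, parent_b))

def mutate(individual, rate=0.2):
    # With probability `rate`, nudge each genome and clamp it to its range
    powder, elevation = individual
    if random.random() < rate:
        powder = min(100, max(0, powder + random.uniform(-10, 10)))
    if random.random() < rate:
        elevation = min(90, max(-90, elevation + random.uniform(-10, 10)))
    return (powder, elevation)

child = mutate(crossover((100, 30), (75, 44)))
```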

Are we done yet?

Now we just repeat this process until we're done. But how do we know we're done? GAs are usually terminated by:

  • Fixed number of generations
  • Fixed amount of processing time
  • Optimal/good enough solution is found
  • Good ol’ manual killing
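The first three stopping rules translate directly into a loop condition (a sketch; `max_generations`, `time_budget`, and `good_enough` are hypothetical knobs, and manual killing is the Ctrl-C you issue yourself):

```python
import time

def should_stop(generation, best_fitness, start,
                max_generations=200, time_budget=60.0, good_enough=1000.0):
    """Stop on a generation cap, a wall-clock budget, or a good-enough score."""
    return (generation >= max_generations
            or time.monotonic() - start > time_budget
            or best_fitness >= good_enough)
```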

For our example here, the algorithm will trend toward a full powder shot at 45 degrees, and we’ll have our clear winner.

Depending on runtime, problem, and domain, any of these terminators is acceptable. For our breath parsing from pressure samples, we used manual intervention. If possible, I recommend saving off your populations so you can resume long-running simulations.
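Saving a population can be as simple as pickling the list of genomes (a minimal sketch; in practice you might also checkpoint the generation number and random seed):

```python
import pickle

def save_population(population, path):
    # Checkpoint the current generation so a long run can be resumed later
    with open(path, "wb") as f:
        pickle.dump(population, f)

def load_population(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```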

In the end, using a GA helped us get our algorithm within tolerances fairly quickly. When algorithms changed or new genomes were discovered, we simply added the genome and started over again with a few of our saved elite values. GAs aren’t a perfect fit for a lot of problems, but they’re definitely a fun and interesting tool to have in the toolbox.