Thursday, January 29, 2009

Modern Genetic (And Other) Algorithms Explained: Part 6 - Tabu Search

(This is part 6 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

Tabu search is not really "modern" anymore, but it is still widely used.

The pseudocode looks like this:

Construct an initial solution
Until time limit hit or satisfactory solution found do:
    Find all neighbours which are not in the tabu list, and calculate their score
    Pick the best neighbour, add the previous solution to the tabu list

Notice that the "best neighbour" need not be better than the current solution, or than the best solution found so far: tabu search always moves, even downhill.

Maintaining a full tabu list of solutions can be time and memory consuming. Alternatively, we can construct the list like this: add the difference (the move) between two successive solutions to the list (so that it cannot be undone), and keep it in the list for n iterations. N, the length of the list, is important: make it too long and the algorithm might get stuck, make it too short and the algorithm will tend towards local maxima.

This time, I've coded the example in PHP, I hope nobody minds:

$target = 300;

//construct an initial solution
$tabulist = array('ttl' => array(), 'change' => array());
$solution = array();
$min = 0;
$max = 100;
for ($i=0;$i<6;$i++)
    $solution[] = rand($min,$max);

//until solution found
$iterations = 0;
while (true){
    $best_neighbour_solution = false;
    $best_neighbour_score = 1000;
    $best_neighbour_tabu = false;
    for ($position=0;$position<6;$position++){
        //neighbour: increment this position by one
        if (!in_array("$position+",$tabulist['change'])
                and $solution[$position] < $max){
            $temp_solution = $solution;
            $temp_solution[$position]++;
            $score = fitness($temp_solution,$target);
            if ($score < $best_neighbour_score){
                $best_neighbour_score = $score;
                $best_neighbour_solution = $temp_solution;
                //make sure this step doesn't get undone
                $best_neighbour_tabu = "$position-";
            }
        }
        //neighbour: decrement this position by one
        if (!in_array("$position-",$tabulist['change'])
                and $solution[$position] > $min){
            $temp_solution = $solution;
            $temp_solution[$position]--;
            $score = fitness($temp_solution,$target);
            if ($score < $best_neighbour_score){
                $best_neighbour_score = $score;
                $best_neighbour_solution = $temp_solution;
                //make sure this step doesn't get undone
                $best_neighbour_tabu = "$position+";
            }
        }
    }
    //pick the best neighbour
    $solution = $best_neighbour_solution;
    $fitness = fitness($solution,$target);
    echo "Iteration $iterations: fitness = $fitness\n";
    if ($fitness == 0){
        echo "Perfect solution found:\n";
        print_r($solution);
        break;
    }
    //add change to tabu list
    $tabulist['ttl'][$iterations] = 5; //remember for 5 iterations
    $tabulist['change'][$iterations] = $best_neighbour_tabu;
    //update the tabulist
    foreach ($tabulist['ttl'] as $key => &$val){
        $val--;
        if ($val <= 0){
            unset($tabulist['ttl'][$key]);
            unset($tabulist['change'][$key]);
        }
    }
    echo "Iteration $iterations: tabulist now contains "
        .count($tabulist['ttl'])." items \n";
    $iterations++;
}

function fitness($array, $target){
    return abs(array_sum($array)-$target);
}
The neighbour calculation could be done a little better (there's a bit of ugly duplicate code), but the following output is given:

Iteration 0: fitness = 57
Iteration 0: tabulist now contains 1 items
Iteration 1: fitness = 56
Iteration 1: tabulist now contains 2 items
Iteration 2: fitness = 55
Iteration 2: tabulist now contains 3 items
Iteration 3: fitness = 54
Iteration 3: tabulist now contains 4 items
Iteration 4: fitness = 53
Iteration 4: tabulist now contains 4 items
Iteration 5: fitness = 52
Iteration 5: tabulist now contains 4 items
...
Iteration 55: tabulist now contains 4 items
Iteration 56: fitness = 1
Iteration 56: tabulist now contains 4 items
Iteration 57: fitness = 0
Perfect solution found:
Array
(
    [0] => 66
    [1] => 14
    [2] => 20
    [3] => 99
    [4] => 14
    [5] => 87
)

With this sample problem, such an output was, of course, expected. Notice that I've used the second method of keeping a tabulist.

This post almost concludes this series: we have only one post to go, with a general conclusion.

Table Of Contents (click a link to jump to that post)

Thursday, January 22, 2009

Modern Genetic (And Other) Algorithms Explained: Part 5 - Ant Colony Optimization

(This is part 5 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

First, let's go to Wikipedia for a definition:
The ant colony optimization algorithm (ACO), is a probabilistic technique for solving computational problems which can be reduced to finding good paths through graphs.

This algorithm is a member of Ant colony algorithms family, in Swarm intelligence methods, and it constitutes some metaheuristic optimizations. Initially proposed by Marco Dorigo in 1992 in his PhD thesis [1] [2] , the first algorithm was aiming to search for an optimal path in a graph; based on the behavior of ants seeking a path between their colony and a source of food. The original idea has since diversified to solve a wider class of Numerical problems, and as a result, several problems have emerged, drawing on various aspects of the behavior of ants.

"Good paths through graphs". I have not converted our sample problem from the previous posts into a graph problem, so there is no Python port of it this time, sorry.

You do get to see some pseudocode though:
Given a number of nodes
Given some edges between nodes (paths)
Given BT := {}, this will contain our best tour
Randomly construct a tour T
Create a number of "ants" (often equal to number of nodes)
Associate a distance, pheromone value, and delta pheromone value to every edge
Iterate until time limit hit or satisfactory solution found:
  For each ant do:
    Do until tour constructed:
      Choose the next node depending on visibility (e.g. 1/distance) and pheromone trail
      E.g. choose next node with probability (visibility^a)*(pheromone trail^b)
    Calculate fitness of this tour
    Copy this tour to the best tour if fitness is better
    Update the pheromone trail of each edge of this ant's tour:
      E.g. delta pheromone for edge := 1/(tour cost)
  For each edge:
    Lower pheromone value by a factor
    Add delta pheromone value to pheromone value
    Set delta pheromone := 0
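To make the pseudocode above concrete, here is a minimal Python sketch of that loop on a toy travelling-salesman instance. The city coordinates and all parameter values (a, b, evaporation factor, ant and iteration counts) are made up for illustration; this is not code from the series.

```python
import math
import random

# hypothetical toy TSP instance: city coordinates (illustration only)
CITIES = [(0, 0), (1, 5), (5, 2), (6, 6), (8, 3), (2, 8)]
N = len(CITIES)

def distance(i, j):
    (x1, y1), (x2, y2) = CITIES[i], CITIES[j]
    return math.hypot(x2 - x1, y2 - y1)

def tour_cost(tour):
    # cost of the closed tour (returning to the start)
    return sum(distance(tour[k], tour[(k + 1) % N]) for k in range(N))

def aco(n_ants=N, n_iterations=50, a=1.0, b=2.0, evaporation=0.5):
    pheromone = [[1.0] * N for _ in range(N)]
    best_tour, best_cost = None, float("inf")
    for _ in range(n_iterations):
        delta = [[0.0] * N for _ in range(N)]
        for _ in range(n_ants):
            # construct a tour: next node chosen with probability
            # proportional to (visibility^a) * (pheromone trail^b)
            tour = [random.randrange(N)]
            while len(tour) < N:
                current = tour[-1]
                candidates = [c for c in range(N) if c not in tour]
                weights = [(1.0 / distance(current, c)) ** a *
                           pheromone[current][c] ** b for c in candidates]
                tour.append(random.choices(candidates, weights=weights)[0])
            cost = tour_cost(tour)
            if cost < best_cost:
                best_tour, best_cost = tour, cost
            # delta pheromone for each edge of this ant's tour := 1/(tour cost)
            for k in range(N):
                i, j = tour[k], tour[(k + 1) % N]
                delta[i][j] += 1.0 / cost
                delta[j][i] += 1.0 / cost
        # evaporate, then add the accumulated delta pheromone
        for i in range(N):
            for j in range(N):
                pheromone[i][j] = pheromone[i][j] * evaporation + delta[i][j]
    return best_tour, best_cost

tour, cost = aco()
print(tour, cost)
```

Note that the pheromone update happens once per iteration for all edges, after every ant has finished its tour, exactly as in the pseudocode.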

ACO is a very interesting method. Again we see aspects from the previous posts, like using pheromone trails to avoid local maxima. Since the algorithm is so closely tied to graphs, nodes, and paths, it's no wonder it's often used to find shortest paths, or to solve the travelling salesman problem.

There are some interesting links on the Wikipedia page, one of them is an application in which, after a while, an ant finds a path to the food and places pheromones.

Those who are interested can read more about Swarm intelligence or Particle swarm optimization.

Table Of Contents (click a link to jump to that post)

Monday, January 19, 2009

Modern Genetic (And Other) Algorithms Explained: Part 4 - Simulated Annealing

(This is part 4 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

First of all: Simulated Annealing is not a genetic algorithm, but it is a modern optimization technique.

Wikipedia tells us the following:
Simulated annealing (SA) is a generic probabilistic meta-algorithm for the global optimization problem, namely locating a good approximation to the global minimum of a given function in a large search space. It is often used when the search space is discrete (e.g., all tours that visit a given set of cities). For certain problems, simulated annealing may be more effective than exhaustive enumeration — provided that the goal is merely to find an acceptably good solution in a fixed amount of time, rather than the best possible solution.
The name and inspiration come from annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. The heat causes the atoms to become unstuck from their initial positions (a local minimum of the internal energy) and wander randomly through states of higher energy; the slow cooling gives them more chances of finding configurations with lower internal energy than the initial one.
Let's take a look at some pseudocode:

Randomly construct a valid solution
For each temperature do:
  For number of trials to do at each temperature do:
    Move solution to a neighbour
    Accept neighbour with probability(old_score, new_score, temperature)
  Lower the temperature by a reduction factor

It becomes clear that this method requires that we can construct neighbours from our current state, which makes it a natural fit for discrete optimization problems. E.g., a first solution to the problem we've been discussing might be [10,51,5,18], with neighbour [10,52,5,18].

Also, a probability function needs to be defined. Some functions are constructed so that better solutions are always accepted. Ideally, worse solutions should also get a certain probability of being accepted (or better ones a chance of rejection). Again: this is to avoid local maxima (comparable with GAs).

Often, a worse solution is accepted with an exponential probability: exp( (old_score-new_score)/temperature ), which is below one since the new score is higher (worse).
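To get a feel for this Metropolis-style acceptance rule, here is a quick sketch (the scores and temperatures are made-up values; lower scores are better, as in our problem):

```python
from math import exp

def acceptance(old_score, new_score, temperature):
    # better (lower) scores are always accepted; worse ones only with
    # probability exp((old - new) / temperature), which is below one
    if new_score < old_score:
        return 1.0
    return exp((old_score - new_score) / temperature)

# a move that worsens the score by 10:
print(acceptance(50, 60, 100.0))  # hot: ~0.90, usually accepted
print(acceptance(50, 60, 1.0))    # cold: ~0.000045, almost never accepted
```

Notice how the same uphill move goes from "usually accepted" to "practically forbidden" as the temperature drops: that is the whole annealing idea in one function.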

Note: the pseudocode at the Wikipedia page doesn't use trials; for some problems, this is good enough.

Let's see some Python code (again, based on the code mentioned in the previous posts):

from random import randint, random
from operator import add
from math import *

def individual(length, min, max):
  'Create a member of the population.'
  return [ randint(min,max) for x in xrange(length) ]

def fitness(individual, target):
  """
  Determine the fitness of an individual. Lower is better.
  individual: the individual to evaluate
  target: the target number individuals are aiming for
  """
  sum = reduce(add, individual, 0)
  return abs(target-sum)

def probability(o_fitness, n_fitness, temperature):
  if n_fitness < o_fitness:
    return 1
  return exp( (o_fitness-n_fitness) / temperature)

def temperature(step, max_steps):
  return max_steps - step

def neighbour(ind, min, max):
  pos_to_mutate = randint(0, len(ind)-1)

  if random() < 0.5:
    ind[pos_to_mutate] -= 1
  else:
    ind[pos_to_mutate] += 1

  if ind[pos_to_mutate] < min:
    ind[pos_to_mutate] = min
  elif ind[pos_to_mutate] > max:
    ind[pos_to_mutate] = max

  return ind

def evolve(ind, nr_trials, step, max_steps, min, max, target):
  best_fit = 10000
  for i in range(1,nr_trials):
    o_fitness = fitness(ind,target)
    n_ind = neighbour(ind[:], min, max)  #copy, so the old solution survives
    n_fitness = fitness(n_ind,target)

    if n_fitness < best_fit:
      best_fit = n_fitness

    #move to new state?
    if probability(o_fitness, n_fitness, temperature(step,max_steps)) >= random():
      ind = n_ind

  print "Best fitness this evolution:",best_fit
  print "Temperature this evolution:",temperature(step,max_steps)

  return ind

If the fitness of the neighbour is better (remember: that means lower), we immediately accept it; we don't really need a chance of rejection for this problem. Otherwise, we accept with probability exp( (o_fitness-n_fitness) / temperature ), which is below one.
We use a separate function to calculate the temperature for each step. This function is kept fairly simple (max steps - this step), but a non-linear temperature schedule can also be implemented.
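For example, a common non-linear alternative is geometric cooling. A small sketch (not the schedule used in this post; the starting temperature and the 0.95 factor are arbitrary):

```python
def geometric_temperature(t0, factor, step):
    # temperature after `step` reductions with a constant factor in (0, 1):
    # the schedule cools quickly at first, then ever more slowly
    return t0 * factor ** step

print(geometric_temperature(100.0, 0.95, 0))   # 100.0
print(geometric_temperature(100.0, 0.95, 10))  # ~59.87
```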

Let's try it; our starting temperature is 100, using 1000 trials per temperature:

import sys
from annealing import *
target = 300
i_length = 6
i_min = 0
i_max = 100
i = individual(i_length, i_min, i_max)
print fitness(i, target)
i_k = 0
i_kmax = 100
i_trials = 1000
while i_k < i_kmax:
  i = evolve(i, i_trials, i_k, i_kmax, i_min, i_max, target)
  i_k += 1

The output:
Best fitness this evolution: 34
Temperature this evolution: 100
Best fitness this evolution: 7
Temperature this evolution: 99
Best fitness this evolution: 0
Temperature this evolution: 98
Best fitness this evolution: 44
Temperature this evolution: 97
Best fitness this evolution: 43
Temperature this evolution: 96
Best fitness this evolution: 33
Temperature this evolution: 95
Best fitness this evolution: 39
Temperature this evolution: 94
Best fitness this evolution: 44
Temperature this evolution: 93
Best fitness this evolution: 34
Temperature this evolution: 92
Best fitness this evolution: 36
Temperature this evolution: 91
Best fitness this evolution: 50
Temperature this evolution: 90
Best fitness this evolution: 68
Temperature this evolution: 89
Best fitness this evolution: 67
Temperature this evolution: 88
Best fitness this evolution: 53
Temperature this evolution: 87
Best fitness this evolution: 49
Temperature this evolution: 86
Best fitness this evolution: 35
Temperature this evolution: 85
Best fitness this evolution: 55
Temperature this evolution: 84
Best fitness this evolution: 5
Temperature this evolution: 83
Best fitness this evolution: 0

Simulated annealing is easy to program and implement, provided it is easy to construct neighbours, a sensible combination of temperature and probability functions can be found, and the number of trials is well chosen. Still, this method might be too naive to solve more difficult problems.

The source code can be downloaded here.

An interesting implementation is this. It is based on another genetic programming experiment located here (both are very interesting and fun examples, I highly recommend reading them).

Another Java applet to look at: solving a travelling salesman problem with simulated annealing.

Table Of Contents (click a link to jump to that post)

Testing CSS For Code Tag

Just ignore this... I'm testing the CSS for my code.

width: 464px;
overflow: auto;
background: #f8f8f8;
border-right: 1px dashed #e6e6e6;
border-top: 1px dashed #e6e6e6;
border-bottom: 1px dashed #e6e6e6;
border-left: 4px solid #e6e6e6;
margin-left: 10px;
padding: 4px;

pre,code,tt {
  font-family: 'Liberation Mono', 'lucida console', 'courier new', monospace;
}

Friday, January 16, 2009

Nokia N95, 3G, Bluetooth, And Ubuntu Intrepid - 3G Tethering Howto

The following steps are instructions to surf the web using a Nokia N95 with 3G, connected to your Intrepid laptop via Bluetooth.

1. Make sure you have the required packages installed:

sudo apt-get install bluez-utils bluez-pin ppp

2. Find out the phone's MAC address. Enable Bluetooth on phone and laptop, and enter the following command:

hcitool scan

Output example:
$ hcitool scan
Scanning ...
00:1C:9A:26:F5:DD Macuyiko N95

Note your phone's MAC address.

3. Find out the phone's channel. With the N95, this might change from time to time:

sdptool search --bdaddr MAC DUN | grep Channel

Replace MAC with your phone's MAC address.

Output example:
$ sudo sdptool search --bdaddr 00:1C:9A:26:F5:DD DUN | grep Channel
Channel: 4

Note this channel.

4. Edit /etc/ppp/peers/BluetoothDialup:

sudo gedit /etc/ppp/peers/BluetoothDialup

Paste the following:
/dev/rfcomm1 115200
connect "/usr/sbin/chat -v -f /etc/chatscripts/proximus-gprs"
remotename proximus
lcp-restart 5

There are a few lines which you might need to change. I'm using the Belgium Proximus operator.

First of all, change /etc/chatscripts/proximus-gprs to something matching your provider (e.g. /etc/chatscripts/myprovider-gprs); we're going to create this script in the next step. You might need to change the ms-dns entry as well (in most cases you can leave it out, but I had to add it). Also notice that I have used /dev/rfcomm1 as the device; we'll use this in the next steps as well.

5. Create a chatscript at /etc/chatscripts/proximus-gprs

sudo gedit /etc/chatscripts/proximus-gprs

Note that you may have chosen a different name for your chatscript in the previous step. Paste:
"" ATZ

You need to change two entries for your provider: the APN and the data profile number. If you google for "APN 3g [yourprovider]" you will often find the correct results, or look here for APNs for many providers. The data profile number line will often be OK ATDT*99# or OK ATDT*99***1#, so try them both.
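For reference, a complete chatscript of this kind typically looks something like the following. The APN shown is a placeholder, not a real value; substitute your provider's APN and data profile number:

"" ATZ
OK AT+CGDCONT=1,"IP","your.provider.apn"
OK ATDT*99***1#
CONNECT ""

Each line is an expect-send pair: chat waits for the text on the left (e.g. OK) and then sends the command on the right.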

6. Try it out

Enter the following command:

rfcomm connect RFCOMM# MAC CHANNEL

Replace RFCOMM# with the /dev/rfcomm number you've used before (only the number!); I've used 1. MAC is your phone's MAC address again, and CHANNEL is the channel you found earlier.

If all went well it should say:
$ rfcomm connect 1 00:1C:9A:26:F5:DD 4
Connected /dev/rfcomm1 to 00:1C:9A:26:F5:DD on channel 4
Press CTRL-C for hangup

Now we're going to enable the PPP connection, in a new terminal window (keep the "CTRL-C for hangup"-one open), enter:

pon BluetoothDialup

BluetoothDialup is the filename of the file we have created in /etc/ppp/peers/ earlier in step 4.

If all went well you should see an entry now in your ifconfig output:
$ ifconfig
ppp0      Link encap:Point-to-Point Protocol
          inet addr:  P-t-P:  Mask:
          RX packets:3 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:54 (54.0 B)  TX bytes:69 (69.0 B)

If you're done surfing the internet, turn off PPP:

poff BluetoothDialup
Press CTRL-C in that other terminal window to break the Bluetooth/3G-connection.

Note: if you've done something wrong (e.g.: used the wrong channel), you can release rfcomm's with:
rfcomm release RFCOMM#

With RFCOMM# equal to the /dev/rfcomm-number you've used before.

Optional steps: use gnome-ppp to connect

If you have gnome-ppp installed, you can also use a graphical interface to configure the above steps.

First of all, you still have to execute:

rfcomm connect RFCOMM# MAC CHANNEL

But you don't have to create the files from steps 4 and 5. We could automate the connect-step as well, but since the N95's channel changes from time to time, this wouldn't be very convenient. Also, I like having a terminal open to notify me that I'm still surfing via my phone.

Then open up gnome-ppp. If you have to enter a blank username and password for your provider, just enter some dummy values. I used "blank" and "blank" :).

Phone number: I tried *99***1# this time. And it also seemed to work, great!

Then press Setup. Enter the following values:
  • Device: /dev/rfcomm1 (or the other rfcomm you defined earlier)
  • Type: USB Modem (yes, USB!)
  • Speed: 460800 works here, this probably means I could have used this value in the previous configuration files as well, instead of 115200
  • Phone Line: Tone (default)
  • Volume: High (default)
Then press Init Strings.
  • Leave Init 2 unchanged (ATQ0 V1 E1 S0=0 &C1 &D2 +FCLASS=0)
  • Enter this in Init 3: AT+CGDCONT=1,"IP",""
    Again, change this for your provider! I also had to manually define a DNS in the Networking tab, just like in the previous steps. This might not be the case for you.
You're done, make sure you're not wired or wirelessly connected to test this :).


Final note: some people have to add novj to /etc/ppp/options as well. I didn't, though. Check the Ubuntu forums/Google for information about your specific operator and/or hardware.

These instructions were only tested with my Thinkpad, my N95, and my operator. I've set up laptops with Vodafone cards before, and you can use gnome-ppp for those as well, just make sure you're using the correct device. Often the device will be at /dev/ttyS0, but use dmesg to find out the exact location.

Tuesday, January 13, 2009

Modern Genetic (And Other) Algorithms Explained: Part 3 - CHC Eshelman

(This is part 3 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

In this part, we will look at a "special" genetic algorithm, CHC by Eshelman. The original paper is very hard to find, but it is mentioned in a lot of other works.

The pseudocode looks like this (thanks to the commenters for sorting some problems out):
Create initial population: P
L := length of an individual chromosome
N := population size
Threshold := MutProb * (1.0-MutProb) * L (or L/4 is also used)
Evolve until time limit hit or satisfactory solution found:
  CPop := {}
  For i := 1 to N/2 do:
    Choose two random parents: P1 and P2
    If (different bits between P1 and P2) / 2 > Threshold:
      Create children C1 and C2 using half-uniform crossover
      Add C1 and C2 to CPop

  If there are no children in CPop:
    Threshold := Threshold - 1
  P := best N individuals from P and CPop

  If Threshold < 0:
    Cataclysmic creation of new population P
    Threshold := MutProb * (1.0-MutProb) * L (or L/4 is also used)

There is another pseudocode description in the slides found here.

A few interesting remarks:
(1) N (the population size) is almost always set to 50, but can range from 50 to 10000.
(2) The different bits between P1 and P2 can be defined as the hamming distance between them.
(3) Half uniform crossover swaps exactly half of the non-matching bits. However, often a uniform crossover is used, with a chance of 0.5 - 0.8 of swapping.
(4) MutProb is the mutation probability and originally set to 0.35 (35%).
(5) A "cataclysmic event" occurs when no children have been created for a certain period of time. New children can only be made between parents which are different enough. Basically this means: whenever the population converges towards a certain point, a cataclysm occurs.
(6) What this cataclysm does depends on the actual implementation. Originally, and most often, the following method is used: take the single best individual and put it in the new population. Then mutate each of its bits with a 35% chance; this becomes the second individual. Repeat this to create a new population of size N. Sometimes the mutation chance is set to a higher value.

My Python implementation - again, based on the code found here - looks as follows (the original problem is unchanged):
from random import randint, random
from operator import add

def individual(length, min, max):
  return [ randint(min,max) for x in xrange(length) ]

def population(count, length, min, max):
  return [ individual(length, min, max) for x in xrange(count) ]

def fitness(individual, target):
  sum = reduce(add, individual, 0)
  return abs(target-sum)

def grade(pop, target):
  summed = reduce(add, (fitness(x, target) for x in pop))
  return summed / (len(pop) * 1.0)

def hamming(ind1, ind2):
  nr_hamming = 0
  for x in range(0, len(ind1)):
    if (ind1[x] != ind2[x]):
      nr_hamming += 1
  return nr_hamming

def cataclysm(pop, target, min, max):
  #keep the best individual, flip 35% of bits to get new individuals
  pop.sort (lambda x, y : cmp (fitness(x, target), fitness(y, target)))
  firstind = pop[0]
  newpop = [firstind]
  for x in range(1, len(pop)):
    nextind = firstind[:]
    for i in range(0, len(nextind)):
      if 0.35 > random():
        nextind[i] = randint(min, max)
    newpop.append(nextind)
  return newpop

def hux(ind1, ind2):
  child_one = []
  child_two = []
  hamming_dist = hamming(ind1, ind2)
  nr_swaps = 0
  for x in range(0, len(ind1)):
    if (ind1[x] == ind2[x]) or (random() > 0.5) or (nr_swaps >= hamming_dist / 2):
      #same (or no swap this time): just copy to both
      child_one.append(ind1[x])
      child_two.append(ind2[x])
    else:
      #different: swap with .5 probability, until hamming/2 swaps
      child_one.append(ind2[x])
      child_two.append(ind1[x])
      nr_swaps += 1
  return [child_one,child_two]

def evolve(pop, target, min, max, treshold):
  child_population = []
  for i in range(1, len(pop)/2):
    #choose two random parents:
    parent_one = pop[randint(0, len(pop)-1)]
    parent_two = pop[randint(0, len(pop)-1)]
    if (hamming(parent_one, parent_two)/2) > treshold[0]:
      #do hux crossover
      children = hux(parent_one, parent_two)
      child_population.extend(children)
  if len(child_population) == 0:
    print "No children evolved"
    treshold[0] = treshold[0] - 1
  else:
    p_count = len(pop)
    print len(child_population),"children"
    for x in child_population:
      pop.append(x)
    pop.sort (lambda x, y : cmp (fitness(x, target), fitness(y, target)))
    #for x in pop:
    # if fitness(x,target) == 0:
    # print "Perfect individual found:",x
    pop = pop[:p_count]
    print len(pop),"new population, grade:", grade(pop, target)
  if treshold[0] < 0:
    pop = cataclysm(pop, target, min, max)
    print "Cataclysm, newpop length:",len(pop),"grade:",grade(pop,target)
    treshold[0] = len(pop[0]) / 4.0
    print "Treshold is now:",treshold[0]
  return pop

This reminds me: I should really work on a css class for code (update: done), instead of writing everything in monospace. A few remarks:
(1) The implementation is a bit hacky. Python passes everything by reference, except immutable objects. I wanted to pass the threshold by reference, which did not work, it being a float. That's why I've wrapped it in a list.
(2) I'll use L/4 as the threshold, and I still use a 35% mutation rate, although we are not using bit-encoded individuals; we could set this a bit higher if we wanted.
(3) We do crossover by randomly swapping differing values with a 0.5 chance, until half of the differing values are swapped. Probability-wise, this is not the same as randomly picking half of the differing bits, but that doesn't matter much for this example.

Let's test it:
import sys
from chc_eshelman import *
target = 300
p_count = 50
i_length = 6
i_min = 0
i_max = 100
treshold = [i_length / 4.0]
p = population(p_count, i_length, i_min, i_max)
print "First grade: ",grade(p, target)
for i in range(0,100):
  p=evolve(p, target, i_min, i_max, treshold)

In the first run, it took two cataclysms to reach a completely perfect population. (The grade is the average score of the complete population, not of the best single individual; a perfect individual might already appear in the first evolution, but because this problem is so simple, we look at the complete population.)
First grade: 66.88
48 children
50 new population, grade: 30.34
44 children
50 new population, grade: 18.92
48 children
50 new population, grade: 10.64
38 children
50 new population, grade: 6.68
40 children
50 new population, grade: 4.74
36 children
50 new population, grade: 3.84
12 children
50 new population, grade: 3.48
12 children
50 new population, grade: 3.12
6 children
50 new population, grade: 3.0
No children evolved
No children evolved
Cataclysm, newpop length: 50 grade: 48.24
Treshold is now: 1.5
46 children
50 new population, grade: 17.36
36 children
50 new population, grade: 7.8
32 children
50 new population, grade: 4.1
20 children
50 new population, grade: 2.76
14 children
50 new population, grade: 2.44
16 children
50 new population, grade: 2.12
22 children
50 new population, grade: 1.68
20 children
50 new population, grade: 1.28
18 children
50 new population, grade: 1.0
No children evolved
No children evolved
Cataclysm, newpop length: 50 grade: 48.86
Treshold is now: 1.5
40 children
50 new population, grade: 21.04
46 children
50 new population, grade: 5.3
36 children
50 new population, grade: 1.56
40 children
50 new population, grade: 0.38
32 children
50 new population, grade: 0.0

Another run takes only five evolutions to reach a perfect population, with a beautiful convergence:
First grade: 51.16
46 children
50 new population, grade: 24.26
46 children
50 new population, grade: 12.6
34 children
50 new population, grade: 5.78
38 children
50 new population, grade: 0.94
34 children
50 new population, grade: 0.0
20 children
50 new population, grade: 0.0

Sometimes however, the algorithm gets stuck in a loop:
18 children
50 new population, grade: 1.0
22 children
50 new population, grade: 1.0
24 children
50 new population, grade: 1.0
16 children
50 new population, grade: 1.0
26 children
50 new population, grade: 1.0
24 children
50 new population, grade: 1.0
24 children
50 new population, grade: 1.0
18 children
50 new population, grade: 1.0
18 children
50 new population, grade: 1.0
14 children
50 new population, grade: 1.0
16 children
50 new population, grade: 1.0
16 children
50 new population, grade: 1.0

This has to do with the way the hamming distance is calculated. Sometimes a pool of two distinct solutions emerges which differ in more than one position: the pair stays above the hamming threshold, but always creates the same children, and thus the same resulting new population.

For example, the algorithm can get stuck in a pool with two types of parents:
1: [83, 19, 67, 64, 23, 44], sum 300
(the target was 300, so fitness: 300-300 = 0: perfect)
2: [38, 28, 67, 64, 6, 97], sum 300

Both are optimal (the sums might also both be 299, or 299 and 301, etc.). Notice that the hamming distance between them is four, far above the threshold, so the following children can be created:
[38, 28, 67, 64, 23, 44], sum 264
[38, 19, 67, 64, 6, 97], sum 291
[83, 28, 67, 64, 6, 44], sum 292
[83, 19, 67, 64, 6, 97], sum 336

However, these children all perform worse and will never be considered for the new population; this is how we get stuck in a loop.

If we'd used a bit representation, or other workarounds, this would have worked better. For example, we could use another check: if there is no change in the population members, do threshold := threshold - 1. Still, it's good enough to show the workings of the algorithm.

In conclusion, CHC performs very well with only a very limited population size, even in problems where local maxima are common.

If you want to download the source code used in this post, you can find it here.

Table Of Contents (click a link to jump to that post)

Saturday, January 10, 2009

Modern Genetic (And Other) Algorithms Explained: Part 2 - Genetic Algorithms

(This is part 2 in the Modern Genetic Algorithms Explained series, click here to go to the first post, or browse through all the parts with the Table Of Contents at the end of this post.)

Let's take another look at the blog post on which we will base ourselves. A general genetic algorithm can be described as follows:

Construct initial population
Repeat until time limit hit or satisfiable solution found:
  Assign a fitness score to each individual
  Use a selection method to pick individuals for reproduction
  Construct a new population, using a crossover method
  Use a mutation method to mutate the new population

There are still a lot of gaps to fill in: how large should the population be, how do we construct individuals, which fitness function do we use, which selection methods are available, how do we evolve the population, and should we mutate, and how?

Population and Individuals: Encoding and Construction
Let's start with the population: if the population count is too low, there will not be enough diversity between the individuals to find an optimal or good solution. If the count is too high, the algorithm will execute slower, and convergence to a solution might become slower as well. There is no 'golden rule' for finding an optimal number.

How do we construct individuals? A simple (and popular) choice is the bit representation:

0110 0111 0001 1011 ...

When evaluating the individuals, those bits are converted into something meaningful, be it numbers, strings, or other symbols, depending on the problem.

If we had used this representation, an individual for the problem described in the article (find x numbers, each between a and b, so that the sum of those numbers equals z) would look like this (e.g. x = 4):

01100111 01000101 11010101 11010111

Which represents 4 numbers between zero and 255: 103+69+213+215. Also notice that our individuals have a fixed length. Variable-length individuals are a bit more complex, but also possible, and sometimes needed.
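Decoding such a fixed-length bitstring is straightforward. A small sketch (the helper name is mine; the 8-bits-per-number layout matches the example above):

```python
def decode(bits, n_numbers, width=8):
    # split the bitstring into n fixed-width unsigned integers
    return [int(bits[i * width:(i + 1) * width], 2) for i in range(n_numbers)]

print(decode("01100111010001011101010111010111", 4))
# [103, 69, 213, 215]
```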

What do we do if the number lies out of the permitted bounds? We could:
(1) Avoid that situation by making sure we never generate an individual with a number < a or > b.
(2) Permit those individuals, but assign a very low fitness score to them. In the article, the score is calculated as the absolute difference between the sum of the individual's numbers and the target value, so a score of zero (0) means perfect fitness: a valid solution. In this case, invalid individuals would get a score of 1000, for example. Or, even better, we evaluate them normally, but add a "penalty": e.g. twice the absolute difference between the bounds and the out-of-range numbers.
Why would we use (2)? In some cases it is too difficult or too time-consuming to check every individual for validity before putting it in the population pool. Also, permitting invalid individuals might still be useful: they can introduce more variety in the population, and their valid parts can still be used to create offspring.
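A penalty-based fitness in the spirit of option (2) might look like this (the function body is my own sketch; the article's fitness only computes the absolute difference):

```python
# Fitness with a penalty for out-of-bounds numbers: 0 is a perfect solution,
# higher is worse. Bounds and penalty factor are illustrative choices.
def fitness(individual, target, lower=0, upper=100):
    score = abs(target - sum(individual))   # the article's base fitness
    for n in individual:
        if n < lower:
            score += 2 * (lower - n)        # penalty: distance to bound, times two
        elif n > upper:
            score += 2 * (n - upper)
    return score
```

For example, `[50, 50, 100, 120]` with target 300 scores 20 for the sum difference plus a 40-point penalty for the out-of-bounds 120.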

Why would we use this representation? As we will see in the following steps, a crossover between two individuals takes place to create new offspring. With a bit representation, the crossover point can be placed at any position:

01100111 010 | 00101 11010101 11010111
01101111 000 | 00100 10010111 10011100

When we use a list of integers, we place the crossover point between two numbers. This is different from the previous example, where the crossover point could be arbitrarily placed "in the middle of a number":

103 | 69 , 213 , 215
13  | 22 , 123 , 76

Bit-representation is thus often used when it is not clear where to place the crossover point, but when we do know a way to "convert" a solution to a bitstring. Watch out, however: some representations are too sensitive to "random" tinkering with bits, quickly becoming invalid. In our case, using a bit-representation would be permitted (changing a random bit still yields four integers), but in other cases this method becomes infeasible.

Good, we have now constructed an initial population containing, say, 100 individuals. We also have a fitness function to assign a score to each of them: lower is better, 0 being a perfect solution.

How do we pick individuals to use to create offspring? We'll need two individuals (a father and a mother). A first way (1) to choose them might be: just pick them at random. Of course, it is easy to see that this is a bad way of choosing parents. There has to be a relation between fitness and selection, so that better offspring can be created. When we choose parents at random, bad parents have an equal chance of producing children as fitter parents. (This goes against the 'survival of the fittest' idea, on which genetic algorithms are based.)

In the article, a second (2) method is used:

Pick the n-best individuals from our current population
Pick two different random individuals from those n-best
Use those as parents for a child

This is certainly a better option, but this way we completely abandon the lower-scoring individuals. It is always better to give even the worst individuals a small chance of becoming a parent, to maintain a higher level of diversity.

With that remark in mind, we arrive at a third (3) solution, the roulette-wheel selection method: the chance of each individual to be selected becomes the fitness of that individual divided by the total of the fitness scores of all individuals in the population. For example, consider the following population of five individuals, ordered by their score (let's say that here, higher is better).

i3: 9
i2: 7
i5: 6
i4: 3
i1: 1
Total: 26

Now pick a random value between zero and 26. Let's say 8: 8 > 1, continue; 8 > 1 + 3, continue; 8 ≤ 1 + 3 + 6 = 10, so we pick i5. It's easy to see where the name "roulette selection" comes from.
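The roulette wheel can be sketched like this (a minimal sketch for the example population above, where higher scores are better; the function name is mine):

```python
import random

def roulette_select(scores):
    """scores: list of (individual, fitness) pairs; returns one individual,
    picked with probability proportional to its fitness."""
    total = sum(score for _, score in scores)
    pick = random.uniform(0, total)
    running = 0
    for individual, score in scores:
        running += score
        if pick <= running:
            return individual
    return scores[-1][0]  # floating-point edge case

# The example population, walked from worst to best as in the text:
scores = [('i1', 1), ('i4', 3), ('i5', 6), ('i2', 7), ('i3', 9)]
```

Over many draws, i3 should be selected roughly nine times as often as i1.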

Another selection method (4) is called tournament selection:
Choose k (the tournament size) individuals from the population at random
  Choose the best individual from pool/tournament with probability p
  Choose the second best individual with probability p*(1-p)
  Choose the third best individual with probability p*((1-p)^2)

Note that p can be 1: then the best out of k individuals is always chosen, which is fairly common. For k, 2, 3, or 4 is often used. This method is popular because it is easy to implement and can be used in parallel environments. Note that with p = 1 and k = 1, this method essentially becomes pure random selection.
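The steps above can be sketched as follows (names and defaults are mine; lower score = better, matching the article's fitness):

```python
import random

def tournament_select(population, fitness, k=3, p=1.0):
    """Pick k random contestants; the best wins with probability p,
    the second best with p*(1-p), the third with p*(1-p)^2, and so on."""
    contestants = sorted(random.sample(population, k), key=fitness)
    for contestant in contestants:
        if random.random() < p:
            return contestant
    return contestants[-1]  # nobody "won": fall back to the last one
```

With p = 1 this simply returns the best of the k sampled individuals.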

Now that we know how to select two parents, how do we create children? Again, there are many techniques here. The first one (1) is the one-point crossover (simple crossover). I will illustrate the following examples with bit-represented individuals.

0000 0000 0000 0000 | 0000 0000
1111 1111 1111 1111 | 1111 1111

Creates two children:

0000 0000 0000 0000 | 1111 1111
1111 1111 1111 1111 | 0000 0000

The crossover point can be randomly chosen, or can be a fixed location (1/4, 1/3, and 1/2 are commonly used). Two children are created by swapping the parents' bits after the crossover point.
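In code, one-point crossover is a matter of slicing (a minimal sketch; the helper name is mine):

```python
import random

def one_point_crossover(father, mother, point=None):
    """Swap everything after the crossover point; a random point if none given."""
    if point is None:
        point = random.randint(1, len(father) - 1)
    return (father[:point] + mother[point:],
            mother[:point] + father[point:])
```

Applied to the example above with the point at bit 16, it reproduces the two children shown.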

The second (2) crossover method is two-point crossover, and looks a lot like the previous method:

     v                v
0000 | 0000 0000 0000 | 0000 0000
1111 | 1111 1111 1111 | 1111 1111

Creates two children:

     v                v
0000 | 1111 1111 1111 | 0000 0000
1111 | 0000 0000 0000 | 1111 1111

Again: it's swap - and swap.
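Two-point crossover is the same idea with the middle segment swapped (a sketch; the helper name is mine):

```python
def two_point_crossover(father, mother, p1, p2):
    """Swap the segment between positions p1 and p2 between the parents."""
    return (father[:p1] + mother[p1:p2] + father[p2:],
            mother[:p1] + father[p1:p2] + mother[p2:])
```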

Cut and splice (3) is another one, and is only interesting when you need variable-length individuals. I will skip the description, it's in the Wikipedia page (all sources are mentioned at the end of this post).

UX (Uniform Crossover) (4) is a bit more interesting:

1111 1111 1111 0000 0000 0000
1111 1111 1111 1111 1111 1111

Creates two children:

1111 1111 1111 0010 1101 0100
1111 1111 1111 1101 0010 1011

It works as such: every bit has a 0.5 (50%) chance of swapping. Of course, if both parents have the same bit (0 or 1), it stays 0 or 1.
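A sketch of uniform crossover (the helper name is mine): each position swaps with probability 0.5, so positions where the parents agree are unaffected.

```python
import random

def uniform_crossover(father, mother):
    child1, child2 = [], []
    for f, m in zip(father, mother):
        if random.random() < 0.5:
            f, m = m, f          # swap this position between the children
        child1.append(f)
        child2.append(m)
    return "".join(child1), "".join(child2)
```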

HUX (Half Uniform Crossover) (5) swaps exactly half of the non-matching bits. So pick N/2 bits out of N non-matching bits, and swap them.

In our reference article, a single fixed crossover point is used, placed in the middle.

Now that that's out of the way, there is only one aspect to look at: mutation. This phase makes sure that there is enough randomness and diversity in the population. Again, there are different options.

First (1) of all: no mutation. For example, when there is enough diversity thanks to smart selection and crossover methods, or when the optimization problem is such that there are no local maxima (more about that a bit later).

A second (2) method: take N individuals out of our population, and change each of their bits/values with probability P. Often N is equal to the complete population size, with P a very small value. Or:

(3) Take N individuals, pick K bits/values in every individual, and change those bits/values. Again: N is often equal to the complete population, and K also low (one for example). This is the method used in the article.

Sometimes the following additional (4) method is used: create offspring equal to (P-N), with P being the population size and 0<=N<=P, then add N random individuals. When this method is used, values like N=P/5 to P/10 are common.
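Method (2) above can be sketched as follows (names, bounds, and the default P are illustrative, not from the article):

```python
import random

def mutate(population, P=0.01, i_min=0, i_max=100):
    """Replace each value of each individual with a new random value,
    with probability P per value."""
    for individual in population:
        for i in range(len(individual)):
            if random.random() < P:
                individual[i] = random.randint(i_min, i_max)
    return population
```

With P = 0 the population is untouched; with P = 1 every value is redrawn within the bounds.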

In Action
That's it, we're done! Let's see a few action examples of genetic algorithms.

One problem: given a constrained plane and a few randomly placed circles, how can we place another circle so that its radius is maximal, without overlapping the other circles? You can download and try it for yourself here (all sites are also mentioned at the end of this post).

Before the evolution starts, the situation looks like this:

After only 61 generations, we are coming close to the optimum:

Another really cool example can be found here. Be sure to download the binary and test it for yourself:

This site implements a Travelling Salesman Solver with a GA Java applet:

Finally, using the code from the article:
import sys
from genetic import *

target = 300
p_count = 100
i_length = 5
i_min = 0
i_max = 100
p = population(p_count, i_length, i_min, i_max)
fitness_history = [grade(p, target),]
for i in xrange(100):
    p = evolve(p, target)
    fitness_history.append(grade(p, target))

for datum in fitness_history:
    print datum


After 14 evolutions we already see a perfect population, not bad... Do note, however, that this is an extremely easy problem: there are many optimal solutions.

Remarks and Problems
Genetic Algorithms are not perfect, nor are they a silver bullet. When badly configured, genetic algorithms indeed tend to expose the same flaw as stochastic hill climbing: a tendency to converge towards local optima.

Consider the following function:

It's not hard to construct a GA which will tend toward the global maximum, even without mutation or introducing much diversity. Consider the following animation, with the green dots representing a few individuals:

However, when you consider the following function:

We might get lucky and end up with:

Notice that there is one individual tending towards the local maximum, but the others "pull" it towards the global one. However, the following could also happen:

To prevent these situations from happening, we have various options at our disposal:
(1) Choose sensible parameters: population count, and selection, crossover, and mutation methods. Try to find a balance between enough diversity and fast convergence.
(2) Run the GA multiple times, take the best solution.
(3) If the population converges towards a certain solution, randomly regenerate the N worst members of the population and continue evolution (a similar technique could be used in the mutation stage).
(4) Use parallel populations. A parallel genetic algorithm evolves multiple populations at once, often on many computers (or a cluster). Often, an extra migration stage is added, in which individuals migrate from one population to another.

Wow, what a lot of text! If you're interested in knowing more, check the following links:
  • The original article which inspired this text.
  • Wikipedia also has lots of information: Genetic algorithms, Selection (Roulette and Tournament), Crossover and Mutation.
  • An old tutorial on ai-junkie (but still useful). Back in the day, this was the first thing I read about genetic algorithms.
  • Another old website, made by Marek Obitko, still, the information contained is relevant and interesting. Also hosts the TSP Java applet.
  • This opendir contains university slides about emerging computing technologies. A032 talks about genetic algorithms. The slides are ugly as hell, but contain some good information! (There is a part about CHC Eshelman, which we will discuss in the next part in this series).
  • This Powerpoint file contains some information about CHC Eshelman as well. The file is not that interesting, but does contain a good example on crossover feasibility: sometimes our individuals are encoded as such that they can become invalid after each crossover. We must then "normalize" or correct parents or children to produce valid offspring. We mentioned this problem in the above post.
  • Evolution of Mona Lisa, check this!
There are a lot of software implementations around for genetic algorithms; most of them are written in C, C++ or Fortran 77, but recently languages such as Java and Python have become more popular.
Check Wikipedia's External links for more tutorials and software libraries.

In the next section: we will explore a particular GA implementation: CHC by Eshelman.

Table Of Contents (click a link to jump to that post)

Wednesday, January 07, 2009

Modern Genetic (And Other) Algorithms Explained: Part 1 - Introduction

A few days ago I saw this on reddit, linking to a well-written article about genetic algorithms (GA):

Genetic algorithms are a mysterious sounding technique in mysterious sounding field--artificial intelligence. This is the problem with naming things appropriately. When the field was labeled artificial intelligence, it meant using mathematics to artificially create the semblance of intelligence, but self-engrandizing researchers and Isaac Asimov redefined it as robots.
The name genetic algorithms does sound complex and has a faintly magical ring to it, but it turns out that they are one of the simplest and most-intuitive concepts you'll encounter in A.I.
If you're interested in GA's or modern optimization techniques in general, I strongly suggest you read it. The sample code in Python is fairly easy to read, even if you've never programmed in Python. On the reddit comment section, the following discussion was then started:
The nice thing about GAs is that they are so simple. [...] As soon as you step out of the specialized GA literature and into other fields where the authors just happen to choose a GA for their problem, it's more likely than not that they've chosen a poor implementation.
The problems that simple GAs work pretty well on tend to be the same as those that a simple hill climber would demolish the GA on. As soon as the problems get hard, the GA converges way too quickly to suboptimal solutions. [...]
Part of the problem is the GA community itself. For years, GAs were touted as "global optimizers". We haven't talked like that for several years now, but I still see references to people who seem to think that solving a problem is as simple as just throwing a GA at it.
Followed by a response:
Back in the look-ma-no-hands-period of AI, GAs converged too rapidly because they weren't tuned right: they used too much selection, too little mutation or other exploration mechanisms, too small of a population, etc. Since the field was unified with other evolutionary computation techniques in the early '90s, I don't think premature convergence has been a major impediment for almost 20 years now. Evolutionary computation techniques are pretty much the best available methods for much of stochastic search (and yes, I'm looking at you, simulated annealing). They're highly parallelizable, have good theory, and are representation-independent. Recent versions of them (notably ant colony optimization) are the gold standard for certain major tasks (like stochastic combinatorics problems).
As to tabu search: now there is a technique which has some real pathologies. Perhaps this wasn't the best example you could have chosen.
Interesting... I recommend reading the full discussion at the reddit page.

Since I've always been interested in genetic and other modern optimization techniques, I wanted to do a little research. In this series of posts, I will try to explain:
  • Genetic Algorithms
  • A more specialized GA: CHC, Eshelman's algorithm, as mentioned in the reddit comments.
  • Simulated Annealing
  • Ant Colony Optimization
  • And: a comparison with Tabu search

I will mention pseudocode in each of the posts, and will also edit the original Python to use it with the other techniques.

First a disclaimer though: I'm not an expert, not a specialized GA programmer, nor a mathematician. Also, I'm fairly new to Python, but I thought this would be a good opportunity to toy around with some code, so please excuse my possibly poor programming. That being said, I try my best to provide valid information and real results.

In part 2 I will discuss (general) Genetic Algorithms.

Table Of Contents (click a link to jump to that post)

Monday, January 05, 2009

Blog Updates!

I've made some small tweaks to this blog:

  • Social widget added with latest Twitter status, and link to Friendfeed account. Completely clientsided with Javascript. Using twitter.js.
  • Was getting bored of the black theme, so I made a white one, made two css files and added a switcher - using this code . (You should be able to see it at the right-top of the page.)
    Your choice is saved in a cookie and also Javascript-based (no other choice with Blogger).
    The Bidvertiser ads still have a black background in the white theme, I'll look at that later (I might remove them anyway).