Utilizing Condor to Support Genetic Algorithm Design Research

Mary Beth Kurz, PhD, Assistant Professor

Department of Industrial Engineering, Clemson University


Genetic algorithms are metaheuristics for optimization


[Figure: Chromosome space (Chromosome 1, Chromosome 2, …, Chromosome N) → decoding → Solution space (feasible and infeasible solutions) → evaluation → Objective space]

My research focuses on what chromosomes should look like and asks whether the “solution representation” impacts the quality of a genetic algorithm

Let’s make this more concrete: the TSP


[Figure: Chromosome space (Chromosome 1, Chromosome 2, …, Chromosome N) → decoding → Solution space → evaluation → Objective space. Feasible solutions: tours that visit all cities exactly once. Infeasible solutions: anything else.]

The Traveling Salesman Problem (TSP) asks how to route a salesman through all of his cities, visiting each exactly once, so that he returns home as quickly as possible

Each feasible solution’s total travel time is that solution’s objective value
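To make the objective concrete, here is a minimal Python sketch of evaluating one feasible tour. The four-city matrix `TRAVEL_TIME` and the function name `tour_travel_time` are hypothetical illustrations, not data or code from this research; a tour is assumed to be a list of city indices, and the salesman returns to the first city at the end.

```python
from typing import Sequence

# Hypothetical symmetric travel-time matrix for 4 cities (illustrative numbers only).
TRAVEL_TIME = [
    [0, 2, 9, 10],
    [2, 0, 6, 4],
    [9, 6, 0, 3],
    [10, 4, 3, 0],
]


def tour_travel_time(tour: Sequence[int], times: Sequence[Sequence[float]]) -> float:
    """Objective value of a feasible tour: total travel time, including the trip home."""
    total = 0.0
    for i, city in enumerate(tour):
        next_city = tour[(i + 1) % len(tour)]  # wrap around to the starting city
        total += times[city][next_city]
    return total


print(tour_travel_time([0, 1, 3, 2], TRAVEL_TIME))  # 2 + 4 + 3 + 9, prints 18.0
```

Because the tour is a cycle, any rotation of the same visiting order has the same objective value.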

How do I represent the tours?
• Directly – the city list?
• Indirectly – the list of roads taken?
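The two representations can be sketched as decoders that turn a chromosome into a tour. This is a hypothetical illustration: the direct chromosome is assumed to be a list of city indices, and the indirect chromosome is assumed to be an unordered list of undirected roads `(a, b)`. The sketch shows one practical difference: a list of roads can easily decode to an infeasible solution, such as two disconnected sub-tours.

```python
from typing import List, Optional, Sequence, Tuple

Road = Tuple[int, int]  # hypothetical encoding: an undirected road between two cities


def decode_city_list(chromosome: Sequence[int], n_cities: int) -> Optional[List[int]]:
    """Direct representation: the chromosome is the visiting order itself."""
    if sorted(chromosome) != list(range(n_cities)):
        return None  # a city is missing or repeated, so the tour is infeasible
    return list(chromosome)


def decode_road_list(chromosome: Sequence[Road], n_cities: int) -> Optional[List[int]]:
    """Indirect representation: the chromosome lists the roads taken.

    Returns the visiting order starting from city 0, or None when the roads do
    not form a single cycle through every city (e.g. two separate sub-tours).
    Assumes every city index is in range(n_cities).
    """
    neighbours: List[List[int]] = [[] for _ in range(n_cities)]
    for a, b in chromosome:
        neighbours[a].append(b)
        neighbours[b].append(a)
    if any(len(adj) != 2 for adj in neighbours):
        return None  # each city needs exactly two roads: one in, one out
    tour = [0]
    prev, cur = -1, 0
    while True:
        a, b = neighbours[cur]
        nxt = b if a == prev else a  # keep walking forward, never turn back
        if nxt == 0:
            break  # back home
        tour.append(nxt)
        prev, cur = cur, nxt
    return tour if len(tour) == n_cities else None


print(decode_city_list([2, 0, 1], 3))                          # [2, 0, 1]
print(decode_road_list([(0, 1), (2, 1), (2, 3), (3, 0)], 4))   # [0, 1, 2, 3]
print(decode_road_list([(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)], 6))  # None
```

With the city list, feasibility is a simple permutation check; with the road list, the decoder has to walk the roads before it can tell whether they form one tour.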

My Hypothesis: Solution representation affects GA design significantly

Kurz4

[Figure: Chromosome space (Chromosome 1, Chromosome 2) → Solution space (feasible and infeasible solutions) → Objective space, with the optimal objective value marked. Callouts: “No chromosomes map to these solutions”; “This is the optimal solution!!!”; “Fix or forbid these chromosomes?”]
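For a city-list chromosome, one way to "fix" an infeasible chromosome (for instance one with repeated cities after crossover) is a repair step, and one way to "forbid" it is to give it an infinite cost. Both functions below are hypothetical sketches of these two common strategies, not the approach used in this research; `repair_city_list` assumes the chromosome has exactly `n_cities` genes.

```python
import math
from typing import Callable, List, Sequence


def repair_city_list(chromosome: Sequence[int], n_cities: int) -> List[int]:
    """'Fix': keep the first copy of each city and put missing cities in the gaps.

    Assumes len(chromosome) == n_cities, so the number of gaps equals the
    number of missing cities.
    """
    present = set(chromosome)
    missing = [c for c in range(n_cities) if c not in present]
    seen = set()
    repaired = []
    for city in chromosome:
        if city in seen or not 0 <= city < n_cities:
            repaired.append(missing.pop(0))  # fill the gap with a missing city
        else:
            seen.add(city)
            repaired.append(city)
    return repaired


def cost_or_forbid(chromosome: Sequence[int], n_cities: int,
                   objective: Callable[[Sequence[int]], float]) -> float:
    """'Forbid': infeasible chromosomes get infinite cost, so a cost-minimising
    selection step prefers any feasible chromosome over them."""
    if sorted(chromosome) != list(range(n_cities)):
        return math.inf
    return objective(chromosome)


print(repair_city_list([0, 2, 2, 1, 1], 5))  # [0, 2, 3, 1, 4]
```

Repair keeps every chromosome usable but alters the genes the GA actually produced; forbidding keeps the genes intact but can waste part of the population on unusable chromosomes.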

Genetic algorithms are motivated by an analogy to “real” genetics


[Figure: Population(t) (Chromosome 1, Chromosome 2, …, Chromosome N) → Population(t+1) (Chromosome 1, Chromosome 2, …, Chromosome N)]

A chromosome is composed of genes, which are generally selected at random initially

Genetic operators (this is where randomness comes in):
• Selection picks some chromosomes as potential parents for crossover
• Crossover creates new chromosomes by taking genes from 2 parents
• Mutation changes a small number of genes across the entire population
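The step from Population(t) to Population(t+1) can be sketched for city-list chromosomes as below. The specific operators are assumptions, chosen as common textbook examples (tournament selection, order crossover, swap mutation), not necessarily the operators studied here, and the line-shaped test instance is hypothetical. All randomness flows through one seeded `random.Random`, so a run can be reproduced from its seed.

```python
import random
from typing import Callable, List, Sequence

Chromosome = List[int]


def tournament_select(population: Sequence[Chromosome], costs: Sequence[float],
                      rng: random.Random, k: int = 2) -> Chromosome:
    """Selection: draw k chromosomes at random and return the one with the lowest cost."""
    contenders = rng.sample(range(len(population)), k)
    return population[min(contenders, key=lambda i: costs[i])]


def order_crossover(p1: Chromosome, p2: Chromosome, rng: random.Random) -> Chromosome:
    """Crossover (OX): copy a random slice of parent 1, fill the rest in parent 2's order."""
    n = len(p1)
    i, j = sorted(rng.sample(range(n + 1), 2))
    child: List[int] = [-1] * n  # -1 marks an unfilled position
    child[i:j] = p1[i:j]
    taken = set(p1[i:j])
    fill = iter(c for c in p2 if c not in taken)
    for pos in range(n):
        if child[pos] == -1:
            child[pos] = next(fill)
    return child


def swap_mutation(chromosome: Chromosome, rate: float, rng: random.Random) -> Chromosome:
    """Mutation: with probability `rate`, swap each gene with a randomly chosen position."""
    mutant = list(chromosome)
    for pos in range(len(mutant)):
        if rng.random() < rate:
            other = rng.randrange(len(mutant))
            mutant[pos], mutant[other] = mutant[other], mutant[pos]
    return mutant


def next_generation(population: Sequence[Chromosome], cost: Callable[[Chromosome], float],
                    rng: random.Random, mutation_rate: float = 0.05) -> List[Chromosome]:
    """Build Population(t+1) from Population(t) with selection, crossover, and mutation."""
    costs = [cost(c) for c in population]
    children = []
    for _ in range(len(population)):
        p1 = tournament_select(population, costs, rng)
        p2 = tournament_select(population, costs, rng)
        children.append(swap_mutation(order_crossover(p1, p2, rng), mutation_rate, rng))
    return children


# Hypothetical toy instance: cities 0..7 on a line, travel time = distance.
def line_tour_cost(tour: Chromosome) -> float:
    return sum(abs(tour[i] - tour[(i + 1) % len(tour)]) for i in range(len(tour)))


rng = random.Random(2008)
population = [rng.sample(range(8), 8) for _ in range(30)]
for _ in range(50):
    population = next_generation(population, line_tour_cost, rng)
print(min(line_tour_cost(t) for t in population))  # the optimum for this instance is 14
```

Order crossover and swap mutation were chosen here because they always produce valid permutations, so this particular sketch never needs the fix-or-forbid step.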

This research is empirical and requires immense computational time

• Genetic algorithms are inherently random: is it possible that some representation consistently finds better solutions for a specific problem?
• Most GA research currently uses 50 replications on numerous data files
• 180 problem types, 10 files each, 3 representations = 5400 files
• Simplest representation: 1800 files would take about 45 hours in my lab (a few years ago)
• 50 replications of 5400 files → at least 281 days of running time!
• This is simply not feasible
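The running-time estimate can be reproduced from the figures on this slide. The sketch assumes that each of the 1800 files counts as one run and that every representation is at least as slow as the simplest one, so the result is a lower bound.

```python
# Experiment size, as stated on the slide
problem_types = 180
files_per_type = 10
representations = 3
replications = 50

files = problem_types * files_per_type * representations  # 5400 data files
runs = files * replications                                 # 270,000 GA runs

# Timing, as stated on the slide: 1800 files (simplest representation) took about 45 hours.
# Assumption: one file = one run, and no run is faster than the simplest representation.
total_hours = runs * 45 / 1800  # 1.5 minutes per run
total_days = total_hours / 24

print(files, runs, total_hours, total_days)  # 5400 270000 6750.0 281.25
```

Under these assumptions the 270,000 runs need at least 6,750 hours of computing on a single machine.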


Grid computing is saving me


Spring 2007 325,000 hrs

Summer – Fall 2007 212,000 hrs

Spring 2008 124,000 hrs

Total: about 660,000 hrs

Since last spring, I’ve had to relearn how to do research!
• How do I compile all this data? VBA and Excel!
• What can I actually analyze? Not pictures; instead, reduce the data to correlations
• What statistics do I need to use? Needed to learn non-parametric statistics and to use SPSS for the analysis; used VBA to create the input files
• Reran to get different output data (Summer – Fall 2007: 212,000 hrs)
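Reducing the output to correlations with non-parametric statistics can be illustrated with Spearman's rank correlation, one common non-parametric measure. The analysis itself was done in SPSS; this standard-library Python sketch only illustrates the statistic, and the example data are hypothetical. Tied values receive the average of the ranks they span.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 to n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length, at least 2 values each")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    if sxx == 0 or syy == 0:
        raise ValueError("a sample with all values tied has no rank correlation")
    return cov / math.sqrt(sxx * syy)


# Hypothetical data: a problem characteristic vs. the best objective value found.
print(round(spearman_rho([1, 2, 3, 4, 5], [2.0, 1.0, 4.0, 3.0, 5.0]), 3))  # 0.8
```

Because it uses only ranks, the statistic makes no assumption that GA results are normally distributed, which is the usual reason to prefer non-parametric methods for this kind of output.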


I don’t know about random numbers
• I started using rand() in C!
• I use up to 600,000,000 random numbers in each run, and I have 270,000 runs (5400 × 50)
• Trying to use Mersenne Twister: its period is 2^19937 − 1, which is plenty big
• How do I make sure I have independent sets of random numbers?
• Use the same initial seed, then “burn” through the first (n − 1) sets of numbers until we get to the nth set
• Would take over 4000 days to burn through 269,999 sets for the last run
• Again … not feasible!
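The burn-through idea can be sketched with Python's `random` module, which is built on the Mersenne Twister (MT19937). The function name `burned_generator` and the small seed and block sizes in the check are hypothetical; each call to `random()` stands in for one of a run's random numbers. The small-scale check shows why the idea is correct (run n sees exactly the nth block of one long stream), and the last line shows why it is impractical at full scale.

```python
import random

NUMBERS_PER_RUN = 600_000_000  # upper bound per run, from the slide
RUNS = 270_000


def burned_generator(seed: int, run_index: int, numbers_per_run: int) -> random.Random:
    """Position a Mersenne Twister at the start of run `run_index` (0-based) by
    drawing and discarding every number the earlier runs would have used."""
    rng = random.Random(seed)  # CPython's random module uses MT19937
    for _ in range(run_index * numbers_per_run):
        rng.random()  # "burn" one number
    return rng


# Small-scale check: the burned streams are consecutive blocks of one long stream.
long_stream = random.Random(42)
blocks = [[long_stream.random() for _ in range(5)] for _ in range(3)]
for n in range(3):
    rng = burned_generator(42, n, 5)
    assert [rng.random() for _ in range(5)] == blocks[n]

# Full scale: numbers the last run would have to burn before it could start.
burned_for_last_run = (RUNS - 1) * NUMBERS_PER_RUN
print(f"{burned_for_last_run:.3e}")  # 1.620e+14
```

Some Mersenne Twister implementations can jump ahead instead of burning; for example, NumPy's `MT19937.jumped()` returns a generator whose state is advanced as if 2^128 numbers had been drawn, which gives non-overlapping streams without generating the skipped numbers.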

• Tried to initialize using the run number (Spring 2008: 124,000 hrs)
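Seeding from the run number can be sketched as follows. `naive_rng_for_run` seeds directly with the run number; `hashed_rng_for_run` is a hypothetical variant that hashes a base seed together with the run number, similar in spirit to hash-based seed derivation such as NumPy's `SeedSequence`. Both are reproducible and need no coordination between Condor jobs, but neither guarantees that the streams never overlap or correlate; that guarantee is what jump-ahead, dynamically created generators, or a parallel library such as SPRNG aim to provide.

```python
import hashlib
import random


def naive_rng_for_run(run_number: int) -> random.Random:
    """Seed the Mersenne Twister directly with the run number."""
    return random.Random(run_number)


def hashed_rng_for_run(base_seed: int, run_number: int) -> random.Random:
    """Hypothetical alternative: seed each run's Mersenne Twister from a SHA-256
    hash of (base seed, run number), so neighbouring runs get unrelated seeds."""
    digest = hashlib.sha256(f"{base_seed}:{run_number}".encode()).digest()
    return random.Random(int.from_bytes(digest, "big"))


# Each job rebuilds its own generator from its run number, so jobs on different
# Condor machines share no state and any run can be repeated exactly.
first = [hashed_rng_for_run(2008, run).random() for run in range(3)]
again = [hashed_rng_for_run(2008, run).random() for run in range(3)]
assert first == again
```

The hashed variant only changes how seeds are chosen; it is a cheap improvement over consecutive integer seeds, not a proof of independence.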


I still love Condor, but I don’t know about random numbers
• Thought about saving the random numbers in input files of 600,000,000 numbers each
• Stopped generating the first file after it got to 3 GB
• This would mean 3 × 270,000 = 810,000 GB of random number files!
• Looking at dynamic streams from Mersenne Twister
• Just heard about SPRNG from Todd on Tuesday

Gearing up for another set of runs … all I need is this set of runs to get a paper out!
