Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University...

35
Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam

Transcript of Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University...

Page 1: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

Brief introduction togenetic algorithms

andgenetic programming

A.E. EibenFree University Amsterdam

Page 2: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 2 EvoNet Summer School 2002

Genetic algorithm(s)

Developed: USA in the 1970’s Early names: J. Holland, K. DeJong, D. Goldberg Typically applied to:

discrete optimization Attributed features:

not too fast good solver for combinatorial problems

Special: many variants, e.g., reproduction models, operators formerly: the GA, nowdays: a GA, GAs

Page 3: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 3 EvoNet Summer School 2002

Genotype space = {0,1}LPhenotype space

Encoding (representation)

Decoding(inverse representation)

011101001

010001001

10010010

10010001

Representation

Page 4: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 4 EvoNet Summer School 2002

GA: crossover (1)

Crossover is used with probability pc

1-point crossover: Choose a random point on the two parents (same for both) Split parents at this crossover point Create children by exchanging tails

n-point crossover: Choose n random crossover points Split along those points Glue parts, alternating between parents

uniform crossover: Assign 'heads' to one parent, 'tails' to the other Flip a coin for each gene of the first child Make an inverse copy of the gene for the second child

Page 5: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 5 EvoNet Summer School 2002

GA: crossover (2)

Page 6: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 6 EvoNet Summer School 2002

GA: mutationMutation: Alter each gene independently with a probability pm

Relatively large chance for not being mutated

(exercise: L=100, pm =1/L)

Page 7: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 7 EvoNet Summer School 2002

Crossover OR mutation?

Decade long debate: which one is better / necessary / main-background

Answer (at least, rather wide agreement): it depends on the problem, but in general, it is good to have both both have another role mutation-only-EA is possible, xover-only-EA would not work

Page 8: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 8 EvoNet Summer School 2002

Exploration: Discovering promising areas in the search space, i.e. gaining information on the problem

Exploitation: Optimising within a promising area, i.e. using information

There is co-operation AND competition between them

Crossover is explorative, it makes a big jump to an area somewhere “in between” two (parent) areas

Mutation is exploitative, it creates random small diversions, thereby staying near (i.e., in the area of ) the parent

Crossover OR mutation? (cont’d)

Page 9: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 9 EvoNet Summer School 2002

Only crossover can combine information from two parents

Only mutation can introduce new information (alleles)

Crossover does not change the allele frequencies of the population (thought experiment: 50% 0’s on first bit in the population, ?% after performing n crossovers)

To hit the optimum you often need a ‘lucky’ mutation.

Crossover OR mutation? (cont’d)

Page 10: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 10 EvoNet Summer School 2002

Selection Main idea: better individuals get higher chance 2ndary idea: chances proportional to fitness Implementation: roulette wheel technique

Assign to each individual a part of the roulette wheel (“unfair”: size proportional to its fitness)

Spin the wheel n times to select n individuals (fair)

fitness(A) = 3

fitness(B) = 1

fitness(C) = 2

A C

1/6 = 17%

3/6 = 50%

B

2/6 = 33%

Page 11: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 11 EvoNet Summer School 2002

Selection (cont’d)

Fitness proportional selection (FPS):

Expected number of times fi is selected for mating is: .

Disadvantages: Outstanding individuals take over the entire population

very quickly danger for premature convergence. Low selection pressure when fitness values are near each

other. Behaves differently on transposed versions of the same

function.

fi

f

Page 12: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 12 EvoNet Summer School 2002

Tournament selection: Pick k individuals randomly, without replacement Select the best of these k comparing their fitness values k is called the size of the tournament selection is repeated as many times as necessary

Selection (cont’d)

Page 13: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 13 EvoNet Summer School 2002

Generational GA reproduction cycle

1. Select parents for the mating pool

(size of mating pool = population size)

2. Shuffle the mating pool

3. For each consecutive pair apply crossover with probability pc

4. For each new-born apply mutation (bit-flip with probability pm)

5. Replace the whole population by the resulting mating pool

Page 14: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 14 EvoNet Summer School 2002

Generational GA reproduction cycle

string1

string2

string3

string4

Generation t

string2

string4

string2

string1

Mating pool

child1(2,4)

mut(child2(2,4))

string2

mut(string1)

Generation t+1

Notes:• Offspring can be: clone, pure mutant, pure crossing, mutated crossing• Generational replacement: whole population deleted/replaced

To be discussed: no survival of the fittest here?

Page 15: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 15 EvoNet Summer School 2002

An example after Goldberg ‘89 (1)

Simple problem: max x2 over {0,1,…,31} GA approach:

Representation: binary code, e.g. 01101 13 Population size: 4 1-point xover, no mutation (just an example!) Roulette wheel selection Random initialisation

One generational cycle with the hand shown

Page 16: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 16 EvoNet Summer School 2002

An example after Goldberg ‘89 (2)

Page 17: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 17 EvoNet Summer School 2002

An example after Goldberg ‘89 (3)

Page 18: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 18 EvoNet Summer School 2002

The simple GA

Has been subject of many (early) studies Is often used as benchmark for novel GAs Shows many shortcomings, e.g.

Representation is too restrictive Mutation & crossovers only applicable for bit-string & integer

representations Selection mechanism sensitive for converging populations with close

fitness values Generational population model can be improved with explicit survivor

selection

Page 19: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 19 EvoNet Summer School 2002

Genetic programming Developed: USA in the 1990’s Early names: J. Koza Typically applied to:

machine learning tasks Attributed features:

competes with neural nets and alike slow needs huge populations (thousands)

Special: non-linear chromosomes: trees, graphs mutation possible but not necessary (disputed!)

Page 20: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 20 EvoNet Summer School 2002

MotivationWhy introduce yet another EA?

Reasons: Elements of a search space may vary in length Linear representation may be too ‘unnatural’ Complex variable hierarchy can not be (easily) mapped on linear

structures

Example search space: Graphs without restriction on size and structure

Because fixed length linear representations are too rigid

Page 21: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 21 EvoNet Summer School 2002

Credit score example (1)

Given: lot of historical data on: customer profile and creditability index (good/bad).

Needed: a model that classifies good customers. (to be used for evaluating loan applicants)

Data description for customer profiles.

Page 22: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 22 EvoNet Summer School 2002

Credit score example (2)

A possible model for classification:IF (NOC = 2) AND (S > 80000) THEN good

In general: IF formula THEN good. Need to find the right formula. Natural representation of formulas is: parse trees

Natural fitness of models: percentage of well classified cases.

AND

S2NOC 80000

>=

Page 23: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 23 EvoNet Summer School 2002

GP: representation

Problem domain: modelling (forecasting, regression, classification, data mining, robot control).

Fitness: the performance on a given (training) data set, e.g. the nr. of hits/matches/good predictions

Representation: implied by problem domain, i.e.

individual = model = parse tree parse trees sometimes viewed as LISP expressions

GP = evolving computer programs parse trees sometimes viewed as just-another-genotype

GP = a GA sub-dialect

Page 24: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 24 EvoNet Summer School 2002

Replace randomly chosen subtree by a randomly generated (sub)tree

GP: mutation

Page 25: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 25 EvoNet Summer School 2002

Exchange randomly selected subtrees in the parents

GP: crossover

Page 26: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 26 EvoNet Summer School 2002

GP: selection Standard GA selection is usual

Sometimes overselection to increase efficiency: rank population by fitness and divide it into two groups:

group 1: best c % of population group 2: other 100-c %

when executing selection 80% of selection operations chooses from group 1 20% from group 2

for pop. size = 1000, 2000, 4000, 8000 the portion c is

c = 32%, 16%, 8%, 4% %’s come from rule of thumb

Page 27: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 27 EvoNet Summer School 2002

Generating random trees

Given a: Function set F and a terminal set T , both satisfying the closure property.

Trees are randomly generated by: Full method:

Each branch is of length Dmax (pre-specified), nodes with depth < Dmax are from F nodes with depth = Dmax are from T

Grow method: maximum branch length is Dmax (pre-specified)

Ramped half-and-half: for each D Dmax an equal nr. of trees

half of them with full method half of them with grow method

Page 28: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 28 EvoNet Summer School 2002

Mutation of trees

Replace randomly chosen subtree by a randomly generated (sub)tree.

Page 29: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 29 EvoNet Summer School 2002

Crossover of trees

Exchange randomly selected subtrees in the parents

Page 30: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 30 EvoNet Summer School 2002

Standard parameters in GP (1)

Qualitative variables Initialisation: ramped half-and-half.Fitness: adjusted fitness is used.

Selection: fitness proportionate, elitist strategy is not used, over-selection is used for populations of M 1000.

Over selection for population size = 1000: rank population by fitness and divide it into two groups: group 1: best c = 32% of pop, group 2 other 68% 80% of selection operations chooses from group 1, 20% from group 2 for pop. size = 2000, 4000, 8000 the portion c is c = 16%, 8%, 4% motivation: to increase efficiency, %’s come from rule of thumb

Page 31: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 31 EvoNet Summer School 2002

Standard parameters in GP (2)

Major numerical parameters Population size M = 500. Maximum number of generations G = 51.

Minor numerical parameters Probability pm of mutation = 0% !!!

Probability pr of reproduction = 10%

Probability pc of crossover = 90%

Probability pip of choosing internal points for xover = 90%

Maximum size Di for initial random S-expressions = 6

Maximum size Dc for S-expressions during the run = 17

… and some “exotic” options usually set at 0 (e.g. permutation, editing, encapsulation, decimation)

Page 32: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 32 EvoNet Summer School 2002

Simple symbolic regression (1)

Given a number of sample points in 2:

(x1, y1), (x2, y2), … , (xn, yn)

Find a one-dimensional numerical function f(x):

i {1, …, n} : f(xi) = yi

In the present test 20 sample points are generated by:

x4 + x3 + x2 + x

Page 33: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 33 EvoNet Summer School 2002

Simple symbolic regression (2)

Specification of the GP for the symbolic regression problem

Page 34: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 34 EvoNet Summer School 2002

Simple symbolic regression (3)

Graphical representation of an individual (and the benchmark formula x4 + x3 + x2 + x)

Page 35: Brief introduction to genetic algorithms and genetic programming A.E. Eiben Free University Amsterdam.

A.E. Eiben, GAs and GP 35 EvoNet Summer School 2002

Simple symbolic regression (5)

Best individual representing a perfect solution:X +(X • (X +(X • (X • (X +(COS(X - X ) - (X - X))))))) =

X + (X • (X + (X • (X • (X + 1))))) =X + X4 + X3 + X2

+

+

X

X

•X

• X

X+

X -

-cos

X X-

XX