An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty Steve Rowe June 2006 Srowe -at- cybernet -dot- com Please put "genetic" in the subject if you write.


Page 1: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

An Exemplar-Based Fitness Function with Dynamic Value

Based on Sub-Problem Difficulty

Steve Rowe

June 2006

Srowe -at- cybernet -dot- com

Please put "genetic" in the subject if you write.

Page 2: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

The GA

• Binary Chromosome

• Evolutionary Programming (EP) Approach:

– 100% Mutation Rate

– 0% Cross-over (Recombination) Rate

• BUT, non-EP approach:

– Mutation rate is constant

– Mutation is not restricted to “small” changes
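A minimal sketch of a mutation-only generation step in the spirit of this slide. The slides do not say how mutation or selection is implemented, so the per-bit flip probability `rate`, the function names `mutate` and `next_generation`, and the one-child-per-parent scheme are all assumptions made for illustration.

```python
import random

def mutate(genome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with constant probability `rate` (assumed operator)."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in genome
    )

def next_generation(parents: list[str], rate: float, rng: random.Random) -> list[str]:
    """Every child is a mutated copy of one parent: mutation always, crossover never."""
    return [mutate(parent, rate, rng) for parent in parents]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
children = next_generation(["00000000", "11111111"], rate=0.25, rng=rng)
print(children)
```

Because each bit is flipped independently, a single mutation can change many bits at once, which is one way to read "not restricted to small changes".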

Page 3: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

FSM Encoding

• For a state machine with a maximum of N states, on an alphabet with M characters:

– N*(1+M*ceiling(log2(N))) bits are used.

– Genome is divided into N bit fields, one for each state.

– Each state bitfield is divided into a 1-bit flag (isFinal) and M binary numbers, each ceiling(log2(N)) bits wide, giving the number of the state to go to when the corresponding character is read.

– The first state is always the start state
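The bit-count formula above can be checked with a few lines of Python. The function name `genome_bits` is hypothetical, and `(n - 1).bit_length()` is used as an exact integer stand-in for ceiling(log2(n)).

```python
def genome_bits(max_states: int, alphabet_size: int) -> int:
    """Genome length N * (1 + M * ceil(log2(N))) for N states and M symbols."""
    # (N - 1).bit_length() equals ceil(log2(N)) for N >= 1, without floating point.
    bits_per_destination = (max_states - 1).bit_length()
    return max_states * (1 + alphabet_size * bits_per_destination)

print(genome_bits(2, 3))   # 8
print(genome_bits(3, 2))   # 15
print(genome_bits(12, 2))  # 108
```

These three cases match the sizes quoted elsewhere in the slides: 8 bits for 2 states over {a,b,c}, 15 bits for 3 states over {a,b}, and 108 bits for 12 states over {a,b}.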

Page 4: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

FSM Encoding Example

• 2 states max, over the alphabet {a,b,c}:

– Bits = 2 * (1 + 3 * log2(2)) = 2 * (1+3*1) = 8

State    final  a      b      c
State 0  1 bit  1 bit  1 bit  1 bit
State 1  1 bit  1 bit  1 bit  1 bit

Page 5: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

FSM Encoding Example

• The 2-state machine over {a,b,c} that accepts strings with an odd number of 'b's

State    final  a  b  c
State 0  0      0  1  0
State 1  1      1  0  1

[Diagram: State 0 (start) and State 1 (final) each loop to themselves on a,c; b moves between the two states.]

Page 6: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

FSM Encoding Example

• 3 states max, over the alphabet {a,b}:

– Bits = 3 * (1 + 2 * ceil(log2(3))) = 3 * (1+2*2) = 15

State    Final  a       b
State 0  1 bit  2 bits  2 bits
State 1  1 bit  2 bits  2 bits
State 2  1 bit  2 bits  2 bits

Page 7: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

FSM Encoding Example

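• A machine that accepts only strings in (ab)*.

– That is, ε (the empty string), ab, abab, ababab, etc.

State    Final  a   b
State 0  1      01  10
State 1  0      10  00
State 2  0      10  10

[Diagram: State 0 (start, final) goes to State 1 on a; State 1 returns to State 0 on b; every other transition leads to State 2, which loops on a,b.]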
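As a rough illustration of the encoding, the 15-bit genome from this slide (10110 01000 01010) can be decoded and run on a few strings. The helper names `decode` and `accepts` and the tuple-based FSM representation are hypothetical. Out-of-range destinations are reduced modulo the number of states, following the Encoding Notes slide.

```python
def decode(genome: str, n_states: int, alphabet: str):
    """Turn a bit string into a list of (is_final, {symbol: next_state}) per state."""
    width = (n_states - 1).bit_length()      # ceil(log2(N)) bits per destination
    field = 1 + len(alphabet) * width        # bits used by one state
    assert len(genome) == n_states * field
    fsm = []
    for s in range(n_states):
        chunk = genome[s * field:(s + 1) * field]
        targets = {
            ch: int(chunk[1 + i * width:1 + (i + 1) * width], 2) % n_states
            for i, ch in enumerate(alphabet)
        }
        fsm.append((chunk[0] == "1", targets))
    return fsm

def accepts(fsm, text: str) -> bool:
    """Run the machine from state 0 (always the start state)."""
    state = 0
    for ch in text:
        state = fsm[state][1][ch]
    return fsm[state][0]

fsm = decode("101100100001010", 3, "ab")
for s in ["", "ab", "abab", "a", "ba", "abb"]:
    print(repr(s), accepts(fsm, s))
```

The first three strings are accepted and the last three are rejected, which is the behaviour expected of a machine for (ab)*.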

Page 8: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Encoding Notes

• If the number of states is not a power of 2, it is possible to code a "broken" FSM where there is a transition to a non-existent state.

– My implementation guarantees unbroken FSMs by taking the destination modulo the number of states.

• Typical inductions are on {a,b} with 12 states (108 bits per genome), so the search space is about 3.25 x 10^32. That's about 10^19 years to search at a million genomes per second.
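The search-space estimate can be reproduced with a short calculation. The only assumption added here is a year of 365.25 days.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length

search_space = 2 ** 108                 # every possible 108-bit genome
seconds = search_space / 1_000_000      # at one million genomes per second
years = seconds / SECONDS_PER_YEAR

print(f"{search_space:.3e} genomes")    # roughly 3.2e32
print(f"{years:.2e} years")             # roughly 1e19
```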

Page 9: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

The Fitness Function

• How do you test if a FSM accepts a given regular language?

– RLs are potentially infinite in size

• Answer: Create a representative sample of strings that are in the language, and count the number of those that are accepted.

• Oops: This simple machine accepts all strings over {a,b}.

[Diagram: a single final state with a self-loop on a,b.]

Page 10: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

The Fitness Function

• So we also need a representative sample of strings that are not in the language.

• So we have a set of strings, each of which has a flag indicating if it is in the language.

Page 11: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

The Fitness Function

• During evaluation, we run the FSM that corresponds to each (unique) member of the population on each string in the test set.

• If the FSM accepts a string in the language or rejects a string that is not in the language, it scores a "correct".

• The score for a FSM = correct/total tests
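The scoring rule above can be sketched as follows. The FSM representation (a list of `(is_final, transitions)` pairs), the function names, and the six-string test set for (ab)* are all hypothetical. The example also shows why negative examples matter: the accept-everything machine gets only half of this sample right.

```python
def accepts(fsm, text: str) -> bool:
    """Run the machine from state 0 and report whether it ends in a final state."""
    state = 0
    for ch in text:
        state = fsm[state][1][ch]
    return fsm[state][0]

def static_fitness(fsm, tests) -> float:
    """score = correct / total tests, where each test is (string, is_in_language)."""
    correct = sum(accepts(fsm, s) == in_language for s, in_language in tests)
    return correct / len(tests)

# Hypothetical labelled sample for the language (ab)*.
tests = [("", True), ("ab", True), ("abab", True),
         ("a", False), ("ba", False), ("abb", False)]

accept_all = [(True, {"a": 0, "b": 0})]
ab_star = [(True, {"a": 1, "b": 2}),
           (False, {"a": 2, "b": 0}),
           (False, {"a": 2, "b": 2})]

print(static_fitness(accept_all, tests))  # 0.5
print(static_fitness(ab_star, tests))     # 1.0
```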

Page 12: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Initial Results

• For the language consisting of strings with both "ab" and "ba" as substrings, a solution is:

– 010110011110010010111000101100101000101011
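The genome above can be checked mechanically. This sketch assumes it was produced with a maximum of 6 states, which is the only state count for which the encoding formula gives 42 bits over {a,b}. It also assumes the modulo rule from the Encoding Notes slide. The helper names are hypothetical.

```python
from itertools import product

GENOME = "010110011110010010111000101100101000101011"
ALPHABET = "ab"
N_STATES = 6  # inferred, not stated in the slides: 6 * (1 + 2 * 3) = 42 bits

def decode(genome, n_states, alphabet):
    """Return a list of (is_final, {symbol: next_state}) entries, one per state."""
    width = (n_states - 1).bit_length()
    field = 1 + len(alphabet) * width
    fsm = []
    for s in range(n_states):
        chunk = genome[s * field:(s + 1) * field]
        targets = {
            ch: int(chunk[1 + i * width:1 + (i + 1) * width], 2) % n_states
            for i, ch in enumerate(alphabet)
        }
        fsm.append((chunk[0] == "1", targets))
    return fsm

def accepts(fsm, text):
    state = 0
    for ch in text:
        state = fsm[state][1][ch]
    return fsm[state][0]

fsm = decode(GENOME, N_STATES, ALPHABET)

# Compare against the language definition on every string of length 0 to 8.
mismatches = [
    s
    for n in range(9)
    for s in map("".join, product(ALPHABET, repeat=n))
    if accepts(fsm, s) != ("ab" in s and "ba" in s)
]
print(len(GENOME), mismatches)  # 42 []
```

Under these assumptions the decoded machine agrees with the language definition on all 511 strings tested. State 1 is the single final state, and its encoded destination 111 is brought back into range by the modulo rule.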

Page 13: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Initial Results

• But it's easier to see this way

• Even with high mutation, it takes on average 66,000 generations to converge.

Page 14: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Problems

• As with many genetic programming applications, changing 1 bit can turn a successful genome into a complete failure.

– Not as much of a problem here because a 1-bit change only changes the isFinal for a single state, or the state transition for a single symbol from a single state.

• There can be some big, attractive local maxima with the given fitness scheme (scoring 19 out of 20, but that last test case is nothing like the others).

Page 15: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

One Solution

• Bigger genomes?

– Theory: It is sometimes easier to build a Rube Goldberg FSM than to build an elegant, compact one.

– So: Allow the maximum number of states to be larger and convergence seems to happen faster.

– But that's crap, because increasing the number of states also multiplies the size of the search space many times over.

Page 16: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Think the Problem Through

• With this fitness paradigm, a local maximum trap happens when some feature of the target language is under-represented in the test set. A FSM that accepts the language sans that feature will score well, but not perfectly, and may be very different in implementation than one that accepts the correct language.

Page 17: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Missed it by this much.

• The machine shown here accepts strings in b*a+b(a|b)*, which is close to the target so it scores well.

• But it doesn't reject abbbb

• This is a local maximum that is difficult to leave.
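To get a feel for how close this near-miss is, the sketch below compares b*a+b(a|b)* with the target on every string up to length 6. It assumes the target is the earlier language of strings containing both "ab" and "ba". Exhaustive enumeration stands in for the author's test set, which the slides do not list.

```python
import re
from itertools import product

NEAR_MISS = re.compile(r"b*a+b[ab]*")

def in_target(s: str) -> bool:
    """Assumed target: both "ab" and "ba" occur as substrings."""
    return "ab" in s and "ba" in s

def in_near_miss(s: str) -> bool:
    return NEAR_MISS.fullmatch(s) is not None

strings = ["".join(p) for n in range(7) for p in product("ab", repeat=n)]
wrong = [s for s in strings if in_near_miss(s) != in_target(s)]

print(len(strings), len(wrong))                        # 127 15
print(in_near_miss("abbbb"), in_target("abbbb"))       # True False
```

Every disagreement has the form a+b+ (some a's followed by some b's), so a test set containing few such strings would give this machine a near-perfect score.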

Page 18: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Another Approach

• Pick the tests that are often failing and give them more weight.

– Score them higher OR

– Score others lower

• I use

– weight = incorrect / number of evaluations

– = (evaluations - correct) / evaluations

• Weight adjustments are made once per generation, so they are not very expensive.
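A minimal sketch of the weighting rule above, recomputed once per generation. The slides give the weight formula but not how an individual's score is assembled from the weights. Summing the weights of the tests an individual passes is therefore an assumption, as are the function names and the toy population.

```python
def compute_weights(results):
    """results[i][j] is True when individual i got test j right.
    weight_j = incorrect_j / evaluations = (evaluations - correct_j) / evaluations."""
    evaluations = len(results)
    n_tests = len(results[0])
    return [
        (evaluations - sum(row[j] for row in results)) / evaluations
        for j in range(n_tests)
    ]

def weighted_fitness(row, weights):
    """Assumed scoring: add up the weights of the tests this individual passed."""
    return sum(w for passed, w in zip(row, weights) if passed)

# Toy generation: three look-alike individuals pass the common tests,
# one individual passes only the rarely solved test.
results = [
    [True, True, True, False],
    [True, True, True, False],
    [True, True, True, False],
    [False, False, False, True],
]
weights = compute_weights(results)
print(weights)                                           # [0.25, 0.25, 0.25, 0.75]
print([weighted_fitness(r, weights) for r in results])   # [0.75, 0.75, 0.75, 0.75]
```

Under plain correct/total scoring the same individuals would score 0.75, 0.75, 0.75 and 0.25. With the weights, the individual that solves the rare test draws level with the majority.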

Page 19: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Implications

• The elite in the population will have their fitnesses lowered relative to individuals that got the "rare" problems right.

– This limits the dominance of dynasties because:

• The more members of a population that get a problem right, the less that problem is worth

• The better a member scores, the more of them there will be

• Therefore, the very problems that a well-scoring member is getting right are the ones that are being devalued the fastest!

Page 20: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Fitness versus Generation

• The fitness of the best individual (blue trace) and the mean population fitness (red trace) are shown versus generation. Circled are the characteristic steady declines in fitness value of the elite caused by the dynamic fitness function.

Page 21: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Results

• Over many different languages and thousands of trials, the dynamic fitness function is superior.

– 2 orders of magnitude faster convergence

– 3 orders of magnitude smaller variance

Page 22: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Convergence of Dynamic vs Static Fitness

[Chart: generations to converge (log scale, 1 to 1,000,000) plotted for each run (1 to 261), with one series for Dynamic Fitness and one for Static Fitness.]

Page 23: An Exemplar-Based Fitness Function with Dynamic Value Based on Sub-Problem Difficulty

Conclusion

• I have presented a fitness function based on exemplar data rather than a closed form function.

• I have shown that the fitness function can be used to induce FSMs based on example strings from a language.

• I have introduced an optimization technique for varying the value of individual test cases within the test set that causes much faster convergence.