EC and Genetics
Ch. Eick: What EC-Algorithm Designers can Learn from Genetics
Part 1 - Natural Genetics
Ben Paechter
with thanks to the EvoNet Training Committee and its “Flying Circus”
Natural Genetics
The information required to build a living organism is coded in the DNA and other genetic material found in the cells of that organism
Within a species, most of the genetic material is the same
Small changes in the genetic material give rise to small changes in the organism, e.g. height, hair colour
DNA and Genes
DNA is a large molecule made up of fragments. There are several fragment types, each one acting like a letter in a long coded message:
-A-B-A-D-C-B-B-C-C-A-D-B-C-C-A-
Certain groups of letters are meaningful together, a bit like words. These groups are called genes.
The DNA is made up of genes and rubbish.
Example: Human Reproduction
Human DNA is organised into chromosomes.
Most human cells contain 23 pairs of chromosomes, which together define the physical attributes of the person.
Reproductive Cells
Sperm and egg cells contain 23 individual chromosomes rather than 23 pairs
Reproductive cells are formed by one cell splitting into two
During this process the pairs of chromosomes undergo an operation called crossover
Crossover
During crossover the chromosome pairs link up and swap parts of themselves:
[Figure: a chromosome pair before and after crossover]
After crossover one of each pair goes into each cell
Fertilisation
[Figure: sperm cell from father + egg cell from mother → new person cell]
Mutation
Occasionally some of the genetic material changes very slightly during this process
This means that the child might have genetic material not inherited from either parent
This is most likely to be catastrophic
Theory of Evolution
From time to time, reproduction, crossover and mutation produce new genetic material or new combinations of genes.
Usually this reduces the organism’s ability to survive and so reproduce.
Occasionally the new genetic material increases the organism’s ability to survive and so reproduce.
If it allows the organism to reproduce more, then more and more organisms come to have the “new improved” genetic make-up.
“Good” sets of genes get reproduced more.
“Bad” sets of genes get reproduced less.
Theory of Evolution (2)
The organisms as a whole get better and better at surviving in their environment
Evolutionists claim that all the species of plants and animals have been produced by this slow changing of genetic material - with organisms becoming better and better at surviving in their niche, and new organisms evolving to fill any vacant niche
They agree that evolution requires reproduction, selection and mutation
Some say evolution also requires crossover
Evolution as Search
We can think of evolution as a search through the enormous genetic parameter space for the genetic make-up that best allows an organism to reproduce in its changing environment.
Since it seems pretty good at doing this job, we can borrow ideas from nature to help us solve problems that have equally large search spaces or similarly changing environments.
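The search view can be made concrete with a very small evolutionary loop. The sketch below is only an illustration, not a method taken from these slides: the function name `evolve`, the bit-string representation, tournament selection, one-point crossover, the single elite copy and all parameter values are assumptions chosen to keep the example short.

```python
import random


def evolve(fitness, length=20, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Minimal evolutionary search over bit strings (illustrative only)."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = []  # best fitness seen in each generation

    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))

        def pick():
            # Selection: the fitter of two randomly drawn individuals reproduces.
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        children = [best[:]]  # keep one unchanged copy of the best (elitism)
        while len(children) < pop_size:
            mum, dad = pick(), pick()
            cut = rng.randrange(1, length)   # one-point crossover
            child = mum[:cut] + dad[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children

    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Usage: search for the bit string with the most ones.
best, history = evolve(sum)
```

Because one copy of the best individual is carried over unchanged, the best fitness recorded in `history` can never decrease from one generation to the next.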
Dr. Eick’s Transparencies:
Genetics and What EC Algorithm Designers can learn from it
More Genetics: Diploidy and Dominance
Diploidy: Most higher organisms in biological systems are diploid rather than haploid: their cells carry pairs of chromosomes, each pair containing information for the same function.
The primary mechanism to select which genotypical information will be expressed in the phenotype is dominance (capital letters denote dominant alleles):
– AbCDe + aBCde → ABCDe
Diploidy provides a mechanism for remembering alleles and allele combinations that were previously useful; dominance provides a mechanism to shield those remembered alleles from harmful selection in a currently hostile environment (implicitly increasing the richness of the gene pool of the current population by providing a shield against overselection).
Dominance relationships frequently adapt in biological systems when the need arises.
Hollstien (1971) simulated dominance using a three-letter alphabet instead of a binary one, consisting of dominant 1, non-dominant (recessive) 1, and 0, with:
1dom > 0 and 1rec < 0, i.e. a dominant 1 is expressed over a 0, and a 0 is expressed over a recessive 1.
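The three-letter dominance rule can be sketched as a small expression function that maps a diploid pair of chromosomes to the expressed bit string. The allele symbols `"D"` (dominant 1), `"r"` (recessive 1) and `"0"`, as well as the function names, are assumptions made for this illustration; they are not Hollstien's notation.

```python
# Alleles (illustrative symbols, assumed for this sketch):
#   "D" = dominant 1, "r" = recessive (non-dominant) 1, "0" = zero.

def express_locus(x, y):
    """Return the expressed bit for one locus of a diploid pair."""
    pair = {x, y}
    if "D" in pair:
        return 1  # a dominant 1 is expressed over 0 and over a recessive 1
    if "0" in pair:
        return 0  # a 0 is expressed over a recessive 1
    return 1      # both alleles are recessive 1


def express(chrom_a, chrom_b):
    """Map two homologous chromosomes to the expressed (phenotype) bits."""
    if len(chrom_a) != len(chrom_b):
        raise ValueError("homologous chromosomes must have equal length")
    return [express_locus(x, y) for x, y in zip(chrom_a, chrom_b)]


phenotype = express("D0r0r", "0rr0D")
```

In the usage line, the recessive 1 at the second locus stays hidden behind the 0 of the other chromosome but remains in the genotype, which is the "memory" effect described above.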
Dominance and Diploidy (Continued)
Other research represents the dominance information separately from the gene and lets it undergo evolution, a kind of co-evolution approach.
In the late 80s, Smith and Goldberg explored the use of redundancy for the normal knapsack problem with dynamic weight changes:
– Hollstien’s triadic scheme showed improvement over a static dominance scheme.
– the diploid approach coped better with oscillations in the weight function.
– diploidy decreases the probability that desired schemas are lost “forever”.
In summary, there seems to be some evidence that exploiting diploidy can be beneficial for GAs in dynamically changing environments, especially if scenarios encountered in the past have a tendency to reoccur in the future; on the other hand, diploidy is quite expensive, and not much research has been performed in the last 15 years that explores its use for GAs.
What can GA designers learn from plant genetics and horticulture?
polyploidy and dominance
gametogenesis is used as the crossover operator
use of selfing
unusual ways to prevent self-fertilization
use of intercrossing (create Cartesian products of good initial solutions)
preference for heterozygous sources and rich gene pools
plant breeders employ complex search strategies to breed the best possible plant (such as recurrent selection, which will be the topic of this talk).
mutation is not very important, because it is hard to control; large population sizes are difficult to handle for pragmatic reasons.
Polyploidy
Polyploidy: using two or more complete sets of chromosomes; the phenotype of an organism is determined through dominance of alleles.
Advantages: adaptation to changing environments, “memorize” alleles that worked successfully in the past, richer gene pool.
Previous research on polyploidy: two major approaches to simulate polyploidy in GAs:
using an extra chromosome to represent dominance information [Brindle, this talk]
extending the alphabet to distinguish between dominant and recessive elements [Hollstien, Smith&Goldberg, Ng&Wong]
Features of our Approach
uses at least 2 sets of chromosomes
uses a dominance vector as a tie breaker
uses a crossover control vector to restrict possible crossover points
dominance vectors and crossover control vectors take part in the evolution
gametogenesis is used as the crossover operator
3. Experiments
Benchmarks:
– Knapsack problem with dynamically changing weight constraints
– Schwefel function
Evaluation is performed with respect to the following measure:
$$M2 = \frac{1}{G} \sum_{i=1}^{G} (T_i - X_i)^2$$
where $T_i$ is the true optimum for generation $i$, $X_i$ is the best solution found in generation $i$, and $G$ is the number of generations.
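As a small worked example, the measure can be computed from two per-generation sequences. The sketch assumes that M2 is the squared tracking error summed over all generations and divided by G (a mean squared error); the function name `m2` and the sample numbers are made up for illustration.

```python
def m2(true_optima, best_found):
    """Mean squared gap between the true optimum T_i and the best solution X_i.

    Assumes M2 = sum_i (T_i - X_i)^2 / G, with G the number of generations.
    """
    if not true_optima or len(true_optima) != len(best_found):
        raise ValueError("need one (T_i, X_i) pair per generation")
    g = len(true_optima)
    return sum((t - x) ** 2 for t, x in zip(true_optima, best_found)) / g


# Hypothetical oscillating optimum (10, 12, 10, 12) and the best values found.
score = m2([10, 12, 10, 12], [10, 11, 8, 12])  # (0 + 1 + 4 + 0) / 4 = 1.25
```

Lower values mean the algorithm tracked the moving optimum more closely.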
4. Summary
proposed an approach to support polyploidy that uses dominance vectors
demonstrated the benefits of the approach in oscillating environments which cycle among several different states.
crossover control vectors are employed to provide linkage between the dominance vector and the chromosomes themselves.
the approach facilitates maintaining diversity in relatively small populations.
our experiments at least partially explain why diploidy and polyploidy exist in biological systems.
Literature
Ben S. Hadad and Christoph F. Eick: Using Recurrent Selection to Improve GA-performance, ISMIS, Charlotte, October 1997.
Ben S. Hadad and Christoph F. Eick: Supporting Polyploidy in Genetic Algorithms Using Dominance Vectors, EP’97, Indianapolis, April 1997.
Ben S. Hadad: Extending Genetic Algorithms Using Ideas Borrowed from Plant Genetics and Horticulture, Master’s Thesis, University of Houston, December 1996.
Inversion and Other Reordering Operators
Reordering operators change the position/location of genes in a chromosome, but do not change the composition of the chromosome:
– consequently, reordering operators do not directly affect the fitness.
– however, crossover is affected: the defining length of a schema is changed by applying reordering operators, which increases or decreases the probability that instances of a particular schema reoccur in the future.
– reordering means that genes are no longer lined up correctly, which, in many applications, causes problems with the crossover operator:
necessary genes might be missing: incomplete gene combinations can occur.
duplicated genes can occur, which is usually not desirable.
The most popular reordering operators are inversion and swapping:
1 2 3 | 4 5 6 7 | 8
inversion: 12376548
swap: 12375648
Empirical evidence seems to indicate that, at least in some applications, reordering operators are useful “secondary” operators whose employment induces slight improvements in the overall performance.
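The two operators from the example can be sketched in a few lines. The function names and the index convention (0-based, with the inverted segment given as a half-open slice) are assumptions for this illustration; the swap is read here as exchanging the two genes at the ends of the marked segment, which reproduces the string shown above.

```python
def inversion(chrom, start, end):
    """Reverse the order of the genes in chrom[start:end]."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]


def swap(chrom, i, j):
    """Exchange the genes at positions i and j."""
    genes = list(chrom)
    genes[i], genes[j] = genes[j], genes[i]
    return genes


chrom = list("12345678")
inverted = "".join(inversion(chrom, 3, 7))  # segment 4 5 6 7 reversed
swapped = "".join(swap(chrom, 3, 6))        # genes 4 and 7 exchanged
```

Both operators return a permutation of the original genes, so the composition of the chromosome is unchanged and only the gene positions differ.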
Niche and Speciation
We can view a niche as an organism’s job or role in an environment, and we can think of a species as a class of organisms with common characteristics.
Niche Methods in Genetic Search:
– crowding (DeJong (1975)) and sharing functions (Goldberg (1987)).
– external schemata (Perry (1984)), which are similarity templates that define species membership and have to be provided by the GA developer.
– Mating restrictions in genetic search:
line breeding (breed the champion repeatedly with others)
Hollstien’s inbreeding with intermittent crossbreeding (close individuals keep breeding with each other as long as their family average fitness continues to improve; otherwise, crossbreeding between different families is used).
Booker introduces mating templates, which are mate selection mechanisms that become part of the individual (and themselves undergo evolution), and proposes different mating rules:
– bidirectional match
– unidirectional match
– best partial matches
disallow breeding of similar individuals (e.g. incest)
Example of a Booker Mating Template
Assume we have chromosomes over alphabet $A$ with chromosome length $n$, and let $A' = A \cup \{\#\}$.
Extend chromosomes, tripling their length, to:
$ind = a_1 \ldots a_n b_1 \ldots b_n c_1 \ldots c_n$ with $a_i \in A$ and $b_i, c_i \in A'$ ($i = 1, \ldots, n$), with the meaning:
$ind$ is allowed to mate with $ind'$ if $ind' \in Schema(b_1 \ldots b_n)$ or $ind' \in Schema(c_1 \ldots c_n)$.
Example: Let n=4 and A be the binary alphabet:
ind1=0010 0000 1111
ind2=0000 1### 0111
ind3=0111 001# 1111
Bidirectional match requires that “a must want b” and “b must want a”, whereas in unidirectional match it is sufficient that one partner wants the other.
Many other matching schemes are possible, e.g. more complicated ones that operate on scores and thresholds.
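The matching rules can be checked on the three example individuals with a short sketch. It assumes that the schema test is applied to the functional part (the first n symbols) of the candidate mate; the function names are hypothetical and not taken from Booker's work.

```python
def in_schema(chromosome, schema):
    """True if chromosome is an instance of schema ('#' matches any symbol)."""
    return len(chromosome) == len(schema) and all(
        s == "#" or s == c for c, s in zip(chromosome, schema)
    )


def wants(ind, other, n):
    """True if one of ind's two templates accepts the functional part of other."""
    b, c = ind[n:2 * n], ind[2 * n:3 * n]
    functional_other = other[:n]  # assumption: templates test a_1...a_n only
    return in_schema(functional_other, b) or in_schema(functional_other, c)


def unidirectional_match(x, y, n):
    return wants(x, y, n) or wants(y, x, n)


def bidirectional_match(x, y, n):
    return wants(x, y, n) and wants(y, x, n)


ind1 = "0010" "0000" "1111"
ind2 = "0000" "1###" "0111"
ind3 = "0111" "001#" "1111"
```

Under this reading, ind1 wants ind2, ind2 wants ind3 and ind3 wants ind1, but none of these wishes is returned. Every pair therefore passes the unidirectional rule and no pair passes the bidirectional one.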
Artificial Mating Tags
The problem with Booker’s approach is that mating templates have the same length as the chromosomes themselves, producing a significant overhead. To reduce this overhead, Holland proposed using three-part strings consisting of:
– a short mating template (used to test the suitability of other mates)
– a short mating tag (used by others to match; characterizes the string)
– the functional substring
Example:
#10#:1010:111111000011
#0##:1100:011111110001
– mating tags affect the compatibility with other strings, but do not affect the fitness.
– usually, the whole three-part string is evolved.
– Holland’s scheme of using artificial mating tags can also be used to define mating niches abstractly, similar to Perry’s external schema approach, by freezing particular positions in templates and tags. For example, mating can easily be restricted to particular subsets of the population. Mating tags can also be used to simulate distributed GAs.
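A sketch of the compatibility test for such three-part strings follows. The colon-separated layout is taken from the example above; the function names and the choice of a bidirectional rule (each string's template must accept the other's tag) are assumptions for illustration.

```python
def parse(individual):
    """Split 'template:tag:functional' into its three parts."""
    template, tag, functional = individual.split(":")
    return template, tag, functional


def accepts(template, tag):
    """True if the tag is an instance of the template ('#' is a wildcard)."""
    return len(template) == len(tag) and all(
        t == "#" or t == g for t, g in zip(template, tag)
    )


def can_mate(x, y):
    """Assumed bidirectional rule: each template must accept the other's tag."""
    template_x, tag_x, _ = parse(x)
    template_y, tag_y, _ = parse(y)
    return accepts(template_x, tag_y) and accepts(template_y, tag_x)


s1 = "#10#:1010:111111000011"
s2 = "#0##:1100:011111110001"
compatible = can_mate(s1, s2)
```

Only the short template and tag are compared, so the cost of the test is independent of the length of the functional substring, which is the overhead reduction the scheme is after.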