The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
Gary D. Boetticher [email protected]
Univ. of Houston - Clear Lake, Houston, TX, USA
Kim Kaminsky [email protected]
Univ. of Houston - Clear Lake, Houston, TX, USA
http://nas.cl.uh.edu/boetticher/publications.html The 2006 IEEE International Conference on Information Reuse and Integration
About the Author: Gary D. Boetticher
Ph.D. in Machine Learning and Software Engineering
A neural network-based software reuse economic model
Executive member of IEEE Reuse Standard Committees (1990s)
Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA
[email protected]
Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics
Motivating Questions
Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?
If so, how could these insights be utilized to make better breeding decisions?
Genetic Program Overview
X, Y, and Z -> RESULT?
X    Y    Z    RESULT
2    4    5    30
5    3    2    16
:    :    :    :
1    3    6    24
1) Create a population of equations
2) Determine the fitness for each (1 / Stand. Error)
Eq#     Equation       Fitness
1       X+Y            87
2       (Z-X)*Y+X      84
:       :              :
1000    (X*X)-Z        57
3) Breed Equations
Parents:      X + Y          (Z-X) * Y+X
Offspring:    (Z-X) + Y      X * Y+X
4) Generate new populations and breed until a solution is found
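Steps 1 and 2 above can be sketched in a few lines of Python. This is only an illustration: it assumes "Stand. Error" means the root-mean-square residual over the data rows, which the slides do not spell out, and the names ROWS, CANDIDATES, standard_error, and fitness are hypothetical. Only the three visible data rows are used, so the values will not match the fitness numbers shown on the slide.

```python
import math

# The three data rows visible on the slide: (X, Y, Z, RESULT)
ROWS = [(2, 4, 5, 30), (5, 3, 2, 16), (1, 3, 6, 24)]

# Three equations from the slide's population, written as Python lambdas
CANDIDATES = {
    "X+Y": lambda x, y, z: x + y,
    "(Z-X)*Y+X": lambda x, y, z: (z - x) * y + x,
    "(X*X)-Z": lambda x, y, z: x * x - z,
}


def standard_error(eq, rows):
    """Root-mean-square residual (ASSUMED meaning of 'Stand. Error')."""
    squared = [(eq(x, y, z) - result) ** 2 for x, y, z, result in rows]
    return math.sqrt(sum(squared) / len(rows))


def fitness(eq, rows):
    """Fitness = 1 / standard error; a perfect fit gets infinite fitness."""
    se = standard_error(eq, rows)
    return math.inf if se == 0 else 1.0 / se


# Rank the candidates from most fit to least fit
ranked = sorted(CANDIDATES, key=lambda name: fitness(CANDIDATES[name], ROWS), reverse=True)
for name in ranked:
    print(f"{name:12s} fitness = {fitness(CANDIDATES[name], ROWS):.4f}")

# Observation: the three visible rows are consistent with (X + Y) * Z,
# so that equation has zero error on them.
perfect = fitness(lambda x, y, z: (x + y) * z, ROWS)
```

Treating a zero error as infinite fitness is one design choice; a capped or offset denominator would work as well.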
Genetic Program Overview
Generation N
Equation             Fitness
(X+Y)                87
(X - Z) * (Y * Y)    86
ZY                   75
:                    :
Y                    22
Y - X                18
Generation N+1
Equation             Fitness
(X - Z)
(X + Y) * (Y * Y)
Z + Y
:
X
Y + Y
Why discard legacy information?
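One way to keep the legacy information this slide asks about is to let every offspring carry the fitness ranks of its parents. The sketch below is an assumption about how that could look, not the authors' implementation: the Chromosome record, the tuple encoding of equations, and the helpers get_subtree, put_subtree, and crossover are all hypothetical. The crossover points are fixed by hand so that the result reproduces the breeding example from the overview (X + Y bred with (Z-X) * Y+X); a real GP would pick them at random. The parent ranks 1 and 2 are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Chromosome:
    expr: object                                   # nested tuple (op, left, right) or a variable name
    parent_ranks: Optional[Tuple[int, int]] = None  # lineage: fitness ranks of both parents


def get_subtree(tree, path):
    """Follow a path of tuple indices down to a subtree."""
    for index in path:
        tree = tree[index]
    return tree


def put_subtree(tree, path, subtree):
    """Return a copy of tree with the node at path replaced by subtree."""
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (put_subtree(tree[i], path[1:], subtree),) + tree[i + 1:]


def crossover(a, rank_a, path_a, b, rank_b, path_b):
    """Swap the subtrees at path_a and path_b; children remember parent ranks."""
    sub_a = get_subtree(a.expr, path_a)
    sub_b = get_subtree(b.expr, path_b)
    lineage = (rank_a, rank_b)
    return (
        Chromosome(put_subtree(a.expr, path_a, sub_b), lineage),
        Chromosome(put_subtree(b.expr, path_b, sub_a), lineage),
    )


# Parents from the overview slide: X + Y and (Z-X) * Y + X
parent_1 = Chromosome(("+", "X", "Y"))
parent_2 = Chromosome(("+", ("*", ("-", "Z", "X"), "Y"), "X"))

# Swap the X in parent 1 with the (Z-X) subtree in parent 2
child_1, child_2 = crossover(parent_1, 1, (1,), parent_2, 2, (1, 1))
print(child_1)  # encodes (Z-X) + Y
print(child_2)  # encodes X * Y + X
```

With the ranks stored on each child, later generations can be queried for where the offspring of any fitness group ended up.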
Goal: Examine fitness patterns over time
Generation 1, Generation 2, Generation 3 (the same ranked list appears under each generation)
Equation                       Fitness
(X+Y)                          87
(X - Z) * (Y * Y)              86
ZY                             85
(X - Z) * (Y * Y)              84
Y                              79
Y - X                          75
Z + Y                          75
(X - Z) * (Y * Y)              75
Y                              73
Y - X                          71
(X - Z) * (Y * Y) + W + W      68
Y - X                          67
ZY                             66
(X - Z) * (Y * Y)              66
Y                              65
Y - X                          65
(X - Z) * (Y * Y) + W + W      64
Y - X                          64
Z - Y                          62
(X - Z) * (Y * Y)              59
Y                              58
Y - X                          55
(X - Z) * (Y * Y) + W + W      44
Localized?
Volatile?
Proof of Concept Experiments - 1
5 experiments using synthetic equations:
Z = W + X + Y
Z = 2 * X + Y - W
Z = X / Y
Z = X^3
Z = W^2 + W * X - Y
Data slightly perturbed to prevent premature convergence
Genetic Program
1000 Chromosomes (Equations)
50 Generations
Breeding based on fitness rank
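A data set for these five synthetic equations could be generated along the following lines. Everything numeric here is an assumption made for illustration: the slides do not state the input ranges, the number of rows, or how large the perturbation was. The sketch draws W, X, and Y uniformly from [1, 10] (which also avoids division by zero in Z = X / Y) and scales each Z by up to ±1%. The names EQUATIONS and make_dataset are hypothetical.

```python
import random

# The five synthetic target equations from the slide
EQUATIONS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}


def make_dataset(equation, n_rows=100, noise=0.01, seed=0):
    """Return rows (W, X, Y, Z) where Z is the equation value, slightly perturbed.

    n_rows, noise, and the [1, 10] input range are ASSUMED values.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        exact = equation(w, x, y)
        perturbed = exact * (1.0 + rng.uniform(-noise, noise))
        rows.append((w, x, y, perturbed))
    return rows


datasets = {name: make_dataset(eq) for name, eq in EQUATIONS.items()}
```

A fixed seed keeps the data reproducible across trials; changing it gives a fresh perturbation of the same equation.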
Proof of Concept Experiments - 2
For the 1000 Chromosomes:
Divide into 5 groups of 200 (by fitness)
Focus on the best, middle, and worst groups
See where each group’s offspring occur in the next generation
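The grouping described above can be sketched as a small tally. The sketch assumes that chromosomes are identified by a 0-based fitness rank (0 = fittest) and that each offspring contributes one (parent rank, child rank) record per parent; neither detail is stated on the slide, and the names group_of and offspring_placement are hypothetical.

```python
from collections import Counter


def group_of(rank, pop_size=1000, n_groups=5):
    """Map a 0-based fitness rank to a group index (0 = best, n_groups - 1 = worst)."""
    return rank * n_groups // pop_size


def offspring_placement(records, pop_size=1000, n_groups=5):
    """Count, for each parent group, how many offspring land in each group
    of the next generation.

    records: iterable of (parent_rank, child_rank) pairs (ASSUMED format).
    """
    table = {group: Counter() for group in range(n_groups)}
    for parent_rank, child_rank in records:
        parent_group = group_of(parent_rank, pop_size, n_groups)
        child_group = group_of(child_rank, pop_size, n_groups)
        table[parent_group][child_group] += 1
    return table


# Illustrative records only, not data from the experiments
records = [(0, 10), (5, 250), (450, 430), (999, 990)]
table = offspring_placement(records)
best, middle, worst = table[0], table[2], table[4]
```

Groups 0, 2, and 4 correspond to the best, middle, and worst groups that the slide focuses on.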
Results for Z = W + X + Y
[Figure: panels Best, Middle, Worst]
Results for Z = 2 * X + Y - W
[Figure: panels Best, Middle, Worst]
Results for Z = X / Y
[Figure: panels Best, Middle, Worst]
Results for Z = X^3
[Figure: panels Best, Middle, Worst]
Results for Z = W^2 + W * X - Y
[Figure: panels Best, Middle, Worst]
Applied Experiments
Best class produces best offspring. Now what?
Compare 2 Genetic Programs (GPs):
1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates 5 times.
Genetic Program
1000 Chromosomes (Equations)
50 Generations
20 Trials
Equations to model
Z = Sin(W) + Sin(X) + Sin(Y)
Z = log10(WX) + (Y * Z)
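The lineage-based rule in item 2 ("breeds only the top 20% of a population and replicates 5 times") admits more than one reading. The sketch below takes one plausible interpretation, stated here as an assumption: keep the fittest 20% of the ranked population and enter each survivor into the mating pool 5 times, which restores the pool to the original population size. The function name lineage_mating_pool and its parameters are hypothetical.

```python
import random


def lineage_mating_pool(ranked_population, top_fraction=0.2, copies=5, seed=None):
    """Build a mating pool from the fittest chromosomes only.

    ranked_population: list sorted from most fit to least fit.
    ASSUMED reading of the slide: each of the top 20% appears 5 times.
    """
    cutoff = round(len(ranked_population) * top_fraction)
    elite = ranked_population[:cutoff]
    pool = [chromosome for chromosome in elite for _ in range(copies)]
    random.Random(seed).shuffle(pool)  # randomize who is paired with whom
    return pool


# Chromosomes stand in as their fitness ranks (0 = fittest) for illustration
population = list(range(1000))
pool = lineage_mating_pool(population, seed=42)
pairs = list(zip(pool[0::2], pool[1::2]))  # consecutive entries become breeding pairs
```

With 1000 chromosomes this yields a pool of 1000 entries drawn from only 200 distinct parents.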
Results for Z = Sin(W) + Sin(X) + Sin(Y)
                                       Vanilla-Based GP    Lineage-Based GP
Average Fitness                        591.8               740.9
Average r^2                            0.8734              0.9315
Ave. Generations needed to complete    29.1                28.5
Results for Z = log10(WX) + (Y * Z)
                                       Vanilla-Based GP    Lineage-Based GP
Average Fitness                        210.9               346.5
Average r^2                            0.7244              0.8069
Ave. Generations needed to complete    50.0                48.6
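To put the two results tables on a common scale, the relative change from the vanilla-based GP to the lineage-based GP can be computed directly from the reported averages. The numbers below are copied from the tables; the helper relative_change and the RESULTS layout are hypothetical names used only for this illustration.

```python
def relative_change(baseline, new):
    """Percentage change from baseline to new."""
    return 100.0 * (new - baseline) / baseline


# (vanilla-based, lineage-based) averages copied from the two results tables
RESULTS = {
    "Sin": {
        "fitness": (591.8, 740.9),
        "r2": (0.8734, 0.9315),
        "generations": (29.1, 28.5),
    },
    "log10": {
        "fitness": (210.9, 346.5),
        "r2": (0.7244, 0.8069),
        "generations": (50.0, 48.6),
    },
}

for equation, metrics in RESULTS.items():
    for metric, (vanilla, lineage) in metrics.items():
        change = relative_change(vanilla, lineage)
        print(f"{equation:6s} {metric:12s} {change:+.1f}%")
```

On these figures, average fitness rises by about 25% and 64% respectively, while the average number of generations drops by roughly 2 to 3%.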
Conclusions
Proof of concept experiments demonstrate the viability of considering lineage in GPs
Applied experiments show that lineage-based GP modeling produces better results (higher average fitness and r^2) in slightly fewer generations