
Page 1: The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models Gary D. Boetticher Boetticher@uhcl.edu Univ. of Houston.

The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models

Gary D. Boetticher, Boetticher@uhcl.edu, Univ. of Houston - Clear Lake, Houston, TX, USA

http://nas.cl.uh.edu/boetticher/publications.html The 2006 IEEE International Conference on Information Reuse and Integration

Kim Kaminsky, Univ. of Houston - Clear Lake, Houston, TX, USA


About the Author: Gary D. Boetticher


Ph.D. in Machine Learning and Software Engineering

A neural network-based software reuse economic model

Executive member of IEEE Reuse Standard Committees (1990s)

Commercial consultant: U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …

Currently: Associate Professor, Department of Comp. Science/Software Engineering, University of Houston - Clear Lake, Houston, TX, USA

Email: Boetticher@uhcl.edu

Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics


Motivating Questions

Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?

If so, how could these insights be utilized to make better breeding decisions?


Genetic Program Overview

Given X, Y, and Z, what is RESULT?

X    Y    Z    RESULT
2    4    5    30
5    3    2    16
:    :    :    :
1    3    6    24

1) Create a population of equations

2) Determine the fitness for each (1 / Stand. Error)

Eq#     Equation       Fitness
1       X+Y            87
2       (Z-X)*Y+X      84
:       :              :
1000    (X*X)-Z        57

3) Breed Equations

Parents:      X + Y        and    (Z-X) * Y+X
Offspring:    (Z-X) + Y    and    X * Y+X

4) Generate new populations and breed until a solution is found
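The four steps above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the tuple-based tree encoding, the function names, and the reading of "Stand. Error" as the root-mean-square residual are all assumptions. The crossover call reproduces the breeding example from the slide, and the three visible data rows happen to be consistent with RESULT = (X + Y) * Z, which is used here only as a sanity check.

```python
import math
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(tree, row):
    """Evaluate an expression tree: a variable name or a tuple (op, left, right)."""
    if isinstance(tree, str):
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))

def fitness(tree, rows):
    """Fitness = 1 / standard error (ASSUMED here to be the RMS residual)."""
    squared = sum((evaluate(tree, r) - r["RESULT"]) ** 2 for r in rows)
    std_err = math.sqrt(squared / len(rows))
    return math.inf if std_err == 0 else 1.0 / std_err

def get_subtree(tree, path):
    """Follow a path of tuple indices (1 = left child, 2 = right child)."""
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    """Return a copy of tree with the subtree at path replaced by new."""
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def crossover(a, path_a, b, path_b):
    """Swap the subtree of a at path_a with the subtree of b at path_b."""
    sub_a, sub_b = get_subtree(a, path_a), get_subtree(b, path_b)
    return replace_subtree(a, path_a, sub_b), replace_subtree(b, path_b, sub_a)

# The three data rows visible on the slide.
rows = [
    {"X": 2, "Y": 4, "Z": 5, "RESULT": 30},
    {"X": 5, "Y": 3, "Z": 2, "RESULT": 16},
    {"X": 1, "Y": 3, "Z": 6, "RESULT": 24},
]

parent1 = ("+", "X", "Y")                          # X + Y
parent2 = ("+", ("*", ("-", "Z", "X"), "Y"), "X")  # (Z-X) * Y + X

# Swap the X in parent1 with the (Z-X) in parent2.
child1, child2 = crossover(parent1, (1,), parent2, (1, 1))
print(child1)  # tree for (Z-X) + Y
print(child2)  # tree for X * Y + X
print(fitness(parent1, rows))
```

The fitness values printed by this sketch are on a different scale from the 87, 84, and 57 shown on the slide, since the slide does not state how its scores are scaled.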


Genetic Program Overview

Generation N

Equation             Fitness
(X+Y)                87
(X - Z) * (Y * Y)    86
ZY                   75
:                    :
Y                    22
Y - X                18

Generation N+1

Equation             Fitness
(X - Z)
(X + Y) * (Y * Y)
Z + Y
:
X
Y + Y

Why discard legacy information?


Goal: Examine fitness patterns over time

Generation 1, Generation 2, Generation 3 (the slide shows the same ranked list under each heading):

Equation                     Fitness
(X+Y)                        87
(X - Z) * (Y * Y)            86
ZY                           85
(X - Z) * (Y * Y)            84
Y                            79
Y - X                        75
Z + Y                        75
(X - Z) * (Y * Y)            75
Y                            73
Y - X                        71
(X - Z) * (Y * Y) + W + W    68
Y - X                        67
ZY                           66
(X - Z) * (Y * Y)            66
Y                            65
Y - X                        65
(X - Z) * (Y * Y) + W + W    64
Y - X                        64
Z - Y                        62
(X - Z) * (Y * Y)            59
Y                            58
Y - X                        55
(X - Z) * (Y * Y) + W + W    44

Localized?

Volatile?


Proof of Concept Experiments - 1

5 experiments using synthetic equations:

Z = W + X + Y
Z = 2 * X + Y - W
Z = X / Y
Z = X^3
Z = W^2 + W * X - Y

Data slightly perturbed to prevent premature convergence

Genetic Program

1000 Chromosomes (Equations)
50 Generations
Breeding based on fitness rank
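A minimal sketch of this setup follows, under stated assumptions. The slides do not give the number of data rows, the input ranges, the size or form of the perturbation, or the exact rank-based breeding rule, so the values below (100 rows, inputs in [1, 10], multiplicative noise of up to 1%, selection weight linear in rank) and the names `make_dataset` and `rank_select` are hypothetical.

```python
import random

# The five synthetic target equations from the slide.
EQUATIONS = {
    "W + X + Y": lambda w, x, y: w + x + y,
    "2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "X / Y": lambda w, x, y: x / y,
    "X^3": lambda w, x, y: x ** 3,
    "W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}

def make_dataset(func, n_rows=100, noise=0.01, seed=0):
    """Generate rows for one equation and slightly perturb the target Z.

    ASSUMPTIONS: inputs drawn uniformly from [1, 10] (which avoids division
    by zero) and a multiplicative perturbation of at most +/- noise.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = func(w, x, y) * (1.0 + rng.uniform(-noise, noise))
        rows.append({"W": w, "X": x, "Y": y, "Z": z})
    return rows

def rank_select(population, fitnesses, k, rng):
    """Pick k parents with probability proportional to fitness rank.

    ASSUMPTION: linear rank weights (worst gets 1, best gets N).
    """
    order = sorted(range(len(population)), key=lambda i: fitnesses[i])
    weights = [0] * len(population)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return rng.choices(population, weights=weights, k=k)

datasets = {name: make_dataset(func) for name, func in EQUATIONS.items()}
parents = rank_select(["eq_a", "eq_b", "eq_c"], [0.2, 0.9, 0.5], k=4,
                      rng=random.Random(0))
```

Rank-based selection depends only on the ordering of fitness values, not their magnitudes, so a single very fit equation cannot dominate breeding as easily as it would under fitness-proportional selection.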


Proof of Concept Experiments - 2

For the 1000 Chromosomes:

Divide into 5 groups of 200 (by fitness)

Focus on the best, middle, and worst groups

See where each group’s offspring occur in the next generation
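The grouping and tracking described above might look like the following sketch. It assumes each offspring carries a record of one parent (the list `parent_of` is a hypothetical lineage record; with crossover a child has two parents, and the slides do not say how that case was handled), and that rank 0 denotes the fittest chromosome.

```python
from collections import Counter

def quintile_of(rank, pop_size=1000, n_groups=5):
    """Map a 0-based fitness rank (0 = best) to a group index 0..n_groups-1."""
    return rank * n_groups // pop_size

def rank_population(fitnesses):
    """Return {individual index: 0-based rank}, highest fitness first."""
    order = sorted(range(len(fitnesses)), key=lambda i: fitnesses[i], reverse=True)
    return {idx: rank for rank, idx in enumerate(order)}

def offspring_distribution(parent_fitnesses, child_fitnesses, parent_of):
    """Count how many children of each parent group land in each child group.

    parent_of[c] is the index of child c's recorded parent (hypothetical).
    Returns a Counter keyed by (parent_group, child_group).
    """
    n_parents, n_children = len(parent_fitnesses), len(child_fitnesses)
    parent_rank = rank_population(parent_fitnesses)
    child_rank = rank_population(child_fitnesses)
    table = Counter()
    for child, parent in enumerate(parent_of):
        pg = quintile_of(parent_rank[parent], n_parents)
        cg = quintile_of(child_rank[child], n_children)
        table[(pg, cg)] += 1
    return table

# Toy example: 10 parents, each with one child of identical fitness.
parent_fit = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
child_fit = list(parent_fit)
table = offspring_distribution(parent_fit, child_fit, parent_of=list(range(10)))
```

Group 0 corresponds to the "best" 200 chromosomes, group 2 to the "middle" 200, and group 4 to the "worst" 200 when the population size is 1000.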


Results for Z = W + X + Y

[Figure: three panels labeled Best, Middle, and Worst]

Results for Z = 2 * X + Y - W

[Figure: three panels labeled Best, Middle, and Worst]

Results for Z = X / Y

[Figure: three panels labeled Best, Middle, and Worst]

Results for Z = X^3

[Figure: three panels labeled Best, Middle, and Worst]

Results for Z = W^2 + W * X - Y

[Figure: three panels labeled Best, Middle, and Worst]

Applied Experiments

Best class produces best offspring. Now what?

Compare 2 Genetic Programs (GPs)

1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates 5 times.

Genetic Program

1000 Chromosomes (Equations)
50 Generations
20 Trials

Equations to model

Z = Sin(W) + Sin(X) + Sin(Y)
Z = log10(W^X) + (Y * Z)
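One possible reading of "breeds only the top 20% of a population and replicates 5 times" is sketched below: keep the fittest fifth of the population and copy each survivor five times, so the breeding pool has the same size as the original population (200 * 5 = 1000). This interpretation, and the function name `lineage_breeding_pool`, are assumptions; the slides do not spell out the mechanism.

```python
def lineage_breeding_pool(population, fitnesses, top_fraction=0.2, copies=5):
    """Build a breeding pool from only the fittest part of the population.

    ASSUMED interpretation: keep the top `top_fraction` by fitness and
    replicate each kept individual `copies` times.
    """
    # Sort (fitness, index) pairs so individuals themselves are never compared.
    ranked = sorted(zip(fitnesses, range(len(population))), reverse=True)
    n_keep = max(1, round(len(population) * top_fraction))
    elite = [population[i] for _, i in ranked[:n_keep]]
    return [individual for individual in elite for _ in range(copies)]

# Toy population: individual i has fitness i, so 800..999 are the top 20%.
population = list(range(1000))
fitnesses = [float(i) for i in population]
pool = lineage_breeding_pool(population, fitnesses)
print(len(pool), min(pool), max(pool))
```

In a full GP run, crossover and mutation would then be applied to this pool in place of the whole population, which is where it would differ from the vanilla GP.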


Results for Z = Sin(W) + Sin(X) + Sin(Y)

                                       Vanilla-Based GP    Lineage-Based GP
Average Fitness                        591.8               740.9
Average r^2                            0.8734              0.9315
Ave. Generations needed to complete    29.1                28.5


Results for Z = log10(W^X) + (Y * Z)

                                       Vanilla-Based GP    Lineage-Based GP
Average Fitness                        210.9               346.5
Average r^2                            0.7244              0.8069
Ave. Generations needed to complete    50.0                48.6
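To put the two results tables side by side, the relative change from the vanilla GP to the lineage-based GP can be computed directly from the reported figures. The short script below only does this arithmetic; the dictionary layout and the function name `relative_change` are illustrative choices, not part of the original study.

```python
# Reported values as (vanilla-based GP, lineage-based GP), copied from the slides.
RESULTS = {
    "Sin equation": {
        "average fitness": (591.8, 740.9),
        "average r^2": (0.8734, 0.9315),
        "generations": (29.1, 28.5),
    },
    "log10 equation": {
        "average fitness": (210.9, 346.5),
        "average r^2": (0.7244, 0.8069),
        "generations": (50.0, 48.6),
    },
}

def relative_change(vanilla, lineage):
    """Percentage change of the lineage-based value relative to the vanilla value."""
    return (lineage - vanilla) / vanilla * 100.0

changes = {
    equation: {metric: relative_change(*pair) for metric, pair in metrics.items()}
    for equation, metrics in RESULTS.items()
}

for equation, metrics in changes.items():
    for metric, pct in metrics.items():
        print(f"{equation:15s} {metric:16s} {pct:+.1f}%")
```

On these figures, average fitness rises by roughly 25% for the sine equation and roughly 64% for the log10 equation, while the average number of generations drops by about 2% to 3%.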


Conclusions


Proof of concept experiments demonstrate the viability of considering lineage in GPs

Applied experiments show that lineage-based GP modeling produces better results (higher average fitness and r^2) in slightly fewer generations