Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Martin Pelikan 1, Mark W. Hauschild 1, Dirk Thierens 2

1 Missouri Estimation of Distribution Algorithms Laboratory (MEDAL), University of Missouri, St. Louis, MO
[email protected], [email protected]

2 Utrecht University, Utrecht, The Netherlands
[email protected]

Download MEDAL Report No. 2011001: http://medal.cs.umsl.edu/files/2011001.pdf

description

The linkage tree genetic algorithm (LTGA) identifies linkages between problem variables using an agglomerative hierarchical clustering algorithm and linkage trees. This enables LTGA to solve many decomposable problems that are difficult with more conventional genetic algorithms. The goal of this paper is two-fold: (1) present a thorough empirical evaluation of LTGA on a large set of problem instances of additively decomposable problems and (2) speed up the clustering algorithm used to build the linkage trees in LTGA by using a pairwise and a problem-specific metric.

http://medal.cs.umsl.edu/files/2011001.pdf

Transcript of Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Page 1: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Martin Pelikan 1, Mark W. Hauschild 1, Dirk Thierens 2

1 Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO

[email protected], [email protected]

2 Utrecht University
Utrecht, The Netherlands
[email protected]

Download MEDAL Report No. 2011001

http://medal.cs.umsl.edu/files/2011001.pdf

Martin Pelikan, Mark W. Hauschild, Dirk Thierens Pairwise and Problem-Specific Metrics in LTGA

Page 2: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Motivation

Linkage learning

- Standard crossover often ineffective in presence of epistasis.
- Linkage learning aims to learn interactions between problem variables to ensure that crossover does not disrupt important partial solutions and combines them effectively.
- Various evolutionary algorithms capable of linkage learning exist.

This study

- Focuses on linkage tree genetic algorithm (LTGA).
- Proposes and analyzes two distance metrics in LTGA.
- Analyzes LTGA scalability on a large number of problems.

Page 3: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Outline

1. Linkage tree genetic algorithm (LTGA).

2. Distance metrics in LTGA.

3. Experiments.

4. Summary and conclusions.

Page 4: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Linkage Tree

Linkage tree

- Leaves are individual variables (string positions).
- Each internal node has two subtrees.
- Each node represents a subset of variables (descendants).
- Descendants of any node form a linkage group.
- Linkage groups used as masks in LTGA crossover.

Page 5: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Linkage Tree Genetic Algorithm

LTGA procedure

- Starts with a random population.
- Initial population may undergo local search.
- Each generation performs two rounds of crossover to generate a new population of the same size.

LTGA crossover

- Start with pair $(X, Y)$ of parents.
- For each linkage group $[\pi_1, \pi_2, \ldots, \pi_k]$ in $T$ (bottom to top):
  - Create $X'$ and $Y'$ by exchanging bits in positions $\{\pi_1, \ldots, \pi_k\}$ between $X$ and $Y$.
  - If best($X'$, $Y'$) is better than best($X$, $Y$), then replace $(X, Y)$ with $(X', Y')$.
- The best of the two parents after applying each linkage group survives to the next population.
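
The crossover steps above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes fitness is maximized, that solutions are lists of bits, and that `linkage_groups` is already ordered bottom to top. The names `ltga_crossover`, `linkage_groups`, and `fitness` are hypothetical.

```python
from typing import Callable, List, Sequence


def ltga_crossover(
    x: Sequence[int],
    y: Sequence[int],
    linkage_groups: Sequence[Sequence[int]],
    fitness: Callable[[List[int]], float],
) -> List[int]:
    """Apply each linkage group as a crossover mask (assumes maximization)."""
    x, y = list(x), list(y)
    best = max(fitness(x), fitness(y))
    for group in linkage_groups:  # assumed to be ordered bottom to top
        x_new, y_new = list(x), list(y)
        for pos in group:
            # Exchange the bits at the positions covered by this group.
            x_new[pos], y_new[pos] = y[pos], x[pos]
        new_best = max(fitness(x_new), fitness(y_new))
        if new_best > best:
            # Keep the exchange only if the better offspring improves on the pair.
            x, y, best = x_new, y_new, new_best
    # The better of the two resulting solutions survives.
    return x if fitness(x) >= fitness(y) else y


def onemax(bits: List[int]) -> int:
    """Toy fitness used only for this illustration."""
    return sum(bits)


groups = [[0], [1], [2], [3], [0, 1], [2, 3]]
child = ltga_crossover([1, 1, 0, 0], [0, 0, 1, 1], groups, onemax)
```

With the toy `onemax` fitness, exchanging positions 0 and 1 moves all ones into a single solution, which is then returned.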

Page 6: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Learning Linkage Tree

Learning linkage tree

- Start with each variable being a separate linkage group.
- Each step merges two closest groups.
- Distance of linkage groups based on variation of information.
- Each iteration should merge most strongly interacting groups.
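
A rough sketch of this agglomerative procedure, with the cluster distance passed in as a function so that any of the metrics discussed in these slides could be plugged in. The naive search over all cluster pairs is chosen for readability, not speed, and the names `build_linkage_tree` and `toy_distance` are illustrative assumptions.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Cluster = Tuple[int, ...]


def build_linkage_tree(
    n: int, cluster_distance: Callable[[Cluster, Cluster], float]
) -> List[Cluster]:
    """Return all linkage groups, leaves first and the root last."""
    clusters: List[Cluster] = [(i,) for i in range(n)]
    groups: List[Cluster] = list(clusters)
    while len(clusters) > 1:
        # Find the two closest groups among the current clusters.
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: cluster_distance(*pair),
        )
        merged = a + b
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(merged)
        groups.append(merged)
    return groups


def toy_distance(a: Cluster, b: Cluster) -> float:
    """Toy metric: variables 2m and 2m+1 are close, all others are far."""
    pairs = [(i, j) for i in a for j in b]
    return sum(0.0 if i // 2 == j // 2 else 1.0 for i, j in pairs) / len(pairs)


tree = build_linkage_tree(4, toy_distance)
```

Because groups are appended in merge order, the returned list already lists small groups before the groups that contain them.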

Page 7: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Measuring Cluster Distances in LTGA

Distance metric based on variation of information

- Distance of clusters $C_i$ and $C_j$:

$$D(C_i, C_j) = 2 - \frac{H(C_i) + H(C_j)}{H(C_i, C_j)}$$

  where
  - $H(C_i, C_j)$ is the entropy of $C_i \cup C_j$
  - $H(C_i)$ is the entropy of $C_i$
  - $H(C_j)$ is the entropy of $C_j$
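
As an illustration of the formula above, the following sketch estimates the entropies from a population stored as a list of equal-length bit lists. The function names are hypothetical, and returning 0 when the joint entropy is zero is an arbitrary convention assumed here, since the slides do not say how that case is handled.

```python
from collections import Counter
from math import log2
from typing import List, Sequence


def entropy(population: List[Sequence[int]], cluster: Sequence[int]) -> float:
    """Empirical entropy (in bits) of the variables in `cluster`."""
    counts = Counter(tuple(ind[i] for i in cluster) for ind in population)
    total = len(population)
    return -sum(c / total * log2(c / total) for c in counts.values())


def vi_distance(
    population: List[Sequence[int]], ci: Sequence[int], cj: Sequence[int]
) -> float:
    """D(Ci, Cj) = 2 - (H(Ci) + H(Cj)) / H(Ci, Cj)."""
    joint = entropy(population, tuple(ci) + tuple(cj))
    if joint == 0.0:
        return 0.0  # assumption: convention for fully converged variables
    return 2.0 - (entropy(population, ci) + entropy(population, cj)) / joint


# Variables 0 and 1 always agree; variable 2 is independent of both.
pop = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
d_linked = vi_distance(pop, (0,), (1,))
d_independent = vi_distance(pop, (0,), (2,))
```

Each call counts joint configurations of all variables in both clusters, so the cost grows with cluster size.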

Bottleneck in learning linkage tree

- Most time spent by measuring cluster distances.
- Can we alleviate this bottleneck?
- We discuss two distance metrics that address this issue.

Page 8: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Pairwise Metric

Pairwise metric

- Start by measuring distances between pairs of variables.
- Cluster distance computed as average distance between pairs of variables:

$$D'(C_i, C_j) = \frac{1}{|C_i| \times |C_j|} \sum_{c_i \in C_i} \sum_{c_j \in C_j} D(c_i, c_j)$$

Good news

- We only need pairwise statistics.
- This results in much faster distance computation.
- Surprisingly, this also helps scalability.
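
A small sketch of the averaging step, assuming the variable-to-variable distances $D(c_i, c_j)$ have already been computed once and stored in a symmetric matrix `d`. Both the name `pairwise_cluster_distance` and the example matrix are made up for illustration.

```python
from typing import List, Sequence


def pairwise_cluster_distance(
    d: List[List[float]], ci: Sequence[int], cj: Sequence[int]
) -> float:
    """Average of the precomputed distances over all pairs (a in ci, b in cj)."""
    total = sum(d[a][b] for a in ci for b in cj)
    return total / (len(ci) * len(cj))


# Hypothetical precomputed distances between three variables.
d = [
    [0.0, 1.0, 4.0],
    [1.0, 0.0, 2.0],
    [4.0, 2.0, 0.0],
]
dist_a = pairwise_cluster_distance(d, (0,), (1, 2))
dist_b = pairwise_cluster_distance(d, (0, 1), (2,))
```

Once the matrix is available, no further pass over the population is needed while the tree is built.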

Page 10: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Problem-Specific Metrics

Basic idea

- If we could estimate distance of clusters without computing statistics from current population, we could possibly
  - save a lot of time in learning the tree, and
  - reduce the population sizes and number of generations.

Where to get distances from?

- Problem-specific information.
- Learning from optimization runs on similar problems.

Page 11: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Additively Decomposable Functions (ADFs)

Additively decomposable function

- Additively decomposable function:

$$f(X_1, \ldots, X_n) = \sum_{i=1}^{m} f_i(S_i)$$

  - $f_i$ is the $i$th subfunction
  - $S_i$ is a subset of variables from $\{X_1, \ldots, X_n\}$
- Variables located in the same subproblem are expected to interact more strongly.
- Can we use this fact to create a distance metric for LTGA?
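
To make the definition concrete, here is a short sketch that evaluates an additively decomposable function from its subsets and subfunctions. The example uses the commonly cited deceptive trap subfunction (value $k$ when all $k$ bits are 1, and $k - 1 - u$ otherwise, where $u$ is the number of ones); that definition is an assumption brought in for illustration and is not stated on these slides.

```python
from typing import Callable, List, Sequence


def adf(
    x: Sequence[int],
    subsets: Sequence[Sequence[int]],
    subfunctions: Sequence[Callable[[List[int]], float]],
) -> float:
    """f(x) = sum over i of f_i applied to the variables in S_i."""
    return sum(f([x[i] for i in s]) for f, s in zip(subfunctions, subsets))


def trap(bits: List[int]) -> int:
    """Assumed deceptive trap of order k = len(bits)."""
    k, u = len(bits), sum(bits)
    return k if u == k else k - 1 - u


# Two non-overlapping subproblems of order 5 over 10 variables.
subsets = [tuple(range(0, 5)), tuple(range(5, 10))]
value_mixed = adf([1] * 5 + [0] * 5, subsets, [trap, trap])
value_optimum = adf([1] * 10, subsets, [trap, trap])
```

The `subsets` list is exactly the problem-specific information that a structure-based distance metric can draw on.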

Page 12: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Problem-Specific Metric for ADFs

Distance metric for ADFs

- Create graph $G = (V, E)$.
- $V = \{X_1, X_2, \ldots, X_n\}$.
- $E = \{(i, j) : X_i, X_j \in S_k \text{ for some } k\}$.
- Define weight of each edge from $E$ as $d(i, j) = 1$.
- Define $l_{i,j}$ as the length of the shortest path between $i$ and $j$.

Page 13: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Problem-Specific Metric for ADFs

Distance metric for ADFs

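- Use $G$ to compute distances between variables:

$$D''(X_i, X_j) = \begin{cases} l_{i,j} & \text{if a path between } X_i \text{ and } X_j \text{ exists} \\ n & \text{otherwise} \end{cases}$$

- Cluster distance is defined as an average of pairwise distances:

$$D''(C_i, C_j) = \frac{1}{|C_i| \times |C_j|} \sum_{c_i \in C_i} \sum_{c_j \in C_j} D''(c_i, c_j)$$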
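
One possible way to compute these distances is a breadth-first search from every variable, since all edges have weight 1. The sketch below assumes variables are indexed 0 to n-1 and that the subsets $S_k$ are given as index tuples; the function names are illustrative only.

```python
from collections import deque
from itertools import combinations
from typing import List, Sequence


def adf_distance_matrix(n: int, subsets: Sequence[Sequence[int]]) -> List[List[float]]:
    """Shortest-path distances in G; unreachable pairs get distance n."""
    adjacency = [set() for _ in range(n)]
    for s in subsets:
        # Variables sharing a subproblem are connected by an edge of weight 1.
        for i, j in combinations(s, 2):
            adjacency[i].add(j)
            adjacency[j].add(i)
    dist = [[float(n)] * n for _ in range(n)]
    for src in range(n):
        dist[src][src] = 0.0
        seen = {src}
        queue = deque([src])
        while queue:  # breadth-first search from src
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in seen:
                    seen.add(v)
                    dist[src][v] = dist[src][u] + 1.0
                    queue.append(v)
    return dist


def adf_cluster_distance(
    dist: List[List[float]], ci: Sequence[int], cj: Sequence[int]
) -> float:
    """Average pairwise distance between two clusters."""
    return sum(dist[a][b] for a in ci for b in cj) / (len(ci) * len(cj))


# Variables 0-1-2 form a chain through two overlapping subsets; 3 and 4 are isolated.
dist = adf_distance_matrix(5, [(0, 1), (1, 2)])
```

The matrix depends only on the problem structure, so it can be computed once before the run instead of once per generation.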

Page 14: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Experiments: Test Problems

Problems

- Concatenated traps of order k.
- Nearest-neighbor NK landscapes with wrap-around neighborhoods.
- 2D Ising spin glass.

Why these test problems?

- All test problems require linkage learning.
- All test problems are nontrivial.
- Yet all test problems are solvable in polynomial time.

Page 15: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Experiments: Setup

Test problem parameters, instances

- Traps of order k ∈ {5, 6, 7, 8} were tested.
- NK landscapes with k = 5 were tested.
- For all problems, n was varied.
- For NK landscapes and spin glasses, for each n, 1,000 instances were generated and tested.

LTGA setup

- Bisection was used to find minimum population size for convergence to the optimum in 10 out of 10 independent runs.
- For traps, bisection is repeated 10 times for each n.
- Max. number of generations is set to a sufficiently large value.
- Bit-flip local search run on initial population.
- Use standard, pairwise, and problem-specific metric.
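
The slides do not spell out the bisection procedure, so the following is only one plausible sketch: double the population size until the success criterion holds, then bisect between the last failing and first succeeding size. The callable `succeeds` is a hypothetical stand-in for "LTGA reaches the optimum in 10 out of 10 independent runs with this population size", and the `start` and `tolerance` parameters are assumptions.

```python
from typing import Callable


def min_population_size(
    succeeds: Callable[[int], bool], start: int = 10, tolerance: float = 0.1
) -> int:
    """Approximate the smallest population size for which `succeeds` holds."""
    lo, hi = 0, start
    while not succeeds(hi):
        # Doubling phase: lo tracks the largest size known to fail.
        lo, hi = hi, hi * 2
    # Bisection phase: stop once the bracket is within the relative tolerance.
    while hi - lo > max(1, int(tolerance * lo)):
        mid = (lo + hi) // 2
        if succeeds(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Stand-in criterion for illustration: pretend 137 is the true minimum.
exact = min_population_size(lambda size: size >= 137, tolerance=0.0)
approx = min_population_size(lambda size: size >= 137)
```

Real runs are stochastic, so success is not strictly monotone in population size; repeating the bisection, as the slide describes for traps, is one way to average out that noise.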

Page 16: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Results: Pairwise Metric on Trap-5

[Plot: number of evaluations (axis ticks 10^2 to 10^6) vs. problem size, n. Curves: LTGA (original), O(n^1.27); LTGA (pairwise), O(n^1.25).]

- Pairwise metric allows us to solve much larger problems.
- Scalability is slightly improved (surprising).
- Results for trap-6 and trap-7 similar.

Page 17: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Results: Pairwise Metric on NK

[Plot: number of evaluations (axis ticks 10^3 to 10^7) vs. problem size, n (axis ticks 20 to 100). Curves: LTGA (original), O(n^5.14); LTGA (pairwise), O(n^3.23).]

- Pairwise metric allows us to solve much larger problems.
- Scalability is significantly improved (surprising).

Page 18: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Results: Pairwise Metric on 2D Spin Glass

[Plot: number of evaluations (axis ticks 10^4 to 10^7) vs. problem size, n (axis ticks 64 to 256). Curves: LTGA (original), O(n^5.38); LTGA (pairwise), O(n^3.50).]

- Pairwise metric allows us to solve much larger problems.
- Scalability is significantly improved (surprising).

Page 19: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Results: Problem-Specific Metric on Trap-5

[Plot: number of evaluations (axis ticks 10^2 to 10^6) vs. problem size, n. Curves: LTGA (pairwise), O(n^1.25); LTGA (problem), O(n^1.26).]

- Problem-specific metric similar to pairwise metric.
- CPU time slightly decreased with problem-specific metric, though.

Page 20: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Results: Problem-Specific Metric on NK

[Plot: number of evaluations (axis ticks 10^3 to 10^7) vs. problem size, n (axis ticks 20 to 100). Curves: LTGA (pairwise), O(n^3.23); LTGA (problem), O(n^2.87).]

- Problem-specific metric slightly better than pairwise one.
- So problem-specific metric pays off.

Page 21: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Results: Problem-Specific Metric on 2D Spin Glass

[Plot: number of evaluations (axis ticks 10^4 to 10^8) vs. problem size, n (axis ticks 64 to 256). Curves: LTGA (problem), O(n^4.05); LTGA (pairwise), O(n^3.50).]

- Problem-specific metric scales worse than pairwise one!
- Problem-specific metric is not that great for 2D spin glass.

Page 22: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Conclusions and Future Work

Conclusions

- LTGA provides opportunities for efficiency enhancements.
- LTGA also provides promising tool for using problem-specific knowledge and learning from experience when solving many instances of similar problems.
- Pairwise metric provides important improvement.
- Problem-specific metric demonstrates the ability of LTGA to exploit problem-specific knowledge on additively decomposable functions.
- But the results based on problem-specific information are mixed.

Future work

- Design more robust and effective problem-specific metrics.
- Design methods to learn distance metrics for specific problem classes.
- Improve performance of LTGA on problems of complex structure.
- Adapt efficiency enhancement techniques from other evolutionary algorithms to LTGA, including model-directed local search, fitness modeling, parallelization, and others.

Page 23: Pairwise and Problem-Specific Distance Metrics in the Linkage Tree Genetic Algorithm

Acknowledgments

Acknowledgments

- NSF; NSF CAREER grant ECS-0547013.
- University of Missouri; High Performance Computing Collaboratory sponsored by Information Technology Services; Research Award; Research Board.
