
Benchmarking Gaussian Processes and Random Forests on the BBOB Noiseless Testbed

Lukáš Bajer (1,2), Zbynek Pitra (3,4), Martin Holena (2)

(1) Faculty of Mathematics and Physics, Charles University
(2) Institute of Computer Science, Czech Academy of Sciences
(3) National Institute of Mental Health
(4) Faculty of Nuclear Sciences and Physical Engineering

Prague, Czech Republic

July 2015




Contents

1 Surrogate CMA-ES

2 Surrogate models
    Gaussian Processes
    Random Forests

3 Experimental results




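The CMA-ES

Input: $m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $\lambda \in \mathbb{N}$
Initialize: $C = I$ (and several other parameters)
Set the weights $w_1, \ldots, w_\lambda$ appropriately

while not terminate

    1 $x_i = m + \sigma y_i$, $y_i \sim \mathcal{N}(0, C)$, for $i = 1, \ldots, \lambda$ {sampling}

    2 evaluate $x_i$ with the original fitness

    3 $m \leftarrow \sum_{i=1}^{\mu} w_i x_{i:\lambda} = m + \sigma y_w$, $y_w = \sum_{i=1}^{\mu} w_i y_{i:\lambda}$ {update mean}

    4 update step-size $\sigma$

    5 update $C$

[Figure: distribution parameters $m_2$, $\sigma_2$, $C_2$.]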
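Steps 1 to 3 of the loop can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the choice $\mu = \lfloor \lambda / 2 \rfloor$, the logarithmic weights, and the function names are assumptions, and the step-size and covariance updates (steps 4 and 5) are left out entirely. The covariance matrix is passed as a factor `A` with $C = A A^\top$.

```python
import math
import random


def recombination_weights(mu):
    """Positive, decreasing weights that sum to one (an assumed, common choice)."""
    raw = [math.log(mu + 0.5) - math.log(i) for i in range(1, mu + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def sample_and_recombine(f, m, sigma, A, lam, rng):
    """Steps 1-3 of the loop: sample, evaluate, update the mean.

    A is a factor of the covariance matrix, C = A A^T (A = I initially).
    """
    n = len(m)
    mu = lam // 2  # assumption: the better half of the population is recombined
    w = recombination_weights(mu)
    ys, xs = [], []
    for _ in range(lam):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [sum(A[r][c] * z[c] for c in range(n)) for r in range(n)]  # y ~ N(0, C)
        ys.append(y)
        xs.append([m[r] + sigma * y[r] for r in range(n)])
    # Rank by the original fitness; order[k] plays the role of the index (k+1):lambda.
    order = sorted(range(lam), key=lambda i: f(xs[i]))[:mu]
    y_w = [sum(w[k] * ys[i][r] for k, i in enumerate(order)) for r in range(n)]
    m_new = [m[r] + sigma * y_w[r] for r in range(n)]
    return m_new, [xs[i] for i in order], w
```

Because the weights sum to one, the weighted sum of the $\mu$ best points and the expression $m + \sigma y_w$ give the same new mean, which is easy to check numerically on a sphere function.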



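The Surrogate CMA-ES

Input: $m \in \mathbb{R}^n$, $\sigma \in \mathbb{R}_+$, $\lambda \in \mathbb{N}$
Initialize: $C = I$ (and several other parameters)
Set the weights $w_1, \ldots, w_\lambda$ appropriately

while not terminate

    1 $x_i = m + \sigma y_i$, $y_i \sim \mathcal{N}(0, C)$, for $i = 1, \ldots, \lambda$ {sampling}

    2 evaluate $x_i$ with the original fitness $f$ and build a model $f_M$, or evaluate $x_i$ with the model $f_M$

    3 $m \leftarrow \sum_{i=1}^{\mu} w_i x_{i:\lambda} = m + \sigma y_w$, $y_w = \sum_{i=1}^{\mu} w_i y_{i:\lambda}$ {update mean}

    4 update step-size $\sigma$

    5 update $C$

[Figure: evaluation by model and ranking; points labelled 1st, 2nd, 3rd; distribution parameters $m_2$, $\sigma_2$.]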



The Surrogate CMA-ES

Input: $g$ (generation), $f_M$ (model), $A$ (archive), $n_{REQ}$, $\sigma$, $\lambda$, $m$, $C$

$x_k \sim \mathcal{N}(m, \sigma^2 C)$, $k = 1, \ldots, \lambda$ {CMA-ES sampling}
if $g$ is original-evaluated then
    $y_k \leftarrow f(x_k)$, $k = 1, \ldots, \lambda$ {fitness evaluation}
    $A = A \cup \{(x_k, y_k)\}_{k=1}^{\lambda}$
    if $|X| \geq n_{REQ}$ then
        $X \leftarrow$ TransformToTheEigenvectorBasis($X$, $\sigma$, $C$)
        $f_M \leftarrow$ trainModel($X$, $y$)
    end if
else
    $X \leftarrow$ TransformToTheEigenvectorBasis($X$, $\sigma$, $C$)
    $y_k \leftarrow f_M(x_k)$, $k = 1, \ldots, \lambda$ {model evaluation}
end if
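The branching between original-evaluated and model-evaluated generations might look like the sketch below. It is only an illustration under several assumptions: the function names are hypothetical, the whole archive is used as the training set, the eigenvector-basis transform is passed in as a callable (identity by default), and falling back to the original fitness when no model exists yet is a choice made here, not something the slides state. The nearest-neighbour "model" is a stand-in so the example runs; it is neither a GP nor a random forest.

```python
def evaluate_generation(xs, original, f, model, archive, n_req,
                        train_model, transform=lambda x: x):
    """Return (fitness values, possibly retrained model) for one generation."""
    if original or model is None:
        # Original-evaluated generation: use the true fitness and grow the archive.
        ys = [f(x) for x in xs]
        archive.extend(zip(xs, ys))
        if len(archive) >= n_req:
            X = [transform(x) for x, _ in archive]
            y = [v for _, v in archive]
            model = train_model(X, y)
    else:
        # Model-evaluated generation: no calls to the original fitness.
        ys = [model(transform(x)) for x in xs]
    return ys, model


def train_nearest_neighbour(X, y):
    """Stand-in surrogate: predict the value of the closest archived point."""
    def predict(x):
        dists = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in X]
        return y[dists.index(min(dists))]
    return predict
```

The values returned in a model-evaluated generation are only used to rank the population, so the CMA-ES updates proceed unchanged while the number of original fitness evaluations stays the same.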





Gaussian Process

A GP is a stochastic approximation method based on Gaussian distributions.

A GP can express the uncertainty of its prediction at a new point $x$: it gives a probability distribution of the output value.





Gaussian Process

Given a set of $N$ training points $X_N = (x_1, \ldots, x_N)^\top$, $x_i \in \mathbb{R}^d$, and measured values $y_N = (y_1, \ldots, y_N)^\top$ of a function $f$ being approximated,

$$y_i = f(x_i), \quad i = 1, \ldots, N,$$

a GP considers the vector of these function values as a sample from an $N$-variate Gaussian distribution

$$y_N \sim \mathcal{N}(0, C_N).$$





Gaussian Process prediction

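Making predictions

Let $C_{N+1}$ be the extended covariance matrix, extended by the entries belonging to an unseen point $(x, y^*)$. Because $y_N$ is known and the inverse $C_{N+1}^{-1}$ can be expressed using the inverse of the training covariance $C_N^{-1}$, the density at a new point marginalizes to a 1D Gaussian density

$$p(y^* \mid X_{N+1}, y_N) \propto \exp\left(-\frac{1}{2} \frac{(y^* - \hat{y}_{N+1})^2}{s^2_{\hat{y}_{N+1}}}\right)$$

with the mean and variance given by

$$\hat{y}_{N+1} = k^\top C_N^{-1} y_N,$$

$$s^2_{\hat{y}_{N+1}} = \kappa - k^\top C_N^{-1} k,$$

where $k$ is the vector of covariances between the new point and the $N$ training points, and $\kappa$ is the variance at the new point.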
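The two prediction formulas can be tried out with a small dependency-free sketch. The slides do not fix a covariance function, so the squared-exponential kernel, its fixed hyperparameters, and the small `noise` term added to the diagonal are all assumptions, as are the function names. Instead of forming $C_N^{-1}$ explicitly, the code solves the linear systems through a Cholesky factor.

```python
import math


def sq_exp_kernel(a, b, length=1.0, variance=1.0):
    """Assumed covariance function (squared exponential)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return variance * math.exp(-0.5 * d2 / length ** 2)


def cholesky(M):
    """Lower-triangular L with L L^T = M, for a symmetric positive definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_spd(L, b):
    """Solve (L L^T) v = b by forward then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (z[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def gp_predict(X, y, x_new, noise=1e-8):
    """Predictive mean k^T C_N^{-1} y_N and variance kappa - k^T C_N^{-1} k."""
    n = len(X)
    C = [[sq_exp_kernel(X[i], X[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k = [sq_exp_kernel(xi, x_new) for xi in X]
    kappa = sq_exp_kernel(x_new, x_new) + noise
    L = cholesky(C)
    mean = sum(ki * ai for ki, ai in zip(k, solve_spd(L, y)))
    var = kappa - sum(ki * bi for ki, bi in zip(k, solve_spd(L, k)))
    return mean, var
```

At a training point the predicted mean reproduces the measured value and the variance is close to zero; far from all training points the mean returns to the zero prior mean and the variance approaches $\kappa$.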





Decision tree

A decision tree is a tree where each split node stores a test function to be applied to the incoming data and each leaf stores a predictor.
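A minimal data-structure sketch of this definition is given below. The class and function names are hypothetical, and the tree is built by hand rather than learned from data; the constant leaf predictors and threshold tests are just one simple choice.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Union


@dataclass
class Leaf:
    predict: Callable[[Sequence[float]], float]  # predictor stored in the leaf


@dataclass
class Split:
    test: Callable[[Sequence[float]], bool]      # test applied to incoming data
    if_true: "Node"
    if_false: "Node"


Node = Union[Leaf, Split]


def tree_predict(node: Node, x: Sequence[float]) -> float:
    """Route x through the split tests until a leaf is reached."""
    while isinstance(node, Split):
        node = node.if_true if node.test(x) else node.if_false
    return node.predict(x)


# A hand-built example: threshold tests in the splits, constants in the leaves.
tree = Split(
    test=lambda x: x[0] < 1.0,
    if_true=Leaf(lambda x: 0.5),
    if_false=Split(
        test=lambda x: x[1] < -2.0,
        if_true=Leaf(lambda x: 3.0),
        if_false=Leaf(lambda x: 7.0),
    ),
)
```

For instance, `tree_predict(tree, [0.0, 0.0])` follows the first branch and returns the constant stored in that leaf, while inputs with `x[0] >= 1.0` are routed to the second test.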





Decision tree
Advantages and disadvantages

Advantages:
    Relatively fast
    Easy to interpret
    Adaptive: structure and parameters learned from training data

Disadvantages:
    Sharp decision boundaries
    Not the best predictive accuracy





Random forests

A collection of randomly trained decision trees
Overall prediction determined by averaging
All advantages of decision trees
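The idea of randomly trained trees whose predictions are averaged can be illustrated with a deliberately small sketch. Everything specific here is an assumption: the trees are depth-1 regression stumps, the randomness comes from bootstrap resampling plus a randomly chosen feature per tree, and the function names are hypothetical. The forests benchmarked in the talk may differ in all of these respects.

```python
import random


def fit_stump(X, y, feature):
    """Depth-1 regression tree on one feature: best threshold by squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        t = 0.5 * (lo + hi)
        left = [yi for row, yi in zip(X, y) if row[feature] < t]
        right = [yi for row, yi in zip(X, y) if row[feature] >= t]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    if best is None:  # all feature values equal: fall back to the mean
        mean = sum(y) / len(y)
        return lambda x: mean
    _, t, ml, mr = best
    return lambda x: ml if x[feature] < t else mr


def fit_forest(X, y, n_trees, rng):
    """Train each tree on a bootstrap sample and a randomly chosen feature."""
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        feature = rng.randrange(len(X[0]))
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return trees


def forest_predict(trees, x):
    """Overall prediction: the average of the individual tree predictions."""
    return sum(tree(x) for tree in trees) / len(trees)
```

Each stump on its own has a single sharp boundary; averaging many stumps trained on different resamples spreads those boundaries out, which is one reason a forest tends to predict more smoothly than a single tree.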




Experimental results on BBOB (5 D)

[Figure: three ECDF panels for f1-24 in 5-D, 10-D and 20-D. x-axis: log10 of (# f-evals / dimension); y-axis: proportion of function+target pairs.]

Legend order as extracted:
    f1-24, 5-D: RF5-CMAES, RF1-CMAES, GP5-CMAES, GP1-CMAES, CMA-ES, best 2009
    f1-24, 10-D: RF5-CMAES, GP5-CMAES, RF1-CMAES, CMA-ES, GP1-CMAES, best 2009
    f1-24, 20-D: RF5-CMAES, GP5-CMAES, RF1-CMAES, CMA-ES, GP1-CMAES, best 2009




ECDF results on the whole BBOB (5 D)

[Figure: five ECDF panels in 5-D, one per function group. y-axis: proportion of function+target pairs.]

Legend order as extracted:
    separable (f1-5): RF5-CMAES, RF1-CMAES, CMA-ES, GP5-CMAES, GP1-CMAES, best 2009
    moderate (f6-9): RF5-CMAES, RF1-CMAES, GP5-CMAES, GP1-CMAES, CMA-ES, best 2009
    ill-conditioned (f10-14): RF5-CMAES, RF1-CMAES, GP5-CMAES, CMA-ES, GP1-CMAES, best 2009
    multi-modal (f15-19): RF5-CMAES, GP5-CMAES, CMA-ES, RF1-CMAES, GP1-CMAES, best 2009
    weakly structured multi-modal (f20-24): RF5-CMAES, RF1-CMAES, CMA-ES, GP5-CMAES, GP1-CMAES, best 2009




ECDF results on the whole BBOB (20 D)

[Figure: five ECDF panels in 20-D, one per function group. y-axis: proportion of function+target pairs.]

Legend order as extracted:
    separable (f1-5): RF5-CMAES, GP5-CMAES, RF1-CMAES, GP1-CMAES, CMA-ES, best 2009
    moderate (f6-9): RF5-CMAES, GP5-CMAES, RF1-CMAES, CMA-ES, GP1-CMAES, best 2009
    ill-conditioned (f10-14): RF5-CMAES, GP5-CMAES, RF1-CMAES, GP1-CMAES, CMA-ES, best 2009
    multi-modal (f15-19): RF5-CMAES, GP5-CMAES, RF1-CMAES, GP1-CMAES, CMA-ES, best 2009
    weakly structured multi-modal (f20-24): RF5-CMAES, RF1-CMAES, CMA-ES, GP5-CMAES, GP1-CMAES, best 2009




Results on separable BBOB functions (1–5)

[Figure: five panels, one per function: 1 Sphere, 2 Ellipsoid separable, 3 Rastrigin separable, 4 Skew Rastrigin-Bueche separ, 5 Linear slope. x-axis ticks: 2, 3, 5, 10, 20, 40 (the BBOB dimensions). Panel annotation: target RL/dim: 10. Legend: CMA-ES, GP1-CMAES, GP5-CMAES, RF1-CMAES, RF5-CMAES.]




Results on ill-conditioned BBOB functions (10–14)

[Figure: five panels, one per function: 10 Ellipsoid, 11 Discus, 12 Bent cigar, 13 Sharp ridge, 14 Sum of different powers. x-axis ticks: 2, 3, 5, 10, 20, 40 (the BBOB dimensions). Panel annotation: target RL/dim: 10.]




Results on weakly structured multi-modal functions (20–24)

[Figure: five panels, one per function: 20 Schwefel x*sin(x), 21 Gallagher 101 peaks, 22 Gallagher 21 peaks, 23 Katsuuras, 24 Lunacek bi-Rastrigin. x-axis ticks: 2, 3, 5, 10, 20, 40 (the BBOB dimensions). Panel annotation: target RL/dim: 10. Legend: CMA-ES, GP1-CMAES, GP5-CMAES, RF1-CMAES, RF5-CMAES.]




Conclusions

S-CMA-ES sped up the CMA-ES on several BBOB functions
Gaussian processes usually exhibit better performance than random forests
Random forests' performance is rather balanced in 20-D, where Gaussian processes lose because of the high dimensionality
Further investigation:
    adaptivity of the number of model generations
    reduction of the model training phase by starting from old parameters
    random forest model precision




Thank you!

bajer at cs dot cas dot cz pitra dot z at gmail dot com

