Fast Evolutionary Optimization: Comparison-Based Optimizers Need Comparison-Based Surrogates. PPSN...
Comparison-Based Optimizers Need Comparison-Based Surrogates
Ilya Loshchilov (1,2), Marc Schoenauer (1,2), Michèle Sebag (2,1)
1: TAO Project-team, INRIA Saclay - Île-de-France
2: Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France
Content
1. Motivation: Why Comparison-Based Surrogates? Previous Work
2. Background: Covariance Matrix Adaptation Evolution Strategy (CMA-ES); Support Vector Machine (SVM)
3. ACM-ES: Algorithm; Experiments
Why Comparison-Based Surrogates?
Surrogate Model (Meta-Model) assisted optimization:
Construct an approximation model M(x) of f(x).
Optimize the model M(x) in lieu of f(x) to reduce the number of costly evaluations of the function f(x).
Example
f(x) = x_1^2 + (x_1 + x_2)^2.
An efficient Evolutionary Algorithm (EA) with surrogate models may be 4.3 times faster on f(x).
But the same EA is only 2.4 times faster on f'(x) = f(x)^{1/4}! (a)
(a) CMA-ES with quadratic meta-model (lmm-CMA-ES) on f_Schwefel in 2-D
Ordinal Regression in Evolutionary Computation
Goal: Find the function F(x) which preserves the ordering of the training points x_i (x_i has rank i):
x_i ≻ x_j ⇔ F(x_i) > F(x_j)
F(x) is invariant to any rank-preserving transformation.
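A quick sanity check of this invariance (my illustration, not from the slides): with NumPy, the rankings induced by f and by the rank-preserving transformation f^{1/4} coincide exactly, so a comparison-based surrogate receives identical training information for both.

```python
import numpy as np

def f(x):
    # Test function from the motivation slide: f(x) = x1^2 + (x1 + x2)^2
    return x[:, 0] ** 2 + (x[:, 0] + x[:, 1]) ** 2

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 2))                   # 100 random points in R^2

ranks_f = np.argsort(np.argsort(f(X)))              # ranks under f
ranks_g = np.argsort(np.argsort(f(X) ** 0.25))      # ranks under f^{1/4}

# A rank-preserving transformation changes fitness values but not ranks,
# so a comparison-based surrogate sees exactly the same training signal.
assert np.array_equal(ranks_f, ranks_g)
print("rankings identical under f and f^{1/4}")
```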
CMA-ES with Rank Support Vector Machine on Rosenbrock: (1)
[Plots: mean fitness vs. number of function evaluations for the (μ_I, λ)-CMA-ES and its rank-SVM variants (Poly, d = 2; Poly, d = 4; RBF, γ = 1) in dimensions n = 2, 5, 10.]
(1) T. Runarsson (2006). "Ordinal Regression in Evolutionary Computation"
Exploit the local topography of the function
CMA-ES adapts the covariance matrix C which describes the local structure of the function.
Mahalanobis (fully weighted Euclidean) distance:
d(x_i, x_j) = sqrt( (x_i − x_j)^T C^{−1} (x_i − x_j) )
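For concreteness, a minimal NumPy sketch (mine, not the authors') of this distance; it solves the linear system with C rather than forming C^{−1} explicitly:

```python
import numpy as np

def mahalanobis(xi, xj, C):
    """Fully weighted Euclidean distance induced by covariance matrix C."""
    d = xi - xj
    # np.linalg.solve(C, d) computes C^{-1} d without inverting C
    return np.sqrt(d @ np.linalg.solve(C, d))

C = np.array([[2.0, 0.5],
              [0.5, 1.0]])
print(mahalanobis(np.array([1.0, 0.0]), np.array([0.0, 1.0]), C))
```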
Results of CMA-ES with quadratic meta-models: (2)
[Log-log plot: O(FLOP) per saved function evaluation vs. n for lmm-CMA-ES.]
Speed-up: a factor of 2-4 for n ≥ 4.
Complexity: grows from O(n^4) to O(n^6); becomes intractable for n > 16.
Rank-preserving invariance: NO.
(2) S. Kern et al. (2006). "Local Meta-Models for Optimization Using Evolution Strategies"
Tractable or Efficient?
Answer: Tractable and Efficient and Invariant.
Ingredients: CMA-ES (Adaptive Encoding) and Rank SVM.
Covariance Matrix Adaptation Evolution Strategy: Decompose to Understand
While CMA-ES is, by definition, CMA plus ES, the algorithmic decomposition has only recently been presented. (3)
Algorithm 1 CMA-ES = Adaptive Encoding + ES
1: x_i ← m + σ N_i(0, I), for i = 1 ... λ
2: f_i ← f(B x_i), for i = 1 ... λ
3: if Evolution Strategy (ES) then
4:   σ ← σ exp^∝( success rate / expected success rate − 1 )
5: if Cumulative Step-Size Adaptation ES (CSA-ES) then
6:   σ ← σ exp^∝( ‖evolution path‖ / ‖expected evolution path‖ − 1 )
7: B ← AECMA-Update(B x_1, ..., B x_μ)
(3) N. Hansen (2008). "Adaptive Encoding: How to Render Search Coordinate System Invariant"
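To illustrate lines 3-4 of Algorithm 1, here is a hedged sketch of a (1+1)-ES with the classic 1/5th success rule, a much-simplified instance of success-rate step-size control; it is not the full CMA-ES, and the encoding B is fixed to the identity:

```python
import numpy as np

def one_plus_one_es(f, x0, sigma=1.0, iters=2000, target=0.2, damping=3.0):
    """Minimal (1+1)-ES: sigma is raised when successes come more often
    than the target rate (~1/5) and lowered otherwise (cf. line 4)."""
    rng = np.random.default_rng(42)
    x, fx = np.asarray(x0, float), f(np.asarray(x0, float))
    for _ in range(iters):
        y = x + sigma * rng.standard_normal(x.shape)   # line 1 with m = x, B = I
        fy = f(y)
        success = fy < fx
        if success:
            x, fx = y, fy
        # one common damped form of sigma <- sigma * exp(rate - expected rate)
        sigma *= np.exp(((1.0 if success else 0.0) - target) / damping)
    return x, fx

sphere = lambda x: float(np.sum(x ** 2))
print(one_plus_one_es(sphere, np.ones(5)))
```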
Adaptive Encoding: Inspired by Principal Component Analysis (PCA)
[Figures: Principal Component Analysis (left); Adaptive Encoding Update (right).]
Support Vector Machine for Classification: Linear Classifier
[Figure: candidate separating lines L1, L2, L3; the optimal hyperplane ⟨w, x⟩ − b = 0 with margins at b ± 1 and width 2/‖w‖; the closest points x_i are the support vectors.]
Main Idea
Training data: D = {(x_i, y_i) | x_i ∈ R^p, y_i ∈ {−1, +1}}, i = 1 ... n
⟨w, x_i⟩ ≥ b + ε ⇒ y_i = +1;  ⟨w, x_i⟩ ≤ b − ε ⇒ y_i = −1.
Dividing by ε > 0:
⟨w, x_i⟩ − b ≥ +1 ⇒ y_i = +1;  ⟨w, x_i⟩ − b ≤ −1 ⇒ y_i = −1.
Optimization Problem: Primal Form
Minimize over {w, ξ}: (1/2)‖w‖² + C Σ_{i=1..n} ξ_i
subject to: y_i(⟨w, x_i⟩ − b) ≥ 1 − ξ_i, ξ_i ≥ 0
Support Vector Machine for Classification: Linear Classifier
Optimization Problem: Dual Form
By the Lagrange theorem, instead of minimizing F:
Minimize over {α, G}: F − Σ_i α_i G_i, subject to: α_i ≥ 0, G_i ≥ 0.
Leaving out the details, the dual form is:
Maximize over {α}: Σ_{i=1..n} α_i − (1/2) Σ_{i,j=1..n} α_i α_j y_i y_j ⟨x_i, x_j⟩
subject to: 0 ≤ α_i ≤ C, Σ_{i=1..n} α_i y_i = 0
Properties
Decision function: F(x) = sign( Σ_{i=1..n} α_i y_i ⟨x_i, x⟩ − b )
The dual form may be solved using a standard quadratic programming solver.
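To make the last point concrete, here is a minimal sketch that feeds this dual to a generic QP solver. It assumes the third-party cvxopt package and is an illustration, not the authors' implementation:

```python
import numpy as np
from cvxopt import matrix, solvers  # generic QP solver, assumed installed

def svm_dual(X, y, C=1.0):
    """Solve the soft-margin SVM dual above as the minimization cvxopt
    expects: min (1/2) a^T P a - 1^T a with P_ij = y_i y_j <x_i, x_j>,
    subject to 0 <= a_i <= C and sum_i a_i y_i = 0."""
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    P = matrix(np.outer(y, y) * (X @ X.T))
    q = matrix(-np.ones(n))
    G = matrix(np.vstack([-np.eye(n), np.eye(n)]))    # encodes -a <= 0 and a <= C
    h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    A = matrix(y.reshape(1, -1))
    b = matrix(0.0)
    solvers.options["show_progress"] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)["x"])
    w = (alpha * y) @ X                               # recover primal weights
    on_margin = (alpha > 1e-6) & (alpha < C - 1e-6)   # unbounded support vectors
    b_val = np.mean(X[on_margin] @ w - y[on_margin])  # <w, x_i> - b = y_i there
    return w, b_val, alpha

# toy usage: two separable clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((20, 2)) + 2, rng.standard_normal((20, 2)) - 2])
y = np.hstack([np.ones(20), -np.ones(20)])
w, b_val, _ = svm_dual(X, y, C=10.0)
print(w, b_val)
```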
Support Vector Machine for Classification: Non-Linear Classifier
[Figure (a-c): the mapping Φ sends the data into a feature space where ⟨w, Φ(x)⟩ − b = ±1 defines a linear separator with margin 2/‖w‖; points on the margin are the support vectors x_i.]
Non-linear classification with the "kernel trick":
Maximize over {α}: Σ_{i=1..n} α_i − (1/2) Σ_{i,j=1..n} α_i α_j y_i y_j K(x_i, x_j)
subject to: α_i ≥ 0, Σ_{i=1..n} α_i y_i = 0,
where K(x, x') := ⟨Φ(x), Φ(x')⟩ is the kernel function.
Decision function: F(x) = sign( Σ_{i=1..n} α_i y_i K(x_i, x) − b )
Support Vector Machine for Classification: Non-Linear Classifier Kernels
Polynomial: k(x_i, x_j) = (⟨x_i, x_j⟩ + 1)^d
Gaussian or Radial Basis Function (RBF): k(x_i, x_j) = exp( −‖x_i − x_j‖² / (2σ²) )
Hyperbolic tangent: k(x_i, x_j) = tanh( κ⟨x_i, x_j⟩ + c )
Examples for Polynomial (left) and Gaussian (right) kernels: [figure]
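The three kernels in plain NumPy (a minimal sketch; parameter names follow the slide):

```python
import numpy as np

def poly_kernel(xi, xj, d=2):
    return (np.dot(xi, xj) + 1.0) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    diff = xi - xj
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))

def tanh_kernel(xi, xj, kappa=1.0, c=0.0):
    # Note: this "kernel" is not positive semi-definite for all (kappa, c).
    return np.tanh(kappa * np.dot(xi, xj) + c)
```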
Ranking Support Vector Machine
Find F(x) which preserves the ordering of the training points.
[Figure: the weight vector w defines level sets L(r_1), L(r_2) that separate the training points x by rank.]
Ranking Support Vector Machine
Primal problem
Minimize over {w, ξ}: (1/2)‖w‖² + Σ_{i=1..N} C_i ξ_i
subject to:
⟨w, Φ(x_i) − Φ(x_{i+1})⟩ ≥ 1 − ξ_i  (i = 1 ... N−1)
ξ_i ≥ 0  (i = 1 ... N−1)
Dual problem
Maximize over {α}: Σ_{i=1..N−1} α_i − (1/2) Σ_{i,j=1..N−1} α_i α_j K(x_i − x_{i+1}, x_j − x_{j+1})
subject to: 0 ≤ α_i ≤ C_i  (i = 1 ... N−1)
Rank surrogate function, in the case where each rank corresponds to a single point:
F(x) = Σ_{i=1..N−1} α_i ( K(x_i, x) − K(x_{i+1}, x) )
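A hedged sketch of evaluating this surrogate, given the learned multipliers α and the training points sorted by rank (any kernel function can be plugged in):

```python
import numpy as np

def rank_surrogate(x, X_train, alpha, kernel):
    """F(x) = sum_{i=1}^{N-1} alpha_i * (K(x_i, x) - K(x_{i+1}, x)),
    with X_train sorted from rank 1 (best) to rank N (worst) and alpha
    of length N-1, one multiplier per consecutive-rank constraint."""
    k = np.array([kernel(xi, x) for xi in X_train])   # K(x_i, x), i = 1..N
    return float(np.sum(alpha * (k[:-1] - k[1:])))    # consecutive pairs (i, i+1)
```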
Model Learning: Non-separable Ellipsoid Problem
K(x_i, x_j) = exp( −(x_i − x_j)^T (x_i − x_j) / (2σ²) );   K_C(x_i, x_j) = exp( −(x_i − x_j)^T C^{−1} (x_i − x_j) / (2σ²) )
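A minimal NumPy sketch (my illustration) of K_C: it is simply the RBF kernel evaluated in the Mahalanobis metric induced by the CMA-ES covariance matrix C:

```python
import numpy as np

def kernel_C(xi, xj, C, sigma=1.0):
    """RBF kernel in the metric induced by the covariance matrix C."""
    d = xi - xj
    return np.exp(-(d @ np.linalg.solve(C, d)) / (2.0 * sigma ** 2))

# With C = I this reduces to the plain kernel K(x_i, x_j) on the left.
```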
Optimization or filtering? Don't be too greedy
Optimization: significant potential speed-up if the surrogate model is global and accurate enough.
Filtering: "guaranteed" speed-up with a local surrogate model.
[Plot: probability density of ranks. Prescreen λ pre-children with the surrogate, evaluate λ′ of them with the true fitness, and retain λ̃ according to rank.]
ACM-ES Optimization Loop
1. Select best training points.
2. Apply the change of coordinates, defined from the current covariance matrix and the current mean value [4].
3. Build a surrogate model using Rank SVM.
4. Generate pre-children and rank them according to the surrogate fitness function.
5. Prescreen the λ pre-children and evaluate the λ′ most promising ones with the true fitness.
6. Retain λ̃ children according to rank.
7. Add new training points and update parameters of CMA-ES.
[Figure: A - select training points; B - build a surrogate model; C - generate pre-children; D - select most promising children.]
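A self-contained toy sketch of this control flow in Python; the 1-nearest-neighbour "surrogate" and the crude mean/step-size updates are stand-ins for Rank SVM and the full CMA-ES update, kept only so the loop runs end to end:

```python
import numpy as np

def acm_es_like_loop(f, dim=5, lam=10, lam_eval=3, iters=60, seed=1):
    """Toy loop mirroring steps 1-7 above (illustrative, not the authors' code)."""
    rng = np.random.default_rng(seed)
    mean, sigma = np.zeros(dim), 1.0
    X_arch, F_arch = [], []                      # evaluated (training) points

    for _ in range(iters):
        pre = mean + sigma * rng.standard_normal((lam, dim))  # step 4: pre-children
        if X_arch:
            A, F = np.array(X_arch), np.array(F_arch)
            # steps 1-4: rank pre-children by a stand-in surrogate
            surr = [F[np.argmin(np.linalg.norm(A - p, axis=1))] for p in pre]
            pre = pre[np.argsort(surr)]
        fs = [f(p) for p in pre[:lam_eval]]      # step 5: evaluate the lam' best
        X_arch += list(pre[:lam_eval])           # step 7: grow the training set
        F_arch += fs
        best = pre[int(np.argmin(fs))]           # step 6: retain by true rank
        mean = 0.5 * (mean + best)               # crude distribution update
        sigma *= 0.95
    return mean, f(mean)

print(acm_es_like_loop(lambda x: float(np.sum(x ** 2))))
```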
Results: Speed-up
[Plot: speed-up (0 to 5) vs. problem dimension for Schwefel, Schwefel^{1/4}, Rosenbrock, Noisy Sphere, Ackley, and Ellipsoid.]
Function         n   λ    λ′   e      ACM-ES                spu   CMA-ES
Schwefel         10  10   3           801 ± 36              3.3   2667 ± 87
Schwefel         20  12   4           3531 ± 179            2.0   7042 ± 172
Schwefel         40  15   5           13440 ± 281           1.7   22400 ± 289
Schwefel^{1/4}   10  10   3           1774 ± 37             4.1   7220 ± 206
Schwefel^{1/4}   20  12   4           6138 ± 82             2.5   15600 ± 294
Schwefel^{1/4}   40  15   5           22658 ± 390           1.8   41534 ± 466
Rosenbrock       10  10   3           2059 ± 143 (0.95)     3.7   7669 ± 691 (0.90)
Rosenbrock       20  12   4           11793 ± 574 (0.75)    1.8   21794 ± 1529
Rosenbrock       40  15   5           49750 ± 2412 (0.9)    1.6   82043 ± 3991
NoisySphere      10  10   3    0.15   766 ± 90 (0.95)       2.7   2058 ± 148
NoisySphere      20  12   4    0.11   1361 ± 212            2.8   3777 ± 127
NoisySphere      40  15   5    0.08   2409 ± 120            2.9   7023 ± 173
Ackley           10  10   3           892 ± 28              4.1   3641 ± 154
Ackley           20  12   4           1884 ± 50             3.5   6641 ± 108
Ackley           40  15   5           3690 ± 80             3.3   12084 ± 247
Ellipsoid        10  10   3           1628 ± 95             3.8   6211 ± 264
Ellipsoid        20  12   4           8250 ± 393            2.3   19060 ± 501
Ellipsoid        40  15   5           33602 ± 548           2.1   69642 ± 644
Rastrigin        5   140  70          23293 ± 1374 (0.3)    0.5   12310 ± 1098 (0.75)
(ACM-ES and CMA-ES columns: mean ± standard deviation of the number of function evaluations, with the success rate in parentheses when below 1; spu = speed-up; e = noise level of the Noisy Sphere.)
Results: Learning Time
The cost of model learning/testing increases quasi-linearly with the dimension d on the Sphere function.
[Log-log plot: learning time (ms) vs. problem dimension; slope = 1.13.]
Summary
ACM-ES
ACM-ES is from 2 to 4 times faster on uni-modal problems.
Invariant to rank-preserving transformations: Yes.
The computational complexity (the cost of the speed-up) is O(n), compared to O(n^6).
The source code is available online: http://www.lri.fr/~ilya/publications/ACMESppsn2010.zip
Open Questions
Extension to multi-modal optimization
Adaptation of selection pressure and surrogate model complexity
Thank you for your attention!
Questions?
Parameters
SVM Learning:
Number of training points: N_training = 30√d for all problems, except Rosenbrock and Rastrigin, where N_training = 70√d
Number of iterations: N_iter = 50000√d
Kernel function: RBF, with σ equal to the average distance between the training points
Cost of constraint violation: C_i = 10^6 (N_training − i)^{2.0}
Offspring Selection Procedure:
Number of test points: N_test = 500
Number of evaluated offspring: λ′ = λ/3
Offspring selection pressure parameters: σ²_sel0 = 2σ²_sel1 = 0.8
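For concreteness, the same settings as a small Python helper (a sketch; the function and key names are mine):

```python
import math

def acm_es_params(d, lam, hard=False):
    """The hyper-parameter settings above as a function of the problem
    dimension d and the population size lam. hard=True selects the
    Rosenbrock/Rastrigin variant of the training-set size."""
    n_train = int((70 if hard else 30) * math.sqrt(d))
    return {
        "n_training": n_train,
        "n_iter": int(50000 * math.sqrt(d)),   # Rank SVM optimization iterations
        "n_test": 500,                         # offspring pre-screening test points
        "lam_eval": lam // 3,                  # lambda' = lambda / 3
        # cost of violating the i-th ranking constraint, i = 1 .. n_train - 1
        "C": [1e6 * (n_train - i) ** 2.0 for i in range(1, n_train)],
    }

print(acm_es_params(d=10, lam=10)["lam_eval"])   # -> 3, matching the results table
```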