From Gradient-Based to Evolutionary Optimization
Transcript of a talk, 29 July 2019
Nikolaus Hansen, Inria
CMAP, CNRS, Ecole Polytechnique, Institut Polytechnique de Paris
Outline
• Why is optimization difficult?
• From gradient descent to evolution strategy
• From first order (gradient descent) to second order (variable metric): CMA-ES
• Practical advice and code examples
…feel free to ask questions…
Objective
• in theory: convergence to the global optimum
• in practice: find a good solution iteratively as quickly as possible
minimize an objective function f : ℝ^n → ℝ, x ↦ f(x)
Objective: Important Scenarios
minimize an objective function f : ℝ^n → ℝ, x ↦ f(x)

• evaluating f is expensive and/or dominates the costs, hence "quick" means a small number of f-evaluations
• the search space dimension n is large
• we can (inexpensively) evaluate the gradient of f: f(x) ∈ ℝ, while ∇f(x) ∈ ℝ^n
• we can parallelize the evaluations of f
Objective
minimize an objective function f : ℝ^n → ℝ, x ↦ f(x)
What Makes an Optimization Problem Difficult?
• non-linearity, non-quadraticity: on linear and quadratic functions specialized search policies are available
• non-convexity
• dimensionality (size of the search space) and non-separability: dimension considerably larger than three, with dependencies between the variables
• multimodality
• ruggedness: high-frequency oscillations, non-smooth, discontinuous
• ill-conditioning: varying sensitivities; worst case: non-smooth, concave level sets

In any case, the objective function must be highly regular.
Dimensionality: On Separable Functions

• Separable functions: for all x ∈ ℝ^n and all i, arg min_{x_i} f(x) is independent of the remaining components of x
• Additively decomposable functions

f(x) = ∑_{i=1}^{n} g_i(x_i),   x = (x_1, …, x_n)

are separable and can be solved with n one-dimensional optimizations (a sketch follows below)
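To make the "n one-dimensional optimizations" concrete, here is a minimal Python sketch, assuming scipy is available; the function minimize_separable and the example coefficients are illustrations, not code from the talk.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def minimize_separable(g_list, bounds=(-10, 10)):
    """Minimize f(x) = sum_i g_i(x_i) by n independent 1-D optimizations."""
    return np.array([minimize_scalar(g, bounds=bounds, method='bounded').x
                     for g in g_list])

# example: f(x) = sum_i alpha_i * (x_i - 1)^2, optimum at (1, ..., 1)
alphas = [1.0, 10.0, 100.0]
g_list = [lambda xi, a=a: a * (xi - 1.0)**2 for a in alphas]
print(minimize_separable(g_list))  # ~ [1. 1. 1.]
```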
Section Through a 5-Dimensional Rugged Landscape
[Figure: a one-dimensional section through a rugged landscape, f : ℝ^n → ℝ, x ↦ f(x), n = 5; horizontal axis: x_i (a component of x), vertical axis: function value f(x)]
Flexible Muscle-Based Locomotion for Bipedal Creatures, T. Geijtenbeek, M. van de Panne, F. van der Stappen
https://youtu.be/pgaEE27nsQw
Landscape of Continuous Search Methods
Gradient-based (Taylor, local)
• Conjugate gradient methods [Fletcher & Reeves 1964]
• Quasi-Newton methods (BFGS) [Broyden et al 1970]
Derivative-free optimization (DFO)
• Trust-region methods (NEWUOA, BOBYQA) [Powell 2006, 2009]
• Simplex downhill [Nelder & Mead 1965]
• Pattern search [Hooke & Jeeves 1961, Audet & Dennis 2006]
Stochastic (randomized) search methods
• Evolutionary algorithms (broader sense, continuous domain)
– Differential Evolution [Storn & Price 1997]
– Particle Swarm Optimization [Kennedy & Eberhart 1995]
– Evolution Strategies [Rechenberg 1965, Hansen & Ostermeier 2001]
• Simulated annealing [Kirkpatrick et al 1983]
• Simultaneous perturbation stochastic approximation (SPSA) [Spall 2000]
Basic Approach: Gradient Descent

The gradient is the local direction of maximal f-increase.

[Figure: level sets f(x) = const in ℝ^2 with unit vectors e_1, e_2; −∇f(x) stands orthogonal to the tangent space of the level set; a second panel marks the optimal gradient step length along −∇f(x)]
Basic Approach: Gradient Descent
∇f(x) = −∑_{i=1}^{n} w_i e_i,   −w_i = lim_{δ→0} [f(x + δe_i) − f(x)] / δ = ∂f/∂x_i(x)

x ← x − σ∇f(x) = x + σ ∑_{i=1}^{n} w_i e_i

where δ is a small test step along the unit vector e_i, so that −w_i is the partial derivative ∂f/∂x_i(x).

[Figure: −∇f(x) in ℝ^2 decomposed along the coordinate directions e_1, e_2]
Basic Approach: Gradient Descent

Now drop the limit: δ becomes a small but finite test step, and the gradient is only approximated (a code sketch follows below):

∇f(x) ≈ −∑_{i=1}^{n} w_i e_i,   −w_i = [f(x + δe_i) − f(x)] / δ

x ← x − σ∇f(x) ≈ x + σ ∑_{i=1}^{n} w_i e_i

[Figure: −∇f(x) in ℝ^2, approximated from finite test steps along e_1, e_2]
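A minimal sketch of this finite-difference gradient descent (illustrative only; the step sizes δ and σ and the test function are assumptions, not values from the talk):

```python
import numpy as np

def fd_gradient(f, x, delta=1e-6):
    """Approximate the gradient of f at x by forward differences
    along the unit vectors e_i."""
    fx, n = f(x), len(x)
    grad = np.empty(n)
    for i in range(n):
        e = np.zeros(n)
        e[i] = 1.0
        grad[i] = (f(x + delta * e) - fx) / delta
    return grad

def gradient_descent(f, x, sigma=0.1, iterations=100):
    for _ in range(iterations):
        x = x - sigma * fd_gradient(f, x)
    return x

f = lambda x: np.sum(x**2)                         # convex quadratic test function
print(gradient_descent(f, np.array([3.0, -2.0])))  # close to [0, 0]
```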
Basic Approach: Approximated Gradient Descent

We modify the gradient equation: replace the n unit vectors e_i by m arbitrary directions y_i,

∇f(x) ≈ −∑_{i=1}^{m} w_i y_i,   −w_i = [f(x + δy_i) − f(x)] / δ

x ← x − σ∇f(x) ≈ x + σ ∑_{i=1}^{m} w_i y_i

[Figure: −∇f(x) in ℝ^2 approximated from three directions y_1, y_2, y_3]
Basic Approach: Approximated Gradient Descent

Now draw the test directions at random, y_i ∼ 𝒩(0, I), with δ a small test step:

∇f(x) ≈ −∑_{i=1}^{m} w_i y_i,   −w_i = [f(x + δy_i) − f(x)] / δ

x ← x − σ∇f(x) ≈ x + σ ∑_{i=1}^{m} w_i y_i

[Figure: random directions y_1, y_2, y_3 approximating −∇f(x) in ℝ^2]

This is Evolutionary Gradient Search (EGS) [Salomon 1998, Arnold & Salomon 2007]. (A code sketch follows below.)
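A sketch of this randomized gradient estimate; the averaging over the m directions is my convention, and the settings m, δ, σ are illustrative, not from the talk:

```python
import numpy as np

def egs_step(f, x, m=10, delta=1e-3, sigma=0.1, rng=np.random.default_rng(1)):
    """One descent step with the gradient estimated from m random
    Gaussian test directions y_i ~ N(0, I)."""
    fx, n = f(x), len(x)
    step = np.zeros(n)
    for _ in range(m):
        y = rng.standard_normal(n)
        w = -(f(x + delta * y) - fx) / delta   # w_i, since -w_i = [f(x+dy)-f(x)]/d
        step += w * y
    return x + sigma * step / m                # average over the m directions

f = lambda x: np.sum(x**2)
x = np.array([3.0, -2.0])
for _ in range(200):
    x = egs_step(f, x)
print(x)   # drifts toward the optimum [0, 0], up to sampling noise
```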
Rank-Based Approximated Gradient Descent

Replace the measured f-differences by ranks, with y_i ∼ 𝒩(0, I) and δ a small test step:

∇f(x) ≈ −∑_{i=1}^{m} w_i y_i,   −w_i ∝ rank_i(f(x + δy_i)) − (m + 1)/2

x ← x − σ∇f(x) ≈ x + σ ∑_{i=1}^{m} w_i y_i

[Figure: random directions y_1, y_2, y_3 in ℝ^2]

Using ranks introduces invariance to order-preserving f-transformations.

This is an Evolution Strategy (ES) [Rechenberg 1973, Schwefel 1981, Rudolph 1997].
Rank-Based Approximated Gradient Descent

With log-rank weights, y_i ∼ 𝒩(0, I), and δ a small test step:

∇f(x) ≈ −∑_{i=1}^{m} w_i y_i,   −w_i = [ln rank_i(f(x + δy_i)) − ln((m + 1)/2)] / [½ ∑_{j=1}^{m} ln j]

x ← x − σ∇f(x) ≈ x + σ ∑_{i=1}^{m} w_i y_i

[Figure: random directions y_1, y_2, y_3 in ℝ^2]

Using ranks introduces invariance to order-preserving f-transformations. (A code sketch follows below.)

Evolution Strategy (ES) [Rechenberg 1973, Schwefel 1981, Rudolph 1997, Hansen & Ostermeier 2001]
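One such rank-based step as a Python sketch; the weight normalization follows the reconstructed formula above and should be treated as an assumption, and m, δ, σ are illustrative settings:

```python
import numpy as np

def rank_based_step(f, x, m=10, delta=0.5, sigma=0.5,
                    rng=np.random.default_rng(2)):
    """One evolution-strategy-like step: the update direction is built
    from the ranks of the f-values, not from the f-values themselves."""
    Y = rng.standard_normal((m, len(x)))             # y_i ~ N(0, I)
    fvals = np.array([f(x + delta * y) for y in Y])
    ranks = np.argsort(np.argsort(fvals)) + 1        # rank 1 = best (smallest f)
    w = np.log((m + 1) / 2) - np.log(ranks)          # w_i: positive for good points
    w /= 0.5 * np.sum(np.log(np.arange(1, m + 1)))   # normalization (assumed)
    return x + sigma * (w @ Y)

f = lambda x: np.sum(x**2)
x = np.array([3.0, -2.0])
for _ in range(100):
    x = rank_based_step(f, x)
print(x)   # approaches [0, 0]; without step-size control it eventually stalls
```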
Invariance from Rank-Based Weights

Three functions belonging to the same equivalence class:

f = h,   f = g_1 ∘ h,   f = g_2 ∘ h

A rank-based search algorithm is invariant under the transformation f ↦ g ∘ f with any order-preserving (strictly increasing) g.

Invariances make
• observations meaningful, as a rigorous notion of generalization
• algorithms predictable and/or "robust"
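The invariance is easy to check numerically: a strictly increasing transformation of the f-values leaves the ranking, and hence any rank-based update, unchanged (illustrative values):

```python
import numpy as np

fvals = np.array([3.0, 0.1, 2.5, 7.0])
g = lambda v: np.exp(v) + 5          # a strictly increasing transformation
print(np.array_equal(np.argsort(fvals), np.argsort(g(fvals))))  # True
```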
Comparing Experiments

Comparison to BFGS, NEWUOA, PSO and DE: f convex quadratic, separable, with varying condition number α

[Figure: SP1 (objective function evaluations) vs condition number; Ellipsoid, dimension 20, 21 trials, tolerance 1e−9, at most 1e7 evaluations; curves for NEWUOA, BFGS, DE2, PSO, CMA-ES]

BFGS [Broyden et al 1970], NEWUOA [Powell 2004], DE [Storn & Price 1996], PSO [Kennedy & Eberhart 1995], CMA-ES [Hansen & Ostermeier 2001]

f(x) = g(xᵀHx) with H diagonal; g the identity (for BFGS and NEWUOA); g any order-preserving, i.e. strictly increasing, function (for all others)

SP1 = average number of objective function evaluations to reach the target function value g^{-1}(10^{-9})

[Auger et al. 2009: Experimental comparisons of derivative free optimization algorithms, SEA]
Comparing Experiments

Comparison to BFGS, NEWUOA, PSO and DE: f convex quadratic, non-separable (rotated), with varying condition number α

[Figure: SP1 (objective function evaluations) vs condition number; Rotated Ellipsoid, dimension 20, 21 trials, tolerance 1e−9, at most 1e7 evaluations; curves for NEWUOA, BFGS, DE2, PSO, CMA-ES]

BFGS [Broyden et al 1970], NEWUOA [Powell 2004], DE [Storn & Price 1996], PSO [Kennedy & Eberhart 1995], CMA-ES [Hansen & Ostermeier 2001]

f(x) = g(xᵀHx) with H full; g the identity (for BFGS and NEWUOA); g any order-preserving, i.e. strictly increasing, function (for all others)

SP1 = average number of objective function evaluations to reach the target function value g^{-1}(10^{-9})

[Auger et al. 2009: Experimental comparisons of derivative free optimization algorithms, SEA]
Comparing Experiments

Comparison to BFGS, NEWUOA, PSO and DE: f non-convex, non-separable (rotated), with varying condition number α

[Figure: SP1 (objective function evaluations) vs condition number; sqrt of sqrt of rotated ellipsoid, dimension 20, 21 trials, tolerance 1e−9, at most 1e7 evaluations; curves for NEWUOA, BFGS, DE2, PSO, CMA-ES]

BFGS [Broyden et al 1970], NEWUOA [Powell 2004], DE [Storn & Price 1996], PSO [Kennedy & Eberhart 1995], CMA-ES [Hansen & Ostermeier 2001]

f(x) = g(xᵀHx) with H full; g : x ↦ x^{1/4} (for BFGS and NEWUOA); g any order-preserving, i.e. strictly increasing, function (for all others)

SP1 = average number of objective function evaluations to reach the target function value g^{-1}(10^{-9})

[Auger et al. 2009: Experimental comparisons of derivative free optimization algorithms, SEA]
From Gradient Search to Evolution Strategies

|                      | Gradient Search     | Evolution Strategy              |
|----------------------|---------------------|---------------------------------|
| Test steps           | unit vectors        | random vectors                  |
| Number of test steps | dimension n         | any number m > 1                |
| Test-step length     | small               | large                           |
| Weights              | partial derivatives | fixed, rank-based               |
| Realized step length | line search         | step-size control (non-trivial) |
Problem Statement: Ill-Conditioned Problems

Curvature of level sets: consider the convex-quadratic function

f(x) = ½(x − x*)ᵀH(x − x*) = ½ ∑_i h_{i,i}(x_i − x_i*)² + ½ ∑_{i≠j} h_{i,j}(x_i − x_i*)(x_j − x_j*)

where H is the Hessian matrix of f, symmetric and positive definite.

gradient direction: −∇f(x)
Newton direction: −H⁻¹∇f(x)

Ill-conditioning means squeezed level sets (high curvature). The condition number equals nine in the figure; condition numbers up to 10^{10} are not unusual in real-world problems.

If H ≈ I (small condition number of H), first-order information (e.g. the gradient) is sufficient; otherwise second-order information (an estimation of H⁻¹) is necessary. (A numerical illustration follows below.)
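A small numerical illustration of the two directions on an ill-conditioned quadratic (the matrix is chosen to match the condition number nine mentioned above; the point x is an arbitrary example):

```python
import numpy as np

H = np.diag([1.0, 9.0])             # condition number 9, optimum x* = 0
x = np.array([1.0, 1.0])            # current point
grad = H @ x                        # gradient of f(x) = 0.5 * x^T H x

print(-grad)                        # gradient direction: [-1. -9.]
print(-np.linalg.solve(H, grad))    # Newton direction:  [-1. -1.], points at x*
```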
Rank-Based Approximated Gradient Descent

Now sample the test directions as y_i ∼ 𝒩(0, C), where C is a variable metric, updated to estimate H⁻¹:

∇f(x) ≈ −∑_{i=1}^{m} w_i y_i,   −w_i = [ln rank_i(f(x + δy_i)) − ln((m + 1)/2)] / [½ ∑_{j=1}^{m} ln j]

x ← x − σ∇f(x) ≈ x + σ ∑_{i=1}^{m} w_i y_i

[Figure: random directions y_1, y_2, y_3 in ℝ^2]

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) [Hansen & Ostermeier 2001, Hansen et al 2003]
CMA-ES

Let m ∈ ℝ^n, σ > 0, C = I_n, y_0 = 0.

x_k ∼ 𝒩(m, σ²C) = m + σ𝒩(0, C) ∈ ℝ^n,   k = 1…λ   (λ: population size)

y_k = (x_{permute_λ(k)} − m)/σ, sorted by fitness;   y_k ∼ 𝒩(0, C)

m ← m + c_m σ ∑_{k=1}^{μ} w_k y_k,   c_m ≈ ∑_{k=1}^{μ} w_k ≈ 1,   μ ≈ λ/2

y_0 ← (1 − c_c) y_0 + √(c_c(2 − c_c)μ_w) ∑_{k=1}^{μ} w_k y_k,   μ_w = (∑_{k=1}^{μ} w_k)² / ∑_{k=1}^{μ} w_k²

C ← C + c_μ ∑_{k=0}^{λ} w_k (y_k y_kᵀ − C),   c_μ ≈ λ/n²,   ∑_{k=0}^{λ} w_k ≈ 0

σ ← σ × exp(…)

(A simplified code sketch follows below.)
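To make these updates concrete, here is a compact NumPy sketch of a deliberately simplified CMA-ES: rank-μ update only, no evolution path y_0, no step-size adaptation, and illustrative constants, so a sketch of the idea rather than the full algorithm above:

```python
import numpy as np

def simple_cma(f, x0, sigma=0.3, iterations=300, rng=np.random.default_rng(3)):
    n = len(x0)
    lam = 4 + int(3 * np.log(n))                   # population size
    mu = lam // 2                                  # mu ~ lam/2
    w = np.log((lam + 1) / 2) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                   # positive weights, sum 1
    mu_w = 1.0 / np.sum(w**2)
    c_mu = min(1.0, mu_w / n**2)                   # illustrative learning rate
    m, C = np.array(x0, float), np.eye(n)

    for _ in range(iterations):
        A = np.linalg.cholesky(C)
        Y = (A @ rng.standard_normal((n, lam))).T  # y_k ~ N(0, C)
        X = m + sigma * Y                          # candidate solutions x_k
        order = np.argsort([f(x) for x in X])
        Ysel = Y[order[:mu]]                       # the mu best steps, sorted
        m = m + sigma * (w @ Ysel)                 # mean update
        C = (1 - c_mu) * C + c_mu * (Ysel.T * w) @ Ysel   # rank-mu update
    return m

# ill-conditioned ellipsoid; the mean approaches 0 but, lacking
# step-size control, stalls near the fixed sampling scale sigma
f = lambda x: sum(10**(6 * i / (len(x) - 1)) * v**2 for i, v in enumerate(x))
print(simple_cma(f, 5 * [1.0]))
```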
David Ha (2017). A Visual Guide to Evolution Strategies, http://blog.otoro.net/2017/10/29/visual-evolution-strategies/
CMA-ES: Covariance Matrix Adaptation Evolution Strategy

• Strive to sample the optimal (multivariate) Gaussian distribution at any given iteration:
  • "optimal" mean (the best estimate of the optimum), given the available information
  • optimal covariance matrix C, given the available information
  • optimal step-size σ, given the covariance matrix
• A natural gradient update of mean and covariance matrix provides a theoretical framework/justification [JMLR 18(18), 2017]:
  1. Sample distribution P(x | θ_t) → x_1, …, x_λ ∈ X
  2. Evaluate samples on f → f(x_1), …, f(x_λ)
  3. Update parameters: θ_{t+1} = θ_t + η (1/Z(λ)) ∑_{k=1}^{λ} (λ/2 − rank(f(x_k))) ∇̃_θ ln p(x_k | θ)|_{θ=θ_t}, where ∇̃ denotes the natural gradient
• Convergence speed is almost independent of the number of samples (as long as λ is not ≫ n), a property of multi-recombinative Evolution Strategies
Practical Advice
Approaching an Unknown Optimization Problem

• Objective formulation: for example, ∑_i x_i² and ∑_i |x_i| have the same optimal (minimal) solution but may be very differently "optimizable"
• Problem/variable encoding: for example, log scale vs linear scale vs quadratic transformation
• Create section plots (f(x) vs x on a line): one-dimensional grid search is cheap and may reveal ill-conditioning or multimodality (see the sketch below)
• Try to locally improve a given (good) solution
• Start local search from different initial solutions: do the runs always end in different solutions, or always in the same one?
• Apply a "global search" setting
• see also http://cma.gforge.inria.fr/cmaes_sourcecode_page.html#practical
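A minimal sketch of such a section plot (matplotlib assumed; the test function, base point, and direction are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

def section_plot(f, x0, direction, radius=2.0, num=201):
    """Plot f along the line x0 + t * direction for t in [-radius, radius]."""
    ts = np.linspace(-radius, radius, num)
    plt.plot(ts, [f(x0 + t * direction) for t in ts])
    plt.xlabel('t')
    plt.ylabel('f(x0 + t * direction)')
    plt.show()

f = lambda x: sum(10**i * v**2 for i, v in enumerate(x))  # ill-conditioned
section_plot(f, np.zeros(5), np.eye(5)[2])  # section along the 3rd coordinate
```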
Python Example in Jupyter-Lab
f(x) = ∑_{i=1}^{n} α_i x_i²
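The following pages show a live Jupyter-Lab session; here is a minimal sketch of what such a session might look like, using the cma Python package on this function (the coefficients α_i and all settings are illustrative):

```python
import cma
import numpy as np

alpha = 10.0 ** np.arange(6)                # illustrative coefficients alpha_i
f = lambda x: float(np.sum(alpha * np.asarray(x)**2))

# all-in-one call: initial solution 6 * [1], initial step-size sigma0 = 1
xbest, es = cma.fmin2(f, 6 * [1], 1)
print(es.result.fbest)                      # best f-value found
```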
On Object-Oriented Programming of Optimizers [Collette et al 2010]
A Transparent Interface
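In the cma package this transparent interface is realized as the ask-and-tell pattern; a minimal sketch, using the same illustrative objective as above:

```python
import cma
import numpy as np

alpha = 10.0 ** np.arange(6)
f = lambda x: float(np.sum(alpha * np.asarray(x)**2))

es = cma.CMAEvolutionStrategy(6 * [1], 1)   # arguments: x0, sigma0
while not es.stop():
    X = es.ask()                            # sample a new population
    es.tell(X, [f(x) for x in X])           # pass the f-values back
    es.disp()                               # optional one-line status
print(es.result.xbest)
```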
Thank You