Optimization using surrogate models - Inria
Transcript of "Optimization using surrogate models"
R. Duvigneau
May 27th 2015
Outline
- Principles
- Gaussian Process models
- Efficient Global Optimization approach
- Noisy observations
- A realistic application
Principles
Why surrogate modeling?

- For some applications, the evaluation of the cost function is computationally prohibitive
- For some applications, the evaluation of the gradient is too complex

→ Replace the complex and/or expensive evaluation process by a simpler and cheaper model
Surrogate models for optimization
Main idea:

- Build a posteriori a database of cost function values (x_i, f_i), i = 1, ..., N
- Construct a surrogate model f̂ using this database (polynomial interpolation, regression, artificial neural networks, support vector machines, radial basis functions, etc.)
- Solve the optimization problem using the surrogate model f̂
- Possibly enrich the database with interesting data

→ Converge to the minimum of f while relying on evaluations of f̂ as much as possible
Gaussian Process models (Kriging)
Principle of Kriging
Main idea:

- Set of observed values $F_N = \{f_1, f_2, \ldots, f_N\}$ at points $X_N = \{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbb{R}^d$
- Assumption: these values correspond to one realization of a multivariate Gaussian process:

$$ p(F_N \mid X_N) = \frac{\exp\left(-\tfrac{1}{2} F_N^\top C_N^{-1} F_N\right)}{\sqrt{(2\pi)^N \det C_N}} $$

- $C_N$ is the $N \times N$ covariance matrix
- Its coefficients are expressed in terms of a correlation function: $C_{kl} = C(x_k, x_l, \Theta)$, where the hyper-parameters $\Theta$ are calibrated according to a likelihood maximization principle
Principle of Kriging
- The density for the function value $f_{N+1}$ at any new point $x_{N+1}$ is:

$$ p(F_{N+1} \mid X_{N+1}) = \frac{\exp\left(-\tfrac{1}{2} F_{N+1}^\top C_{N+1}^{-1} F_{N+1}\right)}{\sqrt{(2\pi)^{N+1} \det C_{N+1}}} $$

- According to the conditional probability rule $p(A \mid B) = p(A, B)/p(B)$:

$$ p(f_{N+1} \mid F_N, X_{N+1}) = \frac{p(F_{N+1} \mid X_{N+1})}{p(F_N \mid X_N)} $$

- One obtains:

$$ p(f_{N+1} \mid F_N, X_{N+1}) = \frac{\exp\left(-\tfrac{1}{2}\left(F_{N+1}^\top C_{N+1}^{-1} F_{N+1} - F_N^\top C_N^{-1} F_N\right)\right)}{\sqrt{2\pi \,\det C_{N+1}/\det C_N}} $$
Principle of Kriging
- $C_{N+1}$ can be written as:

$$ C_{N+1} = \begin{bmatrix} C_N & k_{N+1} \\ k_{N+1}^\top & \kappa \end{bmatrix} $$

- It can be shown that:

$$ C_{N+1}^{-1} = \begin{bmatrix} M & m \\ m^\top & \mu \end{bmatrix} $$

with $M = C_N^{-1} + \tfrac{1}{\mu} m m^\top$, $m = -\mu\, C_N^{-1} k_{N+1}$ and $\mu = (\kappa - k_{N+1}^\top C_N^{-1} k_{N+1})^{-1}$

- and that $\det C_{N+1} = \det C_N / \mu$
Principle of Kriging
- Finally, one obtains:

$$ p(f_{N+1} \mid F_N, X_{N+1}) = \frac{1}{\sqrt{2\pi}\,\hat\sigma_{f_{N+1}}} \exp\left[-\frac{(f_{N+1} - \hat f_{N+1})^2}{2\hat\sigma^2_{f_{N+1}}}\right] $$

with

$$ \hat f_{N+1} = k_{N+1}^\top C_N^{-1} F_N, \qquad \hat\sigma^2_{f_{N+1}} = \kappa - k_{N+1}^\top C_N^{-1} k_{N+1} $$
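The two predictive formulas above can be sketched numerically in a few lines. This is a minimal 1D illustration that assumes a squared-exponential correlation $C(x, x') = \exp(-\theta (x - x')^2)$ with a fixed `theta`; the slides leave the correlation family open and calibrate $\Theta$ by likelihood maximization.

```python
import numpy as np

def kriging_predict(X, F, x_new, theta=10.0):
    """Kriging mean and variance at x_new from 1D samples (X, F).

    Assumes the squared-exponential correlation C(x, x') = exp(-theta (x - x')^2);
    in practice theta would be calibrated by likelihood maximization.
    """
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    # Covariance matrix C_N over the observed points
    C = np.exp(-theta * (X[:, None] - X[None, :]) ** 2)
    # Correlation vector k between x_new and the samples; kappa = C(x_new, x_new) = 1
    k = np.exp(-theta * (X - x_new) ** 2)
    kappa = 1.0
    mean = k @ np.linalg.solve(C, F)          # f_hat = k^T C_N^{-1} F_N
    var = kappa - k @ np.linalg.solve(C, k)   # sigma^2 = kappa - k^T C_N^{-1} k
    return mean, max(var, 0.0)
```

At a sample point the model interpolates: the mean equals the observed value and the variance vanishes, which is the behaviour the noisy variant later gives up.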
Efficient Global Optimization
Algorithm overview
An iterative optimization strategy based on Gaussian Processes:

- Build a posteriori a database of cost function values (x_i, f_i), i = 1, ..., N
- Construct a Gaussian Process model f̂ and variance σ̂²
- Determine the points (x*_p), p = 1, ..., P, that maximize or minimize a set of merit functions
- Enrich the database with (x*_p, f(x*_p)), p = 1, ..., P
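The four steps above can be sketched end to end on the 1D test function used in the illustration slides, with Expected Improvement as the single merit function. The squared-exponential correlation, the fixed `theta = 30.0`, the jitter term, and the grid search over the merit function are assumptions made for this sketch, not part of the slides.

```python
import numpy as np
from math import erf, sqrt, pi

def f(x):
    # 1D test function from the illustration slides
    return 0.5 * (np.sin(20 * x) / (1 + x) + 3 * x**3 * np.cos(5 * x)
                  + 10 * (x - 0.5) ** 2 - 0.6)

def predict(X, F, xq, theta=30.0):
    """Kriging mean/std at query points xq (squared-exponential correlation assumed)."""
    C = np.exp(-theta * (X[:, None] - X[None, :]) ** 2)
    C += 1e-8 * np.eye(len(X))                             # jitter for conditioning
    k = np.exp(-theta * (xq[:, None] - X[None, :]) ** 2)   # shape (Q, N)
    mean = k @ np.linalg.solve(C, F)
    var = np.maximum(1.0 - np.sum(k * np.linalg.solve(C, k.T).T, axis=1), 1e-12)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, f_min):
    """EI = sigma * (u Phi(u) + phi(u)), with u = (f_min - mean) / sigma."""
    u = (f_min - mean) / std
    Phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in u])
    phi = np.exp(-0.5 * u**2) / sqrt(2.0 * pi)
    return std * (u * Phi + phi)

# 1. build an initial database, 2. fit the model, 3. maximize the merit
# function, 4. evaluate f at the maximizer and enrich the database; repeat.
X = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
F = f(X)
grid = np.linspace(0.0, 1.0, 401)
for _ in range(10):
    mean, std = predict(X, F, grid)
    ei = expected_improvement(mean, std, F.min())
    x_star = grid[np.argmax(ei)]
    if np.min(np.abs(X - x_star)) < 1e-9:
        break                                  # merit maximizer already sampled
    X = np.append(X, x_star)
    F = np.append(F, f(x_star))

print("best point:", X[np.argmin(F)], "best value:", F.min())
```

Because every evaluation is kept in the database, the best value found can only improve on the best point of the initial design.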
R. Duvigneau - Optimization using surrogate models 12
Merit functions

Some classical merit functions:

- Lower Bound (minimize):

$$ j_{LB}(x) = \hat f(x) - \rho\, \hat\sigma(x) $$

with $\rho$ a user-defined parameter;

- Probability of Improvement (maximize):
  $f_{min}$ being the best value found so far and $T \le f_{min}$ a target value, the probability of obtaining a value lower than $T$ is:

$$ j_{PI}(x) = \Phi\left(\frac{T - \hat f(x)}{\hat\sigma(x)}\right) $$

where $\Phi$ is the standard normal cumulative distribution function;

- Expected Improvement (maximize):
  An improvement is defined as $I(x) = \max(f_{min} - f(x), 0)$. Its expected value is:

$$ j_{EI}(x) = \int_{0}^{\infty} I\, \frac{1}{\sqrt{2\pi}\,\hat\sigma(x)} \exp\left(-\frac{(f_{min} - I - \hat f(x))^2}{2\hat\sigma^2(x)}\right) dI = \hat\sigma(x)\left(u\,\Phi(u) + \phi(u)\right) $$

where $u = (f_{min} - \hat f(x))/\hat\sigma(x)$ and $\phi$ is the standard normal density.
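All three merit functions have simple closed forms. The sketch below implements them directly from the formulas; the function names and the default $\rho = 2$ are illustrative choices, not from the slides.

```python
from math import erf, exp, pi, sqrt

def _Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def _phi(u):
    """Standard normal probability density."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def lower_bound(mean, std, rho=2.0):
    """j_LB = mean - rho * std: optimistic bound to be minimized; rho tunes exploration."""
    return mean - rho * std

def prob_improvement(mean, std, target):
    """j_PI = Phi((T - mean)/std): probability the Gaussian prediction falls below T."""
    return _Phi((target - mean) / std)

def expected_improvement(mean, std, f_min):
    """j_EI = std * (u Phi(u) + phi(u)), u = (f_min - mean)/std."""
    u = (f_min - mean) / std
    return std * (u * _Phi(u) + _phi(u))
```

Note that EI is strictly positive wherever the model variance is nonzero, so maximizing it keeps a balance between exploiting the predicted minimum and exploring poorly sampled regions.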
Illustrations
1D function

$$ f(x) = \tfrac{1}{2}\left(\frac{\sin(20x)}{1+x} + 3x^3\cos(5x) + 10(x - 0.5)^2 - 0.6\right) $$

[Figures: five successive iterations of the Gaussian Process-based optimization on this function]
Branin function
$$ f(x, y) = \left(y - \frac{5.1}{4\pi^2} x^2 + \frac{5}{\pi} x - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(x) + 10 $$
Branin function
- Initial design of experiments including 5 points
- Initial Gaussian Process
- Branin function (left); Gaussian model obtained and evaluations (right)
Branin function
- 20 iterations (25 evaluations)
- Branin function (left); Gaussian model obtained and evaluations (right)
Gaussian Process with noisy observations
Necessity of noise treatment
The evaluations are often subject to uncertainty:

- Computations are performed with a finite accuracy (spatial, temporal, etc.)
- Computations can depend on hardware or software configurations
Impact of noise (iterations 0, 4 and 14)

$$ f(x) = \tfrac{1}{2}\left(\frac{\sin(20x)}{1+x} + 3x^3\cos(5x) + 10(x - 0.5)^2 - 0.6\right) + \mathcal{N}(0, 0.1) $$

[Figures: Gaussian Process model of the noisy function at iterations 0, 4 and 14]
Modification of Gaussian Process model
- Assumption: evaluation $f_i$ is an observation of the function value $f(x_i)$ plus a Gaussian noise $\mathcal{N}(0, \tau_i^2)$
- Modification of the model:

$$ \hat f_{N+1} = k_{N+1}^\top (C_N + \Delta)^{-1} F_N, \qquad \hat\sigma^2_{f_{N+1}} = \kappa - k_{N+1}^\top (C_N + \Delta)^{-1} k_{N+1} $$

with $\Delta = \operatorname{diag}(\tau_i^2)$, $i \in [1, N]$
- The resulting model is non-interpolating
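The noisy variant only changes which matrix is inverted. A sketch in the style of the earlier 1D example, again assuming a squared-exponential correlation with fixed `theta`:

```python
import numpy as np

def kriging_predict_noisy(X, F, tau2, x_new, theta=10.0):
    """Kriging prediction when observation f_i carries Gaussian noise N(0, tau2[i]).

    Squared-exponential correlation assumed; the only change from the noise-free
    model is replacing C_N by C_N + Delta, with Delta = diag(tau_i^2).
    """
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    C = np.exp(-theta * (X[:, None] - X[None, :]) ** 2) + np.diag(tau2)
    k = np.exp(-theta * (X - x_new) ** 2)
    mean = k @ np.linalg.solve(C, F)        # k^T (C_N + Delta)^{-1} F_N
    var = 1.0 - k @ np.linalg.solve(C, k)   # kappa - k^T (C_N + Delta)^{-1} k
    return mean, max(var, 0.0)
```

With tau2 > 0 the model is indeed non-interpolating: at a sample point the mean is pulled toward neighbouring data and the variance stays strictly positive, while tau2 = 0 recovers the interpolating model.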
Optimization with noisy observations (iterations 0, 4 and 14)

$$ f(x) = \tfrac{1}{2}\left(\frac{\sin(20x)}{1+x} + 3x^3\cos(5x) + 10(x - 0.5)^2 - 0.6\right) + \mathcal{N}(0, 0.1) $$

[Figures: Gaussian Process-based optimization with the noisy model at iterations 0, 4 and 14]
A realistic application
Optimization of oscillatory jet for stall control
- 2D Navier-Stokes with turbulence model
- Oscillatory jet at 12% chord
- Optimization of frequency
- Maximization of time-averaged lift
[Figure: lift coefficient versus non-dimensional time (0 to 50) for actuation frequencies 0.5, 1.0, 1.5 and 2.0, compared with the case without actuation and a (too) high-frequency actuation]
Gaussian Process-based optimization with adaptive simulation time
- Gaussian Process model with noisy observations
- Variance estimated using a moving-average procedure
- Enrichment driven by the Expected Improvement criterion
- Adaptive truncation criterion depending on the improvement obtained and the variance observed so far
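The slides do not detail the moving-average procedure used to estimate the observation variance. One common way to turn a truncated lift-coefficient signal into a mean plus a variance estimate is the batch-means technique sketched below; the window length and the estimator itself are assumptions for illustration, not a reconstruction of the authors' method.

```python
import numpy as np

def time_average_with_variance(signal, window):
    """Time-averaged value of a signal and a variance estimate for that average.

    Batch-means sketch: split the signal into consecutive windows, average each
    window, then use the spread of the window averages to estimate the variance
    of the overall mean. This is one plausible moving-average procedure, not
    necessarily the one used in the slides.
    """
    s = np.asarray(signal, dtype=float)
    n = len(s) // window                    # number of complete windows
    blocks = s[: n * window].reshape(n, window).mean(axis=1)
    mean = blocks.mean()
    var_of_mean = blocks.var(ddof=1) / n    # variance of the time-averaged value
    return mean, var_of_mean
```

Feeding such variances in as the $\tau_i^2$ of the noisy Gaussian Process model lets shorter, cheaper simulations enter the database with a correspondingly larger reported uncertainty.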
[Figures: time-averaged lift coefficient versus actuation frequency (0 to 2), showing the database, the model expectation, the model ± 3 std. dev. band and the expected improvement, for the initial sampling (it = 0) and at iteration 7; the optimal frequency is marked]