Optimization using surrogate models - Inria
Transcript of "Optimization using surrogate models"
R. Duvigneau
May 27th 2015
Outline
- Principles
- Gaussian Process models
- Efficient Global Optimization approach
- Noisy observations
- A realistic application
Principles
Why surrogate modeling?

- For some applications, the evaluation of the cost function is computationally prohibitive
- For some applications, the evaluation of the gradient is too complex

→ Replace the complex and/or expensive evaluation process by a simpler and cheaper model
Surrogate models for optimization
Main idea:

- Build a posteriori a database of cost function values (x_i, f_i), i = 1, ..., N
- Construct a surrogate model f̂ using this database (polynomial interpolation, regression, artificial neural networks, support vector machines, radial basis functions, etc.)
- Solve the optimization problem using the surrogate model f̂
- Possibly enrich the database with interesting data

→ Converge to the minimum of f while relying on evaluations of f̂ as much as possible
Gaussian Process models (Kriging)
Principle of Kriging
Main idea:

- Set of observed values $F_N = \{f_1, f_2, \ldots, f_N\}$ at points $X_N = \{x_1, x_2, \ldots, x_N\}$, $x_i \in \mathbb{R}^d$
- Assumption: these values correspond to one realization of a multivariate Gaussian process:

$$ p(F_N \mid X_N) = \frac{\exp\left(-\tfrac{1}{2} F_N^\top C_N^{-1} F_N\right)}{\sqrt{(2\pi)^N \det C_N}} $$

- $C_N$ is the $N \times N$ covariance matrix
- Its coefficients are expressed in terms of a correlation function: $C_{kl} = C(x_k, x_l, \Theta)$, where the hyper-parameters $\Theta$ are calibrated according to a likelihood maximization principle
Principle of Kriging
- The density for the function value $f_{N+1}$ at any new point $x_{N+1}$ is:

$$ p(F_{N+1} \mid X_{N+1}) = \frac{\exp\left(-\tfrac{1}{2} F_{N+1}^\top C_{N+1}^{-1} F_{N+1}\right)}{\sqrt{(2\pi)^{N+1} \det C_{N+1}}} $$

- According to the conditional probability rule $p(A \mid B) = p(A, B)/p(B)$:

$$ p(f_{N+1} \mid F_N, X_{N+1}) = \frac{p(F_{N+1} \mid X_{N+1})}{p(F_N \mid X_N)} $$

- One obtains:

$$ p(f_{N+1} \mid F_N, X_{N+1}) = \frac{\exp\left(-\tfrac{1}{2}\left(F_{N+1}^\top C_{N+1}^{-1} F_{N+1} - F_N^\top C_N^{-1} F_N\right)\right)}{\sqrt{2\pi \,\det C_{N+1}/\det C_N}} $$
Principle of Kriging
- $C_{N+1}$ can be written as:

$$ C_{N+1} = \begin{bmatrix} C_N & k_{N+1} \\ k_{N+1}^\top & \kappa \end{bmatrix} $$

- It can be shown that:

$$ C_{N+1}^{-1} = \begin{bmatrix} M & m \\ m^\top & \mu \end{bmatrix} $$

with $M = C_N^{-1} + \tfrac{1}{\mu} m m^\top$, $m = -\mu\, C_N^{-1} k_{N+1}$ and $\mu = (\kappa - k_{N+1}^\top C_N^{-1} k_{N+1})^{-1}$

- and that $\det C_{N+1} = \det C_N / \mu$
Principle of Kriging
- Finally, one obtains:

$$ p(f_{N+1} \mid F_N, X_{N+1}) = \frac{1}{\sqrt{2\pi}\,\hat\sigma_{f_{N+1}}} \exp\left[-\frac{(f_{N+1} - \hat f_{N+1})^2}{2\hat\sigma^2_{f_{N+1}}}\right] $$

with

$$ \hat f_{N+1} = k_{N+1}^\top C_N^{-1} F_N, \qquad \hat\sigma^2_{f_{N+1}} = \kappa - k_{N+1}^\top C_N^{-1} k_{N+1} $$
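The two predictive formulas above can be sketched numerically in a few lines. This is a minimal 1D illustration that assumes a squared-exponential correlation $C(x, x') = \exp(-\theta (x - x')^2)$ with a fixed `theta`; the slides leave the correlation family open and calibrate $\Theta$ by likelihood maximization.

```python
import numpy as np

def kriging_predict(X, F, x_new, theta=10.0):
    """Kriging mean and variance at x_new from 1D samples (X, F).

    Assumes the squared-exponential correlation C(x, x') = exp(-theta (x - x')^2);
    in practice theta would be calibrated by likelihood maximization.
    """
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    # Covariance matrix C_N over the observed points
    C = np.exp(-theta * (X[:, None] - X[None, :]) ** 2)
    # Correlation vector k between x_new and the samples; kappa = C(x_new, x_new) = 1
    k = np.exp(-theta * (X - x_new) ** 2)
    kappa = 1.0
    mean = k @ np.linalg.solve(C, F)          # f_hat = k^T C_N^{-1} F_N
    var = kappa - k @ np.linalg.solve(C, k)   # sigma^2 = kappa - k^T C_N^{-1} k
    return mean, max(var, 0.0)
```

At a sample point the model interpolates: the mean equals the observed value and the variance vanishes, which is the behaviour the noisy variant later gives up.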
Efficient Global Optimization
Algorithm overview
An iterative optimization strategy based on Gaussian Processes:

- Build a posteriori a database of cost function values (x_i, f_i), i = 1, ..., N
- Construct a Gaussian Process model f̂ and variance σ̂²
- Determine the points (x*_p), p = 1, ..., P, that maximize or minimize a set of merit functions
- Enrich the database with (x*_p, f(x*_p)), p = 1, ..., P
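The four steps above can be sketched end to end on the 1D test function used in the illustration slides, with Expected Improvement as the single merit function. The squared-exponential correlation, the fixed `theta = 30.0`, the jitter term, and the grid search over the merit function are assumptions made for this sketch, not part of the slides.

```python
import numpy as np
from math import erf, sqrt, pi

def f(x):
    # 1D test function from the illustration slides
    return 0.5 * (np.sin(20 * x) / (1 + x) + 3 * x**3 * np.cos(5 * x)
                  + 10 * (x - 0.5) ** 2 - 0.6)

def predict(X, F, xq, theta=30.0):
    """Kriging mean/std at query points xq (squared-exponential correlation assumed)."""
    C = np.exp(-theta * (X[:, None] - X[None, :]) ** 2)
    C += 1e-8 * np.eye(len(X))                             # jitter for conditioning
    k = np.exp(-theta * (xq[:, None] - X[None, :]) ** 2)   # shape (Q, N)
    mean = k @ np.linalg.solve(C, F)
    var = np.maximum(1.0 - np.sum(k * np.linalg.solve(C, k.T).T, axis=1), 1e-12)
    return mean, np.sqrt(var)

def expected_improvement(mean, std, f_min):
    """EI = sigma * (u Phi(u) + phi(u)), with u = (f_min - mean) / sigma."""
    u = (f_min - mean) / std
    Phi = np.array([0.5 * (1.0 + erf(v / sqrt(2.0))) for v in u])
    phi = np.exp(-0.5 * u**2) / sqrt(2.0 * pi)
    return std * (u * Phi + phi)

# 1. build an initial database, 2. fit the model, 3. maximize the merit
# function, 4. evaluate f at the maximizer and enrich the database; repeat.
X = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
F = f(X)
grid = np.linspace(0.0, 1.0, 401)
for _ in range(10):
    mean, std = predict(X, F, grid)
    ei = expected_improvement(mean, std, F.min())
    x_star = grid[np.argmax(ei)]
    if np.min(np.abs(X - x_star)) < 1e-9:
        break                                  # merit maximizer already sampled
    X = np.append(X, x_star)
    F = np.append(F, f(x_star))

print("best point:", X[np.argmin(F)], "best value:", F.min())
```

Because every evaluation is kept in the database, the best value found can only improve on the best point of the initial design.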
R. Duvigneau - Optimization using surrogate models 12
Merit functions

Some classical merit functions:

- Lower Bound (minimize):

$$ j_{LB}(x) = \hat f(x) - \rho\, \hat\sigma(x) $$

with $\rho$ a user-defined parameter;

- Probability of Improvement (maximize):
  $f_{min}$ being the best value found so far and $T \le f_{min}$ a target value, the probability of obtaining a value lower than $T$ is:

$$ j_{PI}(x) = \Phi\left(\frac{T - \hat f(x)}{\hat\sigma(x)}\right) $$

where $\Phi$ is the standard normal cumulative distribution function;

- Expected Improvement (maximize):
  An improvement is defined as $I(x) = \max(f_{min} - f(x), 0)$. Its expected value is:

$$ j_{EI}(x) = \int_{0}^{\infty} I\, \frac{1}{\sqrt{2\pi}\,\hat\sigma(x)} \exp\left(-\frac{(f_{min} - I - \hat f(x))^2}{2\hat\sigma^2(x)}\right) dI = \hat\sigma(x)\left(u\,\Phi(u) + \phi(u)\right) $$

where $u = (f_{min} - \hat f(x))/\hat\sigma(x)$ and $\phi$ is the standard normal density.
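All three merit functions have simple closed forms. The sketch below implements them directly from the formulas; the function names and the default $\rho = 2$ are illustrative choices, not from the slides.

```python
from math import erf, exp, pi, sqrt

def _Phi(u):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(u / sqrt(2.0)))

def _phi(u):
    """Standard normal probability density."""
    return exp(-0.5 * u * u) / sqrt(2.0 * pi)

def lower_bound(mean, std, rho=2.0):
    """j_LB = mean - rho * std: optimistic bound to be minimized; rho tunes exploration."""
    return mean - rho * std

def prob_improvement(mean, std, target):
    """j_PI = Phi((T - mean)/std): probability the Gaussian prediction falls below T."""
    return _Phi((target - mean) / std)

def expected_improvement(mean, std, f_min):
    """j_EI = std * (u Phi(u) + phi(u)), u = (f_min - mean)/std."""
    u = (f_min - mean) / std
    return std * (u * _Phi(u) + _phi(u))
```

Note that EI is strictly positive wherever the model variance is nonzero, so maximizing it keeps a balance between exploiting the predicted minimum and exploring poorly sampled regions.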
Illustrations
1D function

$$ f(x) = \tfrac{1}{2}\left(\frac{\sin(20x)}{1+x} + 3x^3\cos(5x) + 10(x - 0.5)^2 - 0.6\right) $$

[Figures: five successive iterations of the Gaussian Process-based optimization on this function]
Branin function
$$ f(x, y) = \left(y - \frac{5.1}{4\pi^2} x^2 + \frac{5}{\pi} x - 6\right)^2 + 10\left(1 - \frac{1}{8\pi}\right)\cos(x) + 10 $$
Branin function
- Initial design of experiments including 5 points
- Initial Gaussian Process
- Branin function (left); Gaussian model obtained and evaluations (right)
Branin function
- 20 iterations (25 evaluations)
- Branin function (left); Gaussian model obtained and evaluations (right)
Gaussian Process with noisy observations
Necessity of noise treatment
The evaluations are often subject to uncertainty:

- Computations are performed with a finite accuracy (spatial, temporal, etc.)
- Computations can depend on hardware or software configurations
Impact of noise (iterations 0, 4 and 14)

$$ f(x) = \tfrac{1}{2}\left(\frac{\sin(20x)}{1+x} + 3x^3\cos(5x) + 10(x - 0.5)^2 - 0.6\right) + \mathcal{N}(0, 0.1) $$

[Figures: Gaussian Process model of the noisy function at iterations 0, 4 and 14]
Modification of Gaussian Process model
- Assumption: evaluation $f_i$ is an observation of the function value $f(x_i)$ plus a Gaussian noise $\mathcal{N}(0, \tau_i^2)$
- Modification of the model:

$$ \hat f_{N+1} = k_{N+1}^\top (C_N + \Delta)^{-1} F_N, \qquad \hat\sigma^2_{f_{N+1}} = \kappa - k_{N+1}^\top (C_N + \Delta)^{-1} k_{N+1} $$

with $\Delta = \operatorname{diag}(\tau_i^2)$, $i \in [1, N]$
- The resulting model is non-interpolating
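The noisy variant only changes which matrix is inverted. A sketch in the style of the earlier 1D example, again assuming a squared-exponential correlation with fixed `theta`:

```python
import numpy as np

def kriging_predict_noisy(X, F, tau2, x_new, theta=10.0):
    """Kriging prediction when observation f_i carries Gaussian noise N(0, tau2[i]).

    Squared-exponential correlation assumed; the only change from the noise-free
    model is replacing C_N by C_N + Delta, with Delta = diag(tau_i^2).
    """
    X = np.asarray(X, dtype=float)
    F = np.asarray(F, dtype=float)
    C = np.exp(-theta * (X[:, None] - X[None, :]) ** 2) + np.diag(tau2)
    k = np.exp(-theta * (X - x_new) ** 2)
    mean = k @ np.linalg.solve(C, F)        # k^T (C_N + Delta)^{-1} F_N
    var = 1.0 - k @ np.linalg.solve(C, k)   # kappa - k^T (C_N + Delta)^{-1} k
    return mean, max(var, 0.0)
```

With tau2 > 0 the model is indeed non-interpolating: at a sample point the mean is pulled toward neighbouring data and the variance stays strictly positive, while tau2 = 0 recovers the interpolating model.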
Optimization with noisy observations (iterations 0, 4 and 14)

$$ f(x) = \tfrac{1}{2}\left(\frac{\sin(20x)}{1+x} + 3x^3\cos(5x) + 10(x - 0.5)^2 - 0.6\right) + \mathcal{N}(0, 0.1) $$

[Figures: Gaussian Process-based optimization with the noisy model at iterations 0, 4 and 14]
A realistic application
Optimization of oscillatory jet for stall control
- 2D Navier-Stokes with turbulence model
- Oscillatory jet at 12% chord
- Optimization of frequency
- Maximization of time-averaged lift
[Figure: lift coefficient versus non-dimensional time (0 to 50) for actuation frequencies 0.5, 1.0, 1.5 and 2.0, compared with the case without actuation and a (too) high-frequency actuation]
Gaussian Process-based optimization with adaptive simulation time
- Gaussian Process model with noisy observations
- Variance estimated using a moving-average procedure
- Enrichment driven by the Expected Improvement criterion
- Adaptive truncation criterion depending on the improvement obtained and the variance observed so far
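The slides do not detail the moving-average procedure used to estimate the observation variance. One common way to turn a truncated lift-coefficient signal into a mean plus a variance estimate is the batch-means technique sketched below; the window length and the estimator itself are assumptions for illustration, not a reconstruction of the authors' method.

```python
import numpy as np

def time_average_with_variance(signal, window):
    """Time-averaged value of a signal and a variance estimate for that average.

    Batch-means sketch: split the signal into consecutive windows, average each
    window, then use the spread of the window averages to estimate the variance
    of the overall mean. This is one plausible moving-average procedure, not
    necessarily the one used in the slides.
    """
    s = np.asarray(signal, dtype=float)
    n = len(s) // window                    # number of complete windows
    blocks = s[: n * window].reshape(n, window).mean(axis=1)
    mean = blocks.mean()
    var_of_mean = blocks.var(ddof=1) / n    # variance of the time-averaged value
    return mean, var_of_mean
```

Feeding such variances in as the $\tau_i^2$ of the noisy Gaussian Process model lets shorter, cheaper simulations enter the database with a correspondingly larger reported uncertainty.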
[Figures: time-averaged lift coefficient versus actuation frequency (0 to 2), showing the database, the model expectation, the model ± 3 std. dev. band and the expected improvement, for the initial sampling (it = 0) and at iteration 7; the optimal frequency is marked]