Numerical optimization and adjoint state methods for large-scale nonlinear least-squares problems

Ludovic Métivier¹ and the SEISCOPE group¹,²,³

¹ LJK, Univ. Grenoble Alpes, CNRS, Grenoble, France
² ISTerre, Univ. Grenoble Alpes, CNRS, Grenoble, France
³ Geoazur, Univ. Nice Sophia Antipolis, CNRS, Valbonne, France

http://seiscope2.osug.fr

Joint Inversion Summer School, Barcelonnette, France, 15–19 June 2015
SEISCOPE
L. Metivier (LJK, CNRS) Numerical optimization 06/16/2015 Joint Inversion School 1 / 31
Outline
1 Numerical optimization methods for large-scale smooth unconstrained minimization problems
2 First-order and second-order adjoint state methods for gradient and Hessian-vector products computation
3 Summary
Outline
1 Numerical optimization methods for large-scale smooth unconstrained minimization problems
  Numerical optimization for nonlinear least-squares problems
  Steepest descent method
  Newton method
  Quasi-Newton methods
  What about the nonlinear conjugate gradient?
  Summary
2 First-order and second-order adjoint state methods for gradient and Hessian-vector products computation
3 Summary
Numerical optimization for inverse problems in geosciences
Nonlinear least-squares problem
In this presentation, we will consider the inverse problem
min_m f(m) = (1/2) ‖dcal(m) − dobs‖²
where
dobs are data associated with a physical phenomenon and a measurement protocol: seismic waves, electromagnetic waves, gravimetry, ultrasound, X-rays, ...

m is the parameter of interest we want to reconstruct: P- and S-wave velocities, density, anisotropy parameters, attenuation, or a collection of these parameters

dcal(m) are synthetic data, computed numerically, often through the solution of partial differential equations

f(m) is a misfit function which measures the discrepancy between observed and synthetic data
Of course, in joint inversion, we may consider a misfit function that is a sum of such terms, each associated with a different measurement: the theory remains the same
We will also assume that f(m) is a continuous and twice-differentiable function: the gradient is continuous, and the second-order derivative matrix H(m) (the Hessian matrix) is also continuous

The methods we are going to review are local optimization methods: we put aside global optimization methods and stochastic/genetic algorithms, which are unaffordable for large-scale optimization problems

All the methods we review are presented in (Nocedal and Wright, 2006)
Local methods to find the minimum of a function
Necessary condition
To detect an extremum of a differentiable function f(m), we have the necessary condition

∇f(m) = 0
This is not enough: is it a minimum or maximum?
Necessary and sufficient conditions
At a local minimum, the function is locally convex: the Hessian is positive definite

∇f(m) = 0, ∇²f(m) > 0
Practical implementation
However, this is not what we implement in practice. From an initial guess m0, a sequence mk is built such that

the limit m∗ should satisfy the necessary condition ∇f(m∗) = 0

at each iteration, f(mk+1) < f(mk)

We have to guarantee the decrease of the misfit function at each iteration
How to find the zero of the gradient: first-order method
The fixed-point method
We want to find m∗ such that
∇f(m∗) = 0 (1)

The simplest method is to apply the fixed-point iteration to I − α∇f:

mk+1 = (I − α∇f)(mk) = mk − α∇f(mk), α ∈ R+*

At convergence we should have

m∗ = (I − α∇f)(m∗) = m∗ − α∇f(m∗) =⇒ ∇f(m∗) = 0
Ensuring the decrease of the misfit function
We need to ensure f(mk+1) < f(mk).

Since f is twice differentiable, we have

f(m + dm) = f(m) + ∇f(m)^T dm + O(‖dm‖²)

Therefore, if

mk+1 = mk − α∇f(mk),

we have

f(mk+1) = f(mk − α∇f(mk)) = f(mk) − α ∇f(mk)^T ∇f(mk) + O(α² ‖∇f(mk)‖²)

that is

f(mk+1) = f(mk) − α ‖∇f(mk)‖² + O(α² ‖∇f(mk)‖²)

Therefore, for α small enough, we can ensure the decrease condition
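The decrease argument can be checked numerically. Below is a minimal Python sketch (illustrative, not from the slides) on a hypothetical 2-D nonlinear least-squares misfit: α is halved until the decrease condition holds.

```python
import numpy as np

def residual(m):
    # hypothetical 2-D residual dcal(m) - dobs (illustrative only)
    return np.array([m[0] - 1.0, 10.0 * (m[1] - m[0] ** 2)])

def f(m):
    r = residual(m)
    return 0.5 * r @ r              # f(m) = 1/2 ||dcal(m) - dobs||^2

def grad(m):
    # gradient via the Jacobian of the residual
    J = np.array([[1.0, 0.0],
                  [-20.0 * m[0], 10.0]])
    return J.T @ residual(m)

m = np.array([-1.0, 1.0])
g = grad(m)

alpha = 1.0
while f(m - alpha * g) >= f(m):     # halve alpha until f decreases
    alpha *= 0.5

assert f(m - alpha * g) < f(m)      # decrease condition satisfied
```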
Fixed point on I − α∇f = steepest-descent method

To summarize, using the fixed-point iteration on I − α∇f yields the sequence

mk+1 = mk − α∇f(mk)

We have just rediscovered the steepest-descent iteration
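A minimal Python sketch of the resulting steepest-descent loop, on a hypothetical quadratic misfit with a small fixed step α (all names illustrative):

```python
import numpy as np

# hypothetical quadratic misfit f(m) = 1/2 ||A m - b||^2 (illustrative only)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(m):
    return A.T @ (A @ m - b)

def steepest_descent(m0, alpha=0.05, tol=1e-8, maxit=5000):
    m = m0.copy()
    for _ in range(maxit):
        g = grad(m)
        if np.linalg.norm(g) < tol:     # stop when grad f(m) ~ 0
            break
        m = m - alpha * g               # fixed-point step on I - alpha grad f
    return m

m_star = steepest_descent(np.zeros(2))
# the limit satisfies the necessary condition grad f(m*) = 0, i.e. A m* = b
assert np.allclose(A @ m_star, b, atol=1e-6)
```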
How to find the zero of the gradient: second-order method
Newton method: graphical interpretation
A faster (quadratic) convergence can be achieved for finding the zero of ∇f(m) if we use the Newton method.
Newton method
We approximate ∇f(mk+1) by its first-order Taylor expansion at mk:

∇f(mk+1) ≃ ∇f(mk) + (∂∇f(mk)/∂mk)(mk+1 − mk) (1)

We look for the zero of this approximation

∇f(mk) + (∂∇f(mk)/∂mk)(mk+1 − mk) = 0 (2)

which yields

mk+1 = mk − (∂∇f(mk)/∂mk)⁻¹ ∇f(mk)
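A minimal Python sketch of this Newton iteration on a hypothetical smooth misfit with minimum at (1, 1), using an analytic gradient and Hessian (all illustrative):

```python
import numpy as np

# hypothetical Rosenbrock-type misfit f(m) = (m0-1)^2 + 10 (m1 - m0^2)^2
def grad(m):
    return np.array([2.0 * (m[0] - 1.0) - 40.0 * m[0] * (m[1] - m[0] ** 2),
                     20.0 * (m[1] - m[0] ** 2)])

def hess(m):
    return np.array([[2.0 - 40.0 * (m[1] - m[0] ** 2) + 80.0 * m[0] ** 2,
                      -40.0 * m[0]],
                     [-40.0 * m[0], 20.0]])

m = np.array([0.8, 0.64])   # initial guess near the minimum
for _ in range(10):
    # Newton update: m <- m - H(m)^{-1} grad f(m)
    m = m - np.linalg.solve(hess(m), grad(m))

assert np.allclose(m, [1.0, 1.0])
```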
Notations
In the following, we use the notation

∂∇f(mk)/∂mk = H(mk) (1)

for the Hessian operator (the matrix of second-order derivatives of the misfit function).
Decrease of the misfit function
Do we ensure the decrease of the misfit function?

f(mk+1) = f(mk − αk H(mk)⁻¹ ∇f(mk)) = f(mk) − αk ∇f(mk)^T H(mk)⁻¹ ∇f(mk) + O(αk² ‖H(mk)⁻¹ ∇f(mk)‖²)

We have

∇f(mk)^T H(mk)⁻¹ ∇f(mk) > 0 (1)

if and only if H(mk)⁻¹ is positive definite.
Difficulties
The Hessian operator is not necessarily positive definite: the function f(m) may not be strictly convex, as the forward problem is nonlinear (f(m) is not quadratic)

For large-scale applications, how do we compute H(m) and its inverse H(m)⁻¹?
The l-BFGS method
Principle
The l-BFGS method (Nocedal, 1980) relies on the iterative scheme

mk+1 = mk − αk Qk ∇f(mk) (1)

where

Qk ≃ H(mk)⁻¹, symmetric positive definite, (2)

and

αk ∈ R+* (3)

is a scalar parameter computed through a linesearch process
l-BFGS approximation
The l-BFGS approximation consists in defining Qk as

Qk = (V_{k−1}^T · · · V_{k−l}^T) Qk⁰ (V_{k−l} · · · V_{k−1})
   + ρ_{k−l} (V_{k−1}^T · · · V_{k−l+1}^T) s_{k−l} s_{k−l}^T (V_{k−l+1} · · · V_{k−1})
   + ρ_{k−l+1} (V_{k−1}^T · · · V_{k−l+2}^T) s_{k−l+1} s_{k−l+1}^T (V_{k−l+2} · · · V_{k−1})
   + · · ·
   + ρ_{k−1} s_{k−1} s_{k−1}^T (1)

where the pairs (sk, yk) are

sk = mk+1 − mk, yk = ∇f(mk+1) − ∇f(mk), (2)

the scalars ρk are

ρk = 1 / (yk^T sk), (3)

and the matrices Vk are defined by

Vk = I − ρk yk sk^T. (4)
Implementation: two-loops recursion
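The two-loop recursion applies Qk to a vector using only the stored pairs (sk, yk), without ever forming a matrix. A minimal Python sketch following the standard algorithm of Nocedal and Wright (2006), with illustrative names:

```python
import numpy as np

def lbfgs_direction(g, s_list, y_list):
    # Two-loop recursion: apply the l-BFGS inverse Hessian approximation
    # Qk to a vector g without forming any matrix.
    # s_list[i] = m_{i+1} - m_i, y_list[i] = grad f(m_{i+1}) - grad f(m_i).
    q = g.copy()
    cache = []
    for s, y in zip(reversed(s_list), reversed(y_list)):  # newest pair first
        rho = 1.0 / (y @ s)
        a = rho * (s @ q)
        q = q - a * y
        cache.append((a, rho, s, y))
    if s_list:
        s, y = s_list[-1], y_list[-1]
        q = q * (s @ y) / (y @ y)         # common choice of initial scaling Qk0
    for a, rho, s, y in reversed(cache):  # oldest pair first
        b = rho * (y @ q)
        q = q + (a - b) * s
    return q

# sanity check: the BFGS update enforces the secant condition Qk y_{k-1} = s_{k-1}
s = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
assert np.allclose(lbfgs_direction(y, [s], [y]), s)
```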
Truncated Newton method
Principle
The truncated Newton method (Nash, 2000) relies on the iterative scheme
mk+1 = mk + αk ∆mk (1)
where ∆mk is computed as an approximate solution of the linear system
H(mk )∆mk = −∇f (mk ) (2)
Implementation
A matrix-free conjugate gradient is used to solve this linear system (Saad, 2003)

This only requires the capability to compute matrix-vector products H(mk)v for given vectors v: the full Hessian matrix need not be formed explicitly

The resulting approximation of the Hessian only accounts for positive eigenvalues of H(mk): ∆mk is ensured to be a descent direction
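The next part of the lecture obtains H(mk)v with second-order adjoints; as a cheap stand-in, a Hessian-vector product can also be approximated by finite differences of the gradient. A Python sketch on a hypothetical quadratic misfit, where the exact Hessian AᵀA is available for comparison:

```python
import numpy as np

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])

def grad(m):
    # gradient of a hypothetical quadratic misfit f(m) = 1/2 ||A m - b||^2
    return A.T @ (A @ m - b)

def hessvec(m, v, eps=1e-6):
    # matrix-free product H(m) v by finite differences of the gradient;
    # the lecture's adjoint approach computes this exactly instead
    return (grad(m + eps * v) - grad(m)) / eps

m = np.zeros(2)
v = np.array([1.0, -1.0])
# here the exact Hessian is A^T A, so the product can be checked directly
assert np.allclose(hessvec(m, v), A.T @ A @ v, atol=1e-4)
```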
Conjugate gradient
Conjugate gradient for symmetric positive linear systems
The conjugate gradient is an iterative method for the solution of symmetric positive definite linear systems

Am = b (3)

The method enjoys several interesting properties:

Convergence in at most n iterations for a system of size n

Fast convergence is possible depending on the eigenvalue distribution of A: in practice, an acceptable approximation of the solution can be obtained in k iterations with k ≪ n
Implementation: only matrix-vector products to perform
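A minimal matrix-free implementation of the conjugate gradient in Python, requiring only products v ↦ Av (a standard sketch, not the slide's exact pseudo-code):

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-10, maxit=None):
    # matrix-free CG for A x = b, A symmetric positive definite;
    # only the product v -> A v is required, never A itself
    x = np.zeros_like(b)
    r = b - matvec(x)                 # residual
    p = r.copy()                      # search direction
    rs = r @ r
    maxit = maxit if maxit is not None else len(b)
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)         # exact step length along p
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p     # next A-conjugate direction
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(lambda v: A @ v, b)
assert np.allclose(A @ x, b)
```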
Nonlinear conjugate gradient
How can the conjugate gradient be extended to the solution of nonlinear minimization problems? There is a link: solving

Am = b (3)

where A is symmetric positive definite is equivalent to solving

min_m f(m) = (1/2) m^T Am − b^T m (4)

because

∇f(m) = Am − b (5)

and f is strictly convex (a single extremum, which is a minimum)
Implementation
Simply replace the residual r in the preceding algorithm by ∇f(m)
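A minimal nonlinear CG sketch of that substitution in Python, with a backtracking linesearch for αk; the βk formula here is the Fletcher–Reeves choice and the restart safeguard is an added assumption, not from the slide:

```python
import numpy as np

def nonlinear_cg(f, grad, m0, tol=1e-8, maxit=1000):
    # nonlinear CG: the residual r of the linear algorithm becomes grad f(m)
    m = m0.copy()
    g = grad(m)
    d = -g
    for _ in range(maxit):
        if np.linalg.norm(g) < tol:
            break
        alpha, gtd = 1.0, g @ d
        # backtracking linesearch enforcing a sufficient (Armijo) decrease
        while f(m + alpha * d) > f(m) + 1e-4 * alpha * gtd and alpha > 1e-14:
            alpha *= 0.5
        m = m + alpha * d
        g_new = grad(m)
        beta = (g_new @ g_new) / (g @ g)    # Fletcher-Reeves beta_k (assumption)
        d = -g_new + beta * d
        if d @ g_new >= 0:                  # safeguard: restart if not a descent direction
            d = -g_new
        g = g_new
    return m

# on a quadratic f(m) = 1/2 m^T A m - b^T m this solves A m = b
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
m = nonlinear_cg(lambda m: 0.5 * m @ A @ m - b @ m,
                 lambda m: A @ m - b,
                 np.zeros(2))
assert np.allclose(A @ m, b, atol=1e-6)
```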
Summary
An iterative scheme for local optimization
We have seen 4 different methods all based on the same iterative scheme
mk+1 = mk + αk ∆mk (3)
Nonlinear optimization methods
The four methods differ only in the way ∆mk is computed:

Steepest descent: ∆mk = −∇f(mk)
Nonlinear CG: ∆mk = −∇f(mk) + βk ∆mk−1
l-BFGS: ∆mk = −Qk ∇f(mk), Qk ≃ Hk⁻¹
Truncated Newton: ∆mk solves H(mk)∆mk = −∇f(mk) (with CG)
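The shared scheme can be made concrete with a driver that takes the direction rule as a parameter. A hedged Python sketch with two of the four rules, on a hypothetical quadratic misfit (all names illustrative):

```python
import numpy as np

def optimize(f, grad, direction, m0, tol=1e-8, maxit=2000):
    # generic local scheme m_{k+1} = m_k + alpha_k dm_k; only the
    # direction rule dm_k = direction(m, g) changes between methods
    m = m0.copy()
    for _ in range(maxit):
        g = grad(m)
        if np.linalg.norm(g) < tol:
            break
        dm = direction(m, g)
        alpha, gtd = 1.0, g @ dm
        while f(m + alpha * dm) > f(m) + 1e-4 * alpha * gtd and alpha > 1e-14:
            alpha *= 0.5                    # backtracking linesearch for alpha_k
        m = m + alpha * dm
    return m

A = np.array([[4.0, 1.0], [1.0, 3.0]])      # hypothetical quadratic misfit
b = np.array([1.0, 2.0])
f = lambda m: 0.5 * m @ A @ m - b @ m
grad = lambda m: A @ m - b

m_sd = optimize(f, grad, lambda m, g: -g, np.zeros(2))                      # steepest descent
m_nw = optimize(f, grad, lambda m, g: np.linalg.solve(A, -g), np.zeros(2))  # Newton direction
assert np.allclose(A @ m_sd, b, atol=1e-6) and np.allclose(A @ m_nw, b, atol=1e-6)
```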
Large-scale applications
From this quick overview we see that the two key quantities to be estimated for the solution of

min_m f(m) = (1/2) ‖dcal(m) − dobs‖² (3)

are

The gradient of the misfit function ∇f(m)

Hessian-vector products H(m)v for given vectors v (only for the truncated Newton method)

We shall see in the next part how to compute them at a reasonable computational cost (memory footprint and flops) for large-scale applications, using adjoint state methods
Outline
1 Numerical optimization methods for large-scale smooth unconstrained minimization problems
2 First-order and second-order adjoint state methods for gradient and Hessian-vector products computation
  Gradient computation of a nonlinear least-squares function
  First-order adjoint state method
  Second-order adjoint state method
3 Summary
Gradient computation of a nonlinear least-squares function
Framework
We consider the problem
min_m f(m) = (1/2) ‖dcal(m) − dobs‖²
For a perturbation dm we have

f(m + dm) = (1/2) ‖dcal(m + dm) − dobs‖²
          = (1/2) ‖dcal(m) − dobs + J(m)dm + O(‖dm‖²)‖²

where

J(m) = ∂dcal/∂m

is the Jacobian matrix
f(m + dm) = (1/2) ‖dcal(m) − dobs‖² + (dcal − dobs, J(m)dm) + O(‖dm‖²)
          = (1/2) ‖dcal(m) − dobs‖² + (J(m)^T (dcal − dobs), dm) + O(‖dm‖²)
Therefore
f(m + dm) − f(m) = (J(m)^T (dcal − dobs), dm) + O(‖dm‖²)
∇f(m) = J(m)^T (dcal − dobs)
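This formula can be verified on a small hypothetical forward map dcal(m), comparing J(m)ᵀ(dcal − dobs) against a centered finite difference of f (all names illustrative):

```python
import numpy as np

# hypothetical nonlinear forward map dcal(m) and its Jacobian
def d_cal(m):
    return np.array([m[0] ** 2, m[0] * m[1], np.sin(m[1])])

def jacobian(m):
    return np.array([[2.0 * m[0], 0.0],
                     [m[1], m[0]],
                     [0.0, np.cos(m[1])]])

d_obs = np.array([1.0, 0.5, 0.2])

def f(m):
    r = d_cal(m) - d_obs
    return 0.5 * r @ r

def grad(m):
    return jacobian(m).T @ (d_cal(m) - d_obs)   # grad f(m) = J(m)^T (dcal - dobs)

m = np.array([0.7, 0.3])
eps = 1e-6
fd = np.array([(f(m + eps * e) - f(m - eps * e)) / (2 * eps) for e in np.eye(2)])
assert np.allclose(grad(m), fd, atol=1e-6)
```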
Implementation for large-scale applications
The size of J(m) can be problematic for large-scale applications

After discretization, it is a matrix with N rows and M columns, where

1. N is the number of discrete data
2. M is the number of discrete model parameters

For Full Waveform Inversion, for instance, we can have approximately

N ≃ 10¹⁰, M ≃ 10⁹

This prevents us from

1. Computing J(m) at each iteration of the inversion
2. Storing J(m) in memory (storing it on disk is possible, but the expensive I/O severely degrades performance)
Can we avoid computing the Jacobian matrix?
Yes, using adjoint state methods
Outline

1 Numerical optimization methods for large-scale smooth unconstrained minimization problems

2 First-order and second-order adjoint state methods for gradient and Hessian-vector products computation
Gradient computation of a nonlinear least-squares function
First-order adjoint state method
Second-order adjoint state method

3 Summary
First-order adjoint state method
Specializing the forward problem

Now the problem is specialized such that

d_cal(m) = R u(m)

where u(m) satisfies

A(m, ∂x, ∂y, ∂z) u = s

Here u is the solution of the PDE (a wavefield, for instance) in the whole volume, and R is an extraction operator: most of the time, only partial measurements are available.
References

The adjoint state method comes from optimal control theory and the preliminary work of (Lions, 1968). It was first applied to seismic imaging by (Chavent, 1974). A nice review of its application in this field is given by (Plessix, 2006).
The Lagrangian function

From constrained optimization, we introduce the function

L(m, u, λ) = (1/2) ‖R u − d_obs‖² + (A(m, ∂x, ∂y, ∂z) u − s, λ)

Link with the misfit function

Let u(m) be the solution of the forward problem for a given m; then, for any λ,

L(m, u(m), λ) = (1/2) ‖R u(m) − d_obs‖² = f(m)

Link with the gradient of the misfit function

Therefore, differentiating with respect to m,

dL(m, u(m), λ)/dm = ∇f(m)
Expanding

This means that

(∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ) + ∂L(m, u(m), λ)/∂u · ∂u(m)/∂m = ∇f(m)

Potential simplification

Therefore, if we define λ(m) such that

∂L(m, u(m), λ(m))/∂u = 0

we have

(∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ(m)) = ∇f(m)
Adjoint state formula

What does ∂L(m, u(m), λ(m))/∂u = 0 mean?
Consider a perturbation du. We have

L(m, u + du, λ) = (1/2) ‖R u − d_obs + R du‖² + (A(m) u − s + A(m) du, λ)

  = (1/2) ‖R u − d_obs‖² + (R u − d_obs, R du) + (A(m) u − s, λ) + (A(m) du, λ) + o(‖du‖)

  = L(m, u, λ) + (R^T (R u − d_obs), du) + (du, A(m)^T λ) + o(‖du‖)

  = L(m, u, λ) + (A(m)^T λ + R^T (R u − d_obs), du) + o(‖du‖)

Therefore

∂L(m, u, λ)/∂u = A(m)^T λ + R^T (R u − d_obs)
Adjoint state equation

Remember we are looking for λ(m) such that

∂L(m, u(m), λ(m))/∂u = 0

This simply means that λ(m) should be the solution of the adjoint PDE

A(m)^T λ + R^T (R u(m) − d_obs) = 0

Self-adjoint case

In some cases the forward problem is self-adjoint, and the adjoint state λ(m) is the solution of the same equation as u(m), only with a different source term. Note that this adjoint source depends on u(m), so the forward field must be computed beforehand.
Summary

We have seen that the gradient of the misfit function can be computed through the formula

∇f(m) = (∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ(m))

where u(m) satisfies the forward problem

A(m, ∂x, ∂y, ∂z) u = s

and λ(m) satisfies the adjoint problem

A(m, ∂x, ∂y, ∂z)^T λ + R^T (R u(m) − d_obs) = 0
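A minimal discrete sketch of this three-step recipe, where the toy operator A(m) = diag(m) + K is an assumption made purely for illustration (so that ∂A/∂m_i = e_i e_i^T and the gradient reduces component-wise to u_i λ_i), checked against finite differences:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 6, 3                      # model size, number of receivers

K = rng.standard_normal((M, M)) * 0.1    # fixed part of the toy operator
s = rng.standard_normal(M)               # source term
R = np.zeros((N, M)); R[np.arange(N), [0, 2, 4]] = 1.0  # extraction operator
d_obs = rng.standard_normal(N)

def A(m):                        # toy operator (illustrative assumption)
    return np.diag(m) + K

def forward(m):                  # u(m) solves A(m) u = s
    return np.linalg.solve(A(m), s)

def misfit(m):
    r = R @ forward(m) - d_obs
    return 0.5 * r @ r

def gradient(m):
    u = forward(m)                                          # first PDE solve
    lam = np.linalg.solve(A(m).T, -R.T @ (R @ u - d_obs))   # adjoint solve
    # dA/dm_i = e_i e_i^T here, so (dA/dm_i u, lam) = u_i * lam_i
    return u * lam

m = 2.0 + rng.random(M)          # keeps A(m) well conditioned
g = gradient(m)

# finite-difference check of the adjoint gradient
eps = 1e-6
g_fd = np.array([(misfit(m + eps * np.eye(M)[i]) - misfit(m - eps * np.eye(M)[i]))
                 / (2 * eps) for i in range(M)])
assert np.allclose(g, g_fd, atol=1e-6)
```

Two linear solves replace N × M Jacobian entries: the structure is the same in the PDE setting, only the solver changes.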
Implementation issues

What are the benefits of the adjoint-state approach?

To compute the gradient, we first compute u(m): a first PDE solve. Then we compute λ(m): a second PDE solve. Finally we form the gradient through the formula

∇f(m) = (∂A(m, ∂x, ∂y, ∂z)/∂m u(m), λ(m))

The Jacobian matrix never has to be formed or stored explicitly!
Second-order adjoint state method

Computing Hessian-vector products

We have seen that, in the particular case of the truncated Newton method, it is required to know how to compute, for any v, the Hessian-vector product

H(m) v

However, as for the Jacobian matrix J(m), the size of the matrix H(m) for large-scale applications is such that it can be neither computed explicitly nor stored. Again, the adjoint-state method allows us to overcome this difficulty; see (Fichtner and Trampert, 2011; Epanomeritakis et al., 2008; Metivier et al., 2013).
Principle of the method

Consider the function

h_v(m) = (∇f(m), v)

For a perturbation dm we have

h_v(m + dm) = (∇f(m + dm), v)

  = (∇f(m) + H(m) dm, v) + o(‖dm‖)

  = (∇f(m), v) + (H(m) dm, v) + o(‖dm‖)

  = (∇f(m), v) + (dm, H(m) v) + o(‖dm‖)

  = h_v(m) + (dm, H(m) v) + o(‖dm‖)

H(m)v through the gradient of h_v

Therefore

∇h_v(m) = H(m) v

All we have to do is apply the previous strategy to the function h_v(m)!
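The identity ∇h_v(m) = H(m)v also underlies the common matrix-free shortcut H(m)v ≈ (∇f(m + εv) − ∇f(m − εv))/(2ε). A quick check on a hypothetical component-wise forward map d_cal(m) = sin(m), chosen for illustration only so that H(m) is known in closed form:

```python
import numpy as np

# Toy misfit with forward map d_cal(m) = sin(m) (illustrative assumption).
d_obs = np.array([0.3, -0.1, 0.7])

def grad(m):                     # ∇f(m) = J(m)^T (d_cal(m) - d_obs), J = diag(cos m)
    return np.cos(m) * (np.sin(m) - d_obs)

def hess_vec(m, v):              # exact H(m) v for this toy problem (diagonal Hessian)
    h_diag = np.cos(m)**2 - np.sin(m) * (np.sin(m) - d_obs)
    return h_diag * v

m = np.array([0.2, 1.1, -0.5])
v = np.array([1.0, -2.0, 0.5])

# H(m) v as the directional derivative of the gradient along v
eps = 1e-6
hv_fd = (grad(m + eps * v) - grad(m - eps * v)) / (2 * eps)
assert np.allclose(hess_vec(m, v), hv_fd, atol=1e-7)
```

The finite-difference version costs two extra gradients but introduces a step-size ε to tune; the second-order adjoint method below gives the same product exactly.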
Consider the new Lagrangian function

L_v(m, u, λ, g, μ1, μ2, μ3) = (g, v) + ((∂A(m)/∂m u)^T λ − g, μ1) + (A(m)^T λ + R^T (R u − d_obs), μ2) + (A(m) u − s, μ3)

The μ1 term enforces the definition of the gradient g, the μ2 term the adjoint equation, and the μ3 term the forward equation.

For u = u(m), λ = λ(m), g = g(m), respectively solutions of

A(m) u = s,   A(m)^T λ = −R^T (R u(m) − d_obs),   g(m) = (∂A(m)/∂m u(m))^T λ(m),

we have, for any μ1, μ2, μ3,

L_v(m, u(m), λ(m), g(m), μ1, μ2, μ3) = h_v(m)

Hence

dL_v(m, u(m), λ(m), g(m), μ1, μ2, μ3)/dm = ∇h_v(m) = H(m) v
Again, we develop the previous expression:

dL_v(m, u(m), λ(m), g(m), μ1, μ2, μ3)/dm =

  ((∂²A(m)/∂m² u(m))^T λ(m), μ1)
  + (∂A(m)^T/∂m λ(m), μ2)
  + (∂A(m)/∂m u(m), μ3)
  + ∂L_v/∂u · ∂u/∂m + ∂L_v/∂λ · ∂λ/∂m + ∂L_v/∂g · ∂g/∂m

where the partial derivatives of L_v are evaluated at (m, u(m), λ(m), g(m), μ1, μ2, μ3).
Now we look for μ1, μ2, μ3 such that

∂L_v(m, u(m), λ(m), g(m), μ1, μ2, μ3)/∂u = 0
∂L_v(m, u(m), λ(m), g(m), μ1, μ2, μ3)/∂λ = 0
∂L_v(m, u(m), λ(m), g(m), μ1, μ2, μ3)/∂g = 0

This is equivalent to

(∂A/∂m μ1)^T λ(m) + R^T R μ2 + A(m)^T μ3 = 0
(∂A/∂m u(m)) μ1 + A(m) μ2 = 0
v − μ1 = 0
Reorganizing these equations, we find that

μ1 = v
A(m) μ2 = −(∂A/∂m u(m)) v
A(m)^T μ3 = −(∂A/∂m v)^T λ(m) − R^T R μ2

Implementation

μ1 is given for free: it is v

μ2 is the solution of a forward problem with a new source term depending on v and u(m)

μ3 is the solution of an adjoint problem with a new source term depending on v, λ(m) and μ2
Summary

The product H(m)v for a given v can be obtained through the formula

H(m)v = ((∂²A(m)/∂m² u(m))^T λ(m), μ1) + (∂A(m)^T/∂m λ(m), μ2) + (∂A(m)/∂m u(m), μ3)   (4)

where

Forward and adjoint simulations

u(m) is computed as the solution of the forward problem
λ(m) is computed as the solution of the adjoint problem
μ2 is computed as the solution of the forward problem with a new source term
μ3 is computed as the solution of the adjoint problem with a new source term
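A minimal sketch of formula (4) on the same kind of toy discrete operator as before, A(m) = diag(m) + K (an illustrative assumption, for which ∂²A/∂m² = 0 so the first term of (4) vanishes), validated against a finite-difference directional derivative of the adjoint gradient:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 6, 3

K = rng.standard_normal((M, M)) * 0.1
s = rng.standard_normal(M)
R = np.zeros((N, M)); R[np.arange(N), [0, 2, 4]] = 1.0
d_obs = rng.standard_normal(N)

def A(m):                         # toy operator (illustrative assumption)
    return np.diag(m) + K

def gradient(m):                  # first-order adjoint gradient
    u = np.linalg.solve(A(m), s)
    lam = np.linalg.solve(A(m).T, -R.T @ (R @ u - d_obs))
    return u * lam                # (dA/dm_i u, lam) = u_i lam_i

def hessian_vector(m, v):         # second-order adjoint: two extra solves
    u = np.linalg.solve(A(m), s)
    lam = np.linalg.solve(A(m).T, -R.T @ (R @ u - d_obs))
    mu2 = np.linalg.solve(A(m), -(v * u))                        # forward-type solve
    mu3 = np.linalg.solve(A(m).T, -(v * lam) - R.T @ (R @ mu2))  # adjoint-type solve
    # d^2A/dm^2 = 0 for this toy operator, so the first term of (4) drops out
    return lam * mu2 + u * mu3

m = 2.0 + rng.random(M)
v = rng.standard_normal(M)

# check against the directional derivative of the gradient along v
eps = 1e-6
hv_fd = (gradient(m + eps * v) - gradient(m - eps * v)) / (2 * eps)
assert np.allclose(hessian_vector(m, v), hv_fd, atol=1e-6)
```

As announced, each Hessian-vector product costs one additional forward-type and one additional adjoint-type solve on top of u(m) and λ(m).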
Summary
Optimization methods for nonlinear least-squares problems

min_m f(m) = (1/2) ‖d_cal(m) − d_obs‖²

An iterative scheme for local optimization

Local optimization methods are all based on the same iterative scheme

m_{k+1} = m_k + α_k Δm_k   (5)

Four nonlinear optimization methods

The differences come from the computation of Δm_k:

Steepest descent: Δm_k = −∇f(m_k)
Nonlinear CG: Δm_k = −∇f(m_k) + β_k Δm_{k−1}
l-BFGS: Δm_k = −Q_k ∇f(m_k), with Q_k ≈ H_k⁻¹
Truncated Newton: H(m_k) Δm_k = −∇f(m_k) (solved with CG)
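As an illustration of scheme (5), a minimal steepest-descent loop with a backtracking line search for α_k, on a small linear least-squares toy problem (the matrix G below is an arbitrary example, not from the lecture):

```python
import numpy as np

G = np.array([[3.0, 1.0], [1.0, 2.0], [0.5, 0.5]])   # toy linear forward map
d_obs = np.array([1.0, 0.0, 0.5])

def f(m):                          # f(m) = (1/2) ||G m - d_obs||^2
    r = G @ m - d_obs
    return 0.5 * r @ r

def grad(m):                       # J^T (d_cal(m) - d_obs), with J = G here
    return G.T @ (G @ m - d_obs)

m = np.zeros(2)
for k in range(1000):              # iterative scheme (5): m <- m + alpha * dm
    g = grad(m)
    if np.linalg.norm(g) < 1e-10:
        break
    dm = -g                        # steepest-descent direction
    alpha = 1.0                    # backtracking: halve alpha until Armijo decrease
    while f(m + alpha * dm) > f(m) + 1e-4 * alpha * (g @ dm):
        alpha *= 0.5
    m = m + alpha * dm

assert np.linalg.norm(grad(m)) < 1e-7
```

Swapping the line `dm = -g` for an l-BFGS or Newton-type direction changes the method but not the surrounding loop, which is the point of scheme (5).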
Adjoint methods

The gradient can be computed through the first-order adjoint method at the price of:

1 forward modeling
1 adjoint modeling

The Hessian-vector product (required only for the truncated Newton method) can be computed through the second-order adjoint method at the price of:

1 additional forward modeling
1 additional adjoint modeling
SEISCOPE Toolbox

A set of optimization routines in FORTRAN90:

Optimization routines for differentiable functions
Steepest descent, nonlinear conjugate gradient
l-BFGS, truncated Newton

Implemented using a reverse communication protocol: the user is in charge of computing the gradient and Hessian-vector products.

Open-source code available at
https://seiscope2.obs.ujf-grenoble.fr/SEISCOPE-OPTIMIZATION-TOOLBOX
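The reverse communication pattern can be sketched as follows. This is a schematic in Python with hypothetical flag names and a toy steepest-descent "optimizer", not the actual FORTRAN90 interface of the toolbox: the key point is that the optimizer returns control to the caller whenever it needs a gradient, instead of calling user code itself.

```python
import numpy as np

def optimizer_step(state, m, g):
    """Toy steepest-descent optimizer in reverse-communication style.

    Returns (flag, m): 'NEED_GRAD' asks the caller to evaluate the
    gradient at m; 'CONVERGED' ends the loop. Flag names are invented.
    """
    if g is not None and np.linalg.norm(g) < 1e-8:
        return "CONVERGED", m
    if g is None:                  # first call: request the initial gradient
        return "NEED_GRAD", m
    return "NEED_GRAD", m - state["alpha"] * g   # take a step, ask again

# user side: a simple quadratic misfit f(m) = (1/2) ||m - t||^2
t = np.array([1.0, -2.0])
m, g = np.zeros(2), None
state = {"alpha": 0.5}
for _ in range(100):
    flag, m = optimizer_step(state, m, g)
    if flag == "CONVERGED":
        break
    g = m - t                      # the user computes the gradient here
assert np.linalg.norm(m - t) < 1e-7
```

This inversion of control is what lets one optimizer library drive arbitrary user-side forward/adjoint solvers without any callback interface.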
Acknowledgments

Thank you for your attention

National HPC facilities of GENCI-IDRIS-CINES under grant 046091

Local HPC facilities of CIMENT-SCCI (Univ. Grenoble) and SIGAMM (Obs. Nice)

SEISCOPE sponsors: http://seiscope2.osug.fr
A few references

Chavent, G. (1974). Identification of parameter distributed systems. In Goodson, R. and Polis, M., editors, Identification of function parameters in partial differential equations, pages 31–48. American Society of Mechanical Engineers, New York.

Epanomeritakis, I., Akcelik, V., Ghattas, O., and Bielak, J. (2008). A Newton-CG method for large-scale three-dimensional elastic full waveform seismic inversion. Inverse Problems, 24:1–26.

Fichtner, A. and Trampert, J. (2011). Hessian kernels of seismic data functionals based upon adjoint techniques. Geophysical Journal International, 185(2):775–798.

Lions, J. L. (1968). Contrôle optimal de systèmes gouvernés par des équations aux dérivées partielles. Dunod, Paris.

Metivier, L., Brossier, R., Virieux, J., and Operto, S. (2013). Full Waveform Inversion and the truncated Newton method. SIAM Journal on Scientific Computing, 35(2):B401–B437.

Nash, S. G. (2000). A survey of truncated Newton methods. Journal of Computational and Applied Mathematics, 124:45–59.

Nocedal, J. (1980). Updating quasi-Newton matrices with limited storage. Mathematics of Computation, 35(151):773–782.

Nocedal, J. and Wright, S. J. (2006). Numerical Optimization. Springer, 2nd edition.

Plessix, R. E. (2006). A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophysical Journal International, 167(2):495–503.

Saad, Y. (2003). Iterative methods for sparse linear systems. SIAM, Philadelphia.