FEniCS Course
Lecture 13: Introduction to dolfin-adjoint

Contributors
Simon Funke
Patrick Farrell
Marie E. Rognes
Magne Nordaas

1 / 34

Adjoints are key ingredients for sensitivity analysis, PDE-constrained optimization, ...

So far we have focused on solving forward PDEs.

But we want to do (and can do) more than that!

Maybe we are interested in ...

• the sensitivity with respect to problem parameters:
  • initial conditions,
  • forcing terms,
  • coefficients (constant or with spatial/temporal dependence)
• Uncertainty quantification
• PDE-constrained optimization:
  • data assimilation
  • optimal control
• goal-oriented error control

For this we want to compute functional derivatives, and adjoints provide an efficient way of doing so.

2 / 34

The Hello World of functional derivatives

Consider the Poisson equation

−ν∆u = m in Ω,

u = 0 on ∂Ω,

together with the objective functional

J(u) = ½ ∫_Ω ‖u − u_d‖² dx,

where u_d is a known function.

Goal: Compute the sensitivity of J with respect to the parameter m: dJ/dm.

3 / 34

Computing functional derivatives (Part 1/3)

Given

• Parameter m,

• PDE F(u, m) = 0 with solution u,

• Objective functional J(u, m) → ℝ.

Goal: Compute the (total) derivative dJ/dm.

Reduced functional
Consider u as an implicit function of m by solving the PDE. With that we define the reduced functional R:

R(m) = J(u(m),m)

4 / 34

Computing functional derivatives (Part 2/3)

Reduced functional:

R(m) ≡ J(u(m),m).

Taking the derivative with respect to m yields:

dR/dm = dJ/dm = ∂J/∂u · du/dm + ∂J/∂m.

Computing ∂J/∂u and ∂J/∂m is straightforward, but how do we handle du/dm?

5 / 34

Computing functional derivatives (Part 3/3)

Taking the derivative of F (u,m) = 0 with respect to m yields:

dF/dm = ∂F/∂u · du/dm + ∂F/∂m = 0

Hence:

du/dm = −(∂F/∂u)⁻¹ ∂F/∂m

Final formula for the functional derivative

dJ/dm = −∂J/∂u (∂F/∂u)⁻¹ ∂F/∂m + ∂J/∂m,

where the grouping ∂J/∂u (∂F/∂u)⁻¹ corresponds to solving the adjoint PDE, and (∂F/∂u)⁻¹ ∂F/∂m to solving the tangent linear PDE.

6 / 34

Dimensions of a finite dimensional example

dJ/dm = −∂J/∂u × (∂F/∂u)⁻¹ × ∂F/∂m + ∂J/∂m

Here −∂J/∂u × (∂F/∂u)⁻¹ is the discretised adjoint PDE, and (∂F/∂u)⁻¹ × ∂F/∂m is the discretised tangent linear PDE.

The tangent linear solution is a matrix of dimension |u| × |m| and requires the solution of |m| linear systems.

The adjoint solution is a vector of dimension |u| and requires the solution of one linear system.

7 / 34

Adjoint approach

1 Solve the adjoint equation for λ:

  (∂F/∂u)* λ = −∂J/∂u.

2 Compute

  dJ/dm = λ* ∂F/∂m + ∂J/∂m.

The computationally expensive part is (1). It requires solving the (linear) adjoint PDE, and its cost is independent of the choice of parameter m.
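To make this cost comparison concrete, here is a small self-contained NumPy sketch (purely illustrative sizes and random data, not dolfin-adjoint) for a linear model F(u, m) = Au − Bm = 0 with J(u) = ½‖u − u_d‖²; it checks that the tangent linear and adjoint routes give the same gradient:

import numpy as np

n, p = 5, 3                                   # |u| = 5 state unknowns, |m| = 3 parameters
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n)) + n*np.eye(n) # dF/du (made nonsingular)
B = rng.standard_normal((n, p))               # so that dF/dm = -B
u_d = rng.standard_normal(n)
m = rng.standard_normal(p)

u = np.linalg.solve(A, B @ m)                 # forward solve: A u = B m

# Tangent linear approach: one linear solve per parameter (|m| solves in total).
du_dm = np.linalg.solve(A, B)                 # |u| x |m| matrix
dJdm_tlm = (u - u_d) @ du_dm

# Adjoint approach: a single linear solve with A^T, independent of |m|.
lam = np.linalg.solve(A.T, -(u - u_d))        # adjoint solve for lambda
dJdm_adj = lam @ (-B)                         # lambda^T dF/dm (the dJ/dm term is zero here)

print(np.allclose(dJdm_tlm, dJdm_adj))        # True: both routes agree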

8 / 34

What is PDE-constrained optimisation?

Optimisation problems where at least one constraint is a partial differential equation.

Applications

• Data assimilation. Example: Weather modelling.

• Shape and topology optimisation. Example: Optimal shape of an aerofoil.

• Parameter estimation.

• Optimal control.

• ...

9 / 34

Hello World of PDE-constrained optimisation!

We will solve the optimal control of the Poisson equation:

min_{u,m} ½ ∫_Ω ‖u − u_d‖² dx + (α/2) ∫_Ω ‖m‖² dx

subject to

−∆u = m in Ω

u = u_0 on ∂Ω

• This problem can be physically interpreted as: find the heating/cooling term m for which u best approximates the desired heat distribution u_d.

• The second term in the objective functional, known as Tikhonov regularisation, ensures existence and uniqueness for α > 0.

10 / 34

The canonical abstract form

min_{u,m} J(u, m)

subject to:

F(u, m) = 0,

with

• the objective functional J .

• the parameter m.

• the PDE operator F with solution u, parametrised by m.

11 / 34

The reduced problem

min_m J(m) = J(u(m), m)

with

• the reduced functional J .

• the parameter m.

How do we solve this problem?

• Gradient descent.

• Newton method.

• Quasi-Newton methods.

12 / 34

Gradient descent

Algorithm

1 Choose an initial parameter value m_0 and γ > 0.

2 For i = 0, 1, ...:
  • m_{i+1} = m_i − γ∇J(m_i)
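As a small illustration of this update rule outside FEniCS, here is a NumPy sketch on a made-up quadratic functional (m_star, the step size and the iteration count are arbitrary choices):

import numpy as np

m_star = np.array([1.0, -2.0, 0.5])
grad_J = lambda m: m - m_star       # gradient of J(m) = 0.5*||m - m_star||^2

m = np.zeros(3)                     # initial guess m_0
gamma = 0.5                         # fixed step size
for i in range(50):
    m = m - gamma * grad_J(m)       # m_{i+1} = m_i - gamma * grad J(m_i)

print(m)                            # approaches m_star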

Features

+ Easy to implement.

– Slow convergence.

13 / 34

Newton method

Optimisation problem: min_m J(m).
Optimality condition:

∇J(m) = 0. (1)

Newton method applied to (1):

1 Choose initial parameter value m_0.

2 For i = 0, 1, ...:
  • H(J) δm = −∇J(m_i), where H denotes the Hessian.
  • m_{i+1} = m_i + δm
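For a concrete (non-FEniCS) illustration of this iteration, SciPy's Newton-CG can be run on the standard Rosenbrock test function as a stand-in for the reduced functional; only gradient and Hessian callbacks are needed:

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

m0 = np.zeros(5)                    # initial parameter value
res = minimize(rosen, m0, method="Newton-CG",
               jac=rosen_der, hess=rosen_hess)
print(res.x)                        # close to the minimiser (1, ..., 1)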

Features

+ Fast (locally quadratic) convergence.

– Requires iteratively solving a linear system with the Hessian, which might require many Hessian action computations.

– Hessian might not be positive definite, resulting in an update δm which is not a descent direction.

14 / 34

Quasi-Newton methods

Like the Newton method, but uses an approximate, low-rank Hessian built from gradient information only. A common approximation method is BFGS.
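Continuing the SciPy illustration from the Newton slide, the same toy problem can be solved with L-BFGS-B using only gradient evaluations (no Hessian callback):

import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

res = minimize(rosen, np.zeros(5), method="L-BFGS-B", jac=rosen_der)
print(res.x)                        # again close to (1, ..., 1)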

Features

+ Robust: Hessian approximation is always positive definite.

+ Cheap: No Hessian computation required, only gradient computations.

– Only superlinear convergence rate.

15 / 34

Solving the optimal Poisson problem

from fenics import *

from dolfin_adjoint import *

# Solve Poisson problem

# ...

J = Functional(inner(s, s)*dx)

m = SteadyParameter(f)

rf = ReducedFunctional(J, m)

m_opt = minimize(rf, method="L-BFGS-B", tol=1e-2)

Tips

• You can call print_optimization_methods() to list all available methods.

• Use maximize if you want to solve a maximisation problem.
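For reference, one possible way to fill in the elided forward solve above, reusing the structure of the Poisson solver shown later in this lecture; the mesh size, u_d and alpha are illustrative assumptions, not values prescribed by the slides:

from fenics import *
from dolfin_adjoint import *

# Forward Poisson problem (same structure as the Poisson solver slide)
mesh = UnitSquareMesh(50, 50)
V = FunctionSpace(mesh, "Lagrange", 1)
u, v = TrialFunction(V), TestFunction(V)
f = interpolate(Constant(0.0), V)           # initial guess for the control
a = inner(grad(u), grad(v))*dx
L = f*v*dx
bc = DirichletBC(V, 0.0, "on_boundary")
u = Function(V)
solve(a == L, u, bc)

# Objective: misfit against a desired profile u_d plus Tikhonov regularisation
u_d = interpolate(Expression("sin(pi*x[0])*sin(pi*x[1])", degree=2), V)
alpha = Constant(1e-6)
J = Functional(0.5*inner(u - u_d, u - u_d)*dx + 0.5*alpha*inner(f, f)*dx)

m = SteadyParameter(f)
rf = ReducedFunctional(J, m)
m_opt = minimize(rf, method="L-BFGS-B", tol=1e-2)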

16 / 34

Bound constraints

Sometimes it is useful to specify lower and upper bounds for parameters:

lb ≤ m ≤ ub. (2)

Example:

lb = interpolate(Constant(0.0), V)
ub = interpolate(Expression("x[0]", degree=1), V)
m_opt = minimize(rf, method="L-BFGS-B",
                 bounds=[lb, ub])

Note: Not all optimisation algorithms support bound constraints.

17 / 34

Inequality constraints

Sometimes it is useful to specify (in-)equality constraints on the parameters:

g(m) ≤ 0. (3)

You can do that by overloading the InequalityConstraint class. For more information visit the Example section on dolfin-adjoint.org.
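As a rough orientation, such a subclass typically provides the constraint value, its Jacobian and the number of constraint components. The sketch below is schematic only, loosely modelled on the constraint examples on dolfin-adjoint.org; the exact method names, signatures and the type in which m is passed should be verified against that documentation:

# Schematic sketch only: one inequality constraint g(m) = V_max - \int_Omega m dx >= 0.
# Method names follow the documented dolfin-adjoint constraint examples (verify there).
class VolumeConstraint(InequalityConstraint):
    def __init__(self, V_max):
        self.V_max = float(V_max)    # assumed upper bound on the integral of m

    def function(self, m):
        # Constraint value at the current control; non-negative means feasible.
        # (Depending on the dolfin-adjoint version, m may arrive as a Function
        # or as an array of control values.)
        return [self.V_max - assemble(m*dx)]

    def jacobian(self, m):
        # Derivative of the constraint with respect to the control.
        return [assemble(-TestFunction(m.function_space())*dx)]

    def length(self):
        # Number of components in the constraint vector.
        return 1

# Hypothetical usage with an algorithm that supports constraints:
# m_opt = minimize(rf, method="SLSQP", constraints=VolumeConstraint(0.5))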

18 / 34

The FEniCS challenge!

1 Solve the "Hello World" PDE-constrained optimisation problem on the unit square with u_d(x, y) = sin(πx) sin(πy), homogeneous boundary conditions and α = 10⁻⁶.

2 Compute the difference between the optimised heat profile and u_d before and after the optimisation.

3 Use the optimisation algorithms SLSQP, Newton-CG and L-BFGS-B and compare them.

4 What happens if you increase α?

19 / 34

What is dolfin-adjoint?

Dolfin-adjoint is an extension of FEniCS for: solving adjoint and tangent linear equations; generalised stability analysis; PDE-constrained optimisation.

Main features

• Automated derivation of first and second order adjoint and tangent linear models.

• Discretely consistent derivatives.

• Parallel support and near theoretically optimal performance.

• Interface to optimisation algorithms for PDE-constrained optimisation.

• Documentation and examples on www.dolfin-adjoint.org.

20 / 34

What has dolfin-adjoint been used for? Layout optimisation of tidal turbines

• Up to 400 tidal turbines in one farm.

• What are the optimal locations to maximise power production?

21 / 34

What has dolfin-adjoint been used for? Layout optimisation of tidal turbines

[Figure: power production (Power [MW], roughly 600–850 MW) versus optimisation iteration (0–140).]

22 / 34

What has dolfin-adjoint been used for? Layout optimisation of tidal turbines

from dolfin import *

from dolfin_adjoint import *

# FEniCS model

# ...

J = Functional(turbines*inner(u, u)**(3/2)*dx*dt)

m = Control(turbine_positions)

R = ReducedFunctional(J, m)

maximize(R)

23 / 34

Other applications

Dolfin-adjoint has been applied to lots of other cases, and works for many PDEs:

Some PDEs we have adjoined

• Burgers

• Navier-Stokes

• Stokes + mantle rheology

• Stokes + ice rheology

• Saint Venant + wetting/drying

• Cahn-Hilliard

• Gray-Scott

• Shallow ice

• Blatter-Pattyn

• Quasi-geostrophic

• Viscoelasticity

• Gross-Pitaevskii

• Yamabe

• Image registration

• Bidomain

• . . .

24 / 34

Consider again our first example

Compute the sensitivity ∂J/∂m of

J(u) = ∫_Ω ‖u − u_d‖² dx

with known u_d and the Poisson equation:

−ν∆u = m in Ω

u = 0 on ∂Ω.

with respect to m.

25 / 34

Poisson solver in FEniCS

An implementation of the Poisson problem might look like this:

from fenics import *

# Define mesh and finite element space

mesh = UnitSquareMesh(50, 50)

V = FunctionSpace(mesh, "Lagrange", 1)

# Define basis functions and parameters

u = TrialFunction(V)

v = TestFunction(V)

m = interpolate(Constant(1.0), V)

nu = Constant(1.0)

# Define variational problem

a = nu*inner(grad(u), grad(v))*dx

L = m*v*dx

bc = DirichletBC(V, 0.0, "on_boundary")

# Solve variational problem

u = Function(V)

solve(a == L, u, bc)

plot(u, title="u")

26 / 34

Dolfin-adjoint records all relevant steps of your FEniCS code

The first change necessary to adjoin this code is to import the dolfin-adjoint module after importing DOLFIN:

from fenics import *

from dolfin_adjoint import *

With this, dolfin-adjoint will record each step of the model, building an annotation. The annotation is used to symbolically manipulate the recorded equations to derive the tangent linear and adjoint models.

In this particular example, the solve function will be recorded.

27 / 34

Dolfin-adjoint extends the FEniCS syntax for defining objective functionals

Next, we implement the objective functional, the squared L2-norm of u − u_d:

J(u) = ∫_Ω ‖u − u_d‖² dx

or in code

j = inner(u - u_d, u - u_d)*dx

J = Functional(j)
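For this snippet to run, u_d must already be defined on the same function space as u; one possible (hypothetical) choice:

# Hypothetical target profile u_d; any Function on V (or an Expression) works.
u_d = interpolate(Expression("sin(pi*x[0])*sin(pi*x[1])", degree=2), V)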

28 / 34

Specify which parameter you want to differentiate with respect to using Controls

Next we need to decide which parameter we are interested in. Here, we would like to investigate the sensitivity with respect to the source term m.

We inform dolfin-adjoint of this:

mc = Control(m)

29 / 34

One line of code for efficiently computing gradients

Now, we can compute the gradient with:

dJdm = compute_gradient(J, mc, project=True)

Dolfin-adjoint derives and solves the adjoint equations for us and returns the gradient.

Note
If you call compute_gradient more than once, you need to pass forget=False as a parameter. Otherwise you get an error:
Need a value for u_1:0:0:Forward, but don't have one recorded.

Computational cost

Computing the gradient requires one adjoint solve.

30 / 34

One line of code for efficiently computing Hessians

Dolfin-adjoint can also compute the second derivatives (Hessians):

H = hessian(J, mc)

direction = interpolate(Constant(1), V)

plot(H(direction))

Computational cost

Computing the directional second derivative requires one tangent linear and two adjoint solves.

31 / 34

Verification: How can you check that the gradient is correct?

Taylor expansion of the reduced functional R in a perturbation δm yields:

|R(m + εδm) − R(m)| → 0 at O(ε)

but

|R(m + εδm) − R(m) − ε∇R · δm| → 0 at O(ε²)

Taylor test
Choose m, δm and determine the convergence rate by reducing ε. If the convergence order with the gradient is ≈ 2, your gradient is probably correct.

R = ReducedFunctional(J, mc)

R.taylor_test(m)
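The same check can also be scripted by hand. A minimal sketch, assuming R is the reduced functional above, m the control Function, dJdm the projected gradient from compute_gradient(J, mc, project=True), and V the function space:

# Manual Taylor remainder check (sketch). The remainder including the gradient
# term should decrease roughly quadratically as eps is reduced.
Jm = R(m)                                  # evaluate the reduced functional at m
dm = interpolate(Constant(1.0), V)         # an arbitrary perturbation direction

for eps in [1e-2, 1e-3, 1e-4, 1e-5]:
    m_eps = Function(V)
    m_eps.vector()[:] = m.vector()[:] + eps*dm.vector()[:]
    remainder = abs(R(m_eps) - Jm - eps*assemble(inner(dJdm, dm)*dx))
    print(eps, remainder)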

32 / 34

The next generation of dolfin-adjoint is under development

• Complete rewrite of dolfin-adjoint

• Implemented purely in python

• Codebase reduced by ∼ 90%

• Simplified interface

• Compute derivatives with respect to boundary conditions

• ... and more

bitbucket.org/dolfin-adjoint/pyadjoint

33 / 34

Dolfin-adjoint Exercise 1

1 Install dolfin-adjoint

$ fenicsproject notebook myproject \

> quay.io/dolfinadjoint/dolfin-adjoint

2 Compute the gradient and Hessian of the Poisson example with respect to m.

3 Run the Taylor test to check that the gradient is correct.

Notebook tip

Dolfin-adjoint and Notebooks are somewhat orthogonal. If mysterious messages appear, try 'Kernel - Restart & Run All'.

34 / 34