Page 1: Challenges and Opportunities in Using Automatic Differentiation with Object-Oriented Toolkits for Scientific Computing

Paul Hovland (Argonne National Laboratory), Steven Lee (Lawrence Livermore National Laboratory), Lois McInnes (ANL), Boyana Norris (ANL), Barry Smith (ANL)

The Computational Differentiation Project at Argonne National Laboratory

Page 2: Acknowledgments

Jason Abate, Satish Balay, Steve Benson, Peter Brown, Omar Ghattas, Lisa Grignon, William Gropp, Alan Hindmarsh, David Keyes, Jorge Moré, Linda Petzold, Widodo Samyono

Page 3: Outline

Intro to AD
Survey of Toolkits
  SensPVODE
  PETSc
  TAO
Using AD with Toolkits
  Toolkit Level
  Parallel Function Level
  Subdomain Level
  Element/Vertex Function Level
Experimental Results
Conclusions and Expectations

Page 4: Automatic Differentiation

Technique for augmenting code for computing a function with code for computing derivatives
Analytic differentiation of elementary operations/functions, propagation by chain rule
Can be implemented using source transformation or operator overloading
Two main modes
  Forward: propagates derivatives from independent to dependent variables
  Reverse (adjoint): propagates derivatives from dependent to independent variables

Page 5: Comparison of Methods

Finite Differences
  Advantages: cheap Jv, easy
  Disadvantages: inaccurate (not robust)
Hand Differentiation
  Advantages: accurate; cheap Jv, J^T v, Hv, …
  Disadvantages: hard; difficult to maintain consistency
Automatic Differentiation
  Advantages: cheap J^T v, Hv; easy?
  Disadvantages: Jv costs ~2 function evals; hard?

Page 6: PVODE: Parallel ODE-IVP solver

Algorithm developers: Hindmarsh, Byrne, Brown and Cohen
ODE Initial-Value Problems
Stiff and non-stiff integrators
Written in C
MPI calls for communication

Page 7: PVODE for ODE simulations

ODE Initial-Value Problem (standard form):

$$\dot{y} = f(t, y, p), \quad \text{with } y(t_0) = y_0, \qquad y \in \mathbb{R}^N,\ \dot{y} \in \mathbb{R}^N,\ p \in \mathbb{R}^m.$$

Implicit time-stepping using BDF methods for y(t_n)
Solve nonlinear system for y(t_n) via Inexact Newton
Solve update to Newton iterate using Krylov methods
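
The implicit time-stepping idea can be sketched on a scalar problem. The code below takes one backward Euler step (the first-order BDF method) and solves the resulting nonlinear equation with Newton's method. The right-hand side f(t, y, p) = -p y^3, the function names, and the tolerances are assumptions made for illustration. PVODE itself uses variable-order BDF and a Krylov method for the Newton update, whereas in the scalar case the linear solve reduces to a division.

```c
#include <math.h>

/* Hypothetical right-hand side f(t, y, p) = -p y^3 and its partial df/dy. */
static double rhs(double t, double y, double p)    { (void)t; return -p * y * y * y; }
static double rhs_dy(double t, double y, double p) { (void)t; return -3.0 * p * y * y; }

/* One backward Euler (BDF1) step from (t, yn) with step size h:
   solve G(y) = y - yn - h f(t + h, y, p) = 0 by Newton's method. */
static double bdf1_step(double t, double yn, double h, double p) {
    double y = yn;                                   /* initial guess */
    for (int k = 0; k < 50; k++) {
        double G  = y - yn - h * rhs(t + h, y, p);   /* nonlinear residual */
        double dG = 1.0 - h * rhs_dy(t + h, y, p);   /* scalar Jacobian of G */
        double dy = -G / dG;     /* Newton update (a Krylov solve for systems) */
        y += dy;
        if (fabs(dy) <= 1e-14 * (1.0 + fabs(y))) break;
    }
    return y;
}
```

With p = 1000, h = 0.1, and y_n = 1, the step solves y + 100 y^3 = 1, whose root is y = 0.2. The implicit step remains stable even though h is far larger than an explicit method would tolerate for this stiff problem.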

Page 8: Objective

ODE Solver (PVODE)
  Inputs: F(y,p); y|t=0; p
  Output: y|t=t1, t2, ...
Sensitivity Solver (to be obtained automatically from the ODE solver)
  Inputs: y|t=0; p
  Outputs: y|t=t1, t2, ...; dy/dp|t=t1, t2, ...

Page 9: Possible Approaches

Apply AD to PVODE:
  ad_PVODE with ad_F(y,ad_y,p,ad_p)
  Inputs: y, ad_y|t=0; p, ad_p
  Outputs: y, ad_y|t=t1, t2, ...
Solve sensitivity eqns:
  SensPVODE with ad_F(y,ad_y,p,ad_p)
  Inputs: y|t=0; p
  Outputs: y, dy/dp|t=t1, t2, ...

Page 10: Sensitivity Differential Equations

Differentiate $\dot{y} = f(t, y, p)$ with respect to $p_i$:

$$\frac{\partial \dot{y}}{\partial p_i} = \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial p_i} + \frac{\partial f}{\partial p_i}.$$

A linear ODE-IVP for the sensitivity vector $s_i(t) = \partial y / \partial p_i$:

$$\dot{s}_i(t) = \frac{\partial f}{\partial y}\, s_i(t) + \frac{\partial f}{\partial p_i}, \quad \text{with } s_i(t_0) = 0.$$
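
The sensitivity equations can be checked on a model problem with a known solution. For f(t, y, p) = -p y the sensitivity ODE is s' = -p s - y with s(0) = 0, and the exact answers are y = y0 exp(-p t) and s = -t y0 exp(-p t). The sketch below integrates the state and the sensitivity together. The model problem, the function names, and the use of classical RK4 instead of the BDF methods in SensPVODE are assumptions made to keep the example short.

```c
#include <math.h>

/* Augmented right-hand side for the hypothetical model y' = f(t,y,p) = -p y:
   u[0] = y and u[1] = s = dy/dp, with s' = (df/dy) s + df/dp = -p s - y. */
static void aug_rhs(const double u[2], double p, double du[2]) {
    du[0] = -p * u[0];
    du[1] = -p * u[1] - u[0];
}

/* Integrate state and sensitivity from t = 0 to t = T with classical RK4.
   On return, out[0] = y(T) and out[1] = dy/dp at T. */
static void solve_with_sensitivity(double y0, double p, double T, int nsteps,
                                   double out[2]) {
    double u[2] = { y0, 0.0 };        /* s(0) = 0: y0 does not depend on p */
    double h = T / nsteps;
    for (int n = 0; n < nsteps; n++) {
        double k1[2], k2[2], k3[2], k4[2], w[2];
        aug_rhs(u, p, k1);
        for (int i = 0; i < 2; i++) w[i] = u[i] + 0.5 * h * k1[i];
        aug_rhs(w, p, k2);
        for (int i = 0; i < 2; i++) w[i] = u[i] + 0.5 * h * k2[i];
        aug_rhs(w, p, k3);
        for (int i = 0; i < 2; i++) w[i] = u[i] + h * k3[i];
        aug_rhs(w, p, k4);
        for (int i = 0; i < 2; i++)
            u[i] += h * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]) / 6.0;
    }
    out[0] = u[0];
    out[1] = u[1];
}
```

In a general code the products (df/dy) s and df/dp in `aug_rhs` are the terms an AD tool would generate from the user's right-hand-side function. Here they are written by hand.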

Page 11: PETSc

Portable, Extensible Toolkit for Scientific computing
Parallel
Object-oriented
Free
Supported (manuals, email)
Interfaces with Fortran 77/90, C/C++
Available for virtually all UNIX platforms, as well as Windows 95/NT
Flexible: many options for solver algorithms and parameters

Page 12: Nonlinear PDE Solution

Main Routine
Nonlinear Solvers (SNES): Solve F(u) = 0
Linear Solvers (SLES): PC, KSP
Application Initialization
Function Evaluation
Jacobian Evaluation
Post-Processing
Legend: PETSc code; User code; AD-generated code

Page 13: TAO

The Right Way
The process of nature by which all things change and which is to be followed for a life of harmony

Toolkit for Advanced Optimization
Object-oriented techniques
Component-based (CCA) interaction
Leverage existing parallel computing infrastructure
Reuse of external toolkits

Page 14: TAO Goals

Portability
Performance
Scalable parallelism
An interface independent of architecture

Page 15: TAO Algorithms

Unconstrained optimization
  Limited-memory variable-metric method
  Trust region/line search Newton method
  Conjugate-gradient method
  Levenberg-Marquardt method
Bound-constrained optimization
  Trust region Newton method
  Gradient projection/conjugate gradient method
Linearly-constrained optimization
  Interior-point method with iterative solvers

Page 16: PETSc and TAO

Application Driver
Toolkit for Advanced Optimization (TAO): Optimization Tools
Linear Solvers: PC, KSP
Matrices
Vectors
Application Initialization
Function & Gradient Evaluation
Hessian Evaluation
Post-Processing
Legend: TAO code; User code; PETSc code

Page 17: Using AD with Toolkits

Apply AD to toolkit to produce derivative-enhanced toolkit
Use AD to provide Jacobian/Hessian/gradient for use by toolkit. Apply AD at
  Parallel Function Level
  Subdomain Function Level
  Element/Vertex Function Level

Page 18: Differentiated Version of Toolkit

Makes possible sensitivity analysis, black-box optimization of models constructed using toolkit
Can take advantage of high-level structure of algorithms, providing better performance: see Andreas’ and Linda’s talks
Ongoing work with PETSc and PVODE

Page 19: Levels of Function Evaluation

    int FormFunction(SNES snes,Vec X,Vec F,void *ptr)
    {
      /* Parallel Function Level */
      /* Variable declarations omitted */
      mx = user->mx; my = user->my; lambda = user->param;
      /* Subdomain Function Level */
      hx = one/(double)(mx-1); hy = one/(double)(my-1);
      sc = hx*hy*lambda; hxdhy = hx/hy; hydhx = hy/hx;

      ierr = DAGlobalToLocalBegin(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr);
      ierr = DAGlobalToLocalEnd(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr);

      ierr = VecGetArray(localX,&x);CHKERRQ(ierr);
      ierr = VecGetArray(localF,&f);CHKERRQ(ierr);

      ierr = DAGetCorners(user->da,&xs,&ys,PETSC_NULL,&xm,&ym,PETSC_NULL);CHKERRQ(ierr);
      ierr = DAGetGhostCorners(user->da,&gxs,&gys,PETSC_NULL,&gxm,&gym,PETSC_NULL);CHKERRQ(ierr);

      for (j=ys; j<ys+ym; j++) {
        row = (j - gys)*gxm + xs - gxs - 1;
        for (i=xs; i<xs+xm; i++) {
          row++;
          if (i == 0 || j == 0 || i == mx-1 || j == my-1) {f[row] = x[row]; continue;}
          /* Vertex/Element Function Level */
          u = x[row];
          uxx = (two*u - x[row-1] - x[row+1])*hydhx;
          uyy = (two*u - x[row-gxm] - x[row+gxm])*hxdhy;
          f[row] = uxx + uyy - sc*PetscExpScalar(u);
        }
      }

      ierr = VecRestoreArray(localX,&x);CHKERRQ(ierr);
      ierr = VecRestoreArray(localF,&f);CHKERRQ(ierr);

      ierr = DALocalToGlobal(user->da,localF,INSERT_VALUES,F);CHKERRQ(ierr);
      ierr = PLogFlops(11*ym*xm);CHKERRQ(ierr);
      return 0;
    }

Page 20: Parallel Function Level

Advantages
  Well-defined interface:
    int Function(SNES, Vec, Vec, void *);
    void function(integer, Real, N_Vector, N_Vector, void *);
  No changes to function
Disadvantages
  Differentiation of toolkit support functions (may result in unnecessary work)
  AD of parallel code (MPI, OpenMP)
  May need global coloring

Page 21: Subdomain Function Level

Advantages
  No need to differentiate communication functions
  Interface may be well defined
Disadvantages
  May need local coloring
  May need to extract from parallel function

Page 22: Using AD with PETSc

Parallel function assembly
  Global-to-local scatter of ghost values
  Local Function computation
Parallel Jacobian assembly
  Global-to-local scatter of ghost values
  Seed matrix initialization
  Local Jacobian computation
ADIFOR or ADIC: Local Function computation -> Local Jacobian computation
Legend: Script file; Coded manually; can be automated

Page 23: Using AD with TAO

Parallel function assembly
  Global-to-local scatter of ghost values
  Local Function computation
Parallel gradient assembly
  Global-to-local scatter of ghost values
  Seed matrix initialization
  Local gradient computation
ADIFOR or ADIC: Local Function computation -> Local gradient computation
Legend: Script file; Coded manually; can be automated

Page 24: Element/Vertex Function Level

Advantages
  Reduced memory requirements
  No need for matrix coloring
Disadvantages
  May be difficult to decompose function to this level (boundary conditions, other special cases)
  Decomposition to this level may impede efficiency

Page 25: Example

    int localfunction2d(Field **x,Field **f,int xs, int xm, int ys, int ym, int mx,int my, void *ptr)
    {
      xints = xs; xinte = xs+xm; yints = ys; yinte = ys+ym;

      /* Boundary j = 0 */
      if (yints == 0) {
        j = 0;
        yints = yints + 1;
        for (i=xs; i<xs+xm; i++) {
          f[j][i].u     = x[j][i].u;
          f[j][i].v     = x[j][i].v;
          f[j][i].omega = x[j][i].omega + (x[j+1][i].u - x[j][i].u)*dhy;
          f[j][i].temp  = x[j][i].temp-x[j+1][i].temp;
        }
      }

      /* Boundary j = my-1 (lid) */
      if (yinte == my) {
        j = my - 1;
        yinte = yinte - 1;
        for (i=xs; i<xs+xm; i++) {
          f[j][i].u     = x[j][i].u - lid;
          f[j][i].v     = x[j][i].v;
          f[j][i].omega = x[j][i].omega + (x[j][i].u - x[j-1][i].u)*dhy;
          f[j][i].temp  = x[j][i].temp-x[j-1][i].temp;
        }
      }

      /* Boundary i = 0 */
      if (xints == 0) {
        i = 0;
        xints = xints + 1;
        for (j=ys; j<ys+ym; j++) {
          f[j][i].u     = x[j][i].u;
          f[j][i].v     = x[j][i].v;
          f[j][i].omega = x[j][i].omega - (x[j][i+1].v - x[j][i].v)*dhx;
          f[j][i].temp  = x[j][i].temp;
        }
      }

      /* Boundary i = mx-1 */
      if (xinte == mx) {
        i = mx - 1;
        xinte = xinte - 1;
        for (j=ys; j<ys+ym; j++) {
          f[j][i].u     = x[j][i].u;
          f[j][i].v     = x[j][i].v;
          f[j][i].omega = x[j][i].omega - (x[j][i].v - x[j][i-1].v)*dhx;
          f[j][i].temp  = x[j][i].temp - (double)(grashof>0);
        }
      }

      /* Interior points */
      for (j=yints; j<yinte; j++) {
        for (i=xints; i<xinte; i++) {

          /* Upwind splitting of the velocity components */
          vx = x[j][i].u; avx = PetscAbsScalar(vx);
          vxp = p5*(vx+avx); vxm = p5*(vx-avx);
          vy = x[j][i].v; avy = PetscAbsScalar(vy);
          vyp = p5*(vy+avy); vym = p5*(vy-avy);

          /* U velocity */
          u = x[j][i].u;
          uxx = (two*u - x[j][i-1].u - x[j][i+1].u)*hydhx;
          uyy = (two*u - x[j-1][i].u - x[j+1][i].u)*hxdhy;
          f[j][i].u = uxx + uyy - p5*(x[j+1][i].omega-x[j-1][i].omega)*hx;

          /* V velocity */
          u = x[j][i].v;
          uxx = (two*u - x[j][i-1].v - x[j][i+1].v)*hydhx;
          uyy = (two*u - x[j-1][i].v - x[j+1][i].v)*hxdhy;
          f[j][i].v = uxx + uyy + p5*(x[j][i+1].omega-x[j][i-1].omega)*hy;

          /* Omega */
          u = x[j][i].omega;
          uxx = (two*u - x[j][i-1].omega - x[j][i+1].omega)*hydhx;
          uyy = (two*u - x[j-1][i].omega - x[j+1][i].omega)*hxdhy;
          f[j][i].omega = uxx + uyy
                        + (vxp*(u - x[j][i-1].omega) + vxm*(x[j][i+1].omega - u)) * hy
                        + (vyp*(u - x[j-1][i].omega) + vym*(x[j+1][i].omega - u)) * hx
                        - p5 * grashof * (x[j][i+1].temp - x[j][i-1].temp) * hy;

          /* Temperature */
          u = x[j][i].temp;
          uxx = (two*u - x[j][i-1].temp - x[j][i+1].temp)*hydhx;
          uyy = (two*u - x[j-1][i].temp - x[j+1][i].temp)*hxdhy;
          f[j][i].temp = uxx + uyy
                       + prandtl * ((vxp*(u - x[j][i-1].temp) + vxm*(x[j][i+1].temp - u)) * hy
                                  + (vyp*(u - x[j-1][i].temp) + vym*(x[j+1][i].temp - u)) * hx);
        }
      }

      ierr = PetscLogFlops(84*ym*xm);CHKERRQ(ierr);
      return 0;
    }

Page 26: Experimental Results

Toolkit Level – Differentiated PETSc Linear Solver
Parallel Nonlinear Function Level – SensPVODE
Local Subdomain Function Level
  PETSc
  TAO
Element Function Level – PETSc

Page 27: Differentiated Linear Equation Solver

Page 28: Increased Accuracy

Page 29: Increased Accuracy

Page 30: SensPVODE: Problem

Diurnal kinetics advection-diffusion equation
100x100 structured grid
16 processors of a Linux cluster with 550 MHz processors and Myrinet interconnect

Page 31: SensPVODE: Time

Page 32: SensPVODE: Number of Timesteps

Page 33: SensPVODE: Time/Timestep

Page 34: PETSc Applications

Toy problems
  Solid fuel ignition: finite difference discretization; Fortran & C variants; differentiated using ADIFOR, ADIC
  Driven cavity: finite difference discretization; C implementation; differentiated using ADIC
Euler code
  Based on legacy F77 code from D. Whitfield (MSU)
  Finite volume discretization
  Up to 1,121,320 unknowns
  Mapped C-H grid
  Fully implicit steady-state
  Tools: SNES, DA, ADIFOR

Page 35: C-H Structured Grid

Page 36: Algorithmic Performance

Page 37: Real Performance

Page 38: Hybrid Method

Page 39: Hybrid Method (cont.)

Page 40: TAO: Preliminary Results

Page 41: For More Information

PETSc: http://www.mcs.anl.gov/petsc/
TAO: http://www.mcs.anl.gov/tao/
Automatic Differentiation at Argonne
  http://www.mcs.anl.gov/autodiff/
  ADIFOR: http://www.mcs.anl.gov/adifor/
  ADIC: http://www.mcs.anl.gov/adic/
http://www.autodiff.org

Page 42: 10 challenges for PDE optimization algorithms

1. Problem size
2. Efficiency vs. intrusiveness
3. “Physics-based” globalizations
4. Inexact PDE solvers
5. Approximate Jacobians
6. Implicitly-defined PDE residuals
7. Non-smooth solutions
8. Pointwise inequality constraints
9. Scalability of adjoint methods to large numbers of inequalities
10. Time-dependent PDE optimization

Page 43: 7. Non-smoothness

PDE residual may not depend smoothly on state variables (maybe not even continuously)
  Solution-adaptivity
  Discontinuity-capturing, front-tracking
  Subgrid-scale models
  Material property evaluation
  Contact problems
  Elasto(visco)plasticity
PDE residual may not depend smoothly or continuously on decision variables
  Solid modeler- and mesh generator-induced