Paul Hovland (Argonne National Laboratory) Steven Lee (Lawrence Livermore National Laboratory)

Challenges and Opportunities in Using Automatic Differentiation with Object-Oriented Toolkits for Scientific Computing

Paul Hovland (Argonne National Laboratory), Steven Lee (Lawrence Livermore National Laboratory), Lois McInnes (ANL), Boyana Norris (ANL), Barry Smith (ANL)

The Computational Differentiation Project at Argonne National Laboratory

Acknowledgments

Jason Abate, Satish Balay, Steve Benson, Peter Brown, Omar Ghattas, Lisa Grignon, William Gropp, Alan Hindmarsh, David Keyes, Jorge Moré, Linda Petzold, Widodo Samyono

Outline

Intro to AD
Survey of Toolkits
  SensPVODE
  PETSc
  TAO
Using AD with Toolkits
  Toolkit Level
  Parallel Function Level
  Subdomain Level
  Element/Vertex Function Level
Experimental Results
Conclusions and Expectations

Automatic Differentiation

Technique for augmenting code for computing a function with code for computing derivatives

Analytic differentiation of elementary operations/functions, propagation by chain rule

Can be implemented using source transformation or operator overloading

Two main modes:
  Forward: propagates derivatives from independent to dependent variables
  Reverse (adjoint): propagates derivatives from dependent to independent variables

Comparison of Methods

Finite Differences
  Advantages: cheap Jv, easy
  Disadvantages: inaccurate (not robust)
Hand Differentiation
  Advantages: accurate; cheap Jv, J^T v, Hv, …
  Disadvantages: hard; difficult to maintain consistency
Automatic Differentiation
  Advantages: cheap J^T v, Hv; easy?
  Disadvantages: Jv costs ~2 function evals; hard?

PVODE: Parallel ODE-IVP solver

Algorithm developers:

Hindmarsh, Byrne, Brown and Cohen

ODE Initial-Value Problems

Stiff and non-stiff integrators

Written in C

MPI calls for communication

PVODE for ODE simulations

ODE Initial-Value Problem (standard form):

$$\dot{y} = f(t, y, p), \quad y(t_0) = y_0(p), \quad \text{with } y, \dot{y} \in \mathbb{R}^N, \; p \in \mathbb{R}^m.$$

Implicit time-stepping using BDF methods for y(t_n)
Solve nonlinear system for y(t_n) via Inexact Newton
  Solve update to Newton iterate using Krylov methods

Possible Approaches

Apply AD to PVODE: ad_PVODE, which calls ad_F(y,ad_y,p,ad_p), maps inputs y, ad_y|t=0 and p, ad_p to outputs y, ad_y|t=t1, t2, ...

Solve sensitivity eqns: SensPVODE, which calls ad_F(y,ad_y,p,ad_p), maps inputs y|t=0 and p to outputs y, dy/dp|t=t1, t2, ...

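Sensitivity Differential Equations

Differentiate $\dot{y} = f(t, y, p)$ with respect to $p_i$:

$$\frac{\partial \dot{y}}{\partial p_i} = \frac{\partial f}{\partial y}\,\frac{\partial y}{\partial p_i} + \frac{\partial f}{\partial p_i}.$$

A linear ODE-IVP for the sensitivity vector $s_i(t) = \partial y(t)/\partial p_i$:

$$\dot{s}_i(t) = \frac{\partial f}{\partial y}\, s_i(t) + \frac{\partial f}{\partial p_i}, \quad \text{with } s_i(t_0) = \frac{\partial y_0}{\partial p_i}.$$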

PETSc

Portable, Extensible Toolkit for Scientific computing
  Parallel
  Object-oriented
  Free
  Supported (manuals, email)
  Interfaces with Fortran 77/90, C/C++
  Available for virtually all UNIX platforms, as well as Windows 95/NT
  Flexible: many options for solver algorithms and parameters

Nonlinear PDE Solution

Figure (components as labeled): Main Routine; PETSc Nonlinear Solvers (SNES), which solve F(u) = 0; PETSc Linear Solvers (SLES: KSP, PC); application components Application Initialization, Function Evaluation, Jacobian Evaluation, and Post-Processing. Legend: PETSc code, user code, AD-generated code.

TAO

Toolkit for Advanced Optimization
  Object-oriented techniques
  Component-based (CCA) interaction
  Leverage existing parallel computing infrastructure
  Reuse of external toolkits

Tao, "The Right Way": the process of nature by which all things change and which is to be followed for a life of harmony

TAO Goals

Portability

Performance

Scalable parallelism

An interface independent of architecture

TAO Algorithms

Unconstrained optimization
  Limited-memory variable-metric method
  Trust region/line search Newton method
  Conjugate-gradient method
  Levenberg-Marquardt method
Bound-constrained optimization
  Trust region Newton method
  Gradient projection/conjugate gradient method
Linearly-constrained optimization
  Interior-point method with iterative solvers

PETSc and TAO

Figure (components as labeled): Application Driver; Toolkit for Advanced Optimization (TAO) optimization tools; PETSc linear solvers (KSP, PC), matrices, and vectors; application components Application Initialization, Function & Gradient Evaluation, Hessian Evaluation, and Post-Processing. Legend: TAO code, user code, PETSc code.

Using AD with Toolkits

Apply AD to toolkit to produce derivative-enhanced toolkit
Use AD to provide Jacobian/Hessian/gradient for use by toolkit; apply AD at the
  Parallel Function Level
  Subdomain Function Level
  Element/Vertex Function Level

Differentiated Version of Toolkit

Enables sensitivity analysis and black-box optimization of models constructed using the toolkit

Can take advantage of high-level structure of algorithms, providing better performance: see Andreas’ and Linda’s talks

Ongoing work with PETSc and PVODE

Levels of Function Evaluation

int FormFunction(SNES snes,Vec X,Vec F,void *ptr)
{
  /* [Parallel Function Level] */
  /* Variable declarations omitted */
  mx = user->mx; my = user->my; lambda = user->param;

  /* [Subdomain Function Level] */
  hx = one/(double)(mx-1); hy = one/(double)(my-1);
  sc = hx*hy*lambda; hxdhy = hx/hy; hydhx = hy/hx;

  ierr = DAGlobalToLocalBegin(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr);
  ierr = DAGlobalToLocalEnd(user->da,X,INSERT_VALUES,localX);CHKERRQ(ierr);

  ierr = VecGetArray(localX,&x);CHKERRQ(ierr);
  ierr = VecGetArray(localF,&f);CHKERRQ(ierr);

  ierr = DAGetCorners(user->da,&xs,&ys,PETSC_NULL,&xm,&ym,PETSC_NULL);CHKERRQ(ierr);
  ierr = DAGetGhostCorners(user->da,&gxs,&gys,PETSC_NULL,&gxm,&gym,PETSC_NULL);CHKERRQ(ierr);

  for (j=ys; j<ys+ym; j++) {
    row = (j - gys)*gxm + xs - gxs - 1;
    for (i=xs; i<xs+xm; i++) {
      row++;
      if (i == 0 || j == 0 || i == mx-1 || j == my-1) {f[row] = x[row]; continue;}
      /* [Vertex/Element Function Level] */
      u = x[row];
      uxx = (two*u - x[row-1] - x[row+1])*hydhx;
      uyy = (two*u - x[row-gxm] - x[row+gxm])*hxdhy;
      f[row] = uxx + uyy - sc*PetscExpScalar(u);
    }
  }

  ierr = VecRestoreArray(localX,&x);CHKERRQ(ierr);
  ierr = VecRestoreArray(localF,&f);CHKERRQ(ierr);

  ierr = DALocalToGlobal(user->da,localF,INSERT_VALUES,F);CHKERRQ(ierr);
  ierr = PLogFlops(11*ym*xm);CHKERRQ(ierr);
  return 0;
}

Parallel Function Level

Advantages
  Well-defined interface:
    int Function(SNES, Vec, Vec, void *);
    void function(integer, Real, N_Vector, N_Vector, void *);
  No changes to function
Disadvantages
  Differentiation of toolkit support functions (may result in unnecessary work)
  AD of parallel code (MPI, OpenMP)
  May need global coloring

Subdomain Function Level

Advantages
  No need to differentiate communication functions
  Interface may be well defined
Disadvantages
  May need local coloring
  May need to extract from parallel function

Using AD with PETSc

Figure (components as labeled):
  Global-to-local scatter of ghost values
  Parallel function assembly
  Local function computation
  Parallel Jacobian assembly
  Local Jacobian computation
  ADIFOR or ADIC
  Script file
  Seed matrix initialization (coded manually; can be automated)

Using AD with TAO

Figure (components as labeled):
  Parallel function assembly
  Parallel gradient assembly
  Local gradient computation
  ADIFOR or ADIC
  Script file
  Seed matrix initialization (coded manually; can be automated)

Element/Vertex Function Level

Advantages
  Reduced memory requirements
  No need for matrix coloring
Disadvantages
  May be difficult to decompose function to this level (boundary conditions, other special cases)
  Decomposition to this level may impede efficiency

Example

int localfunction2d(Field **x,Field **f,int xs, int xm, int ys, int ym, int mx,int my, void *ptr)
{
  /* Variable declarations omitted */
  xints = xs; xinte = xs+xm; yints = ys; yinte = ys+ym;

  /* bottom boundary (j = 0) */
  if (yints == 0) {
    j = 0; yints = yints + 1;
    for (i=xs; i<xs+xm; i++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega + (x[j+1][i].u - x[j][i].u)*dhy;
      f[j][i].temp  = x[j][i].temp-x[j+1][i].temp;
    }
  }

  /* top boundary (j = my-1): moving lid */
  if (yinte == my) {
    j = my - 1; yinte = yinte - 1;
    for (i=xs; i<xs+xm; i++) {
      f[j][i].u     = x[j][i].u - lid;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega + (x[j][i].u - x[j-1][i].u)*dhy;
      f[j][i].temp  = x[j][i].temp-x[j-1][i].temp;
    }
  }

  /* left boundary (i = 0) */
  if (xints == 0) {
    i = 0; xints = xints + 1;
    for (j=ys; j<ys+ym; j++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega - (x[j][i+1].v - x[j][i].v)*dhx;
      f[j][i].temp  = x[j][i].temp;
    }
  }

  /* right boundary (i = mx-1) */
  if (xinte == mx) {
    i = mx - 1; xinte = xinte - 1;
    for (j=ys; j<ys+ym; j++) {
      f[j][i].u     = x[j][i].u;
      f[j][i].v     = x[j][i].v;
      f[j][i].omega = x[j][i].omega - (x[j][i].v - x[j][i-1].v)*dhx;
      f[j][i].temp  = x[j][i].temp - (double)(grashof>0);
    }
  }

  /* interior points */
  for (j=yints; j<yinte; j++) {
    for (i=xints; i<xinte; i++) {
      /* upwind splitting of the velocity components */
      vx = x[j][i].u; avx = PetscAbsScalar(vx); vxp = p5*(vx+avx); vxm = p5*(vx-avx);
      vy = x[j][i].v; avy = PetscAbsScalar(vy); vyp = p5*(vy+avy); vym = p5*(vy-avy);

      /* U velocity */
      u = x[j][i].u;
      uxx = (two*u - x[j][i-1].u - x[j][i+1].u)*hydhx;
      uyy = (two*u - x[j-1][i].u - x[j+1][i].u)*hxdhy;
      f[j][i].u = uxx + uyy - p5*(x[j+1][i].omega-x[j-1][i].omega)*hx;

      /* V velocity */
      u = x[j][i].v;
      uxx = (two*u - x[j][i-1].v - x[j][i+1].v)*hydhx;
      uyy = (two*u - x[j-1][i].v - x[j+1][i].v)*hxdhy;
      f[j][i].v = uxx + uyy + p5*(x[j][i+1].omega-x[j][i-1].omega)*hy;

      /* vorticity */
      u = x[j][i].omega;
      uxx = (two*u - x[j][i-1].omega - x[j][i+1].omega)*hydhx;
      uyy = (two*u - x[j-1][i].omega - x[j+1][i].omega)*hxdhy;
      f[j][i].omega = uxx + uyy
                      + (vxp*(u - x[j][i-1].omega) + vxm*(x[j][i+1].omega - u)) * hy
                      + (vyp*(u - x[j-1][i].omega) + vym*(x[j+1][i].omega - u)) * hx
                      - p5 * grashof * (x[j][i+1].temp - x[j][i-1].temp) * hy;

      /* temperature */
      u = x[j][i].temp;
      uxx = (two*u - x[j][i-1].temp - x[j][i+1].temp)*hydhx;
      uyy = (two*u - x[j-1][i].temp - x[j+1][i].temp)*hxdhy;
      f[j][i].temp = uxx + uyy
                     + prandtl * ((vxp*(u - x[j][i-1].temp) + vxm*(x[j][i+1].temp - u)) * hy
                                + (vyp*(u - x[j-1][i].temp) + vym*(x[j+1][i].temp - u)) * hx);
    }
  }

  ierr = PetscLogFlops(84*ym*xm);CHKERRQ(ierr);
  return 0;
}

Experimental Results

Toolkit Level – Differentiated PETSc Linear Solver
Parallel Nonlinear Function Level – SensPVODE
Local Subdomain Function Level
  PETSc
  TAO
Element Function Level – PETSc

Differentiated Linear Equation Solver

Increased Accuracy

SensPVODE: Problem

Diurnal kinetics advection-diffusion equation
100x100 structured grid
16 processors of a Linux cluster with 550 MHz processors and Myrinet interconnect

SensPVODE: Time

SensPVODE: Number of Timesteps

SensPVODE: Time/Timestep

PETSc Applications

Toy problems
  Solid fuel ignition: finite difference discretization; Fortran & C variants; differentiated using ADIFOR, ADIC
  Driven cavity: finite difference discretization; C implementation; differentiated using ADIC
Euler code
  Based on legacy F77 code from D. Whitfield (MSU)
  Finite volume discretization
  Up to 1,121,320 unknowns
  Mapped C-H grid
  Fully implicit steady-state
  Tools: SNES, DA, ADIFOR

C-H Structured Grid

Algorithmic Performance

Real Performance

Hybrid Method

Hybrid Method (cont.)

TAO: Preliminary Results

For More Information

PETSc: http://www.mcs.anl.gov/petsc/
TAO: http://www.mcs.anl.gov/tao/
Automatic Differentiation at Argonne: http://www.mcs.anl.gov/autodiff/
  ADIFOR: http://www.mcs.anl.gov/adifor/
  ADIC: http://www.mcs.anl.gov/adic/
http://www.autodiff.org

10 challenges for PDE optimization algorithms

1. Problem size
2. Efficiency vs. intrusiveness
3. "Physics-based" globalizations
4. Inexact PDE solvers
5. Approximate Jacobians
6. Implicitly-defined PDE residuals
7. Non-smooth solutions
8. Pointwise inequality constraints
9. Scalability of adjoint methods to large numbers of inequalities
10. Time-dependent PDE optimization

7. Non-smoothness

PDE residual may not depend smoothly on state variables (maybe not even continuously)
  Solution-adaptivity
  Discontinuity-capturing, front-tracking
  Subgrid-scale models
  Material property evaluation
  Contact problems
  Elasto(visco)plasticity
PDE residual may not depend smoothly or continuously on decision variables
  Solid modeler- and mesh generator-induced
