Transcript of "Introduction to Algorithmic Differentiation" · 2012-08-13
-
Introduction to Algorithmic Differentiation
Kshitij Kulshreshtha
Universität Paderborn
ESSIM Summer School, 13 – 16 August 2012
Dresden
K. Kulshreshtha 1 / 94 Intro to AD ESSIM Summer School
-
Schedule
1. Introduction / Motivation / Function Representation / Forward Mode
2. Reverse Mode / Higher-Order Derivatives
3. Implementations / ADOL-C
4. Hands-on Exercises (ADOL-C)
-
Computing Derivatives
Simulation · Sensitivity Calculation · Optimization

[Figure, built up over four slides: theory and data enter a modelling step that yields a computer program mapping input x to output y for the user; differentiation turns this into an enhanced program that additionally delivers the sensitivity ∂y/∂x, which feeds an optimization algorithm.]
-
Computing Derivatives

Given: description of the functional relation as
- a formula: explicit expression y = F(x)
- a computer program?

Task: computation of derivatives, taking into account
- requirements on exactness
- computational effort
-
Derivatives: What for?

- Optimization:
  unconstrained: min f(x), f : Rⁿ → R
  constrained:   min f(x), f : Rⁿ → R
                 s.t. c(x) = 0, c : Rⁿ → Rᵐ
                      h(x) ≥ 0, h : Rⁿ → Rˡ
- Solution of nonlinear equation systems F(x) = 0, F : Rⁿ → Rⁿ
  (the Newton method requires F′(x) ∈ Rⁿˣⁿ)
- Simulation of complex procedures:
  - system definition
  - integration of differential equations
- Sensitivity analysis
- Real-time control
-
Symbolic Differentiation

x²       → 2x
xᵖ       → p xᵖ⁻¹
cos(x)   → −sin(x)
exp(x)   → exp(x)
log(x)   → 1/x
g(h(x))  → g′(h(x)) · h′(x)
...
-
Symbolic Differentiation

- exact derivatives
- f(x) = exp(sin(x²))
  f′(x) = exp(sin(x²)) · cos(x²) · 2x
- min J(x, u) such that ẋ = f(x, u), plus initial conditions
  reduced formulation: J(x, u) → Ĵ(u)
  Ĵ′(u) based on the symbolic adjoint −λ̇ = fₓ(x, u)ᵀ λ, plus terminal conditions
- cost (common subexpressions, implementation)
- legacy code with a large number of lines:
  a closed-form expression is not available
-
[Slide: a two-page printout of euler2d.c, a 2-D Euler CFD solver (the extracted listing is garbled). The program reads the CFD mesh, removes duplicated corner points, computes the grid metric, builds coarse multigrid levels, initializes the flow field (optionally from a restart file), runs the multigrid iteration loop while writing a convergence history (iteration, residual, lift, drag), and finally writes field, surface, and restart files. An example of legacy code for which no closed-form expression is available.]
-
Finite Differences

Idea: Taylor expansion. For smooth f : R → R,

f(x + h) = f(x) + h f′(x) + h² f″(x)/2 + h³ f‴(x)/6 + ...
f(x + h) ≈ f(x) + h f′(x)

Df(x) ≈ (f(x + h) − f(x)) / h

- simple derivative calculation (only function evaluations!)
- inexact derivatives
- computational cost often too high:
  F : Rⁿ → R:  OPS(F′(x)) ≈ (n + 1) · OPS(F(x))
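The inexactness of the difference quotient is easy to observe numerically. The following sketch is not from the slides (the test point x = 0.7 is an arbitrary choice); it uses the function F(x) = exp(sin(x²)) that appears later in the lecture:

```cpp
#include <cmath>

// F(x) = exp(sin(x^2)) -- the example function used later in the lecture.
double F(double x) { return std::exp(std::sin(x * x)); }

// Exact derivative for comparison: F'(x) = exp(sin(x^2)) * cos(x^2) * 2x.
double dF_exact(double x) {
    return std::exp(std::sin(x * x)) * std::cos(x * x) * 2.0 * x;
}

// One-sided difference quotient DF(x) ~ (F(x+h) - F(x)) / h.
// The truncation error is O(h); for very small h cancellation in the
// numerator dominates instead, so the error shrinks and then grows again.
double dF_fd(double x, double h) { return (F(x + h) - F(x)) / h; }
```

Sweeping h from 1e-1 down to 1e-13 shows the error first decreasing and then increasing, which is why step-size choice matters for finite differences but is a non-issue for AD.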
-
Example 1

[Six figure-only slides; the plots are not recoverable from this transcript.]
-
Example 2

[Five figure-only slides; the plots are not recoverable from this transcript.]
-
Algorithmic Differentiation (AD)

Main Products:
- Quantitative dependence information (local):
  - weighted and directed partial derivatives
  - error and condition-number estimates ...
  - Lipschitz constants, interval enclosures ...
  - eigenvalues, Newton steps ...
- Qualitative dependence information (regional):
  - sparsity structures, degrees of polynomials
  - ranks, eigenvalue multiplicities ...
-
Algorithmic Differentiation (AD)

Main Properties:
- no truncation errors
- chain rule applied to numbers
- applicability to arbitrary programs
- a priori bounded and/or adjustable costs:
  - total operations count
  - maximal memory requirement
  - total memory traffic
always relative to the original function. (Griewank, Walther 09)
-
The Hello-World Example of AD

[Figure: a lighthouse at distance ν from a quay; its beam rotates with angular velocity ω and at time t hits the quay line y₂ = γ · y₁ at the point (y₁, y₂).]

y₁ = ν tan(ω t) / (γ − tan(ω t))   and   y₂ = γ ν tan(ω t) / (γ − tan(ω t))
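As an illustration (not part of the slides), the lighthouse formulas can be coded directly as the evaluation trace the lecture builds on. The parameter names ν, γ, ω, t follow the slides; the function signature is an assumption:

```cpp
#include <cmath>

// Lighthouse example: a beam rotating with angular velocity omega from a
// lighthouse at distance nu hits the quay line y2 = gamma * y1 at time t.
// The intermediate variables v1..v6 mirror the slides' evaluation trace.
void lighthouse(double nu, double gamma, double omega, double t,
                double& y1, double& y2) {
    double v1 = omega * t;     // v1 = v_{-1} * v_0
    double v2 = std::tan(v1);  // v2 = tan(v1)
    double v3 = gamma - v2;    // v3 = v_{-2} - v2
    double v4 = nu * v2;       // v4 = v_{-3} * v2
    double v5 = v4 / v3;       // v5 = v4 / v3
    double v6 = v5 * gamma;    // v6 = v5 * v_{-2}
    y1 = v5;
    y2 = v6;
}
```

Note that the trace consists only of elemental operations (·, −, /, tan); this is exactly the form AD differentiates step by step.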
-
Automation: computer tools such as Maple, Mathematica, ...

Maple provides symbolic derivatives.

[Figure-only content over four slides: Maple session output; not recoverable from this transcript.]
-
Forward Mode of AD

[Diagram: the map x ↦ y = F(x) is augmented to the map x(t) ↦ y(t) via the tangent function Ḟ.]

ẏ(t) = ∂/∂t F(x(t)) = F′(x(t)) · ẋ(t) ≡ Ḟ(x, ẋ)
-
Forward Mode (Lighthouse)

v₋₃ = x₁ = ν,  v₋₂ = x₂ = γ,  v₋₁ = x₃ = ω,  v₀ = x₄ = t

Function evaluation:        Tangent evaluation:
v₁ = v₋₁ · v₀               v̇₋₃ = ẋ₁, v̇₋₂ = ẋ₂, v̇₋₁ = ẋ₃, v̇₀ = ẋ₄
v₂ = tan(v₁)                v̇₁ = v̇₋₁ v₀ + v₋₁ v̇₀
v₃ = v₋₂ − v₂               v̇₂ = v̇₁ / cos²(v₁)
v₄ = v₋₃ · v₂               v̇₃ = v̇₋₂ − v̇₂
v₅ = v₄ / v₃                v̇₄ = v̇₋₃ v₂ + v₋₃ v̇₂
v₆ = v₅ · v₋₂               v̇₅ = (v̇₄ − v̇₃ v₅) / v₃
y₁ = v₅                     v̇₆ = v̇₅ v₋₂ + v₅ v̇₋₂
y₂ = v₆                     ẏ₁ = v̇₅,  ẏ₂ = v̇₆
-
Complexity (Forward Mode)

        v = c   v = u ± w   v = u · w   v = φ(u)
MOVES   1 + 1     3 + 3       3 + 3      2 + 2
ADDS      0       1 + 1       0 + 1      0 + 0
MULTS     0         0         1 + 2      0 + 1
NLOPS     0         0           0        1 + 1

OPS(F′(x)ẋ) ≤ c · OPS(F(x)),  with c ∈ [2, 5/2] platform dependent
-
Reverse Mode AD = Discrete Adjoints

[Diagram, built up over four slides: the map x ↦ y = F(x); seeding a weight vector ȳ with the level set ȳᵀy = c and propagating it backwards through F̄ yields x̄ with the level set ȳᵀF(x) = c.]

x̄ = ȳᵀF′(x) = ∇ₓ ȳᵀF(x) ≡ F̄(x, ȳ)
-
Reverse Mode (Lighthouse)

v₋₃ = x₁;  v₋₂ = x₂;  v₋₁ = x₃;  v₀ = x₄;
v₁ = v₋₁ · v₀;        v₂ = tan(v₁);
v₃ = v₋₂ − v₂;        v₄ = v₋₃ · v₂;
v₅ = v₄ / v₃;         v₆ = v₅ · v₋₂;
y₁ = v₅;  y₂ = v₆;

v̄₅ = ȳ₁;  v̄₆ = ȳ₂;
v̄₅ += v̄₆ · v₋₂;      v̄₋₂ += v̄₆ · v₅;
v̄₄ += v̄₅ / v₃;       v̄₃ −= v̄₅ · v₅ / v₃;
v̄₋₃ += v̄₄ · v₂;      v̄₂ += v̄₄ · v₋₃;
v̄₋₂ += v̄₃;           v̄₂ −= v̄₃;
v̄₁ += v̄₂ / cos²(v₁);
v̄₋₁ += v̄₁ · v₀;      v̄₀ += v̄₁ · v₋₁;
x̄₄ = v̄₀;  x̄₃ = v̄₋₁;  x̄₂ = v̄₋₂;  x̄₁ = v̄₋₃
-
Complexity (Reverse Mode)

        v = c   v = u ± w   v = u · w   v = φ(u)
MOVES   1 + 1     3 + 6       3 + 8      2 + 5
ADDS      0       1 + 2       0 + 2      0 + 1
MULTS     0         0         1 + 2      0 + 1
NLOPS     0         0           0        1 + 1

OPS(ȳᵀF′(x)) ≤ c · OPS(F(x)),  MEM(ȳᵀF′(x)) ∼ OPS(F(x)),
with c ∈ [3, 4] platform dependent

Remarks:
- the cost of the gradient calculation is independent of n
- the memory requirement may cause problems! → checkpointing
-
Algorithmic Differentiation (AD)

Differentiation of computer programs within machine precision.

Basic forms:
forward mode: OPS(F′(x)ẋ) ≤ c₁ · OPS(F),  c₁ ∈ [2, 5/2]
reverse mode: OPS(ȳᵀF′(x)) ≤ c₂ · OPS(F), c₂ ∈ [3, 4]
              MEM(ȳᵀF′(x)) ∼ OPS(F)
combination:  OPS(ȳᵀF″(x)ẋ) ≤ c₃ · OPS(F), c₃ ∈ [7, 10]

Tasks: derivatives (of any order), sparsity patterns, differentiation of fixed-point iterations and time-stepping procedures, ...

Implementations?
-
Intermediate Conclusions

- evaluation of derivatives with working accuracy
- efficient evaluation of derivatives of any order
- numerous tools available, based on source transformation or operator overloading
- nondifferentiability only on a meager set
- optimal Jacobian accumulation is NP-hard
- full Jacobians/Hessians are often not needed, or sparse
- see www.autodiff.org
-
Forward Mode of AD

[Diagram: the map x ↦ y = F(x) is augmented to the map x(t) ↦ y(t) via the tangent function Ḟ.]

ẏ(t) = ∂/∂t F(x(t)) = F′(x(t)) · ẋ(t) ≡ Ḟ(x, ẋ)
-
Evaluation procedure for:

F(x) = exp(sin(x²)),    d/dt F(x(t)) = exp(sin(x²)) · cos(x²) · 2x · ẋ

v₀ = x           v̇₀ = ẋ
v₁ = v₀²         v̇₁ = 2 v₀ v̇₀
v₂ = sin(v₁)     v̇₂ = cos(v₁) v̇₁
v₃ = exp(v₂)     v̇₃ = exp(v₂) v̇₂
y = v₃           ẏ = v̇₃ = d/dt F(x(t))
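The tangent recursion above maps directly onto an operator-overloading implementation. A minimal dual-number sketch (my illustration, not the lecture's ADOL-C implementation): every arithmetic operation propagates the pair (vᵢ, v̇ᵢ) at once.

```cpp
#include <cmath>

// Minimal dual number: value v and tangent dv propagated together.
// Each overload implements one elementary tangent rule from the slides.
struct Dual {
    double v;   // v_i
    double dv;  // dot v_i
};
Dual operator*(Dual a, Dual b) { return {a.v * b.v, a.dv * b.v + a.v * b.dv}; }
Dual sin(Dual a) { return {std::sin(a.v), std::cos(a.v) * a.dv}; }
Dual exp(Dual a) { return {std::exp(a.v), std::exp(a.v) * a.dv}; }
using std::sin;  // keep the double overloads visible next to the Dual ones
using std::exp;

// F(x) = exp(sin(x^2)), written once and instantiated for double or Dual.
template <class T> T F(T x) { return exp(sin(x * x)); }

// Forward mode: seed xdot = 1 to obtain dF/dx to machine precision.
double dF(double x) { return F(Dual{x, 1.0}).dv; }
```

The point of the template is that the *same* source code evaluates either the function (T = double) or the function plus its tangent (T = Dual), with no truncation error.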
-
General: with uᵢ ≡ (vⱼ)ⱼ≺ᵢ

v = φ(u)   →   v̇ = φ′(u) u̇

Elementary tangents for intrinsic functions:
v = sin(u)   →   v̇ = cos(u) · u̇
v = √u       →   v̇ = 0.5 · u̇ / v
v = tan(u)   →   v̇ = u̇ / cos²(u)

Elementary tangents for arithmetic operations:
v = u ± w   →   v̇ = u̇ ± ẇ
v = u · w   →   v̇ = u̇ w + u ẇ
v = u / w   →   v̇ = (u̇ − v ẇ) / w
-
Forward Differentiation of the Lighthouse Example

v₋₃ = x₁ = ν        v̇₋₃ = ẋ₁
v₋₂ = x₂ = γ        v̇₋₂ = ẋ₂
v₋₁ = x₃ = ω        v̇₋₁ = ẋ₃
v₀ = x₄ = t         v̇₀ = ẋ₄
v₁ = v₋₁ · v₀       v̇₁ = v̇₋₁ v₀ + v₋₁ v̇₀
v₂ = tan(v₁)        v̇₂ = v̇₁ / cos²(v₁)
v₃ = v₋₂ − v₂       v̇₃ = v̇₋₂ − v̇₂
v₄ = v₋₃ · v₂       v̇₄ = v̇₋₃ v₂ + v₋₃ v̇₂
v₅ = v₄ / v₃        v̇₅ = (v̇₄ − v̇₃ v₅) / v₃
v₆ = v₅             v̇₆ = v̇₅
v₇ = v₅ · v₋₂       v̇₇ = v̇₅ v₋₂ + v₅ v̇₋₂
y₁ = v₆             ẏ₁ = v̇₆
y₂ = v₇             ẏ₂ = v̇₇
-
Define φ̇ᵢ(uᵢ, u̇ᵢ) ≡ φᵢ′(uᵢ) u̇ᵢ

General Tangent Procedure

[vᵢ₋ₙ, v̇ᵢ₋ₙ] = [xᵢ, ẋᵢ]                    i = 1, ..., n
[vᵢ, v̇ᵢ]     = [φᵢ(uᵢ), φ̇ᵢ(uᵢ, u̇ᵢ)]       i = 1, ..., l
[yₘ₋ᵢ, ẏₘ₋ᵢ] = [vₗ₋ᵢ, v̇ₗ₋ᵢ]               i = m − 1, ..., 0
-
Complexity:

        v = c   v = u ± w   v = u · w   v = φ(u)
MOVES   1 + 1     3 + 3       3 + 3      2 + 2
ADDS      0       1 + 1       0 + 1      0 + 0
MULTS     0         0         1 + 2      0 + 1
NLOPS     0         0           0        1 + 1

OPS(F′(x)ẋ) ≤ c · OPS(F(x)),  with c ∈ [2, 5/2] platform dependent

Remarks:
- How to arrange the evaluation of [vᵢ, v̇ᵢ] optimally?
- Overwrites are no problem
-
Reverse Mode of AD

[Diagram, built up over four slides: the map x ↦ y = F(x); seeding a weight vector ȳ with the level set ȳᵀy = c and propagating it backwards through F̄ yields x̄ with the level set ȳᵀF(x) = c.]

x̄ᵀ = ȳᵀF′(x) = ∇ₓ ȳᵀF(x) ≡ F̄(x, ȳ)
-
Evaluation Procedure:

F(x) = exp(sin(x²)),    ȳ F′(x) = ȳ · exp(sin(x²)) · cos(x²) · 2x

v₀ = x           v̄₃ = ȳ
v₁ = v₀²         v̄₂ = exp(v₂) · v̄₃
v₂ = sin(v₁)     v̄₁ = cos(v₁) · v̄₂
v₃ = exp(v₂)     v̄₀ = 2 v₀ · v̄₁
y = v₃           x̄ = v̄₀
-
Computation of ȳ F′(x)?

Rewrite the general tangent procedure

[vᵢ₋ₙ, v̇ᵢ₋ₙ] = [xᵢ, ẋᵢ]                    i = 1, ..., n
[vᵢ, v̇ᵢ]     = [φᵢ(uᵢ), φ̇ᵢ(uᵢ, u̇ᵢ)]       i = 1, ..., l
[yₘ₋ᵢ, ẏₘ₋ᵢ] = [vₗ₋ᵢ, v̇ₗ₋ᵢ]               i = m − 1, ..., 0

in matrix-vector notation.
-
Define

cᵢⱼ ≡ ∂φᵢ / ∂vⱼ

Aᵢ ≡ I + eₙ₊ᵢ [φᵢ′(uᵢ) − eₙ₊ᵢ]ᵀ ∈ R⁽ⁿ⁺ˡ⁾ˣ⁽ⁿ⁺ˡ⁾   for i = 1, ..., l

(i.e., the identity matrix with row n + i replaced by the elemental partials cᵢ,₁₋ₙ, cᵢ,₂₋ₙ, ..., cᵢ,ᵢ₋₁, 0, ..., 0), and

Pₙ ≡ [I, 0, ..., 0] ∈ Rⁿˣ⁽ⁿ⁺ˡ⁾,
Qₘ ≡ [0, ..., 0, I] ∈ Rᵐˣ⁽ⁿ⁺ˡ⁾.
-
The tangent operation v̇ᵢ = φᵢ′(uᵢ) u̇ᵢ equals v̇ = Aᵢ v̇.

The linear transformation ẏ = F′(x) ẋ equals

ẏ = Qₘ Aₗ Aₗ₋₁ ... A₂ A₁ Pₙᵀ ẋ,

with the product representation

F′(x) = Qₘ Aₗ Aₗ₋₁ ... A₂ A₁ Pₙᵀ ∈ Rᵐˣⁿ.

The linear transformation x̄ = ȳ F′(x) (reverse mode) equals

x̄ᵀ = Pₙ A₁ᵀ A₂ᵀ ... Aₗ₋₁ᵀ Aₗᵀ Qₘᵀ ȳᵀ.
-
Because of

Aᵢᵀ = I + [φᵢ′(uᵢ) − eₙ₊ᵢ] eₙ₊ᵢᵀ,

the product Aᵢᵀ v̄ with v̄ ∈ Rⁿ⁺ˡ forms an incremental operation:

1. indices j with j ≠ i and j ⊀ i:  v̄ⱼ is left unchanged
2. indices j ≺ i:  v̄ⱼ is incremented by v̄ᵢ cᵢⱼ
3. index i:  v̄ᵢ is set to zero

Hence, for computing x̄ = ȳ F′(x), one has to
- evaluate all intermediates vᵢ, obtaining the partial derivatives cᵢⱼ
- increment the adjoint values v̄ⱼ in reverse order.
-
Corresponding Incremental Adjoint Recursion

v̄ᵢ = 0                             i = 1 − n, ..., l
vᵢ₋ₙ = xᵢ                           i = 1, ..., n
vᵢ = φᵢ(vⱼ)ⱼ≺ᵢ                      i = 1, ..., l
yₘ₋ᵢ = vₗ₋ᵢ                         i = m − 1, ..., 0
v̄ₗ₋ᵢ = ȳₘ₋ᵢ                        i = 0, ..., m − 1
v̄ⱼ += v̄ᵢ · ∂φᵢ(uᵢ)/∂vⱼ  for j ≺ i   i = l, ..., 1
x̄ᵢ = v̄ᵢ₋ₙ                          i = n, ..., 1

No overwriting ⇒ zeroing out of the v̄ᵢ is omitted.
-
Adjoint Recursion of the Lighthouse Example

v₋₃ = x₁;  v₋₂ = x₂;  v₋₁ = x₃;  v₀ = x₄;
v₁ = v₋₁ · v₀;        v₂ = tan(v₁);
v₃ = v₋₂ − v₂;        v₄ = v₋₃ · v₂;
v₅ = v₄ / v₃;         v₆ = v₅ · v₋₂;
y₁ = v₅;  y₂ = v₆;

v̄₅ = ȳ₁;  v̄₆ = ȳ₂;
v̄₅ += v̄₆ · v₋₂;      v̄₋₂ += v̄₆ · v₅;
v̄₄ += v̄₅ / v₃;       v̄₃ −= v̄₅ · v₅ / v₃;
v̄₋₃ += v̄₄ · v₂;      v̄₂ += v̄₄ · v₋₃;
v̄₋₂ += v̄₃;           v̄₂ −= v̄₃;
v̄₁ += v̄₂ / cos²(v₁);
v̄₋₁ += v̄₁ · v₀;      v̄₀ += v̄₁ · v₋₁;
x̄₄ = v̄₀;  x̄₃ = v̄₋₁;  x̄₂ = v̄₋₂;  x̄₁ = v̄₋₃
-
Reverse Operations of Elementals

Elemental        Reverse
v = c            v̄ = 0
v = u ± w        ū += v̄;  w̄ ±= v̄;  v̄ = 0
v = u · w        ū += v̄ · w;  w̄ += v̄ · u;  v̄ = 0
v = 1/u          ū −= (v · v) · v̄;  v̄ = 0
v = √u           ū += 0.5 · v̄ / v;  v̄ = 0
v = uᶜ           ū += (v̄ · c) · v / u;  v̄ = 0
v = exp(u)       ū += v̄ · v;  v̄ = 0
v = log(u)       ū += v̄ / u;  v̄ = 0
v = sin(u)       ū += v̄ · cos(u);  v̄ = 0
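The reverse elementals above can be sketched with a small gradient tape. This is my illustration (not ADOL-C's actual tape format): each recorded node stores its predecessors and the elemental partials cᵢⱼ, and one backward sweep accumulates the adjoints exactly as in the incremental recursion.

```cpp
#include <cmath>
#include <vector>

// Minimal gradient tape: each node stores up to two parents and the
// elemental partials c_ij = dphi_i/dv_j from the forward sweep.
struct Tape {
    struct Node { int p1, p2; double d1, d2; };
    std::vector<Node> nodes;

    int var(double) { nodes.push_back({-1, -1, 0.0, 0.0}); return (int)nodes.size() - 1; }
    int rec(int p1, double d1, int p2 = -1, double d2 = 0.0) {
        nodes.push_back({p1, p2, d1, d2});
        return (int)nodes.size() - 1;
    }

    // Reverse sweep: seed the output adjoint with 1, then increment the
    // parents' adjoints in reverse order (v_bar[j] += v_bar[i] * c_ij).
    std::vector<double> gradient(int out) const {
        std::vector<double> bar(nodes.size(), 0.0);
        bar[out] = 1.0;
        for (int i = (int)nodes.size() - 1; i >= 0; --i) {
            if (nodes[i].p1 >= 0) bar[nodes[i].p1] += bar[i] * nodes[i].d1;
            if (nodes[i].p2 >= 0) bar[nodes[i].p2] += bar[i] * nodes[i].d2;
        }
        return bar;
    }
};

// Record y = exp(sin(x^2)) and return dy/dx via a single reverse sweep.
double grad_example(double x) {
    Tape t;
    int v0 = t.var(x);
    double w1 = x * x;
    int v1 = t.rec(v0, 2.0 * x);       // v1 = v0^2,     c10 = 2 v0
    double w2 = std::sin(w1);
    int v2 = t.rec(v1, std::cos(w1));  // v2 = sin(v1),  c21 = cos(v1)
    double w3 = std::exp(w2);
    int v3 = t.rec(v2, w3);            // v3 = exp(v2),  c32 = exp(v2) = v3
    return t.gradient(v3)[v0];
}
```

The tape makes the memory behavior discussed on the complexity slide concrete: its length grows with the operation count of the forward sweep, which is exactly why checkpointing matters for long computations.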
-
Complexity:

        v = c   v = u ± w   v = u · w   v = φ(u)
MOVES   1 + 1     3 + 6       3 + 8      2 + 5
ADDS      0       1 + 2       0 + 2      0 + 1
MULTS     0         0         1 + 2      0 + 1
NLOPS     0         0           0        1 + 1

OPS(ȳ F′(x)) ≤ c · OPS(F(x)),  with c ∈ [3, 5] platform dependent

Remark:
- Overwrites cause problems! See checkpointing.
-
Interpretation of the adjoint values v̄ᵢ:
- Lagrange multipliers of
    min ȳ · y
    s.t. φᵢ(uᵢ) − vᵢ = 0,   i = 1, ..., l
         vₗ₋ᵢ − yₘ₋ᵢ = 0,   i = 1, ..., m
- Error propagation factors: if ṽᵢ = vᵢ (1 + εᵢ) with |εᵢ| ≤ macheps, then

    ỹ − y = Σᵢ₌₁ˡ v̄ᵢ vᵢ εᵢ + h.o.t.  ≤  ε Σᵢ₌₁ˡ |v̄ᵢ vᵢ| + h.o.t.
-
Vector Modes

How to reduce the overhead in forward and reverse mode?
Propagate a bundle of vectors instead of a single vector!

Forward mode: compute

Ẏ = F′(x) Ẋ ∈ Rᵐˣᵖ  for  Ẋ ∈ Rⁿˣᵖ

instead of

ẏ = F′(x) ẋ ∈ Rᵐ  for  ẋ ∈ Rⁿ

- replace v̇ⱼ ∈ R by V̇ⱼ ∈ Rᵖ in the tangent procedure
- replace ż ∈ Rˡ⁻ᵐ by Ż ∈ R⁽ˡ⁻ᵐ⁾ˣᵖ
- everything else remains unchanged
-
Vector Tangents:

v = u ± w    →   V̇ = U̇ ± Ẇ
v = u · w    →   V̇ = w U̇ + u Ẇ
v = sin(u)   →   V̇ = cos(u) · U̇

Complexity:

        v = c   v = u ± w   v = u · w   v = φ(u)
MOVES   1 + p    3 + 3p      3 + 3p     2 + 2p
ADDS      0      1 + p       0 + p      0 + 0
MULTS     0        0         1 + 2p     0 + p
NLOPS     0        0           0          2

OPS(F′(x)Ẋ) ≤ c · OPS(F(x)),  with c ∈ [1 + p, 1 + 1.5p]
-
Applications:
- Computation of the full Jacobian: set Ẋ = Iₙ, then
    OPS(F′(x)) ≤ (1 + 1.5n) · OPS(F(x))
  Compare with the scalar mode, i.e. F′(x)eᵢ, i = 1, ..., n:
    OPS(F′(x)) ≤ 2.5n · OPS(F(x))
  Remarkable run-time savings can be obtained.
- Computation of sparse Jacobians
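Setting Ẋ = Iₙ in the vector forward mode can be sketched as follows. The example function and types are my illustration (not from the slides); each dual carries p tangent components that are propagated in lockstep:

```cpp
#include <array>
#include <cmath>

// Vector forward mode: propagate P tangent directions at once.
// With the seed matrix X = I_n, one sweep yields the full Jacobian F'(x).
template <int P> struct VDual {
    double v;
    std::array<double, P> d;  // P tangent components
};
template <int P> VDual<P> operator+(VDual<P> a, VDual<P> b) {
    VDual<P> r{a.v + b.v, {}};
    for (int k = 0; k < P; ++k) r.d[k] = a.d[k] + b.d[k];
    return r;
}
template <int P> VDual<P> operator*(VDual<P> a, VDual<P> b) {
    VDual<P> r{a.v * b.v, {}};
    for (int k = 0; k < P; ++k) r.d[k] = a.d[k] * b.v + a.v * b.d[k];
    return r;
}
template <int P> VDual<P> sin(VDual<P> a) {
    VDual<P> r{std::sin(a.v), {}};
    for (int k = 0; k < P; ++k) r.d[k] = std::cos(a.v) * a.d[k];
    return r;
}

// Hypothetical example F : R^2 -> R^2, F(x) = (x1*x2, sin(x1) + x2).
// Seeding the unit directions e1, e2 returns the 2x2 Jacobian row by row.
std::array<std::array<double, 2>, 2> jacobian(double x1, double x2) {
    VDual<2> a{x1, {1.0, 0.0}}, b{x2, {0.0, 1.0}};
    VDual<2> y1 = a * b, y2 = sin(a) + b;
    return {{{y1.d[0], y1.d[1]}, {y2.d[0], y2.d[1]}}};
}
```

The nonlinear value sin(a.v) and cos(a.v) are computed once and reused for all P directions, which is where the savings over P separate scalar sweeps come from.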
-
Reverse mode: compute

X̄ = Ȳ F′(x) ∈ Rᑫˣⁿ  for  Ȳ ∈ Rᑫˣᵐ

instead of

x̄ = ȳ F′(x) ∈ Rⁿ  for  ȳ ∈ Rᵐ

Implementation:
- replace v̄ᵢ ∈ R by V̄ᵢ ∈ Rᑫ in the adjoint recursion
- everything else remains unchanged
-
Complexity:

        v = c   v = u ± w   v = u · w   v = φ(u)
MOVES   1 + q    3 + 6q      5 + 6q     3 + 4q
ADDS      0      1 + 2q      0 + 2q     0 + q
MULTS     0        0         1 + 2q     0 + q
NLOPS     0        0           0          2

OPS(Ȳ F′(x)) ≤ c · OPS(F(x)),  with c ∈ [1 + 2q, 1.5 + 2.5q]

Remarks:
- computation of the full Jacobian
- remarkable run-time savings can be obtained
- application: e.g. matrix compression for sparse Jacobians
-
General Evaluation Procedure:

vᵢ₋ₙ = xᵢ             for i = 1, ..., n
vᵢ = φᵢ(vⱼ)ⱼ≺ᵢ        for i = 1, ..., l
yₘ₋ᵢ = vₗ₋ᵢ           for i = m − 1, ..., 0

where
- vᵢ, i ≤ 0, are the independents
- vᵢ, i ≥ l − m + 1, are the dependents
- φᵢ is an elemental function, Φ = set of elemental functions
- j ≺ i is the dependence relation: vᵢ depends directly on vⱼ
-
Classification of elemental functions:

           Essential                  Optional                        Vector
Smooth     u + v, u − v, u · v, u/v   uᶜ, rec(u) = 1/u, √u            Σₖ uₖ vₖ
           c ± u, c · u               uᶜ, arcsin(u), tan(u)           Σₖ cₖ uₖ
           exp(u), log(u)             |u|ᶜ, c > 1, ...
           sin(u), cos(u)
Lipschitz  abs(u) = |u|               max(u, v), min(u, v)            maxₖ{|uₖ|}, Σₖ |uₖ|
           |(u, v)| = √(u² + v²)                                      √(Σₖ uₖ²)
General    heav(u), sign(u)           (u > 0) ? v : w
-
Representation as a Computational Graph

[Figure: computational graph of the lighthouse example, with independent nodes v₋₃, v₋₂, v₋₁, v₀ feeding the intermediates v₁, ..., v₅ and the results v₆, v₇.]
-
Representation as an Extended System E(x; v) = 0

- Define
    uᵢ ≡ (vⱼ)ⱼ≺ᵢ            for i = 1 − n, ..., l
    φᵢ(uᵢ) ≡ xᵢ₊ₙ           for i = 1 − n, ..., 0
    cᵢⱼ ≡ ∂φᵢ/∂vⱼ,  called elemental partials
- Assume the dependent variables to be mutually independent.

Then one has:
- i < 1 or j > l − m implies cᵢⱼ ≡ 0
- the evaluation procedure is equivalent to the nonlinear system
    0 = E(x; v) ≡ (φᵢ(uᵢ) − vᵢ)ᵢ₌₁₋ₙ,...,ₗ
-
Topics to keep in mind:

- Composite differentiation (overall differentiability): here it is assumed that all φᵢ are differentiable at their respective arguments.
- Overwriting = shared allocation of vᵢ and vⱼ: not considered here, but there are ways to handle it.
- Aliasing of arguments and results: not considered here, but again there are ways to handle it.
-
Main Results:

- gradients are cheap: their cost is a small multiple of the function cost
- costs grow only quadratically in the derivative degree
- nondifferentiability only on a meager set
- optimal Jacobian accumulation is NP-hard
- Jacobian accumulation is often unnecessary
-
Higher Order Derivatives

Second derivatives: use forward and reverse differentiation together!

  y = F(x)   --(reverse differentiation)-->   \bar{x} = \bar{y}^T F'(x)

  \bar{x} = \bar{y}^T F'(x)   --(forward differentiation)-->   \dot{\bar{x}} = \dot{\bar{y}}^T F'(x) + \bar{y}^T F''(x) \dot{x}

For a scalar-valued function F, with \bar{y} = 1, \dot{\bar{y}} = 0, and a direction \dot{x}, one has

  \dot{\bar{x}} = F''(x) \dot{x} = Hessian-vector product
-
Implementation
[Figure: the source code for y = F(x) is compiled to object code; a forward sweep maps (x, \dot{x}) ∈ R^{2n} to y = F(x) ∈ R^m and \dot{y} = F'(x)\dot{x} ∈ R^m, recording intermediate values on a stack (tape); a reverse sweep over the tape maps (x, \bar{y}) ∈ R^{n+m} to \bar{x} = \bar{y}^T F'(x) ∈ R^n]
-
Higher derivatives: Think in Taylor polynomials

Let x be the path

  x(t) ≡ \sum_{j=0}^{d} x_j t^j : R → R^n   with   x_j = \frac{1}{j!} \frac{\partial^j}{\partial t^j} x(t) \Big|_{t=0}

Hence
- x_j is the scaled derivative at t = 0,
- x_1 = tangent, x_2 = curvature.

Now: consider F : R^n → R^m, d times continuously differentiable. Then y(t) ≡ F(x(t)) : R → R^m is smooth (chain rule).
-
There exist Taylor coefficients y_j ∈ R^m, 0 ≤ j ≤ d, with

  y(t) = \sum_{j=0}^{d} y_j t^j + O(t^{d+1}).

Using the chain rule, one obtains:

  y_0 = F(x_0)
  y_1 = F'(x_0) x_1
  y_2 = F'(x_0) x_2 + \frac{1}{2} F''(x_0) x_1 x_1
  y_3 = F'(x_0) x_3 + F''(x_0) x_1 x_2 + \frac{1}{6} F'''(x_0) x_1 x_1 x_1

⇒ directional derivatives of high order (forward mode)

  OPS(x_0, ..., x_d → y_0, ..., y_d) ≤ d^2 · OPS(x_0 → y_0)

using Taylor arithmetic.
-
Cross Country / Preaccumulation

Consider again the Jacobian of the extended system E(x; v):

  E'(x; v) =
    [ -I      0     0 ]
    [  B    L - I   0 ]
    [  R      T    -I ]

with F'(x) = R + T (I - L)^{-1} B = Schur complement.

- Forward mode: row-block elimination of T by L - I
- Reverse mode: column-block elimination of B by L - I
- Cross country: apply another elimination order
-
Another Interpretation

Elimination in the Jacobian of the extended system = vertex elimination in the computational graph: eliminating a vertex j replaces the paths through it by direct edges whose partials c_{i,j} · c_{j,k} are added to any existing c_{i,k}.

Example for forward and reverse mode:

[Figure: lighthouse computational graph with nodes -1, 0, 1, ..., 6 and edge labels c_{i,j}]
-
AD by Source Transformation

Code for Simulation → Code for Derivatives

func.src →
  1. Lexical analysis
  2. Syntax analysis
  3. Semantic analysis
  4. Static data flow analyses (symbol table, error handler)
  5. AD !!!
  6. Code optimization
  7. Unparsing
→ func&der.drc
-
Remarks

Pros:
- Makes use of compiler optimization!!
- Derivative code is readable

Cons:
- Requires a complete compiler; activity analysis; new language features
- Flexibility
-
Speelpenning's Example

Speelpenning's function f(x) = \prod_{i=1}^{n} x_i coded in Fortran:

      SUBROUTINE SPEELPENNING(x, y)
      DOUBLE PRECISION x(10), y
      INTEGER i

      y = 1.0
      DO i = 1, 10
         y = y*x(i)
      END DO
      END
-
Forward Mode Differentiation

Requirements for the AD tool:
- determine active variables
- associate derivative objects
- generate differentiated statements
- generate differentiated subroutines if required
-
Forward Mode Differentiation

      SUBROUTINE SPEELPENNING_D(x, xd, y, yd)
      DOUBLE PRECISION x(10), xd(10), y, yd
      INTEGER i

      y = 1.0
      yd = 0.D0
      DO i = 1, 10
         yd = yd*x(i) + y*xd(i)
         y = y*x(i)
      END DO
      END
-
Reverse Mode Differentiation

Requirements for the AD tool:
- determine active variables
- associate derivative objects
- generate adjoint statements
- determine values to be recorded
- reverse the control flow
- generate the recording part of the code
-
Reverse Mode Differentiation

      SUBROUTINE SPEELPENNING_B(x, xb, y, yb)
      DOUBLE PRECISION x(10), xb(10), y, yb
      INTEGER i, adTo

      y = 1.0
      DO i = 1, 10
         CALL PUSHREAL8(y)
         y = y*x(i)
      END DO
      CALL PUSHINTEGER4(i - 1)
      CALL POPINTEGER4(adTo)
      DO i = adTo, 1, -1
         CALL POPREAL8(y)
         xb(i) = xb(i) + y*yb
         yb = x(i)*yb
      END DO
      yb = 0.D0
      END
-
Source Transformation Tools

Fortran:
  AMPL       Bell Labs, USA                     forward + reverse
  NAG        RWTH Aachen, Germany + NAG         forward + reverse
  TAF        FastOpt, Germany                   forward + reverse
  Tapenade   INRIA, France                      forward + reverse
  OpenAD     ANL, USA / RWTH Aachen, Germany    forward
  ...

All tools provide at most second derivatives!
Goal: get AD into the compiler.

Matlab:
  ADiMat          RWTH Aachen, Germany     forward
  MAD (in TOMS)   Univ. Shrivenham, GB     forward
  ...
-
AD by Operator Overloading

Implementation strategies:
- New variable type for active variables, i.e., for independents, dependents, intermediates (!)
- Derivative information included as
  - a field in a composite structure, or
  - generation of an internal function representation

Principle: new class definition

  type ad_double = class
    VAL : double;
    ???
  end;
-
AD by Field in Composite Structure

Idea for forward mode:

  type ad_double = class
    VAL : double;
    DER : array [1..p] of double;
  end;

All operations are overloaded, e.g., multiplication:

  operator * (U, V : ad_double) W : ad_double;
    i : integer;
  begin
    W.VAL := U.VAL * V.VAL;
    for i := 1 to p do
      W.DER[i] := U.VAL*V.DER[i] + U.DER[i]*V.VAL;
  end;
-
Remarks:
- Implements the basic rules of differentiation for
  - each arithmetic operation
  - each intrinsic function
- Easy to realize ⇒ tapeless forward mode of ADOL-C

Idea for reverse mode: graph in core.
Calculating adjoints needs backward pointers ⇒ new node type with fields VAL, DER, IND, ADJ.
-
W := U * V generates

[Figure: a new node (W.VAL, W.DER, W.IND, W.ADJ) with links back to the nodes of U and V; the adjoint fields U.ADJ and V.ADJ are updated from W.ADJ]

Data dependency in the opposite direction!!
-
Remarks:
- Calculation of derivatives is only possible after a function evaluation at the same point
- The complete computational graph is kept in core ⇒ the amount of internal storage may cause problems
- Access to the nodes is largely random

Conclusion: derivative calculation in an active class is OK for the forward mode, but not practicable for the reverse mode on large problems!!
-
AD by Internal Representation

Idea:

  class adouble {
    VAL : double;
    IND : integer;
  }

W = U * V causes recording onto three tapes:

  INDEXCOUNT += 1
  W.IND = INDEXCOUNT
  W.VAL = U.VAL * V.VAL
  W.VAL                  → VALUE-TAPE
  MULT                   → OPERATION-TAPE
  (U.IND, V.IND, W.IND)  → INDEX-TAPE
-
Remarks for Tape-based Derivatives

- Mechanical application of the chain rule based on the tapes
- Data access on the tapes is strictly sequential
- Tapes can be reused for function/derivative calculation at different points
- Several tapes for different control flows depending on active variables
-
General Remarks for Operator Overloading

Pros:
- Only choice for C/C++
- Great flexibility with respect to mode/order
- Simple to maintain

Cons:
- Compiler optimization may be lost
- Only object code for derivatives
-
Operator Overloading Tools

C/C++:
  AD01     Cranfield Uni., GB                 forward + reverse
  ADOL-C   Uni. Paderborn, Germany            forward + reverse
  CppAD    Uni. Washington, USA               forward + reverse
  FADBAD   TU Denmark                         forward + reverse
  TADIFF   TU Denmark                         forward
  ...

Matlab:
  ADMAT    Cayuga Research Associates, USA    forward + reverse
  ...
-
Conclusions

- Evaluation of derivatives with working accuracy
- Forward mode: OPS(F'(x)\dot{x}) ≤ c · OPS(F), c ∈ [2, 5/2]
- Reverse mode: OPS(\bar{y}^T F'(x)) ≤ c · OPS(F), c ∈ [3, 4]
  ⇒ Gradients are as cheap as function costs!!
- Combination: OPS(\bar{y}^T F''(x)\dot{x}) ≤ c · OPS(F), c ∈ [7, 10]
- Cost of higher derivatives grows quadratically in the degree
-
Conclusions

- Evaluation of derivatives with working accuracy
- Efficient evaluation of derivatives of any order
- Numerous tools available, based on source transformation or operator overloading
- Nondifferentiability only on a meager set
- Optimal Jacobian accumulation is NP-hard
- Full Jacobians/Hessians often not needed or sparse
- See www.autodiff.org
-
Automatic differentiation by overloading in C++

ADOL-C version 2.1

- Reorganization of taping: tape-dependent information kept in separate structures
- Different differentiation contexts
- Documented external function facility
- Documented fixpoint iteration facility
- Documented checkpointing facility based on revolve
- Documented parallelization of derivative calculation
- Coupled with ColPack for exploitation of sparsity
- Available at COIN-OR since May 2009
-
Source Code Modifications

- Include the needed header files; easiest way: #include "adolc.h"
- Define the region that has to be differentiated:

    trace_on(tag, keep);   // start of the active section
    ...
    trace_off(file);       // ... and its end

- Mark independents and dependents in the active section:

    xa <<= xp;   // mark independents
    ya >>= yp;   // mark dependents

- Declare all active variables of type adouble
- Calculate derivative objects after trace_off(file)
-
Evaluation Procedure (Lighthouse)

  y_1 = x_1 tan(x_3 x_4) / (x_2 - tan(x_3 x_4))
  y_2 = x_2 y_1

  v_{-3} = x_1,  v_{-2} = x_2,  v_{-1} = x_3,  v_0 = x_4
  v_1 = v_{-1} · v_0     = φ_1(v_{-1}, v_0)
  v_2 = tan(v_1)         = φ_2(v_1)
  v_3 = v_{-2} - v_2     = φ_3(v_{-2}, v_2)
  v_4 = v_{-3} · v_2     = φ_4(v_{-3}, v_2)
  v_5 = v_4 / v_3        = φ_5(v_4, v_3)
  v_6 = v_5 · v_{-2}     = φ_6(v_5, v_{-2})
  y_1 = v_5,  y_2 = v_6
-
Model Generation with ADOL-C
[Figure: computational graph of the lighthouse model with independents x_1, ..., x_4 and intermediates v_1, ..., v_6]
Original code:

  double x1, x2, x3, x4;
  double v1, v2, ...;
  double y1, y2;

  x1 = 3.7; x2 = 0.7;
  x3 = 0.5; x4 = 0.5;

  v1 = x3*x4;
  v2 = tan(v1);
  v3 = x2-v2;
  v4 = x1*v2;
  v5 = v4/v3;
  v6 = v5*x2;

  y1 = v5; y2 = v6;
Traced code:

  adouble x1, x2, x3, x4;
  adouble v1, v2, ...;
  double y1, y2;

  trace_on(tag);
  x1 <<= 3.7; x2 <<= 0.7;
  x3 <<= 0.5; x4 <<= 0.5;

  v1 = x3*x4;
  v2 = tan(v1);
  v3 = x2-v2;
  v4 = x1*v2;
  v5 = v4/v3;
  v6 = v5*x2;

  v5 >>= y1; v6 >>= y2;
  trace_off();
-
Easy-to-use routines

int gradient(tag, n, x[n], g[n]):
    tag = tape number, n = # indeps, x[n] = values of indeps,
    g[n] = ∇f(x)

int jacobian(tag, m, n, x[n], J[m][n]):
    tag = tape number, m = # deps, n = # indeps, x[n] = values of indeps,
    J[m][n] = F'(x)

int hessian(tag, n, x[n], H[n][n]):
    tag = tape number, n = # indeps, x[n] = values of indeps,
    H[n][n] = ∇²f(x)
-
Further drivers for optimization:

- vec_jac(tag, m, n, repeat, x[n], u[m], z[n])
  computes z = u^T F'(x)
- jac_vec(tag, m, n, x[n], v[n], z[n])
  computes z = F'(x) v
- hess_vec(tag, n, x[n], v[n], z[n])
  computes z = ∇²f(x) v
- lagra_hess_vec(tag, n, m, x[n], v[n], u[m], h[n])
  computes h = u^T F''(x) v; an extension to u^T F''(x) V is available
- jac_solv(tag, n, x[n], b[n], sparse, mode)
  computes w with F'(x) w = b and stores the result in b
-
Drivers for sparse Derivative Matrices

Jacobian:

  sparse_jac(tag, m, n, repeat, x, &nnz, &row_ind, &col_ind,
             &values, options);
  jac_pat(tag, m, n, x, JP, options);
  generate_seed_jac(m, n, JP, &seed, &p, option);

Hessian:

  sparse_hess(tag, n, repeat, x, &nnz, &row_ind, &col_ind,
              &values, options);
  hess_pat(tag, n, x, HP, options);
  generate_seed_hess(n, HP, &seed, &p, option);

Graph coloring routines by ColPack (Gebremedhin, Pothen)
-
Advanced taping techniques

[Figure: a computation split into Part A, Part B, Part C; variants of recording it as one basic ADOL-C tape, as nested ADOL-C tapes, or via external differentiated functions]
-
Differentiation of fixpoint iterations

[Figure: a basic ADOL-C tape split into initialization, iteration, and evaluation phases; the iteration is recorded on a nested subtape via fp_iteration(), which is evaluated repeatedly during the reverse evaluation]
-
Summary and Outlook

- AD for C and C++: derivatives of any order
- Easy-to-use routines for standard derivative objects, optimization, ODEs, tensors, ...
- Randomly accessed memory of the original magnitude
- Sequential access of data on the tape, i.e., array or file
- Wide range of applications: aerodynamics, chemical engineering, medicine, network optimization, ...

Several ideas for improvement:
- runtime activity, parallelization, front ends, ...
Contact:http://www.coin-or.org/projects/ADOL-C.xml