Automatic Differentiation

Tobias Hoeppner

18 May 2011


Outline

Automatic Differentiation

Higher derivatives and Taylor series



What is Automatic Differentiation?

also known as:

- computational differentiation

- algorithmic differentiation

- differentiation of algorithms

AD is a process for evaluating derivatives that depends only on an algorithmic specification of the function to be differentiated. In practice, the specification of the function is part of a computer program.

- It's not symbolic differentiation.

- It's not divided differences.



An Example

the function

$$ f(x, y) = (xy + \sin x + 4)(3y^2 + 6) \tag{1} $$

the goal of symbolic differentiation is to produce formulas for its derivatives

$$ \frac{\partial f}{\partial x} = (y + \cos x)(3y^2 + 6) = 3y^2 \cos x + 6 \cos x + 3y^3 + 6y, \tag{2} $$

$$ \frac{\partial f}{\partial y} = 6y(xy + \sin x + 4) + x(3y^2 + 6) = 9xy^2 + 6y \sin x + 24y + 6x \tag{3} $$

in principle, evaluating these formulas gives exact values of the derivatives, apart from the roundoff error of floating-point arithmetic



divided differences

- produce approximations to derivative values

- involve only function evaluations

$$ \frac{\partial f}{\partial x} \approx \frac{f(x + \Delta x, y) - f(x - \Delta x, y)}{2\Delta x} = \frac{\partial f}{\partial x} + O(\Delta x^2) \tag{4} $$

where the term $O(\Delta x^2)$ denotes the (unknown) truncation error. In contrast, the values for derivatives obtained by AD are exact (up to roundoff) and are often much less expensive to compute.
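The truncation error in (4) can be observed numerically. The following is a minimal sketch, not taken from the slides: the evaluation point (1, 2) and the step sizes are arbitrary assumptions, and the function names are illustrative. It compares the central difference with the symbolic formula (2).

```python
import math

def f(x, y):
    # The example function from equation (1).
    return (x * y + math.sin(x) + 4.0) * (3.0 * y**2 + 6.0)

def df_dx_exact(x, y):
    # Symbolic partial derivative from equation (2).
    return (y + math.cos(x)) * (3.0 * y**2 + 6.0)

def df_dx_central(x, y, dx):
    # Central divided difference from equation (4).
    return (f(x + dx, y) - f(x - dx, y)) / (2.0 * dx)

x0, y0 = 1.0, 2.0  # arbitrary evaluation point (assumption)
exact = df_dx_exact(x0, y0)

# Absolute error of the divided difference for shrinking step sizes.
errors = {dx: abs(df_dx_central(x0, y0, dx) - exact)
          for dx in (1e-1, 1e-2, 1e-3)}

for dx, err in errors.items():
    print(f"dx={dx:g}  error={err:.3e}")
```

For these step sizes the error should shrink by roughly a factor of 100 per tenfold reduction of the step, consistent with the $O(\Delta x^2)$ term. For much smaller steps, roundoff in the subtraction eventually dominates.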



How and why does AD work?

- AD works whenever the chain rule holds

- the theoretical exactness of automatic differentiation stems from the fact that it uses the same rules of differentiation as learned in elementary calculus

- the rules are applied to an algorithmic specification rather than to a formula

- step back a little and consider how to evaluate (rather than differentiate) a formula



evaluating a formula

- the formula given by (1)

- one starts with the values of x and y, builds up each factor, and then multiplies them to obtain the final result

- the steps involved:

$$
\begin{aligned}
t_1 &= x, & t_6 &= t_5 + 4, \\
t_2 &= y, & t_7 &= t_2^2, \\
t_3 &= t_1 t_2, & t_8 &= 3 t_7, \\
t_4 &= \sin t_1, & t_9 &= t_8 + 6, \\
t_5 &= t_3 + t_4, & t_{10} &= t_6 t_9
\end{aligned}
\tag{5}
$$

the result is $t_{10} = f(x, y)$
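The code list maps directly onto straight-line code. A minimal sketch, assuming Python as the host language (the slides do not name one); the function names are illustrative:

```python
import math

def f_code_list(x, y):
    # One assignment per entry of the code list (5).
    t1 = x
    t2 = y
    t3 = t1 * t2
    t4 = math.sin(t1)
    t5 = t3 + t4
    t6 = t5 + 4.0
    t7 = t2 ** 2
    t8 = 3.0 * t7
    t9 = t8 + 6.0
    t10 = t6 * t9
    return t10

def f_formula(x, y):
    # Direct evaluation of formula (1), for comparison.
    return (x * y + math.sin(x) + 4.0) * (3.0 * y**2 + 6.0)

print(f_code_list(1.0, 2.0), f_formula(1.0, 2.0))
```

Each line uses only one arithmetic operation or one standard function, which is what makes the later differentiation step mechanical.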



obtain derivatives

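- in the case of a function $f = f(x_1, \ldots, x_m)$ of several variables, the first partial derivatives can be expressed compactly as the gradient vector

$$ \nabla f = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_m} \right] \tag{6} $$

if u and v are functions whose gradients ∇u and ∇v are known or have been previously computed, we compute ∇f using the rules

$$
\begin{aligned}
\nabla(u \pm v) &= \nabla u \pm \nabla v, \\
\nabla(uv) &= u \nabla v + v \nabla u, \\
\nabla(u/v) &= (\nabla u - (u/v) \nabla v)/v, \quad v \neq 0,
\end{aligned}
$$

for the arithmetic operations and the chain rule

$$ \nabla \varphi(u) = \varphi'(u) \nabla u. \tag{7} $$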
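These rules can be attached to the arithmetic operators themselves. The sketch below is one possible realisation via operator overloading, not the implementation behind the slides. The class name `GradVar` and the helper `sin` are hypothetical, and reflected subtraction and division are left out for brevity.

```python
import math

class GradVar:
    """A value together with its gradient, stored as a list of m partials."""

    def __init__(self, value, grad):
        self.value = value
        self.grad = list(grad)

    def _lift(self, other):
        # Plain numbers are constants: their gradient is zero.
        if isinstance(other, GradVar):
            return other
        return GradVar(other, [0.0] * len(self.grad))

    def __add__(self, other):
        o = self._lift(other)
        # grad(u + v) = grad u + grad v
        return GradVar(self.value + o.value,
                       [a + b for a, b in zip(self.grad, o.grad)])

    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        # grad(u - v) = grad u - grad v
        return GradVar(self.value - o.value,
                       [a - b for a, b in zip(self.grad, o.grad)])

    def __mul__(self, other):
        o = self._lift(other)
        # grad(uv) = u grad v + v grad u
        return GradVar(self.value * o.value,
                       [self.value * b + o.value * a
                        for a, b in zip(self.grad, o.grad)])

    __rmul__ = __mul__

    def __truediv__(self, other):
        o = self._lift(other)
        q = self.value / o.value
        # grad(u/v) = (grad u - (u/v) grad v) / v
        return GradVar(q, [(a - q * b) / o.value
                           for a, b in zip(self.grad, o.grad)])


def sin(u):
    # Chain rule (7) with phi = sin, phi' = cos.
    return GradVar(math.sin(u.value), [math.cos(u.value) * a for a in u.grad])


# Independent variables carry unit gradients; the point (1, 2) is arbitrary.
x = GradVar(1.0, [1.0, 0.0])
y = GradVar(2.0, [0.0, 1.0])
f = (x * y + sin(x) + 4.0) * (3.0 * y * y + 6.0)
print(f.value, f.grad)
```

Writing the function in ordinary arithmetic notation is enough here: the overloaded operators carry the gradient along with every intermediate value.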



derivative of the code list

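the code list (5) can be augmented with the gradients of each entry,

$$
\begin{aligned}
t_1 &= x, & \nabla t_1 &= [1, 0], \\
t_2 &= y, & \nabla t_2 &= [0, 1], \\
t_3 &= t_1 t_2, & \nabla t_3 &= t_1 \nabla t_2 + t_2 \nabla t_1 \to [t_2, t_1], \\
t_4 &= \sin t_1, & \nabla t_4 &= (\cos t_1) \nabla t_1 \to [\cos t_1, 0], \\
t_5 &= t_3 + t_4, & \nabla t_5 &= \nabla t_3 + \nabla t_4 \to [t_2 + \cos t_1, t_1], \\
t_6 &= t_5 + 4, & \nabla t_6 &= \nabla t_5 \to [t_2 + \cos t_1, t_1], \\
t_7 &= t_2^2, & \nabla t_7 &= 2 t_2 \nabla t_2 \to [0, 2 t_2], \\
t_8 &= 3 t_7, & \nabla t_8 &= 3 \nabla t_7 \to [0, 6 t_2], \\
t_9 &= t_8 + 6, & \nabla t_9 &= \nabla t_8 \to [0, 6 t_2], \\
t_{10} &= t_6 t_9, & \nabla t_{10} &= t_6 \nabla t_9 + t_9 \nabla t_6 \to [t_9 (t_2 + \cos t_1), \; 6 t_2 t_6 + t_1 t_9].
\end{aligned}
$$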



final results

- the final results are $t_{10} = f(x, y)$ and its gradient $\nabla t_{10} = \nabla f(x, y) = [t_9 (t_2 + \cos t_1), \; 6 t_2 t_6 + t_1 t_9]$

- count of operations: 2 + 10m = 22 for m = 2 variables
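The augmented code list can be transcribed line by line. A minimal sketch, assuming gradients are stored as two-element Python lists; the function name is illustrative, and the closed forms used for comparison are the symbolic derivatives (2) and (3).

```python
import math

def f_with_gradient(x, y):
    # Each code-list entry t_k is paired with its gradient g_k = [d/dx, d/dy].
    t1, g1 = x, [1.0, 0.0]
    t2, g2 = y, [0.0, 1.0]
    t3 = t1 * t2
    g3 = [t1 * g2[i] + t2 * g1[i] for i in range(2)]    # product rule
    t4 = math.sin(t1)
    g4 = [math.cos(t1) * g1[i] for i in range(2)]       # chain rule
    t5 = t3 + t4
    g5 = [g3[i] + g4[i] for i in range(2)]              # sum rule
    t6, g6 = t5 + 4.0, g5                               # constants vanish
    t7 = t2 ** 2
    g7 = [2.0 * t2 * g2[i] for i in range(2)]           # chain rule, u^2
    t8 = 3.0 * t7
    g8 = [3.0 * g7[i] for i in range(2)]
    t9, g9 = t8 + 6.0, g8
    t10 = t6 * t9
    g10 = [t6 * g9[i] + t9 * g6[i] for i in range(2)]   # product rule
    return t10, g10

x0, y0 = 1.0, 2.0  # arbitrary evaluation point (assumption)
value, grad = f_with_gradient(x0, y0)

# Symbolic derivatives (2) and (3) for comparison.
dfdx = (y0 + math.cos(x0)) * (3.0 * y0**2 + 6.0)
dfdy = 9.0 * x0 * y0**2 + 6.0 * y0 * math.sin(x0) + 24.0 * y0 + 6.0 * x0
print(grad, [dfdx, dfdy])
```

The two results agree to within roundoff, and no step size had to be chosen.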



2nd order derivative

- in the preceding section, we computed first derivatives

- once a code list representation of a function has been obtained, one can also apply rules for higher derivatives or recurrence relations for Taylor coefficients

- the second partial derivatives of a function $f : \mathbb{R}^m \to \mathbb{R}$ constitute its Hessian matrix

$$ H(f) = \left[ \frac{\partial^2 f}{\partial x_i \partial x_j} \right]_{i,j=1,\ldots,m} \tag{8} $$

- required for optimization algorithms using Newton's method
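As a worked example (not part of the original slides), differentiating the first partial derivatives (2) and (3) of the example function (1) once more gives its Hessian:

$$
H(f) =
\begin{bmatrix}
-(3y^2 + 6) \sin x & 9y^2 + 6y \cos x + 6 \\
9y^2 + 6y \cos x + 6 & 18xy + 6 \sin x + 24
\end{bmatrix}
$$

The off-diagonal entries coincide, so the matrix is symmetric.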



Rules for arithmetic operations

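- the rules for results of arithmetic operations are

$$
\begin{aligned}
H(u \pm v) &= H(u) \pm H(v), \\
H(uv) &= u H(v) + \nabla u^T \nabla v + \nabla v^T \nabla u + v H(u), \\
H(u/v) &= \left( H(u) - \nabla(u/v)^T \nabla v - \nabla v^T \nabla(u/v) - (u/v) H(v) \right)/v, \quad v \neq 0,
\end{aligned}
$$

and the chain rule takes the form

$$ H(\varphi(u)) = \varphi''(u) \nabla u^T \nabla u + \varphi'(u) H(u) $$

for twice differentiable functions φ, such as the standard functions.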
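The Hessian rules can be propagated alongside values and gradients in the same way as the gradient rules. The sketch below is an illustration under assumptions, not code from the slides: `HessVar`, `outer`, `combine` and `sin` are hypothetical names, matrices are nested Python lists, and only addition, multiplication and the sine are implemented.

```python
import math

def outer(a, b):
    """Return the matrix a^T b for two row vectors a and b."""
    return [[ai * bj for bj in b] for ai in a]

def combine(*terms):
    """Sum of scaled square matrices; each term is a (scalar, matrix) pair."""
    n = len(terms[0][1])
    return [[sum(c * m[i][j] for c, m in terms) for j in range(n)]
            for i in range(n)]

class HessVar:
    """A value with its gradient (list) and Hessian (nested list)."""

    def __init__(self, value, grad, hess):
        self.value, self.grad, self.hess = value, grad, hess

    @classmethod
    def variable(cls, value, index, m):
        # Independent variable x_index: unit gradient, zero Hessian.
        grad = [1.0 if i == index else 0.0 for i in range(m)]
        return cls(value, grad, [[0.0] * m for _ in range(m)])

    def _lift(self, other):
        # Plain numbers are constants with zero gradient and Hessian.
        if isinstance(other, HessVar):
            return other
        m = len(self.grad)
        return HessVar(other, [0.0] * m, [[0.0] * m for _ in range(m)])

    def __add__(self, other):
        o = self._lift(other)
        # H(u + v) = H(u) + H(v)
        return HessVar(self.value + o.value,
                       [a + b for a, b in zip(self.grad, o.grad)],
                       combine((1.0, self.hess), (1.0, o.hess)))

    __radd__ = __add__

    def __mul__(self, other):
        o = self._lift(other)
        u, v = self.value, o.value
        grad = [u * b + v * a for a, b in zip(self.grad, o.grad)]
        # H(uv) = u H(v) + grad u^T grad v + grad v^T grad u + v H(u)
        hess = combine((u, o.hess),
                       (1.0, outer(self.grad, o.grad)),
                       (1.0, outer(o.grad, self.grad)),
                       (v, self.hess))
        return HessVar(u * v, grad, hess)

    __rmul__ = __mul__


def sin(u):
    s, c = math.sin(u.value), math.cos(u.value)
    # Chain rule with phi = sin: phi' = cos, phi'' = -sin.
    hess = combine((-s, outer(u.grad, u.grad)), (c, u.hess))
    return HessVar(s, [c * a for a in u.grad], hess)


# Example function (1) at the arbitrary point (1, 2).
x = HessVar.variable(1.0, 0, 2)
y = HessVar.variable(2.0, 1, 2)
f = (x * y + sin(x) + 4.0) * (3.0 * y * y + 6.0)
print(f.hess)
```

The two outer-product terms in the product rule are what keep the propagated Hessian symmetric.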



The Taylor Series

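$$
\begin{aligned}
f(x) &= f(x_0) + \frac{1}{1!} f'(x_0) (x - x_0) + \frac{1}{2!} f''(x_0) (x - x_0)^2 + \ldots + \frac{1}{n!} f^{(n)}(x_0) (x - x_0)^n + \ldots \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0) (x - x_0)^n.
\end{aligned}
\tag{9}
$$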



Taylor coefficients

- Taylor coefficients are scalars

- suppose that f is a function of m variables

- series expansion at the point $x_0 = (x_{01}, \ldots, x_{0m})$

- in the direction $h = (h_1, \ldots, h_m)$

$$ f(x_0 + h) = \sum_{k=0}^{\infty} \frac{1}{k!} f^{(k)}(x_0) h^k = \sum_{k=0}^{\infty} f_k, \tag{10} $$

where $f_k = f^{(k)}(x_0) h^k / k!$, $k = 0, 1, \ldots$, denote the normalized Taylor coefficients.
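The slides do not spell out the recurrence relations for Taylor coefficients. The sketch below uses the standard ones as an assumption: for a product, $(uv)_k = \sum_{j=0}^{k} u_j v_{k-j}$, and for the sine, $s_k = \frac{1}{k} \sum_{j=1}^{k} j\, u_j c_{k-j}$ together with $c_k = -\frac{1}{k} \sum_{j=1}^{k} j\, u_j s_{k-j}$, where s = sin u and c = cos u. All function names are illustrative, and the point and direction are arbitrary.

```python
import math

def t_add(u, v):
    # (u + v)_k = u_k + v_k
    return [a + b for a, b in zip(u, v)]

def t_mul(u, v):
    # Product rule for normalized coefficients (Cauchy product).
    return [sum(u[j] * v[k - j] for j in range(k + 1)) for k in range(len(u))]

def t_sin(u):
    # Coupled recurrences for s = sin(u) and c = cos(u).
    n = len(u)
    s = [math.sin(u[0])] + [0.0] * (n - 1)
    c = [math.cos(u[0])] + [0.0] * (n - 1)
    for k in range(1, n):
        s[k] = sum(j * u[j] * c[k - j] for j in range(1, k + 1)) / k
        c[k] = -sum(j * u[j] * s[k - j] for j in range(1, k + 1)) / k
    return s

def const(value, n):
    # A constant has only a zeroth coefficient.
    return [value] + [0.0] * (n - 1)

def taylor_f(x0, y0, h1, h2, n):
    """First n (n >= 2) normalized Taylor coefficients f_k of function (1)
    at (x0, y0) in the direction (h1, h2)."""
    x = [x0, h1] + [0.0] * (n - 2)
    y = [y0, h2] + [0.0] * (n - 2)
    left = t_add(t_add(t_mul(x, y), t_sin(x)), const(4.0, n))
    right = t_add([3.0 * a for a in t_mul(y, y)], const(6.0, n))
    return t_mul(left, right)

def f_formula(x, y):
    # Direct evaluation of formula (1), for comparison.
    return (x * y + math.sin(x) + 4.0) * (3.0 * y**2 + 6.0)

coeffs = taylor_f(1.0, 2.0, 0.1, -0.05, 8)
print(sum(coeffs), f_formula(1.1, 1.95))
```

Summing the coefficients approximates $f(x_0 + h)$ as in (10), the zeroth coefficient is $f(x_0)$, and the first is the directional derivative $\nabla f(x_0) \cdot h$.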

