Automatic Differentiation

Tobias Hoeppner, 18 May 2011

description

An introduction to automatic differentiation with examples.

Transcript of Automatic Differentiation

Page 1: Automatic Differentiation

Automatic Differentiation · Higher derivatives and Taylor series

Automatic Differentiation

Tobias Hoeppner

18 May 2011

Tobias Hoeppner Automatic Differentiation

Page 2: Automatic Differentiation


Outline

Automatic Differentiation

Higher derivatives and Taylor series

Page 3: Automatic Differentiation

What is Automatic Differentiation

also known as:

- computational differentiation
- algorithmic differentiation
- differentiation of algorithms

AD is a process for evaluating derivatives which depends only on an algorithmic specification of the function to be differentiated. In practice, the specification of the function is part of a computer program.

- it's not symbolic differentiation
- it's not divided differences

Page 4: Automatic Differentiation

An Example

the function

$$f(x, y) = (xy + \sin x + 4)(3y^2 + 6) \tag{1}$$

the goal of symbolic differentiation is to produce formulas for its derivatives

$$\frac{\partial f}{\partial x} = (y + \cos x)(3y^2 + 6) = 3y^2 \cos x + 6 \cos x + 3y^3 + 6y, \tag{2}$$

$$\frac{\partial f}{\partial y} = 6y(xy + \sin x + 4) + x(3y^2 + 6) = 9xy^2 + 6y \sin x + 24y + 6x \tag{3}$$

in principle, evaluating these formulas gives exact values of the derivatives, apart from roundoff error due to floating-point arithmetic

Page 5: Automatic Differentiation

divided differences

- produce approximations to the derivative values
- involve only function evaluations

$$\frac{\partial f}{\partial x} \approx \frac{f(x + \Delta x, y) - f(x - \Delta x, y)}{2\Delta x} = \frac{\partial f}{\partial x} + O(\Delta x^2) \tag{4}$$

where the term $O(\Delta x^2)$ denotes the (unknown) truncation error. In contrast, the values for derivatives obtained by AD are exact up to roundoff and are often much less expensive to compute.
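The truncation error in (4) can be observed numerically. The following is a minimal sketch in Python, assuming the function from (1) and the formula from (2); the helper names `dfdx_exact` and `dfdx_central` and the evaluation point are chosen here for illustration only.

```python
import math

def f(x, y):
    # the example function (1)
    return (x * y + math.sin(x) + 4.0) * (3.0 * y**2 + 6.0)

def dfdx_exact(x, y):
    # symbolic derivative (2)
    return (y + math.cos(x)) * (3.0 * y**2 + 6.0)

def dfdx_central(x, y, dx):
    # central divided difference (4)
    return (f(x + dx, y) - f(x - dx, y)) / (2.0 * dx)

x0, y0 = 1.0, 2.0  # illustrative evaluation point
errors = []
for dx in (1e-1, 1e-2, 1e-3):
    err = abs(dfdx_central(x0, y0, dx) - dfdx_exact(x0, y0))
    errors.append(err)
    print(f"dx = {dx:g}  error = {err:.3e}")
```

Shrinking the step by a factor of 10 should shrink the error by roughly a factor of 100, as long as the step is not so small that roundoff dominates.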

Page 6: Automatic Differentiation

How and why does AD work?

- AD works whenever the chain rule holds
- the theoretical exactness of automatic differentiation stems from the fact that it uses the same rules of differentiation as learned in elementary calculus
- the rules are applied to an algorithmic specification rather than to a formula
- step back a little and consider how to evaluate (rather than differentiate) a formula

Page 7: Automatic Differentiation

evaluating a formula

- the formula given by (1)
- one starts with the values of x and y, builds up each factor, and then multiplies them to obtain the final result
- the steps involved:

$$\begin{aligned}
t_1 &= x, & t_6 &= t_5 + 4, \\
t_2 &= y, & t_7 &= t_2^2, \\
t_3 &= t_1 t_2, & t_8 &= 3 t_7, \\
t_4 &= \sin t_1, & t_9 &= t_8 + 6, \\
t_5 &= t_3 + t_4, & t_{10} &= t_6 t_9
\end{aligned} \tag{5}$$

the result is $t_{10} = f(x, y)$
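The code list (5) translates line by line into a straight-line program. A small sketch in Python, where the function names `f_direct` and `f_code_list` are hypothetical and only serve to compare the two ways of evaluating (1):

```python
import math

def f_direct(x, y):
    # formula (1) written as a single expression
    return (x * y + math.sin(x) + 4.0) * (3.0 * y**2 + 6.0)

def f_code_list(x, y):
    # one assignment per entry of the code list (5)
    t1 = x
    t2 = y
    t3 = t1 * t2
    t4 = math.sin(t1)
    t5 = t3 + t4
    t6 = t5 + 4.0
    t7 = t2 ** 2
    t8 = 3.0 * t7
    t9 = t8 + 6.0
    t10 = t6 * t9
    return t10

print(f_direct(1.0, 2.0), f_code_list(1.0, 2.0))
```

Each line uses one elementary operation or standard function, which is what makes it possible to differentiate the list entry by entry.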

Page 8: Automatic Differentiation

obtain derivatives

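- in the case of a function $f = f(x_1, \ldots, x_m)$ of several variables, the first partial derivatives can be expressed compactly as the gradient vector

$$\nabla f = \left[\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_m}\right] \tag{6}$$

if u and v are functions whose gradients $\nabla u$ and $\nabla v$ are known or have been previously computed, we compute $\nabla f$ using the rules

$$\nabla(u \pm v) = \nabla u \pm \nabla v,$$

$$\nabla(uv) = u \nabla v + v \nabla u,$$

$$\nabla(u/v) = (\nabla u - (u/v) \nabla v)/v, \quad v \neq 0,$$

for the arithmetic operations and the chain rule

$$\nabla \varphi(u) = \varphi'(u) \nabla u \tag{7}$$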
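One common way to apply these rules mechanically is operator overloading: every quantity carries its value together with its gradient. The sketch below is an illustration under that assumption, not the implementation behind the slides; the class name `GradVar` and the helper `sin` are hypothetical.

```python
import math

class GradVar:
    """A value together with its gradient (list of m partial derivatives)."""

    def __init__(self, value, grad):
        self.value = float(value)
        self.grad = [float(g) for g in grad]

    def _lift(self, other):
        # treat plain numbers as constants with zero gradient
        if isinstance(other, GradVar):
            return other
        return GradVar(other, [0.0] * len(self.grad))

    def __add__(self, other):
        o = self._lift(other)
        return GradVar(self.value + o.value,
                       [a + b for a, b in zip(self.grad, o.grad)])
    __radd__ = __add__

    def __sub__(self, other):
        o = self._lift(other)
        return GradVar(self.value - o.value,
                       [a - b for a, b in zip(self.grad, o.grad)])

    def __mul__(self, other):
        # grad(uv) = u grad(v) + v grad(u)
        o = self._lift(other)
        return GradVar(self.value * o.value,
                       [self.value * b + o.value * a
                        for a, b in zip(self.grad, o.grad)])
    __rmul__ = __mul__

    def __truediv__(self, other):
        # grad(u/v) = (grad(u) - (u/v) grad(v)) / v, v != 0
        o = self._lift(other)
        q = self.value / o.value
        return GradVar(q, [(a - q * b) / o.value
                           for a, b in zip(self.grad, o.grad)])

def sin(u):
    # chain rule (7) with phi = sin
    return GradVar(math.sin(u.value), [math.cos(u.value) * a for a in u.grad])

# independent variables x and y, seeded with unit gradients
x = GradVar(1.0, [1.0, 0.0])
y = GradVar(2.0, [0.0, 1.0])
f = (x * y + sin(x) + 4) * (3 * y * y + 6)
print(f.value, f.grad)
```

The gradient printed for `f` should agree with evaluating formulas (2) and (3) at the same point, up to roundoff.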

Page 9: Automatic Differentiation

derivative of the code list

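the code list (5) can be augmented with the gradients of each entry,

$$\begin{aligned}
t_1 &= x, & \nabla t_1 &= [1, 0], \\
t_2 &= y, & \nabla t_2 &= [0, 1], \\
t_3 &= t_1 t_2, & \nabla t_3 &= t_1 \nabla t_2 + t_2 \nabla t_1 \to [t_2, t_1], \\
t_4 &= \sin t_1, & \nabla t_4 &= (\cos t_1) \nabla t_1 \to [\cos t_1, 0], \\
t_5 &= t_3 + t_4, & \nabla t_5 &= \nabla t_3 + \nabla t_4 \to [t_2 + \cos t_1, t_1], \\
t_6 &= t_5 + 4, & \nabla t_6 &= \nabla t_5 \to [t_2 + \cos t_1, t_1], \\
t_7 &= t_2^2, & \nabla t_7 &= 2 t_2 \nabla t_2 \to [0, 2 t_2], \\
t_8 &= 3 t_7, & \nabla t_8 &= 3 \nabla t_7 \to [0, 6 t_2], \\
t_9 &= t_8 + 6, & \nabla t_9 &= \nabla t_8 \to [0, 6 t_2], \\
t_{10} &= t_6 t_9, & \nabla t_{10} &= t_6 \nabla t_9 + t_9 \nabla t_6 \to [t_9 (t_2 + \cos t_1),\; 6 t_2 t_6 + t_1 t_9].
\end{aligned}$$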

Page 10: Automatic Differentiation

final results

- the final results are $t_{10} = f(x, y)$ and its gradient $\nabla t_{10} = \nabla f(x, y) = [t_9 (t_2 + \cos t_1),\; 6 t_2 t_6 + t_1 t_9]$
- count of operations: 22 = 2 + 10m, with m = 2 variables here
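The augmented code list can also be written out by hand as a straight-line program that carries a gradient next to every entry. A minimal sketch, assuming gradients are stored as 2-tuples; the helpers `scale` and `combine` and the function name `f_with_gradient` are hypothetical.

```python
import math

def scale(c, g):
    # c * g, componentwise
    return tuple(c * p for p in g)

def combine(a, ga, b, gb):
    # a * ga + b * gb, componentwise
    return tuple(a * p + b * q for p, q in zip(ga, gb))

def f_with_gradient(x, y):
    t1, d1 = x, (1.0, 0.0)
    t2, d2 = y, (0.0, 1.0)
    t3, d3 = t1 * t2, combine(t1, d2, t2, d1)      # product rule
    t4, d4 = math.sin(t1), scale(math.cos(t1), d1)  # chain rule
    t5, d5 = t3 + t4, combine(1.0, d3, 1.0, d4)    # sum rule
    t6, d6 = t5 + 4.0, d5                           # adding a constant
    t7, d7 = t2 ** 2, scale(2.0 * t2, d2)           # chain rule
    t8, d8 = 3.0 * t7, scale(3.0, d7)               # constant factor
    t9, d9 = t8 + 6.0, d8                           # adding a constant
    t10, d10 = t6 * t9, combine(t6, d9, t9, d6)     # product rule
    return t10, d10

value, grad = f_with_gradient(1.0, 2.0)
print(value, grad)
```

Only numbers are propagated here; no formula for the derivative is ever built, which is the difference from symbolic differentiation.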

Page 11: Automatic Differentiation

2nd order derivative

- in the preceding section, we computed first derivatives
- once a code list representation of a function has been obtained, one can also apply rules for higher derivatives or recurrence relations for Taylor coefficients
- the second partial derivatives of a function $f : \mathbb{R}^m \to \mathbb{R}$ constitute its Hessian matrix

$$H(f) = \left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{i,j = 1, \ldots, m} \tag{8}$$

- required for optimization algorithms using Newton's method

Page 12: Automatic Differentiation

Rules for arithmetic operations

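- the rules for results of arithmetic operations are

$$H(u \pm v) = H(u) \pm H(v),$$

$$H(uv) = u H(v) + \nabla u^T \nabla v + \nabla v^T \nabla u + v H(u),$$

$$H(u/v) = \left(H(u) - \nabla(u/v)^T \nabla v - \nabla v^T \nabla(u/v) - (u/v) H(v)\right)/v, \quad v \neq 0,$$

and the chain rule takes the form

$$H(\varphi(u)) = \varphi''(u) \nabla u^T \nabla u + \varphi'(u) H(u)$$

for twice differentiable functions $\varphi$, such as the standard functions.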
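These rules can be propagated through a computation in the same way as the gradient rules. The sketch below is one possible illustration in plain Python, covering only addition, multiplication and the sine; the class name `Jet2` and the matrix helpers are hypothetical, and gradients are treated as row vectors so that $\nabla u^T \nabla v$ is an outer product.

```python
import math

def outer(a, b):
    # outer product a^T b of two row vectors
    return [[ai * bj for bj in b] for ai in a]

def mat_sum(*mats):
    # elementwise sum of equally sized matrices
    return [[sum(entries) for entries in zip(*rows)] for rows in zip(*mats)]

def mat_scale(c, mat):
    return [[c * entry for entry in row] for row in mat]

class Jet2:
    """Value, gradient and Hessian of one code list entry."""

    def __init__(self, value, grad, hess):
        self.value, self.grad, self.hess = value, grad, hess

    @classmethod
    def variable(cls, value, index, m):
        grad = [1.0 if i == index else 0.0 for i in range(m)]
        return cls(value, grad, [[0.0] * m for _ in range(m)])

    def _lift(self, other):
        # plain numbers are constants: zero gradient, zero Hessian
        if isinstance(other, Jet2):
            return other
        m = len(self.grad)
        return Jet2(other, [0.0] * m, [[0.0] * m for _ in range(m)])

    def __add__(self, other):
        o = self._lift(other)
        return Jet2(self.value + o.value,
                    [a + b for a, b in zip(self.grad, o.grad)],
                    mat_sum(self.hess, o.hess))
    __radd__ = __add__

    def __mul__(self, other):
        # H(uv) = u H(v) + grad(u)^T grad(v) + grad(v)^T grad(u) + v H(u)
        o = self._lift(other)
        u, v = self.value, o.value
        grad = [u * b + v * a for a, b in zip(self.grad, o.grad)]
        hess = mat_sum(mat_scale(u, o.hess), outer(self.grad, o.grad),
                       outer(o.grad, self.grad), mat_scale(v, self.hess))
        return Jet2(u * v, grad, hess)
    __rmul__ = __mul__

def sin(u):
    # chain rule with phi = sin, phi' = cos, phi'' = -sin
    s, c = math.sin(u.value), math.cos(u.value)
    g = u.grad
    return Jet2(s, [c * a for a in g],
                mat_sum(mat_scale(-s, outer(g, g)), mat_scale(c, u.hess)))

x = Jet2.variable(1.0, 0, 2)
y = Jet2.variable(2.0, 1, 2)
f = (x * y + sin(x) + 4) * (3 * y * y + 6)
print(f.hess)
```

For the example function (1), the resulting matrix can be checked against the second derivatives obtained by differentiating (2) and (3) by hand.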

Page 13: Automatic Differentiation

The Taylor Series

$$\begin{aligned}
f(x) &= f(x_0) + \frac{1}{1!} f'(x_0)(x - x_0) + \frac{1}{2!} f''(x_0)(x - x_0)^2 + \ldots + \frac{1}{n!} f^{(n)}(x_0)(x - x_0)^n + \ldots \\
&= \sum_{n=0}^{\infty} \frac{1}{n!} f^{(n)}(x_0)(x - x_0)^n.
\end{aligned} \tag{9}$$
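As a small illustration of (9), the partial sums can be evaluated for a function whose derivatives are known in closed form. The sketch below assumes $f = \sin$, whose derivatives cycle through $\sin, \cos, -\sin, -\cos$; the function name `taylor_sin` is hypothetical.

```python
import math

def taylor_sin(x, x0, n_terms):
    # n-th derivative of sin at x0, for n = 0, 1, 2, 3 (then it repeats)
    derivs = [math.sin(x0), math.cos(x0), -math.sin(x0), -math.cos(x0)]
    total = 0.0
    for n in range(n_terms):
        total += derivs[n % 4] / math.factorial(n) * (x - x0) ** n
    return total

for n_terms in (2, 4, 10):
    approx = taylor_sin(1.2, 1.0, n_terms)
    print(n_terms, abs(approx - math.sin(1.2)))
```

The error should drop quickly as more terms are included, since $x - x_0$ is small here.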

Page 14: Automatic Differentiation

Taylor coefficients

- Taylor coefficients are scalars
- suppose that f is a function of m variables
- series expansion at the point $x_0 = (x_{01}, \ldots, x_{0m})$
- in direction $h = (h_1, \ldots, h_m)$

$$f(x_0 + h) = \sum_{k=0}^{\infty} \frac{1}{k!} f^{(k)}(x_0) h^k = \sum_{k=0}^{\infty} f_k, \tag{10}$$

where $f_k = f^{(k)}(x_0) h^k / k!$, $k = 0, 1, \ldots$, denote the normalized Taylor coefficients.
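The normalized coefficients $f_k$ can be computed by recurrence relations applied to the code list, without forming any higher derivative explicitly. The sketch below is one possible illustration for the example function (1): every entry is represented by its first K + 1 coefficients along the line $x_0 + t h$, products use the Cauchy product, and the sine uses the standard coupled recurrence with the cosine. All function names are hypothetical, and K is assumed to be at least 1.

```python
import math

def t_const(c, K):
    return [c] + [0.0] * K

def t_add(a, b):
    return [p + q for p, q in zip(a, b)]

def t_mul(a, b):
    # Cauchy product: (ab)_k = sum_{j=0..k} a_j b_{k-j}
    return [sum(a[j] * b[k - j] for j in range(k + 1)) for k in range(len(a))]

def t_sin(u):
    # s = sin(u), c = cos(u) satisfy s' = u' c and c' = -u' s, hence
    # k s_k = sum_{j=1..k} j u_j c_{k-j},  k c_k = -sum_{j=1..k} j u_j s_{k-j}
    K = len(u) - 1
    s = [math.sin(u[0])] + [0.0] * K
    c = [math.cos(u[0])] + [0.0] * K
    for k in range(1, K + 1):
        s[k] = sum(j * u[j] * c[k - j] for j in range(1, k + 1)) / k
        c[k] = -sum(j * u[j] * s[k - j] for j in range(1, k + 1)) / k
    return s

def taylor_coefficients(x0, y0, h1, h2, K):
    # inputs along the line: x(t) = x0 + t*h1, y(t) = y0 + t*h2
    x = [x0, h1] + [0.0] * (K - 1)
    y = [y0, h2] + [0.0] * (K - 1)
    t6 = t_add(t_add(t_mul(x, y), t_sin(x)), t_const(4.0, K))
    t9 = t_add(t_mul(t_const(3.0, K), t_mul(y, y)), t_const(6.0, K))
    return t_mul(t6, t9)  # coefficients f_0, ..., f_K

coeffs = taylor_coefficients(1.0, 2.0, 0.1, -0.2, 12)
print(sum(coeffs))
```

Summing the coefficients approximates $f(x_0 + h)$ as in (10), and $f_1$ equals the directional derivative $\nabla f(x_0) \cdot h$.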
