
Adjoint computation and Backpropagation

Guillaume Pallez (Aupy)

Inria & University of Bordeaux

Meeting of the Royal Society, April 2019


Who’s who

Automatic Differentiation: Paul Hovland (Argonne), Navjot Kukreja (Imperial College), Krishna Narayanan (Argonne)

Machine Learning (I): Alexis Joly (Inria)

Machine Learning (II): Alena Shilova (Inria)

Scheduling: Guillaume Pallez (Inria), Olivier Beaumont (Inria), Julien Herrmann (Inria)


Guillaume Pallez


Adjoint Computation and Backpropagation


Ice-sheet model (I)

“In climate modelling, ice-sheet models use numerical methods to simulate the evolution, dynamics and thermodynamics of ice sheets.” (Wikipedia)

Credit: Daniel Goldberg

Simpler Version:

proc Model Algorithm(u0, y) begin
    Do stuff;
    for i = 0 to n do
        ui+1 = fi(ui);
        Do stuff;
    end
    /* F(u0) = fn ◦ fn−1 ◦ … ◦ f0(u0) */
    Compute ∇F(u0)y;
end
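A minimal Python sketch of this loop (the fi are illustrative affine placeholders, not the ice-sheet physics, and the final reverse sweep anticipates the gradient formula discussed on the next slides):

    import numpy as np

    # Illustrative stand-ins for the f_i: affine maps f_i(u) = A_i u + b_i.
    # Shapes, names and the choice of affine maps are assumptions made only
    # so that the sketch runs; the real f_i are one time step of the model.
    rng = np.random.default_rng(0)
    n, d = 4, 3
    A = [rng.standard_normal((d, d)) for _ in range(n + 1)]
    b = [rng.standard_normal(d) for _ in range(n + 1)]

    def model_algorithm(u0, y):
        """Forward sweep of F = f_n ∘ … ∘ f_0, keeping every intermediate u_i."""
        u = [u0]
        for i in range(n + 1):              # "for i = 0 to n do"
            u.append(A[i] @ u[i] + b[i])    # u_{i+1} = f_i(u_i)
        # ∇F(u0)y = Jf0(u0)^T … Jfn(un)^T y; for affine f_i the Jacobian is A_i,
        # so one backward matrix-vector product per step.
        v = y
        for i in reversed(range(n + 1)):
            v = A[i].T @ v
        return u[-1], v

    F_u0, grad = model_algorithm(rng.standard_normal(d), rng.standard_normal(d))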


Ice-sheet model (II)

A quick reminder about the gradient:

F(u0) = fn ◦ fn−1 ◦ … ◦ f1 ◦ f0(u0)

∇F(u0)y = Jf0(u0)^T · ∇(fn ◦ … ◦ f1)(u1) · y
        = Jf0(u0)^T · Jf1(u1)^T · … · Jfn−1(un−1)^T · Jfn(un)^T · y

where Jf^T is the transposed Jacobian matrix of f, and ui+1 = fi(ui) = fi(fi−1 ◦ … ◦ f0(u0)).
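A quick numerical sanity check of this identity, as a sketch with linear fi (so Jfi = Ai; all names are illustrative). The gradient of u0 ↦ y·F(u0) is exactly J_F(u0)^T y, which we compare against the right-to-left product of transposed Jacobians:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d = 3, 4
    A = [rng.standard_normal((d, d)) for _ in range(n + 1)]   # J_{f_i} = A_i

    def F(u0):                      # F = f_n ∘ … ∘ f_0 with f_i(u) = A_i u
        u = u0
        for Ai in A:
            u = Ai @ u
        return u

    def grad_F_times_y(u0, y):      # Jf0^T · Jf1^T · … · Jfn^T · y
        v = y
        for Ai in reversed(A):
            v = Ai.T @ v
        return v

    u0, y = rng.standard_normal(d), rng.standard_normal(d)
    # Finite differences of u0 ↦ y·F(u0), component by component:
    h, g_fd = 1e-6, np.zeros(d)
    for k in range(d):
        e = np.zeros(d); e[k] = h
        g_fd[k] = (y @ F(u0 + e) - y @ F(u0 - e)) / (2 * h)
    assert np.allclose(g_fd, grad_F_times_y(u0, y), atol=1e-4)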


A better solution?

∇F(u0)y = Jf0(u0)^T · Jf1(u1)^T · … · Jfn−1(un−1)^T · Jfn(un)^T · y

proc Algo A(u0, y) begin
    Do stuff;
    for i = 0 to n do
        ui+1 = fi(ui);
        Do stuff;
    end
    Compute ∇F(u0)y;
end

→ What is the problem with Algo A?

proc Algo B(u0, y) begin
    Do stuff;
    for i = 0 to n do
        ui+1 = fi(ui);
        Do stuff;
        vi+1 = vi · Jfi+1(ui+1)^T;
    end
end

→ What is the problem with Algo B?

∇F(u0)y = ((…(Jf0^T · Jf1^T) · … · Jfn−1^T) · Jfn^T) · y        n MatMat ops
∇F(u0)y = Jf0^T · (Jf1^T · … · (Jfn−1^T · (Jfn^T · y)) … )      n MatVec ops
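The two bracketings can be compared directly in a short sketch (dimensions and the random "Jacobians" are illustrative): accumulating left to right, as Algo B does, builds the full matrix Jf0^T···Jfi^T with one matrix-matrix product per step, while applying the factors right to left to the vector y needs only matrix-vector products.

    import numpy as np, time

    rng = np.random.default_rng(2)
    n, d = 20, 300
    J = [rng.standard_normal((d, d)) for _ in range(n + 1)]  # stand-ins for Jf_i(u_i)
    y = rng.standard_normal(d)

    t0 = time.perf_counter()
    M = J[0].T.copy()
    for Ji in J[1:]:
        M = M @ Ji.T                 # n matrix-matrix products: Θ(n d^3) work
    grad_matmat = M @ y
    t1 = time.perf_counter()

    v = y
    for Ji in reversed(J):
        v = Ji.T @ v                 # matrix-vector products only: Θ(n d^2) work
    grad_matvec = v
    t2 = time.perf_counter()

    assert np.allclose(grad_matmat, grad_matvec, rtol=1e-8)
    print(f"MatMat accumulation: {t1 - t0:.3f}s   MatVec accumulation: {t2 - t1:.3f}s")

Note that the matrix-vector order consumes the Jacobians from i = n down to 0, i.e. in the reverse of the order in which the ui are produced; the adjoint-computation formalism on the next slide captures exactly this reversal.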


Adjoint computation

Fi(xi) = xi+1,   i < l
F̄i(xi, x̄i+1) = x̄i,   i ≤ l

[AC dependence graph: the forward chain F0, F1, …, Fl−1 maps x0 to x1, …, xl; the turn T produces x̄l from xl; each backward step F̄i takes (xi, x̄i+1) and produces x̄i, down to x̄0.]
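In code, the F / F̄ interface and the simplest ("store all") reversal look like this sketch (linear steps and a quadratic-loss turn are assumptions, just to have something concrete and runnable):

    import numpy as np

    rng = np.random.default_rng(3)
    l, d = 5, 3
    A = [rng.standard_normal((d, d)) for _ in range(l)]   # placeholder linear steps

    def F(i, x):                 # forward step: F_i(x_i) = x_{i+1}
        return A[i] @ x

    def F_bar(i, x, x_bar):      # adjoint step: F̄_i(x_i, x̄_{i+1}) = J_{F_i}(x_i)^T x̄_{i+1}
        return A[i].T @ x_bar

    def turn(x_l):               # "T": seeds the backward sweep, here for loss(x) = ||x||^2
        return 2 * x_l

    # Store-all reversal: keep every x_i, then run the F̄_i from right to left.
    x = [rng.standard_normal(d)]
    for i in range(l):
        x.append(F(i, x[i]))
    x_bar = turn(x[l])
    for i in reversed(range(l)):
        x_bar = F_bar(i, x[i], x_bar)    # after the loop, x_bar = x̄_0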


Relation to AI? (I)

GoogleNet graph:

Source: Internet


Relation to AI? (II)

Derivatives in machine learning: “backprop” and gradient descent are at the core of all recent advances, in computer vision (top-5 error rate on ImageNet, NVIDIA devblog; Faster R-CNN, Ren et al. 2015), speech recognition & synthesis (word error rates, Huang et al. 2014), and machine translation (Google Neural Machine Translation System, GNMT).

Source: Baydin, Beyond backprop: Automatic differentiation in machine learning, 2017.


Model of execution

[Adjoint chain stepped through during the talk: F0 → F1 → F2 → F3 → F4, the turn T, then F̄4, F̄3, F̄2, F̄1, F̄0, while tracking the peak memory used (between 3 and 6 stored values in the animated example, depending on the strategy).]

- Memory to store the output of computations (xi or x̄i). Initial state: contains x0.
- Cost to write: wm = 0; cost to read: rm = 0.

Example of execution
Strategy        Time    Space
Store all       $       $$$
Store “none”    $$$     $
No reuse        $$      $$
Memory Reuse    $       $
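Under this model (unit forward steps, free reads/writes), the two extreme strategies are easy to quantify; a back-of-the-envelope sketch, with counting conventions chosen by me to match the slide's "time = forward steps executed, space = states stored":

    def store_all(l):
        """Keep every x_i: l forward steps, but l+1 states live at the turn."""
        return {"forward steps": l, "peak stored states": l + 1}

    def recompute_all(l):
        """Keep only x_0: before each backward step F̄_i, recompute x_i from x_0.
        Forward work: l + (l-1) + ... + 1 = l(l+1)/2 steps, with only x_0 and
        the state currently being recomputed held in memory."""
        return {"forward steps": l * (l + 1) // 2, "peak stored states": 2}

    for l in (5, 50, 500):
        print(l, store_all(l), recompute_all(l))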


Problem formulation

We want to minimize the makespan of the adjoint computation, given:

- AC graph: size l
- Steps: uf, ub
- Memory: cm, wm = rm = 0, initial content Mini = {x0}
- Storage k: ck, wk, rk, initial content Sini = ∅

[Same AC chain as before: F0, …, Fl−1 forward, the turn T, and F̄l−1, …, F̄0 backward.]
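The problem instance can be written down as a small record; a sketch mirroring the slide's notation (field names and the example values are illustrative):

    from dataclasses import dataclass

    @dataclass
    class AdjointProblem:
        """Makespan-minimisation instance on an adjoint-computation chain."""
        l: int            # number of forward steps F_0 .. F_{l-1}
        uf: float         # cost of one forward step F_i
        ub: float         # cost of one backward step F̄_i
        cm: int           # memory slots (writes/reads free: wm = rm = 0)
        ck: int = 0       # slots of an extra storage level k (e.g. disk)
        wk: float = 0.0   # cost to write one value to level k
        rk: float = 0.0   # cost to read one value back from level k

    instance = AdjointProblem(l=1000, uf=1.0, ub=2.5, cm=8, ck=10**6, wk=5.0, rk=5.0)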


Existing work

Question: How to organize the reverse execution of intermediate steps?
What do we store, what do we recompute?

- Store all: memory expensive
- Recompute all: compute expensive
- Intermediate strategies?


Bounded memory

Griewank and Walther, 2000: Revolve(l, cm), optimal algorithm with cm memory slots.

Source: Andrea Walther’s PhD thesis, 1999
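The key quantitative fact behind Revolve is Griewank's binomial bound: with c checkpoint slots and at most r recomputations of any forward step, a chain of length up to C(c + r, c) can be reversed, and Revolve attains this. A tiny illustrative sketch of how fast that grows:

    from math import comb

    def max_chain_length(c, r):
        """Longest chain reversible with c checkpoints and at most r
        recomputations of any forward step (Griewank's binomial bound)."""
        return comb(c + r, c)

    for c in (2, 5, 10, 25):
        print(c, [max_chain_length(c, r) for r in (1, 2, 3, 4)])
    # With c = r = M the bound is comb(2M, M) ~ 4^M / sqrt(pi*M): exponential in M,
    # consistent with the "graph of size O(e^M) in time O(M e^M)" claim later in the talk.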


Storage hierarchy

A., Herrmann, Hovland, Robert, 2015: Optimal algorithm for two levels of storage: cheap bounded memory and costly unbounded disks.
A., Herrmann, 2019: Library of optimal schedules for any number of storage levels.

(https://gitlab.inria.fr/adjoint-computation)

[Figure: the forward phase F0 … Fl−1 is cut into segments of sizes m1, m2, m3, m4, …, l − Σ mi, and each segment is reversed with Revolve(m, cm).]


What directions for AI?

Then what? Are we done? Just let AD people and ML people talk together!

Cut the middle-(scheduling)-people!

Source: A graffiti in Paris (twitter)


What directions for AI?

While the core of the algorithms remains similar, the problems are different:

- Shallower graphs (on the order of 100–1000 levels).
- Cost functions (time/memory) are not necessarily uniform.
- Graphs with more structure than chains.
- Multi-learners / hyperparameter tuning (independent graphs executed simultaneously), shared memory?
- Etc.

Phew, just saved my job


Dir. for AI: Graph structure

Remember this Google network:


Nice try, we are not going to help you with this for free, Google.


Dir. for AI: Graph structure II

Source: Rao et al., A Deep Siamese Neural Network (...), 2016
Source: Surís et al., Cross-Modal Embeddings for Video and Audio Retrieval, 2018


Pitchfork graph¹ (aka join graphs):

Theorem (A., Beaumont, Herrmann, Shilova, 2019)
Given a bounded memory and a pitchfork with a bounded number of “teeth”, we can find in polynomial time the solution that backpropagates it in minimal time.

¹ This should not be seen as an endorsement of the YJ movement; I don't think I'm allowed to give my opinion publicly.
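For concreteness, a pitchfork can be represented as a handful of independent forward chains (the "teeth") whose outputs all feed one common final chain (the "handle"); a purely structural sketch, with field names of my own choosing:

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Pitchfork:
        """Join-shaped adjoint graph: several teeth merging into one handle."""
        teeth: List[List[str]] = field(default_factory=list)  # each tooth: a chain of forward steps
        handle: List[str] = field(default_factory=list)       # the common chain after the join

        def size(self):
            return sum(len(t) for t in self.teeth) + len(self.handle)

    g = Pitchfork(teeth=[[f"F{t}_{i}" for i in range(4)] for t in range(3)],
                  handle=[f"H_{i}" for i in range(6)])
    print(g.size())   # 3 teeth of 4 steps + a handle of 6 steps = 18 forward steps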


A Grasp of the proof? (I)

Three-phase algorithm:

1. Forward phase
2. Turn
3. Backward phase

- Traverse all branches. Write some intermediate data.
- Backpropagate the handle of the pitchfork.
- Iteratively, read some checkpointed data from one of the branches, and backpropagate a subset of the graph (possibly writing additional intermediate data).


A Grasp of the proof? (II)

It relies on key properties of the backward phase:

- Stability of execution
- Checkpoint persistence

which give us a multi-phase approach.


Lemma (Stability 1)

If Fi is “backpropagated”, then there are no Fj for i ≤ j.


Lemma (Checkpoint persistence)

If xi is stored, then until Fi is “backpropagated”, there are no Fj for j < i.


Lemma (Stability 2)

If xi is read, then there are no Fj on other branches until it is backpropagated.


In this case, for a given forward phase, we get a multi-phase backward phase:

[Forward phase: F0 F1 F2 … Fl−1]

- Where do we schedule the checkpoints in the forward phase?
- In which order do we execute the subsegments on each branch?


Is it worth it?

- From a scheduling perspective: Yes! (new fun problems)
- From an adjoint perspective: Yes! With a memory of size O(M):
  - Store All can execute a graph of size O(M) in time O(M);
  - Revolve can execute a graph of size O(e^M) in time O(M e^M)!
  - H-Revolve improves performance by an order of magnitude.
- Machine Learning perspective: deeper networks!

Thanks
