Lecture 3 recap
Beyond linear
• Linear score function $f = Wx$
Credit: Li/Karpathy/Johnson
(Figure: examples on CIFAR-10 and on ImageNet)
Neural Network
Credit: Li/Karpathy/Johnson
Neural Network
(Figure: a neural network diagram, annotated with its depth and width)
Neural Network
• Linear score function $f = Wx$
• Neural network is a nesting of "functions" (see the sketch below)
  – 2 layers: $f = W_2 \max(0, W_1 x)$
  – 3 layers: $f = W_3 \max(0, W_2 \max(0, W_1 x))$
  – 4 layers: $f = W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x)))$
  – 5 layers: $f = W_5 \sigma(W_4 \tanh(W_3 \max(0, W_2 \max(0, W_1 x))))$
  – … up to hundreds of layers
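Below is a minimal numpy sketch of the 2-layer case $f = W_2 \max(0, W_1 x)$; the sizes and random weights are made up for illustration, not taken from the lecture.

```python
import numpy as np

np.random.seed(0)
x  = np.random.randn(3072)              # e.g. a flattened CIFAR-10 image
W1 = 0.01 * np.random.randn(100, 3072)  # first layer weights (assumed sizes)
W2 = 0.01 * np.random.randn(10, 100)    # second layer weights -> 10 class scores

h = np.maximum(0, W1 @ x)               # elementwise max(0, W1 x)
f = W2 @ h                              # f = W2 max(0, W1 x)
print(f.shape)                          # (10,)
```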
Computational Graphs
• Neural network is a computational graph
  – It has compute nodes
  – It has edges that connect nodes
  – It is directional
  – It is organized in "layers"
Backprop cont'd
The importance of gradients
• All optimization schemes are based on computing gradients
• One can compute gradients analytically, but what if our function is too complex?
• Break down the gradient computation: Backpropagation (Rumelhart 1986)
Backprop: Forward Pass
• $f(x, y, z) = (x + y) \cdot z$, initialization $x = 1$, $y = -3$, $z = 4$
(Figure: compute graph – a sum node gives $d = x + y = -2$, a mult node gives $f = d \cdot z = -8$)
Backprop: Backward Pass
• $f(x, y, z) = (x + y) \cdot z$ with $x = 1$, $y = -3$, $z = 4$
• Sum node: $d = x + y$, so $\frac{\partial d}{\partial x} = 1$ and $\frac{\partial d}{\partial y} = 1$
• Mult node: $f = d \cdot z$, so $\frac{\partial f}{\partial d} = z$ and $\frac{\partial f}{\partial z} = d$
• What are $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$?
• Start at the output: $\frac{\partial f}{\partial f} = 1$
• Mult node: $\frac{\partial f}{\partial z} = d = -2$
• Mult node: $\frac{\partial f}{\partial d} = z = 4$
• Sum node, chain rule: $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial d} \cdot \frac{\partial d}{\partial y} = 4 \cdot 1 = 4$
• Sum node, chain rule: $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial d} \cdot \frac{\partial d}{\partial x} = 4 \cdot 1 = 4$
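The whole example fits in a few lines of plain Python; this is a minimal sketch of the forward and backward pass above, using only the lecture's values:

```python
# f(x, y, z) = (x + y) * z  with x = 1, y = -3, z = 4
x, y, z = 1.0, -3.0, 4.0

# forward pass
d = x + y            # sum node:  d = -2
f = d * z            # mult node: f = -8

# backward pass, starting from df/df = 1 and applying the chain rule
df_df = 1.0
df_dd = z * df_df    # mult node routes z to d:  4
df_dz = d * df_df    # mult node routes d to z: -2
df_dx = 1.0 * df_dd  # sum node just distributes the gradient: 4
df_dy = 1.0 * df_dd  # 4

print(df_dx, df_dy, df_dz)   # 4.0 4.0 -2.0
```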
Compute Graphs -> Neural Networks
• $x_k$: input variables
• $w_{l,m,n}$: network weights (note the 3 indices)
  – $l$: which layer
  – $m$: which neuron in the layer
  – $n$: which weight in the neuron
• $y_i$: computed output ($i$: output dimension; $n_{out}$)
• $t_i$: ground-truth targets
• $L$: loss function
Compute Graphs -> Neural Networks
(Figure: input layer $x_0, x_1$ -> output layer $y_0$, compared against the target $t_0$, e.g., class label / regression target. The inputs are multiplied by the weights $w_0, w_1$ (unknowns!), summed, $t_0$ is subtracted, and the result is squared: the L2 loss function, giving the loss/cost.)
• We want to compute gradients w.r.t. all weights $w$
Compute Graphs -> Neural Networks
(Figure: the same graph with a ReLU activation $\max(0, x)$ inserted after the weighted sum; btw. I'm not arguing this is the right choice here)
• We want to compute gradients w.r.t. all weights $w$
Compute Graphs -> Neural Networks
(Figure: input layer $x_0, x_1$ -> output layer $y_0, y_1, y_2$ with targets $t_0, t_1, t_2$; weights $w_{0,0}, w_{0,1}, w_{1,0}, w_{1,1}, w_{2,0}, w_{2,1}$, and one L2 loss/cost per output)
• We want to compute gradients w.r.t. all weights $w$
Compute Graphs -> Neural Networks
(Figure: input layer $x_0, \dots, x_k$ -> output layer $y_0, y_1, \dots$ with targets $t_0, t_1, \dots$)
• $y_i = A(b_i + \sum_k x_k w_{i,k})$, with activation function $A$ and bias $b_i$
• $L_i = (y_i - t_i)^2$ (L2 loss -> simply sum up squares; the energy to minimize is $E = L$)
• $L = \sum_i L_i$
• We want to compute gradients w.r.t. all weights $w$ *AND* biases $b$
• $\frac{\partial L}{\partial w_{i,k}} = \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial w_{i,k}}$ -> use the chain rule to compute the partials (see the sketch below)
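A minimal numpy sketch of this single-layer case, with toy sizes chosen here for illustration (ReLU as the activation $A$, L2 loss, gradients via the chain rule):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(3)            # inputs x_k
t = np.array([1.0, -1.0])         # targets t_i
W = 0.1 * np.random.randn(2, 3)   # weights w_{i,k}
b = np.zeros(2)                   # biases b_i

s = W @ x + b                     # pre-activation
y = np.maximum(0, s)              # y_i = A(b_i + sum_k x_k w_{i,k})
L = np.sum((y - t) ** 2)          # L = sum_i (y_i - t_i)^2

dL_dy = 2 * (y - t)               # dL/dy_i
dL_ds = dL_dy * (s > 0)           # through the ReLU
dL_dW = np.outer(dL_ds, x)        # dL/dw_{i,k} = dL/dy_i * dy_i/dw_{i,k}
dL_db = dL_ds                     # dL/db_i
```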
Gradient Descent
(Figure: a loss surface with an initialization point that is moved step by step towards the optimum)
Gradient Descent
• From derivative to gradient: the gradient points in the direction of greatest increase of the function
• Gradient descent steps in the direction of the negative gradient, scaled by the learning rate
Gradient Descent for Neural Networks
• For a given training pair $\{x, t\}$, we want to update all weights; i.e., we need to compute the derivatives w.r.t. all weights, over all $l$ layers and all $m$ neurons:
$$\nabla_w f_{x,t}(w) = \begin{bmatrix} \frac{\partial f}{\partial w_{0,0,0}} \\ \vdots \\ \frac{\partial f}{\partial w_{l,m,n}} \end{bmatrix}$$
• Gradient step: $w' = w - \epsilon \nabla_w f_{x,t}(w)$ (sketched below)
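A tiny sketch of that update, assuming hypothetical dicts `params` and `grads` (not from the lecture) holding the weights and the entries of $\nabla_w f_{x,t}(w)$:

```python
import numpy as np

# Toy parameter and gradient dicts (values made up purely for illustration)
params = {"W": np.ones((2, 3)), "b": np.zeros(2)}
grads  = {"W": 0.5 * np.ones((2, 3)), "b": 0.1 * np.ones(2)}

epsilon = 1e-3                       # learning rate (example value)
for name in params:                  # w' = w - epsilon * gradient
    params[name] -= epsilon * grads[name]
```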
Gradient Descent for Neural Networks
(Figure: inputs $x_0, x_1, x_2$ -> hidden neurons $h_0, h_1, h_2, h_3$ -> outputs $y_0, y_1$ with targets $t_0, t_1$)
• $h_j = A(b_{0,j} + \sum_k x_k w_{0,j,k})$
• $y_i = A(b_{1,i} + \sum_j h_j w_{1,i,j})$
• $L_i = (y_i - t_i)^2$
• Just simple: $A(x) = \max(0, x)$
$$\nabla_w f_{x,t}(w) = \begin{bmatrix} \frac{\partial f}{\partial w_{0,0,0}} \\ \vdots \\ \frac{\partial f}{\partial w_{l,m,n}} \\ \vdots \\ \frac{\partial f}{\partial b_{l,m}} \end{bmatrix}$$
Gradient Descent for Neural Networks
(Figure: same network – inputs $x_0, x_1, x_2$, hidden $h_0, \dots, h_3$, outputs $y_0, y_1$, targets $t_0, t_1$)
• $h_j = A(b_{0,j} + \sum_k x_k w_{0,j,k})$, $y_i = A(b_{1,i} + \sum_j h_j w_{1,i,j})$, $L_i = (y_i - t_i)^2$
• Backpropagation: just go through it layer by layer
  – $\frac{\partial L}{\partial w_{1,i,j}} = \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial w_{1,i,j}}$
  – $\frac{\partial L}{\partial w_{0,j,k}} = \frac{\partial L}{\partial y_i} \cdot \frac{\partial y_i}{\partial h_j} \cdot \frac{\partial h_j}{\partial w_{0,j,k}}$
  – $\frac{\partial L_i}{\partial y_i} = 2(y_i - t_i)$
  – $\frac{\partial y_i}{\partial w_{1,i,j}} = h_j$ if the pre-activation is $> 0$, else $0$
  – …
(A sketch of this layer-by-layer pass follows below.)
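A hedged numpy sketch of backprop through this 2-layer ReLU network; the toy sizes and random values are chosen here, with the per-output L2 loss as above:

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(3)                              # x_0..x_2
t = np.array([0.5, -0.5])                           # t_0, t_1
W0, b0 = 0.1 * np.random.randn(4, 3), np.zeros(4)   # layer 0: 4 hidden neurons
W1, b1 = 0.1 * np.random.randn(2, 4), np.zeros(2)   # layer 1: 2 outputs

# forward pass
s0 = W0 @ x + b0
h  = np.maximum(0, s0)                              # h_j
s1 = W1 @ h + b1
y  = np.maximum(0, s1)                              # y_i
L  = np.sum((y - t) ** 2)

# backward pass, layer by layer
dL_dy  = 2 * (y - t)
dL_ds1 = dL_dy * (s1 > 0)                           # through the output ReLU
dL_dW1 = np.outer(dL_ds1, h)                        # dL/dw_{1,i,j}
dL_db1 = dL_ds1
dL_dh  = W1.T @ dL_ds1
dL_ds0 = dL_dh * (s0 > 0)                           # through the hidden ReLU
dL_dW0 = np.outer(dL_ds0, x)                        # dL/dw_{0,j,k}
dL_db0 = dL_ds0
```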
Gradient Descent for Neural Networks
(Figure: same network – inputs $x_0, x_1, x_2$, hidden $h_0, \dots, h_3$, outputs $y_0, y_1$, targets $t_0, t_1$)
• $h_j = A(b_{0,j} + \sum_k x_k w_{0,j,k})$, $y_i = A(b_{1,i} + \sum_j h_j w_{1,i,j})$, $L_i = (y_i - t_i)^2$
• How many unknown weights? (tallied below)
  – Hidden layer: $4 \cdot 3 + 4$ (#neurons $\cdot$ #input channels + #biases)
  – Output layer: $2 \cdot 4 + 2$
• Note that some activations also have weights
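Adding these up (a quick tally, counting only the weights and biases listed above):

$$\underbrace{4 \cdot 3 + 4}_{\text{hidden layer}} + \underbrace{2 \cdot 4 + 2}_{\text{output layer}} = 16 + 10 = 26 \text{ unknowns}$$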
Derivatives of Cross Entropy Loss
(Derivations follow [Sadowski])
• Gradients of the weights of the last layer
• Gradients of the weights of the first layer
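As a hedged aside (not the [Sadowski] derivation itself, just the standard softmax + cross-entropy result $\partial L / \partial s = \mathrm{softmax}(s) - \mathrm{onehot}(t)$), a small numpy check:

```python
import numpy as np

scores = np.array([2.0, 1.0, -1.0])   # last-layer scores for 3 classes (made up)
target = 0                            # ground-truth class index

p = np.exp(scores - scores.max())
p /= p.sum()                          # softmax probabilities
L = -np.log(p[target])                # cross-entropy loss

dL_dscores = p.copy()
dL_dscores[target] -= 1.0             # gradient w.r.t. the scores
# Gradients w.r.t. the last-layer weights then follow by the chain rule,
# e.g. dL/dW = outer(dL_dscores, h) for scores = W h + b.
```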
Derivatives of Neural Networks
• Combining nodes: linear activation node + hinge loss + regularization
Credit: Li/Karpathy/Johnson
Can become quite complex…
• These graphs can be huge!
(Figure: a large computational graph with $l$ layers and $m$ neurons per layer)
The flow of the gradients
(Figure: a compute node with its incoming activations and its activation function)
The flow of the gradients
• Many, many, many, many of these nodes form a neural network
• Each one has its own work to do: a forward and a backward pass (a tiny sketch of such a node follows below)
(Figure: neurons – forward and backward pass)
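To make "each node has its own work to do" concrete, here is a hypothetical minimal compute node in Python; the class name and interface are chosen here, not taken from the lecture:

```python
# A minimal "compute node": it knows how to run its forward pass and how to
# pass the incoming gradient back to its inputs via the chain rule.
class MultiplyNode:
    def forward(self, a, b):
        self.a, self.b = a, b          # cache inputs for the backward pass
        return a * b

    def backward(self, grad_out):
        # d(a*b)/da = b, d(a*b)/db = a
        return grad_out * self.b, grad_out * self.a

node = MultiplyNode()
f = node.forward(-2.0, 4.0)            # forward:  -8.0
df_da, df_db = node.backward(1.0)      # backward:  4.0, -2.0
```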
Gradient descent
Gradient Descent
(Figure: the same loss surface – an initialization point descending towards the optimum)
Gradient Descent
• From derivative to gradient: the gradient points in the direction of greatest increase of the function
• Gradient descent steps in the direction of the negative gradient, scaled by the learning rate
Gradient Descent
• How to pick a good learning rate?
• How to compute the gradient for a single training pair?
• How to compute the gradient for a large training set?
• How to speed things up?
Next lecture
• This week:
  – No tutorial this Thursday (due to MaiTUM)
  – Exercise 1 will be released on Thursday as planned
• Next lecture on May 7th:
  – Optimization of Neural Networks
  – In particular, introduction to SGD (our main method!)
See you next week!