Transcript of April 9, 2021 CS231n Discussion Section

Page 1

Backpropagation

Slide credits: Barak Oshri, Vincent Chen, Nish Khandwala, Yi Wen, Mandy Lu

TA: Mandy Lu

Page 2

Agenda

● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer

Page 3

Agenda

● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer

Page 4

Motivation

Recall: the optimization objective is to minimize the loss

Page 5

Motivation

Recall: the optimization objective is to minimize the loss

Goal: how should we tweak the parameters to decrease the loss?

Page 6

Agenda

● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer

Page 7

A Simple Example

Goal: tweak the parameters to minimize the loss

=> minimize a multivariable function in parameter space

Page 8

A Simple Example

=> minimize a multivariable function

Plotted on WolframAlpha

Page 9

Approach #1: Random Search

Intuition: take random steps in the domain of the function and keep whichever decreases the loss

Page 10

Approach #2: Numerical Gradient

Intuition: the rate of change of a function with respect to a variable, measured over a small surrounding region

Page 11

Approach #2: Numerical Gradient

Intuition: the rate of change of a function with respect to a variable, measured over a small surrounding region

Finite differences: df/dx ≈ (f(x + h) − f(x − h)) / (2h) for a small h
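The finite-difference idea can be sketched in numpy. This is a minimal illustration, not the course's reference implementation; the function name `numerical_gradient` and the example f are assumptions for this sketch.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered finite-difference gradient of scalar f at a float array x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        fp = f(x)                        # f evaluated with x_i nudged up
        x.flat[i] = old - h
        fm = f(x)                        # f evaluated with x_i nudged down
        x.flat[i] = old                  # restore the original value
        grad.flat[i] = (fp - fm) / (2 * h)
    return grad

# f(x0, x1) = x0**2 + 3*x1 has gradient (2*x0, 3)
f = lambda v: v[0] ** 2 + 3 * v[1]
g = numerical_gradient(f, np.array([2.0, 5.0]))
print(g)   # approximately [4. 3.]
```

Note that this needs two function evaluations per dimension, which is why the numerical gradient is too slow for training but fine for checking.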

Page 12

Approach #3: Analytical Gradient

Recall: partial derivative by limit definition, ∂f/∂x = lim_{h→0} (f(x + h) − f(x)) / h

Page 13

Approach #3: Analytical Gradient

Recall: the chain rule: if f(x) = g(h(x)), then df/dx = (dg/dh)(dh/dx)

Page 14

Approach #3: Analytical Gradient

Recall: chain rule

E.g.

Page 15

Approach #3: Analytical Gradient

Recall: chain rule

E.g.

Page 16

Approach #3: Analytical Gradient

Recall: chain rule

Intuition: upstream gradient values propagate backwards -- we can reuse them!
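A tiny worked example of that reuse, on the assumed scalar function L = (w·x + b)²: the upstream gradient dL/dq is computed once and then shared by every input feeding into q.

```python
# Forward pass for L = (w*x + b)**2 with intermediate q = w*x + b
x, w, b = 2.0, 3.0, -1.0
q = w * x + b        # q = 5.0
L = q ** 2           # L = 25.0

# Backward pass: the upstream gradient dL/dq is computed once...
dq = 2 * q           # dL/dq = 10.0

# ...then reused for every input that feeds into q
dw = dq * x          # dL/dw = dL/dq * dq/dw = 20.0
dx = dq * w          # dL/dx = dL/dq * dq/dx = 30.0
db = dq * 1.0        # dL/db = 10.0
print(dw, dx, db)    # 20.0 30.0 10.0
```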

Page 17

Gradient

“direction and rate of fastest increase”

Numerical Gradient vs Analytical Gradient: the numerical gradient is approximate and slow but easy to implement; the analytical gradient is exact and fast but more error-prone to derive
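A minimal numpy sketch of the comparison, using f(x) = Σ x² as an assumed example; the relative error between the two gradients is the usual gradient-check quantity.

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])

# Analytical gradient of f(x) = sum(x**2): exact, one formula
analytic = 2 * x

# Numerical gradient: one centered difference per dimension
h = 1e-5
numeric = np.zeros_like(x)
for i in range(x.size):
    xp, xm = x.copy(), x.copy()
    xp[i] += h
    xm[i] -= h
    numeric[i] = (np.sum(xp ** 2) - np.sum(xm ** 2)) / (2 * h)

# Relative error is the usual way to compare the two
rel_err = np.abs(analytic - numeric).max() / np.abs(analytic).max()
print(rel_err)
```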

Page 18

What about Autograd?

Q: “Why do we have to write the backward pass when frameworks in the real world, such as TensorFlow, compute them for you automatically?”

A: When debugging your model, problems related to the underlying gradients might surface (e.g. vanishing or exploding gradients)

“Yes You Should Understand Backprop”

https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b

Page 19

Problem Statement: Backpropagation

Given a function f with respect to inputs x, labels y, and parameters 𝜃, compute the gradient of the loss with respect to 𝜃

Page 20

Problem Statement: Backpropagation

An algorithm for computing the gradient of a compound function as a series of local, intermediate gradients:

Forward:  input x, parameters W, b => output y = local(x, W, b)
Backward: upstream gradient dy => dx, dW, db = grad_local(dy, x, W, b)

1. Identify intermediate functions (forward prop)
2. Compute local gradients (chain rule)
3. Combine with upstream signal to get full gradient
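The `local` / `grad_local` pattern above can be sketched in numpy. Instantiating it with an affine function y = Wx + b is an assumption for this sketch; the slide leaves the intermediate function abstract.

```python
import numpy as np

def local(x, W, b):
    """Forward: one intermediate function, here (assumed) affine y = W x + b."""
    return W @ x + b

def grad_local(dy, x, W, b):
    """Backward: combine the upstream signal dy with the local gradients."""
    dx = W.T @ dy           # dL/dx, same shape as x
    dW = np.outer(dy, x)    # dL/dW, same shape as W
    db = dy                 # dL/db, same shape as b
    return dx, dW, db

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
W = rng.standard_normal((2, 3))
b = rng.standard_normal(2)

y = local(x, W, b)                      # 1. forward prop
dy = np.ones_like(y)                    # upstream gradient for L = sum(y)
dx, dW, db = grad_local(dy, x, W, b)    # 2-3. local grads combined with upstream
print(dx.shape, dW.shape, db.shape)     # (3,) (2, 3) (2,)
```

Each layer only needs its own inputs and the upstream signal, which is what makes backprop modular.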

Page 21

Modularity: Previous Example

Compound function

Intermediate Variables (forward propagation)

Page 22

Modularity: 2-Layer Neural Network

Compound function

Intermediate Variables (forward propagation)

Loss = the squared Euclidean distance between the prediction and the label

Page 23

f(x; W, b) = Wx + b

Intermediate Variables (forward propagation)

(↑ lecture note) Input: one feature vector
(← here) Input: a batch of data (matrix)
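The two conventions can be contrasted in numpy. The shapes below, and the choice of W with shape (M, D) so that the batch form is X @ W.T, are assumptions for this sketch; other treatments (including the CS231n assignments) store W transposed and write X @ W instead.

```python
import numpy as np

# Lecture-note convention (assumed): one feature vector x of shape (D,),
# weights W of shape (M, D), bias b of shape (M,): f = W @ x + b.
# Batch convention: stack N inputs as rows of X, shape (N, D); then
# f(X; W, b) = X @ W.T + b produces all N outputs, shape (N, M), at once.
N, D, M = 4, 3, 2
rng = np.random.default_rng(1)
X = rng.standard_normal((N, D))
W = rng.standard_normal((M, D))
b = rng.standard_normal(M)

out = X @ W.T + b   # b broadcasts across the batch dimension
print(out.shape)    # (4, 2)
```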

Page 24

Intermediate Variables (forward propagation)

Intermediate Gradients (backward propagation)

1. intermediate functions
2. local gradients
3. full gradients

???

???

???

Page 25

Agenda

● Motivation
● Backprop Tips & Tricks
● Matrix calculus primer

Page 26

Derivative w.r.t. Vector

Scalar-by-Vector: the gradient, a vector with the same shape as the denominator vector

Vector-by-Vector: the Jacobian, a matrix of all partial derivatives ∂yᵢ/∂xⱼ

Page 27

Derivative w.r.t. Vector: Chain Rule

1. intermediate functions
2. local gradients
3. full gradients

?

Page 28

Derivative w.r.t. Vector: Takeaway
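The takeaway is that vector-by-vector derivatives (Jacobians) compose by matrix multiplication. A sketch with linear maps, chosen here (as an assumption) because their Jacobians are simply the matrices themselves:

```python
import numpy as np

# For z = g(y) and y = f(x), the Jacobians compose: dz/dx = (dz/dy) @ (dy/dx)
A = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, -1.0]])  # f(x) = A @ x, Jacobian (3, 2)
B = np.array([[1.0, 1.0, 0.0], [0.0, 2.0, 1.0]])     # g(y) = B @ y, Jacobian (2, 3)

J = B @ A   # Jacobian of the composition g(f(x)), shape (2, 2)

# Check the first column against a finite difference of the composed map
x = np.array([0.5, -1.0])
h = 1e-6
e0 = np.array([h, 0.0])
numeric_col0 = (B @ (A @ (x + e0)) - B @ (A @ x)) / h
print(np.allclose(numeric_col0, J[:, 0]))  # True
```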

Page 29

Derivative w.r.t. Matrix

Vector-by-Matrix ?

Scalar-by-Matrix

Page 30

Derivative w.r.t. Matrix: Dimension Balancing

When you take scalar-by-matrix gradients, the gradient has the shape of the denominator

● Dimension balancing is the “cheap” but efficient approach to gradient calculations in most practical settings
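Dimension balancing can be demonstrated in numpy for Y = X @ W (the sizes below are arbitrary assumptions): once you insist the gradient has the denominator's shape, there is essentially one way to combine the available matrices.

```python
import numpy as np

# For Y = X @ W and a scalar loss L, dL/dW must have the shape of W and
# dL/dX the shape of X, so the only shape-consistent combinations of
# X, W, and the upstream dY are:
#   dW = X.T @ dY   # (D, N) @ (N, M) -> (D, M), the shape of W
#   dX = dY @ W.T   # (N, M) @ (M, D) -> (N, D), the shape of X
rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))    # (N, D)
W = rng.standard_normal((3, 4))    # (D, M)
dY = rng.standard_normal((5, 4))   # upstream gradient, shape of Y

dW = X.T @ dY
dX = dY @ W.T
print(dW.shape == W.shape, dX.shape == X.shape)  # True True
```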

Page 31

Derivative w.r.t. Matrix: Takeaway

Page 32

Intermediate Variables (forward propagation)

Intermediate Gradients (backward propagation)

1. intermediate functions
2. local gradients
3. full gradients

Page 33

Backprop Menu for Success

1. Write down the variable graph

2. Keep track of error signals

3. Compute the derivative of the loss function

4. Enforce the shape rule on error signals, especially when deriving over a linear transformation
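The four menu steps can be walked through on a 2-layer network with a squared-Euclidean loss. The layer sizes and the sigmoid nonlinearity are assumptions for this sketch, not the slides' exact network.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(3)
y = rng.standard_normal(2)
W1, b1 = rng.standard_normal((4, 3)), rng.standard_normal(4)
W2, b2 = rng.standard_normal((2, 4)), rng.standard_normal(2)

# 1. Variable graph (forward): x -> z1 -> h -> yhat -> L
z1 = W1 @ x + b1
h = 1 / (1 + np.exp(-z1))              # sigmoid nonlinearity (assumed)
yhat = W2 @ h + b2
L = 0.5 * np.sum((yhat - y) ** 2)      # squared Euclidean loss

# 3. Derivative of the loss function
dyhat = yhat - y

# 2. Track error signals backwards; 4. each dW matches the shape of its W
dW2 = np.outer(dyhat, h)
db2 = dyhat
dh = W2.T @ dyhat
dz1 = dh * h * (1 - h)                 # local gradient of the sigmoid
dW1 = np.outer(dz1, x)
db1 = dz1

# Finite-difference check on one entry of W1
h_step = 1e-6
W1p = W1.copy()
W1p[0, 0] += h_step
hp = 1 / (1 + np.exp(-(W1p @ x + b1)))
Lp = 0.5 * np.sum((W2 @ hp + b2 - y) ** 2)
print(abs((Lp - L) / h_step - dW1[0, 0]))
```

The printed quantity is the gap between the analytical gradient and a one-sided numerical gradient; it should be tiny.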

Page 34

Vector-by-vector

?

Page 35

Vector-by-vector

?

Page 36

Vector-by-vector

?

Page 37

Vector-by-vector

?

Page 38

Matrix multiplication [Backprop]

? ?
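For Y = X @ W, the standard backprop results (presumably what the two blanks on this slide ask for) are dX = dY @ W.T and dW = X.T @ dY. A sketch with a finite-difference check; the loss L = Σ Y² is an assumed example.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 2))   # (N, D)
W = rng.standard_normal((2, 4))   # (D, M)
Y = X @ W                         # (N, M)

dY = 2 * Y                        # upstream gradient for L = sum(Y**2)
dX = dY @ W.T                     # (N, M) @ (M, D) -> shape of X
dW = X.T @ dY                     # (D, N) @ (N, M) -> shape of W

# Finite-difference check on one entry of X
h = 1e-5
Xp = X.copy()
Xp[1, 0] += h
numeric = (np.sum((Xp @ W) ** 2) - np.sum(Y ** 2)) / h
print(abs(numeric - dX[1, 0]))
```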

Page 39

Elementwise function [Backprop]

?
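For an elementwise function Y = f(X), the backward pass is itself elementwise: dX = dY * f'(X), with no matrix products needed. A sketch using sigmoid as an assumed example:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

rng = np.random.default_rng(5)
X = rng.standard_normal((2, 3))
Y = sigmoid(X)                      # elementwise forward
dY = rng.standard_normal(Y.shape)   # some upstream gradient

# Elementwise backward: dX = dY * f'(X); for sigmoid, f'(x) = f(x) * (1 - f(x))
dX = dY * Y * (1 - Y)

# Finite-difference check on one entry, for L = sum(sigmoid(X) * dY)
h = 1e-5
Xp = X.copy()
Xp[0, 0] += h
numeric = (np.sum(sigmoid(Xp) * dY) - np.sum(Y * dY)) / h
print(abs(numeric - dX[0, 0]))
```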