Lecture 4: Backpropagation and Neural Networks part 1
vision.stanford.edu/teaching/cs231n/slides/2016/winter1516_lecture… · Fei-Fei Li & Andrej Karpathy & Justin Johnson

Transcript
Page 1:

Lecture 4: Backpropagation and Neural Networks part 1

Page 2:

Administrative
A1 is due Jan 20 (Wednesday). ~150 hours left.
Warning: Jan 18 (Monday) is a holiday (no class / office hours).

Also note: Lectures are non-exhaustive. Read course notes for completeness.

I’ll hold make-up office hours on Wed Jan 20, 5pm @ Gates 259.

Page 3:

Where we are...

- scores function
- SVM loss
- data loss + regularization
- want: the gradient of the loss with respect to the weights W
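Spelled out, the pieces these labels refer to (the standard expressions from the earlier lectures, restated here for reference):

- scores function: s = f(x; W) = W x
- SVM loss on example i: L_i = Σ_{j ≠ y_i} max(0, s_j − s_{y_i} + 1)
- full loss (data loss + regularization): L = (1/N) Σ_i L_i + λ R(W)
- want: the gradient ∇_W L, so that we can run gradient descent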

Page 4:

Optimization

(image credits to Alec Radford)

Page 5:

Gradient Descent

Numerical gradient: slow :(, approximate :(, easy to write :)
Analytic gradient: fast :), exact :), error-prone :(

In practice: Derive analytic gradient, check your implementation with numerical gradient
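As a small illustration of that workflow (a minimal sketch, not code from the slides), a centered-difference numerical gradient can be used to spot-check an analytic gradient:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Centered-difference estimate of df/dx; x is perturbed in place and then restored."""
    grad = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    while not it.finished:
        i = it.multi_index
        old = x[i]
        x[i] = old + h; fp = f(x)
        x[i] = old - h; fm = f(x)
        x[i] = old
        grad[i] = (fp - fm) / (2 * h)
        it.iternext()
    return grad

# compare against the analytic gradient via the relative error
f = lambda w: np.sum(w ** 2)       # toy loss
w = np.random.randn(3, 4)
num = numerical_gradient(f, w)
ana = 2 * w                        # analytic gradient of sum(w^2)
print(np.max(np.abs(num - ana) / (np.abs(num) + np.abs(ana) + 1e-8)))
```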

Page 6:

Computational Graph

(diagram: x and W feed a * node that produces the scores s; the scores go through the hinge loss; the data loss and a regularization term R are summed by a + node to give the total loss L)

Page 7:

Convolutional Network (AlexNet)

(graph from the input image and weights to the loss)

Page 8:

Neural Turing Machine

(graph from the input tape to the loss)

Page 9:

Neural Turing Machine

Pages 10-21 (a small worked example, built up over several slides):

e.g. x = -2, y = 5, z = -4

Want: the gradients of the output with respect to x, y, and z.

Chain rule: the gradient at each input is the local derivative of its gate multiplied by the gradient arriving from the output side.
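The circuit worked through on these slides is f(x, y, z) = (x + y) z. A minimal sketch of the forward and backward passes with the numbers above:

```python
# forward pass
x, y, z = -2.0, 5.0, -4.0
q = x + y            # q = 3
f = q * z            # f = -12

# backward pass (chain rule, from the output back to the inputs)
dfdz = q             # df/dz = q          ->  3
dfdq = z             # df/dq = z          -> -4
dfdx = 1.0 * dfdq    # dq/dx = 1, so df/dx = -4
dfdy = 1.0 * dfdq    # dq/dy = 1, so df/dy = -4
```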

Pages 22-27 (the general picture at a single gate):

A gate f receives activations on the forward pass. On the backward pass it receives the gradient of the loss with respect to its output, multiplies it by its “local gradient” (the derivative of its output with respect to each input), and passes the resulting gradients back to its inputs.

Pages 28-41 (another example: a small sigmoid-style circuit, backpropagated one gate at a time):

Another example: at every gate, multiply [local gradient] x [the gradient arriving from above].

(-1) * (-0.20) = 0.20
[1] x [0.2] = 0.2 (for both inputs of the add gate!)
x0: [2] x [0.2] = 0.4,  w0: [-1] x [0.2] = -0.2
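Putting the per-gate products together: the circuit here is f(w, x) = 1 / (1 + e^{-(w0*x0 + w1*x1 + w2)}). A sketch of the full forward/backward pass; w0 = 2 and x0 = -1 follow from the local gradients quoted above, while the remaining input values are assumed for illustration:

```python
import math

# inputs (w0 and x0 follow from the quoted local gradients; w1, x1, w2 are assumed)
w0, x0 = 2.0, -1.0
w1, x1 = -3.0, -2.0
w2 = -3.0

# forward pass, one simple gate at a time
a = w0 * x0 + w1 * x1 + w2    # 1.0
b = -a                        # -1.0
c = math.exp(b)               # 0.37
d = 1.0 + c                   # 1.37
f = 1.0 / d                   # 0.73  (the sigmoid output)

# backward pass: each local gradient times the gradient flowing in from above
dd = -1.0 / d ** 2            # gate 1/d      -> -0.53
dc = 1.0 * dd                 # gate 1 + c    -> -0.53
db = math.exp(b) * dc         # gate exp(b)   -> -0.20
da = -1.0 * db                # gate -a       ->  0.20
dw0, dx0 = x0 * da, w0 * da   # -0.2, 0.4  (matches the numbers above)
dw1, dx1 = x1 * da, w1 * da   # -0.4, -0.6
dw2 = 1.0 * da                #  0.2
```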

Pages 42-43:

sigmoid function

sigmoid gate: the whole sigmoid expression can be treated as a single gate; at output 0.73 its local gradient is (0.73) * (1 - 0.73) = 0.2
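Why that local gradient has such a convenient form: for the sigmoid σ(x) = 1/(1 + e^{-x}),

$$\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1+e^{-x})^{2}} = \left(\frac{1+e^{-x}-1}{1+e^{-x}}\right)\left(\frac{1}{1+e^{-x}}\right) = \bigl(1-\sigma(x)\bigr)\,\sigma(x)$$

so at an output of 0.73 the local gradient is (1 − 0.73) · 0.73 ≈ 0.2, exactly the product shown above.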

Page 44:

Patterns in backward flow

add gate: gradient distributor
max gate: gradient router
mul gate: gradient… “switcher”?
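A tiny numeric illustration of those three patterns (a sketch, with made-up values):

```python
upstream = 2.0   # gradient arriving from above

# add gate z = x + y: distributes the upstream gradient unchanged to every input
dx_add, dy_add = upstream, upstream                        # 2.0, 2.0

# max gate z = max(x, y): routes the full gradient to whichever input was larger
x, y = 4.0, 1.0
dx_max, dy_max = upstream * (x >= y), upstream * (y > x)   # 2.0, 0.0

# mul gate z = x * y: each input gets the upstream gradient scaled by the OTHER input
x, y = 3.0, -5.0
dx_mul, dy_mul = upstream * y, upstream * x                # -10.0, 6.0
```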

Page 45:

Gradients add at branches: when a variable feeds into multiple parts of the graph, the gradients flowing back along those branches are summed (+).

Page 46:

Implementation: forward/backward API

Graph (or Net) object (rough pseudo code).
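Roughly, such a Graph (or Net) object just walks its gates in topological order, once forward and once in reverse. A sketch (assuming each gate object exposes forward()/backward() and stores its output in .value and its gradient in .grad; the names are illustrative):

```python
class ComputationalGraph(object):
    """Rough sketch: a net is just a list of gate objects in topological order."""
    def __init__(self, gates):
        self.gates = gates

    def forward(self):
        # forward the computational graph, gate by gate
        for gate in self.gates:
            gate.forward()
        return self.gates[-1].value      # output of the last gate = the loss

    def backward(self):
        # chain rule, applied from the loss back to the inputs
        self.gates[-1].grad = 1.0        # d(loss)/d(loss) = 1
        for gate in reversed(self.gates):
            gate.backward()
```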

Pages 47-48:

Implementation: forward/backward API

(diagram: a single * gate with inputs x and y and output z; x, y, z are scalars)
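A self-contained sketch of one such gate for scalars: forward() computes z = x * y and must cache its inputs, because backward() needs them for the local gradients (dz/dx = y, dz/dy = x):

```python
class MultiplyGate(object):
    def forward(self, x, y):
        self.x, self.y = x, y     # remember the inputs: backward() needs them
        return x * y

    def backward(self, dz):
        dx = self.y * dz          # local gradient dz/dx = y, times the upstream gradient
        dy = self.x * dz          # local gradient dz/dy = x, times the upstream gradient
        return dx, dy

gate = MultiplyGate()
z = gate.forward(3.0, -4.0)       # -12.0
dx, dy = gate.backward(1.0)       # dx = -4.0, dy = 3.0
```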

Pages 49-50:

Example: Torch Layers (a layer = a gate/node implementing the forward/backward API)

Page 51:

Example: Torch MulConstant (initialization, forward(), backward())

Page 52:

Example: Caffe Layers

Page 53:

Caffe Sigmoid Layer

In the backward pass the local sigmoid gradient is multiplied by top_diff (chain rule).
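In numpy terms, that backward pass is a single elementwise line (an analogue of what the Caffe layer computes, not its actual C++ source):

```python
import numpy as np

def sigmoid_backward(top_diff, top_data):
    # top_data holds sigmoid(x) saved during the forward pass; the local gradient is
    # sigmoid(x) * (1 - sigmoid(x)), multiplied by the upstream top_diff (chain rule)
    return top_diff * top_data * (1.0 - top_data)
```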

Page 54:

Gradients for vectorized code (x, y, z are now vectors)

For a gate f, the “local gradient” is now the Jacobian matrix (the derivative of each element of z with respect to each element of x), and the gradients flowing backward are vectors.

Page 55:

Vectorized operations

f(x) = max(0, x) (elementwise): 4096-d input vector → 4096-d output vector

Page 56:

Vectorized operations

f(x) = max(0, x) (elementwise): 4096-d input vector → 4096-d output vector

Q: what is the size of the Jacobian matrix?

Page 57:

Vectorized operations

f(x) = max(0, x) (elementwise): 4096-d input vector → 4096-d output vector

Q: what is the size of the Jacobian matrix? [4096 x 4096!]
Q2: what does it look like?

Page 58:

Vectorized operations

f(x) = max(0, x) (elementwise): 100 4096-d input vectors → 100 4096-d output vectors

In practice we process an entire minibatch (e.g. 100 examples) at one time, i.e. the Jacobian would technically be a [409,600 x 409,600] matrix :\
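The answer to Q2: because max(0, x) acts elementwise, the Jacobian is diagonal (each output depends only on its own input), so it is never formed explicitly; the backward pass is just an elementwise mask. A sketch:

```python
import numpy as np

x = np.random.randn(100, 4096)      # a minibatch of 100 4096-d inputs
out = np.maximum(0, x)              # forward: f(x) = max(0, x), elementwise

dout = np.random.randn(*out.shape)  # upstream gradient, same shape as the output
dx = dout * (x > 0)                 # backward: pass the gradient through where x > 0, zero elsewhere
```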

Page 59:

Assignment: Writing SVM/Softmax. Stage your forward/backward computation!

E.g. for the SVM: compute the margins as a named intermediate stage.
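A sketch of what staging looks like for the SVM loss (vectorized numpy; the variable names are mine, not the assignment's):

```python
import numpy as np

def svm_loss_staged(W, X, y, reg):
    """X: (N, D) data, y: (N,) integer labels, W: (D, C) weights."""
    N = X.shape[0]
    # forward pass, staged into named intermediates
    scores = X.dot(W)                                  # (N, C)
    correct = scores[np.arange(N), y][:, None]         # (N, 1)
    margins = np.maximum(0, scores - correct + 1.0)    # (N, C)
    margins[np.arange(N), y] = 0
    loss = margins.sum() / N + reg * np.sum(W * W)

    # backward pass: walk the same stages in reverse
    dmargins = (margins > 0).astype(float) / N         # d(loss)/d(margins)
    dscores = dmargins.copy()
    dscores[np.arange(N), y] -= dmargins.sum(axis=1)   # route gradient back through "correct"
    dW = X.T.dot(dscores) + 2 * reg * W
    return loss, dW
```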

Page 60:

Summary so far

- neural nets will be very large: no hope of writing down gradient formula by hand for all parameters

- backpropagation = recursive application of the chain rule along a computational graph to compute the gradients of all inputs/parameters/intermediates

- implementations maintain a graph structure, where the nodes implement the forward() / backward() API.

- forward: compute result of an operation and save any intermediates needed for gradient computation in memory

- backward: apply the chain rule to compute the gradient of the loss function with respect to the inputs.

Pages 62-66:

Neural Network: without the brain stuff

(Before) Linear score function
(Now) 2-layer Neural Network, or 3-layer Neural Network

x → W1 → h → W2 → s  (sizes: 3072 → 100 → 10)
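The score functions behind these labels, written out in the course's usual notation:

(Before) linear: f = W x
(Now) 2-layer Neural Network: f = W2 max(0, W1 x)
3-layer Neural Network: f = W3 max(0, W2 max(0, W1 x))

Here x is the 3072-d input vector (a stretched-out 32x32x3 CIFAR-10 image), h = max(0, W1 x) is the 100-d hidden layer, and s is the vector of 10 class scores.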

Page 67:

Full implementation of training a 2-layer Neural Network needs ~11 lines:

from @iamtrask, http://iamtrask.github.io/2015/07/12/basic-python-network/
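The 11 lines appear on the slide as an image; the version below follows the linked iamtrask post (a close paraphrase rather than a verbatim copy, so treat it as a sketch):

```python
import numpy as np

X = np.array([[0,0,1],[0,1,1],[1,0,1],[1,1,1]])    # 4 training examples, 3 features each
y = np.array([[0,1,1,0]]).T                        # training targets
syn0 = 2*np.random.random((3,4)) - 1               # weights: input -> hidden (4 units)
syn1 = 2*np.random.random((4,1)) - 1               # weights: hidden -> output
for j in range(60000):
    l1 = 1/(1+np.exp(-(np.dot(X,syn0))))           # hidden layer (sigmoid)
    l2 = 1/(1+np.exp(-(np.dot(l1,syn1))))          # output layer (sigmoid)
    l2_delta = (y - l2)*(l2*(1-l2))                # output error times local sigmoid gradient
    l1_delta = l2_delta.dot(syn1.T) * (l1*(1-l1))  # backprop through syn1, then local gradient
    syn1 += l1.T.dot(l2_delta)                     # parameter updates
    syn0 += X.T.dot(l1_delta)
```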

Page 68:

Assignment: Writing a 2-layer Net. Stage your forward/backward computation!


Page 73:

sigmoid activation function
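The slide's model of a single neuron is a weighted sum of the inputs pushed through the sigmoid activation function. A minimal sketch (class and variable names are illustrative):

```python
import numpy as np

class Neuron(object):
    def __init__(self, weights, bias):
        self.weights, self.bias = weights, bias

    def forward(self, inputs):
        cell_body_sum = np.dot(self.weights, inputs) + self.bias
        firing_rate = 1.0 / (1.0 + np.exp(-cell_body_sum))   # sigmoid activation function
        return firing_rate

n = Neuron(np.array([0.5, -0.3, 0.8]), 0.1)
print(n.forward(np.array([1.0, 2.0, 3.0])))
```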


Page 75:

Be very careful with your brain analogies:

Biological Neurons:
- Many different types
- Dendrites can perform complex non-linear computations
- Synapses are not a single weight but a complex non-linear dynamical system
- Rate code may not be adequate

[Dendritic Computation. London and Hausser]

Page 76:

Activation Functions

- Sigmoid: σ(x) = 1 / (1 + e^{-x})
- tanh: tanh(x)
- ReLU: max(0, x)
- Leaky ReLU: max(0.1x, x)
- Maxout
- ELU

Page 77:

Neural Networks: Architectures

“Fully-connected” layers

“2-layer Neural Net”, or “1-hidden-layer Neural Net”
“3-layer Neural Net”, or “2-hidden-layer Neural Net”

Page 78:

Example Feed-forward computation of a Neural Network

We can efficiently evaluate an entire layer of neurons.
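Concretely, evaluating a whole layer is one matrix multiply plus an elementwise nonlinearity. A sketch of a 3-layer forward pass in the spirit of this slide (layer sizes here are illustrative):

```python
import numpy as np

f = lambda x: 1.0 / (1.0 + np.exp(-x))   # activation function (sigmoid here)

# random weights and biases for layer sizes 3 -> 4 -> 4 -> 1
W1, b1 = np.random.randn(4, 3), np.random.randn(4, 1)
W2, b2 = np.random.randn(4, 4), np.random.randn(4, 1)
W3, b3 = np.random.randn(1, 4), np.random.randn(1, 1)

x = np.random.randn(3, 1)                # a random input vector
h1 = f(np.dot(W1, x) + b1)               # first hidden layer activations (4x1)
h2 = f(np.dot(W2, h1) + b2)              # second hidden layer activations (4x1)
out = np.dot(W3, h2) + b3                # output scores (1x1)
```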

Page 79:

Example Feed-forward computation of a Neural Network

Page 80:

Setting the number of layers and their sizes

more neurons = more capacity

Page 81:

(you can play with this demo over at ConvNetJS: http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html)

Do not use the size of your neural network as a regularizer. Use stronger regularization instead (e.g. a larger L2 regularization strength).

Page 82:

Summary

- we arrange neurons into fully-connected layers
- the abstraction of a layer has the nice property that it allows us to use efficient vectorized code (e.g. matrix multiplies)
- neural networks are not really neural
- neural networks: bigger = better (but might have to regularize more strongly)

Page 83:

Next Lecture:

More than you ever wanted to know about Neural Networks and how to train them.

Page 84:

complex graph: inputs x, outputs y

reverse-mode differentiation (if you want the effect of many things on one thing): one backward pass gives the derivative of a single output with respect to many different x at once
forward-mode differentiation (if you want the effect of one thing on many things): one forward pass gives the derivative of many different y with respect to a single input at once