ELEC 576: Neural Networks & Backpropagation

Lecture 3

Ankit B. Patel

Baylor College of Medicine (Neuroscience Dept.)

Rice University (ECE Dept.)

Outline

• Neural Networks

• Definition of NN and terminology

• Review of (Old) Theoretical Results about NNs

• Intuition for why compositions of nonlinear functions are more expressive

• Expressive power theorems [McCulloch-Pitts, Rosenblatt, Cybenko]

• Backpropagation algorithm (Gradient Descent + Chain Rule)

• History of backprop summary

• Gradient descent (Review)

• Chain Rule (Review)

• Backprop

• Intro to Convnets

• Convolutional Layer, ReLU, Max-Pooling

Neural Networks

Neural Networks: Definitions

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

[Figure: a single unit, labeled with its input units, output units, net input, and (output) activation]

Neural Networks: Activation Functions

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
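Backprop needs each activation function together with its derivative. A minimal sketch, assuming the three common choices sigmoid, tanh, and ReLU (which of these the slide's figure plots is not recoverable from the text); the helper names and the `ACTIVATIONS` table are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)             # f'(z) = f(z) (1 - f(z))

def d_tanh(z):
    return 1.0 - math.tanh(z) ** 2   # f'(z) = 1 - tanh(z)^2

def relu(z):
    return max(0.0, z)

def d_relu(z):
    return 1.0 if z > 0 else 0.0     # convention: use 0 at the kink z = 0

# name -> (activation f, derivative f')
ACTIVATIONS = {
    "sigmoid": (sigmoid, d_sigmoid),
    "tanh": (math.tanh, d_tanh),
    "relu": (relu, d_relu),
}
```

Note that the sigmoid's derivative never exceeds 0.25 (its value at z = 0), while ReLU passes gradients through unchanged wherever it is active.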

Neural Networks: Definitions

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

Feedforward Propagation: Scalar Form

[Figure: network with input units, hidden units, and output units; labels mark the net input and (output) activation of each unit]
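The scalar form can be sketched with explicit loops, one unit at a time. This assumes sigmoid activations and the indexing W[i][j] for the weight from unit j of one layer to unit i of the next; `forward_scalar` is a hypothetical name.

```python
import math

def f(z):
    """Sigmoid activation."""
    return 1.0 / (1.0 + math.exp(-z))

def forward_scalar(x, W1, b1, W2, b2):
    """Scalar-form forward pass of a 3-layer net: input -> hidden -> output."""
    # Hidden layer: z_i = sum_j W1[i][j] * x[j] + b1[i], a_i = f(z_i)
    a2 = []
    for i in range(len(W1)):
        z = b1[i]
        for j in range(len(x)):
            z += W1[i][j] * x[j]
        a2.append(f(z))
    # Output layer: the same rule, with the hidden activations as inputs
    a3 = []
    for i in range(len(W2)):
        z = b2[i]
        for j in range(len(a2)):
            z += W2[i][j] * a2[j]
        a3.append(f(z))
    return a3
```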

Neural Networks: Definitions

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

Feedforward Propagation: Vector Form

[Figure: network with input units, hidden units, and output units; labels mark the net input and (output) activation of each layer]

Neural Networks: Definitions

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

Deep Feedforward Propagation: Vector Form
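In vector form, each layer is one matrix-vector product followed by an elementwise activation, so a deep network is just a loop over layers. A minimal sketch, assuming z^(l+1) = W^(l) a^(l) + b^(l), a^(l+1) = f(z^(l+1)) with a^(1) = x, and lists of rows standing in for matrices; `matvec` and `forward_deep` are hypothetical names.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matvec(W, v):
    """Matrix-vector product W v, with W stored as a list of rows."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def forward_deep(x, weights, biases, f=sigmoid):
    """Vector-form forward pass; returns the activations a^(1), ..., a^(n_l)."""
    activations = [x]
    for W, b in zip(weights, biases):
        z = [zi + bi for zi, bi in zip(matvec(W, activations[-1]), b)]
        activations.append([f(zi) for zi in z])
    return activations
```

Keeping every layer's activation (not only the last) is deliberate: the backward pass reuses them.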

Neural Networks: Definitions

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

The Training Objective
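A minimal sketch of the cost, assuming the squared-error-plus-weight-decay objective of the UFLDL tutorial linked above, J(W,b) = (1/m) Σ_i ½‖h_{W,b}(x^(i)) − y^(i)‖² + (λ/2) Σ_l Σ_i Σ_j (W_ji^(l))², with biases left unpenalized; the function name and argument layout are hypothetical.

```python
def cost(predictions, targets, weights, lam):
    """Average squared-error loss plus weight decay.

    predictions, targets: lists of m output vectors h(x^(i)) and y^(i).
    weights: list of weight matrices W^(l) (lists of rows); lam: decay coefficient.
    """
    m = len(predictions)
    data_term = sum(
        0.5 * sum((h - y) ** 2 for h, y in zip(h_vec, y_vec))
        for h_vec, y_vec in zip(predictions, targets)
    ) / m
    decay_term = 0.5 * lam * sum(w ** 2 for W in weights for row in W for w in row)
    return data_term + decay_term
```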

Expressive Power Theorems

Compositions of Nonlinear Functions are more expressive

[Yoshua Bengio]

McCulloch-Pitts Neurons
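A minimal sketch of a McCulloch-Pitts-style threshold unit, simplified to binary inputs, integer weights (a negative weight standing in for an inhibitory input), and hand-picked thresholds; the function names are hypothetical. Basic logic gates follow directly from the choice of weights and threshold.

```python
def mp_neuron(inputs, weights, threshold):
    """Fire (1) iff the weighted sum of the binary inputs reaches the threshold."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def AND(x1, x2):
    return mp_neuron([x1, x2], [1, 1], 2)

def OR(x1, x2):
    return mp_neuron([x1, x2], [1, 1], 1)

def NOT(x):
    # An inhibitory weight of -1 with threshold 0 inverts a binary input
    return mp_neuron([x], [-1], 0)
```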

Expressive Power of McCulloch-Pitts Nets

The Perceptron (Rosenblatt)
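A minimal sketch of Rosenblatt's error-correction rule, w ← w + η(t − ŷ)x, assuming a step output, targets in {0, 1}, zero initialization, and η = 1; it is trained here on the linearly separable AND function. All names are hypothetical.

```python
def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train_perceptron(data, epochs=20, lr=1.0):
    """Perceptron learning rule; stops early after an epoch with no mistakes."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        errors = 0
        for x, t in data:
            err = t - predict(w, b, x)
            if err != 0:
                errors += 1
                w = [wi + lr * err * xi for wi, xi in zip(w, x)]
                b += lr * err
        if errors == 0:  # every example classified correctly
            break
    return w, b

AND_DATA = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(AND_DATA)
```

Because AND is linearly separable, the loop reaches an error-free epoch; on XOR no weight setting is error-free, so the rule would never settle.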

Limitations of the Perceptron

• Rosenblatt was overly enthusiastic about the perceptron and made the ill-timed proclamation that:

• "Given an elementary α-perceptron, a stimulus world W, and any classification C(W) for which a solution exists; let all stimuli in W occur in any sequence, provided that each stimulus must reoccur in finite time; then beginning from an arbitrary initial state, an error correction procedure will always yield a solution to C(W) in finite time…" [4]

• In 1969, Marvin Minsky and Seymour Papert showed that the perceptron can only solve linearly separable problems. In particular, it cannot solve the XOR and NXOR functions.

• The problem outlined by Minsky and Papert can be solved by deep (multilayer) NNs. However, many of the artificial neural networks in use today still stem from the early advances of the McCulloch-Pitts neuron and the Rosenblatt perceptron.
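Both halves of this argument can be checked directly. This sketch hand-sets the weights of a two-layer threshold network (hidden OR and NAND units feeding an AND unit) that computes XOR, then brute-forces a grid of single-unit weights to confirm that none of them reproduces XOR. The weights and the grid are illustrative assumptions, and the names are hypothetical.

```python
import itertools

def step(z):
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    """Two-layer threshold network for XOR (weights set by hand, not learned)."""
    h_or = step(x1 + x2 - 1)          # hidden unit 1: OR
    h_nand = step(-x1 - x2 + 1.5)     # hidden unit 2: NAND
    return step(h_or + h_nand - 2)    # output unit: AND of the hidden units

def nxor_net(x1, x2):
    return 1 - xor_net(x1, x2)

# Search single threshold units step(w1*x1 + w2*x2 + b) over a grid of weights.
grid = [i / 2 for i in range(-4, 5)]
single_unit_solutions = [
    (w1, w2, b) for w1, w2, b in itertools.product(grid, repeat=3)
    if all(step(w1 * a + w2 * c + b) == (a ^ c) for a in (0, 1) for c in (0, 1))
]
# single_unit_solutions is empty: XOR is not linearly separable
```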

Universal Approximation Theorem [Cybenko 1989, Hornik 1991]

• https://en.wikipedia.org/wiki/Universal_approximation_theorem

• Shallow neural networks (a single hidden layer with finitely many units) can approximate any continuous function on a compact subset of ℝⁿ arbitrarily well, given appropriate parameters; however, the theorem says nothing about whether those parameters can be learned algorithmically.

• Proved by George Cybenko in 1989 for sigmoid activation functions [2].

• Kurt Hornik showed in 1991 [3] that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture itself, that gives neural networks the potential of being universal approximators.
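One common intuition for the theorem (an illustration, not a proof) is that a very steep sigmoid behaves like a step, and a weighted sum of steps forms a staircase that can track any continuous function. A minimal sketch under that assumption, approximating sin(2πx) on [0, 1] with 50 hand-placed hidden units; the target function, constants, and names are hypothetical choices.

```python
import math

def sigmoid(z):
    """Numerically stable logistic sigmoid (avoids overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def staircase_net(g, n_hidden=50, steepness=2000.0):
    """Hand-built one-hidden-layer sigmoid net approximating g on [0, 1].

    Hidden unit k is a steep sigmoid centered at (k - 0.5) / n; its output
    weight is the jump g(k/n) - g((k-1)/n); the output bias is g(0).
    """
    n = n_hidden
    centers = [(k - 0.5) / n for k in range(1, n + 1)]
    jumps = [g(k / n) - g((k - 1) / n) for k in range(1, n + 1)]

    def net(x):
        return g(0.0) + sum(v * sigmoid(steepness * (x - c)) for v, c in zip(jumps, centers))

    return net

target = lambda x: math.sin(2 * math.pi * x)
approx = staircase_net(target)
max_err = max(abs(approx(i / 1000) - target(i / 1000)) for i in range(1001))
```

Since sin(2πx) has slope at most 2π, each stair is within about 2π/50 ≈ 0.13 of the target, and the bound shrinks as more (and steeper) hidden units are added.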

Question (5 min): Why is the theorem true? What is the intuition?

What happens when you go deep? Try iterating f(x) = x^2 vs. f(x) = ax+b
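For the hint above, a quick experiment (starting value and coefficients chosen arbitrarily): composing the affine map with itself stays affine, a^n x + b(a^n − 1)/(a − 1), whereas composing x² n times yields x^(2^n), whose degree doubles with every extra layer. The names are hypothetical.

```python
def iterate(f, x, n):
    """Compose f with itself n times and apply to x: f(f(...f(x)...))."""
    for _ in range(n):
        x = f(x)
    return x

def square(x):
    return x * x

def affine(x, a=2.0, b=1.0):
    return a * x + b

# Rows (n, n-fold square of 1.5, n-fold affine map of 1.5)
table = [(n, iterate(square, 1.5, n), iterate(affine, 1.5, n)) for n in range(1, 6)]
```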

Training Neural Networks Via Gradient Descent

Gradient Descent

[Fei-Fei Li, Andrej Karpathy, Justin Johnson]
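A minimal sketch of vanilla gradient descent, x ← x − η∇f(x), assuming the quadratic f(x, y) = (x − 3)² + 2(y + 1)² with its minimum at (3, −1); the objective, the step size η = 0.1, and the names are illustrative choices.

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Repeat x <- x - lr * grad(x) for a fixed number of steps."""
    x = list(x0)
    for _ in range(steps):
        g = grad(x)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x

# Gradient of f(x, y) = (x - 3)^2 + 2 (y + 1)^2
grad_f = lambda v: [2 * (v[0] - 3), 4 * (v[1] + 1)]
x_star = gradient_descent(grad_f, [0.0, 0.0])
```

Each step here shrinks the distance to the minimum by a constant factor (0.8 along x, 0.6 along y), so 100 steps land very close to (3, −1).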

Question: What kind of problems might you run into with Gradient Descent? (4 min)

Global Optimum is not Guaranteed
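A minimal 1-D illustration, assuming the nonconvex objective f(x) = x⁴ − 3x² + x (an arbitrary choice with one local and one global minimum): plain gradient descent settles in whichever basin it starts in. Names and constants are hypothetical.

```python
def f(x):
    return x**4 - 3 * x**2 + x

def df(x):
    return 4 * x**3 - 6 * x + 1

def descend(x, lr=0.01, steps=1000):
    for _ in range(steps):
        x -= lr * df(x)
    return x

x_right = descend(2.0)    # settles in the local minimum near x = 1.13
x_left = descend(-2.0)    # settles in the global minimum near x = -1.30
```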

Learning Rate Needs to Be Carefully Chosen
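On f(x) = x², each gradient step maps x to (1 − 2η)x, so the learning rate alone decides whether descent crawls, converges, oscillates, or diverges. A minimal sketch under that assumption; the rates and the name `run` are illustrative.

```python
def run(lr, x0=1.0, steps=50):
    """Gradient descent on f(x) = x^2; each step multiplies x by (1 - 2 lr)."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

results = {lr: run(lr) for lr in (0.001, 0.1, 0.9, 1.1)}
# 0.001: barely moves; 0.1: converges; 0.9: converges while flipping sign; 1.1: diverges
```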

Training Neural Networks: Computing Gradients Efficiently with the Backpropagation Algorithm

Chain Rule
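A worked instance of the chain rule, assuming the nested function h(x) = sin(x²) (an arbitrary choice): writing u = x² gives dh/dx = cos(u) · 2x, which the sketch checks against a centered finite difference. The names are hypothetical.

```python
import math

def h(x):
    return math.sin(x ** 2)        # h = g(u) with g = sin and u = x^2

def dh_chain_rule(x):
    u = x ** 2
    return math.cos(u) * 2 * x     # dh/dx = g'(u) * du/dx

def centered_difference(fn, x, eps=1e-6):
    return (fn(x + eps) - fn(x - eps)) / (2 * eps)
```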

Exercise: Apply the chain rule to a nested function (2 min)

Backpropagation is an efficient way to compute gradients

• Also known as Reverse Mode Automatic Differentiation (AD); it is based on a systematic application of the chain rule. It is fast for low-dimensional outputs: for one output (e.g. a scalar loss function), the time to compute the gradient with respect to ALL inputs is proportional to the time to compute the output. An explicit mathematical expression for the output is not required, only an algorithm that computes it.

• It is NOT the same as symbolic differentiation (e.g. Mathematica).

• Numerical/Finite Differences (FD) are slow for high-dimensional inputs (e.g. model parameters) and outputs. For a single output, the time to compute the gradient scales with the number of inputs. FD may also suffer from floating-point precision issues and requires choosing a parameter increment (step size).

https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf

[Figure: centered finite difference and its geometrical interpretation as the slope of a secant]
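The contrast above can be made concrete. This sketch implements a tiny reverse-mode AD (each node records its parents and local derivatives, and one backward sweep accumulates d output / d node) and compares it with centered finite differences on an assumed function f(x) = Σ_i sin(x_i x_{i+1}) of 20 inputs. The `Var` class and helper names are hypothetical, not any library's API.

```python
import itertools
import math

_counter = itertools.count()

class Var:
    """Scalar node of a computation graph (minimal reverse-mode AD sketch)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents       # tuple of (parent node, d self / d parent)
        self.grad = 0.0              # accumulates d output / d self
        self.order = next(_counter)  # parents are always created before children

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value, ((self, other.value), (other, self.value)))

def vsin(v):
    return Var(math.sin(v.value), ((v, math.cos(v.value)),))

def backward(output):
    """Reverse sweep: each node passes grad * local derivative to its parents."""
    nodes, stack, seen = [], [output], {id(output)}
    while stack:                     # collect every node reachable from the output
        node = stack.pop()
        nodes.append(node)
        for parent, _ in node.parents:
            if id(parent) not in seen:
                seen.add(id(parent))
                stack.append(parent)
    output.grad = 1.0
    # Newest nodes first, so each node's grad is complete before it is propagated.
    for node in sorted(nodes, key=lambda n: n.order, reverse=True):
        for parent, local in node.parents:
            parent.grad += local * node.grad

def f(xs):
    """Example scalar function of n >= 2 inputs: sum_i sin(x_i * x_{i+1})."""
    total = vsin(xs[0] * xs[1])
    for i in range(1, len(xs) - 1):
        total = total + vsin(xs[i] * xs[i + 1])
    return total

def fd_gradient(xs, eps=1e-6):
    """Centered finite differences: 2 evaluations of f per input."""
    value = lambda pts: f([Var(p) for p in pts]).value
    grad = []
    for i in range(len(xs)):
        up, down = list(xs), list(xs)
        up[i] += eps
        down[i] -= eps
        grad.append((value(up) - value(down)) / (2 * eps))
    return grad

xs = [0.1 * (i + 1) for i in range(20)]
inputs = [Var(x) for x in xs]
backward(f(inputs))                  # one forward + one backward pass
ad_grad = [v.grad for v in inputs]
fd_grad = fd_gradient(xs)            # 2 * 20 = 40 forward passes
```

Both gradients agree to within finite-difference error, but the AD version needs one sweep each way regardless of the number of inputs, while finite differences need two evaluations per input and a choice of `eps`.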

There is no equivalent Cheap Jacobian Principle or Cheap Hessian Principle

The Cheap Gradient Principle in Backpropagation: the time complexity scales with the number of operations performed in the forward pass

https://www.math.uni-bielefeld.de/documenta/vol-ismp/52_griewank-andreas-b.pdf

For a scalar function $f$ of a multidimensional input $x$:

$\mathrm{OPS}\{\nabla f(x)\} \le \omega\, \mathrm{OPS}\{f(x)\}$, with $\omega \sim 5$; for polynomial operations, with OPS counting the number of multiplications, $\omega = 3$

More generally, for an $m$-dimensional output $F(x)$:

$\mathrm{OPS}\{F'(x)\} \le m\, \omega\, \mathrm{OPS}\{F(x)\}$

The Spatial Complexity of Backpropagation scales with the number of operations performed in the forward pass

https://www.math.uni-bielefeld.de/documenta/vol-ismp/52_griewank-andreas-b.pdf

$\mathrm{MEM}\{F'(x)\} \sim \mathrm{OPS}\{F(x)\} \gtrsim \mathrm{MEM}\{F(x)\}$

Temporal Complexity in Automatic Differentiation

There is no Cheap Jacobian Principle or Cheap Hessian Principle, but a Jacobian-vector product can be computed as efficiently as the gradient, and a Hessian-vector product can be computed efficiently in O(n) instead of O(n×n)

https://arxiv.org/pdf/1502.05767.pdf https://www.math.uni-bielefeld.de/documenta/vol-ismp/52_griewank-andreas-b.pdf

$x$ is an $n$-dimensional input, $F(x)$ is an $m$-dimensional output

Reverse Mode: $\mathrm{OPS}\{F'(x)\} \le m\, \omega\, \mathrm{OPS}\{F(x)\}$

Forward Mode: $\mathrm{OPS}\{F'(x)\} \le n\, \omega\, \mathrm{OPS}\{F(x)\}$

$\omega < 6$
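Forward mode can be sketched with dual numbers: seeding each input with a tangent vector v propagates J_F(x)v through the program in a single pass, so the full Jacobian costs one pass per input dimension, matching the n-factor in the forward-mode bound. A minimal sketch, assuming an arbitrary example map F: ℝ² → ℝ³; the `Dual` class and helper names are hypothetical.

```python
import math

class Dual:
    """Dual number val + dot * eps with eps^2 = 0; `dot` carries a tangent."""

    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__

def dsin(x):
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)

def F(x):
    """Example map F: R^2 -> R^3."""
    x1, x2 = x
    return [x1 * x2, dsin(x1), x1 * x1 + 3 * x2]

def jvp(F, x, v):
    """Jacobian-vector product J_F(x) v in ONE forward pass (tangents seeded with v)."""
    outputs = F([Dual(xi, vi) for xi, vi in zip(x, v)])
    return [y.dot for y in outputs]

def jacobian_forward(F, x):
    """Full Jacobian, one forward pass per input dimension (n passes)."""
    n = len(x)
    columns = [jvp(F, x, [1.0 if j == i else 0.0 for j in range(n)]) for i in range(n)]
    return [list(row) for row in zip(*columns)]  # transpose columns into rows
```

Reverse mode is the mirror image: one backward pass yields a vector-Jacobian product, i.e. one row, so the full Jacobian costs m passes.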

How to Learn NNs? History of the Backpropagation Algorithm (1960-86)

• Introduced by Henry J. Kelley (1960) and Arthur Bryson (1961) in control theory, using Dynamic Programming

• Simpler derivation using the Chain Rule by Stuart Dreyfus (1962)

• General method for Automatic Differentiation by Seppo Linnainmaa (1970)

• Using backprop to adjust the parameters of controllers to minimize error, by Stuart Dreyfus (1973)

• Backprop brought into the NN world by Paul Werbos (1974)

• Rumelhart, Hinton & Williams (1986) used it to learn representations in the hidden layers of NNs

Backpropagation Example (5 min)

Modified from https://www.cs.cmu.edu/~mgormley/courses/10601-s17/slides/lecture20-backprop.pdf

[Figure: computation graph; the forward pass carries out the output calculation]

[Figure: the same computation graph; the forward pass (output calculation) and the backward pass (gradient calculation) are linked by the chain rule]

The backward pass efficiently computes the derivative of the single output $J$ with respect to all inputs.

[Figure: computation graph with nodes $x_j$, $\theta_j$, $a$, $y$]

How efficient? The backward pass takes time proportional to the forward pass.

The values of the derivatives are computed at each step. Unlike symbolic differentiation, backprop does not store their mathematical expressions.
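The exact function in the example figure is not recoverable from the text. A minimal sketch, assuming a logistic-regression-style graph over the labeled variables: a = Σ_j θ_j x_j, y = σ(a), and a cross-entropy loss J with a hypothetical target `y_true`. The forward pass stores a and y; the backward pass reuses those stored values rather than any symbolic expression.

```python
import math

def forward(theta, x, y_true):
    """Forward pass, keeping the intermediate values the backward pass needs."""
    a = sum(t * xi for t, xi in zip(theta, x))       # a = sum_j theta_j x_j
    y = 1.0 / (1.0 + math.exp(-a))                   # y = sigmoid(a)
    J = -(y_true * math.log(y) + (1 - y_true) * math.log(1 - y))
    return a, y, J

def backward(theta, x, y_true, a, y):
    """Chain rule from J back to every input, using the stored numeric values."""
    dJ_dy = -(y_true / y) + (1 - y_true) / (1 - y)
    dJ_da = dJ_dy * y * (1 - y)                      # dy/da = y (1 - y)
    dJ_dtheta = [dJ_da * xi for xi in x]             # da/dtheta_j = x_j
    dJ_dx = [dJ_da * t for t in theta]               # da/dx_j = theta_j
    return dJ_dtheta, dJ_dx
```

For this particular pairing of sigmoid and cross-entropy, dJ/da simplifies to y − y_true.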

When is Backpropagation Efficient?

Efficient for high-dimensional inputs, i.e. for computing $\frac{\partial J(\{\theta_i\}, \{x_j\})}{\partial \theta_1}, \frac{\partial J(\{\theta_i\}, \{x_j\})}{\partial \theta_2}, \ldots$?

• Backprop (Reverse Mode AD): YES. Cheap Gradient Principle: time cost ~ one forward pass.

• Finite Differences (FD): NO. Time cost is multiple forward passes: 2 per input.

• Symbolic Differentiation: May not be. The formula for J can grow exponentially in size, a.k.a. expression swell (https://arxiv.org/pdf/1502.05767.pdf).

• Forward Mode AD: NO.

Efficient for high-dimensional outputs, i.e. for computing $\frac{\partial J_1(\{\theta_i\}, \{x_j\})}{\partial \theta}, \frac{\partial J_2(\{\theta_i\}, \{x_j\})}{\partial \theta}, \ldots$?

• Backprop (Reverse Mode AD): NO.

• Finite Differences (FD): May not be.

• Symbolic Differentiation: May not be, unless common subexpressions are leveraged.

• Forward Mode AD: YES.

Pseudo-Code for Backprop: Scalar Form

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
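A minimal sketch of the scalar-form algorithm for a single training example, assuming a 3-layer sigmoid network and the per-example loss J = ½‖h(x) − y‖² from the UFLDL tutorial linked above: output error terms δ_i = −(y_i − a_i) f′(z_i), hidden error terms δ_i = (Σ_j W_ji δ_j) f′(z_i), then ∂J/∂W_ij = δ_i a_j and ∂J/∂b_i = δ_i. The function names are hypothetical.

```python
import math

def f(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Feedforward pass; returns hidden activations a2 and output activations a3."""
    a2 = [f(b1[i] + sum(W1[i][j] * x[j] for j in range(len(x)))) for i in range(len(W1))]
    a3 = [f(b2[i] + sum(W2[i][j] * a2[j] for j in range(len(a2)))) for i in range(len(W2))]
    return a2, a3

def loss(x, y, W1, b1, W2, b2):
    """Per-example cost J = 0.5 * ||h(x) - y||^2."""
    _, a3 = forward(x, W1, b1, W2, b2)
    return 0.5 * sum((a - t) ** 2 for a, t in zip(a3, y))

def backprop_scalar(x, y, W1, b1, W2, b2):
    """Gradients of J with respect to every weight and bias, one scalar at a time."""
    a2, a3 = forward(x, W1, b1, W2, b2)
    # Output layer: delta3_i = -(y_i - a3_i) * f'(z3_i), with f'(z) = a (1 - a) for the sigmoid
    delta3 = [-(y[i] - a3[i]) * a3[i] * (1 - a3[i]) for i in range(len(a3))]
    # Hidden layer: delta2_i = (sum_j W2[j][i] * delta3_j) * f'(z2_i)
    delta2 = [sum(W2[j][i] * delta3[j] for j in range(len(delta3))) * a2[i] * (1 - a2[i])
              for i in range(len(a2))]
    # dJ/dW[i][j] = delta_i (next layer) * a_j (this layer); dJ/db_i = delta_i (next layer)
    gW1 = [[delta2[i] * x[j] for j in range(len(x))] for i in range(len(a2))]
    gW2 = [[delta3[i] * a2[j] for j in range(len(a2))] for i in range(len(a3))]
    return gW1, delta2, gW2, delta3
```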

Pseudo-Code for Backprop: Matrix-Vector Form

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/

Gradient Descent for Neural Networks

http://ufldl.stanford.edu/tutorial/supervised/MultiLayerNeuralNetworks/
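The matrix-vector form and the gradient-descent update can be sketched together. This assumes sigmoid units, the per-example loss ½‖h(x) − y‖², the backward recursion δ^(l) = (W^(l)ᵀ δ^(l+1)) ⊙ f′(z^(l)), and the UFLDL-style batch update W ← W − α[(1/m) Σ ΔW + λW], b ← b − α (1/m) Σ Δb. Lists of lists stand in for matrices so that only the standard library is needed; all names, the XOR toy data, and the hyperparameters are hypothetical.

```python
import math
import random

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-z)) for z in v]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def matTvec(W, v):
    """W^T v, with W stored as a list of rows."""
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(len(W[0]))]

def forward(x, Ws, bs):
    acts = [x]
    for W, b in zip(Ws, bs):
        acts.append(sigmoid([z + bi for z, bi in zip(matvec(W, acts[-1]), b)]))
    return acts

def backprop(x, y, Ws, bs):
    """Per-example gradients of 0.5 * ||h(x) - y||^2 for every layer."""
    acts = forward(x, Ws, bs)
    delta = [-(yi - ai) * ai * (1 - ai) for yi, ai in zip(y, acts[-1])]
    gWs, gbs = [None] * len(Ws), [None] * len(Ws)
    for l in range(len(Ws) - 1, -1, -1):
        gWs[l] = [[d * a for a in acts[l]] for d in delta]  # outer product delta a^T
        gbs[l] = list(delta)
        if l > 0:  # delta^(l) = (W^(l)^T delta^(l+1)) .* f'(z^(l))
            delta = [g * a * (1 - a) for g, a in zip(matTvec(Ws[l], delta), acts[l])]
    return gWs, gbs

def cost(data, Ws, bs, lam):
    m = len(data)
    err = sum(0.5 * sum((a - t) ** 2 for a, t in zip(forward(x, Ws, bs)[-1], y)) for x, y in data) / m
    return err + 0.5 * lam * sum(w * w for W in Ws for row in W for w in row)

def train(data, sizes, alpha=0.5, lam=1e-4, iters=2000, seed=0):
    """Batch gradient descent with weight decay on the weights (not the biases)."""
    rng = random.Random(seed)
    Ws = [[[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_out)]
          for n_in, n_out in zip(sizes, sizes[1:])]
    bs = [[0.0] * n_out for n_out in sizes[1:]]
    m = len(data)
    for _ in range(iters):
        sum_gW = [[[0.0] * len(row) for row in W] for W in Ws]
        sum_gb = [[0.0] * len(b) for b in bs]
        for x, y in data:
            gWs, gbs = backprop(x, y, Ws, bs)
            for l in range(len(Ws)):
                for i, row in enumerate(gWs[l]):
                    sum_gb[l][i] += gbs[l][i]
                    for j, g in enumerate(row):
                        sum_gW[l][i][j] += g
        for l in range(len(Ws)):
            for i in range(len(Ws[l])):
                bs[l][i] -= alpha * sum_gb[l][i] / m
                for j in range(len(Ws[l][i])):
                    Ws[l][i][j] -= alpha * (sum_gW[l][i][j] / m + lam * Ws[l][i][j])
    return Ws, bs
```

Calling `train(data, [2, 3, 1])` on the four XOR examples runs 2000 full-batch steps; with `iters=0` it simply returns the (seeded) initial parameters, which is handy for comparing the cost before and after training.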

Backpropagation: Network View

Another Deeper Example (for practice)

Example

[Fei-Fei Li, Andrej Karpathy, Justin Johnson]
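The example's step-by-step figures are not recoverable here. As a hypothetical deeper example in the same gate-by-gate style, this sketch backpropagates through a 2-input sigmoid neuron f(w, x) = 1/(1 + e^{−(w₀x₀ + w₁x₁ + w₂)}) with arbitrarily chosen values; all variable names are illustrative.

```python
import math

# Hypothetical weights and inputs
w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass, one gate at a time
p0 = w0 * x0            # -2.0
p1 = w1 * x1            #  6.0
s = p0 + p1 + w2        #  1.0
e = math.exp(-s)        # ~0.368
d = 1.0 + e             # ~1.368
out = 1.0 / d           # ~0.731

# Backward pass: local gradient of each gate times the upstream gradient
g_out = 1.0
g_d = -1.0 / d**2 * g_out        # d(1/d)/dd = -1/d^2
g_e = 1.0 * g_d                  # d(1 + e)/de = 1
g_s = -math.exp(-s) * g_e        # d(exp(-s))/ds = -exp(-s); here g_s ~0.197
g_w0, g_x0 = x0 * g_s, w0 * g_s  # multiply gate: each input gets the other input times g_s
g_w1, g_x1 = x1 * g_s, w1 * g_s
g_w2 = g_s                       # add gate passes the gradient through unchanged
```

The product of the last three local gradients collapses to σ(s)(1 − σ(s)), so the whole sigmoid could also be treated as a single gate with that local gradient.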

Question: What problems might you encounter with deeply nested functions? (3 min)

Visualizing Backprop during Training: Classification with a 2-Layer Neural Network

• http://cs.stanford.edu/people/karpathy/convnetjs/demo/classify2d.html

• Try playing around with this app to build intuition:

• change datapoints to see how decision boundaries change

• change network layer types, widths, activation functions, etc.

• try shallower vs deeper