CS 4803 / 7643: Deep Learning


Dhruv Batra

Georgia Tech

Topics:
– Automatic Differentiation
  – (Finish) Forward mode vs Reverse mode AD
  – Patterns in backprop

Administrivia
• HW1 Reminder
  – Due: 09/09, 11:59pm
• HW2 out on 9/10
• Schedule: https://www.cc.gatech.edu/classes/AY2021/cs7643_fall/

• Project discussion next class

(C) Dhruv Batra 2

Recap from last time

(C) Dhruv Batra 3

How do we compute gradients?
• Analytic or "Manual" Differentiation
• Symbolic Differentiation
• Numerical Differentiation
• Automatic Differentiation
  – Forward mode AD
  – Reverse mode AD
    • aka "backprop"

(C) Dhruv Batra 4
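To make the contrast concrete, here is a minimal sketch (illustrative, not from the slides) that checks a hand-derived analytic gradient against numerical differentiation by central differences:

    import numpy as np

    def f(x):
        # Example scalar function of a vector: f(x) = sum(x^2)
        return np.sum(x ** 2)

    def analytic_grad(x):
        # Hand-derived gradient of f: df/dx = 2x
        return 2 * x

    def numerical_grad(f, x, eps=1e-6):
        # Central finite differences: one pair of evaluations per input dimension
        grad = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            grad[i] = (f(x + e) - f(x - e)) / (2 * eps)
        return grad

    x = np.random.randn(5)
    print(np.allclose(analytic_grad(x), numerical_grad(f, x)))  # True, up to numerical error

Numerical differentiation costs two function evaluations per input dimension, which is why it is used as a correctness check rather than for training.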

Vector/Matrix Derivatives Notation

(C) Dhruv Batra 5

Vector/Matrix Derivatives Notation

(C) Dhruv Batra 6

Vector Derivative Example

(C) Dhruv Batra 7
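The worked example itself lives in the slide figure; a representative example of this kind of vector derivative (illustrative, not necessarily the slide's), with gradients written as column vectors:

  $\frac{\partial}{\partial x}\,(w^\top x) = w$, and $\frac{\partial}{\partial x}\,(x^\top A x) = (A + A^\top)\,x$, which reduces to $2Ax$ when $A$ is symmetric.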

Extension to Tensors

(C) Dhruv Batra 8

Chain Rule: Composite Functions

(C) Dhruv Batra 9

Chain Rule: Scalar Case

(C) Dhruv Batra 10

Chain Rule: Vector Case

(C) Dhruv Batra 11

Chain Rule: Jacobian view

(C) Dhruv Batra 12
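The chain-rule slides themselves are figures; for reference, the standard statements they build on are the scalar case and the vector (Jacobian) case:

  Scalar: if $z = f(y)$ and $y = g(x)$, then $\frac{dz}{dx} = \frac{dz}{dy}\cdot\frac{dy}{dx}$.

  Vector: if $x \in \mathbb{R}^n$, $y = g(x) \in \mathbb{R}^m$, and $z = f(y) \in \mathbb{R}^k$, then $\frac{\partial z}{\partial x} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x}$, with $\frac{\partial z}{\partial y} \in \mathbb{R}^{k \times m}$ and $\frac{\partial y}{\partial x} \in \mathbb{R}^{m \times n}$,

so the composite Jacobian is the $k \times n$ matrix product of the two local Jacobians.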

Chain Rule: Graphical view

(C) Dhruv Batra 13

Chain Rule: Cascaded

(C) Dhruv Batra 14

Deep Learning = Differentiable Programming

• Computation = Graph
  – Input = Data + Parameters
  – Output = Loss
  – Scheduling = Topological ordering
• Auto-Diff
  – A family of algorithms for implementing chain-rule on computation graphs

(C) Dhruv Batra 15

Directed Acyclic Graphs (DAGs)
• Exactly what the name suggests
  – Directed edges
  – No (directed) cycles
  – Underlying undirected cycles okay

(C) Dhruv Batra 16

Directed Acyclic Graphs (DAGs)
• Concept
  – Topological Ordering

(C) Dhruv Batra 17
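Since scheduling a computation graph means visiting nodes in a topological order, here is a minimal sketch of topological ordering by depth-first search (the dictionary graph representation and the node names are illustrative, not from the slides):

    def topological_order(graph):
        # graph: dict mapping node -> list of child nodes (directed edges), assumed acyclic
        order, visited = [], set()

        def visit(node):
            if node in visited:
                return
            visited.add(node)
            for child in graph.get(node, []):
                visit(child)
            order.append(node)  # append a node only after all of its descendants

        for node in graph:
            visit(node)
        return order[::-1]  # reversed post-order is a topological order

    # Illustrative graph: x1, x2 feed a multiply; x1 also feeds a sin; both feed an add
    g = {"x1": ["mul", "sin"], "x2": ["mul"], "mul": ["add"], "sin": ["add"], "add": []}
    print(topological_order(g))  # inputs come first, 'add' comes last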

Computational Graphs
• Notation

(C) Dhruv Batra 18

Deep Learning = Differentiable Programming

• Computation = Graph
  – Input = Data + Parameters
  – Output = Loss
  – Scheduling = Topological ordering
• Auto-Diff
  – A family of algorithms for implementing chain-rule on computation graphs

(C) Dhruv Batra 19

20


Forward mode AD

21


Reverse mode AD

Plan for Today
• Automatic Differentiation
  – (Finish) Forward mode vs Reverse mode AD
  – Backprop
  – Patterns in backprop

(C) Dhruv Batra 22

Example: Forward mode AD

(C) Dhruv Batra 23

[Figure: computation graph with inputs x1 and x2 and nodes *, sin( ), and +]


Example: Forward mode AD

Q: What happens if there’s another input variable x3?

(C) Dhruv Batra 28


Example: Forward mode AD

Q: What happens if there's another input variable x3?
A: more sophisticated graph; d+1 "passes" for d+1 input variables

(C) Dhruv Batra 29


Example: Forward mode AD

Q: What happens if there’s another output variable f2?

(C) Dhruv Batra 30


Example: Forward mode AD

Q: What happens if there's another output variable f2?
A: more sophisticated graph; still d "passes" for d input variables (forward mode pays per input, not per output)
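A minimal sketch of forward mode AD using (value, derivative) pairs; the expression is illustrative (the slide's exact function is only in the figure), but the mechanics are the point: one forward pass propagates the derivative with respect to one chosen input alongside the values.

    import math

    # Each intermediate quantity carries (value, d_value / d_chosen_input)
    def add(a, b):
        return (a[0] + b[0], a[1] + b[1])

    def mul(a, b):
        return (a[0] * b[0], a[0] * b[1] + a[1] * b[0])

    def sin(a):
        return (math.sin(a[0]), math.cos(a[0]) * a[1])

    def f_and_dfdx1(x1, x2):
        # Seed the pass: d x1/d x1 = 1, d x2/d x1 = 0
        v1, v2 = (x1, 1.0), (x2, 0.0)
        return add(mul(v1, v2), sin(v1))   # illustrative expression: f = x1*x2 + sin(x1)

    val, dfdx1 = f_and_dfdx1(2.0, 3.0)
    print(val, dfdx1)   # df/dx1 = x2 + cos(x1) = 3 + cos(2)

Getting df/dx2 as well would need a second pass with the seed on x2 instead, which is exactly the "one pass per input variable" cost noted above.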

Example: Reverse mode AD

(C) Dhruv Batra 31

[Figure: the same computation graph (inputs x1, x2; nodes *, sin( ), +), now annotated for the backward pass]



Gradients add at branches

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Duality in Fprop and Bprop

(C) Dhruv Batra 34

[Figure: FPROP/BPROP duality: a SUM node in the forward pass becomes a COPY in the backward pass, and a COPY (fan-out) in the forward pass becomes a SUM of gradients in the backward pass]


(C) Dhruv Batra 36

Example: Reverse mode AD


Q: What happens if there’s another input variable x3?

(C) Dhruv Batra 37

Example: Reverse mode AD


Q: What happens if there's another input variable x3?
A: more sophisticated graph; single "backward prop"

(C) Dhruv Batra 38

Example: Reverse mode AD


Q: What happens if there’s another output variable f2?

(C) Dhruv Batra 39

Example: Reverse mode AD


Q: What happens if there's another output variable f2?
A: more sophisticated graph; c "backward props" for c output variables
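For contrast, a minimal reverse mode sketch on the same illustrative expression: one forward pass stores the intermediates, then a single backward pass produces the gradient with respect to every input at once.

    import math

    def f_and_grad(x1, x2):
        # Forward pass: compute and remember intermediates (illustrative f = x1*x2 + sin(x1))
        a = x1 * x2
        b = math.sin(x1)
        f = a + b

        # Backward pass: walk the graph in reverse topological order
        df_df = 1.0
        df_da = df_df * 1.0                           # + distributes the gradient
        df_db = df_df * 1.0
        df_dx1 = df_da * x2 + df_db * math.cos(x1)    # both uses of x1 contribute, gradients add
        df_dx2 = df_da * x1                           # * passes the other operand's value
        return f, (df_dx1, df_dx2)

    print(f_and_grad(2.0, 3.0))   # gradient = (x2 + cos(x1), x1)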

Forward Pass vs Forward mode AD vs Reverse Mode AD

(C) Dhruv Batra 40

[Figure: the same computation graph drawn three times, annotated for the plain forward pass, forward mode AD, and reverse mode AD]

Forward mode vs Reverse Mode
• What are the differences?

(C) Dhruv Batra 41


Forward mode vs Reverse Mode
• What are the differences?

• Which one is faster to compute? – Forward or backward?

(C) Dhruv Batra 42

Forward mode vs Reverse Mode
• x → Graph → L
• Intuition of Jacobian

(C) Dhruv Batra 43
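Written out, the "x → Graph → L" picture is the standard Jacobian-shape argument (not transcribed from the slide, but it is the point being made). For $x \in \mathbb{R}^n$ mapped through an intermediate $y \in \mathbb{R}^m$ to a scalar loss $L$,

  $\frac{\partial L}{\partial x} = \frac{\partial L}{\partial y}\,\frac{\partial y}{\partial x}$, with $\frac{\partial L}{\partial y} \in \mathbb{R}^{1 \times m}$ and $\frac{\partial y}{\partial x} \in \mathbb{R}^{m \times n}$.

Forward mode computes Jacobian-vector products, recovering one column (one input direction) per pass, so the full gradient with respect to all n inputs takes n passes; reverse mode computes vector-Jacobian products, recovering one row (one output) per pass, so a scalar loss needs only a single backward pass. With one scalar loss and millions of parameters, reverse mode is the cheap one, at the price of storing the forward intermediates.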

Forward mode vs Reverse Mode
• What are the differences?

• Which one is faster to compute? – Forward or backward?

• Which one is more memory efficient (less storage)? – Forward or backward?

(C) Dhruv Batra 44

Plan for Today
• Automatic Differentiation
  – (Finish) Forward mode vs Reverse mode AD
  – Backprop
  – Patterns in backprop

(C) Dhruv Batra 45

(C) Dhruv Batra 46 | Figure Credit: Andrea Vedaldi

Neural Network Computation Graph

(C) Dhruv Batra 47 | Figure Credit: Andrea Vedaldi

Backprop

Any DAG of differentiable modules is allowed!

Slide Credit: Marc'Aurelio Ranzato | (C) Dhruv Batra 48

Computational Graph

Key Computation: Forward-Prop

(C) Dhruv Batra 49 | Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Key Computation: Back-Prop

(C) Dhruv Batra 50 | Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
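To make the per-module picture concrete, here is a minimal sketch (the Linear class and all names are illustrative, not the course's code): forward maps inputs to outputs and caches what the backward pass will need; backward takes dL/d(output) and returns dL/d(input), while also filling in the parameter gradients.

    import numpy as np

    class Linear:
        def __init__(self, in_dim, out_dim):
            self.W = 0.01 * np.random.randn(out_dim, in_dim)
            self.b = np.zeros(out_dim)

        def forward(self, x):
            self.x = x                      # cache the input for the backward pass
            return self.W @ x + self.b

        def backward(self, dL_dout):
            # Gradients w.r.t. the parameters ...
            self.dW = np.outer(dL_dout, self.x)
            self.db = dL_dout
            # ... and w.r.t. the input, to pass further upstream
            return self.W.T @ dL_dout

    layer = Linear(3, 2)
    out = layer.forward(np.ones(3))
    din = layer.backward(np.ones(2))   # shape (3,), ready for the previous module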

Neural Network Training
• Step 1: Compute Loss on mini-batch [F-Pass]

(C) Dhruv Batra 51 | Slide Credit: Marc'Aurelio Ranzato, Yann LeCun


Neural Network Training
• Step 1: Compute Loss on mini-batch [F-Pass]
• Step 2: Compute gradients wrt parameters [B-Pass]

(C) Dhruv Batra 54 | Slide Credit: Marc'Aurelio Ranzato, Yann LeCun


Neural Network Training
• Step 1: Compute Loss on mini-batch [F-Pass]
• Step 2: Compute gradients wrt parameters [B-Pass]
• Step 3: Use gradient to update parameters

(C) Dhruv Batra 57 | Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
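Put together, the three steps look like the following minimal sketch (a toy linear model on synthetic data; every name and number here is illustrative, not from the course code):

    import numpy as np

    # Toy data and a single linear model, just to make the three steps concrete
    X = np.random.randn(100, 3)
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * np.random.randn(100)
    w, lr, batch_size = np.zeros(3), 0.1, 10

    for step in range(100):
        idx = np.random.choice(len(X), batch_size, replace=False)
        xb, yb = X[idx], y[idx]

        # Step 1: compute the loss on the mini-batch (forward pass)
        pred = xb @ w
        loss = np.mean((pred - yb) ** 2)

        # Step 2: compute gradients w.r.t. the parameters (backward pass)
        dw = 2 * xb.T @ (pred - yb) / batch_size

        # Step 3: use the gradient to update the parameters (SGD step)
        w -= lr * dw

    print(w)   # approaches the true weights [1, -2, 0.5]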

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Backpropagation: a simple example

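The worked numbers live in the slide figure; a representative example of the same kind (the values here are chosen for illustration) is f(x, y, z) = (x + y) * z:

    # Forward pass
    x, y, z = -2.0, 5.0, -4.0
    q = x + y          # q = 3
    f = q * z          # f = -12

    # Backward pass, applying the chain rule one node at a time
    df_df = 1.0
    df_dq = z * df_df          # mul gate: gradient takes the other operand's value -> -4
    df_dz = q * df_df          #                                                   ->  3
    df_dx = 1.0 * df_dq        # add gate: distributes the gradient unchanged -> -4
    df_dy = 1.0 * df_dq        #                                              -> -4
    print(df_dx, df_dy, df_dz)   # -4.0 -4.0 3.0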

Patterns in backward flow

Q: What is an add gate?

add gate: gradient distributor

Q: What is a max gate?

max gate: gradient router

Q: What is a mul gate?

mul gate: gradient switcher

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
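These three patterns are easy to verify with any autodiff library; a small sketch assuming PyTorch is available (any autodiff tool would do):

    import torch

    x = torch.tensor([2.0, 3.0], requires_grad=True)
    y = torch.tensor([5.0, 1.0], requires_grad=True)

    # add gate: distributes the upstream gradient unchanged to both inputs
    (x + y).sum().backward()
    print(x.grad, y.grad)        # both all ones

    x.grad = None; y.grad = None
    # max gate: routes the gradient only to the larger input
    torch.maximum(x, y).sum().backward()
    print(x.grad, y.grad)        # tensor([0., 1.]) and tensor([1., 0.])

    x.grad = None; y.grad = None
    # mul gate: switches the gradient, scaling by the *other* operand
    (x * y).sum().backward()
    print(x.grad, y.grad)        # y's values land in x.grad and vice versa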


Gradients add at branches

Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n

Duality in Fprop and Bprop

(C) Dhruv Batra 68

[Figure: FPROP/BPROP duality: a SUM node in the forward pass becomes a COPY in the backward pass, and a COPY (fan-out) in the forward pass becomes a SUM of gradients in the backward pass]