Lecture 8: Deep Learning Software
CS231n, Stanford University (cs231n.stanford.edu/slides/2017/cs231n_2017_lecture8.pdf)
Fei-Fei Li, Justin Johnson, and Serena Yeung. April 27, 2017.

Transcript of Lecture 8: Deep Learning Software


Administrative

- Project proposals were due Tuesday
- We are assigning TAs to projects, stay tuned
- We are grading A1
- A2 is due Thursday 5/4
- Remember to stop your instances when not in use
- Only use GPU instances for the last notebook


Last time

- Optimization: SGD+Momentum, Nesterov, RMSProp, Adam
- Regularization: Dropout (add noise, then marginalize out)
- Transfer Learning: [figure: a VGG-style network (Conv-64 ... Conv-512, FC-4096, FC-C); freeze the early layers, reinitialize the final FC layer, and train it on the new task]


Today

- CPU vs GPU
- Deep Learning Frameworks:
  - Caffe / Caffe2
  - Theano / TensorFlow
  - Torch / PyTorch


CPU vs GPU


My computer



Spot the CPU! (central processing unit)



Spot the GPUs! (graphics processing unit)



NVIDIA vs AMD



CPU vs GPU

                            # Cores                              Clock Speed  Memory              Price
CPU (Intel Core i7-7700k)   4 (8 threads with hyperthreading)    4.4 GHz      Shared with system  $339
CPU (Intel Core i7-6950X)   10 (20 threads with hyperthreading)  3.5 GHz      Shared with system  $1723
GPU (NVIDIA Titan Xp)       3840                                 1.6 GHz      12 GB GDDR5X        $1200
GPU (NVIDIA GTX 1070)       1920                                 1.68 GHz     8 GB GDDR5          $399

CPU: Fewer cores, but each core is much faster and much more capable; great at sequential tasks.

GPU: More cores, but each core is much slower and "dumber"; great for parallel tasks.


Example: Matrix Multiplication

[figure: an A x B matrix times a B x C matrix = an A x C matrix; each of the A x C output elements is an independent dot product, so they can all be computed in parallel]


Programming GPUs

- CUDA (NVIDIA only)
  - Write C-like code that runs directly on the GPU
  - Higher-level APIs: cuBLAS, cuFFT, cuDNN, etc.
- OpenCL
  - Similar to CUDA, but runs on anything
  - Usually slower :(
- Udacity: Intro to Parallel Programming (https://www.udacity.com/course/cs344)
  - For deep learning, just use existing libraries


CPU vs GPU in practice

Data from https://github.com/jcjohnson/cnn-benchmarks

(CPU performance not well-optimized, so the comparison is a little unfair)

[chart: GPU speedups over CPU of 66x, 67x, 71x, 64x, and 76x across five CNN architectures]


cuDNN much faster than "unoptimized" CUDA

[chart: cuDNN speedups of 2.8x, 3.0x, 3.1x, 3.4x, and 2.8x over unoptimized CUDA across five CNN architectures]


CPU / GPU Communication

[diagram: the model lives in GPU memory, while the data lives on a big hard drive]


If you aren’t careful, training can bottleneck on reading data and transferring to GPU!

Solutions:
- Read all data into RAM
- Use SSD instead of HDD
- Use multiple CPU threads to prefetch data


Deep Learning Frameworks


Last year ...

Caffe (UC Berkeley)

Torch (NYU / Facebook)

Theano (U Montreal)

TensorFlow (Google)


This year ...

Caffe (UC Berkeley)

Torch (NYU / Facebook)

Theano (U Montreal)

TensorFlow (Google)

Caffe2 (Facebook)

PyTorch (Facebook)

CNTK (Microsoft)

Paddle (Baidu)

MXNet (Amazon): developed by U Washington, CMU, MIT, Hong Kong U, etc., but the main framework of choice at AWS

And others...


Today

Mostly these:
- TensorFlow (Google)
- PyTorch (Facebook)

A bit about these:
- Caffe (UC Berkeley) / Caffe2 (Facebook)
- Torch (NYU / Facebook)
- Theano (U Montreal)


Recall: Computational Graphs

[graph: inputs x and W feed a * node producing scores s; s feeds a hinge loss; the hinge loss and a regularization term R(W) feed a + node producing the total loss L]


[figure: AlexNet drawn as a computational graph from input image and weights to loss. Figure copyright Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton, 2012; reproduced with permission.]


[figure: a much larger computational graph from input image to loss; reproduced with permission from a Twitter post by Andrej Karpathy]


The point of deep learning frameworks

(1) Easily build big computational graphs
(2) Easily compute gradients in computational graphs
(3) Run it all efficiently on GPU (wrap cuDNN, cuBLAS, etc.)


Computational Graphs: Numpy

[graph: inputs x, y, z; a = x * y; b = a + z; c = sum(b)]
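The code on this slide is not preserved in the transcript; here is a minimal numpy sketch of this graph, with the gradients computed by hand (the sizes N and D are arbitrary):

import numpy as np

np.random.seed(0)
N, D = 3, 4

x = np.random.randn(N, D)
y = np.random.randn(N, D)
z = np.random.randn(N, D)

# Forward pass
a = x * y
b = a + z
c = np.sum(b)

# Backward pass: compute our own gradients by hand
grad_c = 1.0
grad_b = grad_c * np.ones((N, D))
grad_a = grad_b.copy()
grad_z = grad_b.copy()
grad_x = grad_a * y
grad_y = grad_a * x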



Problems:
- Can't run on GPU
- Have to compute our own gradients


Computational Graphs: TensorFlow

The same graph in TensorFlow, in four steps (see the sketch after this list):


- Create forward computational graph


- Ask TensorFlow to compute gradients


- Tell TensorFlow to run on CPU


- Tell TensorFlow to run on GPU

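A minimal sketch of the same graph, assuming the 2017-era TensorFlow 1.x API (sizes N and D are arbitrary; swap the device string to move between CPU and GPU):

import numpy as np
import tensorflow as tf

N, D = 3, 4

with tf.device('/cpu:0'):   # or '/gpu:0' to run on GPU
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    z = tf.placeholder(tf.float32)

    # Create forward computational graph (no computation happens yet)
    a = x * y
    b = a + z
    c = tf.reduce_sum(b)

# Ask TensorFlow to compute gradients
grad_x, grad_y, grad_z = tf.gradients(c, [x, y, z])

with tf.Session() as sess:
    values = {
        x: np.random.randn(N, D),
        y: np.random.randn(N, D),
        z: np.random.randn(N, D),
    }
    out = sess.run([c, grad_x, grad_y, grad_z], feed_dict=values)
    c_val, grad_x_val, grad_y_val, grad_z_val = out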

Computational Graphs: PyTorch

The same graph in PyTorch, in four steps (see the sketch after this list):


- Define Variables to start building a computational graph


- Forward pass looks just like numpy


- Calling c.backward() computes all gradients


- Run on GPU by casting to .cuda()

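A minimal sketch, assuming the 2017-era PyTorch API in which autograd went through Variable (later PyTorch merged Variable into Tensor):

import torch
from torch.autograd import Variable

N, D = 3, 4

# Define Variables to start building a computational graph;
# cast with .cuda() (e.g. torch.randn(N, D).cuda()) to run on GPU
x = Variable(torch.randn(N, D), requires_grad=True)
y = Variable(torch.randn(N, D), requires_grad=True)
z = Variable(torch.randn(N, D), requires_grad=True)

# Forward pass looks just like numpy
a = x * y
b = a + z
c = torch.sum(b)

# Calling c.backward() computes all gradients
c.backward()
print(x.grad.data)
print(y.grad.data)
print(z.grad.data)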

[side by side: the same graph written in Numpy, TensorFlow, and PyTorch]


TensorFlow (more detail)


TensorFlow: Neural Net

Running example: Train a two-layer ReLU network on random data with L2 loss


(Assume imports at the top of each snippet.)


First define the computational graph; then run the graph many times.


Create placeholders for input x, weights w1 and w2, and targets y


Forward pass: compute prediction for y and loss (L2 distance between y and y_pred)

No computation happens here - just building the graph!


Tell TensorFlow to compute the gradient of the loss with respect to w1 and w2.

Again no computation here - just building the graph


Now we're done building our graph, so we enter a session to actually run the graph.


Create numpy arrays that will fill in the placeholders above


Run the graph: feed in the numpy arrays for x, y, w1, and w2; get numpy arrays for loss, grad_w1, and grad_w2
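Putting those steps together, a minimal sketch assuming the 2017-era TensorFlow 1.x API (the sizes N, D, H are hypothetical):

import numpy as np
import tensorflow as tf

N, D, H = 64, 1000, 100

# Placeholders for input x, targets y, and weights w1, w2
x = tf.placeholder(tf.float32, shape=(N, D))
y = tf.placeholder(tf.float32, shape=(N, D))
w1 = tf.placeholder(tf.float32, shape=(D, H))
w2 = tf.placeholder(tf.float32, shape=(H, D))

# Forward pass: prediction and L2 loss (just building the graph)
h = tf.maximum(tf.matmul(x, w1), 0)   # ReLU
y_pred = tf.matmul(h, w2)
diff = y_pred - y
loss = tf.reduce_mean(tf.reduce_sum(diff ** 2, axis=1))

# Gradient of loss with respect to w1 and w2 (still just building)
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

with tf.Session() as sess:
    # Numpy arrays that fill in the placeholders
    values = {
        x: np.random.randn(N, D),
        w1: np.random.randn(D, H),
        w2: np.random.randn(H, D),
        y: np.random.randn(N, D),
    }
    # Run the graph: numpy arrays in, numpy arrays out
    loss_val, grad_w1_val, grad_w2_val = sess.run(
        [loss, grad_w1, grad_w2], feed_dict=values)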


Train the network: Run the graph over and over, use gradient to update weights
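Continuing the sketch above, the training loop just runs the graph repeatedly and applies the gradient step on the numpy side:

learning_rate = 1e-5
for t in range(50):
    loss_val, grad_w1_val, grad_w2_val = sess.run(
        [loss, grad_w1, grad_w2], feed_dict=values)
    # Gradient descent step in numpy; the weights are re-fed each call
    values[w1] -= learning_rate * grad_w1_val
    values[w2] -= learning_rate * grad_w2_val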


Problem: copying weights between CPU / GPU each step


Change w1 and w2 from placeholder (fed on each call) to Variable (persists in the graph between calls)


Add assign operations to update w1 and w2 as part of the graph!


Run graph once to initialize w1 and w2

Run many times to train


Problem: loss not going down! Assign calls not actually being executed!


Add dummy graph node that depends on updates

Tell graph to compute dummy node

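A minimal sketch of the fixed version, reusing x, y, and the sizes from the sketch above; tf.group is the dummy node that forces the assign ops to execute:

# Variables persist in the graph between calls
w1 = tf.Variable(tf.random_normal((D, H)))
w2 = tf.Variable(tf.random_normal((H, D)))

h = tf.maximum(tf.matmul(x, w1), 0)
y_pred = tf.matmul(h, w2)
loss = tf.reduce_mean(tf.reduce_sum((y_pred - y) ** 2, axis=1))
grad_w1, grad_w2 = tf.gradients(loss, [w1, w2])

# Assign operations update w1 and w2 as part of the graph
learning_rate = 1e-5
new_w1 = w1.assign(w1 - learning_rate * grad_w1)
new_w2 = w2.assign(w2 - learning_rate * grad_w2)
updates = tf.group(new_w1, new_w2)   # dummy node depending on the updates

with tf.Session() as sess:
    # Run the graph once to initialize w1 and w2
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    # Run many times to train; fetching `updates` executes the assigns
    for t in range(50):
        loss_val, _ = sess.run([loss, updates], feed_dict=values)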

TensorFlow: Optimizer

Can use an optimizer to compute gradients and update weights

Remember to execute the output of the optimizer!
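A minimal sketch, reusing the loss and Variables from the sketch above:

optimizer = tf.train.GradientDescentOptimizer(1e-5)
updates = optimizer.minimize(loss)   # adds gradient and update ops to the graph

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        # Remember to fetch `updates`, the output of the optimizer!
        loss_val, _ = sess.run([loss, updates], feed_dict=values)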


TensorFlow: Loss

Use predefined common losses
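For example, the hand-written L2 loss above can be replaced with a predefined loss from TF 1.x's tf.losses module:

loss = tf.losses.mean_squared_error(labels=y, predictions=y_pred)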


TensorFlow: Layers

Use Xavier initializer

tf.layers automatically sets up the weights (and biases) for us!
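A minimal sketch, assuming TF 1.x's tf.layers and tf.contrib.layers:

init = tf.contrib.layers.xavier_initializer()
h = tf.layers.dense(inputs=x, units=H,
                    activation=tf.nn.relu, kernel_initializer=init)
y_pred = tf.layers.dense(inputs=h, units=D, kernel_initializer=init)
loss = tf.losses.mean_squared_error(labels=y, predictions=y_pred)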


Keras: High-Level Wrapper

Keras is a layer on top of TensorFlow that makes common things easy to do. (It also supports a Theano backend.) Four steps (see the sketch after this list):


- Define model object as a sequence of layers


- Define optimizer object


- Build the model, specify the loss function


- Train the model with a single line!
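A minimal sketch, assuming the 2017-era Keras 1.x API (newer Keras renames some arguments, e.g. nb_epoch became epochs):

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.optimizers import SGD

N, D, H = 64, 1000, 100

# Define model object as a sequence of layers
model = Sequential()
model.add(Dense(H, input_dim=D))
model.add(Activation('relu'))
model.add(Dense(D))

# Define optimizer object; build the model and specify the loss
model.compile(loss='mean_squared_error', optimizer=SGD(lr=1e0))

# Train the model with a single line!
x, y = np.random.randn(N, D), np.random.randn(N, D)
history = model.fit(x, y, nb_epoch=50, batch_size=N, verbose=0)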


TensorFlow: Other High-Level Wrappers

- Keras (https://keras.io/)
- TFLearn (http://tflearn.org/)
- TensorLayer (http://tensorlayer.readthedocs.io/en/latest/)
- tf.layers (https://www.tensorflow.org/api_docs/python/tf/layers) [ships with TensorFlow]
- TF-Slim (https://github.com/tensorflow/models/tree/master/inception/inception/slim) [ships with TensorFlow]
- tf.contrib.learn (https://www.tensorflow.org/get_started/tflearn) [ships with TensorFlow]
- Pretty Tensor (https://github.com/google/prettytensor) [from Google]
- Sonnet (https://github.com/deepmind/sonnet) [from DeepMind]



TensorFlow: Pretrained Models

TF-Slim: (https://github.com/tensorflow/models/tree/master/slim/nets)

Keras: (https://github.com/fchollet/deep-learning-models)

[figure: the transfer learning setup from earlier: freeze the pretrained conv layers, reinitialize the final FC layer, and train it]


TensorFlow: Tensorboard

Add logging to your code to record loss, stats, etc. Run the Tensorboard server and get pretty graphs!
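A minimal logging sketch, assuming TF 1.x's tf.summary API and reusing the training graph from earlier (the log directory name is hypothetical):

tf.summary.scalar('loss', loss)   # record the loss each step
merged = tf.summary.merge_all()

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    writer = tf.summary.FileWriter('./logs', sess.graph)
    values = {x: np.random.randn(N, D), y: np.random.randn(N, D)}
    for t in range(50):
        summary, _ = sess.run([merged, updates], feed_dict=values)
        writer.add_summary(summary, t)
# Then run: tensorboard --logdir=./logs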


TensorFlow: Distributed Version

https://www.tensorflow.org/deploy/distributed

Split one graph over multiple machines!


Side Note: Theano

TensorFlow is similar in many ways to Theano (an earlier framework from Montreal). The same running example, step by step (see the sketch after this list):


- Define symbolic variables (similar to TensorFlow placeholders)


- Forward pass: compute predictions and loss (no computation performed yet)



- Ask Theano to compute gradients for us (no computation performed yet)


- Compile a function that computes loss, scores, and gradients from data and weights


- Run the compiled function many times to train the network
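A minimal sketch, assuming the 2017-era Theano API (hypothetical sizes):

import numpy as np
import theano
import theano.tensor as T

N, D, H = 64, 1000, 100

# Symbolic variables (similar to TensorFlow placeholders)
x = T.matrix('x')
y = T.matrix('y')
w1 = T.matrix('w1')
w2 = T.matrix('w2')

# Forward pass: predictions and loss (no computation performed yet)
a = x.dot(w1)
a_relu = T.maximum(a, 0)
y_pred = a_relu.dot(w2)
loss = T.sum((y_pred - y) ** 2)

# Ask Theano to compute gradients (still no computation)
grad_w1, grad_w2 = T.grad(loss, [w1, w2])

# Compile a function from data and weights to loss and gradients
f = theano.function(inputs=[x, y, w1, w2],
                    outputs=[loss, grad_w1, grad_w2])

# Run the compiled function many times to train
xx, yy = np.random.randn(N, D), np.random.randn(N, D)
ww1, ww2 = np.random.randn(D, H), np.random.randn(H, D)
learning_rate = 1e-5
for t in range(50):
    loss_val, grad_w1_val, grad_w2_val = f(xx, yy, ww1, ww2)
    ww1 -= learning_rate * grad_w1_val
    ww2 -= learning_rate * grad_w2_val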


PyTorch (Facebook)


PyTorch: Three Levels of Abstraction

- Tensor: imperative ndarray, but runs on GPU
- Variable: node in a computational graph; stores data and gradient
- Module: a neural network layer; may store state or learnable weights

TensorFlow equivalents:
- Tensor ~ Numpy array
- Variable ~ Tensor, Variable, Placeholder
- Module ~ tf.layers, or TFSlim, or TFLearn, or Sonnet, or ...


PyTorch: Tensors

PyTorch Tensors are just like numpy arrays, but they can run on GPU. There is no built-in notion of a computational graph, gradients, or deep learning. Here we fit a two-layer net using PyTorch Tensors (see the sketch after this list):


- Create random tensors for data and weights


- Forward pass: compute predictions and loss


- Backward pass: manually compute gradients


- Gradient descent step on weights


- To run on GPU, just cast the tensors to a cuda datatype!
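A minimal sketch, assuming the 2017-era PyTorch API (hypothetical sizes):

import torch

dtype = torch.FloatTensor   # CPU; use torch.cuda.FloatTensor to run on GPU
N, D_in, H, D_out = 64, 1000, 100, 10

# Random tensors for data and weights
x = torch.randn(N, D_in).type(dtype)
y = torch.randn(N, D_out).type(dtype)
w1 = torch.randn(D_in, H).type(dtype)
w2 = torch.randn(H, D_out).type(dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: compute predictions and loss
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Backward pass: manually compute gradients
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Gradient descent step on weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2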


PyTorch: Autograd

A PyTorch Variable is a node in a computational graph:
- x.data is a Tensor
- x.grad is a Variable of gradients (same shape as x.data)
- x.grad.data is a Tensor of gradients


PyTorch Tensors and Variables have the same API!

Variables remember how they were created (for backprop)


We don't want gradients of the loss with respect to the data, but we do want gradients with respect to the weights.


Forward pass looks exactly the same as the Tensor version, but everything is a Variable now.


Compute the gradient of loss with respect to w1 and w2 (zero out the grads first).


Make gradient step on weights
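A minimal sketch, again assuming the 2017-era Variable API:

import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 10

# No gradients wanted for data, gradients wanted for weights
x = Variable(torch.randn(N, D_in), requires_grad=False)
y = Variable(torch.randn(N, D_out), requires_grad=False)
w1 = Variable(torch.randn(D_in, H), requires_grad=True)
w2 = Variable(torch.randn(H, D_out), requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward pass: exactly like the Tensor version, but on Variables
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    # Zero out the grads, then backprop through the graph
    if w1.grad is not None:
        w1.grad.data.zero_()
        w2.grad.data.zero_()
    loss.backward()

    # Make gradient step on weights
    w1.data -= learning_rate * w1.grad.data
    w2.data -= learning_rate * w2.grad.data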


PyTorch: New Autograd Functions

Define your own autograd functions by writing forward and backward for Tensors

(similar to modular layers in A2)


Can use our new autograd function in the forward pass
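A minimal ReLU example, assuming the 2017-era instance-method style (newer PyTorch defines forward and backward as @staticmethods instead):

import torch
from torch.autograd import Variable

# Define your own autograd function by writing forward and
# backward for Tensors
class ReLU(torch.autograd.Function):
    def forward(self, x):
        self.save_for_backward(x)
        return x.clamp(min=0)

    def backward(self, grad_y):
        x, = self.saved_tensors
        grad_input = grad_y.clone()
        grad_input[x < 0] = 0
        return grad_input

# Use the new autograd function in the forward pass
x = Variable(torch.randn(64, 1000), requires_grad=True)
y = ReLU()(x)   # instantiate the Function, then call it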


PyTorch: nn

Higher-level wrapper for working with neural nets. Similar to Keras and friends... but there's only one, and it's good =) Step by step (see the sketch after this list):


- Define our model as a sequence of layers
- nn also defines common loss functions


- Forward pass: feed data to the model, and the prediction to the loss function


- Backward pass: compute all gradients


- Make gradient step on each model parameter
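A minimal sketch with the 2017-era nn API:

import torch
from torch.autograd import Variable

N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Define our model as a sequence of layers
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out))
# nn also defines common loss functions
loss_fn = torch.nn.MSELoss(size_average=False)

learning_rate = 1e-4
for t in range(500):
    # Forward pass: data to model, prediction to loss function
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    # Backward pass: compute all gradients
    model.zero_grad()
    loss.backward()

    # Make gradient step on each model parameter
    for param in model.parameters():
        param.data -= learning_rate * param.grad.data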


PyTorch: optim

Use an optimizer for different update rules


Update all parameters after computing gradients
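Continuing the nn sketch above, with an Adam optimizer handling the update rule:

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for t in range(500):
    y_pred = model(x)
    loss = loss_fn(y_pred, y)

    optimizer.zero_grad()
    loss.backward()
    # Update all parameters after computing gradients
    optimizer.step()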


PyTorch: nn, Defining new Modules

A PyTorch Module is a neural net layer; it inputs and outputs Variables. Modules can contain weights (as Variables) or other Modules. You can define your own Modules using autograd! Step by step (see the sketch after this list):


- Define our whole model as a single Module


- The initializer sets up two children (Modules can contain modules)


- Define the forward pass using child modules and autograd ops on Variables; no need to define backward, autograd will handle it


- Construct and train an instance of our model
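A minimal sketch:

import torch
from torch.autograd import Variable

# Our whole model as a single Module
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super(TwoLayerNet, self).__init__()
        # Initializer sets up two children
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)

    def forward(self, x):
        # Child modules plus autograd ops; no backward needed
        h_relu = self.linear1(x).clamp(min=0)
        return self.linear2(h_relu)

N, D_in, H, D_out = 64, 1000, 100, 10
x = Variable(torch.randn(N, D_in))
y = Variable(torch.randn(N, D_out), requires_grad=False)

# Construct and train an instance of our model
model = TwoLayerNet(D_in, H, D_out)
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    loss = criterion(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()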


PyTorch: DataLoaders

A DataLoader wraps a Dataset and provides minibatching, shuffling, and multithreading for you. When you need to load custom data, just write your own Dataset class.


Iterate over the loader to form minibatches. The loader gives Tensors, so you need to wrap them in Variables.
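A minimal sketch using the built-in TensorDataset (2017-era API):

import torch
from torch.autograd import Variable
from torch.utils.data import TensorDataset, DataLoader

N, D_in, D_out = 64, 1000, 10
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)

# The DataLoader handles minibatching, shuffling, multithreading
loader = DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

model = torch.nn.Linear(D_in, D_out)
loss_fn = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)

for epoch in range(10):
    for x_batch, y_batch in loader:
        # The loader gives Tensors, so wrap them in Variables
        x_var, y_var = Variable(x_batch), Variable(y_batch)
        loss = loss_fn(model(x_var), y_var)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()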


PyTorch: Pretrained Models

Super easy to use pretrained models with torchvision: https://github.com/pytorch/vision
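For example (torchvision downloads the weights the first time you ask for them):

import torchvision.models as models

alexnet = models.alexnet(pretrained=True)
vgg16 = models.vgg16(pretrained=True)
resnet101 = models.resnet101(pretrained=True)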


PyTorch: Visdom

Somewhat similar to TensorBoard: add logging to your code, then visualize it in a browser.

Can't visualize computational graph structure (yet?).

https://github.com/facebookresearch/visdom


Aside: Torch

Direct ancestor of PyTorch (they share a lot of C backend).

Written in Lua, not Python

PyTorch has 3 levels of abstraction: Tensor, Variable, and Module

Torch only has 2: Tensor, Module

More details: Check 2016 slides


The Torch workflow:
- Build a model as a sequence of layers, and a loss function


- Define a callback that inputs weights and produces the loss and the gradient on the weights


- Forward: compute scores and loss


- Backward: compute gradient (no autograd, need to pass grad_scores around)


- Pass the callback to the optimizer over and over


Torch vs PyTorch

Torch: (-) Lua; (-) no autograd; (+) more stable; (+) lots of existing code; (0) fast

PyTorch: (+) Python; (+) autograd; (-) newer, still changing; (-) less existing code; (0) fast


Conclusion: Probably use PyTorch for new projects


Static vs Dynamic Graphs

TensorFlow: build the graph once, then run it many times (static).

PyTorch: each forward pass defines a new graph (dynamic).


Static vs Dynamic: Optimization

With static graphs, the framework can optimize the graph for you before it runs!

The graph you wrote: Conv, ReLU, Conv, ReLU, Conv, ReLU
Equivalent graph with fused operations: Conv+ReLU, Conv+ReLU, Conv+ReLU


Static vs Dynamic: Serialization

Static: once the graph is built, you can serialize it and run it without the code that built the graph!

Dynamic: graph building and execution are intertwined, so you always need to keep the code around.


Static vs Dynamic: Conditional

y = w1 * x if z > 0, else w2 * x


PyTorch: just normal Python control flow


TensorFlow: requires a special TF control flow operator!
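A minimal sketch of both, assuming the 2017-era APIs; tf.cond is TF's graph-level conditional, and all the sizes and values here are hypothetical:

import tensorflow as tf
import torch
from torch.autograd import Variable

# PyTorch: a plain Python if/else, since the graph is built on the fly
x = Variable(torch.randn(1))
w1 = Variable(torch.randn(1))
w2 = Variable(torch.randn(1))
z = 0.5
if z > 0:
    y = w1 * x
else:
    y = w2 * x

# TensorFlow: the conditional must live inside the graph
x_t = tf.placeholder(tf.float32)
w1_t = tf.placeholder(tf.float32)
w2_t = tf.placeholder(tf.float32)
z_t = tf.placeholder(tf.float32)
y_t = tf.cond(z_t > 0, lambda: w1_t * x_t, lambda: w2_t * x_t)
with tf.Session() as sess:
    y_val = sess.run(y_t, feed_dict={x_t: 1.0, w1_t: 2.0,
                                     w2_t: 3.0, z_t: 0.5})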


Static vs Dynamic: Loops

y_t = (y_{t-1} + x_t) * w

[graph: the recurrence unrolled over x1, x2, x3, starting from y0]


PyTorch: just a normal Python loop


TensorFlow: requires special TF control flow
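A minimal sketch of both; tf.foldl is one of TF's graph-level loop constructs, and the sizes here are arbitrary (2017-era APIs):

import numpy as np
import tensorflow as tf
import torch
from torch.autograd import Variable

# PyTorch: an ordinary Python loop; the unrolled graph can have a
# different length every iteration
xs = [Variable(torch.randn(1)) for _ in range(3)]   # x1, x2, x3
w = Variable(torch.randn(1))
y = Variable(torch.zeros(1))                        # y0
for x_t in xs:
    y = (y + x_t) * w

# TensorFlow: the loop must live inside the graph
xs_t = tf.placeholder(tf.float32, shape=(None,))
w_t = tf.placeholder(tf.float32)
y_t = tf.foldl(lambda y, x_t: (y + x_t) * w_t, xs_t,
               initializer=tf.constant(0.0))
with tf.Session() as sess:
    y_val = sess.run(y_t, feed_dict={xs_t: np.array([1.0, 2.0, 3.0],
                                                    dtype=np.float32),
                                     w_t: 0.5})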


Dynamic Graphs in TensorFlow

Looks et al, "Deep Learning with Dynamic Computation Graphs", ICLR 2017. https://github.com/tensorflow/fold

TensorFlow Fold makes dynamic graphs easier in TensorFlow through dynamic batching.


Dynamic Graph Applications

- Recurrent networks (e.g. image captioning; Karpathy and Fei-Fei, "Deep Visual-Semantic Alignments for Generating Image Descriptions", CVPR 2015; figure copyright IEEE, 2015, reproduced for educational purposes)

- Recursive networks (e.g. a parse-tree-shaped graph over "The cat ate a big rat")

- Modular networks (e.g. "What color is the cat?" runs find[cat] then query[color], answering "white"; Andreas et al, "Neural Module Networks", CVPR 2016; Andreas et al, "Learning to Compose Neural Networks for Question Answering", NAACL 2016)

- Modular networks compose differently per question (e.g. "Are there more cats than dogs?" runs find[cat] and find[dog], counts each, then compares, answering "no")
- (Your creative idea here)

Caffe (UC Berkeley)


Caffe Overview

- Core written in C++
- Has Python and MATLAB bindings
- Good for training or finetuning feedforward classification models
- Often no need to write code!
- Not used as much in research anymore, but still popular for deploying models


Caffe: Training / Finetuning

No need to write code!
1. Convert data (run a script)
2. Define net (edit prototxt)
3. Define solver (edit prototxt)
4. Train (with pretrained weights) (run a script)


Caffe step 1: Convert Data

- DataLayer reading from LMDB is the easiest
- Create LMDB using convert_imageset
- Need a text file where each line is "[path/to/image.jpeg] [label]"
- Or create an HDF5 file yourself using h5py


● ImageDataLayer: read from image files
● WindowDataLayer: for detection
● HDF5Layer: read from an HDF5 file
● From memory, using the Python interface
● All of these are harder to use (except Python)

Page 140:

Caffe step 2: Define Network (prototxt)

Page 141:

Caffe step 2: Define Network (prototxt)

https://github.com/KaimingHe/deep-residual-networks/blob/master/prototxt/ResNet-152-deploy.prototxt

● .prototxt can get ugly for big models
● The ResNet-152 prototxt is 6775 lines long!
● Not "compositional"; can't easily define a residual block and reuse it
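One common workaround, not shown on the slide, is to generate the prototxt from Python with pycaffe's NetSpec, so a repeated block becomes an ordinary function. A sketch, assuming the standard caffe.net_spec API (the block structure, layer names, and sizes are made up):

    import caffe
    from caffe import layers as L

    def conv_relu(bottom, nout):
        # one reusable "block" instead of a copy-pasted prototxt stanza
        conv = L.Convolution(bottom, kernel_size=3, pad=1, num_output=nout)
        return conv, L.ReLU(conv, in_place=True)

    n = caffe.NetSpec()
    n.data = L.Input(shape=dict(dim=[1, 3, 224, 224]))
    top = n.data
    for i, nout in enumerate([64, 64, 128]):
        conv, relu = conv_relu(top, nout)
        setattr(n, 'conv%d' % i, conv)
        setattr(n, 'relu%d' % i, relu)
        top = relu

    with open('net.prototxt', 'w') as f:
        f.write(str(n.to_proto()))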

Page 142:

Caffe step 3: Define Solver (prototxt)

● Write a prototxt file defining a SolverParameter
● If finetuning, copy an existing solver.prototxt file (a sketch follows below):
  ○ Change net to be your net
  ○ Change snapshot_prefix to your output
  ○ Reduce the base learning rate (divide by 100)
  ○ Maybe change max_iter and snapshot
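A minimal sketch of a finetuning solver.prototxt; all field values here are illustrative, not taken from the slide:

    net: "path/to/trainval.prototxt"
    base_lr: 0.0001              # pretrained value divided by 100
    lr_policy: "step"
    gamma: 0.1
    stepsize: 20000
    max_iter: 10000              # fewer iterations when finetuning
    snapshot: 2000
    snapshot_prefix: "path/to/output/my_model"
    solver_mode: GPU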

Page 143:

Caffe step 4: Train!

./build/tools/caffe train \
    -gpu 0 \
    -model path/to/trainval.prototxt \
    -solver path/to/solver.prototxt \
    -weights path/to/pretrained_weights.caffemodel

https://github.com/BVLC/caffe/blob/master/tools/caffe.cpp

Page 144:


For the same train command: use -gpu -1 for CPU-only mode, or -gpu all for multi-GPU training.


Page 145:

Caffe Model Zoo

AlexNet, VGG, GoogLeNet, ResNet, plus others

https://github.com/BVLC/caffe/wiki/Model-Zoo
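Weights for a zoo model are typically fetched with the download helper script in the Caffe repo; for example (the model directory name is an assumption):

    python scripts/download_model_binary.py models/bvlc_googlenet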

Page 146:

Caffe Python Interface

Not much documentation… Read the code! Two most important files:
● caffe/python/caffe/_caffe.cpp:
  ○ Exports Blob, Layer, Net, and Solver classes
● caffe/python/caffe/pycaffe.py:
  ○ Adds extra methods to the Net class

Page 147:

Caffe Python Interface

Good for:
● Interfacing with numpy
● Extracting features: run the net forward
● Computing gradients: run the net backward (DeepDream, etc.)
● Defining layers in Python with numpy (CPU only)
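A minimal sketch tying these together; the file names are placeholders and the blob names assume a standard classification net:

    import numpy as np
    import caffe

    caffe.set_mode_gpu()
    net = caffe.Net('deploy.prototxt', 'weights.caffemodel', caffe.TEST)

    # Extract features: run the net forward
    net.blobs['data'].data[...] = np.random.randn(*net.blobs['data'].data.shape)
    net.forward()
    features = net.blobs['fc7'].data.copy()

    # Compute gradients: push a signal into a blob and run backward
    net.blobs['fc7'].diff[...] = features    # amplify current activations
    net.backward(start='fc7')                # (DeepDream-style)
    image_grad = net.blobs['data'].diff      # gradient w.r.t. the input image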

Page 148:

Caffe Pros / Cons

● (+) Good for feedforward networks
● (+) Good for finetuning existing networks
● (+) Train models without writing any code!
● (+) Python interface is pretty useful!
● (+) Can deploy without Python
● (-) Need to write C++ / CUDA for new GPU layers
● (-) Not good for recurrent networks
● (-) Cumbersome for big networks (GoogLeNet, ResNet)

Page 149:

Caffe2 (Facebook)

Page 150:

Caffe2 Overview

● Very new - released a week ago =)
● Static graphs, somewhat similar to TensorFlow
● Core written in C++
● Nice Python interface
● Can train a model in Python, then serialize it and deploy without Python
● Works on iOS / Android, etc.
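For a taste of the Python interface, a minimal sketch that builds and runs one operator through the Caffe2 workspace (the blob and operator names are illustrative, assuming the caffe2.python API):

    import numpy as np
    from caffe2.python import core, workspace

    workspace.FeedBlob('X', np.random.randn(4, 10).astype(np.float32))
    op = core.CreateOperator('Relu', ['X'], ['Y'])   # a node in the static graph
    workspace.RunOperatorOnce(op)
    print(workspace.FetchBlob('Y').shape)            # (4, 10)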

Page 151:

Google: TensorFlow, "One framework to rule them all", spanning both research and production
Facebook: PyTorch for research + Caffe2 for production

Page 152:

My Advice:
- TensorFlow is a safe bet for most projects. Not perfect, but it has a huge community and wide usage. Maybe pair it with a high-level wrapper (Keras, Sonnet, etc.).
- I think PyTorch is best for research. However, it is still new, and there can be rough patches.
- Use TensorFlow if you need one graph over many machines.
- Consider Caffe, Caffe2, or TensorFlow for production deployment.
- Consider TensorFlow or Caffe2 for mobile.

Page 153:

Next Time: CNN Architecture Case Studies


Pages 156-159:

Caffe step 2: Define Network (prototxt)
Layers and Blobs often have the same name!

The prototxt on these slides is annotated to point out:
- Learning rates (weight + bias)
- Regularization (weight + bias)
- Number of output classes
- Set these to 0 to freeze a layer
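A hedged reconstruction of the kind of layer stanza the slides annotate; the layer names and values are illustrative, not copied from the slides:

    layer {
      name: "fc8"                           # layer and its top blob share a name
      type: "InnerProduct"
      bottom: "fc7"
      top: "fc8"
      param { lr_mult: 1  decay_mult: 1 }   # weight learning rate / regularization
      param { lr_mult: 2  decay_mult: 0 }   # bias learning rate / regularization
      inner_product_param {
        num_output: 1000                    # number of output classes
      }
    }
    # Setting both lr_mult values to 0 freezes this layer during finetuning.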