Page 1: NEURAL NETS FOR VISION

CVPR 2012 Tutorial on Deep Learning, Part III

Marc'Aurelio Ranzato - [email protected]
www.cs.toronto.edu/~ranzato

Page 2

Building an Object Recognition System

[Figure: input image → FEATURE EXTRACTOR → CLASSIFIER → “CAR”]

IDEA: Use data to optimize features for the given task.

Page 3

Building an Object Recognition System

[Figure: input image → f(X; Θ) (trainable feature extractor) → CLASSIFIER → “CAR”]

What we want: use a parameterized function f(X; Θ) such that a) features are computed efficiently, and b) features can be trained efficiently.

Page 4

Building an Object Recognition System

[Figure: input image → END-TO-END RECOGNITION SYSTEM → “CAR”]

– Everything becomes adaptive.
– No distinction between feature extractor and classifier.
– Big non-linear system trained from raw pixels to labels.

Page 5

Building an Object Recognition System

[Figure: input image → END-TO-END RECOGNITION SYSTEM → “CAR”]

Q: How can we build such a highly non-linear system?


A: By combining simple building blocks we can make more and more complex systems.

Page 6

Building A Complicated Function

Simple Functions: sin(x), cos(x), log(x), exp(x), x³

One Example of a Complicated Function: log(cos(exp(sin³(x))))

Page 7

Building A Complicated Function

Simple Functions: sin(x), cos(x), log(x), exp(x), x³

One Example of a Complicated Function: log(cos(exp(sin³(x))))

– Function composition is at the core of deep learning methods.
– Each “simple function” will have parameters subject to training.

Page 8

Implementing A Complicated Function

Complicated Function: log(cos(exp(sin³(x))))

[Figure: block diagram implementing the composition: x → sin(x) → (·)³ → exp(·) → cos(·) → log(·)]
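
As a concrete illustration, a one-line MATLAB/Octave sketch of this composition (not from the slides; the input value is arbitrary):

    % Compose simple functions into the complicated function shown above.
    f = @(x) log(cos(exp(sin(x).^3)));   % log(cos(exp(sin^3(x))))
    disp(f(0.5));                        % evaluate the composition at x = 0.5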

Page 9

Intuition Behind Deep Neural Nets

[Figure: a cascade of trainable black-box modules mapping raw pixels to the label “CAR”]

Page 10

Intuition Behind Deep Neural Nets


NOTE: Each black box can have trainable parameters. Their composition makes a highly non-linear system.

Page 11

Intuition Behind Deep Neural Nets


NOTE: The system produces a hierarchy of features (intermediate representations).

Page 12

Intuition Behind Deep Neural Nets

[Figure: learned feature hierarchy, from low-level edges to object parts to whole objects (“CAR”)]

Lee et al. “Convolutional DBN's for scalable unsup. learning...” ICML 2009

Page 15

KEY IDEAS OF NEURAL NETS

IDEA #1: Learn features from data.
IDEA #2: Use differentiable functions that produce features efficiently.
IDEA #3: End-to-end learning: no distinction between feature extractor and classifier.
IDEA #4: “Deep” architectures: a cascade of simpler non-linear modules.

Page 16

KEY QUESTIONS

- What is the input-output mapping?

- How are parameters trained?

- How computationally expensive is it?

- How well does it work?

Page 17

Outline

- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods:
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets

Page 19

Linear Classifier: SVM

Input: X ∈ R^D
Binary label: y ∈ {−1, +1}
Parameters: W ∈ R^D
Output prediction: W^T X
Loss: L = (1/2)||W||² + max[0, 1 − W^T X y]

[Figure: the hinge loss L as a function of W^T X y; zero beyond the margin at 1]

Page 20

Linear Classifier: Logistic Regression

Input: X ∈ R^D
Binary label: y ∈ {−1, +1}
Parameters: W ∈ R^D
Output prediction: W^T X
Loss: L = (1/2)||W||² + log(1 + exp(−W^T X y))

[Figure: the log loss L as a function of W^T X y]

Page 21

Logistic Regression: Probabilistic Interpretation

Input: X ∈ R^D; binary label: y; parameters: W ∈ R^D

Output prediction: p(y = 1 | X) = 1 / (1 + e^(−W^T X))
Loss: L = −log p(y | X)

Q: What is the gradient of L w.r.t. W?

Page 22

Logistic Regression: Probabilistic Interpretation

With p(y = 1 | X) = 1 / (1 + e^(−W^T X)) and y ∈ {−1, +1}, the loss becomes:

L = log(1 + exp(−W^T X y))

Q: What is the gradient of L w.r.t. W?

Page 23

Simple Functions: sin(x), cos(x), log(x), exp(x), x³

Complicated Function: −log( 1 / (1 + e^(−W^T X)) )

Page 24

Logistic Regression: Computing Loss

Complicated Function: −log( 1 / (1 + e^(−W^T X)) )

Decomposed into a chain of simple modules:
X → [u = W^T X] → [p = 1 / (1 + e^(−u))] → [L = −log p]

Page 25

Chain Rule

[Figure: x → module → y]

Given y(x) and dL/dy, what is dL/dx?

dL/dx = (dL/dy) · (dy/dx)

All needed information is local!
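
A tiny numeric sanity check of this rule in MATLAB/Octave (a sketch; the choices y = x² and L = sin(y) are arbitrary, not from the slides):

    % Chain rule check: L = sin(y), y = x^2, so dL/dx = cos(y) * 2x.
    x = 0.7;
    y = x^2;
    dLdx = cos(y) * 2 * x;                              % chain rule
    delta = 1e-6;                                       % finite-difference check
    disp([dLdx, (sin((x + delta)^2) - sin(x^2)) / delta]);  % the two should match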

Page 28

Logistic Regression: Computing Gradients

X → [u = W^T X] → [p = 1 / (1 + e^(−u))] → [L = −log p]

dL/dp = −1/p
dp/du = p(1 − p)
du/dW = X

dL/dW = (dL/dp) · (dp/du) · (du/dW) = (p − 1) X
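
A minimal MATLAB/Octave sketch of this computation, with a finite-difference check on one coordinate (the random data and the implicit target y = 1 are assumptions for the illustration):

    X = randn(5, 1);  W = randn(5, 1);       % one input, implicit target y = 1
    u = W' * X;                              % u = W'X
    p = 1 / (1 + exp(-u));                   % p = 1 / (1 + e^(-u))
    L = -log(p);                             % loss
    dLdW = (p - 1) * X;                      % analytical gradient from above
    delta = 1e-6;  Wp = W;  Wp(1) = Wp(1) + delta;
    Lp = -log(1 / (1 + exp(-Wp' * X)));      % perturbed loss
    disp([dLdW(1), (Lp - L) / delta]);       % the two numbers should match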

Page 31

What Did We Learn?

- Logistic Regression
- How to compute gradients of complicated functions

Page 32

[Figure: stacking Logistic Regression modules on top of each other yields a Neural Network]

Page 33

Neural Network

– A neural net can be thought of as a stack of logistic regression classifiers: each layer's input is the output of the previous layer.

[Figure: three stacked Logistic Regression modules]

NOTE: intermediate units can be thought of as linear classifiers trained with implicit target values.

Page 34

Key Computations: F-Prop / B-Prop

F-PROP: given the input X and the parameters Θ, compute the output Z.  [Figure: X → module(Θ) → Z]

Page 35

Key Computations: F-Prop / B-Prop

B-PROP: given ∂L/∂Z and the local Jacobians {∂Z/∂X, ∂Z/∂Θ}, compute ∂L/∂X and ∂L/∂Θ.

Page 36

Neural Net: Training

[Figure: a three-layer network (Layer 1 → Layer 2 → Layer 3); F-PROP runs forward through the layers, B-PROP runs backward]

A) Compute the loss on a small mini-batch (F-PROP).
B) Compute the gradient w.r.t. the parameters (B-PROP).
C) Use the gradient to update the parameters: Θ ← Θ − η · dL/dΘ

Page 44

NEURAL NET: ARCHITECTURE

Linear layer: h^{j+1} = (W^{j+1})^T h^j + b^{j+1}

with h^j ∈ R^M, h^{j+1} ∈ R^N, W^{j+1} ∈ R^{M×N}, b^{j+1} ∈ R^N

Page 45

NEURAL NET: ARCHITECTURE

Each layer applies a point-wise non-linearity σ: h^{j+1} = σ( (W^{j+1})^T h^j + b^{j+1} )

Sigmoid: σ(x) = 1 / (1 + e^{−x})   [Figure: the sigmoid curve]
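
A one-layer sketch in MATLAB/Octave (the dimensions and random values are arbitrary illustrative choices):

    M = 6;  N = 4;
    W = randn(M, N);  b = randn(N, 1);        % W in R^(MxN), b in R^N
    hj = randn(M, 1);                         % h^j in R^M
    hj1 = 1 ./ (1 + exp(-(W' * hj + b)));     % h^(j+1) = sigma(W' h^j + b)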

Page 46

NEURAL NET: ARCHITECTURE

Alternative non-linearity: σ(x) = tanh(x)   [Figure: the tanh curve]

Page 47

Graphical Notation

f(X; W) is equivalent to the diagram: X → [W] → h

h_k is called a feature, hidden unit, neuron, or code unit.

Page 48

MOST COMMON ARCHITECTURE

[Figure: feed-forward net: X → layer → layer → layer → ŷ; an Error module compares ŷ with the target y]

NOTE: Multi-layer neural nets with more than two layers are nowadays called deep nets!!

Page 49

NOTE: The user must specify the number of layers, the number of hidden units, the type of layers, and the loss function.

Page 50

MOST COMMON LOSSES

Squared Euclidean distance (regression), with ŷ, y ∈ R^N:

L = (1/2) Σ_{i=1..N} (ŷ_i − y_i)²

Page 51

MOST COMMON LOSSES

Cross entropy (classification), with ŷ, y ∈ [0, 1]^N, Σ_i y_i = 1, Σ_i ŷ_i = 1:

L = −Σ_{i=1..N} y_i log(ŷ_i)
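
A minimal MATLAB/Octave sketch of this loss for a single example (the scores and the one-hot target below are arbitrary illustrative values):

    a = randn(10, 1);                  % pre-softmax scores
    yhat = exp(a) / sum(exp(a));       % softmax: entries in [0,1], summing to 1
    y = zeros(10, 1);  y(3) = 1;       % one-hot target
    L = -sum(y .* log(yhat));          % cross-entropy loss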

Page 52

NEURAL NETS FACTS

1: The user specifies the loss based on the task.
2: Any optimization algorithm can be chosen for training.
3: The costs of F-Prop and B-Prop are similar, and proportional to the number of layers and their size.

Page 53

Toy Code: Neural Net Trainer (MATLAB; h{0} denotes the input mini-batch)

% F-PROP
for i = 1 : nr_layers - 2
    [h{i}, jac{i}] = logistic(W{i} * h{i-1} + b{i});
end
h{nr_layers-1} = W{nr_layers-1} * h{nr_layers-2} + b{nr_layers-1};
prediction = softmax(h{nr_layers-1});

% CROSS ENTROPY LOSS
loss = - sum(sum(log(prediction) .* target));

% B-PROP
dh{nr_layers-1} = prediction - target;
for i = nr_layers - 1 : -1 : 1
    Wgrad{i} = dh{i} * h{i-1}';
    bgrad{i} = sum(dh{i}, 2);
    if i > 1
        dh{i-1} = (W{i}' * dh{i}) .* jac{i-1};   % no gradient needed past the input
    end
end

% UPDATE
for i = 1 : nr_layers - 1
    W{i} = W{i} - (lr / batch_size) * Wgrad{i};
    b{i} = b{i} - (lr / batch_size) * bgrad{i};
end
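
The snippet above calls a logistic() helper that the slide leaves undefined; a plausible definition, assuming it returns both the activation and its derivative (the jac consumed by B-PROP):

    function [h, jac] = logistic(a)
        h = 1 ./ (1 + exp(-a));   % element-wise sigmoid
        jac = h .* (1 - h);       % its derivative, used during back-prop
    end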

Page 54

TOY EXAMPLE: SYNTHETIC DATA

1 input & 1 output; 100 hidden units in each layer

[Figure: network fit on 1-D synthetic data]

Page 55

TOY EXAMPLE: SYNTHETIC DATA

1 input & 1 output; 3 hidden layers

[Figure: network fit on 1-D synthetic data]

Page 56

TOY EXAMPLE: SYNTHETIC DATA

1 input & 1 output; 3 hidden layers, 1000 hidden units each; regression of a cosine

[Figure: network prediction overlaid on the cosine target]

Page 58

Outline

- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods:
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets

Page 59

FULLY CONNECTED NEURAL NET

Example: 1000x1000 image, 1M hidden units → 10^12 parameters!!!

- Spatial correlation is local
- Better to put resources elsewhere!

Page 60

LOCALLY CONNECTED NEURAL NET

Example: 1000x1000 image, 1M hidden units, filter size 10x10 → 100M parameters

Page 62

LOCALLY CONNECTED NEURAL NET

STATIONARITY? Statistics are similar at different locations.

Page 63

CONVOLUTIONAL NET

Share the same parameters across different locations: Convolutions with learned kernels
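
A minimal MATLAB/Octave sketch of such a convolutional layer (the random image and kernel are placeholders, purely illustrative):

    img = rand(1000, 1000);           % input image
    K = randn(10, 10);                % one learned 10x10 kernel (random here)
    fmap = conv2(img, K, 'valid');    % the same kernel applied at every location
    h = 1 ./ (1 + exp(-fmap));        % point-wise non-linearity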

Page 64

CONVOLUTIONAL NET

Learn multiple filters.

E.g.: 1000x1000 image, 100 filters, filter size 10x10 → 10K parameters

Page 65

NEURAL NETS FOR VISION

A standard neural net applied to images:
- scales quadratically with the size of the input
- does not leverage stationarity

Solution:
- connect each hidden unit to a small patch of the input
- share the weights across hidden units
This is called a convolutional network.

LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998

Page 66

CONVOLUTIONAL NET

Let us assume the filter is an “eye” detector.

Q.: How can we make the detection robust to the exact location of the eye?

Page 67

CONVOLUTIONAL NET

By “pooling” (e.g., max or average) filter responses at different locations, we gain robustness to the exact spatial location of features.
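
A minimal MATLAB/Octave sketch of 2x2 max pooling (the toy feature map and sizes are arbitrary):

    fmap = rand(8, 8);                           % toy feature map
    pooled = zeros(4, 4);
    for r = 1:4
        for c = 1:4
            block = fmap(2*r-1:2*r, 2*c-1:2*c);  % 2x2 neighborhood
            pooled(r, c) = max(block(:));        % keep the strongest response
        end
    end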

Page 68

CONV NETS: EXTENSIONS

Over the years, some new modules have proven to be very effective when plugged into conv-nets:

- L2 Pooling:  h_{i+1}(x, y) = sqrt( Σ_{(j,k) ∈ N(x,y)} h_i(j, k)² )

- Local Contrast Normalization:  h_{i+1}(x, y) = ( h_i(x, y) − m_{i,N(x,y)} ) / σ_{i,N(x,y)}

(N(x,y): a neighborhood in layer i; m and σ: its mean and standard deviation)

Jarrett et al. “What is the best multi-stage architecture for object recognition?” ICCV 2009

Page 69

CONV NETS: L2 POOLING

h_{i+1} = sqrt( Σ_{j=1..5} h_i(j)² )

Example: an input of [−1 0 0 0 0] and a shifted input of [0 0 0 0 +1] both pool to +1: the output is invariant to where the active unit falls inside the pooling window.

Kavukcuoglu et al. “Learning invariant features ...” CVPR 2009
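
The same invariance in two lines of MATLAB/Octave (a sketch of the example above):

    a = [-1 0 0 0 0];  b = [0 0 0 0 1];        % active unit at two positions
    disp([sqrt(sum(a.^2)), sqrt(sum(b.^2))]);  % both pool to 1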

Page 71

LOCAL CONTRAST NORMALIZATION

h_{i+1}(x, y) = ( h_i(x, y) − m_{i,N(x,y)} ) / σ_{i,N(x,y)}

Example: the inputs [0 1 0 −1 0] and [1 11 1 −9 1] both normalize to [0 0.5 0 −0.5 0]: LCN removes the local mean and local scale of the input.
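
A 1-D MATLAB/Octave sketch of this effect (the exact output scale depends on how the neighborhood is weighted, so the values differ from the slide's 0.5; the point is that both inputs map to the same normalized signal):

    x1 = [0 1 0 -1 0];  x2 = [1 11 1 -9 1];
    lcn = @(x) (x - mean(x)) / std(x);   % subtract local mean, divide by local std
    disp(lcn(x1));  disp(lcn(x2));       % identical outputs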

Page 73

CONV NETS: EXTENSIONS

L2 Pooling & Local Contrast Normalization help learn more invariant representations!

Page 74

CONV NETS: TYPICAL ARCHITECTURE

One stage (zoom): Filtering → Pooling → LCN

Whole system: Input Image → 1st stage → 2nd stage → 3rd stage → Linear Layer → Class Labels

Page 75

CONV NETS: TRAINING

Algorithm: given a small mini-batch,
- F-PROP
- B-PROP
- PARAMETER UPDATE

Since convolutions and sub-sampling are differentiable, we can use standard back-propagation.

Page 76

CONV NETS: EXAMPLES

- Object category recognition: Boureau et al. “Ask the locals: multi-way local pooling for image recognition” ICCV 2011
- Segmentation: Turaga et al. “Maximin learning of image segmentation” NIPS 2009
- OCR: Ciresan et al. “MCDNN for Image Classification” CVPR 2012
- Pedestrian detection: Kavukcuoglu et al. “Learning convolutional feature hierarchies for visual recognition” NIPS 2010
- Robotics: Sermanet et al. “Mapping and planning ... with long range perception” IROS 2008

Page 77

LIMITATIONS & SOLUTIONS

- requires lots of labeled data to train → unsupervised learning
- difficult optimization → layer-wise training
- scalability → distributed training

Page 78

Outline

- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods:
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets

Page 79

BACK TO LOGISTIC REGRESSION

[Figure: input → model → prediction; an Error module compares the prediction with the target]

Page 80

Unsupervised Learning

[Figure: input → model → prediction; the Error module has no target to compare against: “?”]

Page 81

Unsupervised Learning

Q: How should we train the input-output mapping if we do not have target values?

A: The code has to retain information from the input, but only if the input is similar to the training samples.

By better representing only those inputs that are similar to the training samples, we hope to extract interesting structure (e.g., the structure of the manifold where the data live).

[Figure: input → encoder → code]

Page 82

Unsupervised Learning


Q: How to constrain the model to represent training samples better than other data points?

Page 83

Unsupervised Learning

– Reconstruct the input from the code & make the code compact (auto-encoder with bottleneck)
– Reconstruct the input from the code & make the code sparse (sparse auto-encoders)
– Add noise to the input or code (denoising auto-encoders)
– Make sure that the model defines a distribution that normalizes to 1 (RBM)

See work in the labs of LeCun, Ng, Fergus, Lee, Yu; Y. Bengio, Lee; Y. Bengio, Hinton, Lee, Salakhutdinov.

Page 84

AUTO-ENCODER NEURAL NETS

[Figure: input → encoder → code → decoder → reconstruction; Error between reconstruction and input]

– input higher dimensional than code
– error: ||reconstruction − input||²
– training: back-propagation

Page 85

SPARSE AUTO-ENCODERS

[Figure: input → encoder → code (with a Sparsity Penalty) → decoder → reconstruction; Error between reconstruction and input]

– sparsity penalty: ||code||₁
– error: ||reconstruction − input||²
– loss: sum of the squared reconstruction error and the sparsity penalty
– training: back-propagation

Page 86

SPARSE AUTO-ENCODERS

– input: X; code: h = W^T X
– loss: L(X; W) = ||W h − X||² + λ Σ_j |h_j|

Le et al. “ICA with reconstruction cost..” NIPS 2011
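
A minimal MATLAB/Octave sketch of this loss (random data; the sparsity weight lambda is an assumption, since its value is not shown on the slide):

    X = randn(64, 100);    % 100 input patches, 64 dimensions each
    W = randn(64, 32);     % 32 code units
    lambda = 0.1;          % sparsity weight (assumed)
    h = W' * X;            % code: h = W'X
    L = sum(sum((W*h - X).^2)) + lambda * sum(abs(h(:)));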

Page 87

How To Use Unsupervised Learning

Layer-wise training of a feature hierarchy:
1) Given unlabeled data, learn features.
2) Use the encoder to produce features, and train another layer on top.

Page 90

How To Use Unsupervised Learning

1) Given unlabeled data, learn features.
2) Use the encoder to produce features, and train another layer on top.
3) Feed the features to a classifier & train just the classifier.

Reduced overfitting, since the features are learned in an unsupervised way!

[Figure: input → feature hierarchy → classifier → prediction, compared against the label]

Page 91

How To Use Unsupervised Learning

1) Given unlabeled data, learn features.
2) Use the encoder to produce features, and train another layer on top.
3) Feed the features to a classifier & jointly train the whole system.

Given enough data, this usually yields the best results: end-to-end learning!

[Figure: input → feature hierarchy → classifier → prediction, compared against the label]

Page 92

Outline

- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods:
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets

Page 93

Semi-Supervised Learning

[Figure: a few labeled images per class: airplane, truck, deer, frog, bird]

Page 94

Semi-Supervised Learning

[Figure: the same few labeled images per class, plus LOTS & LOTS OF UNLABELED DATA!!!]

Page 95

Semi-Supervised Learning

[Figure: input → model → prediction, trained against the label when one is available]

Loss = supervised_error + unsupervised_error

Weston et al. “Deep learning via semi-supervised embedding” ICML 2008

Page 96

Multi-Task Learning

Face detection is hard because of lighting and pose, but also because of occluders such as goggles.

Face detection could be made easier by face identification: the identification task may help the detection task.

Page 97

Multi-Task Learning

- Easy to add many error terms to the loss function.
- Joint learning of related tasks yields better representations.

[Figure: example architecture with a shared trunk and one output branch per task]

Collobert et al. “NLP (almost) from scratch” JMLR 2011

Page 98

Multi-Modal Learning

Audio and video streams are often complementary to each other.

E.g., audio can provide important clues to improve visual recognition, and vice versa.

Page 99

Multi-Modal Learning

- Weak assumptions on the input distribution
- Fully adaptive to data

[Figure: example architecture with one sub-network per modality (#1, #2, #3), merged in higher layers]

Ngiam et al. “Multi-modal deep learning” ICML 2011

Page 100

Outline

- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods:
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets

Page 101

Boosting & Forests

Deep Nets:
- single highly non-linear system
- “deep” stack of simpler modules
- all parameters are subject to learning

Boosting & Forests:
- sequence of “weak” (simple) classifiers that are linearly combined to produce a powerful classifier
- subsequent classifiers do not exploit the representations of earlier classifiers; it's a “shallow” linear mixture
- typically, features are not learned

Page 102

Properties Compared: Deep Nets vs. Boosting

[Table comparing the two approaches on: hierarchical features, ease of parallelization, end-to-end learning, fast training, fast test time, leveraging unlabeled data, adaptive features]

Page 103

Deep Neural Nets VS Probabilistic Models

Deep Neural Nets:
- mean-field approximations of intractable probabilistic models
- usually more efficient
- typically more unconstrained (the partition function has to be replaced by other constraints, e.g. sparsity)

Hierarchical Probabilistic Models (DBN, DBM, etc.):
- in the most interesting cases, they are intractable
- they deal better with uncertainty
- they can be easily combined

Page 104

Example: Auto-Encoder

Neural Net:
  code: Z = σ( W_e^T X + b_e )
  reconstruction: X̃ = W_d Z + b_d

Probabilistic Model (Gaussian RBM):
  E[Z | X] = σ( W^T X + b_e )
  E[X | Z] = W Z + b_d

Page 105

Properties Compared: Deep Nets vs. Probabilistic Models

[Table comparing the two approaches on: hierarchical features, modeling uncertainty, end-to-end learning, fast training, fast test time, leveraging unlabeled data, adaptive features]

Page 106

Outline

- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods:
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets

Page 107

Tera-Scale Deep Learning @ Google

Observation #1: more features always improve performance unless data is scarce.

Observation #2: deep learning methods have higher capacity and have the potential to model data better.

Q #1: Given lots of data and lots of machines, can we scale up deep learning methods?

Q #2: Will deep learning methods perform much better?

Page 108

The Challenge

A large-scale problem has:
– lots of training samples (>10M)
– lots of classes (>10K)
– lots of input dimensions (>10K)

– The best optimizer in practice is on-line SGD, which is naturally sequential: hard to parallelize.
– Layers cannot be trained independently and in parallel: hard to distribute.
– The model can have lots of parameters that may clog the network: hard to distribute across machines.

Page 109

Our Solution

[Figure: the 1st and 2nd layers of a locally connected network partitioned across a 1st, 2nd and 3rd machine: MODEL PARALLELISM]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 112

Distributed Deep Nets

[Figure: MODEL PARALLELISM: one deep net split across machines, processing input #1]

[Figure: MODEL PARALLELISM + DATA PARALLELISM: several model replicas process inputs #1, #2, #3 in parallel]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 114

Asynchronous SGD

[Figure: a PARAMETER SERVER coordinating a 1st, 2nd and 3rd model replica]

Each replica asynchronously pushes its gradient ∂L/∂Θ to the parameter server; the server updates the parameters; the replica pulls the updated Θ and continues, without waiting for the other replicas.
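
A toy, purely sequential MATLAB/Octave simulation of this update pattern (the objective, learning rate and replica count are illustrative assumptions, not from the slides):

    target = ones(10, 1);                 % optimum of the toy objective
    theta = zeros(10, 1);  lr = 0.05;
    copies = {theta, theta, theta};       % each replica's (possibly stale) copy
    for step = 1:300
        r = randi(3);                     % replica r reports next
        grad = copies{r} - target;        % gradient of 0.5*||theta - target||^2
        theta = theta - lr * grad;        % server update
        copies{r} = theta;                % replica pulls fresh parameters
    end
    disp(norm(theta - target));           % should be small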

Page 121

Unsupervised Learning With 1B Parameters

DATA: 10M YouTube (unlabeled) frames of size 200x200.

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 122

Unsupervised Learning With 1B Parameters

Deep Net:
– 3 stages
– each stage consists of local filtering, L2 pooling, and LCN
  - 18x18 filters
  - 8 filters at each location
  - L2 pooling and LCN over 5x5 neighborhoods
– the three layers are trained jointly by:
  - reconstructing the input of each layer
  - sparsity on the code

1B parameters!!!

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 124

Validating Unsupervised Learning

The network has seen lots of objects during training, but without any label.

Q.: How can we validate unsupervised learning?
Q.: Did the network form any high-level representation? E.g., does it have any neuron responding to faces?

– Build a validation set with 50% faces, 50% random images
– Study the properties of the neurons

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 125

Validating Unsupervised Learning

[Figure: neuron responses on the face-vs-random validation set, for the 1st, 2nd and 3rd stages]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 126

Top Images For Best Face Neuron

[Figure: the validation images that most strongly activate the best face neuron]

Page 127

Best Input For Face Neuron

[Figure: the input that maximizes the face neuron's response]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 128

Unsupervised + Supervised (ImageNet)

[Figure: Input Image → 1st stage → 2nd stage → 3rd stage → ŷ; a COST module compares ŷ with the label y]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 129

Object Recognition on ImageNet

IMAGENET v.2011 (16M images, 20K categories)

METHOD                               ACCURACY %
Weston & Bengio 2011                 9.3
Linear Classifier on deep features   13.1
Deep Net (from random)               13.6
Deep Net (from unsup.)               15.8

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012

Page 130

Top Inputs After Supervision

[Figure: the images that most strongly activate class-specific neurons after supervised fine-tuning]

Page 132

Experiments: and many more...

- automatic speech recognition

- natural language processing

- biomed applications

- finance

Generic learning algorithm!!

Page 133

References

Tutorials & Background Material
– Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), pp. 1-127, 2009
– LeCun, Chopra, Hadsell, Ranzato, Huang, A Tutorial on Energy-Based Learning, in Bakir, Hofman, Schölkopf, Smola, Taskar (Eds), Predicting Structured Data, MIT Press, 2006

Convolutional Nets
– LeCun, Bottou, Bengio, Haffner, Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998
– Jarrett, Kavukcuoglu, Ranzato, LeCun, What is the Best Multi-Stage Architecture for Object Recognition?, ICCV 2009
– Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun, Learning Convolutional Feature Hierarchies for Visual Recognition, NIPS 2010

Page 134

Unsupervised Learning
– Le, Karpenko, Ngiam, Ng, ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning, NIPS 2011
– Rifai, Vincent, Muller, Glorot, Bengio, Contractive Auto-Encoders: Explicit Invariance During Feature Extraction, ICML 2011
– Vincent, Larochelle, Lajoie, Bengio, Manzagol, Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, JMLR, 11:3371-3408, 2010
– Gregor, Szlam, LeCun, Structured Sparse Coding via Lateral Inhibition, NIPS 2011
– Kavukcuoglu, Ranzato, LeCun, Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition, ArXiv 1010.3467, 2008
– Hinton, Krizhevsky, Wang, Transforming Auto-encoders, ICANN 2011

Multi-modal Learning
– Ngiam, Khosla, Kim, Nam, Lee, Ng, Multimodal Deep Learning, ICML 2011

Page 135

Locally Connected Nets
– Gregor, LeCun, Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields, ArXiv 2009
– Ranzato, Mnih, Hinton, Generating More Realistic Images Using Gated MRF's, NIPS 2010
– Le, Ngiam, Chen, Chia, Koh, Ng, Tiled Convolutional Neural Networks, NIPS 2010

Distributed Learning
– Le, Ranzato, Monga, Devin, Corrado, Chen, Dean, Ng, Building High-Level Features Using Large Scale Unsupervised Learning, ICML 2012

Papers on Scene Parsing
– Farabet, Couprie, Najman, LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML 2012
– Socher, Lin, Ng, Manning, Parsing Natural Scenes and Natural Language with Recursive Neural Networks, ICML 2011

Page 136

Papers on Object Recognition
– Boureau, Le Roux, Bach, Ponce, LeCun, Ask the Locals: Multi-Way Local Pooling for Image Recognition, ICCV 2011
– Sermanet, LeCun, Traffic Sign Recognition with Multi-Scale Convolutional Networks, IJCNN 2011
– Ciresan, Meier, Gambardella, Schmidhuber, Convolutional Neural Network Committees for Handwritten Character Classification, ICDAR 2011
– Ciresan, Meier, Masci, Gambardella, Schmidhuber, Flexible, High Performance Convolutional Neural Networks for Image Classification, IJCAI 2011

Papers on Action Recognition
– Le, Zou, Yeung, Ng, Learning Hierarchical Spatio-Temporal Features for Action Recognition with Independent Subspace Analysis, CVPR 2011

Papers on Segmentation
– Turaga, Briggman, Helmstaedter, Denk, Seung, Maximin Learning of Image Segmentation, NIPS 2009

Page 137

Papers on Vision for Robotics
– Hadsell, Sermanet, Scoffier, Erkan, Kavukcuoglu, Muller, LeCun, Learning Long-Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics, 26(2):120-144, February 2009

Papers on Biologically Inspired Vision
– Serre, Wolf, Bileschi, Riesenhuber, Poggio, Robust Object Recognition with Cortex-Like Mechanisms, IEEE TPAMI, 29(3):411-426, 2007
– Pinto, Doukhan, DiCarlo, Cox, A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation, PLoS Computational Biology, 2009

Deep Convex Nets & Deconv-Nets
– Deng, Yu, Deep Convex Network: A Scalable Architecture for Speech Pattern Classification, Interspeech 2011
– Zeiler, Taylor, Fergus, Adaptive Deconvolutional Networks for Mid and High Level Feature Learning, ICCV 2011

Page 138

Papers on Embedded ConvNets for Real-Time Vision Applications
– Farabet, Martini, Corda, Akselrod, Culurciello, LeCun, NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision, Workshop on Embedded Computer Vision, CVPR 2011

Papers on Image Denoising Using Neural Nets
– Burger, Schuler, Harmeling, Image Denoising: Can Plain Neural Networks Compete with BM3D?, CVPR 2012

Page 139

Software & Links

– Deep Learning website: http://deeplearning.net/
– Matlab code for the R-ICA unsupervised algorithm: http://ai.stanford.edu/~quocle/rica_release.zip
– Python-based learning library (Theano): http://deeplearning.net/software/theano/
– C++ code for ConvNets (EBLearn): http://eblearn.sourceforge.net/
– Lush learning library, which includes ConvNets: http://lush.sourceforge.net/
– Torch7, a learning library that supports neural net training: http://www.torch.ch

Page 140

Software & Links

– Code used to generate the demo for this tutorial: http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/

Page 141

Acknowledgements


Quoc Le, Andrew Ng

Jeff Dean, Kai Chen, Greg Corrado, Matthieu Devin, Mark Mao, Rajat Monga, Paul Tucker, Samy Bengio

Yann LeCun, Pierre Sermanet, Clement Farabet

Page 142

Visualizing Learned Features

Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007

Q: can we interpret the learned features?

Page 143

Visualizing Learned Features

reconstruction: W h = W_1 h_1 + W_2 h_2 + ...

[Figure: an input patch ≈ 0.9 · (basis 1) + 0.7 · (basis 2) + 0.5 · (basis 3) + 1.0 · (basis 4) + ...]

The columns of W show what each code unit represents.

Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007

Page 144

Visualizing Learned Features

[Figure: 1st layer features]

Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007

Page 145

Visualizing Learned Features

Q: How about the second-layer features?

A: Similarly, each second-layer code unit can be visualized by taking its bases and projecting those bases into image space through the first-layer decoder.

Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007

Page 146

Visualizing Learned Features

[Figure: a second-layer unit expanded through the first-layer decoder; missing edges have 0 weight, light gray nodes have zero value]

Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007

Page 147

Visualizing Learned Features

[Figure: 1st layer features and 2nd layer features side by side]

Q: How are these images computed?

Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007

Page 148

Example of Feature Learning

[Figure: 1st layer features and 2nd layer features]

Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007