Introduction to Deep Learning - DGIST · 2017-01-02 · Introduction to Deep Learning Junmo Kim...

Statistical Inference and Information Theory Laboratory

Introduction to Deep Learning

Junmo Kim

School of Electrical Engineering

2016. 12. 1

KAIST

Artificial Neural Network

Image sources:

http://pharmacyebooks.com/wp-content/uploads/2010/10/Artificial-neural-networks-fundamentals-computing-design-and_page3_image1.png

http://employees.csbsju.edu/ltennison/PSYC340/LTP.jpg

Perceptron

History of Neural Networks

• First generation (1958~): perceptrons (F. Rosenblatt, 1958)

– Criticized by Marvin Minsky for being unable to solve the XOR problem

• Second generation (1986~): multilayer perceptrons

– Trained by back-propagating the error signal (1986)

– Mostly used shallow networks with 1 hidden layer

• Third generation (2006~): deep learning

– Deep belief nets (Hinton, 2006)

– Deep neural network (DNN), convolutional neural network (CNN)
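
The XOR criticism of the first generation can be checked numerically. The sketch below is illustrative only and is not from the original slides: it brute-forces a hypothetical grid of weights for a single threshold unit and finds none that reproduces XOR, then shows a network with one hidden layer, with weights wired by hand (an assumption made for illustration), that does.

```python
import itertools
import numpy as np

# The four XOR input/target pairs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 1, 1, 0])

def step(a):
    """Threshold activation: 1 where a > 0, else 0."""
    return (np.asarray(a) > 0).astype(int)

def perceptron(X, w, b):
    """Single threshold unit: step(X @ w + b)."""
    return step(X @ w + b)

def single_unit_solves_xor(grid):
    """Try every (w1, w2, b) on the grid; True if any reproduces XOR."""
    for w1, w2, b in itertools.product(grid, repeat=3):
        if np.array_equal(perceptron(X, np.array([w1, w2]), b), T):
            return True
    return False

def two_layer_xor(X):
    """Hand-wired network: hidden units compute OR and AND,
    the output unit fires when OR is on and AND is off."""
    h_or = step(X @ np.array([1.0, 1.0]) - 0.5)
    h_and = step(X @ np.array([1.0, 1.0]) - 1.5)
    return step(h_or - h_and - 0.5)

grid = np.linspace(-2.0, 2.0, 17)  # hypothetical search grid
print("single unit solves XOR:", single_unit_solves_xor(grid))
print("two-layer output:", two_layer_xor(X))
```

The grid search is only a demonstration; the failure of the single unit holds for any weights, because XOR is not linearly separable.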

Difficulty of Training Deep Neural Network

• Vanishing gradient problem: a consequence of saturating non-linear activations

– The gradient becomes progressively more diluted as it is propagated back through the layers

– Below the top few layers, the correction signal is minimal
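
A minimal sketch of how the gradient is diluted, assuming a stack of fully connected sigmoid layers with random weights (the layer count, width, and weight scale are hypothetical choices, not values from the slides). A unit-norm error signal is back-propagated from the top layer and its norm is recorded layer by layer.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_gradient_norms(n_layers=10, width=50, seed=0):
    """Forward a random input through sigmoid layers, then back-propagate
    a unit-norm error signal and record its norm at each layer (top first)."""
    rng = np.random.default_rng(seed)
    # Hypothetical initialization: Gaussian weights with std 1/sqrt(width).
    weights = [rng.normal(0.0, 1.0 / np.sqrt(width), (width, width))
               for _ in range(n_layers)]
    h = rng.normal(size=width)
    activations = []
    for W in weights:
        h = sigmoid(W @ h)
        activations.append(h)
    grad = np.ones(width) / np.sqrt(width)   # error signal at the top layer
    norms = []
    for W, h in zip(reversed(weights), reversed(activations)):
        # sigmoid'(a) = h * (1 - h), which is at most 0.25
        grad = W.T @ (grad * h * (1.0 - h))
        norms.append(np.linalg.norm(grad))
    return norms

norms = backprop_gradient_norms()
print(["%.1e" % n for n in norms])
```

Each layer multiplies the signal by the sigmoid slope (at most 0.25), so with these settings the norm shrinks at every step on the way down.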

Difficulty of Training Deep Neural Network

• Typically requires lots of labeled data

• Overfitting problem: Given limited amounts of labeled data, training via back-propagation does not work well

– Deep networks trained with back-propagation sometimes perform worse than shallow networks (overfitting)

Figure: Bishop

What is Deep Learning?

Deep learning is a procedure of training a deep network.

If # of hidden layers <= 1: shallow network

If # of hidden layers >= 2: deep network

Slide: Hinton

Genealogy of Deep Learning

[Diagram. Models: Perceptron, Multilayer Perceptron, Deep Neural Network (DNN), Convolutional Neural Network (CNN, ConvNet), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Deep Boltzmann Machine (DBM), Convolutional RBM (CRBM), Convolutional DBN (CDBN). Axes: Stochastic / Deterministic, Shallow / Deep, Unsupervised / Supervised. Annotations: Rectified Linear Unit (ReLU), Big Data, Regularization; 1st breakthrough; 2nd breakthrough.]

Breakthrough in 2006

Slide: Bengio

Deep Learning @ Google : Google Brain

Google Brain: Google's deep learning project

Started by Prof. Andrew Ng in 2011

A neural network can learn a high-level concept such as ‘cat’ just by watching YouTube videos.

10 million 200x200 images

1 billion parameters

Training procedure (2012)

What features can we learn if we train a massive model on a massive amount of data? Can we learn a “grandmother cell”?

• Train on 10 million images (YouTube)

• 1000 machines (16,000 cores) for 1 week.

• Test on novel images

Training set (YouTube)

Test set (FITW + ImageNet)

Slide: Ng

The face neuron

[Figures: top stimuli from the test set; optimal stimulus by numerical optimization]

Le et al., Building high-level features using large-scale unsupervised learning. ICML 2012

Slide: Ng

Cat neuron

[Figures: top stimuli from the test set; average of top stimuli from the test set]

Slide: Ng

Replacing Feature Extraction Stage

• Traditional approach: based on hand-crafted features (e.g., SIFT)

Input → Feature Extraction → Classifier → Output

• Deep learning automatically discovers a hierarchical feature representation (feature learning)

Input → Deep Network → Output

Multiple Levels of Feature Representation

• Visible layer (raw input image, pixels) → 1st layer → 2nd layer → 3rd layer

• Represents more and more abstract features of the raw input, e.g., edges, local shapes, object parts, objects, etc.

Genealogy of Deep Learning

[Same diagram as the earlier "Genealogy of Deep Learning" slide: Perceptron, Multilayer Perceptron, DNN, CNN, RBM, DBN, DBM, CRBM, CDBN, arranged along the Stochastic / Deterministic, Shallow / Deep, and Unsupervised / Supervised axes, with the annotations ReLU, Big Data, Regularization, 1st breakthrough, 2nd breakthrough.]

Recent Advances: Rectified Linear Unit (ReLU) 2010, 2011

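• Deep neural net (DNN) with rectified linear unit (ReLU) activation function → sparse activation

• ReLU has slope 1 for positive input → ReLU alleviates the vanishing gradient problem.

[Figure: network from input v to hidden layer h3; plot of the ReLU with slope 1 for positive input]

$$f(x) = \begin{cases} x, & x \ge 0 \\ 0, & x < 0 \end{cases}$$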
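
A minimal sketch of the ReLU and its slope, with the sigmoid slope for comparison. The sample inputs and the 10-layer depth are hypothetical, and the slope at exactly x = 0 is set to 0 here by convention.

```python
import numpy as np

def relu(x):
    """f(x) = x for x >= 0, 0 otherwise."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """Slope of ReLU: 1 for x > 0, 0 for x < 0 (0 is used at x = 0 by convention)."""
    return (np.asarray(x) > 0).astype(float)

def sigmoid_grad(x):
    """Slope of the logistic sigmoid, shown for comparison."""
    s = 1.0 / (1.0 + np.exp(-x))
    return s * (1.0 - s)

x = np.array([-2.0, -0.5, 0.5, 2.0])
print(relu(x))        # negative inputs are zeroed, giving sparse activations
print(relu_grad(x))
# Product of local slopes along a 10-layer path whose pre-activation is 2.0 everywhere:
print(relu_grad(2.0) ** 10, sigmoid_grad(2.0) ** 10)
```

Along a path of active units the ReLU slopes multiply to exactly 1, whereas the sigmoid slopes multiply to a very small number.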

Rectified Linear Unit (ReLU)

$$h_3 = W_3 W_2 W_1 v = A v$$

[Figure: a network mapping (x, y) → z (in → out), annotated with $h_3 = A_1 v$, $h_3 = A_2 v$, $h_3 = A_3 v$]

Slide: Ranzato

Recent Advances: Rectified Linear Unit (ReLU) 2010, 2011

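$$h_3 = \underbrace{[\,\cdots\,]}_{4 \times 2}\ \underbrace{[\,\cdots\,]}_{2 \times 2}\ \underbrace{[\,\cdots\,]}_{2 \times 4}\ v = A_1 v$$

$$h_3 = \underbrace{[\,\cdots\,]}_{4 \times 2}\ \underbrace{[\,\cdots\,]}_{2 \times 3}\ \underbrace{[\,\cdots\,]}_{3 \times 4}\ v = A_2 v$$

$$h_3 = A_3 v$$

[Figure: two network diagrams from v to h3, labeled $h_3 = A_1 v$ and $h_3 = A_2 v$]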
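
The relation h3 = A v can be illustrated with a small sketch: for a fixed input, a bias-free ReLU network acts as a single matrix A that depends only on which units are active. The layer sizes and random weights below are hypothetical, and rows of inactive units are zeroed rather than removed, which gives the same product as the smaller matrices.

```python
import numpy as np

def relu_net(v, weights):
    """Bias-free ReLU network: h = relu(W_k ... relu(W_1 v))."""
    h = v
    for W in weights:
        h = np.maximum(W @ h, 0.0)
    return h

def effective_matrix(v, weights):
    """Return A such that relu_net(v, weights) == A @ v for this particular v."""
    h = v
    A = np.eye(len(v))
    for W in weights:
        a = W @ h
        mask = (a > 0).astype(float)   # which units are active for this input
        A = (mask[:, None] * W) @ A    # zero out rows of inactive units
        h = mask * a
    return A

rng = np.random.default_rng(0)
# Hypothetical layer sizes 4 -> 3 -> 3 -> 4, chosen only for illustration.
weights = [rng.normal(size=(3, 4)), rng.normal(size=(3, 3)), rng.normal(size=(4, 3))]
v = rng.normal(size=4)
A = effective_matrix(v, weights)
print(np.allclose(relu_net(v, weights), A @ v))
```

Inputs that activate a different set of units yield a different A, which is why the network as a whole is piecewise linear.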

Speech Recognition (2011)

According to Google researchers, the voice error rate in the new version of Android, after adding insights from deep learning, stands at 25% lower than previous versions of the software.

Slide: Bengio

Large Scale Image Classification (2012)

Same model as LeCun’89 but:

Deep Learning in the News

Slide: Bengio

DeepFace (CVPR 2014)

• Face recognition pipeline:

– Detect → align → represent → classify

• Alignment: employed explicit 3D face modeling

• Representation: a 9-layer deep neural network

– More than 120 million parameters

– Locally connected layers without weight sharing

– Trained with 4 million labeled face images

• 97% recognition rate on LFW

Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14

Deep Learning @ Google : DeepMind

V. Mnih et al., Nature 518, 529-533 (2015), doi:10.1038/nature14236

https://www.youtube.com/watch?v=V1eYniJ0Rnk

AlphaGo

Tree search: looks ahead at possible next moves

Policy network: selects promising next moves

Value network: estimates who is winning based on the current board pattern

The policy and value networks are convolutional neural networks, which perform pattern recognition.

Applications: Large Scale Image Classification

• ImageNet

– Over 15 million labeled high-resolution images

– Roughly 22,000 categories

– Collected from the web

– Labeled by human labelers using Amazon’s Mechanical Turk crowd-sourcing tool

• ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)

– Uses a subset of ImageNet: 1000 categories, 1.2 million training images, 50,000 validation images, 150,000 test images

– Reports two error rates: top-1 and top-5
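
The two error rates can be sketched as follows: the top-k error is the fraction of images whose true label is not among the k classes with the highest scores. The function name, the score matrix, and the labels below are hypothetical and only illustrate the definition.

```python
import numpy as np

def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest-scoring classes.
    scores: (n_examples, n_classes) array; labels: (n_examples,) integer array."""
    top_k = np.argsort(scores, axis=1)[:, -k:]   # indices of the k largest scores per row
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

# Hypothetical scores for 3 examples over 6 classes.
scores = np.array([
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],
    [0.30, 0.10, 0.25, 0.20, 0.10, 0.05],
    [0.02, 0.08, 0.10, 0.10, 0.30, 0.40],
])
labels = np.array([1, 2, 0])
print("top-1 error:", top_k_error(scores, labels, k=1))
print("top-5 error:", top_k_error(scores, labels, k=5))
```

The top-5 error can never exceed the top-1 error, since any top-1 hit is also a top-5 hit.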

Applications: Large Scale Image Classification

• ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky et al., 2012)

– Trained a large, deep convolutional neural network: 650,000 neurons, 60 million parameters

– Classifies the 1.2 million high-resolution images into 1000 different classes

– Top-5 error rate: 15.3% (cf. 2nd best: 26.2%)

Classification results (five most probable labels)

GoogLeNet (2014)

Slide: Christian Szegedy

GoogLeNet (2014)

Slide: Christian Szegedy

Zeiler-Fergus Architecture (1 tower)

[Figure legend: Convolution, Pooling, Softmax, Other]

Classification failure cases

Groundtruth: ????

Slide: Christian Szegedy

Classification failure cases

Groundtruth: coffee mug

Slide: Christian Szegedy

Classification failure cases

Groundtruth: coffee mug

GoogLeNet:

● table lamp

● lamp shade

● printer

● projector

● desktop computer

Slide: Christian Szegedy

Large Scale Image Classification (2015)

Microsoft Research Asia (MSRA)

Slide: Kaiming He

AI Painter : Artistic Style Transfer

https://www.instapainting.com/ai-painter

Language Modeling

Fei-Fei Li & Andrej Karpathy, CS231n: Convolutional Neural Networks for Visual Recognition, Winter 2015. (Stanford University)

p(next word | previous words), e.g., sentence completion
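
The next-word probability is the building block of a language model. By the chain rule of probability, the probability of a whole sentence factorizes into next-word predictions. The symbols $w_t$ (the t-th word) and $T$ (the sentence length) are introduced here for illustration and do not appear on the slides.

```latex
p(w_1, \dots, w_T) = \prod_{t=1}^{T} p(w_t \mid w_1, \dots, w_{t-1})
```

Sentence completion then amounts to repeatedly choosing or sampling the next word from $p(w_t \mid w_1, \dots, w_{t-1})$.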

Machine Translation

Fei-Fei Li & Andrej Karpathy, CS231n: Convolutional Neural Networks for Visual Recognition, Winter 2015. (Stanford University)

conceptual summary of ABC<EOS>

p(next word | previous words)

Image Sentence Datasets

Image Captioning

Summary

• Deep learning can discover a hierarchical feature representation from data.

• Depth matters for large-scale visual recognition

– Lots of non-linearity

• Beyond image classification

– Artistic style transfer

– Natural language modeling

– Machine translation

– Image captioning