Introduction to Deep Learning
Junmo Kim
School of Electrical Engineering, KAIST
2016. 12. 1
Artificial Neural Network
[Figures: biological neuron and artificial neural network. Sources: http://pharmacyebooks.com/wp-content/uploads/2010/10/Artificial-neural-networks-fundamentals-computing-design-and_page3_image1.png and http://employees.csbsju.edu/ltennison/PSYC340/LTP.jpg]
Perceptron
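As a reference point for what follows, a minimal sketch of the perceptron's forward computation; the weights and bias below are made up purely for illustration:

```python
import numpy as np

# Perceptron: fire (output 1) if the weighted sum of inputs exceeds a threshold.
def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.7, -0.4])   # learned weights (hypothetical)
b = -0.1                    # bias (hypothetical)
print(perceptron(np.array([1.0, 0.5]), w, b))  # -> 1
```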
History of Neural Networks
• First generation (1958~): perceptrons (F. Rosenblatt, 1958)
– Criticized by Marvin Minsky for its inability to solve the XOR problem
• Second generation (1986~): multilayer perceptrons
– Trained by back-propagating the error signal (1986)
– Mostly shallow networks with one hidden layer
• Third generation (2006~): deep learning
– Deep belief nets (Hinton, 2006)
– Deep neural networks (DNN), convolutional neural networks (CNN)
Difficulty of Training Deep Neural Network
• Vanishing gradient problem: a problem with non-linear activations
– The gradient becomes progressively more diluted as it is back-propagated
– Below the top few layers, the correction signal is minimal
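A back-of-the-envelope illustration of this dilution, assuming sigmoid activations and a typical (made-up) pre-activation value: the local derivative is at most 0.25, so the back-propagated signal shrinks geometrically with depth:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The error signal reaching a layer is (roughly) a product of local
# derivatives; for the sigmoid, sigma'(z) = sigma(z)(1 - sigma(z)) <= 0.25.
z = 0.5                                   # typical pre-activation (made up)
local = sigmoid(z) * (1.0 - sigmoid(z))   # about 0.235
for depth in (1, 5, 10, 20):
    print(depth, local ** depth)          # 20 layers: ~3e-13
```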
Difficulty of Training Deep Neural Network
• Typically requires lots of labeled data
• Overfitting problem: given limited amounts of labeled data, training via back-propagation does not work well
– Deep networks trained with back-propagation sometimes perform worse than shallow networks (overfitting)
Figure: Bishop
What is Deep Learning?
Deep learning is a procedure for training a deep network.
If # of hidden layers <= 1: shallow network
If # of hidden layers >= 2: deep network
Slide: Hinton
Genealogy of Deep Learning
[Diagram: family tree of neural network models]
• Stochastic branch: Restricted Boltzmann Machine (RBM) → Deep Belief Network (DBN) → Deep Boltzmann Machine (DBM), with convolutional variants Convolutional RBM (CRBM) and Convolutional DBN (CDBN)
• Deterministic branch: Perceptron → Multilayer Perceptron → Deep Neural Network (DNN), with the convolutional variant Convolutional Neural Network (CNN, ConvNet)
• Axes: stochastic vs. deterministic, shallow vs. deep, unsupervised vs. supervised
• Annotations: 1st breakthrough (deep belief networks); 2nd breakthrough (Rectified Linear Unit (ReLU), big data, regularization)
Breakthrough in 2006
Slide: Bengio
Deep Learning @ Google: Google Brain
Google Brain: Google's deep learning project
Started by Prof. Andrew Ng in 2011
A neural network can learn a high-level concept like 'cat' just by watching YouTube videos.
10 million 200×200 images; 1 billion parameters
Training procedure (2012)
What features can we learn if we train a massive model on a massive amount of data? Can we learn a "grandmother cell"?
• Train on 10 million images (YouTube)
• 1000 machines (16,000 cores) for 1 week
• Test on novel images
Training set: YouTube; test set: FITW + ImageNet
Slide: Ng
The face neuron
Top stimuli from the test set; optimal stimulus by numerical optimization
Le et al., Building high-level features using large-scale unsupervised learning, ICML 2012
Slide: Ng
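The "optimal stimulus by numerical optimization" is found by gradient ascent on the input itself. A minimal hypothetical sketch, using a single linear neuron so the gradient with respect to the input is just the weight vector (real feature visualization differentiates through the whole network):

```python
import numpy as np

# Gradient ascent on the INPUT to maximize one neuron's activation,
# as in Le et al. 2012; weights here are random, purely illustrative.
rng = np.random.default_rng(0)
w = rng.normal(size=100)             # the neuron's learned weights (made up)

x = rng.normal(size=100)             # start from a random input
x /= np.linalg.norm(x)
for _ in range(100):
    x += 0.1 * w                     # ascend the activation gradient (= w here)
    x /= np.linalg.norm(x)           # constrain the input to the unit sphere

print(np.allclose(x, w / np.linalg.norm(w), atol=1e-3))  # -> True
```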
Cat neuron
Top stimuli from the test set; average of top stimuli from the test set
Slide: Ng
Replacing Feature Extraction Stage
• Traditional approach: based on hand-crafted features
Input → Feature Extraction (e.g. SIFT) → Classifier → Output
• Deep learning automatically discovers hierarchical feature representations (feature learning); see the sketch below
Input → Deep Network → Output
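A toy contrast between the two pipelines; the "hand-crafted" extractor below is a crude stand-in for SIFT-style features, not a real implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Traditional: a fixed, hand-designed feature extractor feeds a trainable
# classifier; only the classifier weights are learned.
def handcrafted_features(img):
    gy, gx = np.gradient(img)                 # simple gradient statistics
    return np.array([gx.mean(), gy.mean(), img.mean(), img.std()])

w_clf = rng.normal(size=4)                    # the only trainable part (made up)
score_traditional = handcrafted_features(image) @ w_clf

# Deep learning: the feature hierarchy itself is learned end-to-end;
# here a toy 2-layer net whose every weight would be trained.
W1, w2 = rng.normal(size=(4, 64)), rng.normal(size=4)
score_deep = np.maximum(0, W1 @ image.ravel()) @ w2

print(score_traditional, score_deep)
```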
Multiple Levels of Feature Representation
• Visible layer: the raw input image (pixels)
• The 1st, 2nd, and 3rd layers represent more and more abstract features of the raw input, e.g. edges, local shapes, object parts, objects, etc.
Genealogy of Deep Learning (revisited)
[Same genealogy diagram as before, now highlighting the 2nd breakthrough: ReLU, big data, and regularization]
Recent Advances: Rectified Linear Unit (ReLU), 2010-2011
• A deep neural net (DNN) with the rectified linear unit (ReLU) activation function has sparse activations
• ReLU has slope 1 for positive inputs, which alleviates the vanishing gradient problem
[Figure: network with input v and top hidden layer h3; the ReLU has slope 1 on the positive side]
$$f(x) = \begin{cases} x & x \ge 0 \\ 0 & x < 0 \end{cases}$$
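In code, ReLU and its derivative; the slope of 1 on the positive side means active units pass the back-propagated signal through unchanged, unlike the sigmoid's <= 0.25 factor:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)   # slope 1 where active, 0 otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```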
Rectified Linear Unit (ReLU)
Within a region where the set of active units is fixed, the stack of ReLU layers computes a single linear map:
$$h_3 = W_3 W_2 W_1 v = A v$$
Different inputs activate different units, giving a different effective matrix per region: $h_3 = A_1 v$, $h_3 = A_2 v$, $h_3 = A_3 v$.
[Figure: a piecewise-linear surface mapping an input (x, y) to an output z]
Slide: Ranzato
Recent Advances: Rectified Linear Unit (ReLU), 2010-2011
Folding each layer's active-unit mask into the weights collapses the product to one effective linear map per input region, e.g. with effective layer shapes 4×2, 2×2, 2×4 in one region and 4×2, 2×3, 3×4 in another:
$$h_3 = \underbrace{[\,\cdots]}_{4\times 2}\underbrace{[\,\cdots]}_{2\times 2}\underbrace{[\,\cdots]}_{2\times 4}\,v = A_1 v, \qquad h_3 = \underbrace{[\,\cdots]}_{4\times 2}\underbrace{[\,\cdots]}_{2\times 3}\underbrace{[\,\cdots]}_{3\times 4}\,v = A_2 v, \qquad h_3 = A_3 v$$
[Figure: the same network from v to h3 drawn with different subsets of active units, giving $h_3 = A_1 v$ and $h_3 = A_2 v$]
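A quick numerical check of this view, with random (purely illustrative) weights matching the 4×2, 2×3, 3×4 shapes above: folding the 0/1 activity masks into the weights yields a single effective matrix A for the input's region:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3)), rng.normal(size=(4, 2))
v = rng.normal(size=4)

relu = lambda x: np.maximum(0.0, x)
h1 = relu(W1 @ v)
h2 = relu(W2 @ h1)
h3 = W3 @ h2

# The active-unit masks D1, D2 are diagonal 0/1 matrices; composing them
# with the weights gives the effective linear map for this input's region.
D1 = np.diag((W1 @ v > 0).astype(float))
D2 = np.diag((W2 @ h1 > 0).astype(float))
A = W3 @ D2 @ W2 @ D1 @ W1          # h3 = A v within this region
print(np.allclose(h3, A @ v))       # True
```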
Speech Recognition (2011)
According to Google researchers, the voice error rate in the new version of Android, after adding insights from deep learning, is 25% lower than in previous versions of the software.
Slide: Bengio
Large Scale Image Classification (2012)
Same model as LeCun '89, but:
Deep Learning in the News
Slide: Bengio
DeepFace (CVPR 2014)
• Face recognition pipeline: detect → align → represent → classify
• Alignment: employs explicit 3D face modeling
• Representation: a 9-layer deep neural network
– More than 120 million parameters
– Locally connected layers without weight sharing
– Trained on 4 million labeled face images
• 97% recognition rate on LFW
Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14
Deep Learning @ Google: DeepMind
https://www.youtube.com/watch?v=V1eYniJ0Rnk
V. Mnih et al., Nature 518, 529-533 (2015). doi:10.1038/nature14236
Deep Learning @ Google: DeepMind
AlphaGo
Tree search: looks ahead at future moves
Policy network: selects promising next moves
Value network: estimates who is winning from the current board position
The policy and value networks are convolutional neural networks, which perform pattern recognition.
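A heavily simplified, hypothetical sketch of how the two networks could cooperate in a one-ply lookahead; real AlphaGo embeds them in Monte Carlo tree search, and `policy_net`, `value_net`, and `apply_move` here are placeholders for the trained components:

```python
# Hypothetical one-ply lookahead; policy_net and value_net stand for the
# trained CNNs (real AlphaGo combines them inside Monte Carlo tree search).
def choose_move(board, legal_moves, policy_net, value_net, apply_move, top_k=5):
    priors = policy_net(board)                       # {move: prior probability}
    # The policy network narrows the search to a few promising moves...
    candidates = sorted(legal_moves, key=lambda m: priors.get(m, 0.0),
                        reverse=True)[:top_k]
    # ...and the value network scores the position each move leads to
    # (assumed here to return the win probability for the player moving now).
    return max(candidates, key=lambda m: value_net(apply_move(board, m)))
```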
Applications: Large Scale Image Classification
• ImageNet
– Over 15 million labeled high-resolution images
– Roughly 22,000 categories
– Collected from the web
– Labeled by human labelers using Amazon's Mechanical Turk crowd-sourcing tool
• ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
– Uses a subset of ImageNet: 1000 categories, 1.2 million training images, 50,000 validation images, 150,000 test images
– Reports two error rates: top-1 and top-5 (see the sketch below)
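Top-k error counts an image as correct when the true label appears among the model's k highest-scoring classes. A minimal sketch with made-up scores:

```python
import numpy as np

def topk_error(scores, labels, k):
    # scores: (N, C) class scores; labels: (N,) ground-truth class indices
    topk = np.argsort(scores, axis=1)[:, -k:]      # k best classes per image
    hit = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

scores = np.array([[0.1, 0.5, 0.4],   # toy 3-class scores (made up)
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])
labels = np.array([2, 0, 1])
print(topk_error(scores, labels, 1))  # top-1 error: 2/3
print(topk_error(scores, labels, 2))  # top-2 error (top-5 analogue): 0.0
```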
Applications: Large Scale Image Classification
• ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, 2012)
– Trained a large, deep convolutional neural network: 650,000 neurons, 60 million parameters
– Classifies the 1.2 million high-resolution images into 1000 different classes
– Top-5 error rate: 15.3% (cf. 2nd best: 26.2%)
Classification results (five most probable labels)
GoogLeNet (2014)
Slide: Christian Szegedy
[Figure: Zeiler-Fergus architecture (1 tower) vs. GoogLeNet; legend: Convolution, Pooling, Softmax, Other]
Classification failure cases
Ground truth: coffee mug
GoogLeNet (five most probable labels): table lamp, lamp shade, printer, projector, desktop computer
Slide: Christian Szegedy
Large Scale Image Classification (2015)
Microsoft Research Asia (MSRA)
Slide: Kaiming He
AI Painter: Artistic Style Transfer
https://www.instapainting.com/ai-painter
Language Modeling
Fei-Fei Li & Andrej Karpathy, CS231n: Convolutional Neural Networks for Visual Recognition, Winter 2015. (Stanford University)
$p(\text{next word} \mid \text{previous words})$, e.g. sentence completion
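The simplest concrete instance of this model is a bigram counter, shown here on a made-up corpus; neural language models replace the count table with a network but model the same conditional distribution:

```python
from collections import Counter, defaultdict

# Toy corpus (made up) for estimating p(next word | previous word).
corpus = "the cat sat on the mat the cat ate".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p_next(prev):
    c = counts[prev]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(p_next("the"))                               # {'cat': 0.67, 'mat': 0.33}
# Sentence completion: greedily pick the most probable next word.
print(max(p_next("cat"), key=p_next("cat").get))   # 'sat' ('ate' ties)
```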
Machine Translation
Fei-Fei Li & Andrej Karpathy, CS231n: Convolutional Neural Networks for Visual Recognition, Winter 2015. (Stanford University)
The encoder reads "A B C" and compresses it into a conceptual summary at <EOS>; the decoder then generates the translation from $p(\text{next word} \mid \text{previous words})$.
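Decoding in such an encoder-decoder model is a loop over this conditional distribution. A hypothetical sketch with toy stand-ins for the trained networks (a word-for-word lookup, purely to make the loop runnable):

```python
# Hypothetical greedy decoder for an encoder-decoder translator; `encode`
# and `next_word_dist` stand in for the trained recurrent networks.
def greedy_translate(source, encode, next_word_dist, max_len=20):
    summary = encode(source)                  # fixed-size "conceptual summary"
    out = []
    while len(out) < max_len:
        dist = next_word_dist(summary, out)   # p(next word | summary, prefix)
        word = max(dist, key=dist.get)        # greedy choice
        if word == "<EOS>":
            break
        out.append(word)
    return out

# Toy stand-ins so the sketch runs (word-for-word lookup, not a real model):
table = {"ich": "i", "liebe": "liebe" and "love", "katzen": "cats"}
encode = lambda src: [table[w] for w in src]
def next_word_dist(summary, prefix):
    nxt = summary[len(prefix)] if len(prefix) < len(summary) else "<EOS>"
    return {nxt: 1.0, "<EOS>": 0.5 if nxt != "<EOS>" else 1.0}

print(greedy_translate(["ich", "liebe", "katzen"], encode, next_word_dist))
# -> ['i', 'love', 'cats']
```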
Image Sentence Datasets
Image Captioning
Summary
• Deep learning can discover hierarchical feature representations from data.
• Depth matters for large-scale visual recognition
– Lots of non-linearity
• Beyond image classification
– Artistic style transfer
– Natural language modeling
– Machine translation
– Image captioning