Introduction to Deep Learning
Junmo Kim
School of Electrical Engineering, KAIST
2016. 12. 1
Artificial Neural Network
[Figures: biological neuron and artificial neural network. Sources: http://pharmacyebooks.com/wp-content/uploads/2010/10/Artificial-neural-networks-fundamentals-computing-design-and_page3_image1.png and http://employees.csbsju.edu/ltennison/PSYC340/LTP.jpg]
Perceptron
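As a reference point for what follows, a minimal sketch of the perceptron's forward computation; the weights and bias below are made up purely for illustration:

```python
import numpy as np

# Perceptron: fire (output 1) if the weighted sum of inputs exceeds a threshold.
def perceptron(x, w, b):
    return 1 if np.dot(w, x) + b > 0 else 0

w = np.array([0.7, -0.4])   # learned weights (hypothetical)
b = -0.1                    # bias (hypothetical)
print(perceptron(np.array([1.0, 0.5]), w, b))  # -> 1
```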
History of Neural Networks
• First generation (1958~): perceptrons (F. Rosenblatt, 1958)
– Criticized by Marvin Minsky for its inability to solve the XOR problem
• Second generation (1986~): multilayer perceptrons
– Trained by back-propagating the error signal (1986)
– Mostly shallow networks with one hidden layer
• Third generation (2006~): deep learning
– Deep belief nets (Hinton, 2006)
– Deep neural networks (DNN), convolutional neural networks (CNN)
Difficulty of Training Deep Neural Network
• Vanishing gradient problem: a problem with non-linear activations
– The gradient becomes progressively more diluted as it is back-propagated
– Below the top few layers, the correction signal is minimal
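A back-of-the-envelope illustration of this dilution, assuming sigmoid activations and a typical (made-up) pre-activation value: the local derivative is at most 0.25, so the back-propagated signal shrinks geometrically with depth:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# The error signal reaching a layer is (roughly) a product of local
# derivatives; for the sigmoid, sigma'(z) = sigma(z)(1 - sigma(z)) <= 0.25.
z = 0.5                                   # typical pre-activation (made up)
local = sigmoid(z) * (1.0 - sigmoid(z))   # about 0.235
for depth in (1, 5, 10, 20):
    print(depth, local ** depth)          # 20 layers: ~3e-13
```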
Difficulty of Training Deep Neural Network
• Typically requires lots of labeled data
• Overfitting problem: given limited amounts of labeled data, training via back-propagation does not work well
– Deep networks trained with back-propagation sometimes perform worse than shallow networks (overfitting)
Figure: Bishop
What is Deep Learning?
Deep learning is a procedure for training a deep network.
If # of hidden layers <= 1: shallow network
If # of hidden layers >= 2: deep network
Slide: Hinton
Genealogy of Deep Learning
[Diagram: family tree of neural network models]
• Stochastic branch: Restricted Boltzmann Machine (RBM) → Deep Belief Network (DBN) → Deep Boltzmann Machine (DBM), with convolutional variants Convolutional RBM (CRBM) and Convolutional DBN (CDBN)
• Deterministic branch: Perceptron → Multilayer Perceptron → Deep Neural Network (DNN), with the convolutional variant Convolutional Neural Network (CNN, ConvNet)
• Axes: stochastic vs. deterministic, shallow vs. deep, unsupervised vs. supervised
• Annotations: 1st breakthrough (deep belief networks); 2nd breakthrough (Rectified Linear Unit (ReLU), big data, regularization)
Breakthrough in 2006
Slide: Bengio
Deep Learning @ Google: Google Brain
Google Brain: Google's deep learning project
Started by Prof. Andrew Ng in 2011
A neural network can learn a high-level concept like 'cat' just by watching YouTube videos.
10 million 200×200 images; 1 billion parameters
Training procedure (2012)
What features can we learn if we train a massive model on a massive amount of data? Can we learn a "grandmother cell"?
• Train on 10 million images (YouTube)
• 1000 machines (16,000 cores) for 1 week
• Test on novel images
Training set: YouTube; test set: FITW + ImageNet
Slide: Ng
The face neuron
Top stimuli from the test set; optimal stimulus by numerical optimization
Le et al., Building high-level features using large-scale unsupervised learning, ICML 2012
Slide: Ng
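The "optimal stimulus by numerical optimization" is found by gradient ascent on the input itself. A minimal hypothetical sketch, using a single linear neuron so the gradient with respect to the input is just the weight vector (real feature visualization differentiates through the whole network):

```python
import numpy as np

# Gradient ascent on the INPUT to maximize one neuron's activation,
# as in Le et al. 2012; weights here are random, purely illustrative.
rng = np.random.default_rng(0)
w = rng.normal(size=100)             # the neuron's learned weights (made up)

x = rng.normal(size=100)             # start from a random input
x /= np.linalg.norm(x)
for _ in range(100):
    x += 0.1 * w                     # ascend the activation gradient (= w here)
    x /= np.linalg.norm(x)           # constrain the input to the unit sphere

print(np.allclose(x, w / np.linalg.norm(w), atol=1e-3))  # -> True
```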
Cat neuron
Top stimuli from the test set; average of top stimuli from the test set
Slide: Ng
Replacing Feature Extraction Stage
• Traditional approach: based on hand-crafted features
Input → Feature Extraction (e.g. SIFT) → Classifier → Output
• Deep learning automatically discovers hierarchical feature representations (feature learning); see the sketch below
Input → Deep Network → Output
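A toy contrast between the two pipelines; the "hand-crafted" extractor below is a crude stand-in for SIFT-style features, not a real implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((8, 8))

# Traditional: a fixed, hand-designed feature extractor feeds a trainable
# classifier; only the classifier weights are learned.
def handcrafted_features(img):
    gy, gx = np.gradient(img)                 # simple gradient statistics
    return np.array([gx.mean(), gy.mean(), img.mean(), img.std()])

w_clf = rng.normal(size=4)                    # the only trainable part (made up)
score_traditional = handcrafted_features(image) @ w_clf

# Deep learning: the feature hierarchy itself is learned end-to-end;
# here a toy 2-layer net whose every weight would be trained.
W1, w2 = rng.normal(size=(4, 64)), rng.normal(size=4)
score_deep = np.maximum(0, W1 @ image.ravel()) @ w2

print(score_traditional, score_deep)
```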
Multiple Levels of Feature Representation
• Visible layer: the raw input image (pixels)
• The 1st, 2nd, and 3rd layers represent more and more abstract features of the raw input, e.g. edges, local shapes, object parts, objects, etc.
Genealogy of Deep Learning (revisited)
[Same genealogy diagram as before, now highlighting the 2nd breakthrough: ReLU, big data, and regularization]
Recent Advances: Rectified Linear Unit (ReLU), 2010-2011
• A deep neural net (DNN) with the rectified linear unit (ReLU) activation function has sparse activations
• ReLU has slope 1 for positive inputs, which alleviates the vanishing gradient problem
[Figure: network with input v and top hidden layer h3; the ReLU has slope 1 on the positive side]
$$f(x) = \begin{cases} x & x \ge 0 \\ 0 & x < 0 \end{cases}$$
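In code, ReLU and its derivative; the slope of 1 on the positive side means active units pass the back-propagated signal through unchanged, unlike the sigmoid's <= 0.25 factor:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def relu_grad(x):
    return (x > 0).astype(float)   # slope 1 where active, 0 otherwise

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))       # [0.  0.  0.  0.5 2. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```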
Rectified Linear Unit (ReLU)
Within a region where the set of active units is fixed, the stack of ReLU layers computes a single linear map:
$$h_3 = W_3 W_2 W_1 v = A v$$
Different inputs activate different units, giving a different effective matrix per region: $h_3 = A_1 v$, $h_3 = A_2 v$, $h_3 = A_3 v$.
[Figure: a piecewise-linear surface mapping an input (x, y) to an output z]
Slide: Ranzato
Recent Advances: Rectified Linear Unit (ReLU), 2010-2011
Folding each layer's active-unit mask into the weights collapses the product to one effective linear map per input region, e.g. with effective layer shapes 4×2, 2×2, 2×4 in one region and 4×2, 2×3, 3×4 in another:
$$h_3 = \underbrace{[\,\cdots]}_{4\times 2}\underbrace{[\,\cdots]}_{2\times 2}\underbrace{[\,\cdots]}_{2\times 4}\,v = A_1 v, \qquad h_3 = \underbrace{[\,\cdots]}_{4\times 2}\underbrace{[\,\cdots]}_{2\times 3}\underbrace{[\,\cdots]}_{3\times 4}\,v = A_2 v, \qquad h_3 = A_3 v$$
[Figure: the same network from v to h3 drawn with different subsets of active units, giving $h_3 = A_1 v$ and $h_3 = A_2 v$]
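A quick numerical check of this view, with random (purely illustrative) weights matching the 4×2, 2×3, 3×4 shapes above: folding the 0/1 activity masks into the weights yields a single effective matrix A for the input's region:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3 = rng.normal(size=(3, 4)), rng.normal(size=(2, 3)), rng.normal(size=(4, 2))
v = rng.normal(size=4)

relu = lambda x: np.maximum(0.0, x)
h1 = relu(W1 @ v)
h2 = relu(W2 @ h1)
h3 = W3 @ h2

# The active-unit masks D1, D2 are diagonal 0/1 matrices; composing them
# with the weights gives the effective linear map for this input's region.
D1 = np.diag((W1 @ v > 0).astype(float))
D2 = np.diag((W2 @ h1 > 0).astype(float))
A = W3 @ D2 @ W2 @ D1 @ W1          # h3 = A v within this region
print(np.allclose(h3, A @ v))       # True
```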
Speech Recognition (2011)
According to Google researchers, the voice error rate in the new version of Android, after adding insights from deep learning, is 25% lower than in previous versions of the software.
Slide: Bengio
Large Scale Image Classification (2012)
Same model as LeCun '89, but:
Deep Learning in the News
Slide: Bengio
DeepFace (CVPR 2014)
• Face recognition pipeline: detect → align → represent → classify
• Alignment: employs explicit 3D face modeling
• Representation: a 9-layer deep neural network
– More than 120 million parameters
– Locally connected layers without weight sharing
– Trained on 4 million labeled face images
• 97% recognition rate on LFW
Taigman et al. DeepFace: Closing the Gap to Human-Level Performance in Face Verification, CVPR’14
Deep Learning @ Google: DeepMind
https://www.youtube.com/watch?v=V1eYniJ0Rnk
V. Mnih et al., Nature 518, 529-533 (2015). doi:10.1038/nature14236
Deep Learning @ Google: DeepMind
AlphaGo
Tree search: looks ahead at future moves
Policy network: selects promising next moves
Value network: estimates who is winning from the current board position
The policy and value networks are convolutional neural networks, which perform pattern recognition.
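A heavily simplified, hypothetical sketch of how the two networks could cooperate in a one-ply lookahead; real AlphaGo embeds them in Monte Carlo tree search, and `policy_net`, `value_net`, and `apply_move` here are placeholders for the trained components:

```python
# Hypothetical one-ply lookahead; policy_net and value_net stand for the
# trained CNNs (real AlphaGo combines them inside Monte Carlo tree search).
def choose_move(board, legal_moves, policy_net, value_net, apply_move, top_k=5):
    priors = policy_net(board)                       # {move: prior probability}
    # The policy network narrows the search to a few promising moves...
    candidates = sorted(legal_moves, key=lambda m: priors.get(m, 0.0),
                        reverse=True)[:top_k]
    # ...and the value network scores the position each move leads to
    # (assumed here to return the win probability for the player moving now).
    return max(candidates, key=lambda m: value_net(apply_move(board, m)))
```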
Applications: Large Scale Image Classification
• ImageNet
– Over 15 million labeled high-resolution images
– Roughly 22,000 categories
– Collected from the web
– Labeled by human labelers using Amazon's Mechanical Turk crowd-sourcing tool
• ImageNet Large-Scale Visual Recognition Challenge (ILSVRC)
– Uses a subset of ImageNet: 1000 categories, 1.2 million training images, 50,000 validation images, 150,000 test images
– Reports two error rates: top-1 and top-5 (see the sketch below)
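Top-k error counts an image as correct when the true label appears among the model's k highest-scoring classes. A minimal sketch with made-up scores:

```python
import numpy as np

def topk_error(scores, labels, k):
    # scores: (N, C) class scores; labels: (N,) ground-truth class indices
    topk = np.argsort(scores, axis=1)[:, -k:]      # k best classes per image
    hit = (topk == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

scores = np.array([[0.1, 0.5, 0.4],   # toy 3-class scores (made up)
                   [0.6, 0.3, 0.1],
                   [0.2, 0.2, 0.6]])
labels = np.array([2, 0, 1])
print(topk_error(scores, labels, 1))  # top-1 error: 2/3
print(topk_error(scores, labels, 2))  # top-2 error (top-5 analogue): 0.0
```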
Applications: Large Scale Image Classification
• ImageNet Classification with Deep Convolutional Neural Networks (Krizhevsky, 2012)
– Trained a large, deep convolutional neural network: 650,000 neurons, 60 million parameters
– Classifies the 1.2 million high-resolution images into 1000 different classes
– Top-5 error rate: 15.3% (cf. 2nd best: 26.2%)
Classification results (five most probable labels)
GoogLeNet (2014)
Slide: Christian Szegedy
[Figure: Zeiler-Fergus architecture (1 tower) vs. GoogLeNet; legend: Convolution, Pooling, Softmax, Other]
Classification failure cases
Ground truth: coffee mug
GoogLeNet (five most probable labels): table lamp, lamp shade, printer, projector, desktop computer
Slide: Christian Szegedy
Large Scale Image Classification (2015)
Microsoft Research Asia (MSRA)
Slide: Kaiming He
AI Painter: Artistic Style Transfer
https://www.instapainting.com/ai-painter
Language Modeling
Fei-Fei Li & Andrej Karpathy, CS231n: Convolutional Neural Networks for Visual Recognition, Winter 2015. (Stanford University)
$p(\text{next word} \mid \text{previous words})$, e.g. sentence completion
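The simplest concrete instance of this model is a bigram counter, shown here on a made-up corpus; neural language models replace the count table with a network but model the same conditional distribution:

```python
from collections import Counter, defaultdict

# Toy corpus (made up) for estimating p(next word | previous word).
corpus = "the cat sat on the mat the cat ate".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def p_next(prev):
    c = counts[prev]
    total = sum(c.values())
    return {w: n / total for w, n in c.items()}

print(p_next("the"))                               # {'cat': 0.67, 'mat': 0.33}
# Sentence completion: greedily pick the most probable next word.
print(max(p_next("cat"), key=p_next("cat").get))   # 'sat' ('ate' ties)
```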
Machine Translation
Fei-Fei Li & Andrej Karpathy, CS231n: Convolutional Neural Networks for Visual Recognition, Winter 2015. (Stanford University)
The encoder reads "A B C" and compresses it into a conceptual summary at <EOS>; the decoder then generates the translation from $p(\text{next word} \mid \text{previous words})$.
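Decoding in such an encoder-decoder model is a loop over this conditional distribution. A hypothetical sketch with toy stand-ins for the trained networks (a word-for-word lookup, purely to make the loop runnable):

```python
# Hypothetical greedy decoder for an encoder-decoder translator; `encode`
# and `next_word_dist` stand in for the trained recurrent networks.
def greedy_translate(source, encode, next_word_dist, max_len=20):
    summary = encode(source)                  # fixed-size "conceptual summary"
    out = []
    while len(out) < max_len:
        dist = next_word_dist(summary, out)   # p(next word | summary, prefix)
        word = max(dist, key=dist.get)        # greedy choice
        if word == "<EOS>":
            break
        out.append(word)
    return out

# Toy stand-ins so the sketch runs (word-for-word lookup, not a real model):
table = {"ich": "i", "liebe": "liebe" and "love", "katzen": "cats"}
encode = lambda src: [table[w] for w in src]
def next_word_dist(summary, prefix):
    nxt = summary[len(prefix)] if len(prefix) < len(summary) else "<EOS>"
    return {nxt: 1.0, "<EOS>": 0.5 if nxt != "<EOS>" else 1.0}

print(greedy_translate(["ich", "liebe", "katzen"], encode, next_word_dist))
# -> ['i', 'love', 'cats']
```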
Image Sentence Datasets
Image Captioning
Summary
• Deep learning can discover hierarchical feature representations from data.
• Depth matters for large-scale visual recognition
– Lots of non-linearity
• Beyond image classification
– Artistic style transfer
– Natural language modeling
– Machine translation
– Image captioning