Impact of Deep Learning
• Speech Recogni4on
• Computer Vision
• Language Understanding
• Recommender Systems
• Drug Discovery and Medical Image Analysis
[Courtesy of R. Salakhutdinov]
Very Large Scale Use of DBN’s Data: 10 million 200x200 unlabeled images, sampled from YouTube Training: use 1000 machines (16000 cores) for 1 week Learned network: 3 multi-stage layers, 1.15 billion parameters Achieves 15.8% (was 9.5%) accuracy classifying 1 of 20k ImageNet items
[Quoc Le, et al., ICML, 2012]
Real images that most excite the feature:
Image synthesized to most excite the feature:
Restricted Boltzmann Machines
RBM is a Markov Random Field with:
• Stochas4c binary hidden variables • Bipar4te connec4ons.
Pair-‐wise Unary
• Stochas4c binary visible variables
Markov random fields, Boltzmann machines, log-‐linear models.
Image visible variables
hidden variables Graphical Models: Powerful framework for represen4ng dependency structure between random variables.
Feature Detectors
[Courtesy, R. Salakhutdinov]
Model Learning
Image visible units
Hidden units
Given a set of i.i.d. training examples , we want to learn
model parameters .
Maximize log-‐likelihood objec4ve:
Deriva4ve of the log-‐likelihood:
[Courtesy, R. Salakhutdinov]
Image
Low-‐level features: Edges
Input: Pixels
Built from unlabeled inputs.
Deep Boltzmann Machines
(Salakhutdinov & Hinton, Neural Computation 2012)[Courtesy, R. Salakhutdinov]
Image
Higher-‐level features: Combina4on of edges
Low-‐level features: Edges
Input: Pixels
Built from unlabeled inputs.
Deep Boltzmann Machines
Learn simpler representa4ons, then compose more complex ones
(Salakhutdinov 2008, Salakhutdinov & Hinton 2012)[Courtesy, R. Salakhutdinov]
Model Formula4on
h3
h2
h1
v
W3
W2
W1
Input
Same as RBMs
requires approximate inference to train, but it can be done… and scales to millions of examples
[Courtesy, R. Salakhutdinov]
Samples Generated by the Model Model-‐Generated Samples
Data
[Courtesy, R. Salakhutdinov]
Training Data
Handwri4ng Recogni4on
Learning Algorithm Error
Logis4c regression 12.0% K-‐NN 3.09% Neural Net (Pla_ 2005) 1.53% SVM (Decoste et.al. 2002) 1.40% Deep Autoencoder (Bengio et. al. 2007)
1.40%
Deep Belief Net (Hinton et. al. 2006)
1.20%
DBM 0.95%
Learning Algorithm Error
Logis4c regression 22.14% K-‐NN 18.92% Neural Net 14.62% SVM (Larochelle et.al. 2009) 9.70% Deep Autoencoder (Bengio et. al. 2007)
10.05%
Deep Belief Net (Larochelle et. al. 2009)
9.68%
DBM 8.40%
MNIST Dataset Op4cal Character Recogni4on 60,000 examples of 10 digits 42,152 examples of 26 English le_ers
Permuta4on-‐invariant version.
[Courtesy, R. Salakhutdinov]
3-‐D object Recogni4on Learning Algorithm Error Logis4c regression 22.5% K-‐NN (LeCun 2004) 18.92% SVM (Bengio & LeCun 2007) 11.6% Deep Belief Net (Nair & Hinton 2009)
9.0%
DBM 7.2%
Pa_ern Comple4on
NORB Dataset: 24,000 examples
[Courtesy, R. Salakhutdinov]
Learning Shared Representa4ons Across Sensory Modali4es
“Concept”
sunset, pacific ocean, baker beach, seashore,
ocean
[Courtesy, R. Salakhutdinov]
0 0 1 0
0 Dense, real-‐valued image features
Gaussian model Replicated Sojmax
Mul4modal DBM
Word counts
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014) [Courtesy, R. Salakhutdinov]
Mul4modal DBM
0 0 1 0
0 Dense, real-‐valued image features
Gaussian model Replicated Sojmax
Word counts
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014) [Courtesy, R. Salakhutdinov]
Gaussian model Replicated Sojmax
0 0 1 0
0
Mul4modal DBM
Word counts
Dense, real-‐valued image features
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014) [Courtesy, R. Salakhutdinov]
0 0 1 0
0 Dense, real-‐valued image features
Word counts
Gaussian model Replicated Sojmax
Mul4modal DBM
Bo_om-‐up +
Top-‐down
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014) [Courtesy, R. Salakhutdinov]
0 0 1 0
0 Dense, real-‐valued image features
Word counts
Gaussian model Replicated Sojmax
Mul4modal DBM
Bo_om-‐up +
Top-‐down
(Srivastava & Salakhutdinov, NIPS 2012, JMLR 2014) [Courtesy, R. Salakhutdinov]
Text Generated from Images
canada, nature, sunrise, ontario, fog, mist, bc, morning
insect, bu_erfly, insects, bug, bu_erflies, lepidoptera
graffi4, streetart, stencil, s4cker, urbanart, graff, sanfrancisco
portrait, child, kid, ritra_o, kids, children, boy, cute, boys, italy
dog, cat, pet, ki_en, puppy, ginger, tongue, ki_y, dogs, furry
sea, france, boat, mer, beach, river, bretagne, plage, bri_any
Given
Generated Given
Generated
[Courtesy, R. Salakhutdinov]
Text Generated from Images Given
Generated
water, glass, beer, bo_le, drink, wine, bubbles, splash, drops, drop
portrait, women, army, soldier, mother, postcard, soldiers
obama, barackobama, elec4on, poli4cs, president, hope, change, sanfrancisco, conven4on, rally
Images Selected from Text
water, red, sunset
nature, flower, red, green
blue, green, yellow, colors
chocolate, cake
Given
Retrieved
[Courtesy, R. Salakhutdinov]
Summary • Efficient learning algorithms for Deep Learning Models. Learning
more adap4ve, robust, and structured representa4ons.
• Deep models improve the current state-‐of-‐the art in many applica4on domains: Ø Object recogni4on and detec4on, text and image retrieval, handwri_en
character and speech recogni4on, and others.
HMM decoder
Speech RecogniGon
sunset, pacific ocean, beach, seashore
MulGmodal Data
CapGon GeneraGon
Text & image retrieval / Object recogniGon
Learning a Category Hierarchy
mosque, tower, building, cathedral, dome, castle
Image Tagging
[Courtesy, R. Salakhutdinov]
Top Related