Deep Learning: Back To The Future

Transcript of lecture slides, based (more or less) on Hinton's NIPS 2012 talk.

Hinton NIPS 2012 Talk Slide (More Or Less)

What was hot in 1987: neural networks

What happened in ML since 1987: computers got faster, and larger data sets became available

What is hot 25 years later: neural networks

… but they are informed by graphical models!


Brief History Of Machine Learning

1960s Perceptrons

1969 Minsky & Papert book

1985-1995 Neural Nets and Back Propagation

1995- Support-Vector Machines

2000- Bayesian Models

2013- Deep Networks


What My Lecture Looked Like In 1987


The Limitations Of Two Layer Networks

Many problems can’t be learned without a layer of intermediate or hidden units.

Problem: where does the training signal come from? The teacher specifies target outputs, not target hidden unit activities.

If you could learn input->hidden and hidden->output connections, you could learn new representations!

But how do hidden units get an error signal?
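Backpropagation's answer can be made concrete with a small sketch (my own illustration, not code from the lecture): the hidden units' error signal is the output error, carried backward through the hidden->output weights by the chain rule. Here a one-hidden-layer net learns XOR, a problem that no network without hidden units can solve:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: the classic problem a two-layer (no-hidden-layer) network cannot learn.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(0.0, 1.0, (2, 4)); b1 = np.zeros(4)  # input -> hidden
W2 = rng.normal(0.0, 1.0, (4, 1)); b2 = np.zeros(1)  # hidden -> output

def mse():
    h = sigmoid(X @ W1 + b1)
    return float(((sigmoid(h @ W2 + b2) - T) ** 2).mean())

mse_before = mse()
lr = 1.0
for _ in range(5000):
    h = sigmoid(X @ W1 + b1)                      # hidden activities
    y = sigmoid(h @ W2 + b2)                      # output
    delta_out = (y - T) * y * (1 - y)             # teacher-given output error
    delta_hid = (delta_out @ W2.T) * h * (1 - h)  # hidden error via chain rule
    W2 -= lr * (h.T @ delta_out); b2 -= lr * delta_out.sum(0)
    W1 -= lr * (X.T @ delta_hid); b1 -= lr * delta_hid.sum(0)
mse_after = mse()
print(mse_before, mse_after)
```

The key line is `delta_hid`: the hidden units never see a target, only the output error redistributed through `W2`.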


Why Stop At One Hidden Layer?

E.g., vision hierarchy for recognizing handprinted text

Word output layer

Character hidden layer 3

Stroke hidden layer 2

Edge hidden layer 1

Pixel input layer


Demos

Yann LeCun’s LeNet5: http://yann.lecun.com/exdb/lenet/index.html


Why Deeply Layered Networks Fail

Credit assignment problem: how is a neuron in layer 2 supposed to know what it should output until all the neurons above it do something sensible? And how is a neuron in layer 4 supposed to know what it should output until all the neurons below it do something sensible?

Mathematical manifestation: error gradients get squashed as they are passed back through a deep network.
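The squashing can be seen numerically in a toy sketch (my own, with weights I chose arbitrarily): the logistic sigmoid's derivative is at most 0.25, so a gradient passed back through many sigmoid layers is multiplied by a small factor per layer and shrinks roughly geometrically.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = 0.5
grad = 1.0                        # gradient arriving at the top layer
for layer in range(10):
    w = rng.normal(0.0, 1.0)      # one weight standing in for a "layer"
    a = sigmoid(w * x)
    grad *= w * a * (1 - a)       # chain rule: local derivative <= |w| / 4
    x = a
print(abs(grad))                  # tiny after only 10 layers
```

With unit-scale weights, each factor is bounded by |w|/4, so ten layers already crush the signal reaching the bottom.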


Solution

Traditional method of training: random initial weights.

Alternative: do unsupervised learning layer by layer to get weights in a sensible configuration for the statistics of the input. Then, when the net is trained in a supervised fashion, credit assignment will be easier.


Autoencoder Networks: A Self-Supervised Training Procedure

Given a set of input vectors (no target outputs)

Map input back to itself via a hidden layer bottleneck

How to achieve bottleneck?

Fewer neurons

Sparsity constraint

Information transmission constraint (e.g., add noise to unit, or shut off randomly, a.k.a. dropout)
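The "fewer neurons" bottleneck can be sketched in a few lines (an illustration of mine, not the lecture's code): 8-dimensional inputs are squeezed through a 3-unit hidden layer and trained to reproduce themselves. The data are built to lie on a 3-dimensional subspace so the bottleneck has a chance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 8-d data lying on a 3-d subspace.
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 8))
X = latent @ mixing

W_enc = rng.normal(0, 0.1, (8, 3))   # encoder: input -> bottleneck
W_dec = rng.normal(0, 0.1, (3, 8))   # decoder: bottleneck -> reconstruction

def recon_error():
    return float((((X @ W_enc) @ W_dec - X) ** 2).mean())

err_before = recon_error()
lr = 0.01
for _ in range(2000):
    H = X @ W_enc                    # bottleneck code (linear, for simplicity)
    R = H @ W_dec                    # "map input back to itself"
    E = (R - X) / len(X)             # mean reconstruction error
    W_dec -= lr * (H.T @ E)
    W_enc -= lr * (X.T @ (E @ W_dec.T))
err_after = recon_error()
print(err_before, err_after)
```

No target outputs are ever supplied: the input is its own teacher, which is what makes the procedure self-supervised.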


Autoencoder Combines An Encoder And A Decoder

Encoder

Decoder


Stacked Autoencoders

Note that decoders can be stacked to produce a generative model of the domain

[Diagram: autoencoders stacked, with each layer's code copied up as the next layer's input, forming a deep network]
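The greedy layer-by-layer recipe can be sketched as follows (my own minimal version, with sizes and data invented for illustration): train autoencoder 1 on the data, train autoencoder 2 on autoencoder 1's hidden codes, then stack the encoders to initialize a deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.1, steps=500):
    """One-hidden-layer autoencoder; returns the learned encoder weights."""
    n_in = X.shape[1]
    W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
    W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
    for _ in range(steps):
        H = sigmoid(X @ W_enc)
        R = H @ W_dec                          # linear reconstruction
        E = (R - X) / len(X)
        grad_H = (E @ W_dec.T) * H * (1 - H)   # backprop through the sigmoid
        W_dec -= lr * (H.T @ E)
        W_enc -= lr * (X.T @ grad_H)
    return W_enc

X = rng.normal(size=(100, 16))
W1 = train_autoencoder(X, 8)       # layer 1: trained on the raw input
H1 = sigmoid(X @ W1)
W2 = train_autoencoder(H1, 4)      # layer 2: trained on layer 1's codes
H2 = sigmoid(H1 @ W2)
print(H2.shape)                    # codes produced by the stacked encoders
```

After this unsupervised stage, `W1` and `W2` would initialize a deep net for supervised fine-tuning, per the "Solution" slide above.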


Neural Net Can Be Viewed As A Graphical Model

Deterministic neuron

Stochastic neuron

[Diagram: inputs x1, x2, x3, x4 feeding a unit y]
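The contrast between the two neuron types can be sketched directly (weights and inputs below are arbitrary, chosen for illustration): a deterministic sigmoid unit outputs the probability itself, while a stochastic (logistic) unit emits a binary sample with that probability.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 0.0, 1.0, 1.0])     # inputs x1..x4
w = np.array([0.5, -1.0, 0.25, 0.75])  # arbitrary weights
b = -0.5

p = sigmoid(w @ x + b)    # deterministic neuron: y = sigma(w.x + b)
y = rng.random() < p      # stochastic neuron: y = 1 with probability p

print(round(p, 3), bool(y))
```

Viewed as a graphical model, the stochastic version makes y a Bernoulli random variable conditioned on its parents x1..x4.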


Boltzmann Machine (Hinton & Sejnowski, circa 1985)

Undirected graphical model

Each node is a stochastic neuron

Potential function defined on each pair of neurons

Algorithms were developed for doing inference for special cases of the architecture.

E.g., Restricted Boltzmann Machine:

2 layers

Completely interconnected between layers

No connections within a layer


Punch Line

Deep network can be implemented as a multilayer restricted Boltzmann machine

Sequential layer-to-layer training procedure

Training requires probabilistic inference

Update rule: ‘contrastive divergence’

Different research groups prefer different neural substrates, but it doesn’t really matter whether you use a deterministic neural net or an RBM.
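One step of the contrastive divergence update (CD-1) can be sketched as follows — a compact illustration of mine with biases omitted for brevity, not code from the talk. Sample hidden units from the data, reconstruct the visibles, resample the hiddens, and move the weights toward the data statistics and away from the reconstruction statistics.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_vis, n_hid = 6, 3
W = rng.normal(0, 0.1, (n_vis, n_hid))
v0 = rng.integers(0, 2, size=(10, n_vis)).astype(float)  # batch of binary data

lr = 0.1
# --- one CD-1 update ---
ph0 = sigmoid(v0 @ W)                     # P(h=1 | v0): positive phase
h0 = (rng.random(ph0.shape) < ph0) * 1.0  # sample hidden units
pv1 = sigmoid(h0 @ W.T)                   # reconstruction probabilities
v1 = (rng.random(pv1.shape) < pv1) * 1.0  # sample reconstructed visibles
ph1 = sigmoid(v1 @ W)                     # hidden probs for the reconstruction

# data correlations minus reconstruction correlations
W += lr * ((v0.T @ ph0) - (v1.T @ ph1)) / len(v0)
print(W.shape)
```

Stacking such RBMs and training them one layer at a time is the probabilistic counterpart of the stacked-autoencoder procedure earlier in the deck.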


From Ng’s group


Sutskever, Martens, Hinton (2011): Generating Text From A Deep Belief Net

Wikipedia: The meaning of life is the tradition of the ancient human reproduction: it is less favorable to the good boy for when to remove her bigger. In the show’s agreement unanimously resurfaced. The wild pasteured with consistent street forests were incorporated by the 15th century BE. In 1996 the primary rapford undergoes an effort that the reserve conditioning, written into Jewish cities, sleepers to incorporate the .St Eurasia that activates the population. María Nationale, Kelli, Zedlat-Dukastoe, Florendon, Ptu’s thought is. To adapt in most parts of North America, the dynamic fairy Dan please believes, the free speech are much related to the

NYT: while he was giving attention to the second advantage of school building a 2-for-2 stool killed by the Cultures saddled with a half- suit defending the Bharatiya Fernall ’s office . Ms . Claire Parters will also have a history temple for him to raise jobs until naked Prodiena to paint baseball partners , provided people to ride both of Manhattan in 1978 , but what was largely directed to China in 1946 , focusing on the trademark period is the sailboat yesterday and comments on whom they obtain overheard within the 120th anniversary , where many civil rights defined , officials said early that forms , ” said Bernard J. Marco Jr. of Pennsylvania , was monitoring New York


2013 News

No need to use unsupervised training or probabilistic models if…

You use clever tricks of the neural net trade, i.e.,

Back propagation with:

deep networks

rectified linear units

dropout

weight maxima (max-norm constraints)
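Two of the "tricks of the trade" named above can be sketched in a few lines (my own illustration, with invented shapes): a rectified linear hidden layer, and inverted dropout applied to it during training.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def dropout(h, p_drop, training=True):
    """Randomly zero units with probability p_drop; rescale survivors so the
    expected activation is unchanged (inverted dropout)."""
    if not training:
        return h
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

x = rng.normal(size=(5, 10))
W = rng.normal(0, 0.1, (10, 8))
h = relu(x @ W)              # rectified linear hidden layer
h_train = dropout(h, 0.5)    # roughly half the units silenced this pass
print(h.shape)
```

Because the mask is resampled on every pass, no hidden unit can rely on any particular co-conspirator being present, which is dropout's regularizing effect.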


Krizhevsky, Sutskever, & Hinton

ImageNet competition

15M images in 22k categories

For contest, 1.2M images in 1k categories

Classification: can you name the object in 5 guesses?



2012 Results

2013: Down to 11% error