Deep Learning for AI
Transcript of a talk by Yoshua Bengio at the DS3 Data Science Summer School, August 28th, 2017.
Deep Learning for AI
Yoshua Bengio
August 28th, 2017 @ DS3
Data Science Summer School
A new revolution seems to be in the works after the industrial revolution. And Machine Learning, especially Deep Learning, is at the epicenter of this revolution.
Deep Learning Breakthroughs
Computers have made huge strides in perception, manipulating language, playing games, reasoning, ...
Intelligence Needs Knowledge
• Learning: powerful way to transfer knowledge to intelligent agents
• Failure of classical AI: a lot of knowledge is intuitive
• Solution: get knowledge from data & experience
Machine Learning, AI & No Free Lunch
• Five key ingredients for ML towards AI:
1. Lots & lots of data
2. Very flexible models
3. Enough computing power
4. Computationally efficient inference
5. Powerful priors that can defeat the curse of dimensionality
Bypassing the curse of dimensionality
We need to build compositionality into our ML models
Just as human languages exploit compositionality to give representations and meanings to complex ideas
Exploiting compositionality can give an exponential gain in representational power
Distributed representations / embeddings: feature learning
Deep architecture: multiple levels of feature learning
Prior assumption: compositionality is useful to describe the world around us efficiently
Distributed Representations: The Power of Compositionality – Part 1
• Distributed (possibly sparse) representations, learned from data, can capture the meaning of the data and state
• Parallel composition of features: can be exponentially advantageous
[Figure: distributed vs. non-distributed representations]
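A toy illustration of the claimed exponential advantage (a sketch, not from the slides): n binary features can jointly distinguish up to 2^n regions of input space, while a non-distributed one-hot code with the same n units distinguishes only n regions.

```python
from itertools import product

# n binary features compose in parallel: every combination of feature
# values is a distinct representation, so n units yield 2**n codes.
n = 10
distributed_codes = list(product([0, 1], repeat=n))
print(len(distributed_codes))  # 1024 regions from only 10 features

# A one-hot (non-distributed) code with the same 10 units can only
# name 10 mutually exclusive regions.
one_hot_codes = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
print(len(one_hot_codes))  # 10
```

The gap (1024 vs. 10) grows exponentially with n, which is the "parallel composition of features" advantage the slide refers to.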
Deep Representations: The Power of Compositionality – Part 2
• Learned function seen as a composition of simpler operations, e.g. inspired by neural computation
• Hierarchy of features, concepts, leading to more abstract factors enabling better generalization
• Again, theory shows this can be exponentially advantageous
Why multiple layers? The world is compositional
Anything New with Deep Learning since the Neural Nets of the 90s?
• Rectified linear units instead of sigmoids, enable training much deeper networks by backprop (Glorot & Bengio AISTATS 2011)
• Some forms of noise (like dropout) are powerful regularizers yielding superior generalization abilities
• Success of deep convnets trained on large labeled image datasets
• Success of recurrent nets with more memory, with gating units
• Attention mechanisms liberate neural nets from fixed-size inputs
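The first point above can be made concrete with a small sketch (toy numbers, not from the talk): backprop multiplies one local derivative per layer, a sigmoid's derivative is at most 0.25, so the product shrinks geometrically with depth, while an active ReLU passes the gradient through unchanged.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # at most 0.25, reached at x = 0

def relu_grad(x):
    return 1.0 if x > 0 else 0.0

# Chain rule through 20 layers: multiply one local derivative per layer.
depth = 20
sig_chain = 1.0
relu_chain = 1.0
for _ in range(depth):
    sig_chain *= sigmoid_grad(0.0)   # 0.25 even at the steepest point
    relu_chain *= relu_grad(1.0)     # 1.0 on the active side

print(sig_chain)   # ~9.1e-13: the gradient has effectively vanished
print(relu_chain)  # 1.0
```

This is why replacing sigmoids with rectifiers made much deeper networks trainable by backprop.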
What’s New with Deep Learning?
• Progress in unsupervised generative neural nets allows them to synthesize a diversity of images, sounds and text, imitating unlabeled images, sounds or text
[Figure: GAN architecture. A generator network maps a random vector to a fake image; a discriminator network tries to tell fake images from real images drawn from the training set.]
GANs (NIPS’2014)
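The adversarial game in the figure can be sketched with its two losses (toy probabilities for illustration; real GANs train two neural nets against each other):

```python
import math

# The discriminator D outputs a probability that its input is real.
def discriminator_loss(d_real, d_fake):
    # D wants d_real -> 1 and d_fake -> 0 (binary cross-entropy).
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def generator_loss(d_fake):
    # G wants D to call its samples real (d_fake -> 1); this is the
    # common "non-saturating" form -log D(G(z)).
    return -math.log(d_fake)

# Early in training the discriminator wins easily ...
print(discriminator_loss(d_real=0.9, d_fake=0.1))  # small: D is confident
print(generator_loss(d_fake=0.1))                  # large: G is losing
# ... at the ideal equilibrium D is maximally confused (0.5 everywhere).
print(discriminator_loss(d_real=0.5, d_fake=0.5))  # 2*log 2, about 1.386
print(generator_loss(d_fake=0.5))                  # log 2, about 0.693
```

Training alternates gradient steps that lower each player's loss; when the generator matches the data distribution, the discriminator can do no better than chance.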
What’s New with Deep Learning?
• Incorporating the idea of attention, using GATING units, has unlocked a breakthrough in machine translation:
Neural Machine Translation
• Now in Google Translate:
[Figure: attention for translation. A softmax over lower-level locations, conditioned on context at lower and higher locations, connects the lower-level and higher-level representations. A human-evaluation chart compares human translation, n-gram translation, and current neural-net translation.]
(ICLR’2015)
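The "softmax over lower locations" in the figure can be sketched as a minimal attention step (hypothetical toy vectors; real models learn the queries, keys, and values):

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    # Score each location by dot product with the query, normalize the
    # scores with a softmax, then return the weighted sum of the values.
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

# Three "source word" vectors; the query matches the second key, so the
# output is dominated by the second value.
keys = [[1.0, 0.0], [0.0, 5.0], [1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [10.0, 0.0]]
query = [0.0, 1.0]
print(attend(query, keys, values))
```

Because the softmax weights depend on the content rather than on a fixed position, the same mechanism works for input sequences of any length, which is the "liberation from fixed-size inputs" mentioned earlier.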
What’s New with Deep Learning?
• Attention has also opened the door to neural nets which can write to and read from a memory
• Two systems:
  • Cortex-like (state controller and representations): System 1, intuition, fast heuristic answers
  • Hippocampus-like (memory) + prefrontal cortex: System 2, slow, logical, sequential
• Memory-augmented networks gave rise to:
  • Systems which reason, sequentially combining several selected pieces of information (from the memory) in order to obtain a conclusion
  • Systems which answer questions, accessing relevant facts and combining them
We are starting to better understand why deep learning is working
• Generalization:
• Distributed representations: (up to) exponential statistical advantage, if the world is compositional
• Depth, multiple layers: similar story, on top
• Optimization: MYTHS BUSTED
• Non-convexity & local minima of the objective function: not a curse
• Stochastic gradient descent is very efficient
• Additional human-inspired tricks: curriculum learning (ICML’2009)
(NIPS’2014, ICLR’2014)
Still Far from Human-Level AI
• Industrial successes mostly based on supervised learning
• Learning superficial clues, not generalizing well outside of training contexts; easy to fool trained networks: current models cheat by picking up on surface regularities
• Still unable to discover higher-level abstractions at multiple time scales, very long-term dependencies
• Still relying heavily on smooth differentiable predictors (using backprop, the workhorse of deep learning)
Humans outperform machines at unsupervised learning
• Humans are very good at unsupervised learning, e.g. a 2-year-old knows intuitive physics
• Babies construct an approximate but sufficiently reliable model of physics. How do they manage that? Note that they interact with the world, not just observe it.
Latent Variables and Abstract Representations
• Encoder/decoder view: maps between low & high levels
• Encoder does inference: interpret the data at the abstract level
• Decoder can generate new configurations
• Encoder flattens and disentangles the data manifold
[Figure: the encoder Q(h|x) maps the data space to an abstract representation space with prior P(h); the decoder P(x|h) maps back to the data space.]
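A hand-built toy version of this encoder/decoder view (a sketch under strong assumptions; real systems learn both maps): 2-D data lying on the line y = x is a 1-D manifold, the encoder flattens it into a 1-D code h, and the decoder generates data-space configurations from codes.

```python
# Toy data manifold: points (a, a) on the line y = x in 2-D data space.
def encoder(x):
    # Inference: interpret the point at the abstract level, i.e. its
    # single coordinate along the manifold.
    return (x[0] + x[1]) / 2.0

def decoder(h):
    # Generation: map an abstract code back to a data-space configuration.
    return (h, h)

point = (3.0, 3.0)           # a point on the data manifold
code = encoder(point)        # abstract representation: 3.0
print(decoder(code))         # reconstructs (3.0, 3.0)
print(decoder(7.5))          # generates a new on-manifold point (7.5, 7.5)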
Maps Between Representations
x and y represent different modalities, e.g., image, text, sound...
Can provide zero-shot generalization to new categories (values of y)
(Larochelle et al., AAAI 2008)
Convolutional GANs
Strided convolutions, batch normalization, only convolutional layers, ReLU and leaky ReLU
(Radford et al., arXiv:1511.06434)
GAN: Interpolating in Latent Space
If the model is good (unfolds the manifold), interpolating between latent values yields plausible images.
Figure 7 (from a paper under review at ICLR 2016): Vector arithmetic for visual concepts. For each column, the Z vectors of samples are averaged. Arithmetic was then performed on the mean vectors, creating a new vector Y. The center sample on the right-hand side is produced by feeding Y as input to the generator. To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale ±0.25 was added to Y to produce the 8 other samples. Applying arithmetic in the input space (bottom two examples) results in noisy overlap due to misalignment.
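The interpolation itself is simple (toy latent vectors; in a real GAN each interpolated z would be decoded by the learned generator G):

```python
def lerp(z0, z1, t):
    # Linear interpolation between two latent vectors, t in [0, 1].
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]

z_start = [0.0, 2.0, -1.0]
z_end = [4.0, 0.0, 1.0]
# Decoding each interpolated z with the generator yields a sequence of
# plausible images morphing from G(z_start) to G(z_end).
path = [lerp(z_start, z_end, t / 4.0) for t in range(5)]
print(path[0])  # [0.0, 2.0, -1.0]
print(path[2])  # the midpoint [2.0, 1.0, 0.0]
print(path[4])  # [4.0, 0.0, 1.0]
```

That intermediate codes decode to plausible images, rather than blends of pixels, is the evidence that the model has unfolded the data manifold into a flat latent space.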
Combining Iterative Sampling from Denoising Auto-Encoders with GAN
Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space
Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, Jeff Clune
(submitted to CVPR 2017) arXiv:1612.00005
[Figure: 227 x 227 generated images of the ImageNet category "volcano".]
(cheating a bit by using lots of labeled data during training)
Plug & Play Generative Networks
[Figure: high-resolution 227 x 227 samples for the categories bird, ant, lemon, and volcano.]
What’s Missing
• More autonomous learning, better unsupervised learning
• Discovering the underlying causal factors
• Model-based RL which extends to completely new situations by unrolling powerful predictive models which can help reason about rarely observed dangerous states
• Sufficient computational power for models large enough to capture human-level knowledge
What’s Missing
• Autonomously discovering multiple time scales to handle very long-term dependencies
• Actually understanding language (which also solves generation), requiring enough world knowledge / common sense
• Neural nets which really understand the notions of object, agent, action, etc.
• Large-scale knowledge representation allowing one-shot learning as well as discovering new abstractions and explanations by ‘compiling’ previous observations
Acting to Guide Representation Learning
• What is a good latent representation?
• Disentangling the underlying factors of representation so that computers make sense of the world
• Some factors (e.g. objects) correspond to ‘independently controllable’ aspects of the world
• Can only be discovered by acting in the world
The Future of Deep AI
• Scientific progress is slow and continuous, but social and economic impact can be disruptive
• Many fundamental research questions are in front of us, with much uncertainty about when we will crack them, but we will
• Importance of continued investment in basic & exploratory AI research, for both short-term practical reasons (such as recruitment) and long-term reasons
• Let us continue to keep the field open and fluid, be mindful of social impacts, and make sure AI will bloom for the benefit of all
Montreal Institute for Learning Algorithms