On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big...
Transcript of On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big...
![Page 1: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/1.jpg)
On Deep LearningAn Overview
João Pereira
October 2019PDEng Data Science Talks
![Page 2: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/2.jpg)
Outline – Part 1
• Introduction to deep learning• Types of learning• Historical Perspective• Neural networks• Training Neural Networks• Optimization Challenges• Regularization strategies• Generalization
João Pereira 2
15/10/2019
![Page 3: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/3.jpg)
Outline – Part 2
• Convolutional Neural Networks (CNNs)• Recurrent Neural Networks (RNNs)
• Vanilla RNNs• LSTMs and GRUs
• Sequence to Sequence Models (Seq2Seq)• Attention Mechanisms• Applications
João Pereira 3
22/10/2019
![Page 4: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/4.jpg)
Part 1
João Pereira 4
![Page 5: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/5.jpg)
Big Picture
João Pereira 5
Machine Learning
Artificial Intelligence
Neural Networks
Deep Learning
![Page 6: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/6.jpg)
Learning from data
João Pereira 6
![Page 7: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/7.jpg)
Inputs and Outputs
• Inputs• Words, sentences• Images, videos• Observations, time series• Voice
• Outputs• Translation• Class label
João Pereira 7
![Page 8: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/8.jpg)
What is learning?
• Learning or training refers to estimate the parameters of a model.
João Pereira 8
Training Validation TestDataset
![Page 9: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/9.jpg)
Machine learning is about generalizing to unseen data.
João Pereira 9
![Page 10: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/10.jpg)
What data is out there?
João Pereira 10
Unlabeled data
Labeled data
Cost
Availability
Abundant
Accessible
Scarce
Expensive
![Page 11: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/11.jpg)
Data in the PDEng Data Science
João Pereira 11
Unlabeled data
Labeled data
Cost
Availability
Abundant
Accessible
Scarce
Expensive
![Page 12: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/12.jpg)
Types of learning
• Supervised learning• Unsupervised learning• Reinforcement learning
João Pereira 12
![Page 13: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/13.jpg)
LeCun’s Cake Analogy
João Pereira 13
![Page 14: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/14.jpg)
Supervised Learning
• Given a dataset D of inputs x and labeled targets y, learn to predict y from x.
• Most successful paradigm in deep learning.
João Pereira 14
ModelDogCat
Dog Cat
![Page 15: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/15.jpg)
Unsupervised Learning
• Given only the inputs , models and find
João Pereira 15
Model
PatternsStructureClusters
Generation
![Page 16: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/16.jpg)
Why Did Deep Learning Become So Popular?
Important breakthroughs in AI-class of problems
• Speech recognition• Image classification• Machine translation• Chatbots• Self-driving cars• Playing games
João Pereira 16
![Page 17: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/17.jpg)
Why Now?
Among other reasons:• Fifty years of AI research in machine learning• Explosion of convenient compute power
(e.g., GPUs, TPUs)• Lots of (labeled) data• New software and tools• Strong investments and interest of large organizations
João Pereira 17
![Page 19: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/19.jpg)
João Pereira 19
Yann LeCunNew York University
Facebook AI Research
Turing Award
Yoshua BengioUniversité de Montreal
Geoffrey HintonUniversity of Toronto
Google Brain
Deep Learning Heroes
![Page 20: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/20.jpg)
Why Deep Learning?
João Pereira 20
Input OutputFeature Extraction Classification
DuckNot a Duck
Classifier
Input Output
DuckNot a Duck
Traditional Machine Learning
Deep Learning
Feature Extraction + Classification
Deep Neural Network
![Page 21: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/21.jpg)
Why Deep Learning?
• Hand-crafting features is time consuming and not scalable.• We can learn the underlying features from data, automatically.
João Pereira 21
Zeiler & Fergus, 2013Example by Yann LeCun
Deep Neural Network
![Page 22: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/22.jpg)
Deep learning is about representation learning!Learning good features for downstream tasks (regression, classification, …)Hand-crafting vs learning features
João Pereira 22
![Page 23: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/23.jpg)
Historical Perspective
João Pereira 23
RosenblattPerceptron
1957
Minsky & PapertAI Winter begins CNNs
19891986
Backpropagation LSTMs
1995 1997
SVMs AEs
2006 2012
ImageNetContest
1969
VAEsGANs
2014
![Page 24: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/24.jpg)
PerceptronStructural building block of deep learning
João Pereira 24
![Page 25: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/25.jpg)
The Perceptron
João Pereira 25
Input Weights Sum Non-Linearity Output
OutputPrediction Non-linearity Bias Inputs Weights
Rosenblatt, 1958
![Page 26: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/26.jpg)
The Perceptron
João Pereira 24
Input Weights Sum Non-Linearity Output
In matrix form:
![Page 27: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/27.jpg)
Activation FunctionsSigmoidHyperbolic TangentRectified Linear UnitSoftmax
João Pereira 27
![Page 28: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/28.jpg)
Activation Functions
João Pereira 28
Recap…Activation function
![Page 29: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/29.jpg)
Activation Functions
• The goal of the activation functions is to introduce non-linearities into the network
• The are many of these functions• They are all non-linear
João Pereira 29
![Page 30: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/30.jpg)
Activation Functions
João Pereira 30
Rectified Linear UnitReLU Sigmoid
1
tf.keras.activations.sigmoid(x)tf.keras.activations.relu(x)
Hyperbolic Tangent
1
tf.keras.activations.tanh(x)
![Page 31: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/31.jpg)
Activation Functions: Softmax
João Pereira 31
Neural Network
TargetsProbabilitiesCat
Dog
Output
Softmax
tf.keras.activations.softmax(x)
![Page 32: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/32.jpg)
Building Neural Networks with Perceptrons
João Pereira 32
![Page 33: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/33.jpg)
Single-Layer Neural Network
João Pereira 33
Output LayerInput Layer Hidden Layer
![Page 34: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/34.jpg)
Properties
Neural networks are universal function approximators
Universal Approximation Theorem
A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of Rn, under mild assumptions on the activation function.
João Pereira 34
Cybenko, 1989
![Page 35: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/35.jpg)
Properties
Neural networks are universal function approximators
Universal Approximation Theorem
A feed-forward network with a single hidden layer containing a finite number of neurons can approximate continuous functions on compact subsets of Rn, under mild assumptions on the activation function.
João Pereira 35
Cybenko, 1989
In a nutshell:They can approximate many functions up to any precision.
![Page 36: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/36.jpg)
Training Neural NetworksLoss FunctionsGradient DescentStochastic Gradient Descent
João Pereira 36
![Page 37: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/37.jpg)
Example: Wine Quality
João Pereira 37
Volatile Acidity
Alcohol
Good
Not Good
![Page 38: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/38.jpg)
Example: Wine Quality
• How to quantify prediction error?
João Pereira 38
Prediction Target
Good (1)
Not Good (0)
![Page 39: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/39.jpg)
Loss
João Pereira 39
Prediction Target
The loss measures the cost due to wrong predictions.
![Page 40: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/40.jpg)
Loss
João Pereira 40
Target
This is the empirical loss.Aka
Objective functionCost functionEmpirical risk
Prediction Dataset
![Page 41: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/41.jpg)
Binary Cross Entropy Loss
• Used in classification problems with models that output a probability between 0 and 1.
João Pereira 41tf.keras.losses.BinaryCrossentropy()
Prediction Target
![Page 42: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/42.jpg)
Mean Squared Error Loss
• Used in regression problems with models that output continuous real numbers.
João Pereira 42
Prediction Target
tf.keras.losses.MSE()
Wine Quality [%]
![Page 43: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/43.jpg)
Unsupervised Deep Learning
João Pereira 43
![Page 44: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/44.jpg)
Autoencoders
• Most used model for representation learning.
João Pereira 44
Encoder Decoder
Unsupervised Learning
Latent space
![Page 45: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/45.jpg)
Autoencoders
João Pereira 45
Reducing the Dimensionality of Data with Neural Networks, Hinton & Salakhutdinov, 2006
![Page 46: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/46.jpg)
An autoencoder with linear activations does principal component analysis!
João Pereira 46
![Page 47: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/47.jpg)
Unsupervised Representation Learning
Tip #1: Always “look” at your data before designing your model!
João Pereira 47
Based on Marc’Aurelio Ranzato´s (Facebook AI Research) tutorial on Unsupervised Deep Learning, NeurIPS’18
Mean & Covariance analysisPCA (check eigenvalue decay)t-sne visualization
![Page 48: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/48.jpg)
Unsupervised Representation Learning
Tip #1: Always “look” at your data before designing your model!
Learned features may be useful for down-stream tasks.(e.g., classification, regression, …)
João Pereira 48
Based on Marc’Aurelio Ranzato´s (Facebook AI Research) tutorial on Unsupervised Deep Learning, NeurIPS’18
Mean & Covariance analysisPCA (check eigenvalue decay)t-sne visualization
Representation learning using unsupervised learning
![Page 49: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/49.jpg)
Unsupervised Representation Learning
Tip #2: PCA and K-Means are very often a strong baseline.
João Pereira 49
Based on Marc’Aurelio Ranzato´s (Facebook AI Research) tutorial on Unsupervised Deep Learning, NeurIPS’18
![Page 50: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/50.jpg)
Coffee Break16:14 + 15min
João Pereira 50
![Page 51: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/51.jpg)
Optimizing the Loss
We want to find the neural network weights that achieve the lowest loss
Or simply
João Pereira 51
Weights
Training Examples
Based on first class of MIT 6.S191 by Alexander Amini
![Page 52: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/52.jpg)
Optimizing the Loss
João Pereira 52
Random initialization
![Page 53: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/53.jpg)
Optimizing the Loss
João Pereira 53
Compute gradient
![Page 54: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/54.jpg)
Optimizing the Loss
João Pereira 54
Compute gradient
![Page 55: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/55.jpg)
Optimizing the Loss
João Pereira 55
Take a small step in the opposite direction of the gradient
![Page 56: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/56.jpg)
“It's like walking in the mountains in a fog and following the direction of steepest descent to reach the village in the valley”
João Pereira 56
Yann LeCun
![Page 57: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/57.jpg)
Gradient DescentThis procedure is called gradient descent.
1. Initialize weights randomly.
2. Repeat until convergence.
2.1. Computer gradient
2.2. Update weights
3. Return weights.
João Pereira 57
![Page 58: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/58.jpg)
Stochastic Gradient Descent
1. Initialize weights randomly.
2. Repeat until convergence.
2.1. Choose 1 datapoint randomly
2.2. Computer gradient
2.3. Update weights
3. Return weights.
João Pereira 58
tf.keras.optimizer.SGD(x)
![Page 59: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/59.jpg)
Stochastic Gradient Descent
1. Initialize weights randomly.
2. Repeat until convergence.
2.1. Choose mini-batch of M samples randomly
2.2. Computer gradient
2.3. Update weights
3. Return weights.
João Pereira 59
tf.keras.optimizer.SGD(x)
![Page 60: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/60.jpg)
Stochastic Gradient Descent
Many variants of SGDAdamAdaGradAdaDeltaRMSProp
Mini-batch training is faster (parallel computation on GPUs).
João Pereira 60
tf.keras.optimizer.Adam()
tf.keras.optimizer.RMSProp()
tf.keras.optimizer.Adagrad()
tf.keras.optimizer.AdaDelta()
![Page 61: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/61.jpg)
Backpropagation
How to compute ?
• Employs the chain rule for deriving the gradient.• There are tools that compute these gradients automatically.• Check Hinton’s course for details.• Very good interactive explanation of backpropagation.
João Pereira 61
https://google-developers.appspot.com/machine-learning/crash-course/backprop-scroll/
![Page 62: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/62.jpg)
In real-life deep learning...
João Pereira 62
Training neural networks is hard!
![Page 63: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/63.jpg)
Learning Rate
João Pereira
How to choose ?
![Page 64: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/64.jpg)
Learning Rate
• Small learning makes learning slow and gets stuck in local minima.
• Large learning rates overshoot, are unstable and diverge.
Tips:• Try many learning rates and choose the best• Adaptive learning rate
• How large gradient is…• How fast learning occurs…
João Pereira 64
![Page 65: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/65.jpg)
Optimization ChallengesOverfitting and UnderfittingBias-Variance Trade-Off
João Pereira 65
![Page 66: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/66.jpg)
Bias-Variance Trade-off
66
Overfitting
Model Complexity
ErrorUnderfitting
Bias2Test Error
VarianceTrain Error
![Page 67: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/67.jpg)
RegularizationDropoutEarly Stopping
João Pereira 67
![Page 68: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/68.jpg)
Why do we need regularization?
• To make our models generalize better to unseen data.• We are going to look at two strategies:
• Dropout• Early stopping
João Pereira 68
![Page 69: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/69.jpg)
Dropout
João Pereira 69
Randomly set activations to zero.Typically around 50%.
![Page 70: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/70.jpg)
Dropout
João Pereira 70
Randomly set activations to zero.Typically around 50%.
tf.keras.layers.Dropout(p)
![Page 71: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/71.jpg)
Early Stopping
• Early stop training before overfitting.
João Pereira 71Training Steps
LossTest
Train
OverfittingUnderfitting
![Page 72: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/72.jpg)
Generalization in Deep Learning
João Pereira 72
![Page 73: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/73.jpg)
Generalization in Deep Learning
João Pereira 73
Understanding Deep Learning Requires Rethinking Generalization, Zhang et al., 2017
Large models overfit very little ???
![Page 74: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/74.jpg)
Generalization in Deep Learning
João Pereira 74
Reconciling modern machine-learning practice and the classical bias–variance trade-off, Belkin et al., 2019
The bias-variance trade-off was just the first episode of the series!
![Page 75: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/75.jpg)
Bias-Variance Trade-off
On Deep Learning – João Pereira 75
Overfitting
Number parameters
Error
Training
Test
![Page 76: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/76.jpg)
Part 2Convolutional Neural NetworksRecurrent Neural Networks
João Pereira 76
![Page 77: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/77.jpg)
Motivation
• In feed-forward neural networks we assume data is independent and identically distributed in space or in time.
• In many applications, data has spatial or temporal dependencies.
João Pereira 77
Spatial
Convolutional Neural Networks
Temporal
Recurrent Neural Networks
![Page 78: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/78.jpg)
Convolutional Neural Networks
João Pereira 78
![Page 79: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/79.jpg)
Images
• An image is a matrix of numbers.
João Pereira 79
2 4 3 2 4 3 ... 33 8 5 3 8 5 71 2 1 1 2 1 52 4 3 2 4 3 23 8 5 3 8 5 11 2 1 1 2 1 3... ... 45 6 8 2 4 6 8 9
Values between 0 and 255
200
200200x200=40k pixels
![Page 80: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/80.jpg)
Images
• An image is a matrix of numbers.
João Pereira 80
2 4 3 2 4 33 8 5 3 8 51 2 1 1 2 12 4 3 2 4 33 8 5 3 8 51 2 1 1 2 1...8 4 2 1 7 9 9 7
Values between 0 and 255
5 4 3 2 4 36 8 5 3 8 59 2 1 1 2 12 4 3 2 4 34 8 5 3 8 57 2 1 1 2 143 8 5 3 8 5 1
2 4 3 2 4 3 ... 33 8 5 3 8 5 71 2 1 1 2 1 52 4 3 2 4 3 23 8 5 3 8 5 11 2 1 1 2 1 3... ... 45 6 8 2 4 6 8 9
200 x 200 x 3 = 120k
RB
G
![Page 81: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/81.jpg)
Tasks in Computer Vision
RegressionOutput is a continuous value.
ClassificationOutput is a class.
Applications:Classification, segmentation, localization, detection.
João Pereira 81
![Page 82: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/82.jpg)
Examples
João Pereira 82Slide from Stanford CS231N, 2017, Lecture 11
![Page 83: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/83.jpg)
Why do we need convolutional neural networks?
João Pereira 83
![Page 84: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/84.jpg)
João Pereira 84
200
20040k
Requires1.6 billion parameters
Why do we need convolutional neural networks?
Fully connected layer
![Page 85: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/85.jpg)
João Pereira 85
200
20040k
Why do we need convolutional neural networks?
Locally connected layer
Requires4 million parameters
10x10
10x10
10x10
![Page 86: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/86.jpg)
Convolutional Layer Idea
João Pereira 86
200
20040k
Use multiple filters to extract different kinds of features.
Spatially share the same parameters across different locations.
Intuition: important features can happen anywhere in the image.
![Page 87: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/87.jpg)
Idea
João Pereira 87
![Page 88: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/88.jpg)
Idea
João Pereira 88
![Page 89: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/89.jpg)
Idea
João Pereira 89
![Page 90: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/90.jpg)
What is a convolution?
1 0 10 0 11 0 1
João Pereira 90
⨀
2 4 3 5 6 83 8 5 7 5 11 2 1 5 6 72 4 6 2 3 43 5 7 2 5 66 8 9 7 9 8
FilterKernel
Feature Detector
Feature MapActivation Map
Convolved Feature
=
Input Image
![Page 91: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/91.jpg)
What is a convolution?
1 0 10 0 11 0 1
João Pereira 91
⨀
2 4 3 5 6 83 8 5 7 5 11 2 1 5 6 72 4 6 2 3 43 5 7 2 5 66 8 9 7 9 8
FilterKernel
Feature Detector
Feature MapActivation Map
Convolved Feature
=12 0 1 0
0 0 0 0
1 1 1 0
1 0 1 1
Input Image2x1 + 4x0 + 3x1 + 5x1 + 8x0 + 3x0 + 1x1 + 2x0 + 1x1 = 12
![Page 92: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/92.jpg)
Convolutional Layer
• Weights of the filters are learned during training• We need to define:
• Number of filters• Filter dimensions• Padding• Stride
João Pereira 92
1 0 10 0 11 0 1
![Page 93: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/93.jpg)
Padding
João Pereira 93
0 0 0 0 00 0 1 2 00 3 4 5 00 6 7 8 00 0 0 0 0
0 12 3⨀
0 3 8 4
9 19 25 10
21 37 43 16
6 7 8 0
Padding with zeros
=
• Used to avoid loosing pixels around the image.
2 4 33 8 51 2 1
![Page 94: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/94.jpg)
Stride
• Is the number of rows and columns traversed per slide.
João Pereira 94
0 0 0 0 00 0 1 2 00 3 4 5 00 6 7 8 00 0 0 0 0
0 12 3⨀ =
2 4 33 8 51 2 1
0 86 8
Vertical: stride 3Horizontal: stride 2
![Page 95: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/95.jpg)
In a nutshell...
Feed-forward neural networks applied to images:• Scale quadratically with the input sizeSolution: Use a convolutional layer• Connect each hidden unit to a small patch of the input.• Share weights across space.A network built with these structure is called a Convolutional Neural Network (CNN)
João Pereira 95
LeCun et al., 1998, Gradient-based learning applied to document recognitionBased on Marc'Aurelio Ranzato’s slides, Facebook AI Research, Image Classification with Deep Learning
![Page 96: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/96.jpg)
Non-Linearity
• Introduce non-linearities after convolutional layers.• The rectified linear unit (ReLU) works very well.
João Pereira 96
tf.keras.activations.relu(x)
Replaces negative values by zero.
![Page 97: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/97.jpg)
Pooling Layer
• Pooling promotes robustness to the specific location of the features.
Example:
Max, Avg, L2-pooling, ...
João Pereira 97
2 4 3 5 6 33 8 5 7 5 11 2 1 5 6 72 4 6 2 3 43 5 2 2 5 66 4 5 7 9 8
8 75 9
Max poolingFilters 3x3
Stride 3
Dimension reduced
![Page 98: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/98.jpg)
Representation Learning with CNNs
João Pereira 98
Low Level Features Medium Level Features High Level Features
Conv. Layer 1 Conv. Layer 2 Conv. Layer 3
![Page 99: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/99.jpg)
Progress in CNN Architectures
João Pereira 99
GoogLeNet, 2014Szegedy et al.
VGG, 2014Zisserman, Simonyan
AlexNet, 2012Krizhevsky, Sutskever, and Hinton
ResNet, 2015He
![Page 100: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/100.jpg)
Importance of Depth
João Pereira 100Link
![Page 101: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/101.jpg)
Deep learning is in production
João Pereira 101
![Page 102: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/102.jpg)
Coffee Break16:16 + 15min
João Pereira 102
![Page 103: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/103.jpg)
Recurrent Neural Networks
João Pereira 103
![Page 104: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/104.jpg)
Why do we need recurrent neural networks?
In many applications, data has a sequential nature:Sensor time seriesNatural speechWords in sentencesDNAVideos
What if data is not i.i.d. in time?
João Pereira 104
Example:My name is João
![Page 105: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/105.jpg)
Recap
• In feed-forward neural networks:
João Pereira 105
![Page 106: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/106.jpg)
Recurrent Neural Network
João Pereira 106
Feed-forward Neural Network Recurrent Neural Network
Bengio et al., 1994
![Page 107: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/107.jpg)
Recurrent Neural Network
João Pereira 107
RNNs capture the temporal dependencies of the data.
• Real-valued hidden state• Feedback connection• Parameters shared
across timestepsis typically a sigmoid or tanh
![Page 108: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/108.jpg)
RNN Unrolled
João Pereira 108
Bengio et al., 1994
![Page 109: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/109.jpg)
Notes
• is a vector representation of the entire sequence• Summarizes sequences into fixed-length vectors• Input sequences can have arbitrary length.• Final hidden state can be used as a feature vector for
classification or regression tasks.
João Pereira 109
![Page 110: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/110.jpg)
Long-Short Term Memory Networks
João Pereira 110
Hochreiter & Schmidhuber, 1997
• Proposed to solve the vanishing gradient problem.
• New cell and three gates.
![Page 111: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/111.jpg)
Sequence to Sequence Models (Seq2Seq)
João Pereira 111
![Page 112: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/112.jpg)
Sequence to Sequence Models
• In some applications, mapping sequences to sequences is useful.
Example: machine translation
• A sequence to sequence model is an encoder-decoder model.
João Pereira 112
My name is João Mijn naam is João
![Page 113: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/113.jpg)
Sequence to Sequence Models
João Pereira 113
Learned representation
Encoder
Decoder
My name is João
Mijn naam is João
![Page 114: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/114.jpg)
Attention
João Pereira 114
![Page 115: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/115.jpg)
Why do we need attention?
Problem: Fixed-size representation z is the bottleneck of seq2seq.Question: Is there a better way to pass the information from theencoder to the decoder?Solution: Attention Mechanism
João Pereira 115
Bahdanau, Cho, and Bengio, Neural Machine Translation by Jointly Learning to Align and TranslateLuong, Pham, and Manning, Effective Approaches to Attention-based Neural Machine Translation
![Page 116: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/116.jpg)
Attention Mechanism
João Pereira 116
My name is João
Context Vector
![Page 117: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/117.jpg)
Sequence to Sequence + Attention
João Pereira 117
My name is João
Context Vector Decoder
Mijn naam is João
Encoder
![Page 118: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/118.jpg)
Application Examples
Google´s Neural Machine Translation System
João Pereira 118
![Page 119: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/119.jpg)
Application Examples
Question Answering
João Pereira 119
![Page 120: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/120.jpg)
What was missing?
João Pereira 120
![Page 121: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/121.jpg)
Probabilistic deep learning models• Deep generative models: variational autoencoders and
generative adversarial networks
João Pereira 121
![Page 122: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/122.jpg)
References – Part 1
Papers/Books• A Few Useful Things to Know About Machine Learning,
Domingos, 2012 (Link)• Deep Learning, Ian Goodfellow et al. (Link)
Tutorials• Tensorflow and Deep Learning Without a PhD (Link)• Unsupervised Deep Learning, NeurIPS18 (Link)
João Pereira 122
![Page 123: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/123.jpg)
References – Part 1
Courses• Neural Networks for Machine Learning, Geoffrey Hinton, (Link)• EE-559 Deep Learning, EPFL (Link)• 6.S191 Deep Learning, MIT (Link)• CS230 Deep Learning, Stanford, Andrew Ng (Link)• Deep Learning, NYU, Yann LeCun (Link)
João Pereira 123
![Page 124: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/124.jpg)
CNN References – Part 2
Courses/Tutorials• Deep Learning for Computer Vision MIT 6.S191 (Link)• Convolutional Neural Networks Lecture, Stanford (Link)• Image Classification with Deep Learning (Link)• Deep Learning And The Future of AI, Yann LeCun (Link)• Deep dive into deep learning (Link)VideoDeep Learning and the Future of Artificial Intelligence (Link) How does the brain learn so much so quickly? (Link)
João Pereira 124
![Page 125: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/125.jpg)
RNN References – Part 2
Courses/Tutorials• LxMLS slides, Chris Dyer, DeepMind (Link)• Deep Sequence Modeling MIT 6.S191 (Link)
VideoDLRLSS, Recurrent Neural Networks, Yoshua Bengio (Link)
João Pereira 125
![Page 127: On Deep Learning - joao-pereira.pt · João Pereira 3 22/10/2019. Part 1 João Pereira 4. Big Picture João Pereira 5 Machine Learning Artificial Intelligence Neural Networks. Deep](https://reader035.fdocuments.net/reader035/viewer/2022071111/5fe6877600a98371f16eab74/html5/thumbnails/127.jpg)
Thank you for your attention!
João Pereira 127