Deep Learning Fundamentals - Cross Entropy
![Page 1: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/1.jpg)
Deep Learning Fundamentals
April 13, 2021
http://cross-entropy.net/ml530/Deep_Learning_1.pdf
![Page 2: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/2.jpg)
Agenda for Tonight
• Homework Review
• [DLP] Part I: Fundamentals of Deep Learning
1. What is Deep Learning?
2. Before We Begin: the Mathematical Building Blocks of Neural Networks
3. Getting Started with Neural Networks
4. Fundamentals of Machine Learning
![Page 3: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/3.jpg)
https://twitter.com/DeepLearningAI_/status/1310595139933548546?s=20
![Page 4: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/4.jpg)
Deep Learning with Python
The cover of our textbook is captioned “Habit of a Persian Lady in 1568”, from Thomas Jefferys’ book, “A Collection of the Dresses of Different Nations”
![Page 5: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/5.jpg)
[DLP] Chapter 1: What is Deep Learning?
1. Artificial Intelligence, Machine Learning, and Deep Learning
a. Artificial Intelligence
b. Machine Learning
c. Learning Representations from Data
d. The “Deep” in Deep Learning
e. Understanding How Deep Learning Works, in Three Figures
f. What Deep Learning Has Achieved So Far
g. Don’t Believe the Short-Term Hype
h. The Promise of AI
2. Before Deep Learning: a Brief History of Machine Learning
a. Probabilistic Modeling
b. Early Neural Networks
c. Kernel Methods
d. Decision Trees, Random Forests, and Gradient Boosting Machines
e. Back to Neural Networks
f. What Makes Deep Learning Different
g. The Modern Machine Learning Landscape
![Page 6: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/6.jpg)
[DLP] Chapter 1: What is Deep Learning?
3. Why Deep Learning? Why Now?
a. Hardware
b. Data
c. Algorithms
d. A New Wave of Investment
e. The Democratization of Deep Learning
f. Will it Last?
• This chapter covers
• High-level definitions of fundamental concepts
• Timeline of the development of machine learning
• Key factors behind deep learning’s rising popularity and future potential
![Page 7: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/7.jpg)
Artificial Intelligence
• Concise definition: the effort to automate intellectual tasks normally performed by humans
• Initial take: expert rules
• Fine for chess
• Difficult to develop rules for image classification, speech recognition, or language translation
Artificial Intelligence
![Page 8: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/8.jpg)
Relationships Between AI, ML, and DL
• Expert Rules
• Linear Regression
• Logistic Regression
• Random Forests
• Gradient Boosting
• Multi-Layer Perceptron Network
• Convolutional Neural Networks
• Recurrent Neural Networks
Artificial Intelligence
![Page 9: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/9.jpg)
Expert Rules Example
% Data: fruit(X) :- attributes(Y)
fruit(banana) :- colour(yellow), shape(crescent).
fruit(apple) :- (colour(green); colour(red)), shape(sphere), stem(yes).
fruit(lemon) :- colour(yellow), (shape(sphere);shape('tapered sphere')), acidic(yes).
fruit(lime) :- colour(green), shape(sphere), acidic(yes).
fruit(pear) :- colour(green), shape('tapered sphere').
fruit(plum) :- colour(purple), shape(sphere), stone(yes).
fruit(grape) :- (colour(purple);colour(green)), shape(sphere).
fruit(orange) :- colour(orange), shape(sphere).
fruit(satsuma) :- colour(orange), shape('flat sphere').
fruit(peach) :- colour(peach).
fruit(rhubarb) :- (colour(red); colour(green)), shape(stick).
fruit(cherry) :- colour(red), shape(sphere), stem(yes), stone(yes).
What is the value for colour?
[red, orange, yellow, green, purple, peach]
green
What is the value for shape?
[sphere, crescent, tapered sphere, flat sphere, stick]
stick
The fruit is rhubarb
http://www.paulbrownmagic.com/blog/simple_prolog_expert
Artificial Intelligence
![Page 10: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/10.jpg)
Machine Learning
• Ada Lovelace, 1843: “The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform.”
• Alan Turing, 1950: quoting Ada Lovelace, while pondering whether general-purpose computers could be capable of learning
trained versus programmed
Artificial Intelligence
![Page 11: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/11.jpg)
Learning Representations from Data
• Need three things (for supervised learning)
• Input data points: structured data, image files, sound files, text documents
• Examples of expected output
• Way to measure whether the algorithm is doing a good job
• Input representation examples
• Image as Red, Green, and Blue picture element (pixel) values
• Image as Hue, Saturation, and Value
https://en.wikipedia.org/wiki/HSL_and_HSV
Artificial Intelligence
![Page 12: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/12.jpg)
Example Data
• The inputs are the coordinates of our points
• The expected outputs are the colors of the points
• A way to measure whether our algorithm is doing a good job could be the percentage of points that are being classified correctly (accuracy)
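The accuracy measure described above can be sketched in a few lines of NumPy. The color labels and predictions below are made up for illustration; they are not the points from the slide.

```python
import numpy as np

def accuracy(predicted, expected):
    """Fraction of points whose predicted label matches the expected label."""
    predicted = np.asarray(predicted)
    expected = np.asarray(expected)
    return float(np.mean(predicted == expected))

# Hypothetical labels: expected colors versus an algorithm's guesses.
expected = np.array(["black", "white", "white", "black"])
predicted = np.array(["black", "white", "black", "black"])

score = accuracy(predicted, expected)
print(f"accuracy = {score:.2f}")
```

Three of the four hypothetical points are classified correctly here, so the score is the fraction 3/4.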
Artificial Intelligence
![Page 13: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/13.jpg)
New Representation
import numpy as np
translation = np.array([ -2, -2 ])
theta = 0.25 * np.pi
Rotation = np.array([[ np.cos(theta), - np.sin(theta) ], [np.sin(theta), np.cos(theta) ]])
np.dot(Input + translation, Rotation)
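The snippet above leaves `Input` undefined. A runnable sketch follows, assuming `Input` is an array with one (x, y) point per row; the four sample points are hypothetical and chosen only to show what the change of coordinates does.

```python
import numpy as np

# Hypothetical sample points, one row per point (not the data from the slide).
Input = np.array([[2.0, 2.0],
                  [3.0, 3.0],
                  [1.0, 3.0],
                  [3.0, 1.0]])

translation = np.array([-2, -2])
theta = 0.25 * np.pi
Rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Shift the points so (2, 2) becomes the origin, then apply the rotation matrix.
new_representation = np.dot(Input + translation, Rotation)
print(new_representation)
```

With these sample points, the two points on the line y = x end up with a second coordinate of zero, and the points on either side of that line end up with second coordinates of opposite sign. In the new representation a rule as simple as "is the second coordinate positive?" separates the two sides.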
Artificial Intelligence
![Page 14: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/14.jpg)
Deep Neural Network for Digit Classification
• Successive layers of increasingly meaningful representations
• Alternative names for deep learning
• Layered representations learning
• Hierarchical representations learning
Artificial Intelligence
![Page 15: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/15.jpg)
Deep Representations Learned by a Digit-Classification Model
Artificial Intelligence
![Page 16: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/16.jpg)
How Deep Learning Works: Part 1 of 3
Neural Network Parameterized by its Weights
Artificial Intelligence
![Page 17: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/17.jpg)
How Deep Learning Works: Part 2 of 3
Loss Function Measures Quality of Network’s Output
Artificial Intelligence
![Page 18: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/18.jpg)
How Deep Learning Works: Part 3 of 3
Loss Score Used as Feedback Signal to Adjust Weights
Artificial Intelligence
![Page 19: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/19.jpg)
Deep Learning Achievements
• Near-human-level image classification
• Near-human-level speech recognition
• Near-human-level handwriting transcription
• Improved machine translation
• Improved text-to-speech conversion
• Digital assistants such as Google Now and Amazon Alexa
• Near-human-level autonomous driving
• Improved ad targeting, as used by Google, Baidu, and Bing
• Improved search results on the web
• Ability to answer natural-language questions
• Superhuman Go playing
Artificial Intelligence
![Page 20: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/20.jpg)
Deep Learning Hype
• Although some world-changing applications like autonomous cars are already within reach, many more are likely to remain elusive for a long time, such as believable dialogue systems, human-level machine translation across arbitrary languages, and human-level natural language understanding
• Previous AI “Winters”:
1. XOR (eXclusive Or): my perception is that the inability of the perceptron to solve this problem cast a shadow on AI (though this was understood at the time)
2. By the early 90s, rule-based systems had proven expensive to maintain, difficult to scale, and limited in scope
• We are currently in the intense optimism phase of a new cycle
Artificial Intelligence
![Page 21: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/21.jpg)
Promise of AI
• Most of the research findings of deep learning aren’t yet applied to the full range of problems they can solve across industries
• “Your doctor doesn’t use AI, and neither does your accountant” [I thought I uploaded an image of a document during tax time]
• “Back in 1995, it would have been difficult to believe in the future impact of the internet”
• “In a not-so-distant future, AI will be your assistant; it will answer your questions, help educate your kids, and watch over your health. It will deliver your groceries to your door and drive you from point A to point B.”
• Don’t believe the short-term hype, but do believe in the long-term vision
Artificial Intelligence
![Page 22: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/22.jpg)
History: Probabilistic Modeling
• Naïve Bayes
• Logistic Regression
Machine Learning
![Page 23: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/23.jpg)
Early Neural Networks
• 1950s: Perceptron
• 1980s: Backpropagation
• Late 1980s: Yann LeCun’s work on handwritten digit recognition (the precursor to MNIST)
Machine Learning
![Page 24: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/24.jpg)
History: Kernel Method
• Example kernel method for classification
Machine Learning
![Page 25: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/25.jpg)
History: Decision Tree Ensembles
• Parameters that are learned are questions about the data; e.g. “Is feature2 in the data greater than 3.5?”
Machine Learning
![Page 26: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/26.jpg)
History: Back to Neural Networks
• AlexNet was not the first fast GPU-implementation of a CNN to win an image recognition contest. A CNN on GPU by K. Chellapilla et al. (2006) was 4 times faster than an equivalent implementation on CPU.[6] A deep CNN of Dan Ciresan et al. (2011) at IDSIA was already 60 times faster[7] and achieved superhuman performance in August 2011.[8] Between May 15, 2011 and September 10, 2012, their CNN won no less than four image competitions.[9][10] They also significantly improved on the best performance in the literature for multiple image databases.[11]
• According to the AlexNet paper,[5] Ciresan's earlier net is "somewhat similar." Both were originally written with CUDA to run with GPU support. In fact, both are actually just variants of the CNN designs introduced by Yann LeCun et al. (1989)
https://en.wikipedia.org/wiki/AlexNet
Machine Learning
![Page 27: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/27.jpg)
Reasons for Deep Learning Success
• Incremental layer-by-layer way in which increasingly complex representations are developed
• Fact that these intermediate incremental representations are learned jointly
Machine Learning
![Page 28: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/28.jpg)
Why Deep Learning? Why Now?
• Three technical forces driving advances in machine learning
• Hardware
• Datasets and benchmarks
• Algorithmic advances
• Following a scientific revolution, progress generally follows a sigmoid curve: it starts with a period of fast progress, which generally stabilizes as researchers hit hard limitations, and then, further improvements become incremental
Deep Learning
![Page 29: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/29.jpg)
Models to Try
• Deep Learning should be viewed as another tool in the toolbox
• There are many possible machine learning methods to apply
• Suggestion is to try …
• Linear models
• Tree-based ensembles; e.g. random forests and gradient boosting
• Deep learning; e.g. feedforward, convolutional, and recurrent networks
• Gradient boosting and deep learning have won a lot of Kaggle competitions
Deep Learning
![Page 30: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/30.jpg)
[DLP] Chapter 2: Before We Begin, the Mathematical Building Blocks of Neural Networks
1. A first look at a neural network
2. Data representations for neural networks
3. The gears of neural networks: tensor operations
4. The engine of neural networks: gradient-based optimization
5. Looking back at our first example
6. Chapter summary
![Page 31: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/31.jpg)
MNIST Sample Digits
• The Modified (Segmented) National Institute of Standards and Technology (MNIST) data set is part of the history of Deep Learning
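• Yann LeCun (Bell Labs at the time) learned convolution filters for handwritten digit recognition in 1989, using U.S. Postal Service zip code digits; MNIST itself was assembled later from NIST data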
http://yann.lecun.com/exdb/mnist/
First Look
![Page 32: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/32.jpg)
Loading the MNIST Data
First Look
![Page 33: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/33.jpg)
Network Architecture and Compilation
First Look
![Page 34: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/34.jpg)
Preprocessing the Data
First Look
![Page 35: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/35.jpg)
Network Training and Evaluation
First Look
![Page 36: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/36.jpg)
Scalars (0D Tensors)
Another name for a number
Data Representations
shape = ()
![Page 37: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/37.jpg)
Vectors (1D Tensors)
Another name for a one-dimensional array of numbers
Data Representations
shape = (5,)
![Page 38: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/38.jpg)
Matrices (2D Tensors)
Another name for a two-dimensional array of numbers
Data Representations
shape = (3, 5)
![Page 39: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/39.jpg)
3D Tensors and Higher-Dimensional Tensors
Packing 2D tensors into an array creates a 3D tensor
Data Representations
shape = (3, 3, 5)
![Page 40: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/40.jpg)
Key Attributes of a Tensor
• Number of axes: sometimes called the rank; sometimes called the number of dimensions [the number of indices for specifying a cell]
• Shape: a list consisting of sizes for the axes of the tensor
• Data type
• int32: typically used for word indices and class indices
• uint8: typically used for pixel values
• float32: typically used for numeric features
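The three attributes listed above map directly onto NumPy array properties. A minimal sketch, using made-up arrays with the same shapes as the preceding slides:

```python
import numpy as np

scalar = np.array(12)                            # 0D tensor, shape ()
vector = np.array([12, 3, 6, 14, 7])             # 1D tensor, shape (5,)
matrix = np.zeros((3, 5), dtype=np.float32)      # 2D tensor, shape (3, 5)
tensor3d = np.zeros((3, 3, 5), dtype=np.uint8)   # 3D tensor, shape (3, 3, 5)

for name, tensor in [("scalar", scalar), ("vector", vector),
                     ("matrix", matrix), ("tensor3d", tensor3d)]:
    # ndim is the number of axes, shape gives the size of each axis.
    print(name, tensor.ndim, tensor.shape, tensor.dtype)
```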
Data Representations
![Page 41: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/41.jpg)
Displaying a Digit
plt.imshow(Image.open("filename.png")) works too
Data Representations
![Page 42: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/42.jpg)
Manipulating Tensors in Numpy
Data Representations
![Page 43: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/43.jpg)
The Notion of Data Batches
The first axis is considered to be the batch axis
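Because the first axis is the batch axis, selecting a batch is a slice along axis 0. The sketch below uses a hypothetical stand-in array of 1,000 blank 28 x 28 images and a helper name, `get_batch`, that is introduced only for this example.

```python
import numpy as np

# Hypothetical stand-in for an image data set: 1000 samples of 28 x 28 pixels.
images = np.zeros((1000, 28, 28), dtype=np.uint8)
batch_size = 128

def get_batch(data, n, batch_size):
    """Return the n-th batch (0-indexed) by slicing along the first axis."""
    return data[n * batch_size:(n + 1) * batch_size]

first_batch = get_batch(images, 0, batch_size)
last_batch = get_batch(images, 7, batch_size)   # final batch is smaller

print(first_batch.shape, last_batch.shape)
```

The final batch holds only the samples that remain after seven full batches, which is why its first axis is shorter.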
Data Representations
![Page 44: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/44.jpg)
Real-World Examples of Data Tensors
• Vector data: 2D tensors of shape (samples, features)
• Timeseries data or sequence data: 3D tensors of shape (samples, timesteps, features)
• Images: 4D tensor of shape (samples, height, width, channels) or (samples, channels, height, width)
• Video: 5D tensor of shape (samples, frames, height, width, channels) or (samples, frames, channels, height, width)
Data Representations
![Page 45: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/45.jpg)
Examples of Vector Data
Data Representations
![Page 46: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/46.jpg)
Examples of Timeseries Data or Sequence Data
Mel Frequency Cepstral Coefficient (MFCC) representation of audio clips fits here as well
Data Representations
![Page 47: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/47.jpg)
Image Data
• TensorFlow uses “channels last” format, while the no-longer-maintained Theano used “channels first” format
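Converting between the two layouts is a matter of reordering axes. A minimal NumPy sketch, assuming a hypothetical batch of four 32 x 32 RGB images:

```python
import numpy as np

# Hypothetical batch in "channels last" layout: (samples, height, width, channels).
channels_last = np.zeros((4, 32, 32, 3), dtype=np.uint8)

# Move the channel axis forward: (samples, channels, height, width).
channels_first = np.transpose(channels_last, (0, 3, 1, 2))

# And back again.
round_trip = np.transpose(channels_first, (0, 2, 3, 1))

print(channels_last.shape, channels_first.shape)
```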
Data Representations
![Page 48: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/48.jpg)
Video Data
Data Representations
![Page 49: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/49.jpg)
The Gears of Neural Networks: Tensor Operations
Tensor Operations
![Page 50: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/50.jpg)
Element-wise Operations
Tensor Operations
![Page 51: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/51.jpg)
Broadcasting
• Broadcasting is used to add bias values to the product of an input matrix and a weight matrix
(samples, features) x (features, neurons) + (neurons)
= (samples, neurons) + neurons
• Broadcasting consists of two steps
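The bias addition above can be sketched in NumPy. The two steps are written out explicitly in the second half of the block, on the assumption that they are the usual ones: an axis is added to the smaller tensor, and the smaller tensor is then repeated along that axis. The sizes (4 samples, 3 features, 2 neurons) are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
inputs = rng.normal(size=(4, 3))     # (samples, features)
weights = rng.normal(size=(3, 2))    # (features, neurons)
bias = np.array([0.5, -0.5])         # (neurons,)

product = inputs @ weights           # (samples, neurons)
outputs = product + bias             # NumPy broadcasts bias over the samples axis

# The same addition with the two broadcasting steps made explicit.
bias_2d = np.expand_dims(bias, axis=0)                      # step 1: shape (1, neurons)
bias_tiled = np.repeat(bias_2d, product.shape[0], axis=0)   # step 2: shape (samples, neurons)
explicit = product + bias_tiled

print(outputs.shape, np.allclose(outputs, explicit))
```

In practice no repeated copy is materialized; the repetition happens virtually inside the addition.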
Tensor Operations
![Page 52: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/52.jpg)
Example: Adding a Vector to a Matrix
Tensor Operations
![Page 53: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/53.jpg)
Tensor Dot: Part 1 of 3
Tensor Operations
![Page 54: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/54.jpg)
Tensor Dot: Part 2 of 3
Tensor Operations
![Page 55: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/55.jpg)
Tensor Dot: Part 3 of 3
Tensor Operations
![Page 56: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/56.jpg)
Tensor Reshaping: Part 1 of 2
Tensor Operations
![Page 57: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/57.jpg)
Tensor Reshaping: Part 2 of 2
Tensor Operations
![Page 58: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/58.jpg)
Geometric Interpretation of Tensor Operations
This example illustrates something similar to what happens during a weight update operation based on momentum
Tensor Operations
![Page 59: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/59.jpg)
A Geometric Interpretation of Deep Learning
Uncrumpling a complicated manifold of data (think about the XOR problem)
Tensor Operations
![Page 60: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/60.jpg)
What’s a Derivative?
• A derivative quantifies the change in a function’s value as an input changes
• Produces the change needed for the path of steepest ascent
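A derivative can be approximated numerically by nudging the input and watching the output change. The sketch below uses f(x) = x², chosen only for illustration, and then takes one small step against the slope, which is the direction of descent.

```python
def f(x):
    return x ** 2

def numerical_derivative(f, x, eps=1e-5):
    """Central-difference estimate of how f changes as x changes."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 3.0
slope = numerical_derivative(f, x)   # close to the exact derivative 2 * x

# The slope points uphill (ascent); stepping the opposite way reduces f.
step_size = 0.1
x_new = x - step_size * slope

print(slope, f(x), f(x_new))
```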
Gradient-Based Optimization
![Page 61: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/61.jpg)
Stochastic Gradient Descent: Part 1 of 3
Gradient-Based Optimization
![Page 62: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/62.jpg)
Stochastic Gradient Descent: Part 2 of 3
Gradient-Based Optimization
![Page 63: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/63.jpg)
Stochastic Gradient Descent: Part 3 of 3
Momentum can help us avoid local minima
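A minimal sketch of a momentum update follows, on a one-dimensional quadratic loss chosen for illustration. The learning rate and momentum values are assumptions. Because this loss has a single minimum, the sketch shows only the update rule, not the escape from a local minimum.

```python
def gradient(w):
    """Gradient of the illustrative loss (w - 3)^2, which is minimized at w = 3."""
    return 2.0 * (w - 3.0)

w = 0.0
velocity = 0.0
learning_rate = 0.1   # assumed value
momentum = 0.9        # assumed value

for _ in range(500):
    # The velocity keeps a decaying memory of past gradients, so the update
    # depends on previous steps as well as the current slope.
    velocity = momentum * velocity - learning_rate * gradient(w)
    w = w + velocity

print(w)
```

Setting `momentum = 0.0` reduces the loop to plain gradient descent, which makes the two easy to compare.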
Gradient-Based Optimization
![Page 64: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/64.jpg)
Chaining Derivatives: the Backpropagation Algorithm
We take derivatives for the operations used as part of forward propagation (inference) to update weights
Book says:
Alternative version of chain rule:
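$$\frac{\partial\,\mathit{loss}}{\partial\,\mathit{weight}_{-1}} = \frac{\partial\,\mathit{loss}}{\partial\,\mathit{activation}_{-1}} \cdot \frac{\partial\,\mathit{activation}_{-1}}{\partial\,\mathit{product}_{-1}} \cdot \frac{\partial\,\mathit{product}_{-1}}{\partial\,\mathit{weight}_{-1}}$$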
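The chain-rule factorization above can be checked numerically on a single unit. This sketch assumes a sigmoid activation and a squared-error loss, neither of which is specified on the slide, and compares the product of the three factors with a finite-difference estimate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative values for one input, one weight, and one target.
x, target, weight = 1.5, 1.0, 0.4

# Forward propagation.
product = weight * x
activation = sigmoid(product)
loss = (activation - target) ** 2

# The three factors of the chain rule.
dloss_dactivation = 2.0 * (activation - target)
dactivation_dproduct = activation * (1.0 - activation)
dproduct_dweight = x
dloss_dweight = dloss_dactivation * dactivation_dproduct * dproduct_dweight

# Finite-difference check of the same derivative.
def loss_at(w):
    return (sigmoid(w * x) - target) ** 2

eps = 1e-6
numeric = (loss_at(weight + eps) - loss_at(weight - eps)) / (2 * eps)

print(dloss_dweight, numeric)
```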
Gradient-Based Optimization
![Page 65: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/65.jpg)
Network Review
Looking Back
![Page 66: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/66.jpg)
Chapter Summary
![Page 67: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/67.jpg)
Activation Functions
![Page 68: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/68.jpg)
[DLP] Chapter 3: Getting Started with Neural Networks
1. Anatomy of a Neural Network
2. Introduction to Keras
3. Setting Up a Deep Learning Workstation
4. Classifying Movie Reviews: a Binary Classification Example
5. Classifying Newswires: a Multiclass Classification Example
6. Predicting House Prices: a Regression Example
7. Chapter Summary
![Page 69: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/69.jpg)
Neural Networks
Training involves …
Anatomy of a Neural Network
![Page 70: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/70.jpg)
Relationship between the Network, Layers, Loss Function, and Optimizer
Anatomy of a Neural Network
![Page 71: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/71.jpg)
Layers as the Lego Bricks of Deep Learning ☺
The first layer requires an input_shape parameter [the dimensions of a single observation], while additional layers do not require this parameter
Anatomy of a Neural Network
![Page 72: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/72.jpg)
Networks of Layers
• A deep learning model is a directed, acyclic graph of layers
• Most common instance is a “linear” (simple, sequential) stack of layers
• Other common instances include …
• Two-branch networks; e.g. question goes down one branch and text passage goes down another (or maybe multi-modal input, for example an image and a text description)
• Multi-head networks; e.g. we have one output predict whether a news article discusses “politics” and another output predict whether a news article discusses “health”
• Inception blocks; e.g. we want to use a few different convolution (input filtering) approaches in parallel
Anatomy of a Neural Network
![Page 73: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/73.jpg)
Loss Function and Optimizers
• Loss functions for this class include: cross entropy, mean squared error, mean absolute error, content loss, style loss, total variation loss, Kullback Leibler loss, temporal difference loss, actor loss, and critic loss
• Optimization functions for the class include: Stochastic Gradient Descent (SGD), Root Mean Squared (Gradient) Propagation (RMSProp), and Adaptive Moment Estimation (Adam: RMSProp + Momentum)
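Two of the loss functions named above, binary cross entropy and mean squared error, can be sketched in NumPy. The function names, the clipping constant `eps`, and the sample values are assumptions made for this example; they are not a framework API.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean negative log-likelihood of 0/1 labels under predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(y_true * np.log(y_prob)
                          + (1.0 - y_true) * np.log(1.0 - y_prob)))

def mean_squared_error(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
confident = np.array([0.9, 0.1, 0.8, 0.2])   # mostly right
unsure = np.array([0.5, 0.5, 0.5, 0.5])      # coin flip

print(binary_cross_entropy(y_true, confident))
print(binary_cross_entropy(y_true, unsure))   # equals log(2) for coin-flip predictions
print(mean_squared_error(y_true, confident))
```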
Anatomy of a Neural Network
![Page 74: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/74.jpg)
Keras Features
Introduction to Keras
![Page 75: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/75.jpg)
Google Search Interest for Deep Learning Frameworks
Introduction to Keras
![Page 76: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/76.jpg)
Deep Learning Software and Hardware Stack
• Nvidia Graphics Processing Units (GPUs) and Google Tensor Processing Units (TPUs) support efficient deep learning
• Nvidia’s Compute Unified Device Architecture (CUDA) Application Programming Interface (API) and the CUDA Deep Neural Network (cuDNN) library provide an interface to Nvidia GPUs
• Eigen library implements the Basic Linear Algebra Subprograms (BLAS) specification, allowing tensor manipulation on Central Processing Units (CPUs)
Theano and CNTK are no longer maintained
Introduction to Keras
![Page 77: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/77.jpg)
Typical Keras Workflow
Introduction to Keras
![Page 78: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/78.jpg)
Network Definition: Sequential Model versus the Functional API
Same model with both methods …
Introduction to Keras
![Page 79: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/79.jpg)
Model Configuration and Training
Introduction to Keras
![Page 80: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/80.jpg)
Two Options for Getting Keras Running
Setting Up a Deep Learning Workstation
![Page 81: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/81.jpg)
Loading the Internet Movie DataBase (IMDB) Sentiment Analysis Data
num_words is the size of the vocabulary
Classifying Movie Reviews
![Page 82: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/82.jpg)
Decoding a Document
Classifying Movie Reviews
![Page 83: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/83.jpg)
Turning Lists of Integers into Tensors
Note: if more than one value in a row is one, we should refer to this as a multi-hot encoding
• one-hot encoding for identifying a class in a dense target vector
• multi-hot encoding to identify the tokens present in a document in a dense input vector
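The two encodings above can be sketched with NumPy. The helper names `vectorize_sequences` and `one_hot` and the tiny vocabulary of five tokens are assumptions for this example.

```python
import numpy as np

def vectorize_sequences(sequences, dimension):
    """Multi-hot encoding: row i has a 1 at every token index present in sequence i."""
    results = np.zeros((len(sequences), dimension), dtype=np.float32)
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0   # repeated indices simply set the same cell again
    return results

def one_hot(labels, num_classes):
    """One-hot encoding: row i has exactly one 1, at the class index of label i."""
    results = np.zeros((len(labels), num_classes), dtype=np.float32)
    results[np.arange(len(labels)), labels] = 1.0
    return results

documents = [[1, 3, 3], [0]]   # hypothetical token indices for two documents
multi_hot = vectorize_sequences(documents, dimension=5)
targets = one_hot([2, 0], num_classes=3)

print(multi_hot)
print(targets)
```

Note that the multi-hot vector records only which tokens are present; word order and repeat counts are discarded.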
Classifying Movie Reviews
![Page 84: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/84.jpg)
Encoding the Integer Sequences into a Binary Matrix
Classifying Movie Reviews
![Page 85: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/85.jpg)
Architecture Decisions for Simple Feedforward Network
• How many layers to use
• How many hidden units to choose for each layer
• Which activation functions to use
• Do *not* forget to include activation functions: unexplained suboptimality will ensue
Classifying Movie Reviews
![Page 86: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/86.jpg)
Common Activation Functions
Rectified Linear Unit (ReLU): max(x,0)
[no saturation issue]
Sigmoid: 1/(1+exp(-x))
[usually used for output layer]
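Both formulas above translate directly into NumPy; the sample inputs are arbitrary.

```python
import numpy as np

def relu(x):
    """Rectified Linear Unit: max(x, 0), applied element-wise."""
    return np.maximum(x, 0.0)

def sigmoid(x):
    """Sigmoid: 1 / (1 + exp(-x)), squashing inputs into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 3.0])
print(relu(x))
print(sigmoid(x))
```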
Classifying Movie Reviews
![Page 87: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/87.jpg)
IMDB Network Architecture
• Features flow from bottom to top
  • An output is called a “head”
• Two hidden layers and an output layer with weights
  • It’s a deep neural network
  • We’ll get to more than one hundred layers soon enough
Classifying Movie Reviews
![Page 88: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/88.jpg)
Model Definition
Classifying Movie Reviews
![Page 89: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/89.jpg)
Parameters and Outputs for a Dense Layer
• Parameters
  • (Number of Inputs from Previous Layer + 1) * (Number of “Units”)
  • + 1 for bias weights: one for each “unit”
• We used to refer to “units” as neurons
  • The names have been changed to protect the innocent? Our approach was inspired by neuroscience, but our brains aren’t using RMSProp ☺
• These are the same weight vectors we’ve come to know and love: projecting inputs to a new representation, one feature at a time [the number of “units” is the number of new features for the new representation]
• Output Shape
  • (Batch Size) x (Number of “Units”)
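The parameter formula can be checked with a few lines of arithmetic. The layer sizes below are an assumption for illustration: 10,000 multi-hot inputs, two hidden layers of 16 units, and one output unit.

```python
def dense_param_count(num_inputs, num_units):
    """(inputs from previous layer + 1 bias weight) per unit."""
    return (num_inputs + 1) * num_units

def dense_output_shape(batch_size, num_units):
    """A dense layer emits one value per unit for every example in the batch."""
    return (batch_size, num_units)

# Assumed sizes: 10,000 inputs -> 16 units -> 16 units -> 1 output unit
layer_sizes = [10000, 16, 16, 1]
counts = [dense_param_count(n_in, n_out)
          for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
total = sum(counts)

print(counts, total)                  # per-layer and total parameter counts
print(dense_output_shape(512, 16))    # output shape for a batch of 512
```

Under these assumptions almost all of the parameters sit in the first layer, because that is where the 10,000-wide input is projected down to 16 features.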
Classifying Movie Reviews
![Page 90: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/90.jpg)
Why Are Activation Functions Necessary?
Try omitting activation functions from 1) the output layer and 2) hidden layers … so you can recognize this issue later
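One way to see the issue for hidden layers, sketched with NumPy and random toy weights: two dense layers without an activation in between compute exactly the same function as a single dense layer, so the extra layer adds no representational power.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                      # batch of 5 examples, 8 features
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 3)), rng.normal(size=3)

# Two stacked layers with no activation function ...
two_layers = (x @ W1 + b1) @ W2 + b2

# ... collapse into one layer with combined weights and bias.
W, b = W1 @ W2, b1 @ W2 + b2
one_layer = x @ W + b

# With a ReLU between the layers, the collapse no longer holds.
relu_layers = np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

print(np.allclose(two_layers, one_layer))
print(np.allclose(relu_layers, one_layer))
```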
Classifying Movie Reviews
![Page 91: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/91.jpg)
Compiling the Model
Classifying Movie Reviews
![Page 92: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/92.jpg)
Setting Aside a Validation Set
Classifying Movie Reviews
![Page 93: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/93.jpg)
Training the Model
Classifying Movie Reviews
![Page 94: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/94.jpg)
Plotting the Training and Validation Loss
Classifying Movie Reviews
![Page 95: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/95.jpg)
Where Do We Start Overfitting?
Classifying Movie Reviews
![Page 96: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/96.jpg)
Plotting the Training and Validation Accuracy
Classifying Movie Reviews
![Page 97: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/97.jpg)
Where Do We Start Overfitting?
Classifying Movie Reviews
![Page 98: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/98.jpg)
Retraining the Model from Scratch
Why are we “retraining the model from scratch”?
Classifying Movie Reviews
![Page 99: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/99.jpg)
Generating Predictions on New Data
Classifying Movie Reviews
![Page 100: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/100.jpg)
Ideas for Experiments
Classifying Movie Reviews
![Page 101: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/101.jpg)
Wrapping Up the IMDB Example
Classifying Movie Reviews
![Page 102: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/102.jpg)
Loading the Reuters Dataset
Classifying Newswires
![Page 103: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/103.jpg)
Decoding Newswires Back to Text
Classifying Newswires
![Page 104: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/104.jpg)
Preparing the Document Matrices
Classifying Newswires
![Page 105: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/105.jpg)
Preparing the Target Matrices
Classifying Newswires
![Page 106: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/106.jpg)
Defining the Model
Classifying Newswires
![Page 107: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/107.jpg)
Notes About the Architecture
Classifying Newswires
![Page 108: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/108.jpg)
Validating the Approach
Classifying Newswires
![Page 109: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/109.jpg)
Where Do We Start Overfitting?
Classifying Newswires
![Page 110: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/110.jpg)
Where Do We Start Overfitting?
Classifying Newswires
![Page 111: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/111.jpg)
Retraining a Model from Scratch
Classifying Newswires
Why are we “retraining a model from scratch”?
![Page 112: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/112.jpg)
Comparing to Random [and a Majority Classifier]
Nota bene (note well): 813 of the 2,246 test examples belonged to class 3
The accuracy of a majority classifier is 36.2%
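A majority classifier always predicts the most frequent class, so its accuracy is just that class's share of the examples. The sketch below uses a hypothetical helper and toy labels, then repeats the arithmetic for the Reuters figures, assuming the standard Keras test split of 2,246 examples (the denominator that reproduces 36.2%).

```python
from collections import Counter

def majority_accuracy(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)

toy_labels = [3, 3, 4, 3, 19, 4]
label, accuracy = majority_accuracy(toy_labels)
print(label, accuracy)

# Assumed test-set size: 2,246 examples, 813 of them in class 3
reuters_majority_accuracy = 813 / 2246
print(f"{reuters_majority_accuracy:.1%}")
```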
Classifying Newswires
![Page 113: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/113.jpg)
Generating Predictions for New Data
Classifying Newswires
![Page 114: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/114.jpg)
Dense Versus Sparse Labels
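The difference is only in how the targets are stored: dense labels are one-hot rows, sparse labels are integer class indices. The NumPy sketch below re-implements both cross-entropy variants for illustration (these are not the Keras functions, only named after them) and shows that they give the same loss.

```python
import numpy as np

def categorical_crossentropy(one_hot_targets, probabilities):
    """Cross-entropy when targets are dense one-hot rows."""
    return -np.sum(one_hot_targets * np.log(probabilities), axis=1)

def sparse_categorical_crossentropy(integer_targets, probabilities):
    """Cross-entropy when targets are integer class indices."""
    rows = np.arange(len(integer_targets))
    return -np.log(probabilities[rows, integer_targets])

probabilities = np.array([[0.7, 0.2, 0.1],
                          [0.1, 0.3, 0.6]])   # toy softmax outputs
sparse_labels = np.array([0, 2])
dense_labels = np.eye(3)[sparse_labels]       # one-hot version of the same labels

dense_loss = categorical_crossentropy(dense_labels, probabilities)
sparse_loss = sparse_categorical_crossentropy(sparse_labels, probabilities)
print(dense_loss, sparse_loss)
```

Sparse labels avoid materializing a (number of examples) x (number of classes) target matrix, which matters when there are many classes.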
Classifying Newswires
![Page 115: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/115.jpg)
Model With an Information Bottleneck
71% accuracy: an 8% absolute drop
Classifying Newswires
![Page 116: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/116.jpg)
Further Experiments
Classifying Newswires
![Page 117: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/117.jpg)
Wrapping Up
Classifying Newswires
![Page 118: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/118.jpg)
Loading the Boston Housing Dataset
1970s home prices in thousands of dollars
Predicting House Prices
![Page 119: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/119.jpg)
Brief Discussion of Bias
• Boston Housing dataset has been used by many popular textbooks
• The data explicitly offers a race-related variable for modeling
  • Avoid using proxy variables that lead to discrimination based on race, gender, religion, etc.
• Example: don’t ask about gender if all you really want to know is whether the candidate can lift X pounds
Predicting House Prices
http://lib.stat.cmu.edu/datasets/boston
![Page 120: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/120.jpg)
Normalizing the Data
Predicting House Prices
![Page 121: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/121.jpg)
Model Definition
Predicting House Prices
![Page 122: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/122.jpg)
3-Fold Cross-Validation
Predicting House Prices
![Page 123: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/123.jpg)
4-Fold Cross-Validation Implementation
Predicting House Prices
![Page 124: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/124.jpg)
Cross-Validation Loop
Predicting House Prices
![Page 125: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/125.jpg)
Cross-Validation Results
Predicting House Prices
![Page 126: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/126.jpg)
Alternative Implementation [saved history]
Predicting House Prices
![Page 127: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/127.jpg)
Plotting the Average Mean Absolute Error (MAE)
Predicting House Prices
![Page 128: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/128.jpg)
Visualization Suggestions
Predicting House Prices
![Page 129: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/129.jpg)
Smoothing the Curve
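The smoothed_points expression should look familiar: it is an exponential moving average, as used by RMSProp (decay of 0.9 on the running average of squared gradients) and Adam (0.9 on the running average of gradients, 0.999 on the running average of squared gradients)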
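A minimal sketch of the smoothing, written as an exponential moving average with a factor of 0.9. The function name `smooth_curve` and the toy MAE values are assumptions for illustration.

```python
def smooth_curve(points, factor=0.9):
    """Replace each point with a running average that weights the past by `factor`."""
    smoothed_points = []
    for point in points:
        if smoothed_points:
            previous = smoothed_points[-1]
            smoothed_points.append(previous * factor + point * (1 - factor))
        else:
            smoothed_points.append(point)   # first point has no history
    return smoothed_points

noisy_mae = [3.0, 2.0, 2.8, 2.1, 2.6, 2.2]   # toy per-epoch validation MAE
print(smooth_curve(noisy_mae))
```

A larger factor gives a smoother curve that reacts more slowly to new points.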
Predicting House Prices
![Page 130: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/130.jpg)
Plotting the Smoothed MAE
Predicting House Prices
![Page 131: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/131.jpg)
Training the Final Model
Predicting House Prices
![Page 132: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/132.jpg)
Wrapping Up
Predicting House Prices
![Page 133: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/133.jpg)
Chapter Summary
![Page 134: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/134.jpg)
[DLP] Chapter 4: Fundamentals of Machine Learning
1. Four Branches of Machine Learning
2. Evaluating Machine Learning Models
3. Data Preprocessing, Feature Engineering, and Feature Learning
4. Overfitting and Underfitting
5. The Universal Workflow of Machine Learning
![Page 135: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/135.jpg)
Supervised Learning Examples
Four Branches of Machine Learning
![Page 136: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/136.jpg)
Unsupervised Learning
• Dimensionality Reduction
• Clustering
Four Branches of Machine Learning
![Page 137: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/137.jpg)
Self-Supervised Learning
• Learning Without Human Annotated Labels
• Autoencoders
• Trying to predict the next word given previous words
• Trying to predict the next frame given previous frames
Four Branches of Machine Learning
![Page 138: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/138.jpg)
Reinforcement Learning
• Google DeepMind used reinforcement learning to create a model to play Atari games
• AlphaGo was created to play Go
• Occasional rewards
• Examples of possible applications include: self-driving cars, robotics, resource management, and education
Four Branches of Machine Learning
![Page 140: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/140.jpg)
Also from Yann LeCun …
https://t.co/2LSb622114
Four Branches of Machine Learning
![Page 141: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/141.jpg)
Classification and Regression Glossary
Four Branches of Machine Learning
![Page 142: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/142.jpg)
Classification and Regression Glossary
Four Branches of Machine Learning
![Page 143: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/143.jpg)
Simple Hold-out Validation Split
Evaluating Machine Learning Models
![Page 144: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/144.jpg)
Hold-out Validation Implementation [note the concatenation]
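A sketch of the hold-out pattern, assuming NumPy and a toy array standing in for the non-test data. The concatenation step is the point: once tuning is done, the final model is trained on the training and validation examples together.

```python
import numpy as np

rng = np.random.default_rng(0)
data = np.arange(20)              # toy stand-in for the non-test examples
rng.shuffle(data)                 # shuffle before splitting

num_validation_samples = 5
validation_data = data[:num_validation_samples]
training_data = data[num_validation_samples:]

# ... train on training_data, evaluate and tune on validation_data ...

# After tuning, retrain from scratch on all non-test data.
final_training_data = np.concatenate([training_data, validation_data])
print(len(training_data), len(validation_data), len(final_training_data))
```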
Evaluating Machine Learning Models
![Page 145: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/145.jpg)
K-Fold Cross-Validation
Used for smaller data sets
• If K is too small, we’ll experience high bias (underfitting)
• If K is too large, we’ll experience high variance (overfitting)
Evaluating Machine Learning Models
![Page 146: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/146.jpg)
K-Fold Cross Validation Implementation
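A standard-library sketch of K-fold cross-validation. The "model" here is a deliberately trivial mean predictor scored by mean absolute error, and the last fold absorbs any remainder; both are illustrative assumptions, not the slide's implementation.

```python
def k_fold_indices(num_samples, k):
    """Yield (training indices, validation indices) for each of k folds."""
    fold_size = num_samples // k
    for fold in range(k):
        start = fold * fold_size
        stop = num_samples if fold == k - 1 else start + fold_size
        validation = list(range(start, stop))
        training = list(range(0, start)) + list(range(stop, num_samples))
        yield training, validation

def cross_validate(targets, k):
    """Average validation MAE of a mean predictor over k folds."""
    scores = []
    for training, validation in k_fold_indices(len(targets), k):
        prediction = sum(targets[i] for i in training) / len(training)
        mae = sum(abs(targets[i] - prediction) for i in validation) / len(validation)
        scores.append(mae)
    return sum(scores) / len(scores)

targets = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
folds = list(k_fold_indices(len(targets), k=4))
score = cross_validate(targets, k=4)
print(folds[0], score)
```

Every example is used for validation exactly once and for training in the other K - 1 folds.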
Evaluating Machine Learning Models
![Page 147: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/147.jpg)
Iterated K-Fold Cross-Validation with Shuffling
• history = []
• for i in range(iterationCount):
  • shuffle(data)
  • history.append(crossValidation(data, K = k))
• Requires building iterationCount * K + 1 models
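The pseudocode above can be made runnable as a sketch. The helper names, the toy data, and the mean-predictor "model" are all assumptions; a counter tracks how many models get built, with the final model on all of the data accounting for the "+ 1".

```python
import random

def iterated_k_fold(data, k, iteration_count, build_and_score, seed=0):
    """Repeat K-fold cross-validation, reshuffling the data before each pass."""
    rng = random.Random(seed)
    data = list(data)
    history = []
    for _ in range(iteration_count):
        rng.shuffle(data)
        fold_size = len(data) // k
        scores = []
        for fold in range(k):
            validation = data[fold * fold_size:(fold + 1) * fold_size]
            training = data[:fold * fold_size] + data[(fold + 1) * fold_size:]
            scores.append(build_and_score(training, validation))
        history.append(sum(scores) / k)
    return history

counter = {"models_built": 0}

def mean_predictor_mae(training, validation):
    """Toy model: predict the training mean, score by mean absolute error."""
    counter["models_built"] += 1
    prediction = sum(training) / len(training)
    return sum(abs(v - prediction) for v in validation) / len(validation)

history = iterated_k_fold(range(12), k=3, iteration_count=5,
                          build_and_score=mean_predictor_mae)

# The "+ 1": one final model trained on all of the data.
final_prediction = sum(range(12)) / 12
counter["models_built"] += 1
print(history, counter["models_built"])
```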
Evaluating Machine Learning Models
![Page 148: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/148.jpg)
Things to Keep in Mind
Evaluating Machine Learning Models
![Page 149: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/149.jpg)
Value Normalization
• Dividing by 255 was an example of min-max normalization:
  • value = (value - min(value)) / (max(value) - min(value))
  • The max pixel value was 255 and the min pixel value was 0
• Alternatively, you can use center-and-scale normalization:
  • value = (value - mean(value)) / std(value)
[-1,1] is fine too
Consider removing outliers
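Both normalizations can be sketched with NumPy on toy arrays. Center-and-scale here means subtracting the mean and dividing by the standard deviation of each feature, and the statistics are computed on the training data only and then reused for the test data. The function names are illustrative.

```python
import numpy as np

def min_max_normalize(values):
    """Rescale values to the range [0, 1]."""
    return (values - values.min()) / (values.max() - values.min())

def center_and_scale(train, test):
    """Standardize each feature using statistics from the training data only."""
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    return (train - mean) / std, (test - mean) / std

pixels = np.array([0.0, 51.0, 255.0])
scaled_pixels = min_max_normalize(pixels)     # same as pixels / 255 here

train = np.array([[1.0, 100.0], [3.0, 300.0], [5.0, 500.0]])
test = np.array([[2.0, 200.0]])
train_scaled, test_scaled = center_and_scale(train, test)
print(scaled_pixels)
print(train_scaled, test_scaled)
```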
Data Preprocessing, Feature Engineering, and Feature Learning
![Page 150: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/150.jpg)
Missing Values
• “In general, with neural networks, it’s safe to input missing values as 0, with the condition that zero isn’t a meaningful value”
• It’s possible to add indicator variables: 1 if missing; 0 otherwise
• If you expect missing values at test time, be sure to train with missing values:
  • We train like we deploy and deploy like we train
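A sketch of zero-filling combined with indicator variables, assuming NumPy and that missing values arrive as NaN. The helper name is hypothetical.

```python
import numpy as np

def impute_with_indicators(features):
    """Replace NaN with 0 and append one 0/1 'was missing' column per feature."""
    missing = np.isnan(features)
    filled = np.where(missing, 0.0, features)
    return np.concatenate([filled, missing.astype(features.dtype)], axis=1)

features = np.array([[1.5, np.nan],
                     [np.nan, 2.0]])
encoded = impute_with_indicators(features)
print(encoded)
```

The indicator columns let the network tell a true zero apart from a filled-in zero, which helps when zero is a meaningful value.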
Data Preprocessing, Feature Engineering, and Feature Learning
![Page 151: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/151.jpg)
Feature Engineering Example
Three different inputs for the “What time is it?” model …
Why no radius on the polar coordinates?
Data Preprocessing, Feature Engineering, and Feature Learning
![Page 152: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/152.jpg)
Feature Engineering
• Does this mean you don’t have to worry about feature engineering as long as you’re using deep neural networks?
• No …
Data Preprocessing, Feature Engineering, and Feature Learning
![Page 153: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/153.jpg)
Original versus Lower Capacity Model
Original Model: 16 units for each hidden layer
Lower Capacity Model: 4 units for each hidden layer
Overfitting and Underfitting
![Page 154: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/154.jpg)
Original versus Lower Capacity Model
The smaller network starts overfitting later, and its performance degrades more slowly
Overfitting and Underfitting
![Page 155: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/155.jpg)
Original versus Higher Capacity Model: Validation Data
Validation Loss Noisier for Higher Capacity Model (512 versus 16 units for each hidden layer)
Overfitting and Underfitting
![Page 156: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/156.jpg)
Original versus Higher Capacity Model: Training Data
More capacity gives a model the ability to more quickly model the training data, but it also makes it susceptible to overfitting
Overfitting and Underfitting
![Page 157: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/157.jpg)
Regularization [for Smaller Weights]
Overfitting and Underfitting
![Page 158: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/158.jpg)
Example for Effect of Weight Regularization
• Note: the goal of weight regularization is to improve generalization performance ☺
• Use “metric” rather than “loss” for comparing generalization performance; e.g. regularized crossentropy can be used for the loss function with crossentropy used for the evaluation metric [this allows us to use the same evaluation function when comparing performance of models on validation data]
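The distinction between loss and metric can be sketched numerically. The NumPy example below uses toy values and an assumed L2 penalty strength of 0.001; the training loss includes the penalty, while the metric is the plain crossentropy that stays comparable across models.

```python
import numpy as np

def binary_crossentropy(targets, probabilities):
    """Mean binary crossentropy: the metric used to compare models."""
    return float(-np.mean(targets * np.log(probabilities)
                          + (1 - targets) * np.log(1 - probabilities)))

def l2_penalty(weights, strength):
    """L2 weight penalty: strength times the sum of squared weights."""
    return strength * float(np.sum(weights ** 2))

targets = np.array([1.0, 0.0, 1.0])
probabilities = np.array([0.9, 0.2, 0.6])     # toy model outputs
weights = np.array([0.5, -1.5, 2.0])          # toy layer weights

metric = binary_crossentropy(targets, probabilities)
loss = metric + l2_penalty(weights, strength=0.001)
print(f"metric={metric:.4f}  regularized loss={loss:.4f}")
```

Comparing a regularized model to an unregularized one by loss would penalize the regularized model for its penalty term, not for its predictions.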
Overfitting and Underfitting
![Page 159: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/159.jpg)
Additional Weight Regularizers for Keras
Overfitting and Underfitting
![Page 160: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/160.jpg)
Adding Dropout(dropoutRate)
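What a dropout layer does at training time can be sketched with NumPy. This is the "inverted dropout" variant, which rescales the surviving activations during training so that nothing needs to change at test time; the function and its arguments are illustrative, not the Keras layer itself.

```python
import numpy as np

def dropout(layer_output, dropout_rate, rng, training=True):
    """Zero a random fraction of activations and rescale the survivors."""
    if not training:
        return layer_output                       # no-op at test time
    keep_mask = rng.random(layer_output.shape) >= dropout_rate
    return layer_output * keep_mask / (1.0 - dropout_rate)

rng = np.random.default_rng(0)
activations = np.ones((1000, 16))                 # toy layer output
dropped = dropout(activations, dropout_rate=0.5, rng=rng)
print(dropped[:2])
print(dropped.mean())                             # stays close to the original mean
```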
Overfitting and Underfitting
![Page 161: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/161.jpg)
Adding Dropout to the IMDB Network
Overfitting and Underfitting
![Page 162: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/162.jpg)
Recap of Most Common Ways to Prevent Overfitting
Overfitting and Underfitting
![Page 163: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/163.jpg)
Define the Problem
Universal Workflow of Machine Learning
![Page 164: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/164.jpg)
Hypothesis
Universal Workflow of Machine Learning
![Page 165: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/165.jpg)
Choosing a Measure of Success
• Accuracy
• Precision and Recall
• Area Under the Receiver Operating Characteristic (ROC) Curve (AUC)
• Maximize Recall subject to a constraint on the False Positive Rate?
• Mean Average Precision
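Several of these measures fall out of the same four confusion-matrix counts. The sketch below uses toy binary labels and hypothetical helper names, with only the standard library.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def summarize(y_true, y_pred):
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),                 # also the true positive rate
        "false_positive_rate": fp / (fp + tn),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = summarize(y_true, y_pred)
print(metrics)
```

An ROC curve plots recall against the false positive rate as the decision threshold varies, so "maximize recall subject to a constraint on the false positive rate" amounts to picking one point on that curve.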
Universal Workflow of Machine Learning
![Page 166: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/166.jpg)
Deciding on an Evaluation Protocol
Universal Workflow of Machine Learning
![Page 167: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/167.jpg)
Preparing Your Data
Universal Workflow of Machine Learning
![Page 168: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/168.jpg)
Key Choices for Your First Iteration
Universal Workflow of Machine Learning
![Page 169: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/169.jpg)
Choosing the Last-Layer Activation and Loss Function
Universal Workflow of Machine Learning
![Page 170: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/170.jpg)
How Big Should the Model Be?
Developing a model that overfits …
Universal Workflow of Machine Learning
![Page 171: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/171.jpg)
Regularizing the Model
Universal Workflow of Machine Learning
![Page 172: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/172.jpg)
Tuning the Model
We call these hyperparameters to distinguish them from the parameters of the model; i.e. the weights.
Note: We tune against validation data. Much like the “private leaderboard”, we only get one look at test performance.
Universal Workflow of Machine Learning
![Page 173: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/173.jpg)
Chapter Summary
![Page 174: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/174.jpg)
plot_model() and tensorboard
# To use plot_model() to generate ".png" file:
# $ sudo apt install python-pydot
# $ pip install pydot
# $ pip install graphviz
# To review tensorboard output:
# Start the tensorboard server ...
# $ tensorboard --logdir=logs --bind_all
# Use browser to navigate to tensorboard server ...
# http://host:6006/
# Tensorboard reference: https://github.com/tensorflow/tensorboard
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras.utils import plot_model
from tensorflow.keras.callbacks import TensorBoard
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(60000, 784).astype("float32") / 255
x_test = x_test.reshape(10000, 784).astype("float32") / 255
model = Sequential()
model.add(Dense(512, activation="relu", input_shape=(784,), name = "hidden0"))
model.add(Dropout(0.2, name = "dropout0"))
model.add(Dense(512, activation="relu", name = "hidden1"))
model.add(Dropout(0.2, name = "dropout1"))
model.add(Dense(10, activation="softmax", name = "output"))
model.summary()
plot_model(model, to_file = "model.png", show_shapes = True)
model.compile(loss = "sparse_categorical_crossentropy",
              optimizer = "rmsprop",
              metrics = [ "accuracy" ])
history = model.fit(x_train, y_train,
                    batch_size = 128, epochs = 20,
                    validation_split = 0.1,
                    callbacks = [ TensorBoard(log_dir = "logs", histogram_freq = 5) ])
score = model.evaluate(x_test, y_test)
print(f"Test loss: {score[0]}\tTest accuracy: {score[1]}")
Bonus Material
![Page 175: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/175.jpg)
![Page 176: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/176.jpg)
TensorBoard Scalars
![Page 177: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/177.jpg)
TensorBoard Graphs
![Page 178: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/178.jpg)
TensorBoard Distributions
From top to bottom the lines represent: [maximum, 93%, 84%, 69%, 50%, 31%, 16%, 7%, minimum]
![Page 179: Deep Learning Fundamentals - Cross Entropy](https://reader034.fdocuments.net/reader034/viewer/2022050306/626f4a41bac1b4784c069685/html5/thumbnails/179.jpg)
TensorBoard Histograms