Mini-batch gradient descent (deeplearning.ai) · Batch vs. mini-batch gradient descent · Vectorization...
Optimization Algorithms
Mini-batch gradient descent
deeplearning.ai
Andrew Ng
Batch vs. mini-batch gradient descent
Vectorization allows you to efficiently compute on m examples.
Mini-batch gradient descent
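
The slide text does not spell out the procedure, so the following is only a minimal sketch of the idea: shuffle the training set, split it into mini-batches, and take one gradient step per mini-batch instead of one per full pass. The helper names (`make_mini_batches`, `run_epoch`), the toy data, and the one-parameter model `y_hat = w * x` are assumptions made for illustration.

```python
import random


def make_mini_batches(examples, batch_size, seed=0):
    """Shuffle the examples and split them into consecutive mini-batches.

    The last mini-batch is smaller when len(examples) is not a
    multiple of batch_size.
    """
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i:i + batch_size]
            for i in range(0, len(shuffled), batch_size)]


def run_epoch(w, batches, learning_rate):
    """One pass over the data with one gradient step per mini-batch."""
    for batch in batches:
        # Gradient of the mean squared error 1/(2n) * sum((w*x - y)^2) w.r.t. w
        grad = sum((w * x - y) * x for x, y in batch) / len(batch)
        w -= learning_rate * grad
    return w


# Hypothetical toy data: 10 (x, y) pairs generated by y = 2x.
data = [(float(x), 2.0 * x) for x in range(10)]
batches = make_mini_batches(data, batch_size=4)

w = 0.0
for _ in range(50):
    w = run_epoch(w, batches, learning_rate=0.01)
```

With 10 examples and a batch size of 4, each epoch performs three parameter updates (mini-batches of 4, 4, and 2 examples) rather than the single update batch gradient descent would make.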
Understanding mini-batch gradient descent
Training with mini-batch gradient descent
[Figure: two plots. Batch gradient descent: cost vs. # iterations. Mini-batch gradient descent: cost vs. mini-batch # (t).]
Choosing your mini-batch size
Understanding exponentially weighted averages
Exponentially weighted averages
[Figure: temperature vs. days]
$$v_t = \beta v_{t-1} + (1 - \beta)\,\theta_t$$
Exponentially weighted averages
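$$
\begin{aligned}
v_{100} &= 0.9\,v_{99} + 0.1\,\theta_{100} \\
v_{99} &= 0.9\,v_{98} + 0.1\,\theta_{99} \\
v_{98} &= 0.9\,v_{97} + 0.1\,\theta_{98} \\
&\;\;\vdots \\
v_t &= \beta v_{t-1} + (1 - \beta)\,\theta_t
\end{aligned}
$$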
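Implementing exponentially weighted averages
$$
\begin{aligned}
v_0 &= 0 \\
v_1 &= \beta v_0 + (1 - \beta)\,\theta_1 \\
v_2 &= \beta v_1 + (1 - \beta)\,\theta_2 \\
v_3 &= \beta v_2 + (1 - \beta)\,\theta_3 \\
&\;\;\vdots
\end{aligned}
$$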
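
The recurrence on this slide, starting from v_0 = 0, can be sketched as a single running variable that is overwritten at each step. The function name and the sample temperatures are hypothetical; only the update rule comes from the slide.

```python
def exponentially_weighted_average(thetas, beta=0.9):
    """Return [v_1, ..., v_n] where v_0 = 0 and
    v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v = 0.0
    averages = []
    for theta in thetas:
        # Only the previous value of v is needed, so memory use is constant.
        v = beta * v + (1.0 - beta) * theta
        averages.append(v)
    return averages


temperatures = [40.0, 49.0, 45.0]  # hypothetical daily readings
v = exponentially_weighted_average(temperatures, beta=0.9)
```

Note how small the first values are compared with the readings themselves (the first average is 4.0 for a reading of 40.0): initializing at zero biases the early estimates downward.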
Bias correction in exponentially weighted averages
Bias correction
[Figure: temperature vs. days]
$$v_t = \beta v_{t-1} + (1 - \beta)\,\theta_t$$
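
The slide text shows only the uncorrected recurrence. The usual correction, assumed here rather than quoted from the slide, divides each v_t by (1 - beta^t), which removes the downward bias caused by starting from v_0 = 0. A minimal sketch, with a hypothetical function name and data:

```python
def bias_corrected_average(thetas, beta=0.9):
    """Exponentially weighted average with bias correction.

    Each v_t is divided by (1 - beta**t); the factor approaches 1 as t
    grows, so the correction only matters for the first few steps.
    """
    v = 0.0
    corrected = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1.0 - beta) * theta
        corrected.append(v / (1.0 - beta ** t))
    return corrected


# Hypothetical constant signal: the corrected average should match it
# from the very first step.
corrected = bias_corrected_average([40.0, 40.0, 40.0], beta=0.9)
```

For a constant input the uncorrected averages would be 4.0, 7.6, and 10.84, while the corrected values all equal 40.0.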
Gradient descent with momentum
Gradient descent example
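Implementation details
On iteration $t$:
Compute $dW$, $db$ on the current mini-batch
$$v_{dW} = \beta v_{dW} + (1 - \beta)\,dW$$
$$v_{db} = \beta v_{db} + (1 - \beta)\,db$$
$$W = W - \alpha v_{dW}, \quad b = b - \alpha v_{db}$$
Hyperparameters: $\alpha$, $\beta$
$\beta = 0.9$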
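
The update on this slide can be sketched for scalar parameters as follows. The function name `momentum_step` and the toy cost J(W, b) = W^2 + 10 b^2 are assumptions chosen for illustration; in a real network W, b and their gradients would be arrays computed on the current mini-batch.

```python
def momentum_step(param, grad, velocity, alpha=0.1, beta=0.9):
    """One momentum update: v = beta*v + (1 - beta)*grad, then param -= alpha*v."""
    velocity = beta * velocity + (1.0 - beta) * grad
    param = param - alpha * velocity
    return param, velocity


# Hypothetical toy cost J(W, b) = W**2 + 10 * b**2, so dW = 2W and db = 20b.
W, b = 1.0, 1.0
v_dW, v_db = 0.0, 0.0  # velocities start at zero
for _ in range(500):
    dW, db = 2.0 * W, 20.0 * b
    W, v_dW = momentum_step(W, dW, v_dW)
    b, v_db = momentum_step(b, db, v_db)
```

Each parameter keeps its own velocity, and the same alpha and beta are shared across parameters, matching the two hyperparameters listed on the slide.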
RMSprop
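
The slide text carries no formulas, so what follows is a sketch of the commonly used RMSprop update rather than a transcription: keep an exponentially weighted average of the squared gradient and divide each step by its square root. The function name, the default values, and the small constant `eps` (added for numerical stability) are assumptions.

```python
import math


def rmsprop_step(param, grad, s, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update for a scalar parameter.

    s is the running average of squared gradients:
    s = beta*s + (1 - beta)*grad**2, then param -= alpha*grad / (sqrt(s) + eps).
    """
    s = beta * s + (1.0 - beta) * grad * grad
    param = param - alpha * grad / (math.sqrt(s) + eps)
    return param, s


# Two hypothetical gradients that differ by a factor of 100.
p_small, s_small = rmsprop_step(1.0, 2.0, 0.0)
p_large, s_large = rmsprop_step(1.0, 200.0, 0.0)
```

Because the step is normalized by the gradient's own running magnitude, both calls move the parameter by almost the same amount, which is how the method damps updates in directions with large gradients.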
Adam optimization algorithm
yhat = np.array([.9, 0.2, 0.1, .4, .9])
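
The formulas for this slide did not survive in the text, so the sketch below follows the standard formulation of Adam, which combines the momentum term and the RMSprop term and applies bias correction to both. The function name and the defaults (beta1 = 0.9, beta2 = 0.999, eps = 1e-8, which are the commonly cited values) are assumptions, not quotations from the slide.

```python
import math


def adam_step(param, grad, v, s, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter at iteration t (t starts at 1).

    v: exponentially weighted average of gradients (momentum term)
    s: exponentially weighted average of squared gradients (RMSprop term)
    """
    v = beta1 * v + (1.0 - beta1) * grad
    s = beta2 * s + (1.0 - beta2) * grad * grad
    # Bias correction compensates for v and s starting at zero.
    v_hat = v / (1.0 - beta1 ** t)
    s_hat = s / (1.0 - beta2 ** t)
    param = param - alpha * v_hat / (math.sqrt(s_hat) + eps)
    return param, v, s


# Hypothetical first iteration with parameter 1.0 and gradient 2.0.
param, v, s = adam_step(1.0, 2.0, v=0.0, s=0.0, t=1)
```

On the first iteration the corrected terms reduce to the gradient and its square, so the parameter moves by almost exactly alpha in the direction opposite to the gradient.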
Hyperparameters choice:
Adam Coates
Learning rate decay
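
No formula survives in the slide text. One widely used schedule, assumed here for illustration, shrinks the learning rate once per epoch as alpha = alpha_0 / (1 + decay_rate * epoch_num). The function and parameter names are hypothetical.

```python
def decayed_learning_rate(alpha0, decay_rate, epoch_num):
    """Inverse-time decay: alpha0 / (1 + decay_rate * epoch_num)."""
    return alpha0 / (1.0 + decay_rate * epoch_num)


# Hypothetical settings: initial rate 0.2, decay rate 1.0.
schedule = [decayed_learning_rate(0.2, 1.0, epoch) for epoch in range(1, 5)]
```

With these settings the rate falls from 0.1 in epoch 1 to 0.04 in epoch 4, so early epochs take large steps and later epochs settle into smaller ones.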
Other learning rate decay methods
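
The slide text does not list the methods. Two common alternatives, given here as assumptions rather than as the slide's content, are exponential decay and decay proportional to the inverse square root of the epoch number. All names and constants below are hypothetical.

```python
import math


def exponential_decay(alpha0, base, epoch_num):
    """alpha = base**epoch_num * alpha0, with 0 < base < 1 (for example 0.95)."""
    return (base ** epoch_num) * alpha0


def inverse_sqrt_decay(alpha0, k, epoch_num):
    """alpha = k / sqrt(epoch_num) * alpha0, defined for epoch_num >= 1."""
    return k / math.sqrt(epoch_num) * alpha0


exp_rate = exponential_decay(0.2, 0.5, 2)    # hypothetical: halve every epoch
sqrt_rate = inverse_sqrt_decay(0.2, 1.0, 4)  # hypothetical: k = 1
```

Exponential decay shrinks the rate by a fixed factor each epoch, while the inverse square root schedule decays quickly at first and then more slowly.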
The problem of local optima
Local optima in neural networks
Problem of plateaus
• Unlikely to get stuck in a bad local optimum
• Plateaus can make learning slow