Out of Memory? No Problem. Developing Machine Learning Models on Big Data
Heather Gorr, PhD
MATLAB Product Marketing Manager
Big data without big changes
One file | One hundred files
The big data landscape can seem overwhelming
Building machine learning models with big data
Access, Preprocessing, and Exploration
Model Development
Model Validation and Scaling Up
Case study: Predict Air Quality in North America
Building machine learning models with big data – step by step
Access, Preprocessing, and Exploration
Model Development
Model Validation and Scaling Up
Historical files are on HDFS and real time data are available through an API
• Temperature
• Pressure
• Relative Humidity
• Dew Point
• Wind speed
• Wind direction
• Ozone
• CO
• NO2
• SO2
You have 1TB of data you’ve never seen before. Where do you start?
Use a Spark-enabled Hadoop cluster and MATLAB. Both are well known for machine learning.
HDFS
YARN
Spark
MATLAB
Access and preview the data with datastore
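The idea behind a datastore can be sketched in a few lines: point at a collection of files, preview a handful of rows from the first one, and read everything else one chunk at a time so the full data set is never held in memory. The following is a minimal Python analogue under stated assumptions, not MATLAB's datastore implementation. The function names `preview`, `read_chunks`, and `chunked_mean`, the `ozone` column, and the two demo files with made-up values are all hypothetical.

```python
import csv
import itertools
import os
import tempfile


def preview(paths, n=8):
    """Return (header, first n rows) read from the first file only."""
    with open(paths[0], newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        return header, list(itertools.islice(reader, n))


def read_chunks(paths, chunk_size=1000):
    """Yield lists of row dicts, at most chunk_size rows per list."""
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            while True:
                chunk = list(itertools.islice(reader, chunk_size))
                if not chunk:
                    break
                yield chunk


def chunked_mean(paths, column, chunk_size=1000):
    """Mean of one column, skipping blank entries, one chunk at a time."""
    total, count = 0.0, 0
    for chunk in read_chunks(paths, chunk_size):
        for row in chunk:
            if row[column] != "":
                total += float(row[column])
                count += 1
    return total / count if count else float("nan")


# Tiny demo: two hypothetical files with made-up values.
demo_dir = tempfile.mkdtemp()
demo_paths = []
for name, values in [("a.csv", ["1.0", "2.0", ""]), ("b.csv", ["3.0"])]:
    path = os.path.join(demo_dir, name)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ozone"])
        writer.writerows([[v] for v in values])
    demo_paths.append(path)

print(preview(demo_paths, n=2))
print(chunked_mean(demo_paths, "ozone", chunk_size=2))
```

The same two functions work whether `demo_paths` holds one file or one hundred, which is the point of the "one file, one hundred files" slide.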
There are numerous datastores to access data in many forms
Databases
Images
MDF Files
Custom
Simulink
Preview the data and adjust properties to best represent the data of interest
Use tall arrays to work with the data like any MATLAB array
Create a tall array for each datastore
ozone
Use familiar MATLAB functions on tall arrays
Clean messy data using common preprocessing functions
Execution model makes operations more efficient on big data
▪ Deferred evaluation
– Commands are not executed right away
– Operations are added to a queue
▪ Execution triggers include:
– gather function
– summary function
– Machine learning models
– Plotting
Execution model makes operations more efficient on big data
Unnecessary results are not computed
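Deferred evaluation can be illustrated with a small sketch: each operation is appended to a queue, nothing touches the data until a trigger is called, and results that are never requested are never computed. This is an illustrative Python analogue, not how MATLAB tall arrays are implemented. The class `Deferred`, its `map` and `filter` methods, and the sample readings are hypothetical; only the name `gather` is borrowed from the slide.

```python
class Deferred:
    """A value whose operations are queued and only run on gather()."""

    def __init__(self, source, ops=()):
        self._source = source    # zero-argument callable returning an iterable
        self._ops = tuple(ops)   # queued (kind, function) pairs

    def map(self, fn):
        # Queue a transformation; no data is read here.
        return Deferred(self._source, self._ops + (("map", fn),))

    def filter(self, fn):
        # Queue a row filter; no data is read here.
        return Deferred(self._source, self._ops + (("filter", fn),))

    def gather(self):
        # Trigger: read the source once and apply every queued operation.
        out = []
        for item in self._source():
            keep = True
            for kind, fn in self._ops:
                if kind == "map":
                    item = fn(item)
                elif not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []  # records every time the data source is actually read


def load():
    calls.append("read")
    return [12.0, None, 30.5, 41.0]  # made-up readings with one missing value


raw = Deferred(load)
clean = raw.filter(lambda x: x is not None)
scaled = clean.map(lambda x: x * 2)
unused = clean.map(lambda x: x ** 2)  # queued but never gathered

reads_before_gather = len(calls)      # nothing has been read yet
result = scaled.gather()              # one pass over the data
print(reads_before_gather, result, calls)
```

Because `unused` is never gathered, its squaring step never runs, and the filter and scaling steps for `scaled` share a single pass over the source.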
Explore the data with tall visualizations
plot
scatter
binscatter
histogram
histogram2
ksdensity
Get a summary of the data
Gather a subset of the data
datasample: from 1980 - 2017
head: first 10000
tail: last 10000
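Each of the three subsetting options can be computed in a single pass over a stream of rows with memory proportional to the subset, not to the data. The sketch below is a Python analogue under stated assumptions. The random sample uses reservoir sampling, a technique swapped in here for illustration; the slides do not say how MATLAB's datasample works on tall arrays. The `seed` parameter is a hypothetical addition for reproducibility.

```python
import itertools
import random
from collections import deque


def head(rows, n):
    """First n rows of an iterable, reading no further than needed."""
    return list(itertools.islice(rows, n))


def tail(rows, n):
    """Last n rows of an iterable, keeping only n rows in memory."""
    return list(deque(rows, maxlen=n))


def sample(rows, k, seed=0):
    """Uniform random sample of k rows without replacement (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir = []
    for i, row in enumerate(rows):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return reservoir


print(head(iter(range(100)), 3))
print(tail(iter(range(100)), 3))
print(len(sample(iter(range(100)), 5)))
```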
Explore the subset of data in MATLAB as you always do
Use the results of explorations to help make decisions
Synchronize all data to daily times
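Synchronizing to daily times means reducing each sensor's irregular readings to one value per calendar day and then lining the series up on a shared set of dates. A minimal Python sketch follows, assuming the daily value is the mean and that only days present in every series are kept. Both choices are assumptions; the slides do not state the aggregation or the join rule. The function names and the sample readings are hypothetical.

```python
from collections import defaultdict
from datetime import date, datetime


def daily_mean(readings):
    """Reduce (datetime, value) pairs to {date: mean value for that day}."""
    sums = defaultdict(lambda: [0.0, 0])  # date -> [running sum, count]
    for timestamp, value in readings:
        acc = sums[timestamp.date()]
        acc[0] += value
        acc[1] += 1
    return {day: s / n for day, (s, n) in sorted(sums.items())}


def synchronize(**series):
    """Align several series on the days that appear in all of them."""
    daily = {name: daily_mean(readings) for name, readings in series.items()}
    if not daily:
        return {}
    common = set.intersection(*(set(d) for d in daily.values()))
    return {day: {name: daily[name][day] for name in daily}
            for day in sorted(common)}


# Made-up readings: ozone and temperature sampled at different times.
ozone = [(datetime(2017, 1, 1, 0), 30.0),
         (datetime(2017, 1, 1, 12), 50.0),
         (datetime(2017, 1, 2, 6), 20.0)]
temperature = [(datetime(2017, 1, 1, 3), 10.0),
               (datetime(2017, 1, 1, 15), 14.0),
               (datetime(2017, 1, 3, 9), 8.0)]

table = synchronize(ozone=ozone, temperature=temperature)
print(table)
```

Keeping only shared days avoids introducing missing values; a union of days with explicit gaps would be an equally reasonable design when later steps can handle missing data.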
You don’t need to leave MATLAB to monitor large jobs
Building machine learning models with big data
Access, Preprocessing, and Exploration
Model Development
Model Validation and Scaling Up
How do you know which model to use?
Try them all ☺
Predict air quality
Air Quality Index: Regression
Air Quality Label: Classification
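The two targets are related: a numeric index is a regression target, and binning that index into named categories yields a classification target. The sketch below assumes the labels are the standard US EPA AQI categories; the slides do not say which label set the case study uses. The function name `aqi_label` is hypothetical.

```python
import bisect

# Upper bound of each US EPA AQI category except the last, which is open-ended.
AQI_UPPER_BOUNDS = [50, 100, 150, 200, 300]
AQI_LABELS = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def aqi_label(aqi):
    """Map a numeric AQI value (regression target) to its category label."""
    if aqi < 0:
        raise ValueError("AQI cannot be negative")
    return AQI_LABELS[bisect.bisect_left(AQI_UPPER_BOUNDS, aqi)]


for value in (42, 100, 101, 250, 400):
    print(value, aqi_label(value))
```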
Use apps for easy model exploration
Validate and compare models
Select the most important features
Building machine learning models with big data
Access, Preprocessing, and Exploration
Model Development
Model Validation and Scaling Up
Scale up with tall machine learning models
▪ Linear Regression (fitlm)
▪ Logistic & Generalized Linear Regression (fitglm)
▪ Discriminant Analysis Classification (fitcdiscr)
▪ K-means Clustering (kmeans)
▪ Principal Component Analysis (pca)
▪ Partition for Cross Validation (cvpartition)
▪ Linear Support Vector Machine (SVM) Classification (fitclinear)
▪ Naïve Bayes Classification (fitcnb)
▪ Random Forest Ensemble Classification (TreeBagger)
▪ Lasso Linear Regression (lasso)
▪ Linear Support Vector Machine (SVM) Regression (fitrlinear)
▪ Single Classification Decision Tree (fitctree)
▪ Linear SVM Classification with Random Kernel Expansion (fitckernel)
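Why can models like these be fitted to data that does not fit in memory? For some of them, the fit depends only on a few running totals that can be updated one chunk at a time. The sketch below shows this for one-variable least-squares regression in Python. It is an illustration of the general idea, not the algorithm behind fitlm or any other function in the list above. The class `RunningLinearFit` and the sample chunks are hypothetical.

```python
class RunningLinearFit:
    """One-variable least squares fitted from running sums, chunk by chunk."""

    def __init__(self):
        self.n = 0
        self.sx = self.sy = self.sxy = self.sxx = 0.0

    def update(self, xs, ys):
        """Fold one chunk of (x, y) pairs into the running sums."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxy += x * y
            self.sxx += x * x

    def coefficients(self):
        """Return (slope, intercept) for all data seen so far."""
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


# Made-up chunks that follow y = 2x + 1 exactly.
chunks = [([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]),
          ([3.0, 4.0, 5.0], [7.0, 9.0, 11.0])]

fit = RunningLinearFit()
fit.update(*chunks[0])
subset_fit = fit.coefficients()   # model from a subset of the data
fit.update(*chunks[1])
full_fit = fit.coefficients()     # model from all of the data
print(subset_fit, full_fit)
```

Fitting on the first chunk and then on everything mirrors the "subset first, all data later" progression. Plain running sums can lose precision on very large or poorly scaled data, so a production implementation would likely use a more numerically stable update.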
Big data machine learning models also include goodness-of-fit measures and convenient functions to explore and validate the model
Scale up. But not all at once
Use tall arrays in code
Apply model to subset of data
Apply model to all data
Apply model to new data
Deploy/Compile
Big data without big changes
One file | One hundred files