Holistic approach to machine learning
![Page 1: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/1.jpg)
@SrcMinistry @MariuszGil
Holistic approach to Machine Learning
Data processing
![Page 2: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/2.jpg)
@SrcMinistry
![Page 3: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/3.jpg)
We are developers
![Page 4: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/4.jpg)
We love to…
![Page 5: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/5.jpg)
Write code
![Page 6: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/6.jpg)
Write tests
![Page 7: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/7.jpg)
Use DDD/OOP/AOP/SOLID/GRASP/XYZ
![Page 8: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/8.jpg)
What for?
![Page 9: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/9.jpg)
Write code
![Page 10: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/10.jpg)
Make money
![Page 11: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/11.jpg)
Make users happy
![Page 12: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/12.jpg)
Solve problems
![Page 13: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/13.jpg)
Solve problems by writing code, to make users happy and make money
![Page 14: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/14.jpg)
Solve problems
![Page 15: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/15.jpg)
Solve
![Page 16: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/16.jpg)
problems
![Page 17: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/17.jpg)
Mapping all problems to DDD/OOP/AOP/SOLID/GRASP/XYZ
![Page 18: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/18.jpg)
Test first
![Page 19: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/19.jpg)
Understand the problem first
![Page 20: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/20.jpg)
Domain knowledge
![Page 21: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/21.jpg)
Ask an expert
![Page 22: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/22.jpg)
Real problems
![Page 23: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/23.jpg)
Data classification
![Page 24: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/24.jpg)
Bot detection
![Page 25: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/25.jpg)
Minimize risk of error
![Page 26: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/26.jpg)
![Page 27: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/27.jpg)
+ value estimator
![Page 28: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/28.jpg)
+ chance of sale
![Page 29: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/29.jpg)
+ $ optimization
![Page 30: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/30.jpg)
Tens of thousands of historical transactions
![Page 31: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/31.jpg)
Tens of data components
![Page 32: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/32.jpg)
Hundreds of data components
![Page 33: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/33.jpg)
![Page 34: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/34.jpg)
IF-Unsolvable
![Page 35: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/35.jpg)
Machine Learning
![Page 36: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/36.jpg)
The theory
![Page 37: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/37.jpg)
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E
Tom M. Mitchell
![Page 38: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/38.jpg)
Task
![Page 39: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/39.jpg)
Typical ML techniques: Classification, Regression, Clustering, Dimensionality reduction, Association learning
![Page 40: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/40.jpg)
[Scatter plot: samples plotted by feature 1 vs. feature 2]
![Page 41: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/41.jpg)
[Scatter plot: samples plotted by feature 1 vs. feature 2]
![Page 42: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/42.jpg)
[Scatter plot: samples plotted by feature 1 vs. feature 2]
![Page 43: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/43.jpg)
Experience
![Page 44: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/44.jpg)
Typical ML paradigms: Supervised learning, Unsupervised learning, Reinforcement learning
![Page 45: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/45.jpg)
Accuracy
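Accuracy, the performance measure P in Mitchell's definition, is the fraction of predictions that match the true labels. A minimal sketch in Scala (the language of the Spark example later in the deck); the `Accuracy` object and the bot-detection labels are illustrative assumptions, not part of the talk.

```scala
object Accuracy {
  // Fraction of positions where the predicted label equals the true label.
  def accuracy[A](predicted: Seq[A], actual: Seq[A]): Double = {
    require(predicted.nonEmpty, "need at least one prediction")
    require(predicted.length == actual.length, "sequences must have equal length")
    val correct = predicted.zip(actual).count { case (p, a) => p == a }
    correct.toDouble / predicted.length
  }
}

// Hypothetical bot-detection labels: 1 = bot, 0 = human.
val predicted = Seq(1, 0, 1, 1, 0)
val actual    = Seq(1, 0, 0, 1, 0)
println(Accuracy.accuracy(predicted, actual)) // 0.8
```

Four of the five predictions match, so accuracy is 4/5.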
![Page 46: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/46.jpg)
The practice
![Page 47: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/47.jpg)
![Page 48: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/48.jpg)
data + algo = result
![Page 49: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/49.jpg)
+-------+--------+------+---------+---------+-------+
| brand | model  | year | mileage | service | price |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 |  123000 |    9900 | 67000 |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 |  175000 |    9900 | 30000 |
+-------+--------+------+---------+---------+-------+
| ford  | focus  | 2010 |   45000 |    6700 | 30000 |
+-------+--------+------+---------+---------+-------+
…
![Page 50: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/50.jpg)
Learning data → learning algorithm → classifier model
Real data → classifier model → classification
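The diagram separates two phases: a learning algorithm turns learning data into a model, and the model then classifies real data. A minimal Scala sketch of that split, assuming a nearest-centroid classifier as a stand-in algorithm (chosen only for brevity; the talk does not prescribe it). `Example`, `CentroidModel`, `NearestCentroid` and the toy data are hypothetical.

```scala
// A labeled training example: feature vector plus class label.
final case class Example(features: Vector[Double], label: String)

// The trained artifact: one centroid (mean feature vector) per class.
final case class CentroidModel(centroids: Map[String, Vector[Double]]) {
  private def sqDist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Classification phase: pick the class whose centroid is closest.
  def predict(features: Vector[Double]): String =
    centroids.minBy { case (_, c) => sqDist(features, c) }._1
}

object NearestCentroid {
  // Learning phase: learning data in, model out.
  def train(data: Seq[Example]): CentroidModel = {
    require(data.nonEmpty, "need training data")
    val centroids = data.groupBy(_.label).map { case (label, examples) =>
      val n = examples.size.toDouble
      val sums = examples.map(_.features).reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
      label -> sums.map(_ / n)
    }
    CentroidModel(centroids)
  }
}

val model = NearestCentroid.train(Seq(
  Example(Vector(0.0, 0.0), "human"),
  Example(Vector(1.0, 1.0), "human"),
  Example(Vector(9.0, 9.0), "bot"),
  Example(Vector(10.0, 10.0), "bot")
))
println(model.predict(Vector(8.0, 8.5))) // bot
```

Keeping training and prediction in separate types makes it explicit that only the model, not the training data, is needed at classification time.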
![Page 51: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/51.jpg)
Failure recipe
![Page 52: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/52.jpg)
+-------+--------+------+---------+---------+-------+
| brand | model  | year | mileage | service | price |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 |  123000 |    9900 | 67000 |
+-------+--------+------+---------+---------+-------+
| ford  | mondeo | 2005 |  175000 |    9900 | 30000 |
+-------+--------+------+---------+---------+-------+
| ford  | focus  | 2010 |   45000 |    6700 | 30000 |
+-------+--------+------+---------+---------+-------+
…
![Page 53: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/53.jpg)
+-------+--------+------+---------+---------+--------+-------+
| brand | model  | year | mileage | service | repair | price |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 |  123000 |    9000 |    900 | 67000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 |  175000 |     900 |   9000 | 30000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | focus  | 2010 |   45000 |    3700 |   3000 | 30000 |
+-------+--------+------+---------+---------+--------+-------+
…
![Page 54: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/54.jpg)
+-------+--------+------+---------+---------+--------+-------+
| brand | model  | year | mileage | service | repair | price |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 |  123000 |    9000 |    900 | 67000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 |  175000 |     900 |   9000 | 30000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | mondeo | 2005 |  175000 |     900 |   9000 | 45000 |
+-------+--------+------+---------+---------+--------+-------+
| ford  | focus  | 2010 |   45000 |    3700 |   3000 | 30000 |
+-------+--------+------+---------+---------+--------+-------+
…
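In the table above, two rows have identical features but different prices, so no model can tell them apart from those columns alone. A cheap check before training is to group rows by their features and flag groups whose targets disagree. A minimal Scala sketch; `Car`, `Conflicts` and the field name `mileage` are hypothetical names for the table's columns, and the rows are the four shown on the slide.

```scala
// One row of the car table; every column except price is a feature.
final case class Car(brand: String, model: String, year: Int, mileage: Int,
                     service: Int, repair: Int, price: Int) {
  // Everything except the target column.
  def features: (String, String, Int, Int, Int, Int) =
    (brand, model, year, mileage, service, repair)
}

object Conflicts {
  // Feature tuples that appear with more than one distinct price.
  def conflicting(rows: Seq[Car]): Map[(String, String, Int, Int, Int, Int), Set[Int]] =
    rows.groupBy(_.features)
      .map { case (key, group) => key -> group.map(_.price).toSet }
      .filter { case (_, prices) => prices.size > 1 }
}

val rows = Seq(
  Car("ford", "mondeo", 2005, 123000, 9000, 900, 67000),
  Car("ford", "mondeo", 2005, 175000, 900, 9000, 30000),
  Car("ford", "mondeo", 2005, 175000, 900, 9000, 45000),
  Car("ford", "focus", 2010, 45000, 3700, 3000, 30000)
)
Conflicts.conflicting(rows).foreach(println)
```

A non-empty result suggests a missing feature (such as `gen` on the next slide) rather than a problem the algorithm can fix.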
![Page 55: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/55.jpg)
+-------+--------+-----+------+---------+---------+--------+-------+
| brand | model  | gen | year | mileage | service | repair | price |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | mondeo |   4 | 2005 |  123000 |    9000 |    900 | 67000 |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | mondeo |   3 | 2005 |  175000 |     900 |   9000 | 30000 |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | mondeo |   4 | 2005 |  175000 |     900 |   9000 | 45000 |
+-------+--------+-----+------+---------+---------+--------+-------+
| ford  | focus  |   4 | 2010 |   45000 |    3700 |   3000 | 30000 |
+-------+--------+-----+------+---------+---------+--------+-------+
…
![Page 56: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/56.jpg)
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| brand | model  | gen | year | mileage | service | repair | igla | crying German | price |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | mondeo |   4 | 2005 |  123000 |    9000 |    900 |    0 |             0 | 67000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | mondeo |   3 | 2005 |  175000 |     900 |   9000 |    1 |             1 | 30000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | mondeo |   4 | 2005 |  175000 |     900 |   9000 |    0 |             0 | 45000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
| ford  | focus  |   4 | 2010 |   45000 |    3700 |   3000 |    1 |             0 | 30000 |
+-------+--------+-----+------+---------+---------+--------+------+---------------+-------+
…
(igla: Polish used-car slang "igła", literally "needle", for a car in mint condition; crying German: the Polish used-car cliché that the German previous owner cried when selling the car)
![Page 57: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/57.jpg)
Understand your data first
![Page 58: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/58.jpg)
Exploratory analysis
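Exploratory analysis usually starts with simple descriptive statistics per column, which quickly expose outliers, wrong units or suspicious duplicates. A minimal Scala sketch computing count, min, max, mean and (population) standard deviation; `Summary` and `Explore` are hypothetical names, and the input is the four mileage values from the car table above.

```scala
final case class Summary(count: Int, min: Double, max: Double, mean: Double, stdDev: Double)

object Explore {
  // Basic descriptive statistics for one numeric column (population std. dev.).
  def summarize(xs: Seq[Double]): Summary = {
    require(xs.nonEmpty, "column is empty")
    val mean = xs.sum / xs.size
    val variance = xs.map(x => (x - mean) * (x - mean)).sum / xs.size
    Summary(xs.size, xs.min, xs.max, mean, math.sqrt(variance))
  }
}

val mileage = Seq(123000.0, 175000.0, 175000.0, 45000.0)
println(Explore.summarize(mileage))
```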
![Page 59: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/59.jpg)
http://blogs.adobe.com/digitalmarketing/wp-content/uploads/2013/08/aq2.jpg
![Page 60: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/60.jpg)
ML pipeline
![Page 61: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/61.jpg)
[Diagram: ML pipeline, from Raw Data Collection through Pre-processing (Missing Data, Feature Extraction, Feature Selection, Feature Scaling, Dimensionality Reduction), Sampling and Data Split into Training Dataset and Test Dataset, Algorithm Training (Model Selection, Cross Validation, Performance Metrics, Optimization), Post-processing, and Final Model Evaluation, to Classification]
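In the pipeline, Feature Scaling is a pre-processing step that sits next to the training/test split. A common practice is to learn the scaling parameters on the training split only and reuse them unchanged on the test split, so that no test information leaks into training. A minimal Scala sketch of standardization (zero mean, unit variance); `ScalerParams`, `StandardScaler` and the mileage values are hypothetical.

```scala
// Standardization parameters learned from the training split only.
final case class ScalerParams(mean: Double, std: Double) {
  def transform(x: Double): Double = (x - mean) / std
}

object StandardScaler {
  def fit(train: Seq[Double]): ScalerParams = {
    require(train.nonEmpty, "training column is empty")
    val mean = train.sum / train.size
    val std = math.sqrt(train.map(x => (x - mean) * (x - mean)).sum / train.size)
    // Guard against a constant column to avoid division by zero.
    ScalerParams(mean, if (std == 0.0) 1.0 else std)
  }
}

val trainMileage = Seq(100000.0, 200000.0)
val testMileage  = Seq(150000.0, 250000.0)
val scaler = StandardScaler.fit(trainMileage)   // mean 150000, std 50000
println(testMileage.map(scaler.transform))      // List(0.0, 2.0)
```

The test value 250000 maps to 2.0 because it is scaled with the training statistics, not its own.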
![Page 62: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/62.jpg)
[Diagram: the ML pipeline from the previous slide, shown again]
![Page 63: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/63.jpg)
Classification algorithms
Linear classification: Logistic Regression, Linear Discriminant Analysis, PLS Discriminant Analysis
Non-linear classification: Mixture Discriminant Analysis, Quadratic Discriminant Analysis, Regularized Discriminant Analysis, Neural Networks, Flexible Discriminant Analysis, Support Vector Machines, k-Nearest Neighbor, Naive Bayes
Decision trees for classification: Classification and Regression Trees (CART), C4.5, PART, Bagging CART, Random Forest, Gradient Boosted Machines, Boosted C5.0
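To make one entry from the list concrete, here is a minimal Scala sketch of k-Nearest Neighbor classification: sort training points by distance to the query and take a majority vote among the k closest. `KNN` and the toy "human"/"bot" points are hypothetical; a production implementation would use an index structure instead of a full sort.

```scala
object KNN {
  private def sqDist(a: Vector[Double], b: Vector[Double]): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Majority vote among the k training points closest to the query.
  // Ties between classes are broken arbitrarily; use an odd k for two classes.
  def classify(train: Seq[(Vector[Double], String)], query: Vector[Double], k: Int): String = {
    require(k > 0 && k <= train.size, "k must be in 1..train.size")
    val nearest = train.sortBy { case (features, _) => sqDist(features, query) }.take(k)
    nearest.groupBy(_._2).maxBy { case (_, votes) => votes.size }._1
  }
}

val train = Seq(
  (Vector(1.0, 1.0), "human"), (Vector(1.5, 2.0), "human"), (Vector(2.0, 1.0), "human"),
  (Vector(8.0, 8.0), "bot"), (Vector(9.0, 8.5), "bot")
)
println(KNN.classify(train, Vector(2.0, 2.0), k = 3)) // human
```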
![Page 64: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/64.jpg)
Regression algorithms
Linear regression: Ordinary Least Squares Regression, Stepwise Linear Regression, Principal Component Regression, Partial Least Squares Regression
Non-linear regression / penalized regression: Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO), ElasticNet, Multivariate Adaptive Regression Splines (MARS), Support Vector Machines, k-Nearest Neighbor, Neural Network
Decision trees for regression: Classification and Regression Trees (CART), Conditional Decision Tree, Rule System, Bagging CART, Random Forest, Gradient Boosted Machine, Cubist
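Ordinary Least Squares, the first linear method in the list, has a closed form for a single feature: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², intercept = ȳ − slope · x̄. A minimal Scala sketch; `OLS` is a hypothetical name and the data points are synthetic, lying exactly on y = 2x + 1.

```scala
object OLS {
  // Fit y ≈ intercept + slope * x by minimizing the sum of squared residuals.
  // Returns (intercept, slope).
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.size == ys.size && xs.size >= 2, "need at least two (x, y) pairs")
    val mx = xs.sum / xs.size
    val my = ys.sum / ys.size
    val sxy = xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum
    val sxx = xs.map(x => (x - mx) * (x - mx)).sum
    require(sxx != 0.0, "x values must not all be equal")
    val slope = sxy / sxx
    (my - slope * mx, slope)
  }
}

val (intercept, slope) = OLS.fit(Seq(0.0, 1.0, 2.0, 3.0), Seq(1.0, 3.0, 5.0, 7.0))
println(s"intercept=$intercept slope=$slope") // intercept=1.0 slope=2.0
```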
![Page 65: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/65.jpg)
The algorithm is only one element in the ML chain
![Page 66: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/66.jpg)
Everything may be important for ML
![Page 67: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/67.jpg)
![Page 68: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/68.jpg)
Testing
![Page 69: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/69.jpg)
Test datasets
![Page 70: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/70.jpg)
60% training / 20% validation / 20% test
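A common reading of the 60% / 20% / 20% split is training / validation / test. A minimal Scala sketch: shuffle once with a fixed seed so the split is reproducible, then cut by integer counts (any rounding remainder goes to the test set). `Split.trainValTest` is a hypothetical helper built on `scala.util.Random`.

```scala
import scala.util.Random

object Split {
  // Shuffle once with a fixed seed, then cut into 60% / 20% / 20%.
  def trainValTest[A](data: Seq[A], seed: Long): (Seq[A], Seq[A], Seq[A]) = {
    val shuffled = new Random(seed).shuffle(data)
    val nTrain = data.size * 60 / 100
    val nVal = data.size * 20 / 100
    val (train, rest) = shuffled.splitAt(nTrain)
    val (validation, test) = rest.splitAt(nVal)
    (train, validation, test)
  }
}

val (train, validation, test) = Split.trainValTest(1 to 10, seed = 11L)
println((train.size, validation.size, test.size)) // (6,2,2)
```

In Spark the analogous call is `data.randomSplit(Array(0.6, 0.2, 0.2), seed = 11L)`, as in the MLlib example later in the deck with two weights; `randomSplit` samples each record independently, so the resulting sizes are only approximately proportional.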
![Page 71: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/71.jpg)
Andrew Ng's rule of ML
![Page 72: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/72.jpg)
Does it do well on the training data? No: better features / better parameters. Yes: continue.
Does it do well on the test data? No: more data. Yes: done!
(by Andrew Ng)
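The flowchart can be read as a small decision function: fix poor training performance first (better features or parameters), then poor test performance (more data). A minimal Scala sketch; the error-based inputs and the `targetError` threshold are hypothetical, since the slide does not define what "does well" means numerically.

```scala
object NextStep {
  // Training performance is checked first, then test performance.
  def apply(trainError: Double, testError: Double, targetError: Double): String =
    if (trainError > targetError) "Better features / better parameters"
    else if (testError > targetError) "More data"
    else "Done!"
}

println(NextStep(trainError = 0.02, testError = 0.15, targetError = 0.05)) // More data
```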
![Page 73: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/73.jpg)
Calculate, measure, apply later
![Page 74: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/74.jpg)
The code
![Page 75: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/75.jpg)
import org.apache.spark.mllib.classification.{SVMModel, SVMWithSGD}
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.mllib.util.MLUtils

// Assumes `sc` is an existing SparkContext, e.g. the one spark-shell provides.
// Load training data in LIBSVM format.
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Split data into training (60%) and test (40%).
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Run training algorithm to build the model.
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)

// Clear the default threshold so predict() returns raw scores instead of 0/1 labels.
model.clearThreshold()

// Compute raw scores on the test set.
val scoreAndLabels = test.map { point =>
  val score = model.predict(point.features)
  (score, point.label)
}

// Get evaluation metrics.
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
val auROC = metrics.areaUnderROC()
println("Area under ROC = " + auROC)

// Save and load model.
model.save(sc, "myModelPath")
val sameModel = SVMModel.load(sc, "myModelPath")
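The Spark example reports the area under the ROC curve. For a small, local sanity check of that number, AUC can be computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counting one half. A minimal plain-Scala sketch of this pairwise (Mann-Whitney) formulation; it is not Spark's implementation, and `Auc` and the sample scores are hypothetical.

```scala
object Auc {
  // AUC as P(score of a random positive > score of a random negative), ties count 0.5.
  // Pairwise O(P * N) version, fine for small local checks.
  // Labels are assumed to be 1.0 (positive) or 0.0 (negative), as in the Spark example.
  def areaUnderROC(scoreAndLabels: Seq[(Double, Double)]): Double = {
    val pos = scoreAndLabels.collect { case (s, l) if l == 1.0 => s }
    val neg = scoreAndLabels.collect { case (s, l) if l == 0.0 => s }
    require(pos.nonEmpty && neg.nonEmpty, "need both positive and negative labels")
    val wins = (for (p <- pos; n <- neg) yield (if (p > n) 1.0 else if (p == n) 0.5 else 0.0)).sum
    wins / (pos.size * neg.size)
  }
}

val scoreAndLabels = Seq((0.9, 1.0), (0.8, 0.0), (0.7, 1.0), (0.1, 0.0))
println(Auc.areaUnderROC(scoreAndLabels)) // 0.75
```

Three of the four positive/negative pairs are ordered correctly, giving 0.75; a value of 0.5 corresponds to random ranking.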
![Page 76: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/76.jpg)
The art of asking the right questions about the right data
![Page 77: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/77.jpg)
@SrcMinistry
Thanks!
@MariuszGil
![Page 78: Holistic approach to machine learning](https://reader031.fdocuments.net/reader031/viewer/2022022414/58797c1b1a28ab6c358b4925/html5/thumbnails/78.jpg)