Treasure Map: Choosing the Right Algorithm in Azure ML... · 2017-10-02 · June 17th and 18th 2015...
Online Conference
June 17th and 18th 2015
EVENTS.COLLAB365.COMMUNITY
Treasure Map: Choosing the Right Algorithm in Azure ML
Leila Etaati
Microsoft AI MVP,
PhD, Senior Consultant, Trainer and Data Scientist.
International speaker in Microsoft Ignite USA 2017, Microsoft Insight
Summit 2017, PASS Summit 2017, Microsoft NZ Ignite 2016, PASS BA,
PASS24H, SQLRally, SQL Saturday in Oregon, Vienna, Auckland, Melbourne,
Sydney, Brisbane.
http://radacad.com/author/leila
Twitter: @leila_etaati
Question: What machine learning algorithms should I use?
Answer: It depends
Even the most experienced data scientists can’t tell which algorithm will perform best before trying them. These
recommendations are compiled feedback and tips from experts.
Machine Learning Process
What Is the Business Problem?
Prediction
• Predictive Analytics
Grouping
• Descriptive Analysis
Finding Unusual Data Points
• Descriptive: Anomaly Detection
What Is the Business Problem? - Predictive Analytics
What Is the Business Problem? - Descriptive Analytics
What Is the Business Problem? - Anomaly Detection
Second Sign: Nature of Data - Linearity
• Many machine learning algorithms make use of linearity.
• For example, linear classification algorithms assume that classes can be separated by a straight line (or its higher-dimensional analog).
• Linear regression algorithms assume that data trends follow a straight line. These assumptions aren't bad for some problems, but on others they bring accuracy down.
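As a rough illustration of the separability assumption, the sketch below trains a plain perceptron (a simple linear classifier, chosen here only for illustration and not taken from the slides) on two tiny hypothetical datasets. The AND labels can be split by a straight line, so training converges; the XOR labels cannot, so it never does. The function name, learning rate, and epoch limit are all assumptions.

```python
def train_perceptron(samples, epochs=100, learning_rate=1.0):
    """Train a two-input perceptron; return (weights, bias, converged)."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), label in samples:
            score = weights[0] * x1 + weights[1] * x2 + bias
            predicted = 1 if score > 0 else 0
            delta = label - predicted
            if delta != 0:
                # Nudge the separating line toward the misclassified point.
                weights[0] += learning_rate * delta * x1
                weights[1] += learning_rate * delta * x2
                bias += learning_rate * delta
                errors += 1
        if errors == 0:
            # A full pass with no mistakes: the line separates the classes.
            return weights, bias, True
    return weights, bias, False

# Hypothetical toy data: AND is linearly separable, XOR is not.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

_, _, and_converged = train_perceptron(AND_DATA)
_, _, xor_converged = train_perceptron(XOR_DATA)
print("AND separable by a line:", and_converged)
print("XOR separable by a line:", xor_converged)
```

On data like the XOR case, a nonlinear algorithm would be the more natural choice.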
Second Sign: Nature of Data - Linearity - Example
Data with a nonlinear trend
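A small sketch of what a nonlinear trend does to a straight-line fit, assuming made-up data that follows y = x² exactly (the slide's own plot is not available here). An ordinary least-squares line leaves a large error, while the same fitting routine applied to a squared feature matches the data. The helper names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_error(xs, ys, slope, intercept):
    """Total squared gap between the fitted line and the observations."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data with a purely quadratic trend.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [x ** 2 for x in xs]

# Straight line on the raw feature: the linearity assumption does not hold.
slope, intercept = fit_line(xs, ys)
sse_straight_line = sum_squared_error(xs, ys, slope, intercept)

# Same routine on a squared feature: the trend becomes linear in x**2.
squared = [x ** 2 for x in xs]
slope_sq, intercept_sq = fit_line(squared, ys)
sse_with_squared_feature = sum_squared_error(squared, ys, slope_sq, intercept_sq)

print("straight-line error:", sse_straight_line)
print("squared-feature error:", sse_with_squared_feature)
```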
Third Sign: Accuracy and Training Time
Accuracy
Getting the most accurate answer possible isn't always necessary.
Sometimes an approximation is adequate, depending on what you want to use it for.
An advantage of more approximate methods is that they naturally tend to avoid overfitting.
Training time
The number of minutes or hours needed to train a model varies a great deal between algorithms.
Training time is often closely tied to accuracy.
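To make the overfitting point concrete, here is a minimal sketch that compares a memorizing 1-nearest-neighbor classifier with a coarser nearest-centroid classifier on hypothetical one-dimensional data containing a single mislabeled point. Neither model is an Azure ML module; both are stand-ins chosen for illustration, and the timing helper only shows where training time would be measured.

```python
import time

# Hypothetical training data as (feature, label) pairs. The point (8.5, 0)
# is deliberately mislabeled to stand in for noise.
TRAIN = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0), (8.5, 0),
         (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
TEST = [(0.5, 0), (2.5, 0), (8.4, 1), (8.6, 1), (9.5, 1)]

def fit_nearest_neighbor(train):
    # "Training" memorizes every point, including the noisy one.
    return list(train)

def predict_nearest_neighbor(model, x):
    return min(model, key=lambda point: abs(point[0] - x))[1]

def fit_nearest_centroid(train):
    # Approximate each class by the mean of its feature values.
    sums = {}
    for x, label in train:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}

def predict_nearest_centroid(model, x):
    return min(model, key=lambda label: abs(model[label] - x))

def evaluate(fit, predict):
    """Return (test accuracy, training time in seconds)."""
    start = time.perf_counter()
    model = fit(TRAIN)
    training_seconds = time.perf_counter() - start
    correct = sum(predict(model, x) == label for x, label in TEST)
    return correct / len(TEST), training_seconds

nn_accuracy, nn_seconds = evaluate(fit_nearest_neighbor, predict_nearest_neighbor)
centroid_accuracy, centroid_seconds = evaluate(fit_nearest_centroid, predict_nearest_centroid)
print("1-nearest-neighbor accuracy:", nn_accuracy)
print("nearest-centroid accuracy:", centroid_accuracy)
```

On this toy data the memorizing model copies the noisy label onto nearby test points, while the coarser model ignores it. Real datasets will not behave this cleanly.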
Parameter Tuning
Two-Class Classification
Algorithm                        Accuracy  Training Time  Parameters  Linearity
Decision Forest                                           5
Decision Jungle                                           6
Boosted Decision Tree                                     6
Neural Network                                            9
Logistic Regression                                       5
SVM                                                       5
Two-Class Bayes Point Machine                             3
Two-Class Averaged Perceptron                             4
Multi-Class Classification
Algorithm                        Accuracy  Training Time  Parameters  Linearity
Decision Forest                                           6
Decision Jungle                                           6
Neural Network                                            9
Logistic Regression                                       5
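One way to use the parameter counts when shortlisting algorithms is a simple lookup, sketched below. The counts are copied from the multi-class table above; the dictionary name, the function name, and the threshold of 7 are assumptions made for this example.

```python
# Parameter counts copied from the multi-class classification table above.
MULTICLASS_PARAMETER_COUNTS = {
    "Decision Forest": 6,
    "Decision Jungle": 6,
    "Neural Network": 9,
    "Logistic Regression": 5,
}

def algorithms_with_more_parameters_than(counts, threshold):
    """Return the algorithm names whose parameter count exceeds threshold."""
    return sorted(name for name, count in counts.items() if count > threshold)

many_parameter_algorithms = algorithms_with_more_parameters_than(
    MULTICLASS_PARAMETER_COUNTS, 7
)
print(many_parameter_algorithms)
```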
Regression
When accuracy matters most:
▪ decision forest
▪ decision jungle
▪ neural network
When training time matters most:
▪ logistic regression
Able to handle more than 7 parameters:
▪ neural network
Algorithm                        Accuracy  Training Time  Parameters  Linearity
Linear Regression                                         4
Bayesian Linear Regression                                2
Boosted Decision Tree Regression                          6
Decision Forest                                           5
Neural Network                                            9
Logistic Regression                                       5
SVM                                                       5
Fast Forest Quantile Regression                           9
Poisson Regression                                        5