Treasure Map: Choosing the Right Algorithm in Azure ML... · 2017-10-02 · June 17th and 18th 2015...

Post on 03-Jul-2020

Online Conference

June 17th and 18th 2015

EVENTS.COLLAB365.COMMUNITY

Treasure Map: Choosing the Right Algorithm in Azure ML

Leila Etaati

Microsoft AI MVP, PhD, Senior Consultant, Trainer and Data Scientist.

International speaker at Microsoft Ignite USA 2017, Microsoft Insight Summit 2017, PASS Summit 2017, Microsoft NZ Ignite 2016, PASS BA, PASS24H, SQLRally, and SQL Saturday in Oregon, Vienna, Auckland, Melbourne, Sydney, and Brisbane.

http://radacad.com/author/leila

Twitter: @leila_etaati

Question: What machine learning algorithms should I use?

Answer: It depends

Even the most experienced data scientists can't tell which algorithm will perform best before trying them. These recommendations are compiled feedback and tips from experts.
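
The "try them before you judge" advice can be sketched as a small comparison harness. This is a minimal illustration in plain Python, not an Azure ML experiment: the three candidate models, the toy data, and the names `fit_mean`, `fit_line`, `fit_nearest` and `compare` are all assumptions made up for the example.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline: always predict the mean of the training targets."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    return lambda x: a * x + b

def fit_nearest(xs, ys):
    """1-nearest-neighbour: copy the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def compare(candidates, train, test):
    """Fit every candidate on train, score it on test, lowest error first."""
    scores = {name: mse(fit(*train), *test) for name, fit in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy data (hypothetical): even x values for training, odd x values held out.
train_x = [0, 2, 4, 6, 8, 10]
train = (train_x, [x * x for x in train_x])
test_x = [1, 3, 5, 7, 9]
test = (test_x, [x * x for x in test_x])

candidates = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}
ranking = compare(candidates, train, test)
for name, error in ranking:
    print(f"{name}: held-out MSE = {error:.2f}")
```

The ranking comes from held-out data, so it reflects how each candidate generalizes rather than how well it memorizes the training set.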

Machine Learning Process

What Is the Business Problem?

Prediction

• Predictive Analytics

Grouping

• Descriptive Analytics

Find Unusual Data Points

• Descriptive: Anomaly Detection

What Is the Business Problem: Predictive Analytics

What Is the Business Problem: Descriptive Analytics
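
Grouping means splitting records into clusters without any labels. The sketch below shows the idea with a bare-bones one-dimensional k-means for two groups; the function name `two_means` and the spend figures are hypothetical, and this is a teaching toy rather than the clustering module Azure ML provides.

```python
from statistics import mean

def two_means(values, iterations=10):
    """Split 1-D values into two groups with a minimal k-means (k = 2)."""
    centers = [min(values), max(values)]   # deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            # Assign each value to the closer of the two centers.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty).
        new_centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    return groups

# Hypothetical monthly spend per customer.
spend = [12, 15, 14, 90, 95, 13, 88]
low, high = two_means(spend)
print("low spenders:", low)
print("high spenders:", high)
```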

What Is the Business Problem: Anomaly Detection
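
Finding unusual data points can be illustrated with a simple robust rule: flag values that sit far from the median, measured in units of the median absolute deviation. This modified z-score is a stand-in chosen for brevity, not one of the Azure ML anomaly detection modules; the function name, the 3.5 threshold, and the readings are assumptions for the example.

```python
from statistics import median

def find_unusual(values, threshold=3.5):
    """Return values whose modified z-score exceeds the threshold."""
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []   # no spread at all, so nothing can be ranked as unusual
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 25.0, 10.2]
unusual = find_unusual(readings)
print("unusual readings:", unusual)
```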

Second Sign: Nature of Data - Linearity

• Lots of machine learning algorithms make use of linearity.

• For example, linear classification algorithms assume that classes can be separated by a straight line (or its higher-dimensional analog).

• Linear regression algorithms assume that data trends follow a straight line. These assumptions aren't bad for some problems, but on others they bring accuracy down.

Second Sign: Nature of Data - Linearity (Example)

Data with a nonlinear trend
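
A small numeric illustration of what a nonlinear trend does to a linear model: a straight line fitted to U-shaped data explains none of the variance, while the same straight-line fit on a squared feature explains all of it. The data and the helper name `r_squared_of_line` are assumptions made up for this sketch.

```python
from statistics import mean

def r_squared_of_line(xs, ys):
    """Fit y = a*x + b by least squares and return the R^2 of that fit."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    b = my - a * mx
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Toy data with a purely quadratic (U-shaped) trend.
xs = list(range(-5, 6))
ys = [x * x for x in xs]

r2_raw = r_squared_of_line(xs, ys)                       # line on x
r2_squared = r_squared_of_line([x * x for x in xs], ys)  # line on x^2

print(f"straight line on x:   R^2 = {r2_raw:.2f}")
print(f"straight line on x^2: R^2 = {r2_squared:.2f}")
```

The second fit is still a linear model. Only the feature changed, which is one common way to keep a fast linear algorithm when the raw data is not linear.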

Third Sign: Accuracy and Training Time

Accuracy

Getting the most accurate answer possible isn't always necessary.

Sometimes an approximation is adequate, depending on what you want to use it for.

An advantage of more approximate methods is that they naturally tend to avoid overfitting.

Training time

The number of minutes or hours necessary to train a model varies a great deal between algorithms.

Training time is often closely tied to accuracy.
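
The overfitting remark can be made concrete with a toy nearest-neighbour regressor. With k = 1 the model reproduces every noisy training point exactly, while the smoother k = 2 average is less exact on the training set but closer to the underlying trend. The data, the noise pattern, and the helper names are assumptions for illustration only, and the sketch does not measure training time.

```python
from statistics import mean

def knn_predict(train, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def mse(train, points, k):
    """Mean squared error of the k-NN predictor on the given points."""
    return mean((knn_predict(train, x, k) - y) ** 2 for x, y in points)

# True trend is y = x; training targets carry alternating +1 / -1 noise.
train = [(x, x + (1 if x % 2 == 0 else -1)) for x in range(10)]
# Held-out points lie between the training points and are noise-free.
test = [(x + 0.5, x + 0.5) for x in range(9)]

results = {k: (mse(train, train, k), mse(train, test, k)) for k in (1, 2)}
for k, (train_error, test_error) in results.items():
    print(f"k={k}: training MSE={train_error:.2f}, held-out MSE={test_error:.2f}")
```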

Parameter Tuning
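
One common way to tune parameters is an exhaustive grid search: try every combination of candidate values and keep the best score. The sketch below assumes a hypothetical `toy_score` function standing in for "train a model and measure validation accuracy"; the parameter names `learning_rate` and `depth` are placeholders, not settings of any particular Azure ML module.

```python
from itertools import product
from math import prod

def grid_search(score, grid):
    """Evaluate every combination in the grid; return (best_params, best_score)."""
    names = list(grid)
    best = None
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        result = score(**params)
        if best is None or result > best[1]:
            best = (params, result)
    return best

def toy_score(learning_rate, depth):
    """Hypothetical stand-in for a validation score; peaks at (0.1, 4)."""
    return -(learning_rate - 0.1) ** 2 - (depth - 4) ** 2

grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best_params, best_score = grid_search(toy_score, grid)

# The number of runs is the product of the list lengths, so each extra
# parameter multiplies the cost of the search.
n_combinations = prod(len(values) for values in grid.values())
print(best_params, "found after", n_combinations, "runs")
```

Because the number of combinations multiplies with every added parameter, algorithms with many parameters take much longer to tune this way.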

Two-Class Classification

Accuracy Training Time Parameters Linearity

Decision Forest 5
Decision Jungle 6
Boosted Decision Tree 6
Neural Network 9
Logistic Regression 5
SVM 5
Two-Class Bayes Point Machine 3
Two-Class Averaged Perceptron 4
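
A table like this can be used as a lookup when shortlisting algorithms. The sketch below encodes the two-class figures from this slide, on the assumption that each number is the entry in the Parameters column, and filters by how many settings you are willing to tune. The names `TWO_CLASS_PARAMETERS` and `easiest_to_tune` are made up for the example.

```python
# Parameter counts as listed on the two-class classification slide
# (assumed to belong to the "Parameters" column).
TWO_CLASS_PARAMETERS = {
    "Decision Forest": 5,
    "Decision Jungle": 6,
    "Boosted Decision Tree": 6,
    "Neural Network": 9,
    "Logistic Regression": 5,
    "SVM": 5,
    "Two-Class Bayes Point Machine": 3,
    "Two-Class Averaged Perceptron": 4,
}

def easiest_to_tune(table, max_parameters):
    """Algorithms with at most max_parameters settings, fewest first."""
    picked = [(count, name) for name, count in table.items()
              if count <= max_parameters]
    return [name for count, name in sorted(picked)]

shortlist = easiest_to_tune(TWO_CLASS_PARAMETERS, 4)
print(shortlist)
```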

Multi-class classification

Accuracy Training Time Parameters Linearity

Decision Forest 6
Decision Jungle 6
Neural Network 9
Logistic Regression 5

Regression

Accuracy is most important:

▪ decision forest

▪ decision jungle

▪ neural network

Training time is most important:

▪ logistic regression

Able to handle more than 7 parameters:

▪ neural network

Accuracy Training Time Parameters Linearity

Linear Regression 4
Bayesian Linear Regression 2
Boosted Decision Tree Regression 6
Decision Forest 5
Neural Network 9
Logistic Regression 5
SVM 5
Fast Forest Quantile Regression 9
Poisson Regression 5