Download - Machine Learning with MLlib and scikit-learn · Machine Learning with MLlib and scikit-learn Christopher Homa . Goal Compare performance of sk-learn and MLlib machine learning libraries


Machine Learning with MLlib and scikit-learn

Christopher Homa


Goal

Compare the performance of the scikit-learn and MLlib machine learning libraries on datasets of varying size.

• Generate datasets
• Train classifiers
• Record performance
• Analyze results



Generate datasets

Type
• Binary Classification
• Multiclass Classification
• Regression

Size
• Instances
• Features

[instances, features] from [1000, 10] to [2000000, 100]
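One way to picture the size range above is as a grid of [instances, features] pairs running from the smallest dataset to the largest. The sketch below is only an illustration: the helper name `size_grid`, the number of steps, and the choice of geometric spacing (with instances and features growing together) are all assumptions, since the slides give only the two endpoints.

```python
def size_grid(min_size, max_size, steps):
    """Return `steps` [instances, features] pairs spaced geometrically
    between min_size and max_size, both inclusive (steps must be >= 2).

    Hypothetical helper: the slides do not say how intermediate
    sizes were chosen.
    """
    (n_lo, f_lo), (n_hi, f_hi) = min_size, max_size
    grid = []
    for i in range(steps):
        t = i / (steps - 1)  # 0.0 at the smallest size, 1.0 at the largest
        n = round(n_lo * (n_hi / n_lo) ** t)
        f = round(f_lo * (f_hi / f_lo) ** t)
        grid.append([n, f])
    return grid


# Endpoints taken from the slide; the step count is an assumption.
sizes = size_grid([1000, 10], [2000000, 100], steps=5)
```

Geometric spacing is used here because the instance count spans more than three orders of magnitude; a linear grid would place almost every dataset near the large end.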


Train classifiers

Choose classifiers
• Stochastic Gradient Descent
• Gradient Boosted Decision Trees
• Random Forests

Match parameters
• Iterations
• Depth
• Most defaults match

Iteratively train classifiers on all datasets and record training times.
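The training-and-timing loop described above might look roughly like the following on the scikit-learn side. This is a minimal sketch, not the presenter's script: the dataset sizes, the parameter values (`max_iter`, `n_estimators`, `max_depth`), and the names `SIZES`, `make_classifiers`, and `time_training` are assumptions chosen so the example runs quickly, and only binary classification is shown. The MLlib side is omitted.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import SGDClassifier

# Hypothetical small (instances, features) sizes so the sketch finishes fast.
SIZES = [(1000, 10), (2000, 10)]


def make_classifiers():
    """Build fresh, unfitted classifiers with explicitly set iterations and depth.

    The values are illustrative assumptions, not the settings from the talk.
    """
    return {
        "sgd": SGDClassifier(max_iter=100, random_state=0),
        "gbt": GradientBoostingClassifier(n_estimators=10, max_depth=3, random_state=0),
        "rf": RandomForestClassifier(n_estimators=10, max_depth=3, random_state=0),
    }


def time_training(sizes):
    """Train every classifier on every dataset and record wall-clock fit time."""
    records = []
    for n_instances, n_features in sizes:
        # Synthetic binary classification data of the requested shape.
        X, y = make_classification(
            n_samples=n_instances, n_features=n_features, random_state=0
        )
        for name, clf in make_classifiers().items():
            start = time.perf_counter()
            clf.fit(X, y)
            elapsed = time.perf_counter() - start
            records.append(
                {
                    "classifier": name,
                    "instances": n_instances,
                    "features": n_features,
                    "seconds": elapsed,
                }
            )
    return records


records = time_training(SIZES)
```

Only the `fit` call is timed, so data generation does not count toward the recorded training time. Building new estimator objects for each dataset keeps one run from influencing the next.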



Analyze results

Future Considerations

• Fewer, (much) larger datasets
• Utilize EC2 instances to run scikit-learn scripts
• Improve data storage