MLeap: Productionize Data Science Workflows Using Spark

MLeap: Release Spark ML Pipelines Mikhail Semeniuk and Hollin Wilkins

Transcript of MLeap: Productionize Data Science Workflows Using Spark

Page 1: MLeap: Productionize Data Science Workflows Using Spark

MLeap: Release Spark ML Pipelines

Mikhail Semeniuk and Hollin Wilkins

Page 2: MLeap: Productionize Data Science Workflows Using Spark

Opening Demo

http://spark-summit.combust.ml

How much should I rent my house for on Airbnb?

Yes, open your cell phone and go here :)

Page 3: MLeap: Productionize Data Science Workflows Using Spark

Action Reaction

Page 4: MLeap: Productionize Data Science Workflows Using Spark

Hard-Coded Models (SQL, Java, Ruby)

PMML

Emerging Solutions (yHat, DataRobot)

Enterprise Solutions (Microsoft, IBM, SAS)

MLeap

Quick to Implement

Open Sourced

Committed to Spark/Hadoop

API Server Infrastructure

Page 5: MLeap: Productionize Data Science Workflows Using Spark

mleap-spark

mleap-runtime

mleap-core

Bundle.ML

mleap-serialization

Page 6: MLeap: Productionize Data Science Workflows Using Spark

Regressions

Page 7: MLeap: Productionize Data Science Workflows Using Spark
Page 8: MLeap: Productionize Data Science Workflows Using Spark

Regression Pipeline

Continuous Feature -> VectorAssembler -> Continuous Feature Vector -> StandardScaler -> Scaled Continuous Feature Vector

Categorical Feature -> StringIndexer -> Categorical Feature Index -> OneHotEncoder -> One Hot Vector (one StringIndexer and OneHotEncoder pair per categorical feature; three shown)

One Hot Vectors -> VectorAssembler -> Categorical Feature Vector

Scaled Continuous Feature Vector + Categorical Feature Vector -> VectorAssembler -> Final Feature Vector

Final Feature Vector -> LinearRegression -> Prediction
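
The continuous branch and the final stages of the regression pipeline can be sketched in plain Python. This is only an illustration of the data flow (scale, assemble, predict), not the Spark or MLeap API: the function names, feature values, weights, and intercept are all hypothetical, and the scaler here centers and divides by the population standard deviation, which is a simplification and not necessarily what Spark's StandardScaler does by default.

```python
from statistics import mean, pstdev


def fit_standard_scaler(rows):
    """Return per-column (mean, std) for a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    # Fall back to 1.0 so a constant column does not divide by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in columns]


def scale(row, stats):
    """Center and scale one row using the fitted statistics."""
    return [(x - mu) / sd for x, (mu, sd) in zip(row, stats)]


def assemble(*vectors):
    """Concatenate several feature vectors into one (the VectorAssembler step)."""
    out = []
    for vector in vectors:
        out.extend(vector)
    return out


def predict(features, weights, intercept):
    """Linear regression prediction: intercept plus dot(weights, features)."""
    return intercept + sum(w * x for w, x in zip(weights, features))


# Hypothetical training rows: [bedrooms, square_feet]
continuous_rows = [[1.0, 500.0], [2.0, 900.0], [3.0, 1300.0]]
stats = fit_standard_scaler(continuous_rows)

# Hypothetical one-hot vector produced by the categorical branch
one_hot_vector = [0.0, 1.0, 0.0]

# Hypothetical fitted model parameters
weights = [40.0, 25.0, 10.0, 30.0, 5.0]
intercept = 120.0

final_features = assemble(scale([2.0, 900.0], stats), one_hot_vector)
price = predict(final_features, weights, intercept)
print(final_features, price)
```

The row being scored sits exactly at the column means, so both scaled continuous features are zero and the prediction reduces to the intercept plus the weight of the active one-hot slot.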

Page 9: MLeap: Productionize Data Science Workflows Using Spark

LeapFrame (Categorical Feature) -> StringIndexer -> LeapFrame (Categorical Feature Index) -> OneHotEncoder -> LeapFrame (Categorical Feature One Hot Vector)
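
The frame-to-frame flow on this slide can be sketched in plain Python, with a frame modeled as a list of dictionaries and each transformer returning a new frame with one extra column. This is an assumption-laden illustration, not the LeapFrame API: the function names, the `room_type` column, and its values are hypothetical, and the one-hot step keeps a slot for every category rather than dropping any.

```python
from collections import Counter


def fit_string_indexer(frame, input_col):
    """Map each label to an index, most frequent first (ties broken alphabetically)."""
    counts = Counter(row[input_col] for row in frame)
    ordered = sorted(counts, key=lambda label: (-counts[label], label))
    return {label: index for index, label in enumerate(ordered)}


def string_index(frame, input_col, output_col, label_to_index):
    """Return a new frame with the label's index added as output_col."""
    return [{**row, output_col: label_to_index[row[input_col]]} for row in frame]


def one_hot(frame, input_col, output_col, size):
    """Return a new frame with a one-hot vector of length size added as output_col."""
    out = []
    for row in frame:
        vector = [0.0] * size
        vector[row[input_col]] = 1.0
        out.append({**row, output_col: vector})
    return out


# Hypothetical input frame with a single categorical feature
frame = [
    {"room_type": "entire home"},
    {"room_type": "private room"},
    {"room_type": "entire home"},
]

labels = fit_string_indexer(frame, "room_type")
indexed = string_index(frame, "room_type", "room_type_index", labels)
encoded = one_hot(indexed, "room_type_index", "room_type_vec", len(labels))

for row in encoded:
    print(row)
```

Each step leaves its input frame untouched and adds a column, which mirrors the three frames drawn on the slide.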

Page 10: MLeap: Productionize Data Science Workflows Using Spark
Page 11: MLeap: Productionize Data Science Workflows Using Spark
Page 12: MLeap: Productionize Data Science Workflows Using Spark

Spark Estimator Spark Model MLeap Model

MLeap Spark

Spark DataFrame Spark LeapFrame Spark LeapFrame

MLeap Spark

Spark DataFrame

MLeap Transformer

MLeap Spark

Page 13: MLeap: Productionize Data Science Workflows Using Spark

Benchmarks

MLeap: 0.011 ms/transform

Spark: 23.4 ms/transform
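
To put the two benchmark figures side by side, the arithmetic below derives the relative speedup and the implied single-threaded throughput. Only the two latencies come from the slide; the variable names are illustrative, and the throughput figures assume transforms run back to back with no other overhead.

```python
# Per-transform latencies quoted on the slide, in milliseconds
mleap_ms = 0.011
spark_ms = 23.4

# How many times faster one MLeap transform is than one Spark transform
speedup = spark_ms / mleap_ms

# Implied transforms per second if run back to back on a single thread
mleap_per_second = 1000.0 / mleap_ms
spark_per_second = 1000.0 / spark_ms

print(f"speedup: about {speedup:.0f}x")
print(f"MLeap: about {mleap_per_second:.0f} transforms/s")
print(f"Spark: about {spark_per_second:.1f} transforms/s")
```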

Page 14: MLeap: Productionize Data Science Workflows Using Spark
Page 15: MLeap: Productionize Data Science Workflows Using Spark
Page 16: MLeap: Productionize Data Science Workflows Using Spark

Combust.ML Overview

Combust.ML

Page 17: MLeap: Productionize Data Science Workflows Using Spark

Thank Yous

Page 18: MLeap: Productionize Data Science Workflows Using Spark

THANK YOU.

Hollin Wilkins
email: [email protected]
github: https://github.com/hollinwilkins
twitter: https://twitter.com/HollinWilkins
linkedin: https://www.linkedin.com/in/hollinwilkins

Mikhail Semeniuk
email: [email protected]
github: https://github.com/seme0021
twitter: https://twitter.com/MikhailSemeniuk
linkedin: https://www.linkedin.com/in/semeniuk