Productionizing H2O Models with Apache Spark · 2018-10-18
Jakub Háva, [email protected]
https://github.com/jakubhava
https://www.linkedin.com/in/havaj/
AI Ukraine, Kyiv, October 13-14, 2018
#ML4SAIS
Who are we?
• Kuba
  • Senior Software Engineer at H2O.ai, Core Sparkling Water
  • Master's at Charles University (CZ)
  • Implemented a high-performance cluster monitoring tool for JVM-based languages (JNI, JVMTI, instrumentation)
• Michal
  • VP of Engineering at H2O.ai
  • Creator of Sparkling Water
  • Ph.D. at Charles University (CZ), PostDoc at Purdue
Machine Learning (ML) Lifecycle
Basic ML Lifecycle
[Diagram labels: Model Training Algorithm; Feature Engineering; Model Pipeline Building; Training Predictions; Data Engineering]
Basic ML Lifecycle
[Diagram labels: Model Training Algorithm; Feature Engineering; Featurization Pipeline Model; Model Pipeline Building; Training Predictions; Deployment Predictions; Data Engineering; Model Pipeline Deployment]
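The lifecycle in this diagram, where training produces a fitted featurization step plus a model and deployment reuses both for predictions, can be illustrated with a minimal plain-Python sketch. This is not Spark, H2O, or Sparkling Water code: every class and function name below is hypothetical, the "training algorithm" is a toy threshold rule, and the JSON string only stands in for a portable model artifact such as a MOJO.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, List, Sequence


@dataclass
class ScalerModel:
    """Fitted featurization step: statistics learned at training time."""
    mean: float
    std: float

    def transform(self, xs: Sequence[float]) -> List[float]:
        return [(x - self.mean) / self.std for x in xs]


@dataclass
class ThresholdModel:
    """Fitted model step: predicts 1 when the scaled value exceeds the threshold."""
    threshold: float

    def predict(self, xs: Sequence[float]) -> List[int]:
        return [1 if x > self.threshold else 0 for x in xs]


def fit_scaler(xs: Sequence[float]) -> ScalerModel:
    mean = sum(xs) / len(xs)
    variance = sum((x - mean) ** 2 for x in xs) / len(xs)
    return ScalerModel(mean=mean, std=variance ** 0.5 or 1.0)


def fit_threshold(xs: Sequence[float], labels: Sequence[int]) -> ThresholdModel:
    """Toy training algorithm: midpoint between the two class means."""
    pos = [x for x, y in zip(xs, labels) if y == 1]
    neg = [x for x, y in zip(xs, labels) if y == 0]
    return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2)


def train_pipeline(raw: Sequence[float], labels: Sequence[int]) -> str:
    """Model pipeline building: fit both steps, export one artifact."""
    scaler = fit_scaler(raw)
    model = fit_threshold(scaler.transform(raw), labels)
    return json.dumps({"scaler": asdict(scaler), "model": asdict(model)})


def load_pipeline(artifact: str) -> Callable[[Sequence[float]], List[int]]:
    """Model pipeline deployment: rebuild the fitted steps, no retraining."""
    spec = json.loads(artifact)
    scaler = ScalerModel(**spec["scaler"])
    model = ThresholdModel(**spec["model"])
    return lambda raw: model.predict(scaler.transform(raw))


# Training side
artifact = train_pipeline([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1])

# Deployment side: only the artifact crosses the boundary
score = load_pipeline(artifact)
predictions = score([1.5, 8.5])
print(predictions)
```

The point of the sketch is the boundary: the deployment side sees only the exported artifact, so the featurization applied at scoring time is exactly the one learned during training.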
Example Implementations
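[Table columns: Data Engineering, Feature Engineering, Training Algorithm (Model Building); Deployment Pipeline, Deployment Model (Model Deployment)]
• Model Building: Spark, H2O. Model Deployment: Spark, H2O MOJO
• Model Building: Spark, H2O Driverless AI. Model Deployment: Spark, H2O Driverless AI MOJO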
H2O + Spark = Sparkling Water
H2O + Spark
• H2O
  • Machine Learning Library
  • Distributed Algorithms
  • For ML experts
• Sparkling Water
  • Integrates H2O & Spark Ecosystems
  • Transparent for Spark users
  • Based on Spark pipelines & H2O
Basic ML Lifecycle: Sparkling Water
[Diagram labels: Model Training Algorithm; Feature Engineering; Spark Transformers, H2O MOJO Model; Training Predictions; Deployment Predictions; AutoML; Pipeline]
Demo: Spark Pipeline
H2O Driverless AI
H2O Driverless AI
• What if I'm not an expert?
  • H2O Driverless AI
• H2O Driverless AI
  • No expert knowledge required
  • Automatic Feature Engineering & ML
Basic ML Lifecycle: Driverless AI
[Diagram labels: Model Training Algorithm; Feature Engineering; Driverless AI Feature Transformations, Driverless AI Model; Training Predictions; Deployment Predictions; Driverless AI MOJO as Pipeline]
Demo: Driverless AI as Spark Pipeline
Driverless AI Pipeline
Governed ML Lifecycle
[Diagram labels: Model Training Algorithm; Feature Engineering; Featurization Pipeline Model; Model Pipeline Building; Training Predictions; Deployment Predictions; Model Management; Data Engineering; Model Pipeline Deployment; Model Monitoring; Auto Documentation]
Materials
https://bit.ly/2sxowxD
Sparkling Water enables deployment of H2O ML models with Spark Pipelines
Thank you!