Productionizing H2O Models with Apache Spark · 2018-10-18 · Productionizing H2O Models with...

Productionizing H2O Models with Apache Spark. Jakub Háva, jakub@h2o.ai, https://github.com/jakubhava, https://www.linkedin.com/in/havaj/. AI Ukraine, Kyiv, October 13-14 2018


Page 1

Productionizing H2O Models with Apache Spark

Jakub Háva, jakub@h2o.ai
https://github.com/jakubhava
https://www.linkedin.com/in/havaj/

AI Ukraine, Kyiv, October 13-14 2018

Page 2

Who are we?

• Kuba
  • Senior Software Engineer at H2O.ai, Core Sparkling Water
  • Master’s at Charles University (CZ)
  • Implemented a high-performance cluster monitoring tool for JVM-based languages (JNI, JVMTI, instrumentation)
• Michal
  • VP of Engineering at H2O.ai
  • Creator of Sparkling Water
  • Ph.D. at Charles University (CZ), postdoc at Purdue

Page 3

Machine Learning (ML) Lifecycle

Page 4

Basic ML Lifecycle

[Diagram: Data Engineering, Feature Engineering, Model Training Algorithm, Model Pipeline Building, Training Predictions]

Page 5

Basic ML Lifecycle

[Diagram: Data Engineering, Feature Engineering, Model Training Algorithm, Model Pipeline Building, Training Predictions; then Model Pipeline Deployment of the Featurization Pipeline and Model, producing Deployment Predictions]
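The lifecycle above fits the featurization steps and the model during model building and then deploys them together as one pipeline, so deployment predictions pass through exactly the same transformations as training predictions. The following is a minimal plain-Python sketch of that idea under stated assumptions: `Standardizer`, `ThresholdModel`, `PipelineModel`, `export_pipeline` and `load_pipeline` are hypothetical names invented for illustration, the JSON string merely stands in for an exported artifact such as an H2O MOJO, and none of this is the Spark or H2O API.

```python
import json
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Standardizer:
    """Hypothetical featurization stage: learns mean and std on training data."""
    mean: float = 0.0
    std: float = 1.0

    def fit(self, xs: Sequence[float]) -> "Standardizer":
        n = len(xs)
        self.mean = sum(xs) / n
        variance = sum((x - self.mean) ** 2 for x in xs) / n
        self.std = variance ** 0.5 or 1.0  # avoid dividing by zero on constant input
        return self

    def transform(self, xs: Sequence[float]) -> List[float]:
        return [(x - self.mean) / self.std for x in xs]


@dataclass
class ThresholdModel:
    """Hypothetical model stage: predicts 1 when the feature exceeds a threshold."""
    threshold: float = 0.0

    def fit(self, feats: Sequence[float], labels: Sequence[int]) -> "ThresholdModel":
        # Put the threshold halfway between the mean feature value of each class.
        pos = [f for f, y in zip(feats, labels) if y == 1]
        neg = [f for f, y in zip(feats, labels) if y == 0]
        self.threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return self

    def predict(self, feats: Sequence[float]) -> List[int]:
        return [1 if f > self.threshold else 0 for f in feats]


@dataclass
class PipelineModel:
    """Featurization stage plus model, deployed together as one unit."""
    featurizer: Standardizer
    model: ThresholdModel

    def predict(self, xs: Sequence[float]) -> List[int]:
        # Deployment predictions reuse the transforms fitted at training time.
        return self.model.predict(self.featurizer.transform(xs))


def build_pipeline(xs: Sequence[float], labels: Sequence[int]) -> PipelineModel:
    featurizer = Standardizer().fit(xs)
    model = ThresholdModel().fit(featurizer.transform(xs), labels)
    return PipelineModel(featurizer, model)


def export_pipeline(p: PipelineModel) -> str:
    # Stand-in for writing a self-contained artifact (e.g. a MOJO) to disk.
    return json.dumps({"mean": p.featurizer.mean, "std": p.featurizer.std,
                       "threshold": p.model.threshold})


def load_pipeline(artifact: str) -> PipelineModel:
    params = json.loads(artifact)
    return PipelineModel(Standardizer(params["mean"], params["std"]),
                         ThresholdModel(params["threshold"]))


# Model building: fit featurization and model on training data, then export.
train_x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
train_y = [0, 0, 0, 1, 1, 1]
artifact = export_pipeline(build_pipeline(train_x, train_y))

# Model deployment: load the artifact in a separate scoring step and predict.
deployed = load_pipeline(artifact)
print(deployed.predict([0.5, 5.5, 10.0]))  # [0, 1, 1]
```

Keeping the fitted featurization inside the exported artifact is what lets the deployment side score raw inputs without re-implementing feature engineering; in the talk, that role is played by a Spark pipeline carrying an H2O MOJO or a Driverless AI MOJO.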

Page 6

Example Implementations


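Model building (first three columns) and model deployment (last two columns):

Data Engineering | Feature Engineering | Training Algorithm | Deployment Pipeline | Deployment Model
Spark            | Spark               | H2O                | Spark               | H2O MOJO
Spark            | H2O Driverless AI   | H2O Driverless AI  | Spark               | H2O Driverless AI MOJO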

Page 7

H2O + Spark = Sparkling Water

Page 8

H2O + Spark

• H2O
  • Machine learning library
  • Distributed algorithms
  • For ML experts
• Sparkling Water
  • Integrates the H2O and Spark ecosystems
  • Transparent for Spark users
  • Based on Spark pipelines and H2O

Page 9

Basic ML Lifecycle: Sparkling Water

[Diagram: Feature Engineering with Spark Transformers; Model Training Algorithm with H2O AutoML; Spark Transformers and the H2O MOJO Model combined into a Pipeline; Training Predictions and Deployment Predictions]

Page 10

Demo: Spark Pipeline

Page 11

H2O Driverless AI

Page 12

H2O Driverless AI

• What if I’m not an expert?
  • H2O Driverless AI
    • No expert knowledge required
    • Automatic feature engineering and ML

Page 13

Basic ML Lifecycle: Driverless AI

[Diagram: Feature Engineering with Driverless AI Feature Transformations; Model Training Algorithm with the Driverless AI Model; both deployed as a Driverless AI MOJO used as a Pipeline; Training Predictions and Deployment Predictions]

Page 14

Demo: Driverless AI as Spark Pipeline


Page 16

Driverless AI Pipeline


Page 17

Governed ML Lifecycle

Page 18

Governed ML Lifecycle

[Diagram: Data Engineering, Feature Engineering, Model Training Algorithm, Model Pipeline Building, Training Predictions; then Model Pipeline Deployment of the Featurization Pipeline and Model, producing Deployment Predictions; surrounded by Model Management, Model Monitoring, and Auto Documentation]

Page 19


Materials


https://bit.ly/2sxowxD

Page 20


Sparkling Water enables deployment of H2O ML models with Spark Pipelines


Thank you!