Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is...

49
Introduction to Kubeflow aronchick@

Transcript of Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is...

Page 1: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Introduction to Kubeflowaronchick@

Page 2: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Machine Learning is a way of solving problems without explicitly knowing how to create the solution

Page 3: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Google DC Ops

Page 4: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

PUE == Power Usage Effectiveness

Page 5: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

PUE == Power Usage Effectiveness

Page 6: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

PUE == Power Usage Effectiveness

Page 7: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

PUE == Power Usage Effectiveness

Page 8: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

But...

Page 9: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Most FolksMagical

AIGoodness

LOTS OFPAIN

Page 10: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Why the Gap?

Page 11: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Composability

Portability

Scalability

Page 12: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Buildinga

Model

Logging

DataIngestion

DataAnalysis

DataTransform

-ation

DataValidation

Data Splitting

Trainer ModelValidation

TrainingAt Scale

Roll-out Serving Monitoring

Composability

Page 13: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Page 14: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Each ML Stage is an Independent System

System 6System 5

System 4

TrainingAt Scale

System 3System 1

DataIngestion

DataAnalysis

DataTransform

-ation

DataValidation

System 2

Buildinga

Model

ModelValidation

Serving LoggingMonitoringRoll-out

Data Splitting

Trainer

Page 15: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Page 16: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW

Model

Laptop

Page 17: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW

Model

Laptop

Page 18: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW HW

Model Model

Laptop Training Rig

Page 19: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW HW HW

Model Model Model

Laptop Training Rig Cloud

Page 20: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Scalability● Machine specific HW (GPU)● Limited (or unlimited) compute● Network & storage constraints

○ Rack, Server Locality○ Bandwidth constraints

● Heterogeneous hardware● HW & SW lifecycle management● Scale isn’t JUST about adding new

machines!○ Intern vs Researcher○ Scale to 1000s of experiments

Page 21: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

You Know What’s Really Good at Composability,

Portability, and Scalability?

Page 22: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Containers and Kubernetes

Page 23: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Kubernetes

NFSCeph

CassandraMySQL

SparkAirflow

TensorflowCaffe

TF-ServingFlask+Scikit

Operating system (Linux, Windows)

CPU Memory DiskSSD GPU FPGA ASIC NIC

Jupyter

Quota

RBACMonitoring

Logging

GCP AWS Azure On-prem

Namespace

Kubernetes for ML

Page 24: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Kubernetes for ML● Supports accelerators in an extensible manner

○ GPUs already in progress○ Support for FPGAs, high perf NICs under discussion

● Existing Controllers simplify devops challenges○ K8S Jobs for Training○ K8S Deployments for Serving

● Handles 1000s of nodes● Container base images for ML workloads

Page 25: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

But Wait, There’s More!● Kubernetes native scaling objects

○ Autoscaling cluster based on workload metrics○ Priority eviction for removal of low priority jobs○ Scaled to large number of pods (experiments)

● Passes through cluster specs for specific needs○ Scheduling jobs where the data needed to run them is○ Node labels for Heterogeneous HW (more in the future)○ Manage SW drivers and HW health via addons

Page 26: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

But...

Page 27: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Oh, you want to use ML on K8s?

Before that, can you become an expert in:● Containers● Packaging● Kubernetes service endpoints● Persistent volumes● Scaling● Immutable deployments● GPUs, Drivers & the GPL● Cloud APIs● DevOps● ...

Page 28: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Kubeflow

Page 29: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Make it Easy for Everyone to Learn, Deploy and Manage Portable, Distributed ML

on Kubernetes(Everywhere)

Page 30: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Kubernetes + ML = Kubeflow = Win● Composability

○ Choose from existing popular tools○ Uses ksonnet packaging for easy setup

● Portability○ Build using cloud native, portable Kubernetes APIs○ Let K8s community solve for your deployment

● Scalability○ TF already supports CPU/GPU/distributed○ K8s scales to 5k nodes with same stack

Page 31: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW HW HW

Model Model Model

Laptop Training Rig Cloud

Page 32: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW HW HW

Model Model Model

Laptop Training Rig Cloud

Page 33: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW HW HW

Model Model Model

Laptop Training Rig Cloud

Kubeflow

Page 34: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW HW HW

Model Model Model

Laptop Training Rig Cloud

Kubeflow

Kubeflow

Page 35: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Storage

Framework

Tooling

UX

Model

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Drivers

OS

Accelerator

Runtime

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

HW HW HW

Model Model

Laptop Training Rig Cloud

Kubeflow

Kubeflow Kubeflow

Page 36: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Storage

Framework

Tooling

UX

Model

Storage

Framework

Tooling

UX

Model

Portability

Storage

Drivers

OS

Accelerator

Runtime

Framework

Tooling

UX

Drivers

OS

Accelerator

Runtime

Drivers

OS

Accelerator

Runtime

HW HW HW

Model

Laptop Training Rig Cloud

Kubeflow

Kubeflow Kubeflow Kubeflow

Page 37: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

What’s in the Box?● Jupyter Hub - for collaborative & interactive training● A TensorFlow Training Controller● A TensorFlow Serving Deployment● Argo for workflows● SeldonCore for complex inference and non TF models● Reverse Proxy (Ambassador)● Wiring to make it work on any Kubernetes anywhere

Page 38: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

LoggingRoll-out Monitoring

DataIngestion

DataAnalysis

DataTransform

-ation

DataValidation

Data Splitting

What’s in the Box?

Buildinga

ModelTrainer Model

ValidationTrainingAt Scale

Serving

Page 39: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Using Kubeflow# Initialize a ksonnet APPAPP_NAME=my-kubeflowks init ${APP_NAME}cd ${APP_NAME}

# Install Kubeflow componentsks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflowks pkg install kubeflow/coreks pkg install kubeflow/tf-servingks pkg install kubeflow/tf-job

# Deploy KubeflowNAMESPACE=kubeflowkubectl create namespace ${NAMESPACE}ks generate core kubeflow-core --name=kubeflow-core --namespace=${NAMESPACE}ks apply default -c kubeflow-core

Page 40: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Don’t Like TensorFlow?# Initialize a ksonnet APPAPP_NAME=my-kubeflowks init ${APP_NAME}cd ${APP_NAME}

# Install Kubeflow componentsks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflowks pkg install kubeflow/coreks pkg install kubeflow/tf-servingks pkg install kubeflow/tf-job

# Deploy KubeflowNAMESPACE=kubeflowkubectl create namespace ${NAMESPACE}ks generate core kubeflow-core --name=kubeflow-core --namespace=${NAMESPACE}ks apply default -c kubeflow-core

ks pkg install kubeflow/sklearn-job # Soon

Page 41: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Don’t Like TF Serving?# Initialize a ksonnet APPAPP_NAME=my-kubeflowks init ${APP_NAME}cd ${APP_NAME}

# Install Kubeflow componentsks registry add kubeflow github.com/kubeflow/kubeflow/tree/master/kubeflowks pkg install kubeflow/coreks pkg install kubeflow/tf-servingks pkg install kubeflow/tf-job

# Deploy KubeflowNAMESPACE=kubeflowkubectl create namespace ${NAMESPACE}ks generate core kubeflow-core --name=kubeflow-core --namespace=${NAMESPACE}ks apply default -c kubeflow-core

ks pkg install kubeflow/seldon-core # Soon

Page 42: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

That’s It?

Page 43: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Yes…(For Now)

Page 44: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Yes…(For Now)

Page 45: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Yes…(For Now)

Page 46: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

We’re Just Getting Started!

● Who’s helping?○ Redhat, Weave, CaiCloud, Canonical, many more

● What’s next...○ Easy to use accelerator integration○ Support for other popular tools like Spark ML, XGBoost, sklearn○ Autoscaled TF Serving○ tf.transform (programmatic data transforms)

● You tell us! (Or better yet, help!)

Page 47: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

Kubeflow is Open- open community- open design- open source- open to ideas

Page 48: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

https://github.com/kubeflow/kubeflowslack: kubeflow (http://kubeflow.slack.com)

twitter: @kubeflow@aronchick ([email protected])

@jeremylewi ([email protected])`

Page 49: Introduction to Kubeflow - Oliver Wyman · Introduction to Kubeflow aronchick@ Machine Learning is a way of solving problems without ... Containers and Kubernetes. Kubernetes NFS

● As a data scientist, you want to use the right HW for the job

● Every variation is an opportunity for pain○ GPUs/FPGAs, ASICs, NICs ○ Kernel drivers, libraries, performance

● Even within an ML frameworks dependencies cause chaos○ Package management○ ML compilation

Container

Kernel

GPU FPGA Infiniband

Drivers

LibraryApp

App

Portability