H2O Deep Water - Making Deep Learning Accessible to Everyone


Transcript of H2O Deep Water - Making Deep Learning Accessible to Everyone

Page 1: H2O Deep Water - Making Deep Learning Accessible to Everyone

H2O Deep Water
Making Deep Learning Accessible to Everyone

Jo-fai (Joe) Chow
Data Scientist
[email protected]
@matlabulous

SF Big Data Science at Metis
23rd February, 2017

Page 2: H2O Deep Water - Making Deep Learning Accessible to Everyone

About Me

• Civil (Water) Engineer
  • 2010 – 2015
  • Consultant (UK)
    • Utilities
    • Asset Management
    • Constrained Optimization
  • Industrial PhD (UK)
    • Infrastructure Design Optimization
    • Machine Learning + Water Engineering
• Discovered H2O in 2014
• Data Scientist
  • From 2015
  • Virgin Media (UK)
  • Domino Data Lab (Silicon Valley)
  • H2O.ai (Silicon Valley)

Page 3: H2O Deep Water - Making Deep Learning Accessible to Everyone


Joe (2015)

http://www.h2o.ai/gartner-magic-quadrant/

Page 4: H2O Deep Water - Making Deep Learning Accessible to Everyone

Agenda

• About H2O.ai
  • Company
  • Machine Learning Platform
• Deep Learning Tools
  • TensorFlow, MXNet, Caffe, H2O
• Deep Water
  • Motivation
  • Benefits
  • Interface
  • Learning Resources
• Conclusions

Page 5: H2O Deep Water - Making Deep Learning Accessible to Everyone

About H2O.ai


Page 6: H2O Deep Water - Making Deep Learning Accessible to Everyone

Company Overview

Founded: 2011. Venture-backed, debuted in 2012

Products:
• H2O Open Source In-Memory AI Prediction Engine
• Sparkling Water
• Steam

Mission: Operationalize Data Science, and provide a platform for users to build beautiful data products

Team: 70 employees
• Distributed Systems Engineers doing Machine Learning
• World-class visualization designers

Headquarters: Mountain View, CA

Page 7: H2O Deep Water - Making Deep Learning Accessible to Everyone

H2O Machine Learning Platform


Page 8: H2O Deep Water - Making Deep Learning Accessible to Everyone

Algorithms Overview

Supervised Learning

Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes

Ensembles
• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations

Deep Neural Networks
• Deep learning: Creates multi-layer feed-forward neural networks, starting with an input layer followed by multiple layers of nonlinear transformations

Unsupervised Learning

Clustering
• K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detects optimal k

Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables to independent components
• Generalized Low Rank Models: Extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data

Anomaly Detection
• Autoencoders: Find outliers using nonlinear dimensionality reduction with deep learning

Page 9: H2O Deep Water - Making Deep Learning Accessible to Everyone

High Level Architecture

[Architecture diagram. Labels recovered from the figure:]
• Data sources: HDFS, S3, NFS, Local, SQL
• Load Data: Distributed In-Memory, Loss-less Compression
• H2O Compute Engine: Exploratory & Descriptive Analysis; Feature Engineering & Selection; Supervised & Unsupervised Modeling; Model Evaluation & Selection; Predict; Data & Model Storage
• Production Scoring Environment: Model Export (Plain Old Java Object); Data Prep Export (Plain Old Java Object); Your Imagination
• Interfaces: Flow (Web), R, Python API; Java for computation

Page 10: H2O Deep Water - Making Deep Learning Accessible to Everyone


H2O Flow (Web) Interface

Page 11: H2O Deep Water - Making Deep Learning Accessible to Everyone


H2O Flow (Web) Interface

Page 12: H2O Deep Water - Making Deep Learning Accessible to Everyone

H2O + R


Page 13: H2O Deep Water - Making Deep Learning Accessible to Everyone


docs.h2o.ai

Page 14: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Learning Tools
TensorFlow, MXNet, Caffe and H2O Deep Learning

Page 15: H2O Deep Water - Making Deep Learning Accessible to Everyone

TensorFlow

• Open source machine learning framework by Google
• Python / C++ API
• TensorBoard
  • Data Flow Graph Visualization
• Multi CPU / GPU
  • v0.8+ distributed machines support
• Multi devices support
  • desktop, server and Android devices
• Image, audio and NLP applications
• HUGE Community
• Support for Spark, Windows …

https://github.com/tensorflow/tensorflow

Page 16: H2O Deep Water - Making Deep Learning Accessible to Everyone

TensorFlow Wrappers

• TFLearn – Simplified interface

• keras – TensorFlow + Theano

• tensorflow.rb – Ruby wrapper

• TensorFlow.jl – Julia wrapper

• TensorFlow for R – R wrapper

• … and many more!

• See: github.com/jtoy/awesome-tensorflow


Page 17: H2O Deep Water - Making Deep Learning Accessible to Everyone

Sorting Cucumbers

• Problem
  • Sorting cucumbers is a laborious process.
  • On a Japanese farm, the farmer’s wife can spend up to eight hours a day sorting cucumbers during the peak harvesting period.
• Solution
  • Farmer’s son (Makoto Koike) used TensorFlow, Arduino and Raspberry Pi to create an automatic cucumber sorting system.

Page 18: H2O Deep Water - Making Deep Learning Accessible to Everyone

Sorting Cucumbers

• Classification Problem
  • Input: cucumber photos (side, top, bottom)
  • Output: one of nine classes

• Google’s Blog Post [Link]

• YouTube Video [Link]
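The classification setup above (three photo views in, one of nine classes out) can be sketched in plain Python. This is only an illustration: the per-view scores are made up, and averaging the per-view probabilities is an assumption for the sketch, not a description of how Makoto Koike's TensorFlow model combines the views.

```python
import math

NUM_CLASSES = 9  # from the slide: the output is one of nine classes


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(view_scores):
    """Average the per-view probabilities and return (class index, probability)."""
    probs_per_view = [softmax(scores) for scores in view_scores.values()]
    averaged = [
        sum(probs[c] for probs in probs_per_view) / len(probs_per_view)
        for c in range(NUM_CLASSES)
    ]
    best = max(range(NUM_CLASSES), key=lambda c: averaged[c])
    return best, averaged[best]


# Hypothetical raw scores for one cucumber, one list of nine scores per photo view.
view_scores = {
    "side":   [0.2, 0.1, 2.4, 0.3, 0.0, 0.1, 0.2, 0.1, 0.0],
    "top":    [0.1, 0.3, 1.9, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1],
    "bottom": [0.0, 0.2, 2.1, 0.4, 0.1, 0.1, 0.0, 0.1, 0.2],
}

predicted_class, confidence = classify(view_scores)
print(predicted_class, round(confidence, 3))
```

In a real system the scores would come from a trained network rather than hand-written lists.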



Page 22: H2O Deep Water - Making Deep Learning Accessible to Everyone

TensorKart – Self-driving Mario Kart

https://github.com/kevinhughes27/TensorKart
https://opensourceforu.com/2017/01/tensorflow-brings-self-driving-to-mario-kart/

Page 23: H2O Deep Water - Making Deep Learning Accessible to Everyone

https://github.com/dmlc/mxnet
https://www.zeolearn.com/magazine/amazon-to-use-mxnet-as-deep-learning-framework

Page 24: H2O Deep Water - Making Deep Learning Accessible to Everyone

Caffe

• Convolutional Architecture for Fast Feature Embedding (Caffe)
• Pure C++ / CUDA architecture for deep learning
• Command line, Python and MATLAB interface
• Model Zoo
  • Open collection of models

https://docs.google.com/presentation/d/1UeKXVgRvvxg9OUdh_UiC5G71UMscNPlvArsWER41PsU/

Page 25: H2O Deep Water - Making Deep Learning Accessible to Everyone

H2O Deep Learning

Supervised Learning

Statistical Analysis
• Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie
• Naïve Bayes

Ensembles
• Distributed Random Forest: Classification or regression models
• Gradient Boosting Machine: Produces an ensemble of decision trees with increasingly refined approximations

Deep Neural Networks
• Deep learning: Creates multi-layer feed-forward neural networks, starting with an input layer followed by multiple layers of nonlinear transformations

Unsupervised Learning

Clustering
• K-means: Partitions observations into k clusters/groups of the same spatial size. Automatically detects optimal k

Dimensionality Reduction
• Principal Component Analysis: Linearly transforms correlated variables to independent components
• Generalized Low Rank Models: Extend the idea of PCA to handle arbitrary data consisting of numerical, Boolean, categorical, and missing data

Anomaly Detection
• Autoencoders: Find outliers using nonlinear dimensionality reduction with deep learning

Page 26: H2O Deep Water - Making Deep Learning Accessible to Everyone

H2O Deep Learning


Page 27: H2O Deep Water - Making Deep Learning Accessible to Everyone

Both TensorFlow and H2O are widely used


Page 28: H2O Deep Water - Making Deep Learning Accessible to Everyone

TensorFlow, MXNet, Caffe and H2O democratize the power of deep learning.

H2O platform democratizes artificial intelligence & big data science.

There are other open source deep learning libraries like Theano and Torch too.

Let’s have a party, this will be fun!



Page 30: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water
Next-Gen Distributed Deep Learning with H2O

H2O integrates with existing GPU backends for significant performance gains

One Interface - GPU Enabled - Significant Performance Gains
Inherits All H2O Properties in Scalability, Ease of Use and Deployment

• Recurrent Neural Networks: enabling natural language processing, sequences, time series, and more
• Convolutional Neural Networks: enabling image, video, speech recognition
• Hybrid Neural Network Architectures: enabling speech to text translation, image captioning, scene parsing and more

Page 31: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water Architecture

[Architecture diagram. Labels recovered from the figure:]
• Client: R/Py/Flow/Scala client, REST API, Web server
• Node 1 … Node N, each with: Spark (Scala); H2O (Java) Execution Engine; TensorFlow/mxnet/Caffe (C++) on GPU / CPU
• Connections: RPC; grpc/MPI/RDMA

Page 32: H2O Deep Water - Making Deep Learning Accessible to Everyone


Using H2O Flow to train Deep Water Model

Page 34: H2O Deep Water - Making Deep Learning Accessible to Everyone


Choosing different network structures

Page 35: H2O Deep Water - Making Deep Learning Accessible to Everyone

Choosing different backends (TensorFlow, MXNet, Caffe)

Page 36: H2O Deep Water - Making Deep Learning Accessible to Everyone

Unified Interface for TF, MXNet and Caffe

Change backend to “mxnet”, “caffe” or “auto”

Choosing different network structures
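The idea of one interface with a switchable backend can be sketched as follows. This is not the Deep Water API: `TrainingRequest`, `resolve_backend` and the preference order used for "auto" are hypothetical names and assumptions, since the slides do not say how "auto" is resolved.

```python
from dataclasses import dataclass

# Backend names taken from the slide; the order is an assumed preference for "auto".
SUPPORTED_BACKENDS = ("tensorflow", "mxnet", "caffe")


@dataclass(frozen=True)
class TrainingRequest:
    """Hypothetical container for the settings a user would pick."""
    network: str = "lenet"
    backend: str = "auto"
    epochs: int = 10


def resolve_backend(requested, available):
    """Return the backend to use, or raise if the request cannot be met."""
    if requested == "auto":
        for name in SUPPORTED_BACKENDS:
            if name in available:
                return name
        raise RuntimeError("no supported backend is available")
    if requested not in SUPPORTED_BACKENDS:
        raise ValueError(f"unknown backend: {requested!r}")
    if requested not in available:
        raise RuntimeError(f"backend {requested!r} is not installed")
    return requested


# The same request works unchanged whichever backend ends up being used.
request = TrainingRequest(network="lenet", backend="auto")
chosen = resolve_backend(request.backend, available={"mxnet", "caffe"})
print(chosen)
```

The point of the design is that the network choice and training settings stay the same while only the backend string changes.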

Page 37: H2O Deep Water - Making Deep Learning Accessible to Everyone

Easy Stacking with other H2O Models


Ensemble of Deep Water, Gradient Boosting Machine & Random Forest models
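Stacking normally trains a second-level model (a metalearner) on the base models' predictions. As a simpler stand-in, the sketch below just averages class probabilities from three models. All probabilities are made-up numbers and the function name is hypothetical; it shows the shape of the idea, not H2O's stacking implementation.

```python
CLASSES = ["cat", "dog", "mouse"]


def average_probabilities(model_probs):
    """Average class-probability rows across models, one row per observation."""
    models = list(model_probs.values())
    n_rows = len(models[0])
    blended = []
    for i in range(n_rows):
        rows = [probs[i] for probs in models]
        # zip(*rows) groups the probabilities of each class across models
        blended.append([sum(column) / len(rows) for column in zip(*rows)])
    return blended


# Hypothetical predictions for two images from three base models.
model_probs = {
    "deep_water":    [[0.70, 0.20, 0.10], [0.30, 0.40, 0.30]],
    "gbm":           [[0.60, 0.30, 0.10], [0.20, 0.60, 0.20]],
    "random_forest": [[0.50, 0.30, 0.20], [0.10, 0.70, 0.20]],
}

blended = average_probabilities(model_probs)
labels = [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in blended]
print(labels)
```

A trained metalearner can weight the base models unequally, which plain averaging cannot do.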

Page 38: H2O Deep Water - Making Deep Learning Accessible to Everyone


docs.h2o.ai

Page 39: H2O Deep Water - Making Deep Learning Accessible to Everyone


https://github.com/h2oai/h2o-3/tree/master/examples/deeplearning/notebooks

Deep Water Example notebooks

Page 40: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water Cat/Dog/Mouse Demo

Page 41: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water H2O + TensorFlow Demo

• H2O + TensorFlow
  • Dataset – Cat/Dog/Mouse
  • TensorFlow as GPU backend
  • Train a LeNet (CNN) model
• Interfaces
  • Python (Jupyter Notebook)
  • Web (H2O Flow)
• Code and Data
  • github.com/h2oai/deepwater

Page 42: H2O Deep Water - Making Deep Learning Accessible to Everyone

Data – Cat/Dog/Mouse Images


Page 43: H2O Deep Water - Making Deep Learning Accessible to Everyone

Data – CSV
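The slide's screenshot is not in the transcript, so the layout below is an assumption: a header-less CSV with one row per image, holding an image path and a class label. The paths and the helper name are hypothetical. Under that assumption, reading the index and checking the class balance might look like this:

```python
import csv
import io
from collections import Counter

# Hypothetical sample in the assumed layout: image path, class label (no header row).
SAMPLE_CSV = """images/cat/img_001.jpg,cat
images/dog/img_002.jpg,dog
images/mouse/img_003.jpg,mouse
images/cat/img_004.jpg,cat
"""


def read_image_index(handle):
    """Return a list of (path, label) pairs, skipping blank lines."""
    rows = []
    for record in csv.reader(handle):
        if not record:
            continue
        path, label = record
        rows.append((path.strip(), label.strip()))
    return rows


rows = read_image_index(io.StringIO(SAMPLE_CSV))
label_counts = Counter(label for _, label in rows)
print(label_counts)
```

Checking the label counts first is a cheap way to spot a badly unbalanced dataset before training.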



Page 46: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water Basic Usage


Page 47: H2O Deep Water - Making Deep Learning Accessible to Everyone

Unified Interface for TF, MXNet and Caffe

Change backend to “mxnet”, “caffe” or “auto”

Choosing different network structures

Page 48: H2O Deep Water - Making Deep Learning Accessible to Everyone


Using GPU for training


Page 50: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water – Custom Network


Page 51: H2O Deep Water - Making Deep Learning Accessible to Everyone


Saving the custom network structure as a file

Page 52: H2O Deep Water - Making Deep Learning Accessible to Everyone


Creating the custom network structure with size = 28x28 and channels = 3
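To see what size = 28x28 and channels = 3 imply, the arithmetic can be worked through in a few lines. The layer stack here is an assumption: a LeNet-style sequence of 5x5 convolutions without padding and 2x2 pooling, chosen because the demo mentions LeNet. The layers of the custom network itself are not in the transcript.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_output(size, window):
    """Output size of non-overlapping pooling along one spatial dimension."""
    return size // window


IMAGE_SIZE = 28  # from the slide: size = 28x28
CHANNELS = 3     # from the slide: channels = 3

# Number of input values per image: 28 * 28 * 3
input_values = IMAGE_SIZE * IMAGE_SIZE * CHANNELS

# Assumed LeNet-style stack: (layer kind, kernel or window size)
LAYERS = [("conv", 5), ("pool", 2), ("conv", 5), ("pool", 2)]

sizes = [IMAGE_SIZE]
for kind, k in LAYERS:
    previous = sizes[-1]
    sizes.append(conv_output(previous, k) if kind == "conv" else pool_output(previous, k))

print(input_values, sizes)
```

Under these assumptions each image carries 2352 input values and the feature maps shrink from 28 to 24, 12, 8 and finally 4 pixels per side.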

Page 53: H2O Deep Water - Making Deep Learning Accessible to Everyone


Specifying the custom network structure for training

Page 54: H2O Deep Water - Making Deep Learning Accessible to Everyone

Conclusions


Page 55: H2O Deep Water - Making Deep Learning Accessible to Everyone

Project “Deep Water”

• H2O + TF + MXNet + Caffe
  • a powerful combination of widely used open source machine learning libraries.
• All Goodies from H2O
  • inherits all H2O properties in scalability, ease of use and deployment.
• Unified Interface
  • allows users to build, stack and deploy deep learning models from different libraries efficiently.
• 100% Open Source
  • the party will get bigger!

Page 56: H2O Deep Water - Making Deep Learning Accessible to Everyone

Deep Water – Current Contributors


Page 57: H2O Deep Water - Making Deep Learning Accessible to Everyone


community.h2o.ai

Page 58: H2O Deep Water - Making Deep Learning Accessible to Everyone

• Organizers & Sponsors
  • Metis & H2O.ai
• Code, Slides & Documents
  • bit.ly/h2o_meetups
  • bit.ly/h2o_deepwater
  • docs.h2o.ai
• Contact
  • [email protected]
  • @matlabulous
  • github.com/woobe

Thanks!

Making Machine Learning Accessible to Everyone

Photo credit: Virgin Media

Page 59: H2O Deep Water - Making Deep Learning Accessible to Everyone


https://techcrunch.com/2017/01/26/h2os-deep-water-puts-deep-learning-in-the-hands-of-enterprise-users/