World's Fastest Machine Learning With...

World's FastestMachine Learning

With GPUs

http://github.com/h2oai/h2o4gpuSpeaker: Jonathan C. McKinney

H2O4GPU TEAM

Mateusz Erin Navdeep Rory Terry

Karen Arno Jonathan Steve

Machine Learning

c

Deep Learning

6

RISE OF GPU COMPUTING

GPU-Computing perf1.5X per year

1000Xby 2025

102

103

104

105

106

107

Single-threaded perf

1.5X per year

1.1X per year

APPLICATIONS

SYSTEMS

ALGORITHMS

CUDA

ARCHITECTURE

8

github.com/gpuopenanalyticsGPU Data Frame (GDF)

Ingest/Parse

Exploratory Analysis

Feature Engineering

ML/DL Algorithms

Grid Search

Scoring

ModelExport

H2O4GPU/ Open-Source: http://github.com/h2oai/h2o4gpu

/ Used within our own Driverless AI Product to boost performance 30X

/ Scikit-Learn Python API (and now R API)

/ All Scikit-Learn algorithms included

/ Important algorithms ported to GPU

11

Driverless AI

https://www.youtube.com/watch?v=KkvWX3FD7yI

12

Driverless AI

13

ModelAccuracy & Speed

Generalized Linear Model

/ Algorithm:‒ A solver for convex optimization problems

in graph form using Alternating Direction Method of Multipliers (ADMM)

/ Solvers: Lasso, Ridge Regression, Logistic Regression, and Elastic Net

Regularization/ Improvements to original POGS:

‒ Full alpha search‒ Cross Validation‒ Early Stopping + Warm Start‒ Added Scikit-learn like API‒ Supports multiple GPUs

15

https://www.youtube.com/watch?v=LrC3mBNG7WUhttps://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/Multi-GPU-H2O-GLM-simple.ipynb

16

https://www.youtube.com/watch?v=4RKSXNfreLE

K-Means

• Significantly faster than scikit-learn implementation (50x)

• Significantly faster than other GPU implementations (5x-10x)

• Supports kmeans++/kmeans|| initialization

• Supports multiple GPUs

• Supports batching data if exceeds GPU memory

https://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/H2O4GPU_KMeans_Images.ipynb

K-Means

19

10 with latest solver

Principle Component Analysis (PCA)

Generate faces from PCA

Gradient Boosting Machines

/ Based upon XGBoost

/ Raw floating point data -> Binned into Quantiles/ Quantiles are stored as compressed instead of floats

/ Compressed Quantiles are efficiently transferred to GPU/ Sparsity is handled directly with highly GPU efficiency

/ Multi-GPU by sharding rows using NVIDIA NCCL AllReduce

Tree Growth Algorithms

29

https://www.youtube.com/watch?v=NkeSDrifJdg

171 with latest solver

87

51

31

Driverless AI on GPUs

https://www.youtube.com/watch?v=KkvWX3FD7yI

33

Driverless AI — Competitive with Kagglers!

Top 8 position in Kaggle with zero manual labor!(ranked above multiple Kaggle Grandmasters)

https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/leaderboard

H2O4GPUhttp://github.com/h2oai/h2o4gpuhttps://stackoverflow.com/questions/tagged/h2o4gpuhttps://gitter.im/h2oai/h2o4gpu

Thank You!Questions?

World's Fastest Machine Learning With...

Documents

Transcript of World's Fastest Machine Learning With...