World's Fastest Machine Learning With...
Transcript of World's Fastest Machine Learning With...
World's FastestMachine Learning
With GPUs
http://github.com/h2oai/h2o4gpuSpeaker: Jonathan C. McKinney
H2O4GPU TEAM
Mateusz Erin Navdeep Rory Terry
Karen Arno Jonathan Steve
Machine Learning
c
Deep Learning
6
RISE OF GPU COMPUTING
GPU-Computing perf1.5X per year
1000Xby 2025
102
103
104
105
106
107
Single-threaded perf
1.5X per year
1.1X per year
APPLICATIONS
SYSTEMS
ALGORITHMS
CUDA
ARCHITECTURE
8
github.com/gpuopenanalyticsGPU Data Frame (GDF)
Ingest/Parse
Exploratory Analysis
Feature Engineering
ML/DL Algorithms
Grid Search
Scoring
ModelExport
H2O4GPU/ Open-Source: http://github.com/h2oai/h2o4gpu
/ Used within our own Driverless AI Product to boost performance 30X
/ Scikit-Learn Python API (and now R API)
/ All Scikit-Learn algorithms included
/ Important algorithms ported to GPU
11
Driverless AI
https://www.youtube.com/watch?v=KkvWX3FD7yI
12
Driverless AI
13
ModelAccuracy & Speed
Generalized Linear Model
/ Algorithm:‒ A solver for convex optimization problems
in graph form using Alternating Direction Method of Multipliers (ADMM)
/ Solvers: Lasso, Ridge Regression, Logistic Regression, and Elastic Net
Regularization/ Improvements to original POGS:
‒ Full alpha search‒ Cross Validation‒ Early Stopping + Warm Start‒ Added Scikit-learn like API‒ Supports multiple GPUs
15
https://www.youtube.com/watch?v=LrC3mBNG7WUhttps://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/Multi-GPU-H2O-GLM-simple.ipynb
16
https://www.youtube.com/watch?v=4RKSXNfreLE
K-Means
• Significantly faster than scikit-learn implementation (50x)
• Significantly faster than other GPU implementations (5x-10x)
• Supports kmeans++/kmeans|| initialization
• Supports multiple GPUs
• Supports batching data if exceeds GPU memory
https://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/H2O4GPU_KMeans_Images.ipynb
K-Means
19
10 with latest solver
Principle Component Analysis (PCA)
Generate faces from PCA
Gradient Boosting Machines
/ Based upon XGBoost
/ Raw floating point data -> Binned into Quantiles/ Quantiles are stored as compressed instead of floats
/ Compressed Quantiles are efficiently transferred to GPU/ Sparsity is handled directly with highly GPU efficiency
/ Multi-GPU by sharding rows using NVIDIA NCCL AllReduce
Tree Growth Algorithms
29
https://www.youtube.com/watch?v=NkeSDrifJdg
171 with latest solver
87
51
31
Driverless AI on GPUs
https://www.youtube.com/watch?v=KkvWX3FD7yI
32
33
Driverless AI — Competitive with Kagglers!
Top 8 position in Kaggle with zero manual labor!(ranked above multiple Kaggle Grandmasters)
https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/leaderboard
H2O4GPUhttp://github.com/h2oai/h2o4gpuhttps://stackoverflow.com/questions/tagged/h2o4gpuhttps://gitter.im/h2oai/h2o4gpu
Thank You!Questions?