World's Fastest Machine Learning With...

World's FastestMachine Learning

With GPUs

http://github.com/h2oai/h2o4gpuSpeaker: Jonathan C. McKinney

H2O4GPU TEAM

Mateusz Erin Navdeep Rory Terry

Karen Arno Jonathan Steve

Machine Learning

Deep Learning

RISE OF GPU COMPUTING

GPU-Computing perf1.5X per year

1000Xby 2025

Single-threaded perf

1.5X per year

1.1X per year

APPLICATIONS

SYSTEMS

ALGORITHMS

ARCHITECTURE

github.com/gpuopenanalyticsGPU Data Frame (GDF)

Ingest/Parse

Exploratory Analysis

Feature Engineering

ML/DL Algorithms

Grid Search

Scoring

ModelExport

H2O4GPU/ Open-Source: http://github.com/h2oai/h2o4gpu

/ Used within our own Driverless AI Product to boost performance 30X

/ Scikit-Learn Python API (and now R API)

/ All Scikit-Learn algorithms included

/ Important algorithms ported to GPU

Driverless AI

https://www.youtube.com/watch?v=KkvWX3FD7yI

Driverless AI

ModelAccuracy & Speed

Generalized Linear Model

/ Algorithm:‒ A solver for convex optimization problems

in graph form using Alternating Direction Method of Multipliers (ADMM)

/ Solvers: Lasso, Ridge Regression, Logistic Regression, and Elastic Net

Regularization/ Improvements to original POGS:

‒ Full alpha search‒ Cross Validation‒ Early Stopping + Warm Start‒ Added Scikit-learn like API‒ Supports multiple GPUs

https://www.youtube.com/watch?v=LrC3mBNG7WUhttps://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/Multi-GPU-H2O-GLM-simple.ipynb

https://www.youtube.com/watch?v=4RKSXNfreLE

K-Means

• Significantly faster than scikit-learn implementation (50x)

• Significantly faster than other GPU implementations (5x-10x)

• Supports kmeans++/kmeans|| initialization

• Supports multiple GPUs

• Supports batching data if exceeds GPU memory

https://github.com/h2oai/h2o4gpu/blob/master/examples/py/demos/H2O4GPU_KMeans_Images.ipynb

K-Means

10 with latest solver

Principle Component Analysis (PCA)

Generate faces from PCA

Gradient Boosting Machines

/ Based upon XGBoost

/ Raw floating point data -> Binned into Quantiles/ Quantiles are stored as compressed instead of floats

/ Compressed Quantiles are efficiently transferred to GPU/ Sparsity is handled directly with highly GPU efficiency

/ Multi-GPU by sharding rows using NVIDIA NCCL AllReduce

Tree Growth Algorithms

https://www.youtube.com/watch?v=NkeSDrifJdg

171 with latest solver

Driverless AI on GPUs

https://www.youtube.com/watch?v=KkvWX3FD7yI

Driverless AI — Competitive with Kagglers!

Top 8 position in Kaggle with zero manual labor!(ranked above multiple Kaggle Grandmasters)

https://www.kaggle.com/c/mercedes-benz-greener-manufacturing/leaderboard

H2O4GPUhttp://github.com/h2oai/h2o4gpuhttps://stackoverflow.com/questions/tagged/h2o4gpuhttps://gitter.im/h2oai/h2o4gpu

Thank You!Questions?

World's Fastest Machine Learning With...

Documents

Transcript of World's Fastest Machine Learning With...

Computer Simulation of Lignocellulosic Biomass - GTC …on-demand.gputechconf.com/gtc/2012/presentations/S0659-Computer... · Molecular Dynamics Simulation d s ... Art Ragauskas Jeremy

3D ADI Method for Fluid Simulation on Multiple GPUson-demand.gputechconf.com/gtc/2012/presentations/S0247-3D-ADI... · ADI method – iterations Use global iterations for the whole

Compression on GPUson-demand.gputechconf.com/gtc/2015/presentation/S5455-Sergio-Zarantonello.pdfS5455, GTC 2015 High Capability Multidimensional Data Compression on GPUs Sergio E.

Signal Processing on GPUs for Radio Telescopes - GTC …on-demand.gputechconf.com/gtc/2012/presentations/S0124-GTC2012... · GTC'12 May 14-17, 2012 1 Signal Processing on GPUs for

NVIDIA GTC 2016 - GTC On-Demand Featured Talkson-demand.gputechconf.com/gtc/2016/presentation/s... · 3DS.COM © Dassault Systèmes | Confidential Information | 4/14/2016 | ref.:

Deep Packet Inspection Using GPUson-demand.gputechconf.com/gtc/2017/presentation/s7468-wenji-wu... · Qian Gong, Wenji Wu, Phil DeMar GPU Technology Conference 2017 May 2017 Deep

Hedging Strategy, Simulation and Backtesting | GTC 2013on-demand.gputechconf.com/gtc/2013/presentations/S...(generated scenarios, stress scenarios and historical back-testing scenarios)

Academic Research Programs & Sponsored Research - GTC 2012on-demand.gputechconf.com/gtc/2012/presentations/S... · Academic Research Programs & Sponsored Research - GTC 2012 Author:

Virtual Reality Features of NVIDIA GPUson-demand.gputechconf.com/siggraph/2016/... · Focus on Virtual Reality Features of Pascal GPUs Virtual Reality renderingfeatures in Pascal

HP GTC Presentation - GTC On Demand | NVIDIA GTC Digitalon-demand.gputechconf.com/gtc/2012/presentations/S... · Flexible LOM SL230s SL250s 7 Introducing HP ProLiant SL230s & SL250s

Powering Real-time Radio Astronomy Signal Processing with GPUson-demand.gputechconf.com/gtc/2013/presentations/S... · Powering RT Radio Astronomy Signal Processing | GTC 2013 Author:

Large Scale Reservoir Simulation Utilizing Multiple GPUson-demand.gputechconf.com/.../S4727-lg-scale-reservoir-sims-gpus.pdf · Innovative Technology for Reservoir Engineers Ridgeway

Practical Real-Time Voxel-Based Global Illumination for Current GPUson-demand.gputechconf.com/gtc/2014/presentations/S4… · · 2014-04-08practical real-time voxel-based global

エヌビディアコーポレーションソリューションアーキテクチャ エ …on-demand.gputechconf.com/gtc/2014/jp/sessions/1000.pdf · ディープラーニングとは？

GTC Presentation March 19, 2013 - GTC On-Demand …on-demand.gputechconf.com/gtc/2013/presentations/S... · Penguin Computing's public HPC cloud Penguin Computing on Demand (POD)

How to run SQL queries on TBs of data … using GPUson-demand.gputechconf.com/gtc-il/...how-to-run-sql-queries-on-tbs-of-data---using-gpus.pdfOutput of remove_if, join, reduce/reduce_by_key,

Implementing Challenging Seismic Attributes on the GPU ...on-demand.gputechconf.com/gtc/...Seismic-Attributes... · Implementing Challenging Seismic Attributes on the GPU | GTC 2013

Building Your Own GPU Research Cluster | GTC 2013on-demand.gputechconf.com/gtc/2013/presentations/S3516-Build-Yo… · Building Your Own GPU Research Cluster | GTC 2013 Author: Pradeep

Lattice Boltzmann multi-phase simulations using GPUson-demand.gputechconf.com/gtc/2010/presentations/S12170-LBM-Multi... · Lattice Boltzmann multi-phase simulations using GPUs Jonas

Packet-based Network Traffic Monitoring & Analysis with GPUson-demand.gputechconf.com/gtc/2014/presentations/S4320... · 2014-04-07 · Packet-based Network Traffic Monitoring & Analysis

エヌビディアコーポレーションソリューションアーキテクチャエ …on-demand.gputechconf.com/gtc/2014/jp/sessions/1000.pdf · ディープラーニングとは？