San Francisco Hadoop User Group Meetup Deep Learning

36
Distributed Deep Learning for Classification and Regression Problems using H2O Hadoop User Group Meetup San Francisco, 12/10/14 Arno Candel, H2O.ai

Transcript of San Francisco Hadoop User Group Meetup Deep Learning

Page 1: San Francisco Hadoop User Group Meetup Deep Learning

Distributed Deep Learning for Classification and Regression

Problems using H2O

Hadoop User Group Meetup San Francisco 121014

Arno Candel H2Oai

Who am IPhD in Computational Physics 2005

from ETH Zurich Switzerland

6 years at SLAC - Accelerator Physics Modeling 2 years at Skytree - Machine Learning

1 year at H2Oai - Machine Learning

15 years in Supercomputing amp Modeling

Named ldquo2014 Big Data All-Starrdquo by Fortune Magazine

ArnoCandel

H2O Deep Learning ArnoCandel

OutlineIntro (5 mins)

Methods amp Implementation (10 mins)

Results amp Live Demos (20 mins)

Higgs boson classification

MNIST handwritten digits

text classification

3

H2O Deep Learning ArnoCandel

Teamwork at H2OaiJava Apache v2 Open-Source

1 Java Machine Learning in Github Join the community

4

H2O Deep Learning ArnoCandel

H2O Open-Source (Apache v2) Predictive Analytics Platform

5

H2O Deep Learning ArnoCandel 6

H2O Architecture - Designed for speed scale accuracy amp ease of use

Key technical points bull distributed JVMs + REST API bull no Java GC issues

(data in byte[] Double) bull loss-less number compression bull Hadoop integration (v1YARN) bull R package (CRAN)

Pre-built fully featured algos K-Means NB PCA CoxPH GLM RF GBM DeepLearning

H2O Deep Learning ArnoCandel

WikipediaDeep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using

architectures composed of multiple non-linear transformations

What is Deep Learning

InputImage

Output User ID

7

Example Facebook DeepFace

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 2: San Francisco Hadoop User Group Meetup Deep Learning

Who am IPhD in Computational Physics 2005

from ETH Zurich Switzerland

6 years at SLAC - Accelerator Physics Modeling 2 years at Skytree - Machine Learning

1 year at H2Oai - Machine Learning

15 years in Supercomputing amp Modeling

Named ldquo2014 Big Data All-Starrdquo by Fortune Magazine

ArnoCandel

H2O Deep Learning ArnoCandel

OutlineIntro (5 mins)

Methods amp Implementation (10 mins)

Results amp Live Demos (20 mins)

Higgs boson classification

MNIST handwritten digits

text classification

3

H2O Deep Learning ArnoCandel

Teamwork at H2OaiJava Apache v2 Open-Source

1 Java Machine Learning in Github Join the community

4

H2O Deep Learning ArnoCandel

H2O Open-Source (Apache v2) Predictive Analytics Platform

5

H2O Deep Learning ArnoCandel 6

H2O Architecture - Designed for speed scale accuracy amp ease of use

Key technical points bull distributed JVMs + REST API bull no Java GC issues

(data in byte[] Double) bull loss-less number compression bull Hadoop integration (v1YARN) bull R package (CRAN)

Pre-built fully featured algos K-Means NB PCA CoxPH GLM RF GBM DeepLearning

H2O Deep Learning ArnoCandel

WikipediaDeep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using

architectures composed of multiple non-linear transformations

What is Deep Learning

InputImage

Output User ID

7

Example Facebook DeepFace

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 3: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

OutlineIntro (5 mins)

Methods amp Implementation (10 mins)

Results amp Live Demos (20 mins)

Higgs boson classification

MNIST handwritten digits

text classification

3

H2O Deep Learning ArnoCandel

Teamwork at H2OaiJava Apache v2 Open-Source

1 Java Machine Learning in Github Join the community

4

H2O Deep Learning ArnoCandel

H2O Open-Source (Apache v2) Predictive Analytics Platform

5

H2O Deep Learning ArnoCandel 6

H2O Architecture - Designed for speed scale accuracy amp ease of use

Key technical points bull distributed JVMs + REST API bull no Java GC issues

(data in byte[] Double) bull loss-less number compression bull Hadoop integration (v1YARN) bull R package (CRAN)

Pre-built fully featured algos K-Means NB PCA CoxPH GLM RF GBM DeepLearning

H2O Deep Learning ArnoCandel

WikipediaDeep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using

architectures composed of multiple non-linear transformations

What is Deep Learning

InputImage

Output User ID

7

Example Facebook DeepFace

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 4: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Teamwork at H2OaiJava Apache v2 Open-Source

1 Java Machine Learning in Github Join the community

4

H2O Deep Learning ArnoCandel

H2O Open-Source (Apache v2) Predictive Analytics Platform

5

H2O Deep Learning ArnoCandel 6

H2O Architecture - Designed for speed scale accuracy amp ease of use

Key technical points bull distributed JVMs + REST API bull no Java GC issues

(data in byte[] Double) bull loss-less number compression bull Hadoop integration (v1YARN) bull R package (CRAN)

Pre-built fully featured algos K-Means NB PCA CoxPH GLM RF GBM DeepLearning

H2O Deep Learning ArnoCandel

WikipediaDeep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using

architectures composed of multiple non-linear transformations

What is Deep Learning

InputImage

Output User ID

7

Example Facebook DeepFace

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 5: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

H2O Open-Source (Apache v2) Predictive Analytics Platform

5

H2O Deep Learning ArnoCandel 6

H2O Architecture - Designed for speed scale accuracy amp ease of use

Key technical points bull distributed JVMs + REST API bull no Java GC issues

(data in byte[] Double) bull loss-less number compression bull Hadoop integration (v1YARN) bull R package (CRAN)

Pre-built fully featured algos K-Means NB PCA CoxPH GLM RF GBM DeepLearning

H2O Deep Learning ArnoCandel

WikipediaDeep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using

architectures composed of multiple non-linear transformations

What is Deep Learning

InputImage

Output User ID

7

Example Facebook DeepFace

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 6: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 6

H2O Architecture - Designed for speed scale accuracy amp ease of use

Key technical points bull distributed JVMs + REST API bull no Java GC issues

(data in byte[] Double) bull loss-less number compression bull Hadoop integration (v1YARN) bull R package (CRAN)

Pre-built fully featured algos K-Means NB PCA CoxPH GLM RF GBM DeepLearning

H2O Deep Learning ArnoCandel

WikipediaDeep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using

architectures composed of multiple non-linear transformations

What is Deep Learning

InputImage

Output User ID

7

Example Facebook DeepFace

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 7: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

WikipediaDeep learning is a set of algorithms in machine learning that attempt to model high-level abstractions in data by using

architectures composed of multiple non-linear transformations

What is Deep Learning

InputImage

Output User ID

7

Example Facebook DeepFace

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 8: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

What is NOT DeepLinear models are not deep (by definition)

Neural nets with 1 hidden layer are not deep (only 1 layer - no feature hierarchy)

SVMs and Kernel methods are not deep (2 layers kernel + linear)

Classification trees are not deep (operate on original input space no new features generated)

8

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 9: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

1970s multi-layer feed-forward Neural Network (stochastic gradient descent with back-propagation) + distributed processing for big data (fine-grain in-memory MapReduce on distributed data) + multi-threaded speedup (async forkjoin worker threads operate at FORTRAN speeds) + smart algorithms for fast amp accurate results (automatic standardization one-hot encoding of categoricals missing value imputation weight amp bias initialization adaptive learning rate momentum dropoutl1L2 regularization grid search N-fold cross-validation checkpointing load balancing auto-tuning model averaging etc)

= powerful tool for (un)supervised machine learning on real-world data

H2O Deep Learning9

all 320 cores maxed out

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 10: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

ldquofully connectedrdquo directed graph of neurons

age

income

employment

married

single

Input layerHidden layer 1

Hidden layer 2

Output layer

3x4 4x3 3x2connections

information flow

inputoutput neuronhidden neuron

4 3 2neurons 3

Example Neural Network10

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 11: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

age

income

employmentyj = tanh(sumi(xiuij)+bj)

uij

xi

yj

per-class probabilities sum(pl) = 1

zk = tanh(sumj(yjvjk)+ck)

vjk

zk pl

pl = softmax(sumk(zkwkl)+dl)

wkl

softmax(xk) = exp(xk) sumk(exp(xk))

ldquoneurons activate each other via weighted sumsrdquo

Prediction Forward Propagation

activation function tanh alternative

x -gt max(0x) ldquorectifierrdquo

pl is a non-linear function of xi can approximate ANY function

with enough layers

bj ck dl bias values(indep of inputs)

11

married

single

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 12: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

age

income

employment

xi

Automatic standardization of data xi mean = 0 stddev = 1

horizontalize categorical variables eg

full-time part-time none self-employed -gt

010 = part-time 000 = self-employed

Automatic initialization of weights

Poor manrsquos initialization random weights wkl

Default (better) Uniform distribution in+- sqrt(6(units + units_previous_layer))

Data preparation amp InitializationNeural Networks are sensitive to numerical noise operate best in the linear regime (not saturated)

12

married

single

wkl

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 13: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Mean Square Error = (022 + 022)2 ldquopenalize differences per-classrdquo Cross-entropy = -log(08) ldquostrongly penalize non-1-nessrdquo

Training Update Weights amp Biases

Stochastic Gradient Descent Update weights and biases via gradient of the error (via back-propagation)

For each training row we make a prediction and compare with the actual label (supervised learning)

married108predicted actual

Objective minimize prediction error (MSE or cross-entropy)

w ltmdash w - rate partEpartw

1

13

single002

E

wrate

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 14: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Backward Propagation

partEpartwi = partEparty partypartnet partnetpartwi

= part(error(y))party part(activation(net))partnet xi

Backprop Compute partEpartwi via chain rule going backwards

wi

net = sumi(wixi) + b

xiE = error(y)

y = activation(net)

How to compute partEpartwi for wi ltmdash wi - rate partEpartwi

Naive For every i evaluate E twice at (w1hellipwiplusmn∆hellipwN)hellip Slow

14

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 15: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

H2O Deep Learning Architecture

K-V

K-V

HTTPD

HTTPD

nodesJVMs sync

threads async

communication

w

w w

w w w w

w1 w3 w2w4

w2+w4w1+w3

w = (w1+w2+w3+w4)4

map each node trains a copy of the weights

and biases with (some or all of) its

local data with asynchronous FJ

threads

initial model weights and biases w

updated model w

H2O atomic in-memoryK-V store

reduce model averaging

average weights and biases from all nodes

speedup is at least nodeslog(rows) arxiv12094129v3

Keep iterating over the data (ldquoepochsrdquo) score from time to time

Query amp display the model via

JSON WWW

2

2 431

1

1

1

43 2

1 2

1

i

auto-tuned (default) or user-specified number of points per MapReduce iteration

15

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 16: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Adaptive learning rate - ADADELTA (Google)Automatically set learning rate for each neuron based on its training history

Grid Search and Checkpointing Run a grid search to scan many hyper-parameters then continue training the most promising model(s)

RegularizationL1 penalizes non-zero weights L2 penalizes large weightsDropout randomly ignore certain inputs

16

ldquoSecretrdquo Sauce to Higher Accuracy

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 17: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Detail Adaptive Learning Rate

Compute moving average of ∆wi2 at time t for window length rho

E[∆wi2]t = rho E[∆wi2]t-1 + (1-rho) ∆wi2

Compute RMS of ∆wi at time t with smoothing epsilon

RMS[∆wi]t = sqrt( E[∆wi2]t + epsilon )

Adaptive annealing progress Gradient-dependent learning rate moving window prevents ldquofreezingrdquo (unlike ADAGRAD no window)

Adaptive acceleration momentum accumulate previous weight updates but over a window of time

RMS[∆wi]t-1

RMS[partEpartwi]t

rate(wi t) =

Do the same for partEpartwi then obtain per-weight learning rate

cf ADADELTA paper

17

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 18: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Detail Dropout Regularization18

Training For each hidden neuron for each training sample for each iteration ignore (zero out) a different random fraction p of input activations

age

income

employment

married

singleX

X

X

Testing Use all activations but reduce them by a factor p

(to ldquosimulaterdquo the missing activations during training)

cf Geoff Hintons paper

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 19: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 19

Application Higgs Boson Classification

Higgsvs

Background

Large Hadron Collider Largest experiment of mankind $13+ billion 168 miles long 120 MegaWatts -456F 1PBday etc Higgs boson discovery (July rsquo12) led to 2013 Nobel prize

httparxivorgpdf14024735v2pdf

Images courtesy CERN LHC

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features (physics formulae) Train 10M rows Valid 500k Test 500k rows

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 20: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 20

Live Demo Letrsquos see what Deep Learning can do with low-level features alone

Former baseline for AUC 0733 and 0816

H2O Algorithm low-level H2O AUC all features H2O AUC

Generalized Linear Model 0596 0684

Random Forest 0764 0840

Gradient Boosted Trees 0753 0839

Neural Net 1 hidden layer 0760 0830

H2O Deep Learning

add derived

features

Higgs Derived features are important

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 21: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

MNIST digits classification

Standing world record Without distortions or convolutions the best-ever published error rate on test set 083 (Microsoft)

21

Train 60000 rows 784 integer columns 10 classes Test 10000 rows 784 integer columns 10 classes

MNIST = Digitized handwritten digits database (Yann LeCun)

Data 28x28=784 pixels with (gray-scale) values in 0hellip255

Yann LeCun ldquoYet another advice dont get fooled by people who claim to have a solution to Artificial General Intelligence Ask them what error rate they get on MNIST or ImageNetrdquo

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 22: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 22

H2O Deep Learning beats MNIST

Standard 60k10k data No distortions

No convolutions No unsupervised training

No ensemble

10 hours on 10 16-core nodes

World-record 083 test set error

httplearnh2oaicontenthands-on_trainingdeep_learninghtml

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 23: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

POJO Model Export for Production Scoring

23

Plain old Java code is auto-generated to take your H2O Deep Learning models into production

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 24: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Parallel Scalability (for 64 epochs on MNIST with ldquo087rdquo parameters)

24

Speedup

000

1000

2000

3000

4000

1 2 4 8 16 32 63

H2O Nodes

(4 cores per node 1 epoch per node per MapReduce)

27 mins

Training Time

0

25

50

75

100

1 2 4 8 16 32 63

H2O Nodes

in minutes

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 25: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Goal Predict the item from sellerrsquos text description

25

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

ldquoVintage 18KT gold Rolex 2 Tone in great conditionrdquo

Data Bag of words vector 0010000010001hellip0

vintagegold condition

Text Classification

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 26: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Out-Of-The-Box 116 test set error after 10 epochs Predicts the correct class (out of 143) 884 of the time

26

Note 2 No tuning was done(results are for illustration only)

Train 578361 rows 8647 cols 467 classes Test 64263 rows 8647 cols 143 classes

Note 1 H2O columnar-compressed in-memory store only needs 60 MB to store 5 billion values (dense CSV needs 18 GB)

Text Classification

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 27: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

MNIST Unsupervised Anomaly Detection with Deep Learning (Autoencoder)

27

The good The bad The ugly

smallmedianlarge reconstruction errorhttplearnh2oaicontenthands-on_traininganomaly_detectionhtml

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 28: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 28

How well did Deep Learning do

Letrsquos see how H2O did in the past 10 minutes

Higgs Live Demo (Continued)

ltyour guessgt

reference paper results

Any guesses for AUC on low-level features AUC=076 was the best for RFGBMNN (H2O)

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 29: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

H2O Steam Scoring Platform

29

Higgs Dataset Demo on 10-node cluster Letrsquos score all our H2O models and compare them

httpserverportsteamindexhtml

Live Demo

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 30: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 30

Live Demo on 10-node cluster lt10 minutes runtime for all H2O algos Better than LHC baseline of AUC=073

Scoring Higgs Models in H2O Steam

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 31: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 31

AlgorithmPaperrsquosl-l AUC

low-level H2O AUC

all featuresH2O AUC

Parameters (not heavily tuned) H2O running on 10 nodes

Generalized Linear Model - 0596 0684 default binomial

Random Forest - 0764 0840 50 trees max depth 50

Gradient Boosted Trees 073 0753 0839 50 trees max depth 15

Neural Net 1 layer 0733 0760 0830 1x300 Rectifier 100 epochs

Deep Learning 3 hidden layers 0836 0850 - 3x1000 Rectifier L2=1e-5 40 epochs

Deep Learning 4 hidden layers 0868 0869 - 4x500 Rectifier L1=L2=1e-5 300 epochs

Deep Learning 5 hidden layers 0880 0871 - 5x500 Rectifier L1=L2=1e-5

Deep Learning on low-level features alone beats everything else Prelim H2O results compare well with paperrsquos results (TMVA amp Theano)

Higgs Particle Detection with H2O

Nature paper httparxivorgpdf14024735v2pdf

HIGGS UCI Dataset 21 low-level features AND 7 high-level derived features Train 10M rows Test 500k rows

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 32: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel 32

H2O Deep Learning BookletR Vignette Explains methods amp parameters

Even more tips amp tricks in our other presentations and at H2O World next week

Grab your copy today or download at httptcokWzyFMGJ2S

or view as GitBook

httph2ogitbooksioh2o-deep-learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 33: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

You can participate33

- Images Convolutional amp Pooling Layers PUB-644

- Sequences Recurrent Neural Networks PUB-1052

- Faster Training GPGPU support PUB-1013

- Pre-Training Stacked Auto-Encoders PUB-1014

- Ensembles PUB-1072

- Use H2O at Kaggle Challenges

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 34: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

H2O Kaggle Starter Scripts34

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 35: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Re-Live H2O World35

httph2oaih2o-world httplearnh2oai Watch the Videos

Day 2 bull Speakers from Academia amp Industry bull Trevor Hastie (ML) bull John Chambers (S R) bull Josh Bloch (Java API) bull Many use cases from customers bull 3 Top Kaggle Contestants (Top 10)

bull 3 Panel discussions

Day 1 bull Hands-On Training bull Supervised bull Unsupervised bull Advanced Topics bull Markting Usecase

bull Product Demos bull Hacker-Fest with Cliff Click (CTO Hotspot)

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you

Page 36: San Francisco Hadoop User Group Meetup Deep Learning

H2O Deep Learning ArnoCandel

Key Take-AwaysH2O is an open source predictive analytics platform for data scientists and business analysts who need scalable and fast machine learning

H2O Deep Learning is ready to take your advanced analytics to the next level - Try it on your data

Join our Community and Meetups httpsgithubcomh2oai h2ostream community forum wwwh2oai h2oai

36

Thank you