Who Am I?
Background
• PhD in CS from Charles University in Prague, Czech Republic
• Postdoc at Purdue University experimenting with algos for large-scale computation
• Now at H2O.ai
Experience with domain-specific languages, distributed systems, software engineering, and big data.
H2O.ai
H2O team
Co-Founders: Sri Ambati, Cliff Click
Scientific Advisory Council: Stephen Boyd, Rob Tibshirani, Trevor Hastie
H2O
Open-Source In-Memory Data Science Platform
• Highly optimized Java code (in-house)
• Distributed in-memory K-V store and map/reduce computation framework
• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)
• Read/write access to distributed data frames (R/Pandas-style)
• ML algos: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
• REST API clients: interactive UI, R, Python
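The K-V store and map/reduce pairing listed above can be pictured with a small toy model. The sketch below is plain Python, not H2O code: the `store` dictionary, the chunk keys, and the function names are all hypothetical stand-ins for chunks of a column held in memory on different nodes.

```python
from functools import reduce

# Toy stand-in for a distributed K-V store: chunk key -> chunk of one column.
# In a real cluster each chunk would sit in the memory of a different node.
store = {
    "chunk-0": [1.0, 2.0, 3.0],
    "chunk-1": [4.0, 5.0],
    "chunk-2": [6.0],
}

def map_chunk(values):
    """Map step: runs once per chunk and returns a partial (sum, count)."""
    return (sum(values), len(values))

def combine(a, b):
    """Reduce step: merges two partial results. It is associative,
    so partials can be merged in any order."""
    return (a[0] + b[0], a[1] + b[1])

def column_mean(kv_store):
    """Compute a column mean from per-chunk partial results."""
    partials = [map_chunk(chunk) for chunk in kv_store.values()]
    total, count = reduce(combine, partials, (0.0, 0))
    return total / count

print(column_mean(store))  # 3.5
```

Only the small `(sum, count)` pairs travel between chunks in this model, which is the reason the map/reduce style suits data that is spread over many machines.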
Sparkling Water
Provides
• Transparent integration of H2O into Spark ecosystem
• Use H2O Frames and algorithms with Spark API
Excels in existing Spark workflows requiring advanced Machine Learning algorithms
Where to use Sparkling Water?
Model building
[Figure: pipeline with stages Data Source, Data munging, Modelling, Prediction processing]
Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles
Where to use Sparkling Water?
Data parsing and munging
[Figure: pipeline with stages Data Source, Data load/munging/exploration, Modelling]
• Load and parse data directly into H2OFrame
• Ad hoc data transformation
Where to use Sparkling Water?
DataSourceO
ff-lin
e m
odel
train
ing
Stre
ampr
oces
sing
Data Stream
Data munging
Model prediction
Deploy the model
Export modelin a binary format
or as code
Modelling
[Figure: Sparkling Water cluster architecture. Driver node: Scala/Py main program with SparkContext and H2OContext. Cluster manager. Worker nodes: each runs a Spark executor hosting H2O services.]
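[Figure: data sharing inside the Spark cluster. Spark executors host H2O services; a DataFrame and an H2OFrame are each loaded from a data source. h2oContext.asH2OFrame converts a DataFrame to an H2OFrame, and h2oContext.asDataFrame converts an H2OFrame to a DataFrame.]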
Key Points to Remember
Sparkling Water integrates H2O into Spark
• Enables using advanced machine learning algorithms inside Spark workflows
• Offers an eager computation model and a mutable data structure, H2OFrame
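The contrast in the last point is with Spark's own DataFrames, which are immutable and evaluated lazily. The toy sketch below illustrates the two models in plain Python. `LazyColumn` and `EagerColumn` are hypothetical classes invented for this illustration; they are not part of the Spark or H2O APIs.

```python
class LazyColumn:
    """Hypothetical lazy, immutable column: transformations are only recorded."""

    def __init__(self, data, ops=()):
        self._data = tuple(data)
        self._ops = tuple(ops)

    def map(self, fn):
        # Returns a new object and computes nothing yet.
        return LazyColumn(self._data, self._ops + (fn,))

    def collect(self):
        # The recorded transformations run only when a result is requested.
        out = list(self._data)
        for fn in self._ops:
            out = [fn(x) for x in out]
        return out


class EagerColumn:
    """Hypothetical eager, mutable column: transformations run immediately."""

    def __init__(self, data):
        self.data = list(data)

    def map_in_place(self, fn):
        # Overwrites the stored values right away.
        self.data = [fn(x) for x in self.data]
        return self


calls = []

def double(x):
    calls.append(x)  # record each invocation to show when work happens
    return 2 * x

lazy = LazyColumn([1, 2, 3]).map(double)
calls_before_collect = list(calls)   # still empty: nothing has run
lazy_result = lazy.collect()         # work happens here

eager = EagerColumn([1, 2, 3])
eager.map_in_place(double)           # work happens here, in place
eager_result = eager.data
```

In the lazy model the original data is never modified and work is deferred until `collect()`. In the eager model each call changes the column immediately, so intermediate results are available for inspection right after every step.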