H2O World - PySparkling Water - Nidhi Mehta

Page 1: H2O World - PySparkling Water - Nidhi Mehta

Hands-On: PySparkling Water

By Nidhi Mehta

Page 2: H2O World - PySparkling Water - Nidhi Mehta

What is PySparkling Water

PySparkling Water = Python + Spark + H2O

Page 3: H2O World - PySparkling Water - Nidhi Mehta

PySparkling Architecture

[Figure: PySparkling architecture diagram. Labeled components: Python driver (Spark Context, H2O Context, H2O Python with h2o.init(ip, port)); Py4J; H2O REST API; Cluster Manager (master); two Executors (workers), each running H2O.]

Page 4: H2O World - PySparkling Water - Nidhi Mehta

Demo Workflow

Aim: Build a model to predict Arrest for the Chicago crime dataset

● Import the Chicago Crime dataset
● Combine Crime data with Census and Weather data
● Build a model to predict whether an arrest was made
● Predict on a test dataset
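The combine step of this workflow comes down to key-based joins. The following is a minimal sketch in plain Python, not the demo's own code: the column names (community_area, date, income_band, mean_temp), the choice of join keys, and every sample row are hypothetical placeholders made up for illustration.

```python
# Toy illustration of the "combine" step: enrich each crime record with
# census and weather attributes through key-based lookups.
# All column names, join keys, and rows are hypothetical placeholders.

def index_by(rows, key_fields):
    """Build a lookup table keyed by the values of key_fields."""
    return {tuple(row[f] for f in key_fields): row for row in rows}

def inner_join(left_rows, right_rows, key_fields):
    """Inner-join two lists of dicts; the right side is assumed unique per key."""
    lookup = index_by(right_rows, key_fields)
    joined = []
    for row in left_rows:
        match = lookup.get(tuple(row[f] for f in key_fields))
        if match is not None:
            merged = dict(match)
            merged.update(row)  # left-side values win on name clashes
            joined.append(merged)
    return joined

crimes = [
    {"id": 1, "community_area": 46, "date": "2015-01-23", "arrest": False},
    {"id": 2, "community_area": 8, "date": "2015-01-24", "arrest": True},
    {"id": 3, "community_area": 99, "date": "2015-01-24", "arrest": False},
]
census = [
    {"community_area": 46, "income_band": "low"},
    {"community_area": 8, "income_band": "high"},
]
weather = [
    {"date": "2015-01-23", "mean_temp": 31},
    {"date": "2015-01-24", "mean_temp": 25},
]

# Crime -> census on community area, then -> weather on date.
crime_census = inner_join(crimes, census, ["community_area"])
combined = inner_join(crime_census, weather, ["date"])

for row in combined:
    print(row)
```

The record with community_area 99 has no census match, so the inner join drops it; a left join would keep it with missing census fields, which may be the better choice when rows should not be lost before modeling.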

Page 5: H2O World - PySparkling Water - Nidhi Mehta

Prerequisites to run the demo

- Install Spark-1.5.1
- Install and build Sparkling Water-1.5.6 ( ./gradlew build -x check )
- Install H2O-3.6.0.3
- Install H2O-python ( sudo pip install h2o-3.6.0.3-py2.py3-none-any.whl )

Page 6: H2O World - PySparkling Water - Nidhi Mehta

Commands to Start/Access the PySparkling Water Cluster

1) Set the Spark environment by specifying SPARK_HOME and MASTER

export SPARK_HOME=Path_to_Spark_dir
export MASTER='local-cluster[2,8,6040]'

2) Launch PySparkling

- To run from a Python notebook:
IPYTHON_OPTS="notebook" Path_to_Sparkling_dir/bin/pysparkling

- To run from a regular Python shell:
Path_to_Sparkling_dir/bin/pysparkling
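As a rough aid for the setup above, the sketch below unpacks the MASTER string and assembles the launch environment without executing anything. It assumes Spark's local-cluster[N,C,M] convention (N workers, C cores per worker, M MB of memory per worker). The helper names parse_local_cluster and build_launch and the example paths are hypothetical and are not part of Spark or Sparkling Water.

```python
import os
import re

def parse_local_cluster(master):
    """Parse 'local-cluster[N,C,M]' into (workers, cores_per_worker, memory_mb)."""
    match = re.fullmatch(
        r"local-cluster\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]", master
    )
    if match is None:
        raise ValueError("not a local-cluster master string: %r" % master)
    workers, cores, memory_mb = (int(g) for g in match.groups())
    return workers, cores, memory_mb

def build_launch(spark_home, sparkling_dir, master, notebook=False):
    """Return (env, argv) for bin/pysparkling; nothing is started here."""
    env = dict(os.environ, SPARK_HOME=spark_home, MASTER=master)
    if notebook:
        # Same switch as on the slide: IPYTHON_OPTS="notebook"
        env["IPYTHON_OPTS"] = "notebook"
    argv = [os.path.join(sparkling_dir, "bin", "pysparkling")]
    return env, argv

# Placeholder paths; substitute the real Spark and Sparkling Water directories.
master = "local-cluster[2,8,6040]"
workers, cores, memory_mb = parse_local_cluster(master)
env, argv = build_launch("/path/to/spark", "/path/to/sparkling-water",
                         master, notebook=True)

print(workers, cores, memory_mb)
print(argv)
```

To actually start the shell, the returned pair could be handed to subprocess.call(argv, env=env) from the standard library.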

Page 7: H2O World - PySparkling Water - Nidhi Mehta

Let's Run the Demo!

Page 8: H2O World - PySparkling Water - Nidhi Mehta

Why use PySparkling

● Automatic parallelization and fewer lines of code

● Much faster on big data: uses H2O's REST API calls to connect to the H2O cluster
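To make the REST connection concrete, here is a small sketch of how a client could address an H2O node over HTTP. It is an illustration under assumptions rather than how PySparkling is implemented: the /3/Cloud status endpoint and the default port 54321 come from H2O-3 in general, not from these slides, and the helper names h2o_url and cluster_status are hypothetical.

```python
import json
from urllib.parse import urlunsplit
from urllib.request import urlopen

def h2o_url(ip, port, endpoint):
    """Build the URL of an H2O REST endpoint on a given node."""
    return urlunsplit(("http", "%s:%d" % (ip, port), endpoint, "", ""))

def cluster_status(ip="localhost", port=54321, timeout=5):
    """Fetch the cluster status as a dict; needs a running H2O node."""
    with urlopen(h2o_url(ip, port, "/3/Cloud"), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

# Only the URL is built here; cluster_status() is not called because it
# requires a live cluster.
status_url = h2o_url("localhost", 54321, "/3/Cloud")
print(status_url)
```

In practice the H2O Python client wraps such calls, so h2o.init(ip, port) is the usual entry point and hand-written HTTP requests are rarely needed.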

Page 9: H2O World - PySparkling Water - Nidhi Mehta

Thank You

Page 10: H2O World - PySparkling Water - Nidhi Mehta

What do these stickers mean?

I have Sparkling Water Installed

I have Python installed

I have H2O installed

I have the H2O World data sets

Pick up stickers or get install help at the information booth