1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Enabling Apache Zeppelin* and Spark* for Data Science in the Enterprise
Bikas Saha, @bikassaha
*Apache Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Zeppelin and the Hadoop elephant logo are trademarks of the Apache Software Foundation.
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
Apache Zeppelin
Zeppelin makes Big Data Science Easy to Approach
Zero install – just connect via a web browser and you are ready to run
Support for multiple execution platforms (Apache Spark, JDBC, Hive…)
Support for multiple languages (Scala, SQL, Python…)
Support for built-in visualizations
Support for reporting
Support for sharing and collaborative work
Does NOT have machine learning built in – that’s where Apache Spark comes in (or your favorite engine, such as Apache Flink, Drill, Hive… and 30+ others)
Zeppelin for Sharing
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
Current Apache Zeppelin and Spark integration
[Diagram components: User, Zeppelin Server, Spark Driver, Spark Executors]
Architectural Issue with Secure Data Access
[Diagram components: User 1, Zeppelin Server, Spark Driver, Spark Executors, HDFS; label: Zeppelin Server User]
Architectural Issues with Multi-Tenancy – Fault Tolerance
[Diagram components: User 1, User 2, Zeppelin Server, Spark Driver, Spark Executors]
User 1 failure affects User 2
Heavy-weight Spark drivers
Architectural Issues with Multi-Tenancy – Privacy
[Diagram components: User 1, User 2, Zeppelin Server, Spark Driver, Spark Executors]
User 1 can access User 2's data
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Enterprise Ready Big Data Science
Future Roadmap
Livy Server as a Session Management Service
[Diagram components: Livy Server, Interactive REST API, Batch REST API, Session, Remote Spark Driver, Remote Context, Standard Spark Batch Job, Spark Executors]
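The two REST entry points named on this slide can be sketched as follows. This is a minimal illustration, not the talk's own code: it assumes the publicly documented Livy endpoints (`POST /sessions`, `POST /sessions/{id}/statements`, `POST /batches`), the host name `livy.example.com` is hypothetical, and the helper function names are invented for this example. The requests are only built, never sent, so the sketch runs without a cluster.

```python
import json
from urllib.request import Request

# Hypothetical host; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.com:8998"


def _post(path, payload):
    """Build (but do not send) a JSON POST request for the Livy server."""
    return Request(
        url=LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def create_session_request(kind="spark"):
    """Interactive API: ask Livy to start a session backed by a remote driver."""
    return _post("/sessions", {"kind": kind})


def run_statement_request(session_id, code):
    """Interactive API: submit a code snippet to an existing session."""
    return _post(f"/sessions/{session_id}/statements", {"code": code})


def submit_batch_request(file, class_name=None, args=None):
    """Batch API: submit a standard Spark batch job."""
    payload = {"file": file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    return _post("/batches", payload)
```

To actually submit one of these, the returned object could be passed to `urllib.request.urlopen`; authentication and error handling are left out of this sketch.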
Secure Data Access - Solved
[Diagram components: User, Zeppelin Server, Livy Interpreter, Livy Server, Session, Remote Spark Driver, Remote Context, Spark Executors, HDFS; label: User]
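One way the session could end up reading HDFS as the end user rather than as the Zeppelin server user is Livy's `proxyUser` field on session creation. The sketch below assumes that mechanism is the one in play (the slide does not say so explicitly); `session_payload` and the user name `alice` are hypothetical.

```python
import json


def session_payload(end_user, kind="spark"):
    """Body for POST /sessions asking Livy to start the session as end_user."""
    if not end_user:
        raise ValueError("an authenticated end-user name is required")
    # proxyUser is the documented Livy field for the user to impersonate.
    return {"kind": kind, "proxyUser": end_user}


body = json.dumps(session_payload("alice"), sort_keys=True)
print(body)  # prints {"kind": "spark", "proxyUser": "alice"}
```

Refusing an empty user name keeps the session from silently falling back to the server's own identity, which is the problem the earlier slide describes.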
Multi Tenancy - Solved
[Diagram components: User 1 and User 2, Zeppelin Server with one Livy Interpreter per user, Livy Server with Session 1 and Session 2, each session with its own Remote Spark Driver, Remote Context and Spark Executor]
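The isolation shown here, one session and one remote driver per user, can be modeled with a small in-memory registry. This is a toy sketch under stated assumptions: `SessionRegistry` is a hypothetical helper, and the `create_session` callable stands in for whatever actually asks Livy for a new session.

```python
from itertools import count


class SessionRegistry:
    """Maps each notebook user to a dedicated session id (hypothetical helper)."""

    def __init__(self, create_session):
        # create_session(user) returns a new session id, e.g. by calling Livy.
        self._create_session = create_session
        self._by_user = {}

    def session_for(self, user):
        """Return the user's session, creating one on first use."""
        if user not in self._by_user:
            self._by_user[user] = self._create_session(user)
        return self._by_user[user]

    def drop(self, user):
        """Forget a user's session, e.g. after its driver has failed."""
        return self._by_user.pop(user, None)


# Stand-in session factory that just hands out increasing ids.
_ids = count(1)
registry = SessionRegistry(lambda user: next(_ids))

s1 = registry.session_for("user1")
s2 = registry.session_for("user2")

# Losing User 1's session leaves User 2's session untouched.
registry.drop("user1")
assert registry.session_for("user2") == s2
```

Because sessions are never shared between users in this model, a failed driver or a variable defined by one user has no path to another user's session.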
Agenda
Making Big Data Science easy to approach
What are the current issues for the enterprise
Making Apache Zeppelin enterprise ready
Future Roadmap
Near Term Improvements
Session Management
Debuggability
Unified session for all languages
Better visualizations for Machine Learning
Support for Spark 2.0
Long Term Improvements
Controlled sharing of sessions for collaboration
Data exploration and browsing with metadata
Taking the model from training to production
Thank You