Apache Zeppelin, the missing component for the Spark eco-system


Page 1: Apache Zeppelin, the missing component for the Spark eco-system

@doanduyhai

Apache Zeppelin, the missing component for your BigData ecosystem DuyHai DOAN, Cassandra Technical Advocate

Page 2: Apache Zeppelin, the missing component for the Spark eco-system

Who Am I?

Duy Hai DOAN, Cassandra technical advocate

•  talks, meetups, confs

•  open-source devs (Achilles, …)

•  OSS Cassandra point of contact

[email protected] ☞ @doanduyhai

Page 3: Apache Zeppelin, the missing component for the Spark eco-system

DataStax

•  Founded in April 2010

•  We contribute a lot to Apache Cassandra™

•  400+ customers (25 of the Fortune 100), 400+ employees

•  Headquartered in the San Francisco Bay Area

•  EU headquarters in London, offices in France and Germany

•  DataStax Enterprise = OSS Cassandra + extra features

Page 4: Apache Zeppelin, the missing component for the Spark eco-system

What is Apache Zeppelin?

•  Presentation

•  Architecture

Page 5: Apache Zeppelin, the missing component for the Spark eco-system

Zeppelin Presentation

Page 6: Apache Zeppelin, the missing component for the Spark eco-system

Zeppelin Architecture

[Diagram. Labels: Zeppelin Server (REST, WebSocket), Zeppelin Engine, Zeppelin Interpreter Factory, Spark Interpreter Group (Spark, SparkSQL), Tajo Interpreter, Flink Interpreter, Cassandra Interpreter; five boxes labeled JVM.]

Page 7: Apache Zeppelin, the missing component for the Spark eco-system

What does Zeppelin provide?

•  Front-end & display system for free

•  Generic back-end with REST APIs & WebSocket

•  Pluggable interpreter system

•  Task scheduler (à la CRON)

Page 8: Apache Zeppelin, the missing component for the Spark eco-system

Zeppelin UI Layout

•  Notebook

•  Paragraph

•  UI elements

Page 9: Apache Zeppelin, the missing component for the Spark eco-system

Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 10: Apache Zeppelin, the missing component for the Spark eco-system

Zeppelin Display System

•  Raw, Table, HTML

•  Available graphs

•  View modes

•  Dynamic form

•  Iframe export
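As a rough illustration of the Table display type: Zeppelin treats output that starts with %table as tabular data, with tab-separated columns, newline-separated rows, and the first row used as the header. The helper below is a hypothetical sketch (TableDisplay and toTable are made-up names, not part of Zeppelin) that builds such a string.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: builds a string in the %table display format.
// TableDisplay and toTable are illustrative names, not Zeppelin classes.
final class TableDisplay {
    private TableDisplay() {}

    // Columns are joined with tabs, rows with newlines; the first row is the header.
    static String toTable(List<String> header, List<List<String>> rows) {
        StringBuilder out = new StringBuilder("%table ");
        out.append(String.join("\t", header)).append('\n');
        for (List<String> row : rows) {
            out.append(String.join("\t", row)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data invented for the example.
        String result = toTable(
            Arrays.asList("name", "age"),
            Arrays.asList(
                Arrays.asList("alice", "30"),
                Arrays.asList("bob", "25")));
        System.out.print(result);
    }
}
```

An interpreter that returns a string like this lets the user switch between the table view and the available graphs without any extra front-end code.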

Page 11: Apache Zeppelin, the missing component for the Spark eco-system

Demo https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 12: Apache Zeppelin, the missing component for the Spark eco-system

Interpreter system

•  Core interpreters

•  Third-party interpreters

•  Interpreter conf & usage

Page 13: Apache Zeppelin, the missing component for the Spark eco-system

Interpreter processing lifecycle

①  Receive input commands/data

    •  as raw text

    •  from form data

②  Process the input commands/data by the external back-end

③  Format the response using Zeppelin display system

④  Send response back to the Zeppelin engine
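The four steps above can be sketched in a few lines. This is an illustration only: LifecycleSketch, BACKEND and escapeHtml are made-up names, the back-end is a stand-in that simply reverses its input, and Zeppelin's real Interpreter API has more to it than a single interpret call.

```java
import java.util.function.Function;

// Hypothetical stand-in for an interpreter; not Zeppelin's actual API.
final class LifecycleSketch {
    private LifecycleSketch() {}

    // Stand-in for the external back-end: it just reverses the text.
    static final Function<String, String> BACKEND =
        text -> new StringBuilder(text).reverse().toString();

    // Minimal escaping so the back-end's answer is safe inside HTML.
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String interpret(String rawInput) {
        // Step 1: receive the input as raw text.
        String input = rawInput.trim();
        // Step 2: let the back-end process it.
        String processed = BACKEND.apply(input);
        // Step 3: format the response with the %html display directive.
        String formatted = "%html <pre>" + escapeHtml(processed) + "</pre>";
        // Step 4: hand the response back to the caller (the engine).
        return formatted;
    }

    public static void main(String[] args) {
        System.out.println(interpret("abc")); // prints: %html <pre>cba</pre>
    }
}
```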


Page 14: Apache Zeppelin, the missing component for the Spark eco-system

Core interpreters

•  Spark (Spark core, SparkSQL/DataFrame, PySpark)

    •  Spark core = default (or %spark)

    •  SparkSQL = %sql

•  Shell (%sh)

•  Markdown (%md)

•  AngularJS (%angular)

Page 15: Apache Zeppelin, the missing component for the Spark eco-system

Third-party interpreters

•  Hive

•  Phoenix

•  Tajo

•  Flink

•  Ignite

•  Lens

•  Cassandra

•  Geode

•  PostgreSQL

•  Kylin

Page 16: Apache Zeppelin, the missing component for the Spark eco-system

Interpreter conf & usage https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 17: Apache Zeppelin, the missing component for the Spark eco-system

Writing An Interpreter

•  How To

•  Simple interpreter example (AsciiDoc)

•  Complex interpreter example (Cassandra)

Page 18: Apache Zeppelin, the missing component for the Spark eco-system

Steps to write your own interpreter

•  Create a class that extends the Interpreter base class

•  Register it in a static block

•  Optionally define default config params

// Register the interpreter under a name
static {
    Interpreter.register("MyInterpreterName", MyClassName.class.getName());
}

// Same registration, with a default config param
static {
    Interpreter.register("MyInterpreterName", MyClassName.class.getName(),
        new InterpreterPropertyBuilder()
            .add("property1", "default value", "Description of property1")
            .build());
}
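A static block runs when the class is initialized, so registration happens as a side effect of loading the class by name. The sketch below reproduces that pattern with stand-in types: SketchRegistry, EchoInterpreter and load are hypothetical names that only mimic the shape of Interpreter.register, and nothing here depends on Zeppelin.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RegistrationDemo {
    private RegistrationDemo() {}

    // Simplified stand-in for the registry behind a register(name, className) call.
    static final class SketchRegistry {
        static final Map<String, String> REGISTERED = new ConcurrentHashMap<>();

        static void register(String name, String className) {
            REGISTERED.put(name, className);
        }
    }

    // Hypothetical interpreter class: it registers itself when initialized.
    static final class EchoInterpreter {
        static {
            SketchRegistry.register("echo", EchoInterpreter.class.getName());
        }
    }

    // Loading a class by name initializes it, which runs its static block.
    static boolean load(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A class literal alone does not initialize the class.
        String fqcn = EchoInterpreter.class.getName();
        System.out.println(SketchRegistry.REGISTERED.containsKey("echo")); // false
        load(fqcn);
        System.out.println(SketchRegistry.REGISTERED.containsKey("echo")); // true
    }
}
```

The design choice is that the interpreter class carries its own registration, so a list of fully qualified class names is enough to activate it.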

Page 19: Apache Zeppelin, the missing component for the Spark eco-system

To register your interpreter as default

•  Edit the enum ZeppelinConfiguration.ConfVars

•  Add your interpreter FQCN in the property ZEPPELIN_INTERPRETERS

Page 20: Apache Zeppelin, the missing component for the Spark eco-system

To register your interpreter in config files

•  Create conf/zeppelin-site.xml from conf/zeppelin-site.xml.template

•  Add your interpreter FQCN in the property zeppelin.interpreters

<property>
  <name>zeppelin.interpreters</name>
  <value>org.apache.zeppelin.spark.SparkInterpreter,org.apache.zeppelin.spark.PySparkInterpreter,
    org.apache.zeppelin.spark.SparkSqlInterpreter,org.apache.zeppelin.spark.DepInterpreter,
    org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,
    org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter
  </value>
</property>
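The property value is a comma-separated list of class names, and on the slide it wraps across lines. As a sketch of how such a value could be split into individual FQCNs, the hypothetical helper below (InterpreterList.parse is a made-up name) trims whitespace and drops empty entries. Whether Zeppelin itself tolerates whitespace between entries is not stated in the slides, so keeping the value free of stray spaces is the safer choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Zeppelin code: splits a zeppelin.interpreters-style value.
final class InterpreterList {
    private InterpreterList() {}

    static List<String> parse(String value) {
        List<String> names = new ArrayList<>();
        for (String part : value.split(",")) {
            String name = part.trim(); // removes spaces and line breaks around an entry
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String value = "org.apache.zeppelin.markdown.Markdown,org.apache.zeppelin.shell.ShellInterpreter,\n"
            + "    org.apache.zeppelin.hive.HiveInterpreter,com.me.MyNewInterpreter\n";
        for (String name : parse(value)) {
            System.out.println(name);
        }
    }
}
```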

Page 21: Apache Zeppelin, the missing component for the Spark eco-system

Simple AsciiDoc Interpreter

[Diagram. The Zeppelin Server (Zeppelin Engine) and the AsciiDoc Interpreter each run in a JVM. Steps ① to ④: a raw text block is passed to the interpreter, converted to HTML, and returned as HTML output.]

Page 22: Apache Zeppelin, the missing component for the Spark eco-system

Simple interpreter (AsciiDoc) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 23: Apache Zeppelin, the missing component for the Spark eco-system

Cassandra Interpreter Architecture

[Diagram. The Zeppelin Server and the Cassandra Interpreter each run in a JVM. Labels: Raw Text Block (① ②), Cassandra Java Driver, Async CQL statements, Cassandra, Display Results as HTML, ④ Render HTML.]

Page 24: Apache Zeppelin, the missing component for the Spark eco-system

Cassandra Interpreter Commands

•  Native CQL statements: SELECT * FROM …; INSERT INTO …; …

•  Schema commands: DESCRIBE TABLE …; DESCRIBE KEYSPACE …; …

•  Prepared statements commands: @prepare …; @bind …; @remove_prepared …;

•  Help command: HELP;

•  Options commands: @consistency …; @retryPolicy …; @fetchSize …;
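The command families listed above can be told apart by their leading token. The sketch below shows one way to do that; it is not the interpreter's actual parser. CommandKind, Kind and classify are hypothetical names, and the case-insensitive matching is an assumption.

```java
import java.util.Locale;

// Hypothetical classifier for the command families on the slide.
final class CommandKind {
    private CommandKind() {}

    enum Kind { PREPARED, OPTION, SCHEMA, HELP, NATIVE_CQL }

    static Kind classify(String statement) {
        // Assumption: commands are matched case-insensitively.
        String s = statement.trim().toLowerCase(Locale.ROOT);
        if (s.startsWith("@prepare") || s.startsWith("@bind") || s.startsWith("@remove_prepared")) {
            return Kind.PREPARED;
        }
        if (s.startsWith("@consistency") || s.startsWith("@retrypolicy") || s.startsWith("@fetchsize")) {
            return Kind.OPTION;
        }
        if (s.startsWith("describe ")) {
            return Kind.SCHEMA;
        }
        if (s.equals("help;") || s.equals("help")) {
            return Kind.HELP;
        }
        // Anything else is passed through as a native CQL statement.
        return Kind.NATIVE_CQL;
    }

    public static void main(String[] args) {
        // The "..." parts stand for the arguments elided on the slide.
        String[] samples = {
            "SELECT * FROM ...;", "DESCRIBE TABLE ...;", "@prepare ...;", "HELP;", "@fetchSize ...;"
        };
        for (String sample : samples) {
            System.out.println(classify(sample) + "  <-  " + sample);
        }
    }
}
```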

Page 25: Apache Zeppelin, the missing component for the Spark eco-system

Complex interpreter (Cassandra) https://github.com/doanduyhai/incubator-zeppelin/tree/ZeppelinPresentation

Page 26: Apache Zeppelin, the missing component for the Spark eco-system

Zeppelin future

•  Roadmap

Page 27: Apache Zeppelin, the missing component for the Spark eco-system

Roadmap & future

•  More graph options (Map viz, ZEPPELIN-157)

•  Helium project, packaging Zeppelin view, logic (code) & resource into Applications

•  Interpreter packaging re-design

    •  ship & compile core interpreters only

    •  third-party interpreters can be pulled from a repository

    •  which interpreters are core? Who will maintain them? The community…

•  Integrate security (Apache Shiro, ZEPPELIN-53)

Page 28: Apache Zeppelin, the missing component for the Spark eco-system

Roadmap & future

•  Leave incubation to become a first-class Apache project

Page 29: Apache Zeppelin, the missing component for the Spark eco-system

Q & A

Page 30: Apache Zeppelin, the missing component for the Spark eco-system

Thank You @doanduyhai

[email protected]

http://zeppelin.incubator.apache.org/