Zeppelin meetup 2016 madrid

Advanced features of Apache Zeppelin http://zeppelin.apache.org

Transcript of Zeppelin meetup 2016 madrid

Page 1: Zeppelin meetup 2016 madrid

Advanced features of Apache Zeppelin http://zeppelin.apache.org

Page 2: Zeppelin meetup 2016 madrid

Jongyoul Lee

PMC member of Apache Zeppelin since Sep. 2015.

Software Development Engineer at NFLabs

Page 3: Zeppelin meetup 2016 madrid

Advanced?

• Helium
  • A new extension for visualization
• Multi-user features
  • Users & Permissions
  • Per user/Per note & Shared/Scoped/Isolated
• Futures
  • Impersonation & Personalized mode
  • Scalability & Reliability

Page 4: Zeppelin meetup 2016 madrid

Helium

Page 5: Zeppelin meetup 2016 madrid

Visualizations: 6 built-in visualizations come with pivot

• Table
• Bar
• Pie
• Area
• Line
• Scatter

Free to draw any customized visualization inside a notebook

Page 6: Zeppelin meetup 2016 madrid

Helium

Interpreter: Spark, Flink, Geode, JDBC, …
Notebook Storage: File System, Amazon S3, Git, …
Application: Visualizations, Map, WordCloud, Analytics, …
Resource Pool: SparkContext, Flink Environment, JDBC connection, User object, …

Extend pluggable visualization to pluggable analytics applications

Work in progress to make visualization pluggable

Page 7: Zeppelin meetup 2016 madrid

Users and Permissions

Page 8: Zeppelin meetup 2016 madrid

Zeppelin & Enterprise

• Company complaints
  • Why security works …
  • Why authentication works …
  • Why Zeppelin stores my password as plain …
  • Why two users use the same Spark …
  • Why I wait while others run something

Page 9: Zeppelin meetup 2016 madrid

Authentication: integrated with Apache Shiro

Contributions

- PAM
- ActiveDirectory
- Jdbc
- Jndi
- Ldap
- Properties

Page 10: Zeppelin meetup 2016 madrid

Notebook Authorization: Owners, Writers, Readers per Note
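The per-note roles named on this slide can be illustrated with a minimal sketch. It is not Zeppelin's actual implementation: the class `NotePermissions`, its method names, and the assumption that roles nest (owners can do everything writers can, writers everything readers can) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NotePermissions:
    """Hypothetical per-note permission record (not Zeppelin's real class)."""
    owners: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    readers: set = field(default_factory=set)

    def can_read(self, user: str) -> bool:
        # Assumption: owners and writers may also read the note.
        return user in (self.owners | self.writers | self.readers)

    def can_write(self, user: str) -> bool:
        # Assumption: owners may also edit the note.
        return user in (self.owners | self.writers)

    def can_change_permissions(self, user: str) -> bool:
        # Assumption: only owners may change who holds which role.
        return user in self.owners


# Example note with one user per role (user names are made up).
note_a = NotePermissions(owners={"user1"}, writers={"user2"}, readers={"user3"})

for user in ("user1", "user2", "user3", "user4"):
    print(user, note_a.can_read(user), note_a.can_write(user),
          note_a.can_change_permissions(user))
```

Keeping the three roles as separate sets per note mirrors the "per Note" wording of the slide: each note carries its own record, so granting access to one note says nothing about any other.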

Page 11: Zeppelin meetup 2016 madrid

Multi-tenancy

Per user/Per note & Shared/Scoped/Isolated

Page 12: Zeppelin meetup 2016 madrid

Multi-tenancy

           SHARED   ISOLATED   SCOPED
PROCESS    1        N          1
THREADS    1        1          N
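The table can be restated as a small lookup, which may make the trade-off easier to compare for a concrete number of tenants. This is only a sketch: `Mode` and `footprint` are hypothetical names, and reading the THREADS row as "threads per interpreter process" is an assumption about what the slide means.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "shared"
    ISOLATED = "isolated"
    SCOPED = "scoped"


def footprint(mode: Mode, n: int) -> tuple:
    """Return (processes, threads per process) for n tenants, per the table."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if mode is Mode.SHARED:
        return (1, 1)   # one process, one thread, shared by everyone
    if mode is Mode.ISOLATED:
        return (n, 1)   # one process per tenant
    if mode is Mode.SCOPED:
        return (1, n)   # one process, one thread per tenant
    raise ValueError(f"unknown mode: {mode}")


# Print the table for an example of 3 tenants.
for mode in Mode:
    processes, threads = footprint(mode, 3)
    print(f"{mode.name:<9} processes={processes} threads={threads}")
```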

Page 13: Zeppelin meetup 2016 madrid

Shared

ZeppelinServer
User1: Run P1 on NoteA -> Run SparkInterpreter for P1
User2: Run P2 on NoteB -> Run SparkInterpreter for P2
SparkInterpreter

Page 14: Zeppelin meetup 2016 madrid

Shared

• Originally implemented
• Pros
  • Simple structure
  • Predictable behavior
• Cons
  • All resources shared
  • Interference among users

Page 15: Zeppelin meetup 2016 madrid

Isolated

ZeppelinServer
User1: Run P1 on NoteA -> Run SparkInterpreter for P1 -> SparkInterpreter
User2: Run P2 on NoteB -> Run SparkInterpreter for P2 -> SparkInterpreter

Page 16: Zeppelin meetup 2016 madrid

Isolated

• Pros
  • No pending
  • No resources shared
• Cons
  • Lots of memory
  • Inefficiency of using memory
  • Limited by resources

Page 17: Zeppelin meetup 2016 madrid

Scoped

ZeppelinServer
JDBCInterpreter
User1: Run P2 on NoteA -> Run SparkInterpreter for P2 -> JDBCInstance User1
User2: Run P3 on NoteB -> Run SparkInterpreter for P3 -> JDBCInstance User2

Page 18: Zeppelin meetup 2016 madrid

Scoped

• Pros
  • Less memory
  • Some resources isolated
• Cons
  • Some resources shared
  • Big single process
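One way to picture how the three modes differ is to derive two keys for each paragraph run: one selecting the interpreter process and one selecting the interpreter instance inside it. The sketch below is an illustration under assumptions, not Zeppelin's code: the function `routing_keys`, the key strings, and the `per` parameter (standing for the "Per user/Per note" choice) are all hypothetical.

```python
def routing_keys(mode: str, per: str, user: str, note: str) -> tuple:
    """Return a hypothetical (process_key, instance_key) pair for one run."""
    if per not in ("user", "note"):
        raise ValueError(f"unknown binding: {per}")
    # The tenant is whatever the mode is applied to: a user or a note.
    tenant = user if per == "user" else note

    if mode == "shared":
        # Everyone lands on the same process and the same instance.
        return ("shared_process", "shared_instance")
    if mode == "scoped":
        # Same process for everyone, but a separate instance per tenant.
        return ("shared_process", tenant)
    if mode == "isolated":
        # A separate process per tenant.
        return (tenant, "shared_instance")
    raise ValueError(f"unknown mode: {mode}")


# Two users running paragraphs on different notes, bound per user.
for mode in ("shared", "scoped", "isolated"):
    k1 = routing_keys(mode, "user", "User1", "NoteA")
    k2 = routing_keys(mode, "user", "User2", "NoteB")
    print(mode, k1, k2)
```

Under this reading, scoped mode keeps the single "big process" listed among the cons, while giving each tenant its own instance, which is where the partial isolation comes from.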

Page 19: Zeppelin meetup 2016 madrid

Multi-tenancy

           SHARED   ISOLATED   SCOPED
PROCESS    1        N          1
THREADS    1        1          N

Page 20: Zeppelin meetup 2016 madrid

Zeppelin & Futures

• ~ 0.7.0
  • Impersonation of JDBC/Spark Interpreter
  • Personalized mode
• 0.7.0 ~
  • Scalability & Reliability
  • …

Page 21: Zeppelin meetup 2016 madrid

Thank you

Jongyoul Lee

@madeng