Spark Summit Europe: Building a REST Job Server for interactive Spark as a service
-
Author
gethue -
Category
Data & Analytics
-
BUILDING A REST JOB SERVER FOR INTERACTIVE SPARK AS A SERVICE
Romain Rigaux - Cloudera
Erick Tryzelaar - Cloudera
-
WHY?
-
WHY SPARK AS A SERVICE?
NOTEBOOKS
EASY ACCESS FROM ANYWHERE
SHARE SPARK CONTEXTS AND RDDs
BUILD APPS
SPARK MAGIC
-
WHY SPARK IN HUE?
MARRIED WITH FULL HADOOP ECOSYSTEM
-
HISTORY V1: OOZIE
THE GOOD
It works
Code snippet
THE BAD
Submit through Oozie
Shell action
Very slow
Batch
[Diagram: workflow.xml, snippet.py, stdout]
-
HISTORY V2: SPARK IGNITER
THE GOOD
It works better
THE BAD
Compiler Jar
Batch only, no shell
No Python, R
Security
Single point of failure
[Diagram: Compile, Implement, Upload, json output, Batch, Scala, jar, Ooyala]
-
HISTORY V3: NOTEBOOK
THE GOOD
Like spark-submit / spark shells
Scala / Python / R shells
Jar / Python batch jobs
Notebook UI
YARN
THE BAD
Beta?
[Diagram: Livy, code snippet, batch]
-
GENERAL ARCHITECTURE
[Diagram: API, Livy, three Spark instances, YARN]
-
LIVY SPARK SERVER
-
LIVY SPARK SERVER
REST web server in Scala for Spark submissions
Interactive shell sessions or batch jobs
Backends: Scala, Java, Python, R
No dependency on Hue
Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
Read about it: http://gethue.com/spark/
-
ARCHITECTURE
Standard web service: wrapper around spark-submit / Spark shells
YARN mode, Spark drivers run inside the cluster (supports crashes)
No need to inherit any interface or compile code
Extended to work with additional backends
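To make the "wrapper around spark-submit" idea concrete, here is a minimal sketch of how a server could assemble the command line for a batch job in YARN mode. The function name, its parameters, and the example file names are assumptions for illustration; only the `spark-submit` flags `--master`, `--deploy-mode`, and `--class` are real. This is not Livy's actual code.

```python
def build_spark_submit(app_file, class_name=None, app_args=(),
                       master="yarn", deploy_mode="cluster"):
    """Return the argv list for launching app_file through spark-submit.

    Hypothetical helper: a sketch of wrapping spark-submit, not Livy's code.
    """
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode]
    if class_name:
        # --class is only needed for JVM applications packaged as a jar.
        cmd += ["--class", class_name]
    cmd.append(app_file)
    cmd.extend(app_args)
    return cmd


# Example: a Python batch job and a jar with a main class (made-up names).
print(build_spark_submit("job.py"))
print(build_spark_submit("wordcount.jar", "com.example.WordCount", ["in.txt"]))
```

Because the wrapper only builds a process command line, user code does not have to implement any server-side interface, which matches the "no need to inherit any interface" point above.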
-
LIVY WEB SERVER ARCHITECTURE
LOCAL DEV MODE
YARN MODE
-
LOCAL MODE
[Diagram sequence, steps 1 to 5. Components: Spark Client; Livy Server (Scalatra, Session Manager, Session); Spark Client; Spark Context; Spark Interpreter]
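The Session Manager box in the diagram can be pictured as a small registry that hands out session ids and tracks state. The sketch below is an assumption about its role, written in Python rather than the server's Scala; the class and method names are hypothetical, and only the `id`, `kind`, `log`, and `state` fields mirror the JSON shown elsewhere in this deck.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Session:
    id: int
    kind: str
    state: str = "idle"
    log: list = field(default_factory=list)


class SessionManager:
    """Hypothetical in-memory registry of sessions, keyed by id."""

    def __init__(self):
        self._ids = itertools.count()  # ids start at 0
        self._sessions = {}

    def create(self, kind):
        session = Session(id=next(self._ids), kind=kind)
        self._sessions[session.id] = session
        return session

    def get(self, session_id):
        return self._sessions.get(session_id)

    def delete(self, session_id):
        # True when a session was actually removed.
        return self._sessions.pop(session_id, None) is not None
```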
-
YARN-CLUSTER MODE
PRODUCTION, SCALABLE
-
YARN-CLUSTER MODE
[Diagram sequence, steps 1 to 7. Components: Spark Client; Livy Server (Scalatra, Session Manager, Session); YARN Master; YARN Node with Spark Context; two YARN Nodes with Spark Workers; Spark Interpreter]
-
SESSION CREATION AND EXECUTION
% curl -XPOST localhost:8998/sessions \
    -d '{"kind": "spark"}'
{"id": 0, "kind": "spark", "log": [...], "state": "idle"}
% curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
{"id": 0, "output": {"data": {"text/plain": "res0: Int = 2"}, "execution_count": 0, "status": "ok"}, "state": "available"}
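The same two calls can be sketched from Python with only the standard library. The helper names below are hypothetical, and the `Content-Type: application/json` header is an assumption (the curl commands on the slide do not set it). The request builders are kept separate from `send` so the requests can be inspected without a running server.

```python
import json
from urllib import request

LIVY_URL = "http://localhost:8998"  # host and port as shown on the slide


def build_post(path, payload, base_url=LIVY_URL):
    """Build (but do not send) a JSON POST request for the given path."""
    return request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def create_session_request(kind="spark"):
    return build_post("/sessions", {"kind": kind})


def statement_request(session_id, code):
    return build_post(f"/sessions/{session_id}/statements", {"code": code})


def send(req):
    """Send a prepared request and decode the JSON reply (needs a live server)."""
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server listening on localhost:8998, `send(create_session_request())` followed by `send(statement_request(0, "1+1"))` would correspond to the two curl commands above.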
-
BATCH OR INTERACTIVE
/batches: Jar, Py
/sessions: Scala, Python, R
[Diagram: Livy, three Spark instances, YARN]
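One way to read this slide is as a routing rule: compiled jars and Python files go to the batch endpoint, language shells go to the session endpoint. The sketch below encodes that reading. The function name and the lowercase labels are assumptions taken from the slide's labels, not identifiers from the server (the session example elsewhere in the deck uses the kind "spark").

```python
INTERACTIVE = {"scala", "python", "r"}  # labels next to /sessions on the slide
BATCH = {"jar", "py"}                   # labels next to /batches on the slide


def endpoint_for(label):
    """Return the REST endpoint for a slide label (hypothetical helper)."""
    key = label.lower()
    if key in INTERACTIVE:
        return "/sessions"
    if key in BATCH:
        return "/batches"
    raise ValueError(f"unknown submission type: {label!r}")
```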
-
SHELL OR BATCH?
[Diagram: Spark Client; Livy Server (Scalatra, Session Manager, Session); YARN Master; YARN Node with Spark Interpreter and Spark Context; two YARN Nodes with Spark Workers]
-
SHELL
[Diagram: same layout, with pyspark next to the Spark Context]
-
BATCH
[Diagram: same layout, with spark-submit next to the Spark Context]
-
LIVY INTERPRETERS
Scala, Python, R
-
REMEMBER?
[Diagram: Spark Client; Livy Server (Scalatra, Session Manager, Session); YARN Master; YARN Node with Spark Context; two YARN Nodes with Spark Workers; Spark Interpreter]
-
INTERPRETERS
Pipe stdin/stdout to a running shell
Execute the code / send to Spark workers
Perform magic operations
One interpreter per language
Swappable with other kernels (python, spark..)
[Diagram: Interpreter]
> println(1+1)
2
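"Pipe stdin/stdout to a running shell" can be sketched with a child process. In the example below a tiny Python loop stands in for the real language shell (an assumption made so the sketch runs anywhere); the class name and its methods are hypothetical, and each input is limited to a single-line expression.

```python
import subprocess
import sys

# Stand-in for a language shell: evaluates one expression per input line.
CHILD = (
    "import sys\n"
    "for line in iter(sys.stdin.readline, ''):\n"
    "    print(eval(line), flush=True)\n"
)


class PipedInterpreter:
    """Hypothetical wrapper that talks to a running shell over pipes."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def execute(self, code):
        # Write one line of code, then read the single line of output.
        self.proc.stdin.write(code + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()
```

The shell process stays alive between calls, which is what lets an interactive session keep its state from one statement to the next.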
-
INTERPRETER FLOW
[Diagram sequence: the user enters "> 1+1"; the Livy Server receives {code: 1+1}; the Interpreter receives 1+1, with a Magic step attached; the Interpreter returns 2; the Livy Server replies {data: {application/json: 2}}; the user sees 2]
-
INTERPRETER FLOW CHART
Receive lines
Split into chunks
Decision: Chunks left? (Yes / No)
Decision: Magic chunk? (Yes / No)
Magic!
Execute chunk
Send error to server
Success
Send output to server
Example of parsing:
https://github.com/cloudera/hue/blob/577a0b6ed8ac845d3f3baa609f640d5937207194/apps/spark/java/livy-repl/src/test/scala/com/cloudera/hue/livy/repl/PythonInterpreterSpec.scala#L71
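The flow chart can be approximated by a short loop. The sketch below is an interpretation, not the project's parser: it assumes a magic chunk is any line starting with `%`, that the remaining consecutive lines form code chunks, and that the first failing chunk ends the run with an error. All names (`split_into_chunks`, `run`, `MAGICS`) are hypothetical.

```python
def split_into_chunks(source):
    """Split source into ('code', text) and ('magic', (name, arg)) chunks."""
    chunks, code = [], []
    for line in source.splitlines():
        if line.startswith("%"):
            if code:
                chunks.append(("code", "\n".join(code)))
                code = []
            name, _, arg = line[1:].partition(" ")
            chunks.append(("magic", (name, arg.strip())))
        else:
            code.append(line)
    if code:
        chunks.append(("code", "\n".join(code)))
    return chunks


# Assumed magic table: %json returns the named variable as JSON data.
MAGICS = {"json": lambda namespace, arg: {"application/json": namespace[arg]}}


def run(source, namespace, magics=MAGICS):
    """Execute chunks in order; report the first error or the last magic output."""
    output = None
    for kind, payload in split_into_chunks(source):
        try:
            if kind == "magic":
                name, arg = payload
                output = magics[name](namespace, arg)
            else:
                exec(payload, namespace)
        except Exception as exc:  # send error to server
            return {"status": "error", "evalue": repr(exc)}
    return {"status": "ok", "data": output}  # send output to server
```

For example, `run("x = 1 + 1\n%json x", {})` executes the code chunk first and then lets the magic chunk pick up the value of `x`.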
-
INTERPRETER MAGIC
table, json, plotting, ...
-
NO MAGIC
> 1+1
Interpreter: sparkIMain.interpret(1+1)
{"id": 0, "output": {"application/json": 2}}
-
JSON MAGIC
val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).sortBy(-_._2).map { case (w, c) => Map("word" -> w, "count" -> c) }
%json counts
> counts
[('', 506610), ('the', 23407), ('I', 19540) ...]
Interpreter: sparkIMain.valueOfTerm(counts).toJson()
{"id": 0, "output": {"application/json": [{"count": 506610, "word": ""}, {"count": 23407, "word": "the"}, {"count": 19540, "word": "I"}, ...] ...}
-
TABLE MAGIC
val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).sortBy(-_._2).map { case (w, c) => Map("word" -> w, "count" -> c) }
%table counts
> counts
[('', 506610), ('the', 23407), ('I', 19540) ...]
Interpreter: sparkIMain.valueOfTerm(counts).guessHeaders().toList()
"application/vnd.livy.table.v1+json": {"headers": [{"name": "count", "type": "BIGINT_TYPE"}, {"name": "name", "type": "STRING_TYPE"}], "data": [[23407, "the"], [19540, "I"], [18358, "and"], ...]}
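The "guess headers" step of the table magic can be sketched as type introspection on the first row. The version below is a Python illustration under stated assumptions: rows are dictionaries, only `int` and `str` values are mapped (these are the two type names visible on the slide), and columns are emitted in sorted key order. The function names are hypothetical.

```python
TABLE_MIME = "application/vnd.livy.table.v1+json"

# Only the two type names that appear on the slide are mapped here.
TYPE_NAMES = {int: "BIGINT_TYPE", str: "STRING_TYPE"}


def guess_headers(rows):
    """Infer column names and types from the first row (a dict)."""
    if not rows:
        return []
    first = rows[0]
    return [
        {"name": name, "type": TYPE_NAMES[type(first[name])]}
        for name in sorted(first)
    ]


def table_magic(rows):
    """Turn a list of dict rows into a headers-plus-data table payload."""
    headers = guess_headers(rows)
    names = [h["name"] for h in headers]
    data = [[row[name] for name in names] for row in rows]
    return {TABLE_MIME: {"headers": headers, "data": data}}
```

A front end can then render the `headers` and `data` lists as a grid without knowing anything about the language that produced them.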
-
PLOT MAGIC
... barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))
Interpreter: sparkIMain.interpret(png(/tmp/plot.png) barplot dev.off())
> png(/tmp/..)
> barplot
> dev.off()
File(/tmp/plot.png).read().toBase64()
{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe" ...} ...}
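The last step of the plot magic, reading the rendered file and returning it as base64 under an `image/png` key, can be sketched as follows. The function name is hypothetical; the output shape follows the JSON on the slide.

```python
import base64


def png_to_output(path):
    """Read a PNG file and wrap its base64 text in an image/png payload."""
    with open(path, "rb") as fh:
        encoded = base64.b64encode(fh.read()).decode("ascii")
    return {"data": {"image/png": encoded}}
```

Every PNG file starts with the same 8-byte signature, which is why the encoded string on the slide begins with `iVBORw0KGgo`.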
-
PLUGGABLE INTERPRETERS
Pluggable backends
Livy's Spark backends: Scala, pyspark, R
IPython / Jupyter support coming soon
-
JUPYTER BACKEND
Re-using it
Generic framework for interpreters
51 kernels
-
SPARK AS A SERVICE
-
REMEMBER AGAIN?
[Diagram: Spark Client; Livy Server (Scalatra, Session Manager, Session); YARN Master; YARN Node with Spark Context; two YARN Nodes with Spark Workers; Spark Interpreter]
-
MULTI USERS
[Diagram: three Spark Clients; Livy Server (Scalatra, Session Manager, Session); three YARN Nodes, each with its own Spark Context and Spark Interpreter]
-
SHARED CONTEXTS?
[Diagram: three Spark Clients; Livy Server (Scalatra, Session Manager, Session); one YARN Node with a single Spark Context and Spark Interpreter]
-
SHARED RDD?
[Diagram: same layout, with one RDD]
-
SHARED RDDS?
[Diagram: same layout, with three RDDs]
-
SECURE IT?
[Diagram: three Spark Clients; Livy Server (Scalatra, Session Manager, Session); one YARN Node with a single Spark Context and Spark Interpreter; three RDDs]
-
SPARK AS SERVICE
[Diagram: three Spark Clients; Livy Server; Spark]
-
SHARING RDDS
-
[Diagram sequence: a PySpark shell holding an RDD, next to a Shell and a Python Shell]
PySpark shell:
r = sc.parallelize([])
srdd = ShareableRdd(r)
RDD contents:
{'ak': 'Alaska'}
{'ca': 'California'}
Shell:
curl -XPOST /sessions/0/statement {'code': srdd.get('ak')}
Python Shell:
states = SharedRdd('host/sessions/0', 'srdd')
states.get('ak')
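The client side of this idea can be sketched as a thin proxy that turns `states.get('ak')` into a statement posted to the session that owns the RDD. This is an illustration, not the demo's implementation: the injectable `post` callable is hypothetical, and the path `/statements` follows the session example earlier in the deck (this slide writes `/statement`).

```python
import json
from urllib import request


def http_post_json(url, payload):
    """Default transport: POST JSON and decode the JSON reply."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


class SharedRdd:
    """Hypothetical proxy for a named RDD living in a remote session."""

    def __init__(self, session_url, name, post=http_post_json):
        self.session_url = session_url.rstrip("/")
        self.name = name
        self.post = post

    def get(self, key):
        # Build the code the remote PySpark shell should run, e.g. srdd.get('ak').
        code = f"{self.name}.get({key!r})"
        return self.post(f"{self.session_url}/statements", {"code": code})
```

Any shell or application that can make an HTTP call can use such a proxy, without holding a Spark Context of its own.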
-
DEMO TIME
https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd
-
SECURITY
SSL support
Persistent sessions
Kerberos
-
SPARK MAGIC
From Microsoft
Python magics for working with remote Spark clusters
Open source: https://github.com/jupyter-incubator/sparkmagic
-
FUTURE
Move to ext repo?
Security
iPython / Jupyter backends and file format
Shared named RDD / contexts?
Share data
Spark specific, language generic, both?
Leverage Hue 4
https://issues.cloudera.org/browse/HUE-2990
-
LIVY'S CHEAT SHEET
Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
Read about it: http://gethue.com/spark/
Scala, Java, Python, R
Type introspection for visualization
YARN-cluster or local modes
Code snippets / compiled
REST API
Pluggable backends
Magic keywords
Failure resilient
Security
-
THANK YOU!
TWITTER
@gethue (http://twitter.com/gethue)
USER GROUP
http://groups.google.com/a/cloudera.org/group/hue-user
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com