Spark Summit Europe: Building a REST Job Server for interactive Spark as a service

BUILDING A REST JOB SERVER FOR INTERACTIVE SPARK AS A SERVICE
Romain Rigaux (Cloudera), Erick Tryzelaar (Cloudera)

Embed Size (px)

Transcript of Spark Summit Europe: Building a REST Job Server for interactive Spark as a service

  • WHY?

  • WHY SPARK AS A SERVICE?

    NOTEBOOKS

    EASY ACCESS FROM ANYWHERE

    SHARE SPARK CONTEXTS AND RDDs

    BUILD APPS

    SPARK MAGIC

  • WHY SPARK IN HUE?

    MARRIED WITH THE FULL HADOOP ECOSYSTEM

  • HISTORY V1: OOZIE

    THE GOOD: It works; code snippet

    THE BAD: Submit through Oozie; shell action; very slow; batch

    (Diagram: workflow.xml, snippet.py, stdout)

  • HISTORY V2: SPARK IGNITER

    THE GOOD: It works better

    THE BAD: Compile a jar; batch only, no shell; no Python, R; security; single point of failure

    (Diagram: implement in Scala, compile to a jar, upload, run as a batch, JSON output; Ooyala)

  • HISTORY V3: NOTEBOOK

    THE GOOD: Like spark-submit / Spark shells; Scala, Python and R shells; jar and Python batch jobs; notebook UI; YARN

    THE BAD: Beta?

    (Diagram: Livy; code snippet, batch)

  • GENERAL ARCHITECTURE

    (Diagram: clients call the Livy API; Livy runs several Spark applications on YARN.)

  • LIVY SPARK SERVER

    REST web server in Scala for Spark submissions

    Interactive shell sessions or batch jobs

    Backends: Scala, Java, Python, R

    No dependency on Hue

    Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java

    Read about it: http://gethue.com/spark/

  • ARCHITECTURE

    Standard web service: wrapper around spark-submit / Spark shells

    YARN mode: Spark drivers run inside the cluster (tolerates crashes)

    No need to inherit any interface or compile code

    Extended to work with additional backends

  • LIVY WEB SERVER ARCHITECTURE

    LOCAL DEV MODE / YARN MODE

  • LOCAL MODE

    (Animated diagram, steps 1 to 5: a Spark client talks to the Livy server (Scalatra, SessionManager, Session), which drives a SparkInterpreter and its SparkContext running locally.)
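The local-mode diagram can be mirrored by a toy, in-process model of the server-side chain. Only the names `SessionManager` and `Session` come from the slide; the sequential ids, the `create`/`get`/`execute` methods, the idle/busy state switch and the use of Python's `eval` as a stand-in for the SparkInterpreter are assumptions for illustration, not Livy's implementation.

```python
import itertools


class Session:
    """One interactive session: an id, a kind and a stand-in interpreter."""

    def __init__(self, session_id, kind):
        self.id = session_id
        self.kind = kind
        self.state = "idle"       # state name as in the REST example
        self._globals = {}        # namespace the stand-in interpreter evaluates in

    def execute(self, code):
        """Evaluate one expression; eval() stands in for the SparkInterpreter."""
        self.state = "busy"
        try:
            result = eval(code, self._globals)
        finally:
            self.state = "idle"
        return {"text/plain": repr(result)}


class SessionManager:
    """Hands out sequential session ids and looks sessions up by id."""

    def __init__(self):
        self._ids = itertools.count()
        self._sessions = {}

    def create(self, kind):
        session = Session(next(self._ids), kind)
        self._sessions[session.id] = session
        return session

    def get(self, session_id):
        return self._sessions[session_id]


manager = SessionManager()
session = manager.create("spark")
print(session.id, session.state)         # 0 idle
print(manager.get(0).execute("1+1"))     # {'text/plain': '2'}
```

Keeping sessions in a dictionary keyed by id is what lets later REST calls such as `/sessions/0/statements` find the right interpreter.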

  • YARN-CLUSTER MODE

    PRODUCTION, SCALABLE

    (Animated diagram, steps 1 to 7: a Spark client talks to the Livy server (Scalatra, SessionManager, Session); through the YARN master, the SparkInterpreter and its SparkContext are started on a YARN node, and Spark workers run on other YARN nodes.)
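In YARN-cluster mode the SparkInterpreter and SparkContext have to be scheduled on a YARN node before the session can run code, so a client typically waits for the session to be ready. A minimal polling sketch, assuming the session JSON carries a `state` field as in the session-creation example; `fetch_state` is a hypothetical callable (for instance one that GETs the session and returns its `state`), and the timeout values and the treatment of `error` and `dead` as terminal states are assumptions.

```python
import time


class SessionFailed(RuntimeError):
    """Raised when the session reaches a state it cannot leave."""


def wait_until_idle(fetch_state, timeout=300.0, interval=2.0,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_state() until it returns "idle".

    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        state = fetch_state()
        if state == "idle":
            return state
        if state in ("error", "dead"):                 # assumed terminal states
            raise SessionFailed(f"session ended in state {state!r}")
        if clock() >= deadline:
            raise TimeoutError(f"session still {state!r} after {timeout}s")
        sleep(interval)


# Simulated server: two polls happen before YARN has started the driver.
states = iter(["starting", "starting", "idle"])
print(wait_until_idle(lambda: next(states), sleep=lambda seconds: None))  # idle
```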

  • SESSION CREATION AND EXECUTION

    % curl -X POST localhost:8998/sessions \
        -d '{"kind": "spark"}'
    {"id":0,"kind":"spark","log":[...],"state":"idle"}

    % curl -X POST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
    {"id":0,"output":{"data":{"text/plain":"res0: Int = 2"},"execution_count":0,"status":"ok"},"state":"available"}
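The two curl calls can be reproduced with Python's standard library. This sketch builds (but does not send) the same requests; the base URL and paths come from the example, while the `livy_post` helper name and the explicit `Content-Type` header are assumptions added for illustration.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"   # host and port from the curl example


def livy_post(path, payload):
    """Build a POST request with a JSON body for the given Livy path."""
    return urllib.request.Request(
        LIVY_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed, not on the slide
        method="POST",
    )


create_session = livy_post("/sessions", {"kind": "spark"})
run_statement = livy_post("/sessions/0/statements", {"code": "1+1"})

# Sending one requires a running server:
# with urllib.request.urlopen(create_session) as resp:
#     print(json.load(resp))
```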

  • BATCH OR INTERACTIVE

    (Diagram: Jar and Py files are submitted to /batches; Scala, Python and R code goes to /sessions; Livy runs both as Spark applications on YARN.)
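The split on this slide can be read as a routing rule: packaged code goes to `/batches`, snippets go to an interactive session. The helper below is hypothetical; its name, the `session_id` default and the decision by file extension are illustrative choices, not part of Livy.

```python
def livy_endpoint(submission, session_id=0):
    """Return the Livy path a submission would go to, per the slide's split.

    Packaged jobs (jar or .py files) are batch submissions; anything else is
    treated as a code snippet for an existing interactive session.
    """
    if submission.endswith((".jar", ".py")):
        return "/batches"
    return f"/sessions/{session_id}/statements"


print(livy_endpoint("wordcount.jar"))   # /batches
print(livy_endpoint("1+1"))             # /sessions/0/statements
```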

  • SHELL OR BATCH?

    (Diagram: the YARN-cluster layout, with the SparkInterpreter running next to the SparkContext on a YARN node.)

  • SHELL

    (Diagram: same layout, with a pyspark shell in place of the SparkInterpreter.)

  • BATCH

    (Diagram: same layout, with spark-submit in place of the SparkInterpreter.)

  • LIVY INTERPRETERS

    Scala, Python, R

  • REMEMBER?

    (Diagram: the YARN-cluster layout: Spark client, Livy server, YARN master, SparkInterpreter and SparkContext on one YARN node, Spark workers on the others.)

  • INTERPRETERS

    Pipe stdin/stdout to a running shell

    Execute the code / send it to the Spark workers

    Perform magic operations

    One interpreter per language

    Swappable with other kernels (Python, Spark, ...)

    (Diagram: "println(1+1)" goes into the interpreter, which shows "> println(1+1)" and returns "2".)
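The "pipe stdin/stdout to a running shell" idea can be sketched with Python's `subprocess` module: one long-lived child process receives statements on stdin and answers on stdout, so variables survive between statements as they would in a session. This is an illustration under simplifying assumptions (single-line statements, a hypothetical `__DONE__` sentinel line marking the end of each output, no stderr handling), not the protocol Livy's interpreters use.

```python
import subprocess
import sys

# Program run by the child: execute each line read from stdin, then print a
# sentinel so the parent knows the statement's output is complete.
CHILD_SHELL = (
    "import sys\n"
    "while True:\n"
    "    line = sys.stdin.readline()\n"
    "    if not line:\n"
    "        break\n"
    "    exec(line, globals())\n"
    "    print('__DONE__', flush=True)\n"
)
SENTINEL = "__DONE__"


class PipedShell:
    """Keeps one child shell alive and talks to it over stdin/stdout pipes."""

    def __init__(self):
        # -u: unbuffered child stdout, so printed lines reach us immediately.
        self.proc = subprocess.Popen(
            [sys.executable, "-u", "-c", CHILD_SHELL],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )

    def execute(self, line):
        """Send one single-line statement and return the lines it printed."""
        self.proc.stdin.write(line + "\n")
        self.proc.stdin.flush()
        output = []
        while True:
            out = self.proc.stdout.readline()
            if not out:          # child exited, e.g. after an uncaught error
                raise RuntimeError("shell exited unexpectedly")
            out = out.rstrip("\n")
            if out == SENTINEL:
                return output
            output.append(out)

    def close(self):
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


shell = PipedShell()
print(shell.execute("x = 40"))          # []
print(shell.execute("print(x + 2)"))    # ['42']  (x survived between statements)
shell.close()
```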

  • INTERPRETER FLOW

    (Animated diagram: the Livy server receives {code: 1+1} and passes 1+1 to the interpreter; the interpreter evaluates "> 1+1" to 2, a magic step wraps the result, and the Livy server returns {data: {application/json: 2}}.)

  • INTERPRETER FLOW CHART

    Receive lines, split into chunks

    Chunks left? No: send output to server

    Yes: magic chunk? Yes: run the magic. No: execute the chunk

    Success: back to "Chunks left?"; error: send error to server

    Example of parsing: https://github.com/cloudera/hue/blob/577a0b6ed8ac845d3f3baa609f640d5937207194/apps/spark/java/livy-repl/src/test/scala/com/cloudera/hue/livy/repl/PythonInterpreterSpec.scala#L71
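The flow chart translates into a short loop. The sketch below is a Python approximation, not the Livy code linked above: chunking is naive (one non-blank line per chunk, so multi-line statements are not supported), magics are looked up in a caller-supplied dictionary, and the `run_cell` name and the `status`/`ename`/`evalue` keys of the result are assumptions loosely modeled on the JSON in the session example.

```python
import contextlib
import io
import json


def run_cell(code, magics, env):
    """Toy version of the interpreter flow chart.

    A chunk starting with "%" is a magic, looked up by name in `magics`; any
    other chunk is exec()'d in `env` with its printed output captured.
    The first error stops the loop.
    """
    chunks = [line for line in code.splitlines() if line.strip()]
    output = io.StringIO()
    for chunk in chunks:                                   # "Chunks left?"
        try:
            if chunk.startswith("%"):                      # "Magic chunk?" yes
                name, _, arg = chunk[1:].partition(" ")
                output.write(magics[name](env, arg.strip()))
            else:                                          # no: "Execute chunk"
                with contextlib.redirect_stdout(output):
                    exec(chunk, env)
        except Exception as exc:                           # "Send error to server"
            return {"status": "error", "ename": type(exc).__name__,
                    "evalue": str(exc)}
    # No chunks left: "Send output to server"
    return {"status": "ok", "data": {"text/plain": output.getvalue()}}


# A %json magic that serializes a variable from the session namespace.
magics = {"json": lambda env, name: json.dumps(env[name])}
env = {}
print(run_cell("counts = [1, 2]\n%json counts", magics, env))
# {'status': 'ok', 'data': {'text/plain': '[1, 2]'}}
print(run_cell("1/0", magics, env)["ename"])             # ZeroDivisionError
```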

  • INTERPRETER MAGIC

    table, json, plotting, ...

  • NO MAGIC

    > 1+1

    Interpreter: sparkIMain.interpret("1+1")

    {"id":0,"output":{"application/json":2}}

  • JSON MAGIC

    val lines = sc.textFile("shakespeare.txt")
    val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).sortBy(-_._2).map { case (w, c) => Map("word" -> w, "count" -> c) }

    %json counts

    > counts: [('',506610),('the',23407),('I',19540)...]

    Interpreter: sparkIMain.valueOfTerm("counts").toJson()

    {"id":0,"output":{"application/json":[{"count":506610,"word":""},{"count":23407,"word":"the"},{"count":19540,"word":"I"},...]...}
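For readers without a cluster, the word count behind the `%json counts` example can be reproduced in plain Python. This is a local re-implementation, not Spark: `collections.Counter` plays the role of `reduceByKey`, a sort by descending count mirrors `sortBy(-_._2)`, and `json.dumps` stands in for the JSON magic. As in the Scala code, splitting on a single space keeps empty strings, which is why `''` tops the counts on the slide.

```python
import json
from collections import Counter


def word_counts(lines):
    """Local equivalent of the slide's Spark word count, most frequent first."""
    # flatMap(line => line.split(" ")) + map((word, 1)) + reduceByKey(_ + _)
    counter = Counter(word for line in lines for word in line.split(" "))
    # sortBy(-_._2): Python's sorted() is stable, so ties keep first-seen order
    ranked = sorted(counter.items(), key=lambda kv: -kv[1])
    # map { case (w, c) => Map("word" -> w, "count" -> c) }
    return [{"word": w, "count": c} for w, c in ranked]


# The double space produces an empty "word", like the '' entry on the slide.
lines = ["the cat  the", "the dog cat"]
print(json.dumps(word_counts(lines)))   # roughly what a %json magic sends back
```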

  • TABLE MAGIC

    val lines = sc.textFile("shakespeare.txt")
    val counts = lines.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _).sortBy(-_._2).map { case (w, c) => Map("word" -> w, "count" -> c) }

    %table counts

    > counts: [('',506610),('the',23407),('I',19540)...]

    Interpreter: sparkIMain.valueOfTerm("counts").guessHeaders().toList()

    "application/vnd.livy.table.v1+json":{"headers":[{"name":"count","type":"BIGINT_TYPE"},{"name":"name","type":"STRING_TYPE"}],"data":[[23407,"the"],[19540,"I"],[18358,"and"],...]}
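What a `guessHeaders()` step has to do can be sketched in Python: inspect the records, derive column names and types, and emit the headers/data layout of the `application/vnd.livy.table.v1+json` output. Only `BIGINT_TYPE` and `STRING_TYPE` appear on the slide; the `guess_table`/`guess_type` names, `BOOLEAN_TYPE` and `DOUBLE_TYPE`, alphabetical column order and typing from the first record are assumptions of this sketch.

```python
def guess_type(value):
    """Map a Python value to a column type name (names beyond the slide's two are assumed)."""
    if isinstance(value, bool):        # check bool first: bool is a subclass of int
        return "BOOLEAN_TYPE"
    if isinstance(value, int):
        return "BIGINT_TYPE"
    if isinstance(value, float):
        return "DOUBLE_TYPE"
    return "STRING_TYPE"


def guess_table(records):
    """Build a {"headers": ..., "data": ...} table from a list of dict records.

    Column names are sorted alphabetically and typed from the first record.
    """
    if not records:
        return {"headers": [], "data": []}
    names = sorted(records[0])
    headers = [{"name": n, "type": guess_type(records[0][n])} for n in names]
    data = [[record.get(n) for n in names] for record in records]
    return {"headers": headers, "data": data}


rows = [{"word": "the", "count": 3}, {"word": "cat", "count": 2}]
print(guess_table(rows))
```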

  • PLOT MAGIC

    ...barplot(sorted_data$count, names.arg = sorted_data$value, main = "Resource hits", las = 2, col = colfunc(nrow(sorted_data)), ylim = c(0, 300))

    > png(/tmp/..)
    > barplot
    > dev.off()

    Interpreter: sparkIMain.interpret(png("/tmp/plot.png"); barplot; dev.off())

    File("/tmp/plot.png").read().toBase64()

    {"data":{"image/png":"iVBORw0KGgoAAAANSUhEUgAAAe"...}...}
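The last step of the plot magic, reading the file R wrote and shipping it as base64, looks like this in Python. The `image_result` helper is hypothetical. Every PNG file begins with the same 8-byte signature, which is why the slide's payload starts with `iVBORw0KG`.

```python
import base64
import os
import tempfile


def image_result(path):
    """Read a rendered PNG and return it base64-encoded under "image/png"."""
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return {"data": {"image/png": encoded}}


# Demo file holding only the 8-byte PNG signature (not a complete image).
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
    tmp.write(b"\x89PNG\r\n\x1a\n")
print(image_result(tmp.name))   # {'data': {'image/png': 'iVBORw0KGgo='}}
os.remove(tmp.name)
```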

  • PLUGGABLE INTERPRETERS

    Pluggable backends

    Livy's Spark backends: Scala, pyspark, R

    IPython/Jupyter support coming soon

  • JUPYTER BACKEND

    Re-using it: generic framework for interpreters

    51 kernels

  • SPARK AS A SERVICE

  • REMEMBER AGAIN?

    (Diagram: the YARN-cluster layout: Spark client, Livy server, YARN master, SparkInterpreter and SparkContext on one YARN node, Spark workers on the others.)

  • MULTI USERS

    (Diagram: three Spark clients use one Livy server, which runs a separate SparkInterpreter and SparkContext for each of them, each on its own YARN node.)

  • SHARED CONTEXTS?

    (Diagram: three Spark clients share a single SparkInterpreter and SparkContext on one YARN node, through the Livy server.)

  • SHARED RDD?

    (Diagram: the same shared context, now holding one RDD.)

  • SHARED RDDS?

    (Diagram: the same shared context, now holding three RDDs.)

  • SECURE IT?

    (Diagram: the shared context with three Spark clients and three RDDs.)

  • SPARK AS SERVICE

    (Diagram: several Spark clients talk to one Livy server, which fronts several Spark applications.)

  • SHARING RDDS

  • PYSPARK SHELL

    (Diagram: an RDD lives in a PySpark shell; a second, plain Python shell wants to read it.)

    r = sc.parallelize([])
    srdd = ShareableRdd(r)

    (The shared RDD holds {'ak': 'Alaska'} and {'ca': 'California'}.)

    curl -X POST /sessions/0/statements -d '{"code": "srdd.get(\"ak\")"}'

    states = SharedRdd('host/sessions/0', 'srdd')
    states.get('ak')
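The `ShareableRdd` and `SharedRdd` names come from the slides, but their code is not shown. The sketch below is a hypothetical, cluster-free reading of the idea: the owning session keeps a key/value object, and a remote client turns `get(key)` into a statement sent to that session. A dict replaces the RDD, and the `fake_send` function evaluates the statement locally where a real client would POST it to `/sessions/0/statements`.

```python
import json


class ShareableRdd:
    """Hypothetical stand-in for the slide's ShareableRdd (a dict replaces the RDD)."""

    def __init__(self, pairs=()):
        self._data = dict(pairs)

    def get(self, key):
        return self._data.get(key)


class SharedRdd:
    """Hypothetical client: turns get(key) into a statement for the owning session."""

    def __init__(self, session_url, name, send):
        self.session_url = session_url
        self.name = name
        self._send = send            # transport: send(url, json_body) -> result

    def get(self, key):
        body = json.dumps({"code": f"{self.name}.get({key!r})"})
        return self._send(self.session_url + "/statements", body)


# Session 0 (the "PySpark shell") owns the data.
session_env = {"srdd": ShareableRdd({"ak": "Alaska", "ca": "California"})}


def fake_send(url, body):
    """Stands in for the HTTP POST: evaluates the code inside session 0."""
    return eval(json.loads(body)["code"], session_env)


# The other shell only knows the session URL and the variable name.
states = SharedRdd("http://localhost:8998/sessions/0", "srdd", fake_send)
print(states.get("ak"))   # Alaska
```

The design choice worth noting is that no data is copied up front: each lookup is just another statement executed in the session that owns the RDD.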

  • DEMO TIME

    https://github.com/romainr/hadoop-tutorials-examples/tree/master/notebook/shared_rdd

  • SECURITY

    SSL support

    Persistent sessions

    Kerberos

  • SPARK MAGIC

    From Microsoft: Python magics for working with remote Spark clusters

    Open source: https://github.com/jupyter-incubator/sparkmagic

  • FUTURE

    Move to an external repo?

    Security

    IPython/Jupyter backends and file format

    Shared named RDD/contexts?

    Share data

    Spark specific, language generic, both?

    Leverage Hue 4

    https://issues.cloudera.org/browse/HUE-2990

  • LIVY'S CHEAT SHEET

    Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java

    Read about it: http://gethue.com/spark/

    Scala, Java, Python, R

    Type introspection for visualization

    YARN-cluster or local modes

    Code snippets / compiled

    REST API

    Pluggable backends

    Magic keywords

    Failure resilient

    Security

  • BEDANKT! (THANK YOU!)

    TWITTER: @gethue (http://twitter.com/gethue)

    USER GROUP: http://groups.google.com/a/cloudera.org/group/hue-user

    WEBSITE: http://gethue.com

    LEARN: http://learn.gethue.com