Building a REST Job Server for Interactive Spark as a Service

BUILDING A REST JOB SERVER FOR INTERACTIVE SPARK AS A SERVICE
Romain Rigaux - Cloudera
Erick Tryzelaar - Cloudera


Page 1: Building a REST Job Server for Interactive Spark as a Service


Page 2: Building a REST Job Server for Interactive Spark as a Service

WHY?

Page 3: Building a REST Job Server for Interactive Spark as a Service
Page 4: Building a REST Job Server for Interactive Spark as a Service
Page 5: Building a REST Job Server for Interactive Spark as a Service

WHY SPARK AS A SERVICE?

• NOTEBOOKS
• EASY ACCESS FROM ANYWHERE
• SHARE SPARK CONTEXTS AND RDDs
• BUILD APPS
• SPARK MAGIC

Page 6: Building a REST Job Server for Interactive Spark as a Service

WHY SPARK IN HUE?

MARRIED WITH FULL HADOOP ECOSYSTEM

Page 7: Building a REST Job Server for Interactive Spark as a Service

HISTORY V1: OOZIE

THE GOOD

• It works
• Code snippet

THE BAD

• Submit through Oozie
• Shell action
• Very slow
• Batch

[Diagram labels: workflow.xml, snippet.py, stdout]

Page 8: Building a REST Job Server for Interactive Spark as a Service

HISTORY V2: SPARK IGNITER

THE GOOD

• It works better

THE BAD

• Compiler Jar
• Batch only, no shell
• No Python, R
• Security
• Single point of failure

[Diagram labels: Implement, Compile, Upload, Scala, jar, Batch, json output, Ooyala]

Page 9: Building a REST Job Server for Interactive Spark as a Service

HISTORY V3: NOTEBOOK

THE GOOD

• Like spark-submit / spark shells
• Scala / Python / R shells
• Jar / Python batch jobs
• Notebook UI
• YARN

THE BAD

• Beta?

[Diagram labels: code snippet, batch, Livy]

Page 10: Building a REST Job Server for Interactive Spark as a Service

GENERAL ARCHITECTURE

[Diagram: Livy, YARN, and three Spark instances]

Page 11: Building a REST Job Server for Interactive Spark as a Service

GENERAL ARCHITECTURE

[Diagram: API, Livy, YARN, and three Spark instances]

Page 12: Building a REST Job Server for Interactive Spark as a Service

LIVY SPARK SERVER

Page 13: Building a REST Job Server for Interactive Spark as a Service

LIVY SPARK SERVER

• REST web server in Scala for Spark submissions
• Interactive shell sessions or batch jobs
• Backends: Scala, Java, Python, R
• No dependency on Hue
• Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
• Read about it: http://gethue.com/spark/

Page 14: Building a REST Job Server for Interactive Spark as a Service

ARCHITECTURE

• Standard web service: wrapper around spark-submit / Spark shells
• YARN mode, Spark drivers run inside the cluster (supports crashes)
• No need to inherit any interface or compile code
• Extended to work with additional backends

Page 15: Building a REST Job Server for Interactive Spark as a Service

LIVY WEB SERVER ARCHITECTURE

• LOCAL “DEV” MODE
• YARN MODE

Page 16: Building a REST Job Server for Interactive Spark as a Service

LOCAL MODE

[Diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, Spark Context, Spark Interpreter]

Page 17: Building a REST Job Server for Interactive Spark as a Service

LOCAL MODE

[Diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, Spark Context, Spark Interpreter]

Page 18: Building a REST Job Server for Interactive Spark as a Service

LOCAL MODE

[Diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, Spark Context, Spark Interpreter; step 1 marked]

Page 19: Building a REST Job Server for Interactive Spark as a Service

LOCAL MODE

[Diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, Spark Context, Spark Interpreter; steps 1 to 2 marked]

Page 20: Building a REST Job Server for Interactive Spark as a Service

LOCAL MODE

[Diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, Spark Context, Spark Interpreter; steps 1 to 3 marked]

Page 21: Building a REST Job Server for Interactive Spark as a Service

LOCAL MODE

[Diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, Spark Context, Spark Interpreter; steps 1 to 4 marked]

Page 22: Building a REST Job Server for Interactive Spark as a Service

LOCAL MODE

[Diagram: Livy Server (Scalatra, Session Manager, Session), Spark Client, Spark Context, Spark Interpreter; steps 1 to 5 marked]

Page 23: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

PRODUCTION SCALABLE

Page 24: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter]

Page 25: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter; step 1 marked]

Page 26: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter; steps 1 to 2 marked]

Page 27: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter; steps 1 to 3 marked]

Page 28: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter; steps 1 to 4 marked]

Page 29: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter; steps 1 to 5 marked]

Page 30: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter; steps 1 to 6 marked]

Page 31: Building a REST Job Server for Interactive Spark as a Service

YARN-CLUSTER MODE

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter; steps 1 to 7 marked]

Page 32: Building a REST Job Server for Interactive Spark as a Service

SESSION CREATION AND EXECUTION

% curl -XPOST localhost:8998/sessions \
    -d '{"kind": "spark"}'
{"id": 0, "kind": "spark", "log": [...], "state": "idle"}

% curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
{"id": 0, "output": {"data": {"text/plain": "res0: Int = 2"}, "execution_count": 0, "status": "ok"}, "state": "available"}
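The two curl calls on this slide can also be prepared from a program. The following is a minimal Python sketch, not part of the talk: it only builds the requests and parses a reply shaped like the one shown on the slide, so it runs without a Livy server. The helper names `build_post` and `statement_result` and the `Content-Type` header are assumptions made for illustration.

```python
import json
import urllib.request

LIVY_URL = "http://localhost:8998"  # host and port taken from the curl calls on the slide


def build_post(path, payload):
    """Build (but do not send) a JSON POST request for the server. Hypothetical helper."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        LIVY_URL + path,
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header, not on the slide
        method="POST",
    )


def statement_result(response_text):
    """Extract the plain-text result from a statement reply body."""
    reply = json.loads(response_text)
    return reply["output"]["data"]["text/plain"]


# Same payloads as the curl examples.
create_session = build_post("/sessions", {"kind": "spark"})
run_statement = build_post("/sessions/0/statements", {"code": "1+1"})

# Reply copied from the slide, used here in place of a live server.
sample_reply = (
    '{"id": 0, "output": {"data": {"text/plain": "res0: Int = 2"}, '
    '"execution_count": 0, "status": "ok"}, "state": "available"}'
)

print(create_session.get_method(), create_session.full_url)
print(statement_result(sample_reply))
```

To actually send a request, pass it to `urllib.request.urlopen` once a server is listening on that address.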

Page 33: Building a REST Job Server for Interactive Spark as a Service

BATCH OR INTERACTIVE

[Diagram: inputs Jar, Py, Scala, Python, R; Livy endpoints /batches and /sessions; three Spark instances on YARN]

Page 34: Building a REST Job Server for Interactive Spark as a Service

SHELL OR BATCH?

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, YARN Node with Spark Interpreter and Spark Context, two YARN Nodes with Spark Worker]

Page 35: Building a REST Job Server for Interactive Spark as a Service

SHELL

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, YARN Node with pyspark and Spark Context, two YARN Nodes with Spark Worker]

Page 36: Building a REST Job Server for Interactive Spark as a Service

BATCH

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, YARN Node with spark-submit and Spark Context, two YARN Nodes with Spark Worker]

Page 37: Building a REST Job Server for Interactive Spark as a Service

LIVY INTERPRETERS

Scala, Python, R…

Page 38: Building a REST Job Server for Interactive Spark as a Service

REMEMBER?

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter]

Page 39: Building a REST Job Server for Interactive Spark as a Service

INTERPRETERS

• Pipe stdin/stdout to a running shell
• Execute the code / send to Spark workers
• Perform magic operations
• One interpreter per language
• “Swappable” with other kernels (python, spark..)

[Diagram: the Interpreter receives println(1+1), the shell shows "> println(1+1)", and 2 comes back]
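The first bullet, piping stdin/stdout to a running shell, can be illustrated with a small Python sketch. It is an assumption-laden stand-in rather than Livy's code: the "shell" is a child Python process that evaluates one expression per line, and the class name `PipedInterpreter` is hypothetical.

```python
import subprocess
import sys

# Stand-in "shell": a child Python process that evaluates one expression per line.
CHILD_LOOP = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(eval(line), flush=True)\n"
)


class PipedInterpreter:
    """Hypothetical wrapper that talks to a long-running shell through pipes."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_LOOP],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def execute(self, code):
        # Write one line of code to the shell's stdin, then read one line of output.
        self.proc.stdin.write(code + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin ends the child's read loop.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


interpreter = PipedInterpreter()
first = interpreter.execute("1+1")
second = interpreter.execute("'spark'.upper()")
interpreter.close()
print(first, second)
```

A real shell prints prompts and multi-line output, so a production wrapper would need a more careful way to detect where one result ends; this sketch assumes exactly one output line per input line.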

Page 40: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW

[Diagram: Livy Server and Interpreter]

Page 41: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW

[Diagram: Livy Server and Interpreter; input "> 1+1"]

Page 42: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW

[Diagram: Livy Server and Interpreter; input "> 1+1"; request {"code": "1+1"}]

Page 43: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW

[Diagram: Livy Server and Interpreter; input "> 1+1"; request {"code": "1+1"}; 1+1 reaches the Interpreter]

Page 44: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW

[Diagram: Livy Server and Interpreter; input "> 1+1"; request {"code": "1+1"}; 1+1 reaches the Interpreter; Magic]

Page 45: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW

[Diagram: Livy Server and Interpreter; input "> 1+1"; request {"code": "1+1"}; 1+1 reaches the Interpreter; Magic; result 2]

Page 46: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW

[Diagram: Livy Server and Interpreter; input "> 1+1"; request {"code": "1+1"}; 1+1 reaches the Interpreter; Magic; result 2; response {"data": {"application/json": "2"}}]

Page 47: Building a REST Job Server for Interactive Spark as a Service

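INTERPRETER FLOW

[Diagram: Livy Server and Interpreter; input "> 1+1"; request {"code": "1+1"}; 1+1 reaches the Interpreter; Magic; result 2; response {"data": {"application/json": "2"}}; 2 shown to the user]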

Page 48: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER FLOW CHART

[Flow chart nodes: Receive lines; Split into chunks; Chunks left? (Yes / No); Magic chunk? (Yes / No); Magic!; Execute chunk; Success; Send output to server; Send error to server]

Example of parsing
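One plausible reading of the flow chart is a loop over chunks that dispatches magic chunks to a handler, executes the others, and stops at the first failure. The Python sketch below follows that reading; the order of the branches, the rule that a magic chunk is a single line starting with `%`, and all function names are assumptions, since the transcript only preserves the node labels.

```python
def split_into_chunks(lines):
    """Group lines into chunks; a line starting with '%' is its own magic chunk."""
    chunks, current = [], []
    for line in lines:
        if line.startswith("%"):
            if current:
                chunks.append("\n".join(current))
                current = []
            chunks.append(line)
        else:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


def run_statement(lines, execute, magics):
    """Run chunks in order; report an error as soon as one chunk fails."""
    output = None
    for chunk in split_into_chunks(lines):
        try:
            if chunk.startswith("%"):
                # Magic chunk: "%name argument"
                name, _, arg = chunk[1:].partition(" ")
                output = magics[name](arg.strip())
            else:
                output = execute(chunk)
        except Exception as exc:
            return {"status": "error", "error": str(exc)}  # send error to server
    return {"status": "ok", "data": output}  # no chunks left: send output to server


# Toy interpreter state standing in for the Spark shell.
namespace = {}


def execute(chunk):
    exec(chunk, namespace)
    return None


magics = {"json": lambda name: namespace[name]}

ok = run_statement(["counts = [1, 2]", "total = sum(counts)", "%json total"], execute, magics)
failed = run_statement(["1/0"], execute, magics)
print(ok, failed)
```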

Page 49: Building a REST Job Server for Interactive Spark as a Service

INTERPRETER MAGIC

• table
• json
• plotting
• ...

Page 50: Building a REST Job Server for Interactive Spark as a Service

NO MAGIC

> 1+1

Interpreter

1+1

sparkIMain.interpret("1+1")

{"id": 0, "output": {"application/json": 2}}

Page 51: Building a REST Job Server for Interactive Spark as a Service

JSON MAGIC

val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) => Map("word" -> w, "count" -> c) }

%json counts

Interpreter

> counts
[('', 506610), ('the', 23407), ('I', 19540) ...]

sparkIMain.valueOfTerm("counts").toJson()

Page 52: Building a REST Job Server for Interactive Spark as a Service

JSON MAGIC

val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) => Map("word" -> w, "count" -> c) }

%json counts

Interpreter

> counts

sparkIMain.valueOfTerm("counts").toJson()

{"id": 0, "output": {"application/json": [{"count": 506610, "word": ""}, {"count": 23407, "word": "the"}, {"count": 19540, "word": "I"}, ...] ...}
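The JSON magic on this slide amounts to looking up a named value in the interpreter and returning it under the `application/json` key. Here is a minimal Python sketch of that idea, assuming the interpreter state is a plain dictionary; `json_magic` and `namespace` are hypothetical names, and the sample values are the word counts shown on the slide.

```python
import json


def json_magic(namespace, name):
    """Look up `name` in the interpreter namespace and wrap it as JSON output."""
    if name not in namespace:
        raise NameError("Value %s does not exist" % name)
    value = namespace[name]
    json.dumps(value)  # raises TypeError if the value cannot be serialised
    return {"application/json": value}


# Interpreter state after the word count has run (first rows from the slide).
namespace = {
    "counts": [
        {"count": 506610, "word": ""},
        {"count": 23407, "word": "the"},
        {"count": 19540, "word": "I"},
    ]
}

output = json_magic(namespace, "counts")
print(json.dumps({"id": 0, "output": output}))
```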

Page 53: Building a REST Job Server for Interactive Spark as a Service

TABLE MAGIC

val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) => Map("word" -> w, "count" -> c) }

%table counts

Interpreter

> counts
[('', 506610), ('the', 23407), ('I', 19540) ...]

sparkIMain.valueOfTerm("counts").guessHeaders().toList()

Page 54: Building a REST Job Server for Interactive Spark as a Service

TABLE MAGIC

val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) => Map("word" -> w, "count" -> c) }

%table counts

Interpreter

> counts

sparkIMain.valueOfTerm("counts").guessHeaders().toList()

"application/vnd.livy.table.v1+json": {"headers": [{"name": "count", "type": "BIGINT_TYPE"}, {"name": "name", "type": "STRING_TYPE"}], "data": [[23407, "the"], [19540, "I"], [18358, "and"], ...]}
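The "guess headers" step can be pictured as inspecting the first row to derive column names and types, then flattening every row into a list. The Python sketch below is an illustration under assumptions: only `BIGINT_TYPE` and `STRING_TYPE` appear on the slide, so `DOUBLE_TYPE` and the fallback rule are guesses, and `guess_type` and `table_magic` are hypothetical names.

```python
def guess_type(value):
    """Map a Python value to a column type name."""
    if isinstance(value, int) and not isinstance(value, bool):
        return "BIGINT_TYPE"  # appears on the slide
    if isinstance(value, float):
        return "DOUBLE_TYPE"  # assumption: not shown on the slide
    return "STRING_TYPE"  # appears on the slide; also used as the fallback here


def table_magic(rows):
    """Turn a list of dict rows into headers plus row lists."""
    names = list(rows[0].keys()) if rows else []
    headers = [{"name": n, "type": guess_type(rows[0][n])} for n in names]
    data = [[row.get(n) for n in names] for row in rows]
    return {"application/vnd.livy.table.v1+json": {"headers": headers, "data": data}}


rows = [
    {"count": 23407, "word": "the"},
    {"count": 19540, "word": "I"},
    {"count": 18358, "word": "and"},
]
table = table_magic(rows)["application/vnd.livy.table.v1+json"]
print(table["headers"])
print(table["data"])
```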

Page 55: Building a REST Job Server for Interactive Spark as a Service

PLOT MAGIC

>

sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")

Interpreter

... barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))

Page 56: Building a REST Job Server for Interactive Spark as a Service

PLOT MAGIC

>

sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")

Interpreter

... barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))

Page 57: Building a REST Job Server for Interactive Spark as a Service

PLOT MAGIC

> png('/tmp/..')
> barplot
> dev.off()

sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")

Interpreter

... barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))

Page 58: Building a REST Job Server for Interactive Spark as a Service

PLOT MAGIC

> png('/tmp/..')
> barplot
> dev.off()

sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")

File('/tmp/plot.png').read().toBase64()

Interpreter

... barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))

Page 59: Building a REST Job Server for Interactive Spark as a Service

PLOT MAGIC

> png('/tmp/..')
> barplot
> dev.off()

sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")

File('/tmp/plot.png').read().toBase64()

Interpreter

... barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))

{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe…" ...} ...}
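The last step of the plot magic, reading the image file and returning it as Base64 under `image/png`, can be sketched in a few lines of Python. This is an illustration only: the function name `plot_magic_output` is hypothetical, and the file written here contains just the 8-byte PNG signature instead of a real plot, which is enough to show why the encoded string on the slide starts with "iVBORw0KGgo".

```python
import base64
import os
import tempfile


def plot_magic_output(path):
    """Read an image file and return it Base64-encoded under the image/png key."""
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return {"data": {"image/png": encoded}}


# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    plot_path = os.path.join(tmp, "plot.png")
    with open(plot_path, "wb") as handle:
        handle.write(PNG_SIGNATURE)  # placeholder content, not a real plot
    result = plot_magic_output(plot_path)

print(result["data"]["image/png"])
```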

Page 60: Building a REST Job Server for Interactive Spark as a Service

PLUGGABLE INTERPRETERS

• Pluggable backends
• Livy's Spark backends
  – Scala
  – pyspark
  – R
• IPython / Jupyter support coming soon

Page 61: Building a REST Job Server for Interactive Spark as a Service

JUPYTER BACKEND

• Re-using it
• Generic framework for interpreters
• 51 kernels

Page 62: Building a REST Job Server for Interactive Spark as a Service

SPARK AS A SERVICE

Page 63: Building a REST Job Server for Interactive Spark as a Service

REMEMBER AGAIN?

[Diagram: Spark Client, Livy Server (Scalatra, Session Manager, Session), YARN Master, three YARN Nodes (Spark Context, Spark Worker, Spark Worker), Spark Interpreter]

Page 64: Building a REST Job Server for Interactive Spark as a Service

MULTI USERS

[Diagram: three Spark Clients, Livy Server (Scalatra, Session Manager, Session), three YARN Nodes each with a Spark Context and a Spark Interpreter]

Page 65: Building a REST Job Server for Interactive Spark as a Service

SHARED CONTEXTS?

[Diagram: three Spark Clients, Livy Server (Scalatra, Session Manager, Session), one YARN Node with a Spark Context and a Spark Interpreter]

Page 66: Building a REST Job Server for Interactive Spark as a Service

SHARED RDD?

[Diagram: three Spark Clients, Livy Server (Scalatra, Session Manager, Session), one YARN Node with a Spark Context and a Spark Interpreter, one RDD]

Page 67: Building a REST Job Server for Interactive Spark as a Service

SHARED RDDS?

[Diagram: three Spark Clients, Livy Server (Scalatra, Session Manager, Session), one YARN Node with a Spark Context and a Spark Interpreter, three RDDs]

Page 68: Building a REST Job Server for Interactive Spark as a Service

SECURE IT?

[Diagram: three Spark Clients, Livy Server (Scalatra, Session Manager, Session), one YARN Node with a Spark Context and a Spark Interpreter, three RDDs]

Page 69: Building a REST Job Server for Interactive Spark as a Service

SECURE IT?

[Diagram: three Spark Clients, Livy Server (Scalatra, Session Manager, Session), one YARN Node with a Spark Context and a Spark Interpreter, three RDDs]

Page 70: Building a REST Job Server for Interactive Spark as a Service

SPARK AS SERVICE

[Diagram: three Spark Clients, Livy Server, two Spark instances]

Page 71: Building a REST Job Server for Interactive Spark as a Service

SHARING RDDS

Page 72: Building a REST Job Server for Interactive Spark as a Service

[Diagram: PySpark shell with an RDD; Shell; Python Shell]

Page 73: Building a REST Job Server for Interactive Spark as a Service

[Diagram: PySpark shell with an RDD; Shell; Python Shell]

Page 74: Building a REST Job Server for Interactive Spark as a Service

[Diagram: PySpark shell with an RDD; Shell; Python Shell]

r = sc.parallelize([])
srdd = ShareableRdd(r)

Page 75: Building a REST Job Server for Interactive Spark as a Service

[Diagram: PySpark shell with an RDD holding {'ak': 'Alaska'} and {'ca': 'California'}; Shell; Python Shell]

r = sc.parallelize([])
srdd = ShareableRdd(r)

Page 76: Building a REST Job Server for Interactive Spark as a Service

[Diagram: PySpark shell with an RDD holding {'ak': 'Alaska'} and {'ca': 'California'}; Shell; Python Shell]

r = sc.parallelize([])
srdd = ShareableRdd(r)

curl -XPOST /sessions/0/statement {'code': srdd.get('ak')}

Page 77: Building a REST Job Server for Interactive Spark as a Service

[Diagram: PySpark shell with an RDD holding {'ak': 'Alaska'} and {'ca': 'California'}; Shell; Python Shell]

r = sc.parallelize([])
srdd = ShareableRdd(r)

curl -XPOST /sessions/0/statement {'code': srdd.get('ak')}

states = SharedRdd('host/sessions/0', 'srdd')
states.get('ak')
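The slides suggest that a client-side `SharedRdd` turns a lookup such as `states.get('ak')` into a statement posted to the session that owns `srdd`. The Python sketch below shows one way that wrapper could look. It is entirely hypothetical: the constructor takes an injected `post` callable so the example runs without a server, the `/statements` path follows the session creation slide, and the fake reply is made up for the demonstration.

```python
import json


class SharedRdd:
    """Hypothetical client-side handle on a value living in a remote session."""

    def __init__(self, session_url, name, post):
        self.session_url = session_url
        self.name = name
        self._post = post  # callable(url, body) -> reply dict; injected for this sketch

    def get(self, key):
        # Build the code to run remotely, e.g. srdd.get('ak').
        code = "%s.get(%r)" % (self.name, key)
        reply = self._post(self.session_url + "/statements", json.dumps({"code": code}))
        return reply["output"]["data"]["text/plain"]


calls = []


def fake_post(url, body):
    """Stand-in for an HTTP POST: records the call and returns a canned reply."""
    calls.append((url, json.loads(body)))
    return {"output": {"data": {"text/plain": "Alaska"}}}


states = SharedRdd("host/sessions/0", "srdd", fake_post)
value = states.get("ak")
print(calls[0], value)
```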

Page 79: Building a REST Job Server for Interactive Spark as a Service

SECURITY

• SSL support
• Persistent sessions
• Kerberos

Page 80: Building a REST Job Server for Interactive Spark as a Service

SPARK MAGIC

• From Microsoft
• Python magics for working with remote Spark clusters
• Open source: https://github.com/jupyter-incubator/sparkmagic

Page 81: Building a REST Job Server for Interactive Spark as a Service

FUTURE

• Move to ext repo?
• Security
• iPython / Jupyter backends and file format
• Shared named RDD / contexts?
• Share data
• Spark specific, language generic, both?
• Leverage Hue 4

https://issues.cloudera.org/browse/HUE-2990

Page 82: Building a REST Job Server for Interactive Spark as a Service

LIVY’S CHEAT SHEET

• Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
• Read about it: http://gethue.com/spark/
• Scala, Java, Python, R
• Type introspection for visualization
• YARN-cluster or local modes
• Code snippets / compiled
• REST API
• Pluggable backends
• Magic keywords
• Failure resilient
• Security

Page 83: Building a REST Job Server for Interactive Spark as a Service

THANK YOU!

TWITTER

@gethue

USER GROUP

hue-user@

WEBSITE

http://gethue.com

LEARN

http://learn.gethue.com