Spark Summit Europe: Building a REST Job Server for interactive Spark as a service

Transcript
Page 1: Spark Summit Europe: Building a REST Job Server for interactive Spark as a service

BUILDING A REST JOB SERVER FOR INTERACTIVE SPARK AS A SERVICE

Romain Rigaux - Cloudera
Erick Tryzelaar - Cloudera

Page 2:

WHY?

Page 3:
Page 4:
Page 5:

WHY SPARK AS A SERVICE?

NOTEBOOKS

EASY ACCESS FROM ANYWHERE

SHARE SPARK CONTEXTS AND RDDs

BUILD APPS

SPARK MAGIC

Page 6:

WHY SPARK IN HUE?

MARRIED WITH FULL HADOOP ECOSYSTEM

Page 7:

HISTORY V1: OOZIE

THE GOOD

• It works
• Code snippet

THE BAD

• Submit through Oozie
• Shell action
• Very slow
• Batch

[Diagram labels: workflow.xml, snippet.py, stdout]

Page 8:

HISTORY V2: SPARK IGNITER

THE GOOD

• It works better

THE BAD

• Compiler Jar
• Batch only, no shell
• No Python, R
• Security
• Single point of failure

[Diagram labels: Compile, Implement, Upload, json output, Batch, Scala, jar, Ooyala]

Page 9:

HISTORY V3: NOTEBOOK

THE GOOD

• Like spark-submit / spark shells
• Scala / Python / R shells
• Jar / Python batch jobs
• Notebook UI
• YARN

THE BAD

• Beta?

[Diagram labels: Livy, code snippet, batch]

Pages 10-11:

GENERAL ARCHITECTURE

[Diagram shown on two slides. Labels: Livy, Spark (x3), YARN; page 11 adds the label API]

Page 12:

LIVY SPARK SERVER

Page 13:

LIVY SPARK SERVER

• REST web server in Scala for Spark submissions
• Interactive shell sessions or batch jobs
• Backends: Scala, Java, Python, R
• No dependency on Hue
• Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
• Read about it: http://gethue.com/spark/

Page 14:

ARCHITECTURE

• Standard web service: wrapper around spark-submit / Spark shells
• YARN mode, Spark drivers run inside the cluster (supports crashes)
• No need to inherit any interface or compile code
• Extended to work with additional backends

Page 15:

LIVY WEB SERVER ARCHITECTURE

LOCAL “DEV” MODE

YARN MODE

Pages 16-22:

LOCAL MODE

[Diagram built up across seven slides, with numbered steps 1 to 5 added one at a time. Labels: SparkClient, LivyServer, Scalatra, SessionManager, Session, SparkClient, SparkContext, SparkInterpreter]

Page 23:

YARN-CLUSTER MODE

PRODUCTION SCALABLE

Pages 24-31:

YARN-CLUSTER MODE

[Diagram built up across eight slides, with numbered steps 1 to 7 added one at a time. Labels: SparkClient, LivyServer, Scalatra, SessionManager, Session, YARN Master, YARN Node (x3), SparkInterpreter, SparkContext, SparkWorker (x2)]

Page 32:

SESSION CREATION AND EXECUTION

% curl -XPOST localhost:8998/sessions \
    -d '{"kind": "spark"}'
{"id": 0, "kind": "spark", "log": [...], "state": "idle"}

% curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
{"id": 0, "output": {"data": {"text/plain": "res0: Int = 2"}, "execution_count": 0, "status": "ok"}, "state": "available"}
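The two curl calls on this slide can also be issued from a program. The following is a minimal Python sketch that only builds the two requests; the host, port, paths and JSON bodies come from the slide, while the function names and the Content-Type header are assumptions added for illustration.

```python
import json
from urllib.request import Request

LIVY_URL = "http://localhost:8998"  # host and port as shown on the slide


def create_session_request(kind="spark"):
    """Build (but do not send) the POST /sessions request."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return Request(
        f"{LIVY_URL}/sessions",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not on the slide
        method="POST",
    )


def run_statement_request(session_id, code):
    """Build (but do not send) the POST /sessions/<id>/statements request."""
    body = json.dumps({"code": code}).encode("utf-8")
    return Request(
        f"{LIVY_URL}/sessions/{session_id}/statements",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed, not on the slide
        method="POST",
    )


if __name__ == "__main__":
    req = run_statement_request(0, "1+1")
    print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` would send it to a running server; the sketch stops short of that so it can be run without one.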

Page 33:

BATCH OR INTERACTIVE

[Diagram labels: Jar, Py, Scala, Python, R, /batches, /sessions, Livy, Spark (x3), YARN]

Page 34:

SHELL OR BATCH?

[Diagram labels: SparkClient, LivyServer, Scalatra, SessionManager, Session, YARN Master, YARN Node (x3), SparkInterpreter, SparkContext, SparkWorker (x2)]

Page 35:

SHELL

[Same diagram as page 34, with pyspark in place of SparkInterpreter]

Page 36:

BATCH

[Same diagram as page 34, with spark-submit in place of SparkInterpreter]

Page 37:

LIVY INTERPRETERS

Scala, Python, R…

Page 38:

REMEMBER?

[Diagram labels: SparkClient, LivyServer, Scalatra, SessionManager, Session, YARN Master, YARN Node (x3), SparkInterpreter, SparkContext, SparkWorker (x2)]

Page 39:

INTERPRETERS

• Pipe stdin/stdout to a running shell
• Execute the code / send to Spark workers
• Perform magic operations
• One interpreter per language
• “Swappable” with other kernels (python, spark..)

[Diagram labels: Interpreter; shell session "> println(1+1)" printing 2; input println(1+1); output 2]
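The first bullet, piping stdin/stdout to a running shell, can be sketched in a few lines. This is an illustration only, not Livy's implementation: a small child Python process stands in for the Scala, Python or R shell, and the names `CHILD_REPL` and `PipedInterpreter` are hypothetical.

```python
import subprocess
import sys

# Stand-in for a language shell: reads one expression per line and prints its value.
CHILD_REPL = """
import sys
while True:
    line = sys.stdin.readline()
    if not line:
        break
    print(eval(line), flush=True)
"""


class PipedInterpreter:
    """Keeps one child process alive and talks to it over stdin/stdout."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_REPL],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def execute(self, code):
        """Send one line of code and return the one line the shell prints back."""
        self.proc.stdin.write(code + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        """Close stdin so the child loop ends, then wait for the process."""
        self.proc.stdin.close()
        self.proc.wait()


if __name__ == "__main__":
    interpreter = PipedInterpreter()
    print(interpreter.execute("1+1"))
    interpreter.close()
```

Keeping the process open between calls is what makes the session interactive: each `execute` call reuses the same shell rather than starting a new one.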

Pages 40-47:

INTERPRETER FLOW

[Diagram built up across eight slides. Labels, in order of appearance: LivyServer; Interpreter; the prompt "> 1+1"; the request {"code": "1+1"}; the code 1+1 passed to the Interpreter; Magic; the result 2; the response {"data": {"application/json": "2"}}]

Page 48:

INTERPRETER FLOW CHART

[Flow chart labels: Receive lines; Split into chunks; Chunks left? (Yes / No); Magic chunk? (Yes / No); Magic!; Execute chunk; Success; Send output to server; Send error to server; Example of parsing]
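One way to read the flow chart is as a loop over chunks. The sketch below assumes that a line starting with % is a magic chunk and that consecutive plain lines form one code chunk; the slide does not spell out the splitting rule, and the function names are hypothetical.

```python
def split_into_chunks(lines):
    """Group consecutive plain lines into one code chunk; each %magic line is its own chunk."""
    chunks, current = [], []
    for line in lines:
        if line.startswith("%"):
            if current:
                chunks.append(("code", "\n".join(current)))
                current = []
            chunks.append(("magic", line[1:].strip()))
        else:
            current.append(line)
    if current:
        chunks.append(("code", "\n".join(current)))
    return chunks


def run_chunks(lines, execute, magic):
    """Run chunks in order and stop at the first failure.

    `execute` and `magic` are caller-supplied callables that return the
    chunk's output or raise an exception.
    """
    output = None
    for kind, body in split_into_chunks(lines):
        handler = magic if kind == "magic" else execute
        try:
            output = handler(body)
        except Exception as exc:
            # "Send error to server" branch of the chart.
            return {"status": "error", "error": str(exc)}
    # No chunks left: "Send output to server" branch.
    return {"status": "ok", "output": output}


if __name__ == "__main__":
    print(split_into_chunks(["val a = 1", "val b = 2", "%json b"]))
```

Only the output of the last chunk is returned here, which is a simplification chosen for the sketch.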

Page 49:

INTERPRETER MAGIC

• table
• json
• plotting
• ...

Page 50:

NO MAGIC

> 1+1

Interpreter

1+1

sparkIMain.interpret("1+1")

{"id": 0, "output": {"application/json": 2}}

Page 51:

JSON MAGIC

val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .sortBy(-_._2)
  .map { case (w, c) => Map("word" -> w, "count" -> c) }

%json counts

Interpreter

> counts
[('', 506610), ('the', 23407), ('I', 19540)...]

sparkIMain.valueOfTerm("counts").toJson()

Page 52:

JSON MAGIC

[Same slide as page 51, with the response added]

{"id": 0, "output": {"application/json": [{"count": 506610, "word": ""}, {"count": 23407, "word": "the"}, {"count": 19540, "word": "I"}, ...] ...}

Page 53:

TABLE MAGIC

val lines = sc.textFile("shakespeare.txt");
val counts = lines.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
  .sortBy(-_._2)
  .map { case (w, c) => Map("word" -> w, "count" -> c) }

%table counts

Interpreter

> counts
[('', 506610), ('the', 23407), ('I', 19540)...]

sparkIMain.valueOfTerm("counts").guessHeaders().toList()

Page 54:

TABLE MAGIC

[Same slide as page 53, with the response added]

"application/vnd.livy.table.v1+json": {"headers": [{"name": "count", "type": "BIGINT_TYPE"}, {"name": "name", "type": "STRING_TYPE"}], "data": [[23407, "the"], [19540, "I"], [18358, "and"], ...]}
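The guessHeaders() step on the table magic slides is not shown in detail. A rough Python sketch of the idea follows: it infers column names and types from the first row and emits the table payload. The two type names and the MIME type come from the slide; the functions and the rule "integers are BIGINT_TYPE, everything else is STRING_TYPE" are assumptions.

```python
def guess_headers(rows):
    """Infer column names and types from the first row of a list of dicts."""
    if not rows:
        return []
    return [
        {"name": name, "type": "BIGINT_TYPE" if isinstance(value, int) else "STRING_TYPE"}
        for name, value in sorted(rows[0].items())
    ]


def to_table(rows):
    """Build the table payload: headers plus one list of values per row."""
    headers = guess_headers(rows)
    data = [[row[header["name"]] for header in headers] for row in rows]
    return {"application/vnd.livy.table.v1+json": {"headers": headers, "data": data}}


if __name__ == "__main__":
    counts = [{"word": "the", "count": 23407}, {"word": "I", "count": 19540}]
    print(to_table(counts))
```

Columns are sorted by name so the output is stable regardless of key order in the input rows.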

Pages 55-59:

PLOT MAGIC

[Slide built up across five pages]

...
barplot(sorted_data$count, names.arg=sorted_data$value, main="Resource hits", las=2, col=colfunc(nrow(sorted_data)), ylim=c(0,300))

Interpreter

sparkIMain.interpret("png('/tmp/plot.png') barplot dev.off()")

> png('/tmp/..')
> barplot
> dev.off()

File('/tmp/plot.png').read().toBase64()

{"data": {"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe…" ...} ...}
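The last step of the plot magic, reading the image file and returning it as base64 text, can be sketched as follows. The "data" and "image/png" keys follow the response on the slide; the function name and the placeholder file are assumptions.

```python
import base64
import json
import os
import tempfile


def image_payload(path):
    """Read an image file and wrap its base64 text in an image/png payload."""
    with open(path, "rb") as handle:
        encoded = base64.b64encode(handle.read()).decode("ascii")
    return {"data": {"image/png": encoded}}


if __name__ == "__main__":
    # The eight-byte PNG signature stands in for a real plot file.
    with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as tmp:
        tmp.write(b"\x89PNG\r\n\x1a\n")
    print(json.dumps(image_payload(tmp.name)))
    os.remove(tmp.name)
```

Every PNG file starts with the same signature bytes, which is why the base64 string on the slide begins with iVBORw0KGgo.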

Page 60:

PLUGGABLE INTERPRETERS

• Pluggable backends
• Livy's Spark backends
  – Scala
  – pyspark
  – R
• IPython / Jupyter support coming soon

Page 61:

JUPYTER BACKEND

• Re-using it
• Generic framework for interpreters
• 51 kernels

Page 62:

SPARK AS A SERVICE

Page 63:

REMEMBER AGAIN?

[Diagram labels: SparkClient, LivyServer, Scalatra, SessionManager, Session, YARN Master, YARN Node (x3), SparkInterpreter, SparkContext, SparkWorker (x2)]

Page 64:

MULTI USERS

[Diagram labels: SparkClient (x3), LivyServer, Scalatra, SessionManager, Session, YARN Node (x3), SparkInterpreter (x3), SparkContext (x3)]

Page 65:

SHARED CONTEXTS?

[Diagram labels: SparkClient (x3), LivyServer, Scalatra, SessionManager, Session, YARN Node, SparkInterpreter, SparkContext]

Page 66:

SHARED RDD?

[Same diagram as page 65, with one RDD added]

Page 67:

SHARED RDDS?

[Same diagram as page 65, with three RDDs added]

Pages 68-69:

SECURE IT?

[Same diagram as page 67]

Page 70:

SPARK AS SERVICE

[Diagram labels: SparkClient (x3), LivyServer, Spark (x2)]

Page 71:

SHARING RDDS

Pages 72-77:

[Slide built up across six pages. Diagram labels: PySpark shell, RDD, Shell, Python Shell. The snippets below are listed in the order they appear.]

r = sc.parallelize([])
srdd = ShareableRdd(r)

RDD contents:

{'ak': 'Alaska'}
{'ca': 'California'}

curl -XPOST /sessions/0/statement {'code': srdd.get('ak')}

states = SharedRdd('host/sessions/0', 'srdd')
states.get('ak')
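The slides name ShareableRdd and SharedRdd without showing their code. Purely as an illustration of the idea, the sketch below uses a plain dict in place of the RDD and has the client side turn a lookup into the statement that would be posted to the session. Everything beyond the two class names and the get() call is an assumption, including the `statement_for_get` helper and the use of the `/statements` path from page 32.

```python
import json


class ShareableRdd:
    """Server side: wraps key/value data and answers lookups by key."""

    def __init__(self, pairs):
        # A plain dict stands in for the RDD of (key, value) pairs.
        self._data = dict(pairs)

    def get(self, key):
        return self._data.get(key)


class SharedRdd:
    """Client side: turns a lookup into a statement for a remote session."""

    def __init__(self, session_url, name):
        self.session_url = session_url
        self.name = name

    def statement_for_get(self, key):
        """Return (url, JSON body) for looking up `key` in the remote session."""
        code = f"{self.name}.get({key!r})"
        return f"{self.session_url}/statements", json.dumps({"code": code})


if __name__ == "__main__":
    srdd = ShareableRdd([("ak", "Alaska"), ("ca", "California")])
    print(srdd.get("ak"))
    states = SharedRdd("host/sessions/0", "srdd")
    print(states.statement_for_get("ak"))
```

The point of the split is that any client able to make an HTTP request can read the shared data, whatever language it is written in.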

Page 79:

SECURITY

• SSL support
• Persistent sessions
• Kerberos

Page 80:

SPARK MAGIC

• From Microsoft
• Python magics for working with remote Spark clusters
• Open source: https://github.com/jupyter-incubator/sparkmagic

Page 81:

FUTURE

• Move to ext repo?
• Security
• iPython / Jupyter backends and file format
• Shared named RDD / contexts?
• Share data
• Spark specific, language generic, both?
• Leverage Hue 4

https://issues.cloudera.org/browse/HUE-2990

Page 82:

LIVY’S CHEAT SHEET

• Open source: https://github.com/cloudera/hue/tree/master/apps/spark/java
• Read about it: http://gethue.com/spark/
• Scala, Java, Python, R
• Type introspection for visualization
• YARN-cluster or local modes
• Code snippets / compiled
• REST API
• Pluggable backends
• Magic keywords
• Failure resilient
• Security

Page 83:

BEDANKT! (THANK YOU!)

TWITTER
@gethue

USER GROUP
hue-user@

WEBSITE
http://gethue.com

LEARN
http://learn.gethue.com