Livy: A REST Web Service For Apache Spark

Livy: A REST Web Service for Spark Pravin Mittal, Microsoft Anand Iyer, Cloudera

Transcript of Livy: A REST Web Service For Apache Spark

Page 1: Livy: A REST Web Service For Apache Spark

Livy: A REST Web Service for Spark

Pravin Mittal, Microsoft; Anand Iyer, Cloudera

Page 2: Livy: A REST Web Service For Apache Spark

Reduce friction to use Spark while maintaining all its power and flexibility

Page 3: Livy: A REST Web Service For Apache Spark

What is Livy?

A service that manages long-running Spark contexts in your cluster
• Open source, Apache licensed
• REST-based interface
• Lets you manage multiple Spark contexts
• Fine-grained job submission
• Retrieve job results over REST asynchronously or synchronously
• Client APIs in Java, Scala, and soon in Python
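As a rough illustration of the REST-based interface, the Python sketch below builds (without sending) the two requests a client would typically issue: one to start a session, which is Livy's handle for a Spark context, and one to submit a snippet of code to it. The host and port in `LIVY_URL` are assumptions, and `wait_for_result` with its `fetch_statement` callable is a hypothetical helper that turns the asynchronous polling model into a synchronous call.

```python
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"  # assumption: Livy on its default port, local host
JSON_HEADERS = {"Content-Type": "application/json"}


def create_session_request(kind="pyspark", base_url=LIVY_URL):
    """Build POST /sessions, which asks Livy to start a new Spark context."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions", data=body, headers=JSON_HEADERS, method="POST"
    )


def submit_statement_request(session_id, code, base_url=LIVY_URL):
    """Build POST /sessions/{id}/statements, which runs code in that context."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers=JSON_HEADERS,
        method="POST",
    )


def wait_for_result(fetch_statement, poll_seconds=1.0, max_polls=60):
    """Poll until the statement finishes, then return its output.

    fetch_statement is a hypothetical callable that returns the decoded JSON
    of GET /sessions/{id}/statements/{statement_id}.
    """
    for _ in range(max_polls):
        statement = fetch_statement()
        if statement["state"] in ("available", "error", "cancelled"):
            return statement.get("output")
        time.sleep(poll_seconds)
    raise TimeoutError("statement did not finish in time")
```

Passing a request to `urllib.request.urlopen` would actually send it. An asynchronous client would instead keep the statement id returned by the server and check on it later.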

Page 4: Livy: A REST Web Service For Apache Spark

What is Livy?

[Diagram: Clients connect over HTTP to the Livy Server, which manages Context 1 and Context 2 in a cluster (managed by YARN, Mesos, etc.). Each context has its own Driver and Executors.]

Page 5: Livy: A REST Web Service For Apache Spark

Spark on Azure HDInsight

Fully Managed Service
• 100% open source Apache Spark and Hadoop bits
• Latest releases of Spark
• Fully supported by Microsoft and Hortonworks
• 99.9% Azure Cloud SLA; 24/7 Managed Service
• Certifications: PCI, ISO 27018, SOC, HIPAA, EU-MC

Optimized for experimentation and development
• Jupyter Notebooks (Scala, Python, automatic data visualizations)
• IntelliJ plugin (job submission, remote debugging)
• ODBC connector for Power BI, Tableau, Qlik, SAP, Excel, etc.

Page 6: Livy: A REST Web Service For Apache Spark

Make Spark Simple - Integrated with Azure Ecosystem

• Microsoft R Server: multi-threaded math libraries and transparent parallelization in R Server mean handling up to 1000x more data at up to 50x faster speeds than open source R. It is based on open source R and does not require any change to R scripts

• Azure Data Lake Store: HDFS for the cloud, optimized for massive throughput, ultra-high capacity, low latency, secure ACL support
• Azure Data Factory orchestrates Spark ETL pipeline
• Power BI connector for Spark for rich visualization. New in Power BI is a streaming connector allowing you to publish real-time events from Spark Streaming directly to Power BI.
• Event Hubs connector as a data source for Spark Streaming
• Azure SQL Data Warehouse & HBase connector for fast & scalable storage

Page 7: Livy: A REST Web Service For Apache Spark

Jupyter-Spark Integration via Livy
• Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program
• Thousands of Spark clusters in production providing feedback to further improve the experience

https://github.com/jupyter-incubator/sparkmagic

Page 8: Livy: A REST Web Service For Apache Spark

Architectural Advantages of Jupyter integration via Livy
• Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server
• Multi-language support; the Python, Scala and R kernels are equally feature-rich
• Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters
• Easy integration with any Python library for data science or visualization, like Pandas or Plotly

Page 9: Livy: A REST Web Service For Apache Spark

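[Diagram: HDI Cluster behind a Gateway. Components shown: head nodes HN0 and HN1, YARN RM, Livy Server with Metadata (ID, State, ...), worker node WN0 with the Driver, worker nodes WN1 and WN2 with an Executor each, and a Job Status flow.]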

Page 10: Livy: A REST Web Service For Apache Spark

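[Diagram: HDI Cluster behind a Gateway. Components shown: head nodes HN0 and HN1, YARN RM, Livy Server with Metadata (ID, State, UniqueTag, ApplicationID), worker node WN0 with the Driver, worker nodes WN1 and WN2 with an Executor each, a Job Status flow, and ZooKeeper nodes ZK0, ZK1 and ZK2 holding Livy Metadata (ID, UniqueTag, AppID).]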
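The slide only names the metadata fields, so the following Python toy model is an assumption about how they might be used: session records (ID, UniqueTag, AppID) are written to an external store, and a restarted server reads them back to reattach to applications that are still running. `InMemoryStateStore` stands in for the ZooKeeper nodes, and every class and method name here is hypothetical rather than part of Livy.

```python
from dataclasses import asdict, dataclass


@dataclass
class SessionRecord:
    session_id: int
    unique_tag: str  # tag attached at submission time, usable to look up the app
    app_id: str      # YARN application id once it is known


class InMemoryStateStore:
    """Hypothetical stand-in for the ZooKeeper ensemble (ZK0, ZK1, ZK2)."""

    def __init__(self):
        self._data = {}

    def save(self, record):
        self._data[record.session_id] = asdict(record)

    def load_all(self):
        return [SessionRecord(**fields) for fields in self._data.values()]


class ToyLivyServer:
    """Keeps sessions in memory and mirrors them to the external store."""

    def __init__(self, store):
        self.store = store
        self.sessions = {}

    def create_session(self, session_id, unique_tag, app_id):
        record = SessionRecord(session_id, unique_tag, app_id)
        self.sessions[session_id] = record
        self.store.save(record)  # persist so the record outlives this process
        return record

    def recover(self):
        """After a restart, reload every persisted session record."""
        for record in self.store.load_all():
            self.sessions[record.session_id] = record
        return sorted(self.sessions)
```

In this model a second `ToyLivyServer` created over the same store starts with no sessions and regains them after `recover()`, which is the property the external metadata is meant to provide.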

Page 11: Livy: A REST Web Service For Apache Spark

DEMO

Page 12: Livy: A REST Web Service For Apache Spark

Resources

https://github.com/aggFTW/sparksummit2016

Page 13: Livy: A REST Web Service For Apache Spark

Livy Architecture Highlights

Page 14: Livy: A REST Web Service For Apache Spark

Manage multiple independent Spark contexts

[Diagram: Clients A, B, C and D connect over HTTP to the Livy Server, which manages Contexts 1, 2 and 3 in a cluster (managed by YARN, Mesos, etc.). Client A uses Context 1, Clients B and C both use Context 2, and Client D uses Context 3. Each context has its own Driver and Executors.]

Page 15: Livy: A REST Web Service For Apache Spark

User Impersonation

[Diagram: two clients connect over HTTP to the Livy Server, which manages Context 1 and Context 2 in a cluster (managed by YARN, Mesos, etc.). Each context has its own Driver and Executors.]

• Client authenticates as user A: Context 1 runs as user A and only has access to files and resources that user A has access to.
• Client authenticates as user B: Context 2 runs as user B and only has access to files and resources that user B has access to.
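One way a client can request this behaviour over REST is the `proxyUser` field in the body of `POST /sessions`. The Python sketch below only builds that body; the helper name `session_payload_for` is hypothetical, and whether the request is honoured depends on how authentication and impersonation are configured on the server.

```python
import json


def session_payload_for(authenticated_user, kind="spark"):
    """Build a POST /sessions body asking Livy to run the context as this user."""
    if not authenticated_user:
        raise ValueError("an authenticated user name is required")
    # proxyUser names the user the Spark context should run as.
    return json.dumps({"kind": kind, "proxyUser": authenticated_user}, sort_keys=True)
```

Two clients authenticated as different users would send bodies that differ only in `proxyUser`, and would get separate contexts with separate permissions.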

Page 16: Livy: A REST Web Service For Apache Spark

Livy Client API

Page 17: Livy: A REST Web Service For Apache Spark

Client API: Job Interface

Page 18: Livy: A REST Web Service For Apache Spark

Client API: Submitting a job

Page 19: Livy: A REST Web Service For Apache Spark

Client API Architecture

[Diagram: a Client Application exchanges data with the Livy Server, which manages Context 1 (Driver and Executors) in a cluster (managed by YARN, Mesos, etc.). Labels on the flows: Serialized Closure and Serialized Result Data.]

Page 20: Livy: A REST Web Service For Apache Spark

Community

http://livy.io