Livy: A REST Web Service For Apache Spark
Pravin Mittal, Microsoft; Anand Iyer, Cloudera
Reduce friction to use Spark while maintaining all its power and flexibility
What is Livy?
A service that manages long-running SparkContexts in your cluster
• Open source, Apache licensed
• REST-based interface
• Lets you manage multiple SparkContexts
• Fine-grained job submission
• Retrieve job results over REST asynchronously or synchronously
• Client APIs in Java, Scala and soon in Python
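To make the REST-based interface concrete, here is a minimal Python sketch of the three requests a client would typically issue: create a session (a long-running SparkContext), submit a code snippet to it, and fetch the snippet's result. The endpoint paths follow Livy's documented REST API; the host name is hypothetical, and the requests are only built here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical server address; 8998 is Livy's default port.
LIVY_URL = "http://livy.example.com:8998"


def json_request(method, path, payload=None):
    """Build (but do not send) a JSON request against the Livy server."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    return Request(
        LIVY_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_session(kind="spark"):
    # POST /sessions asks Livy to start a new SparkContext of the given kind.
    return json_request("POST", "/sessions", {"kind": kind})


def submit_statement(session_id, code):
    # POST /sessions/{id}/statements runs a code snippet in that context.
    return json_request(
        "POST", f"/sessions/{session_id}/statements", {"code": code}
    )


def get_statement(session_id, statement_id):
    # GET the statement to read its state and, once finished, its output.
    return json_request(
        "GET", f"/sessions/{session_id}/statements/{statement_id}"
    )
```

Passing any of these objects to `urllib.request.urlopen` would send it to a real server; the response is JSON describing the session or statement.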
What is Livy?
[Diagram: clients connect over HTTP to the Livy Server, which manages Context 1 and Context 2 on a cluster (managed by YARN, Mesos, etc.); each context has its own driver and executors.]
Spark on Azure HDInsight
Fully managed service
• 100% open source Apache Spark and Hadoop bits
• Latest releases of Spark
• Fully supported by Microsoft and Hortonworks
• 99.9% Azure Cloud SLA; 24/7 managed service
• Certifications: PCI, ISO 27018, SOC, HIPAA, EU-MC
Optimized for experimentation and development
• Jupyter Notebooks (Scala, Python, automatic data visualizations)
• IntelliJ plugin (job submission, remote debugging)
• ODBC connector for Power BI, Tableau, Qlik, SAP, Excel, etc.
Make Spark Simple - Integrated with Azure Ecosystem
• Microsoft R Server: multi-threaded math libraries and transparent parallelization in R Server mean handling up to 1000x more data at up to 50x faster speeds than open source R. It is based on open source R and does not require any changes to R scripts.
• Azure Data Lake Store: HDFS for the cloud, optimized for massive throughput, ultra-high capacity, low latency, secure ACL support
• Azure Data Factory orchestrates Spark ETL pipelines
• Power BI connector for Spark for rich visualization. New in Power BI is a streaming connector that lets you publish real-time events from Spark Streaming directly to Power BI.
• Event Hubs connector as a data source for Spark Streaming
• Azure SQL Data Warehouse & HBase connectors for fast & scalable storage
Jupyter-Spark Integration via Livy
• Sparkmagic is an open source library that Microsoft is incubating under the Jupyter Incubator program
• Thousands of Spark clusters in production providing feedback to further improve the experience
https://github.com/jupyter-incubator/sparkmagic
Architectural Advantages of Jupyter Integration via Livy
• Run Spark code completely remotely; no Spark components need to be installed on the Jupyter server
• Multi-language support; the Python, Scala and R kernels are equally feature-rich
• Support for multiple endpoints; you can use a single notebook to start multiple Spark jobs in different languages and against different remote clusters
• Easy integration with any Python library for data science or visualization, like Pandas or Plotly
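The multi-endpoint, multi-language point can be illustrated with a short sketch. It assumes each notebook language maps to one of Livy's session kinds (`spark` for Scala, `pyspark` for Python, `sparkr` for R) and that every cluster exposes its own Livy URL. The cluster URLs below are hypothetical, and this is not how Sparkmagic itself is implemented; it only shows that one client can plan sessions against several clusters at once.

```python
# Livy session kind for each notebook language.
SESSION_KINDS = {"scala": "spark", "python": "pyspark", "r": "sparkr"}


def session_request(endpoint, language):
    """Return (url, payload) for creating a Livy session on one cluster."""
    kind = SESSION_KINDS[language.lower()]
    return endpoint.rstrip("/") + "/sessions", {"kind": kind}


# Hypothetical cluster endpoints, for illustration only.
plan = [
    session_request("https://cluster-a.example.com/livy/", "Scala"),
    session_request("https://cluster-b.example.com/livy", "Python"),
    session_request("https://cluster-b.example.com/livy", "R"),
]

for url, payload in plan:
    print("POST", url, payload)
```

Because nothing but HTTP leaves the notebook server, no Spark installation is needed on the machine that runs this code.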
[Diagram: HDI cluster. Gateway; head nodes HN0 and HN1 with YARN RM and Livy Server (metadata: ID, State, ...); worker nodes WN0 (driver), WN1 and WN2 (executors); job status.]
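Since the server tracks a state for each piece of work, a client usually learns the outcome by polling. The sketch below shows one way to do that in Python. The state names are the ones I believe Livy's REST API reports for statements and batches, so check them against your Livy version; `fetch` is a hypothetical callable that performs the GET and returns the decoded JSON, which keeps the loop independent of any HTTP library.

```python
import time

# States after which the server will not change the result any further.
STATEMENT_DONE = {"available", "error", "cancelled"}
BATCH_DONE = {"success", "dead", "killed", "error"}


def wait_for_state(fetch, done_states, interval=1.0, max_polls=60,
                   sleep=time.sleep):
    """Call `fetch()` repeatedly until its "state" is in `done_states`.

    `fetch` is any zero-argument callable returning a dict such as
    {"state": "running", ...}. Raises TimeoutError after `max_polls` tries.
    """
    for _ in range(max_polls):
        status = fetch()
        if status.get("state") in done_states:
            return status
        sleep(interval)  # wait before asking the server again
    raise TimeoutError("work did not finish within the polling budget")
```

A synchronous client simply blocks on `wait_for_state`; an asynchronous one can run the same loop on a background thread and hand back a future.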
[Diagram: the same HDI cluster with ZooKeeper nodes ZK0, ZK1 and ZK2 added. Livy Server metadata: ID, State, UniqueTag, ApplicationID. Livy metadata stored alongside the ZooKeeper nodes: ID, UniqueTag, AppID.]
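One plausible reading of this diagram is that the Livy Server persists a small record per session (ID, UniqueTag, AppID) outside its own process, so that a restarted or failed-over server can find its Spark applications again. The Python sketch below is a toy model of that idea only. The in-memory store stands in for ZooKeeper, and `find_app_id_by_tag` is a hypothetical lookup against the resource manager; none of this is Livy's actual code.

```python
import json


class InMemoryStateStore:
    """Stand-in for the ZooKeeper ensemble (ZK0-ZK2); not a real ZK client."""

    def __init__(self):
        self._nodes = {}

    def save(self, record):
        # Records are stored as JSON strings, as they would be in a ZK node.
        self._nodes[str(record["id"])] = json.dumps(record)

    def load_all(self):
        return [json.loads(raw) for raw in self._nodes.values()]


def new_record(session_id, unique_tag, app_id=None):
    # app_id stays None until the resource manager has assigned one.
    return {"id": session_id, "unique_tag": unique_tag, "app_id": app_id}


def recover_sessions(store, find_app_id_by_tag):
    """Rebuild the session list on a freshly started server.

    Sessions whose application ID was never recorded are matched to their
    application through the unique tag they were submitted with.
    """
    recovered = []
    for record in store.load_all():
        if record["app_id"] is None:
            record["app_id"] = find_app_id_by_tag(record["unique_tag"])
        recovered.append(record)
    return recovered
```

The unique tag matters because a server can crash after submitting an application but before learning its ID; the tag is then the only link between the stored record and the running application.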
DEMO
Resources
https://github.com/aggFTW/sparksummit2016
Livy Architecture Highlights
Manage multiple independent SparkContexts
[Diagram: Clients A, B, C and D connect over HTTP to the Livy Server. Client A uses Context 1, Clients B and C share Context 2, and Client D uses Context 3; each context has its own driver and executors on a cluster (managed by YARN, Mesos, etc.).]
User Impersonation
[Diagram: two clients connect over HTTP to the Livy Server on a cluster (managed by YARN, Mesos, etc.). One client authenticates as user A and uses Context 1; the other authenticates as user B and uses Context 2.]
• Context 1 runs as user A and only has access to the files and resources that user A has access to.
• Context 2 runs as user B and only has access to the files and resources that user B has access to.
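At the REST level, impersonation shows up as an extra field in the session-creation request. The Python sketch below builds such a request body, assuming the `proxyUser` field of Livy's `POST /sessions` API. Whether the server honors the field depends on its authentication and impersonation settings, which are not shown here; the user names are the ones from the slide.

```python
import json


def session_payload(kind, proxy_user=None):
    """JSON body for POST /sessions, optionally impersonating `proxy_user`."""
    body = {"kind": kind}
    if proxy_user is not None:
        # The context will run as this user, with that user's permissions.
        body["proxyUser"] = proxy_user
    return json.dumps(body, sort_keys=True)


print(session_payload("pyspark", "userA"))
print(session_payload("spark", "userB"))
```

Each resulting context is then limited to the files and resources its user can reach, which is what keeps Context 1 and Context 2 isolated from each other.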
Livy Client API
Client API: Job Interface
Client API: Submitting a Job
Client API Architecture
[Diagram: the client application sends a serialized closure to the Livy Server, which runs it in Context 1 (driver and executors) on a cluster (managed by YARN, Mesos, etc.) and returns serialized result data.]
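The code on the Client API slides did not survive extraction, so what follows is not the presenters' code. It is a small Python analogy of the shape the architecture diagram suggests: a job is an object with a single method that receives a context, the client submits it and immediately gets a handle back, and the result returns as serialized data. All class names are hypothetical, the "remote" context is a local thread, and the job itself is not serialized in this sketch.

```python
import json
from concurrent.futures import ThreadPoolExecutor


class Job:
    """A unit of work that runs against a context and returns a result."""

    def call(self, context):
        raise NotImplementedError


class SumOfSquares(Job):
    def __init__(self, n):
        self.n = n

    def call(self, context):
        # A real job would use the SparkContext; this one computes locally.
        return sum(i * i for i in range(1, self.n + 1))


class LocalClient:
    """Hypothetical stand-in for a Livy client; runs jobs on one thread."""

    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._context = {"name": "Context 1"}  # placeholder for a SparkContext

    def submit(self, job):
        # Returns a Future at once; the result comes back as JSON text,
        # mirroring the "serialized result data" arrow in the diagram.
        return self._pool.submit(
            lambda: json.dumps(job.call(self._context))
        )

    def stop(self):
        self._pool.shutdown()


client = LocalClient()
handle = client.submit(SumOfSquares(3))
print(json.loads(handle.result()))  # 1 + 4 + 9 = 14
client.stop()
```

Returning a future from `submit` is what lets the same call serve both styles mentioned earlier: block on `result()` for synchronous use, or check back later for asynchronous use.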
Community: http://livy.io