Spark Compute as a Service at Paypal with Prabhu Kasinathan


Transcript of Spark Compute as a Service at Paypal with Prabhu Kasinathan

Page 1: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Prabhu Kasinathan, PayPal, Inc.

Spark Compute as a Service @ Paypal

Page 2: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Scale

Page 3: Spark Compute as a Service at Paypal with Prabhu Kasinathan

PayPal Scale: Business

$354B Total Payment, up 28% YoY

$102B Mobile Payment, up 55% YoY

6.1B Total Transactions, up 24% YoY

2.0B Mobile Transactions, up 43% YoY

• One of the world’s largest internet payment companies

• 203M+ active accounts in 200 markets around the world

• PayPal platform includes Braintree, Venmo, Paydiant, PP Credit and Xoom

Page 4: Spark Compute as a Service at Paypal with Prabhu Kasinathan

PayPal Scale: Core Data Platform

70+ PB Data

40,000+ YARN Jobs Per Day

15+ Hadoop Clusters

5+ Compute

Page 5: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Spark on Yarn

Page 6: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Spark on YARN Deployment

[Diagram: Batch Edge Nodes, Interactive Edge Nodes, Job]

Page 7: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Challenges

Page 8: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Challenges: Administrators

• Need extensive support and maintenance for CLI

• Need to deploy entire stack of software

• Need to sync configurations across systems

• Need extensive testing of jobs before any upgrade

[Diagram: Batch Edge Nodes, Interactive Edge Nodes, Job]

Page 9: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Challenges: Developers

• Not REST-friendly

• No low-latency/sub-second execution

• No cache sharing across jobs

• No modularity or easy restartability

[Diagram: Batch Edge Nodes, Interactive Edge Nodes, Job]

Page 10: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Challenges: Analysts/Scientists

• No easy way to run interactive applications

• No multi-tenancy support or private workspace

• No direct Spark SQL execution

• No Kerberos integration

[Diagram: Batch Edge Nodes, Interactive Edge Nodes, Job]

Page 11: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Challenges: Operations/Security

• Different ways of executing jobs and different coding standards

• No uniform logging, monitoring and alerting

• Limited audit and control

• No statement-level history or metrics

[Diagram: Batch Edge Nodes, Interactive Edge Nodes, Job]

Page 12: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS

Spark Compute Platform

Page 13: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Where we started!

[Diagram: Batch Edge Nodes, Interactive Edge Nodes, Job]

Page 14: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding REST Job Server

[Diagram: Batch, Interactive, Job, Job Server]

Page 15: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding HA and Enhancing Livy

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server]

PayPal Livy Version

✓ Multi-Nodes High Availability

✓ Kerberos Authentication Changes

✓ SQL Interpreter

✓ Session Manager Enhancements

✓ Session GC Improvements

✓ Plug-in Logger

✓ YARN Poll Re-architecture

✓ Multiple Spark Versions Support

✓ White/Black list User Authentication

✓ Dockers

✓ HBase Support

✓ Flink/Beam Support
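As a rough illustration of what a REST job server changes for batch jobs, the sketch below builds a batch submission request in the style of the open-source Apache Livy `POST /batches` endpoint. The host name, jar path, class name and Spark setting are hypothetical, and PayPal's modified Livy may expose a different interface; the request is only constructed here, not sent.

```python
import json
from urllib.request import Request

# Hypothetical Livy endpoint; not taken from the slides.
LIVY_URL = "http://livy.example.com:8998"


def build_batch_request(app_file, class_name=None, args=None, conf=None,
                        base_url=LIVY_URL):
    """Build (but do not send) a POST /batches request for a Livy server."""
    payload = {"file": app_file}
    if class_name:
        payload["className"] = class_name
    if args:
        payload["args"] = list(args)
    if conf:
        payload["conf"] = dict(conf)
    return Request(
        url=f"{base_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Illustrative values only.
req = build_batch_request(
    "hdfs:///apps/example/wordcount.jar",
    class_name="com.example.WordCount",
    args=["/data/in", "/data/out"],
    conf={"spark.executor.memory": "2g"},
)
# To actually submit: urllib.request.urlopen(req)
```

With this shape, a client needs only HTTP access to the job server rather than a locally installed Spark stack on an edge node.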

Page 16: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Livy API and Utilities

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]

Batch Utilities

✓ startSparkBatch

✓ stopSparkBatch

✓ listSparkBatch

✓ startSparklingWater

✓ stopSparklingWater

✓ startSparkSql

✓ stopSparkSql

✓ startSparkSession

✓ execSparkFile

✓ execSparkCode

✓ stopSparkSession

✓ listSparkSession

✓ livy-spark.jar

Interactive Utilities

✓ livy-spark
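The slide names the utilities but not how they are implemented. One plausible reading, sketched below, is that the session utilities are thin wrappers over Apache Livy-style session endpoints. The function `livy_call` and the mapping itself are assumptions for illustration, not PayPal's code.

```python
def livy_call(utility, session_id=None, code=None, kind="spark"):
    """Return (HTTP method, path, JSON body or None) for a utility name.

    Assumed mapping onto Apache Livy-style endpoints; the slides do not
    say how the utilities are actually implemented.
    """
    if utility == "startSparkSession":
        return ("POST", "/sessions", {"kind": kind})
    if utility == "listSparkSession":
        return ("GET", "/sessions", None)
    if utility in ("execSparkCode", "stopSparkSession"):
        if session_id is None:
            raise ValueError(f"{utility} needs a session_id")
        if utility == "execSparkCode":
            return ("POST", f"/sessions/{session_id}/statements",
                    {"code": code})
        return ("DELETE", f"/sessions/{session_id}", None)
    raise ValueError(f"unknown utility: {utility}")


# A typical interactive flow: start a session, run code, stop the session.
flow = [
    livy_call("startSparkSession"),
    livy_call("execSparkCode", session_id=0, code="sc.parallelize(range(10)).sum()"),
    livy_call("stopSparkSession", session_id=0),
]
for method, path, body in flow:
    print(method, path, body)
```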

Page 17: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Zeppelin

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]

Page 18: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Jupyter

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]

Page 19: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding SQL Client

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]
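For direct Spark SQL execution through a job server, a client can post the query as a statement and poll until it reaches a terminal state. The sketch below assumes an Apache Livy-style statement body that accepts `"kind": "sql"` (available upstream from Livy 0.5.0) and the upstream statement states; PayPal's own SQL interpreter may behave differently, and the table name is hypothetical.

```python
import json

# Statement states after which polling can stop (Apache Livy naming).
TERMINAL_STATES = {"available", "error", "cancelled"}


def sql_statement_body(query):
    """JSON body for POST /sessions/{id}/statements running a SQL query."""
    return json.dumps({"kind": "sql", "code": query})


def is_finished(statement):
    """True once a statement response reports a terminal state."""
    return statement.get("state") in TERMINAL_STATES


# Hypothetical table name, for illustration only.
body = sql_statement_body("SELECT count(*) FROM payments")
```

A SQL client built this way would repeatedly fetch the statement and call `is_finished` on the response before reading its output.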

Page 20: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Sparkling Water

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water]

Page 21: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Spark Logger

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log]

Page 22: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Livy Logger

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log]

Page 23: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Search Engine

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log, Indexing, Search]
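The slides do not show what the loggers emit. As a minimal sketch of statement-level records that a search engine could index, each executed statement might be written as one JSON line. Every field name below is hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class StatementEvent:
    """One executed statement; all field names are illustrative."""
    user: str
    session_id: int
    statement_id: int
    code: str
    state: str
    started_ms: int
    finished_ms: int

    def to_log_line(self):
        """Serialize to a single JSON line with a derived duration."""
        record = asdict(self)
        record["duration_ms"] = self.finished_ms - self.started_ms
        return json.dumps(record, sort_keys=True)


event = StatementEvent("alice", 12, 3, "df.count()", "available", 1000, 1450)
line = event.to_log_line()
```

One record per statement keeps history and metrics queryable by user, session or state without parsing free-form driver logs.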

Page 24: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Monitoring/Alerting

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log, Indexing, Search, Metrics]

Page 25: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Adding Standard Tools

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log, Indexing, Search, Metrics, History Server]

Page 26: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Building SCaaS: Here you go! The SCaaS Framework

[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log, Indexing, Search, Metrics, History Server]

Page 27: Spark Compute as a Service at Paypal with Prabhu Kasinathan

SCaaS

Benefits

Page 28: Spark Compute as a Service at Paypal with Prabhu Kasinathan

SCaaS Benefits

Administrators

✓ Less maintenance on CLI

✓ Deploy software stack only on Job Server

✓ Configurations in one place

✓ Easy platform/software upgrade

Developers

✓ REST-friendly and Docker-friendly

✓ Low-latency/sub-second execution

✓ Sharing cache across jobs

✓ Modularity and easy restartability

Analysts/Scientists

✓ User-friendly interactive applications

✓ Multi-tenancy and private workspace

✓ Direct Spark SQL execution

✓ Kerberos support

Operations/Security

✓ Standardized coding and unified execution

✓ Uniform logging, monitoring and alerting

✓ Fine-grained audit

✓ Complete statement-level history and metrics

Page 29: Spark Compute as a Service at Paypal with Prabhu Kasinathan

SCaaS Demo

Page 30: Spark Compute as a Service at Paypal with Prabhu Kasinathan

Thank You

Follow Us: https://www.paypal-engineering.com

@paypaleng