Post on 22-Jan-2018
Prabhu
PayPal, Inc.
Spark Compute as a Service @ PayPal
Scale
PayPal Scale: Business
$354B Total Payment, up 28% YoY
$102B Mobile Payment, up 55% YoY
6.1B Total Transactions, up 24% YoY
2.0B Mobile Transactions, up 43% YoY
• One of the world’s largest internet payment companies
• 203M+ active accounts in 200 markets around the world
• PayPal platform includes Braintree, Venmo, Paydiant, PP Credit and Xoom
PayPal Scale: Core Data Platform
70+ PB Data
40,000+ YARN Jobs Per Day
15+ Hadoop Clusters
5+ Compute
Spark on YARN: Deployment
[Diagram: Batch edge nodes, Interactive edge nodes, Job]
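For readers unfamiliar with the edge-node workflow, here is a minimal sketch of the command-line submission it implies. It only builds the `spark-submit` argument list and does not run anything. The jar path, class name, and argument are hypothetical; the deck does not show the actual commands used at PayPal.

```python
import shlex
from typing import Dict, List, Optional


def build_spark_submit(app: str,
                       main_class: Optional[str] = None,
                       conf: Optional[Dict[str, str]] = None,
                       app_args: Optional[List[str]] = None) -> List[str]:
    """Build the argv for a cluster-mode spark-submit on YARN."""
    cmd = ["spark-submit", "--master", "yarn", "--deploy-mode", "cluster"]
    if main_class:
        cmd += ["--class", main_class]
    # Sort the settings so the generated command is deterministic.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    cmd.append(app)
    cmd += list(app_args or [])
    return cmd


# Hypothetical jar path, class name and argument, for illustration only.
argv = build_spark_submit(
    "/apps/example/etl.jar",
    main_class="com.example.DailyEtl",
    conf={"spark.executor.memory": "4g"},
    app_args=["2018-01-22"],
)
print(shlex.join(argv))
```

Every edge node that runs such a command needs the Spark client, its configuration, and cluster credentials installed locally.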
Challenges: Administrators
• Need extensive support and maintenance for CLI
• Need to deploy entire stack of software
• Need to sync configurations across systems
• Need extensive testing of jobs before any upgrade
Challenges: Developers
• Not REST-friendly
• No low-latency/sub-second execution
• No cache sharing across jobs
• No modularity or easy restartability
Challenges: Analysts/Scientists
• No user-friendly interactive applications
• No multi-tenancy support or private workspace
• No direct Spark SQL execution
• No Kerberos integration
Challenges: Operations/Security
• Different job execution methods and coding standards
• No uniform logging, monitoring and alerting
• Limited audit and control
• No statement-level history or metrics
Building SCaaS
Spark Compute Platform
[Diagram: Batch edge nodes, Interactive edge nodes, Job]
Building SCaaS: Where we started!
[Diagram: Batch, Interactive, Job, Job Server]
Building SCaaS: Adding REST Job Server
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server]
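As an illustration of what a REST job server changes for a client, here is a minimal sketch of a batch submission against the upstream Apache Livy REST API (`POST /batches`). The request is only constructed, not sent. The host name, jar path, and class name are hypothetical, and PayPal's modified Livy may differ from upstream.

```python
import json
import urllib.request

# Hypothetical Livy endpoint; 8998 is the upstream default port.
LIVY_URL = "http://livy.example.com:8998"


def batch_request(file, class_name=None, args=None, conf=None):
    """Build (but do not send) a POST /batches request for Livy."""
    body = {"file": file}
    if class_name:
        body["className"] = class_name
    if args:
        body["args"] = list(args)
    if conf:
        body["conf"] = dict(conf)
    return urllib.request.Request(
        f"{LIVY_URL}/batches",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical application details, for illustration only.
req = batch_request(
    "hdfs:///apps/example/etl.jar",
    class_name="com.example.DailyEtl",
    args=["2018-01-22"],
    conf={"spark.executor.memory": "4g"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# Passing req to urllib.request.urlopen would submit the job.
```

The client needs only HTTP access to the job server; no Spark installation is required on the submitting machine.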
Building SCaaS: Adding HA and Enhancing Livy
PayPal Livy Version
✓ Multi-Node High Availability
✓ Kerberos Authentication Changes
✓ SQL Interpreter
✓ Session Manager Enhancements
✓ Session GC Improvements
✓ Plug-in Logger
✓ YARN Poll Re-architecture
✓ Multiple Spark Versions Support
✓ White/Black List User Authentication
✓ Docker
✓ HBase Support
✓ Flink/Beam Support
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]
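The deck does not describe how multi-node high availability is exposed to clients. Purely as an assumption, one simple client-side pattern is to probe a list of job server nodes in order and use the first one that responds. The node URLs and the `probe` callback below are hypothetical.

```python
from typing import Callable, Iterable


def first_healthy(endpoints: Iterable[str],
                  probe: Callable[[str], bool]) -> str:
    """Return the first endpoint whose probe succeeds.

    `probe` is caller-supplied (for example, an HTTP GET on the node);
    connection failures surface as OSError subclasses and are skipped.
    """
    for url in endpoints:
        try:
            if probe(url):
                return url
        except OSError:
            continue
    raise RuntimeError("no job server node reachable")


# Hypothetical node list and a fake probe that simulates node1 being down.
NODES = ["http://livy-node1:8998", "http://livy-node2:8998"]


def fake_probe(url: str) -> bool:
    if "node1" in url:
        raise ConnectionRefusedError("node1 is down")
    return True


chosen = first_healthy(NODES, fake_probe)
print(chosen)
```

A load balancer or virtual IP in front of the nodes would achieve the same effect without client-side logic; which approach PayPal uses is not stated.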
Building SCaaS: Adding Livy API and Utilities
Batch Utilities
✓ startSparkBatch
✓ stopSparkBatch
✓ listSparkBatch
✓ startSparklingWater
✓ stopSparklingWater
✓ startSparkSql
✓ stopSparkSql
✓ startSparkSession
✓ execSparkFile
✓ execSparkCode
✓ stopSparkSession
✓ listSparkSession
✓ livy-spark.jar
Interactive Utilities
✓ livy-spark
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]
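The deck lists the utility names but not their implementation. As a guess at how such wrappers could sit on top of a REST job server, the sketch below maps some of the names onto upstream Apache Livy endpoints. The mapping itself is an assumption; only the endpoint paths come from the upstream Livy REST API.

```python
from typing import NamedTuple, Optional, Tuple


class Endpoint(NamedTuple):
    method: str
    path: str


# ASSUMPTION: this name-to-endpoint mapping is illustrative only and is
# not taken from the deck. The paths are upstream Apache Livy routes.
UTILITY_ENDPOINTS = {
    "startSparkBatch": Endpoint("POST", "/batches"),
    "stopSparkBatch": Endpoint("DELETE", "/batches/{id}"),
    "listSparkBatch": Endpoint("GET", "/batches"),
    "startSparkSession": Endpoint("POST", "/sessions"),
    "execSparkCode": Endpoint("POST", "/sessions/{id}/statements"),
    "stopSparkSession": Endpoint("DELETE", "/sessions/{id}"),
    "listSparkSession": Endpoint("GET", "/sessions"),
}


def resolve(utility: str, resource_id: Optional[int] = None) -> Tuple[str, str]:
    """Return (HTTP method, concrete path) for a utility name."""
    endpoint = UTILITY_ENDPOINTS[utility]
    if "{id}" in endpoint.path:
        if resource_id is None:
            raise ValueError(f"{utility} needs a batch or session id")
        return endpoint.method, endpoint.path.format(id=resource_id)
    return endpoint.method, endpoint.path


print(resolve("startSparkSession"))
print(resolve("execSparkCode", 7))
```

Keeping the mapping in one table means a change in the server API touches a single place rather than every script.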
Building SCaaS: Adding Zeppelin
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]
Building SCaaS: Adding Jupyter
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]
Building SCaaS: Adding SQL Client
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS]
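The deck does not show how a SQL client talks to the job server. As a sketch under that caveat, upstream Apache Livy (from 0.5.0-incubating) accepts a statement whose `kind` is `sql`, and returns an `output` object with a `status` field. The session id, query, and sample response below are hand-written for illustration; the MIME type key inside `data` varies by interpreter.

```python
import json


def sql_statement(session_id: int, query: str):
    """Return (path, JSON body) for submitting a SQL statement to a session."""
    path = f"/sessions/{session_id}/statements"
    body = json.dumps({"kind": "sql", "code": query})
    return path, body


def statement_data(response: dict) -> dict:
    """Return the output data of a finished statement, or raise on failure."""
    if response.get("state") != "available":
        raise RuntimeError(f"statement not finished: {response.get('state')}")
    output = response["output"]
    if output["status"] != "ok":
        raise RuntimeError(f"statement failed: {output.get('evalue')}")
    return output["data"]


# Hypothetical session id and query.
path, body = sql_statement(7, "SELECT COUNT(*) FROM txns")
print(path, body)

# Hand-written sample response, not captured from a real server.
sample = {
    "id": 0,
    "state": "available",
    "output": {"status": "ok", "execution_count": 0,
               "data": {"text/plain": "42"}},
}
print(statement_data(sample))
```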
Building SCaaS: Adding Sparkling Water
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water]
Building SCaaS: Adding Spark Logger
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log]
Building SCaaS: Adding Livy Logger
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log]
Building SCaaS: Adding Search Engine
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log, Indexing, Search]
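Logs that feed an indexing and search layer are easier to query when each record is structured. The deck does not specify the log format, so the following is only a sketch using Python's standard `logging` module to emit one JSON document per record. The field names `user`, `session_id`, and `statement_id` are hypothetical.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON document."""

    # Hypothetical context fields a job server might attach to a record.
    EXTRA_FIELDS = ("user", "session_id", "statement_id")

    def format(self, record: logging.LogRecord) -> str:
        doc = {
            "ts": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        # Copy only the context fields that are present on this record.
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                doc[field] = getattr(record, field)
        return json.dumps(doc, sort_keys=True)


# Build a record by hand so the example needs no logger configuration.
record = logging.LogRecord(
    "scaas.livy", logging.INFO, "example.py", 1,
    "statement %s finished", ("42",), None,
)
record.user = "alice"
record.session_id = 7

line = JsonFormatter().format(record)
print(line)
```

In a real service the formatter would be attached to a handler, and the context fields passed through the `extra` argument of the logging calls.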
Building SCaaS: Adding Monitoring/Alerting
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log, Indexing, Search, Metrics]
Building SCaaS: Adding Standard Tools
[Diagram: Batch, Interactive, Job, Livy Grid, Job Server, Livy API, NAS, Sparkling Water, Log, Indexing, Search, Metrics, History Server]
Building SCaaS: Here you go! The SCaaS Framework
SCaaS Benefits
Administrators
✓ Less maintenance on CLI
✓ Deploy software stack only on Job Server
✓ Configurations in one place
✓ Easy platform/software upgrade
Developers
✓ REST-friendly and Docker-friendly
✓ Low-latency/sub-second execution
✓ Sharing cache across jobs
✓ Modularity and easy restartability
Analysts/Scientists
✓ User-friendly interactive applications
✓ Multi-tenancy and private workspace
✓ Direct Spark SQL execution
✓ Kerberos support
Operations/Security
✓ Standardized coding and unified execution
✓ Uniform logging, monitoring and alerting
✓ Fine-grained audit
✓ Complete statement-level history and metrics
SCaaS Demo
Thank You
Follow Us: https://www.paypal-engineering.com
@paypaleng