Spark Streaming Resiliency (Bay Area Spark Meetup)
prasanna-padmanabhan
Agenda
● Background
● Use cases for Real Time Stream Processing
● Motivations for Spark
● Creating Chaos
● Spark Streaming Primer
● Deployment Setup
● Injecting Chaos in Spark
● Future
Scale at Netflix
● 400 Billion events per day
● 8 Million events/sec during peak
● Numerous types of events (UI events, play events, impression events, etc.)
What do we do with it?
● Event logs are captured into Hadoop (EMR)
● Run ETL jobs using Hive/Presto to
○ Provide input to pre-compute recommendations
○ User behavior analysis
○ Data analysis and Reporting
Use Cases for Stream Processing
● Faster identification of data anomalies and regressions
○ Example: a bad iPhone push
Motivations for Spark
● Popular compute engine for batch processing
● Widely used for offline experimentation at Netflix
● Improves agility with interactive queries
(Figure: Interactive Experimenter’s Notebook)
Motivations for Spark
Single platform to build batch and real-time applications
(Diagram labels: S3, Micro Services, Batch Data, Streaming Data, Spark, Spark Streaming, Recommender Systems)
Chaos Monkey Approach
● Simulate failures by randomly killing components
● Failures inevitably happen when least desired
● Lather, Rinse, Repeat!
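The kill-and-verify loop described above can be sketched in a few lines. This is only an illustration, not the tooling used in the talk: the component names, the `chaos_round` and `recover` helpers, and the in-memory "up/down" flags are all assumptions standing in for real processes and real recovery mechanisms.

```python
import random

def chaos_round(components, rng):
    """Mark one randomly chosen live component as down and return its name."""
    alive = [name for name, up in components.items() if up]
    if not alive:
        return None
    victim = rng.choice(alive)
    components[victim] = False
    return victim

def recover(components):
    """Stand-in for the resiliency mechanism under test: bring everything back up."""
    for name in components:
        components[name] = True

# Hypothetical component names, loosely mirroring a Spark standalone deployment.
components = {"driver": True, "master": True, "worker": True, "executor": True}
rng = random.Random(42)  # fixed seed so a failing run can be replayed

killed = []
for _ in range(5):  # lather, rinse, repeat
    victim = chaos_round(components, rng)
    killed.append(victim)
    recover(components)
    assert all(components.values()), f"system did not recover after killing {victim}"
```

Seeding the random generator is a deliberate choice here: a random kill that exposes a bug is only useful if the same sequence can be reproduced.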
Spark Components
(Diagram: a Spark Driver and a Cluster Manager (Mesos, YARN, Standalone) in front of Worker Nodes; each Worker Node hosts an Executor running Tasks)
● Spark Driver: main program, DAG scheduler
● Cluster Manager: resource allocation
● Spark Worker: runs the worker process and monitors executors
How does streaming work?
● Data streams are processed in batches
● Each batch is processed in Spark
● Results are pushed out in batches
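The batching model above can be illustrated without Spark at all. The following is a minimal pure-Python sketch, assuming a hypothetical stream of `(timestamp, value)` pairs and a one-second batch interval; `micro_batches` and `process_batch` are illustrative names, not Spark APIs. Unlike a real streaming engine, this sketch emits nothing for intervals that received no data.

```python
from collections import defaultdict

def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-width time windows."""
    batches = defaultdict(list)
    for timestamp, value in events:
        batches[int(timestamp // batch_interval)].append(value)
    # Emit batches in time order, one list per interval that received data.
    return [batches[key] for key in sorted(batches)]

def process_batch(batch):
    """Stand-in for the per-batch job: here, just a sum."""
    return sum(batch)

# Hypothetical stream: (seconds since start, value)
stream = [(0.2, 1), (0.9, 2), (1.1, 3), (2.5, 4), (2.7, 5)]
results = [process_batch(b) for b in micro_batches(stream, batch_interval=1.0)]
print(results)  # [3, 3, 9]
```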
Application Details
● Process subset of UI Events from Kafka
● Compute aggregate metrics
● Publish metrics to Atlas
● Spark 1.2.0
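The first three steps (select a subset of events, aggregate, publish) might look roughly like the sketch below for a single batch. It deliberately uses no Kafka or Atlas client: the event fields (`type`, `device`), the metric naming scheme, and the list used as a sink are all assumptions made for illustration.

```python
from collections import Counter

def aggregate_batch(events, wanted_types):
    """Keep only the wanted event types and count events per (type, device)."""
    counts = Counter()
    for event in events:
        if event["type"] in wanted_types:
            counts[(event["type"], event["device"])] += 1
    return counts

def publish(counts, sink):
    """Stand-in for a metrics publisher: append one record per aggregate."""
    for (event_type, device), n in sorted(counts.items()):
        sink.append({"name": f"events.{event_type}", "device": device, "value": n})

# Hypothetical batch of UI events.
batch = [
    {"type": "click", "device": "iphone"},
    {"type": "click", "device": "iphone"},
    {"type": "scroll", "device": "tv"},
    {"type": "click", "device": "tv"},
]
sink = []
publish(aggregate_batch(batch, wanted_types={"click"}), sink)
```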
Standalone Cluster Manager
● Provides resource management and resiliency
● All in one package
○ Built-in, easy to deploy
○ Troubleshoot issues with a single team (Databricks)
Stream Resiliency
● Streaming application continues to run
● Partial data loss during failure is acceptable
Driver Resiliency (Client Mode)
(Diagram: Master, three Workers, and a Client hosting the Driver)
./spark-submit --deploy-mode "client"
● When the driver is killed, the entire application is killed
Driver Resiliency (Cluster Mode) (with supervise)
(Diagram: Master, three Workers, Client, and Driver)
./spark-submit --deploy-mode "cluster" --supervise
● Driver runs in a worker
● When the driver is killed, it is started in a new worker
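The two submission styles differ only in their flags, which a small helper can make explicit. This is a sketch, not a recommended launcher: `spark_submit_args` is a hypothetical function, and the master URL, jar path, and class name in the usage lines are placeholders. The only `spark-submit` options used are `--master`, `--deploy-mode`, `--class`, and `--supervise`.

```python
import shlex

def spark_submit_args(master_url, app_jar, main_class,
                      deploy_mode="client", supervise=False):
    """Build a spark-submit argument list for client or cluster deploy mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unknown deploy mode: {deploy_mode}")
    if supervise and deploy_mode != "cluster":
        # With the driver inside the submitting client, nothing is left to restart it.
        raise ValueError("--supervise only applies in cluster deploy mode")
    args = ["./spark-submit", "--master", master_url,
            "--deploy-mode", deploy_mode, "--class", main_class]
    if supervise:
        args.append("--supervise")
    args.append(app_jar)
    return args

# Placeholder values; substitute the real master URL, class, and jar.
client_cmd = spark_submit_args("spark://master-host:7077", "app.jar", "com.example.Main")
cluster_cmd = spark_submit_args("spark://master-host:7077", "app.jar", "com.example.Main",
                                deploy_mode="cluster", supervise=True)
print(" ".join(shlex.quote(a) for a in cluster_cmd))
```

Returning a list rather than a single string keeps the result safe to pass to `subprocess.run` without shell quoting issues.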
Master Resiliency (Multi Master)
(Diagram: Active Master, Standby Master, three Workers, and a Client)
● When the active master is killed, the standby becomes active
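The promotion of a standby can be modelled as a toy state machine. Standalone-mode high availability normally delegates leader election to an external coordination service; the sketch below replaces that with a simple ordered list, which is an assumption made purely for illustration. `MasterPool` and the master names are hypothetical.

```python
class MasterPool:
    """Toy model of one active master plus standbys, promoted in list order."""

    def __init__(self, names):
        self.alive = list(names)

    @property
    def active(self):
        """The first live master is treated as the active one."""
        return self.alive[0] if self.alive else None

    def kill(self, name):
        """Remove a master; if it was active, the next standby takes over."""
        self.alive.remove(name)

pool = MasterPool(["master-1", "master-2"])
before = pool.active
pool.kill("master-1")  # inject the failure
after_failover = pool.active
```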
Worker Resiliency
(Diagram: Master, Workers, Client, Driver, and Executor)
● Executor runs as a child process of the Worker
● When a Worker is killed, its Driver and Executor are also killed
● Worker is relaunched
● Driver and Executor are also relaunched
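Because the driver and executor are children of the worker, killing the worker takes them down with it. The sketch below models that parent/child relationship in memory. It is a simplification under stated assumptions: the `Worker` class and `relaunch` helper are hypothetical, and everything is relaunched on the same worker, whereas a real cluster may place the driver elsewhere.

```python
class Worker:
    """Toy worker that owns the processes it launched."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.children = set()

    def launch(self, proc):
        if not self.running:
            raise RuntimeError(f"{self.name} is down")
        self.children.add(proc)

    def kill(self):
        """Stop the worker; its children die with it and are returned as lost."""
        self.running = False
        lost, self.children = self.children, set()
        return lost

def relaunch(worker, lost):
    """Bring the worker back, then restart whatever it was hosting."""
    worker.running = True
    for proc in sorted(lost):
        worker.launch(proc)

worker = Worker("worker-1")
worker.launch("driver")
worker.launch("executor")
lost = worker.kill()
children_while_down = set(worker.children)
relaunch(worker, lost)
```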
Executor Resiliency
(Diagram: Master, three Workers, Client, Driver, and Executors)
● When an executor is killed, it is relaunched
● Tasks in flight are rescheduled
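Rescheduling in-flight tasks amounts to reassigning whatever the dead executor was running. A minimal sketch follows, assuming a hypothetical task-to-executor map and a round-robin placement policy; this is not how Spark's scheduler chooses placements, only an illustration of the idea.

```python
def reschedule_in_flight(assignments, failed_executor, live_executors):
    """Return a new assignment map with the failed executor's tasks moved elsewhere."""
    if not live_executors:
        raise RuntimeError("no live executors to reschedule onto")
    orphaned = sorted(t for t, e in assignments.items() if e == failed_executor)
    updated = dict(assignments)  # leave the caller's map untouched
    for i, task in enumerate(orphaned):
        updated[task] = live_executors[i % len(live_executors)]
    return updated

# Hypothetical task and executor names.
assignments = {"task-0": "executor-A", "task-1": "executor-B", "task-2": "executor-A"}
after = reschedule_in_flight(assignments, "executor-A", ["executor-B", "executor-C"])
```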
Future
● Lambda Architecture
● Operational Enhancements
○ Dynamic scaling
○ Additional Spark instrumentation
We are hiring!
● http://bit.ly/persinfra (Senior Software Engineer - Personalization Infra)