Extending Spark Streaming to Support Complex Event Processing


Transcript of Extending Spark Streaming to Support Complex Event Processing

Page 1: Extending Spark Streaming to Support Complex Event Processing

Name: Byungjin Kim, Ohchan Kwon

Extending Spark Streaming to Support Complex Event Processing

Affiliation: Samsung Electronics

October 27, 2015

Page 2: Extending Spark Streaming to Support Complex Event Processing

Agenda

• Background

• Motivation

• Streaming SQL

• Auto Scaling

• Future Work

Page 3: Extending Spark Streaming to Support Complex Event Processing

Background - Spark

• Apache Spark is a high-performance data processing platform

• Use a distributed memory cache (RDD*)

• Process batch and stream data on a common platform

• Developed by 700+ contributors

3x faster than Storm (Stream, 30 Nodes)

100x faster than Hadoop (Batch, 50 Nodes, Clustering Alg.)

*RDD: Resilient Distributed Datasets

Page 4: Extending Spark Streaming to Support Complex Event Processing

Background – Spark RDD

• Resilient Distributed Datasets

• RDDs are immutable

• Transformations are lazy operations which build an RDD’s lineage graph

• E.g.: map, filter, join, union

• Actions launch a computation to return a value or write data

• E.g.: count, collect, reduce, save

• The scheduler builds a DAG of stages to execute

• If a task fails, Spark re-runs it on another node as long as its stage’s parents are still available
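To make the transformation/action split concrete, here is a minimal sketch (data and names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("rdd-basics").setMaster("local[*]"))

    // Transformations are lazy: nothing executes yet, only the lineage graph grows.
    val words  = sc.parallelize(Seq("spark", "storm", "spark", "hadoop"))
    val pairs  = words.map(w => (w, 1))      // transformation
    val counts = pairs.reduceByKey(_ + _)    // transformation

    // An action makes the scheduler build a DAG of stages and run it.
    counts.collect().foreach(println)        // action
    sc.stop()
  }
}
```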

Page 5: Extending Spark Streaming to Support Complex Event Processing

Background - Spark Modules

• Spark Streaming

• Divide stream data into micro-batches

• Compute the batches with fault tolerance

• Spark SQL

• Support SQL to handle structured data

• Provide a common way to access a variety of data sources (Hive, Avro, Parquet, ORC, JSON, and JDBC)

[Diagram: a live data stream enters Spark Streaming and is divided into batches of X seconds; Spark processes each batch and emits results. A DStream is the sequence of these per-batch RDDs, transformed with map, reduce, count, etc.]
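As a concrete illustration of the micro-batch model, a minimal Spark Streaming word count; the socket source and 2-second interval are illustrative:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object MicroBatchWordCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("micro-batch").setMaster("local[2]")
    // Every 2-second slice of the live stream becomes one RDD in the DStream.
    val ssc = new StreamingContext(conf, Seconds(2))

    val lines  = ssc.socketTextStream("localhost", 9999)
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)
    counts.print()   // runs once per micro-batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```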

Page 6: Extending Spark Streaming to Support Complex Event Processing

Motivation – Support CEP in Spark

• Spark by itself is not enough to support CEP*

• It does not support a continuous query language to process stream data

• It does not support auto-scaling to elastically allocate resources

• We have addressed these issues:

• Extend Intel’s Streaming SQL package

• Improve performance by optimizing time-based windowed aggregation

• Support query chains by implementing “Insert Into” queries

• Implement elastic-seamless resource allocation

• Can automatically scale in/out

*Complex Event Processing

Page 7: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL

Page 8: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL - Intel

• Streaming SQL is a third-party library on Spark Packages

• Built on top of Spark Streaming and Spark SQL Catalyst

• Manipulate stream data like static structured data in a database

• Process queries continuously

• Support time-based windowed join/aggregation queries

http://spark-packages.org/package/Intel-bigdata/spark-streamingsql

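A minimal usage sketch, assuming the package exposes a StreamSQLContext with registerDStreamAsTable and sql entry points as its documentation describes; those names are assumptions and may differ by version, so they are left commented:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.{Seconds, StreamingContext}

case class Event(word: String, num: Int)

object StreamingSqlSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("streaming-sql").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(1))
    val sql  = new SQLContext(ssc.sparkContext)

    val events = ssc.socketTextStream("localhost", 9999)
      .map(_.split(","))
      .map(a => Event(a(0), a(1).toInt))

    events.print()   // placeholder output; the commented lines show the SQL path

    // Assumed spark-streamingsql API (names may differ):
    // val streamSql = new StreamSQLContext(ssc, sql)
    // streamSql.registerDStreamAsTable(streamSql.createSchemaDStream(events), "t_kafka")
    // streamSql.sql("SELECT word, COUNT(*) FROM t_kafka GROUP BY word").print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```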

Page 9: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL - Plans

[Diagram: a Streaming SQL query is compiled into an optimized logical plan and executed as a Schema DStream.]

• Logical Plan

• Modify the Streaming SQL optimizer

• Add a windowed aggregate logical plan

• Physical Plan

• Add a windowed aggregate physical plan that calls the new WindowedStateDStream class

• Implement new expression functions (Count, Sum, Average, Min, Max, Distinct)

• Develop a FixedSizedAggregator for efficiency, a concept borrowed from IBM InfoSphere Streams

[Diagram: Samsung's stream plan extensions layered on Intel's logical and physical plans.]

Page 10: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL – Windowed State

• WindowedStateDStream class

• Modify the StateDStream class to support windowed computing

• Add an inverse update function to remove the contribution of old values

• Add a filter function to remove obsolete keys

[Diagram: Intel's windowed DStream recomputes all elements in the window at each slide (time 1 … time 5); Samsung's Windowed State DStream computes only the delta, applying entering elements with Update and retiring leaving elements with InvUpdate. The per-key state is an Array[AggregateFunction], one Update/InvUpdate/State triple each for Count, Sum, Avg, Min, and Max.]
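The same incremental idea is visible in Spark's public API: reduceByKeyAndWindow with an inverse reduce function applies entering elements and "un-applies" leaving ones instead of rescanning the whole window. A minimal sketch (source and durations are illustrative):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object IncrementalWindowCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("inc-window").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(1))
    ssc.checkpoint("/tmp/inc-window")   // required when using an inverse function

    val words = ssc.socketTextStream("localhost", 9999).flatMap(_.split(" "))

    val counts = words.map((_, 1)).reduceByKeyAndWindow(
      (a: Int, b: Int) => a + b,   // Update: elements entering the window
      (a: Int, b: Int) => a - b,   // InvUpdate: elements leaving the window
      Seconds(20),                 // window size
      Seconds(1)                   // slide interval
    )
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```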

Page 11: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL – Fixed Sized Aggregator

• Fixed-Sized Aggregator* of IBM InfoSphere Streams

• Use fixed sized binary tree

• Maintain leaf nodes as a circular buffer using front and back pointers

• Very efficient for non-invertible expressions (e.g., Min, Max)

• Example - Min Aggregator

[Diagram: leaf values such as 4, 7, 3, 2, and 9 occupy the circular buffer of leaves; each internal node stores the minimum of its two children, so the root always holds the window minimum as elements enter and expire.]

*K. Tangwongsan, M. Hirzel, S. Schneider, and K.-L. Wu, “General Incremental Sliding-Window Aggregation,” in VLDB, 2015.
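A minimal sketch of the idea for Min, in the spirit of the cited paper; Samsung's actual FixedSizedAggregator implementation may differ:

```scala
// Fixed-size binary tree over a circular buffer of leaves (sketch).
final class FixedSizedMinAggregator(capacity: Int) {
  require((capacity & (capacity - 1)) == 0, "capacity must be a power of two")
  private val Identity = Int.MaxValue              // neutral element for min
  private val tree = Array.fill(2 * capacity)(Identity)
  private var front = 0                            // oldest leaf slot
  private var back  = 0                            // next free leaf slot
  private var size  = 0

  // Recompute pairwise minima on the leaf-to-root path of one changed leaf.
  private def fixUp(leaf: Int): Unit = {
    var i = (capacity + leaf) / 2
    while (i >= 1) { tree(i) = math.min(tree(2 * i), tree(2 * i + 1)); i /= 2 }
  }

  def insert(v: Int): Unit = {                     // Update: newest element enters
    require(size < capacity, "window full")
    tree(capacity + back) = v; fixUp(back)
    back = (back + 1) % capacity; size += 1
  }

  def evictOldest(): Unit = {                      // InvUpdate: oldest element leaves
    require(size > 0, "window empty")
    tree(capacity + front) = Identity; fixUp(front)
    front = (front + 1) % capacity; size -= 1
  }

  def min: Int = tree(1)                           // root holds the window minimum
}
```

Matching the slide's example: after inserting 4, 7, 3 the root is 3; evicting the oldest (4) leaves 3; inserting 2 makes it 2.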

Page 12: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL – Aggregate Functions

• Windowed Aggregate Functions

• Add an invUpdate function

• Reduce the number of objects for efficient serialization

• Implement Fixed-Sized Aggregator for Min, Max functions

Aggregate Function | State | Update | InvUpdate
Count | countValue: Long | Increase countValue | Decrease countValue
Sum | sumValue: Any | Add to sumValue | Subtract from sumValue
Average | sumValue: Any, countValue: Long | Increase countValue, add to sumValue | Decrease countValue, subtract from sumValue
Min | fat: FixedSizedAggregator | Insert minimum into fat | Remove the oldest from fat
Max | fat: FixedSizedAggregator | Insert maximum into fat | Remove the oldest from fat
CountDistinct | distinctMap: mutable.HashMap[Row, Long] | Increase count of distinct value in map | Decrease count of distinct value in map
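A hypothetical shape for these functions; the names below are illustrative, not the project's actual classes:

```scala
// Windowed aggregates with an inverse update (sketch).
trait WindowedAggregate[V, S, R] {
  def init: S
  def update(state: S, value: V): S      // element enters the window
  def invUpdate(state: S, value: V): S   // element leaves the window
  def result(state: S): R
}

object CountAgg extends WindowedAggregate[Any, Long, Long] {
  def init = 0L
  def update(s: Long, v: Any)    = s + 1
  def invUpdate(s: Long, v: Any) = s - 1
  def result(s: Long)            = s
}

object AvgAgg extends WindowedAggregate[Double, (Double, Long), Double] {
  def init = (0.0, 0L)
  def update(s: (Double, Long), v: Double)    = (s._1 + v, s._2 + 1)
  def invUpdate(s: (Double, Long), v: Double) = (s._1 - v, s._2 - 1)
  def result(s: (Double, Long))               = if (s._2 == 0) 0.0 else s._1 / s._2
}
```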

Page 13: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL – Insert Into

• Support the “Insert Into” query

• Implement the Data Sources API

• Implement the insert function in the Kafka relation

• Modify some physical plans to support streaming

• Modify physical planning strategies to assign a specific plan

• Convert the current RDD to a DataFrame and then insert it

• Support query chaining

• The result of a query can be reused by multiple queries

Example queries (each table is backed by a Kafka topic):

INSERT INTO TABLE Example1 SELECT duid, time FROM ParsedTable

INSERT INTO TABLE Example2 SELECT duid, COUNT(*) FROM Example1 GROUP BY duid

[Diagram: Kafka → ParsedTable → Example1 → Example2 → Kafka. The InsertIntoTable logical plan becomes an InsertIntoStreamSource / StreamExecutedCommand physical plan that writes a DataFrame through the Kafka relation.]
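A minimal sketch of the underlying idea: convert each batch's RDD to a DataFrame and write it through a Kafka producer. The topic, broker address, and row type are illustrative:

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.DStream

case class ResultRow(duid: String, time: Long)

object InsertIntoKafkaSketch {
  def writeStream(results: DStream[ResultRow], sqlContext: SQLContext): Unit = {
    import sqlContext.implicits._
    results.foreachRDD { rdd =>
      val df = rdd.toDF()                   // RDD -> DataFrame, as on the slide
      df.toJSON.foreachPartition { part =>  // one producer per partition
        val props = new Properties()
        props.put("bootstrap.servers", "broker:9092")   // illustrative
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        part.foreach(json => producer.send(new ProducerRecord("Example1", json)))
        producer.close()
      }
    }
  }
}
```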

Page 14: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL – Evaluation

• Experimental Environment

• Processing node: Intel i5 2.67 GHz, 4 cores, 4 GB RAM per node

• Spark cluster: 7 nodes

• Kafka cluster: 3 nodes

• Test Query:

SELECT t.word, COUNT(t.word), SUM(t.num), AVG(t.num), MIN(t.num), MAX(t.num)

FROM (SELECT * FROM t_kafka) OVER (WINDOW 'x' SECONDS, SLIDE '1' SECONDS) AS t

GROUP BY t.word

• Default settings:

100 EPS (events per second), 100 keys, 20 sec window, 1 sec slide, 10 executors, 10 reducers

[Diagram: a Test Agent produces events into the Kafka cluster; the Spark cluster consumes them and is monitored through the Spark UI.]

Page 15: Extending Spark Streaming to Support Complex Event Processing

Streaming SQL – Evaluation

• Test results

• Show low processing delays despite heavy event loads or large windows

• Memory usage still needs optimization

[Charts: processing delay (ms) and memory usage (KB) versus window size (20-100 sec) and versus event rate (10-10,000 EPS), comparing the Intel and Samsung implementations.]

Page 16: Extending Spark Streaming to Support Complex Event Processing

Auto Scaling

Page 17: Extending Spark Streaming to Support Complex Event Processing

Time-varying Event Rate in Real World

• Spark Streaming

• Data can be ingested from many sources such as Kafka, Flume, Twitter, etc.

• Live input data streams are divided into batches, which are then processed

• In streaming applications, the event rate may change frequently over time

• How can this be dealt with?

[Charts: event rate and batch processing time over time; when the event rate spikes, batches can no longer be processed in real time.]

Page 18: Extending Spark Streaming to Support Complex Event Processing

Spark AS-IS

• Spark currently supports dynamic resource allocation on Yarn (SPARK-3174) and coarse-grained Mesos (SPARK-6287)

• The existing dynamic resource allocation is not optimized for streaming

• Even when the event rate decreases, executors may not be removed due to the scheduling policy

• It is difficult to determine appropriate configuration parameters

• Backpressure (SPARK-7398)

• Enables Spark Streaming to control the receiving rate dynamically to handle bursty input streams

• But it is not a real-time solution

[Diagram: the driver's task queue dispatches tasks to Executor #1 … Executor #n.]

• Upper/lower bounds on the number of executors

• spark.dynamicAllocation.minExecutors

• spark.dynamicAllocation.maxExecutors

• Scale-out condition: spark.dynamicAllocation.schedulerBacklogTimeout < time a task stays in the task queue

• Scale-in condition: spark.dynamicAllocation.executorIdleTimeout < time an executor stays idle
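For reference, a minimal configuration sketch of this existing mechanism; the values are illustrative:

```scala
import org.apache.spark.SparkConf

object DynamicAllocationConf {
  val conf = new SparkConf()
    .set("spark.dynamicAllocation.enabled", "true")
    .set("spark.shuffle.service.enabled", "true")                  // required for dynamic allocation
    .set("spark.dynamicAllocation.minExecutors", "2")              // lower bound
    .set("spark.dynamicAllocation.maxExecutors", "20")             // upper bound
    .set("spark.dynamicAllocation.schedulerBacklogTimeout", "1s")  // scale-out trigger
    .set("spark.dynamicAllocation.executorIdleTimeout", "60s")     // scale-in trigger
}
```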

Page 19: Extending Spark Streaming to Support Complex Event Processing

Elastic-seamless Resource Allocation

• Goal

• Allocate resources to streaming applications dynamically as the rate of incoming events varies over time

• Enable the applications to meet real-time deadlines

• Utilize resources efficiently

• Cloud Architecture

[Diagram: cloud architecture with Flint* managing Spark applications.]

*Flint: our Spark job manager

Page 20: Extending Spark Streaming to Support Complex Event Processing

Spark Deployment Architecture

• Spark currently supports three cluster managers

• Standalone, Apache Mesos, Hadoop YARN

• Spark on Mesos

• Fine-grained mode

• Each Spark task runs as a separate Mesos task

• The launching overhead is large, so this mode is not suitable for streaming

• Coarse-grained mode

• Launches only one long-running Spark task on each Mesos machine

• Cannot scale out beyond the number of Mesos machines

• Cannot control per-executor resources; only the total resource amount is configurable

• Both modes have problems that prevent achieving our goal

Page 21: Extending Spark Streaming to Support Complex Event Processing

Flint Architecture Overview

[Diagram: Flint runs on top of a Mesos cluster (Slave 1 … Slave n). For each application, it deploys a dedicated YARN cluster (NM 1 … NM n) on the Mesos slaves, and each Spark application runs its executors (EXEC 1 … EXEC n) inside its own YARN cluster. Flint itself consists of a REST Manager, a Scheduler, an Application Manager with per-application Application Handlers, a Deploy Manager, and a Log Manager.]

• Supporting services: Marathon, Zookeeper, ETCD, HDFS

Spark on on-demand YARN

Page 22: Extending Spark Streaming to Support Complex Event Processing

Flint Architecture: Job Submission

• Job submission process

1. Request deployment of a dockerized YARN cluster via Marathon

2. Launch the ResourceManager and NodeManagers for the Spark driver/executors

3. Check the ResourceManager status

4. Submit the Spark job and check the job status

5. Watch the Spark driver status and get the Spark endpoint

[Diagram: three stages on the Mesos cluster: bare slaves → a YARN cluster (NM 1 … NM n) is created → the Spark job is submitted and executors (EXEC 1 … EXEC n) launch inside the YARN cluster.]
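Flint's REST calls are internal, but step 1 can be sketched against Marathon's public /v2/apps endpoint; the app definition and host names below are illustrative, not Flint's actual payload:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object DeployYarnViaMarathon {
  def main(args: Array[String]): Unit = {
    val payload =
      """{
        |  "id": "/yarn-nm-app1",
        |  "instances": 2,
        |  "cpus": 4, "mem": 4096,
        |  "container": { "type": "DOCKER",
        |                 "docker": { "image": "example/yarn-nodemanager" } }
        |}""".stripMargin

    // Ask Marathon to deploy the dockerized YARN NodeManagers.
    val conn = new URL("http://marathon:8080/v2/apps")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setRequestProperty("Content-Type", "application/json")
    conn.setDoOutput(true)
    conn.getOutputStream.write(payload.getBytes(StandardCharsets.UTF_8))
    println(s"Marathon responded: ${conn.getResponseCode}")
  }
}
```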

Page 23: Extending Spark Streaming to Support Complex Event Processing

Flint Architecture: Scale-out

• Scale-out process

1. Request an increase in the number of NodeManager instances via Marathon

2. Launch the new NodeManager

3. Scale out the Spark executor

[Diagram: a new NodeManager (NM 2) is requested via Marathon and launched in the YARN cluster; a new Spark executor (EXEC 2) is then scaled out onto it.]

Page 24: Extending Spark Streaming to Support Complex Event Processing

Flint Architecture: Scale-in

• Scale-in process

1. Get the executors' info and select the Spark victims

2. Inactivate the Spark victims and kill them after n x batch interval

3. Get the YARN victims and decommission their NodeManagers via the ResourceManager

4. Get the Mesos task IDs of the YARN victims and kill those Mesos tasks

[Diagram: the Spark executor (EXEC 2) is scaled in first; its NodeManager (NM 2) is then removed from the YARN cluster.]

Page 25: Extending Spark Streaming to Support Complex Event Processing

Flint Architecture: Auto-scaling

• Auto-scaling mechanism

• Real-time constraint

• Batch processing time < batch interval

• Scale-out condition

• α x batch interval < batch processing delay

• Scale-in condition

• β x batch interval > batch processing delay

( 0 < β < α ≤ 1 )

[Diagram: batch processing time plotted against the batch interval over time; exceeding α x batch interval triggers scale-out, and falling below β x batch interval triggers scale-in.]
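Flint's actual controller is not shown here, but the rule above reduces to a small decision function; this sketch just restates the slide's thresholds:

```scala
object AutoScaler {
  sealed trait Action
  case object ScaleOut extends Action
  case object ScaleIn  extends Action
  case object Stay     extends Action

  // Scale out when the batch processing delay exceeds alpha * batchInterval;
  // scale in when it falls below beta * batchInterval (0 < beta < alpha <= 1).
  def decide(processingDelayMs: Long, batchIntervalMs: Long,
             alpha: Double, beta: Double): Action = {
    require(0 < beta && beta < alpha && alpha <= 1.0)
    if (processingDelayMs > alpha * batchIntervalMs) ScaleOut
    else if (processingDelayMs < beta * batchIntervalMs) ScaleIn
    else Stay
  }
}
```

For a 3-second batch interval with α = 0.9 and β = 0.4, a 2.8 s delay triggers scale-out and a 1.0 s delay triggers scale-in.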

Page 26: Extending Spark Streaming to Support Complex Event Processing

• Timeout & Retry for Killed Executor

• Although Spark executor is killed by admin, Spark re-tries to connect it.

Spark Issues: Scale-in (1/3)

Scale-in

Scheduling Delay

Event Rate

Processing Delay

Driver Exec*

scale-in

Exec Task

RDD req

1st try

2nd try

3rd try

fail

Processing Delay

• spark.shuffle.io.maxRetries = 3 • spark.shuffle.io.retryWait = 5s

Page 27: Extending Spark Streaming to Support Complex Event Processing

• Timeout & Retry for Killed Executor

• Advertisement for killed executor

• Inactivate executor before scale-in

• Inactivate executor and kill them after n x batch interval

• During inactivating stage, the executor does not receive any task from driver.

Spark Issues: Scale-in (2/3)

Driver Exec*

scale-in

Exec Task

RDD req

fail

advertise

Fast Fail

Page 28: Extending Spark Streaming to Support Complex Event Processing

Spark Issues: Scale-in (3/3)

• Data locality

• The Spark scheduler considers data locality

• Process/node/rack locality

• spark.locality.wait = 3s

• However, the locality wait is a big burden for streaming applications

• The waiting time should be much smaller than the batch interval

• Otherwise, the stream may not be processed in real time

• Thus, Flint overrides “spark.locality.wait” with a very small value if the application type is streaming (see the sketch below)

[Diagram: the scheduler waits up to the 3 s default for a preferred slot before allocating resources; with the override, tasks are scheduled after at most ~10 ms.]
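A sketch of that override; the 10 ms value is illustrative:

```scala
import org.apache.spark.SparkConf

object LocalityTuning {
  // For streaming applications, trade data locality for immediate scheduling.
  def tuneForStreaming(conf: SparkConf, isStreaming: Boolean): SparkConf =
    if (isStreaming) conf.set("spark.locality.wait", "10ms") else conf
}
```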

Page 29: Extending Spark Streaming to Support Complex Event Processing

Spark Issues: Scale-out (1/2)

• Data localization

• During the scale-out process, a new YARN container localizes some data (e.g., the Spark jar and the application jar)

• Data localization incurs high disk I/O, which degrades co-located applications

• Solution

• Prefetch if possible

• Disk isolation, SSDs

[Diagram: a new application's localization occupies the disk (50-70 MB/sec), degrading the existing application's performance.]

Page 30: Extending Spark Streaming to Support Complex Event Processing

Spark Issues: Scale-out (2/2)

• RDD replication

• When a new executor is added, a receiver requests replication of received RDD blocks to that executor

• The new executor is not ready to receive RDD blocks during its initialization

• In this situation, the receiver waits until the new executor is ready

• Solution

• The receiver does not replicate RDD blocks to a new executor during its initialization

• E.g., spark.blockmanager.peer.bootstrap = 10s

[Diagram: Kafka feeds a receiver task; with a replication factor of 2, each received RDD block is replicated from Executor 1 to a peer executor, and a newly added Executor 2 is excluded until it finishes bootstrapping.]

Page 31: Extending Spark Streaming to Support Complex Event Processing

Demo (1/2)

• Demo environment

• Data source: Kafka

• Auto-scaling parameters

• α = 0.9, β = 0.4

• SQL query

SELECT t.word, COUNT(DISTINCT t.num), SUM(t.num), AVG(t.num), MIN(t.num), MAX(t.num)
FROM (SELECT * FROM t_kafka) OVER (WINDOW 300 SECONDS, SLIDE 3 SECONDS) AS t
GROUP BY t.word

[Diagram: a producer feeds numbered events into Kafka; a receiver task ingests them, worker tasks compute windowed counts (e.g., 0: 23, 1: 34, 2: 41 …), and Flint monitors the application and triggers scale-in/out.]

Page 32: Extending Spark Streaming to Support Complex Event Processing

Demo (2/2)

Page 33: Extending Spark Streaming to Support Complex Event Processing

Future Work

• Streaming SQL

• Apply the Tungsten framework

• Small-sized objects, code generation, GC-free memory management

• Share RDDs

• Map stream tables to RDDs

• Auto Scaling

• Support dynamic resource allocation for batch (non-streaming) applications

• Support a unified scheduler for heterogeneous applications

• Batch, streaming, ad-hoc

Page 34: Extending Spark Streaming to Support Complex Event Processing

THANK YOU!

https://github.com/samsung/spark-cep