Apache Flink: Streaming Done Right @ FOSDEM 2016

21
Apache Flink Streaming Done Right Till Rohrmann [email protected] @stsffap

Transcript of Apache Flink: Streaming Done Right @ FOSDEM 2016

Apache Flink Streaming Done Right

Till Rohrmann [email protected] @stsffap

What Is Apache Flink? §  Apache TLP since December

2014

§  Parallel streaming data flow runtime

§  Low latency & high throughput §  Exactly once semantics §  Stateful operations

2

Why Stream Processing? §  Most problems have streaming nature §  Stream processing gives lower latency §  Data volumes more easily tamed §  More predictable resource consumption

3

Eventstream

Counting Tweet Impressions

4

Inputstream

InputGroup

#tweets #tweets

#tweets #tweets

How Would I Do It with Flink?

5

caseclassTweet(id:Long,timestamp:Long,count:Long)valenv=StreamExecutionEnvironment.getEnvironment()valtweets:DataStream[Tweet]=env.addSource(newMyTwitterSource())valresult:DataStream[Tweet]=tweets.keyBy("id").timeWindow(Time.minutes(10)).sum("count")result.print()env.execute()

Wehavetodefinea6meframeforthe

aggrega6on

Expressive Windows

Time and count based; Event-time support; Custom windows

6

Stateful Operators §  What if a window grows too large? §  Solution: Stateful mapper with counter

7

Count tweet impressions

#42

Operator

User function

Operator state

#13 #37

Read Write

Intuitive API

8

caseclassTweet(id:Long,timestamp:Long,count:Long)valenv=StreamExecutionEnvironment.getEnvironment()valtweets:DataStream[Tweet]=env.addSource(newMyTwitterSource())valresult:DataStream[Tweet]=tweets.keyBy("id").mapWithState{(tweet,state:Option[Long])=>statematch{caseSome(counter)=>(tweet.copy(count=counter+1L),Some(counter+1L))caseNone=>(tweet,Some(1L))}}

What If a Failure Occurs?

Loss of data and incorrect count! 9

Count tweet impressions

#42

Operator

User function

Operator state

#?? #??

Read Write

We Need to Save Our State! §  Consistent snapshots of distributed data

stream and operator state

10

How Does It Work? §  Markers for checkpoints §  Injected in the data flow

11

Processing Guarantees §  At most once •  No guarantees at all

§  At least once •  Ensure that all operators

see all events §  Exactly once

12

Flink gives you all guarantees

Checkpointed Operators §  State is automatically checkpointed §  State backend is configurable

13

Barriers

Count tweet impressions

#42

Operator

User function

Operator state

#13 #37

Read Write

What About Performance?

14

Continuous streaming

Latency-bound buffering

Distributed Snapshots

High Throughput & Low Latency

With configurable throughput/latency tradeoff

What’s coming next?

15

Asynchronous Snapshots §  Taking snapshots stalls the operator §  Solution: Out of core & asynchronous snapshots

16 Async/incremental snapshots

Spill to disk

Barriers

Count tweet impressions

#42

User function

Operator state

#13 #37

Read Write

Streams with Varying Data Rate

17 time

even

ts/s

econ

d

With static resources: Provision for max. rate

Idle capacity

(1) Adjust Parallelism

18

Initial configuration

Scale Out (for load)

Scale In (save resources)

(2) Dynamic Worker Pool

19

JobManager

Resource Manager

Pool of Cluster Resources YARN/Mesos/…

Allocate

TaskManager

TaskManager Release

Declarative Queries §  StreamSQL

§  Complex event processing

20

valtabEnv=newTableEnvironment(env)tabEnv.registerStream(stream,“myStream”,(“ID”,“MEASURE”,“COUNT”))valsqlQuery=tabEnv.sql(“SELECTID,MEASUREFROMmyStreamWHERECOUNT>17”)

Sober Drunk

Car

Afoot drink

drink

enters car

walks

Raisealert

Where to Find Us?

www.flink.apache.org https://github.com/apache/flink

@ApacheFlink 21