Apache Flink® and IoT: How StatefulEvent-Time Processing ... · ApacheCon NA 2017 - Apache Flink®...
Transcript of Apache Flink® and IoT: How StatefulEvent-Time Processing ... · ApacheCon NA 2017 - Apache Flink®...
1
Aljoscha Krettek@aljoscha
ApacheCon North AmericaMay, 2017
Apache Flink® and IoT: How Stateful Event-Time Processing Enables Accurate Analytics
What I’d Like to Talk About
2
§ IoT and event-time stream processing
§ Stateful stream processing
§ Streaming architecture and Flink
3
Original creators of Apache Flink®
Providers of the dA Platform, a supported
Flink distribution
IoT and Event-time Stream Processing
4
Example Event Sources
5
A Simple Definition
6
IoT use cases from the system’s perspective:
A large number of (distributed) things continuously generating a large amount of data.
IoT: Some Insights
7
§ Data is continuously produced → Stream Processing
§ Events have a timestamp→ Event-time based processing
§ Data/Events can arrive with huge delays/out-of-order
§ Most analyses happen on time windows
What Is Event-Time Processing
8
1977 1980 1983 1999 2002 2005 2015
Processing Time
EpisodeIV
EpisodeV
EpisodeVI
EpisodeI
EpisodeII
EpisodeIII
EpisodeVII
Event Time
What Is Event-Time Processing
9
1312735961112
1234567891011121314Processing Time
Event timestamp
Message Queue
What’s The Problem?
10
13
12
735961112
1234567891011121314Processing Time
Processing-Time Windows 137356
12 137 356Event-Time Windows
12
1112
Mismatch between event time and processing time.
Sources of Time Mismatch§ Big Mismatch• Network disconnects• Slow network
§ Small Mismatch• The nature of distributed systems• Differing system clock time
11
Small Event-Time Mismatch
12
Robust Stream Processing with Apache Flink®:A Simple Walkthroughhttp://data-artisans.com/robust-stream-processing-flink-walkthrough/
13
14
15
Recap: Event-Time§ IoT use cases need event-time
processing§ Even small mismatch of event
time/processing time will lead to wrong results
16
(Stateful) Stream Processing
17
Stream Processing
18
Computation
Computations on never-ending “streams” of data records (“events”)
Distributed Stream Processing
19
Computation
Computation spread across many machines
Computation Computation
Stateful Stream Processing
20
Computation
State
State is usually partitioned by some key in the data
Stateful Stream Processing II
21
§ Result depends on history of stream§ A stateful stream processor should
gives the tools to manage state• Recover, roll back, version upgrade, etc.
22
app state
app state
app state
event log
Queryservice
Recap: Stateful Streams§ Continuous processing of data that is
continuously generated§ I.e., pretty much all “big” data§ It’s all about state and time§ Flink does all of that
23
Operational Issues
24
Operational Questions§ What happens in case of failures?§ What if I need to update my code/Flink?§ Can I re-process my data?§ How can I execute my programs?
25
Failure Handling§ JobManager High-Availability using
ZooKeeper§ Periodic checkpoints of state to
persistent storage (HDFS, S3, …)§ In case of failure: rollback to previous
consistent state
26
Savepoints§ A persistent snapshot of all state§ When starting an application, state can
be initialized from a savepoint§ In-between savepoint and restore we can
update Flink version or user code
27
Closing
28
TL;DR§ Stateful stream processing is nice 😎§ IoT use cases require proper time
management§ Apache Flink is a stateful stream
processor with plenty of nifty features
29
30
Thank you!
@aljoscha@ApacheFlink@dataArtisans
Backup Slides
31
Event-Time Processing
32
What Is Event-Time Processing
33
1977 1980 1983 1999 2002 2005 2015
Processing Time
EpisodeIV
EpisodeV
EpisodeVI
EpisodeI
EpisodeII
EpisodeIII
EpisodeVII
Event Time
What Is Event-Time Processing
34
1312735961112
1234567891011121314Processing Time
Event timestamp
Message Queue
What is Event-Time Streaming§ Events have timestamps
§ Processing depends on timestamps
§ An event-time stream processor should give you the tools to reason about time• Handle streams that are out of
order35
Your code
state
t3 t1 t2t4 t1-t2 t3-t4
Recap: Event-Time§ IoT use cases need event-time
processing§ Even small mismatch of event
time/processing time will lead to wrong results
36
History of Flink
37
A brief History of Flink
38
January ‘10 December ‘14
v0.5 v0.6 v0.7
March ‘16
Flink ProjectIncubation
Top LevelProject
v0.8 v0.10
Release1.0
ProjectStratosphere
(Flink precursor)
v0.9
April ‘14
A brief History of Flink
39
January ‘10 December ‘14
v0.5 v0.6 v0.7
March ‘16
Flink ProjectIncubation
Top LevelProject
v0.8 v0.10
Release1.0
ProjectStratosphere
(Flink precursor)
v0.9
April ‘14
The academia gap:Reading/writing papers, teaching, worrying about
thesis
Realizing this might be interesting to people
beyond academia(even more so,
actually)