Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha...
Aljoscha Krettek @aljoscha
Big Data Spain November 17, 2016
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics
What I’d Like to Talk About
§ Streaming Architecture and Flink
§ IoT and Event-Time based stream processing
§ Use-Case Examples
Original creators of Apache Flink®
Providers of the dA Platform, a supported Flink distribution
Intro: The Streaming Architecture
Rethinking Data Architecture
§ Better app isolation
§ Real-time reaction to events
§ Robust continuous applications
§ Process both real-time and historical data
[Figure labels: app state (three times), event log, Query service]
What is (Distributed) Streaming
§ Streaming: Computations on never-ending “streams” of data records (“events”)
§ Distributed: Computation spread across many machines
[Figure labels: Your code (four times)]
What is Stateful Streaming
§ Computation and state
• E.g., counters, windows of past events, state machines, trained ML models
§ Result depends on history of stream
§ A stateful stream processor should give you the tools to manage state
• Recover, roll back, version, upgrade, etc.
[Figure labels: Your code, state]
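As a minimal sketch of what "managing state" can mean, the plain-Python example below keeps a per-key counter and supports taking a snapshot and rolling back to it. The names `KeyedCounter`, `snapshot`, and `restore` are hypothetical and are not Flink's API; Flink provides its own checkpointing mechanism, and this only illustrates the idea on a single machine.

```python
import copy
from collections import defaultdict


class KeyedCounter:
    """Hypothetical stateful operator: counts events per key."""

    def __init__(self):
        self.counts = defaultdict(int)  # the operator's state

    def process(self, key):
        # The result depends on the history of the stream, not only on this event.
        self.counts[key] += 1
        return self.counts[key]

    def snapshot(self):
        # Copy the current state (a stand-in for a checkpoint).
        return copy.deepcopy(dict(self.counts))

    def restore(self, snap):
        # Roll back to a previously taken snapshot, e.g. after a failure.
        self.counts = defaultdict(int, snap)


op = KeyedCounter()
for sensor in ["a", "b", "a"]:
    op.process(sensor)

checkpoint = op.snapshot()   # state at this point: a=2, b=1
op.process("a")              # state advances: a=3
op.restore(checkpoint)       # roll back: a is 2 again
```

After the restore, processing continues from the snapshot, so replaying the event that followed the snapshot yields the same count as before the rollback.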
What is Event-Time Streaming
§ Data records associated with timestamps (time series data)
§ Processing depends on timestamps
§ An event-time stream processor should give you the tools to reason about time
• Handle streams that are out of order
• Core feature is watermarks: a clock to measure event time
[Figure: events arriving in the order t3, t1, t2, t4 pass through "Your code" with state and are grouped into windows t1-t2 and t3-t4]
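The watermark idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not Flink's implementation: the watermark is taken to be the largest timestamp seen so far minus a fixed bound on out-of-orderness, windows are tumbling, and `WINDOW_SIZE`, `MAX_OUT_OF_ORDER`, and `run` are hypothetical names.

```python
from collections import defaultdict

WINDOW_SIZE = 10       # assumed tumbling window length, in event-time units
MAX_OUT_OF_ORDER = 3   # assumed bound on how far out of order events arrive


def run(timestamps):
    """Count events per event-time window; timestamps are given in arrival order.

    Returns (window_start, count) pairs in the order the windows fire.
    """
    windows = defaultdict(int)      # window start -> number of events
    max_ts = float("-inf")
    watermark = float("-inf")
    fired = []
    for ts in timestamps:
        start = ts - ts % WINDOW_SIZE
        if start + WINDOW_SIZE <= watermark:
            continue                # window already fired; this sketch ignores the event
        windows[start] += 1
        max_ts = max(max_ts, ts)
        watermark = max_ts - MAX_OUT_OF_ORDER   # the event-time "clock"
        # Fire every window whose end the watermark has reached.
        for s in sorted(w for w in windows if w + WINDOW_SIZE <= watermark):
            fired.append((s, windows.pop(s)))
    return fired


result = run([3, 1, 7, 12, 9, 14, 21, 25])
```

In this run the event with timestamp 9 arrives after 12 but is still counted in window 0, because the watermark has not yet passed the end of that window. The window starting at 20 stays open at the end, since the watermark never reaches 30.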
Recap: What is Streaming?
§ Continuous processing on data that is continuously generated
§ I.e., pretty much all “big” data
§ It’s all about state and time
§ Flink does all of what we just saw
IoT and Event-time Stream Processing
The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade.¹
¹ read.bi/1yDOQQ3
Example Event Sources
A Simple Definition
IoT use cases from the system’s perspective: A large number of (distributed) things generating a large amount of data.
Important Properties
§ Data is continuously produced → Stream Processing
§ Events have a timestamp that has to be considered → Event-time based processing
§ Data/Events can arrive with huge delays
§ Most analyses happen on time windows
Remember: Streaming technology is enabling the obvious: continuous processing on data that is continuously produced
Hint: you already have streaming data
What Is Event-Time Processing
[Figure: events in a message queue, each labeled with its event timestamp, plotted against a processing-time axis from 1 to 14]
What’s The Problem?
[Figure: the events grouped into processing-time windows and into event-time windows, plotted against a processing-time axis from 1 to 14]
Mismatch between event time and processing time.
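A small worked example makes the mismatch concrete. The data below is made up for illustration: each pair is (event time, arrival time), and `WINDOW` is an assumed window length. The same events are counted once per event-time window and once per processing-time (arrival) window.

```python
from collections import Counter

WINDOW = 5  # assumed window length, same unit for both clocks

# (event_time, arrival_time) pairs; hypothetical data
events = [(1, 1), (2, 2), (3, 7), (6, 6), (7, 8), (4, 11)]


def window_counts(timestamps):
    # Assign each timestamp to the tumbling window that contains it and count.
    counts = Counter(t // WINDOW * WINDOW for t in timestamps)
    return dict(sorted(counts.items()))


by_event_time = window_counts(e for e, _ in events)
by_processing_time = window_counts(a for _, a in events)
```

Events 3 and 4 happened in the first window but arrived late, so the processing-time counts put them in later windows. Only the event-time counts reflect when things actually happened.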
Sources of Time Mismatch
§ Big Mismatch
• Network disconnects
• Slow network
§ Small Mismatch
• The nature of distributed systems
• Differing system clock time
Big Event-Time Mismatch
Processing Time    Event Time
1977               Episode IV
1980               Episode V
1983               Episode VI
1999               Episode I
2002               Episode II
2005               Episode III
2015               Episode VII
Small Event-Time Mismatch
Robust Stream Processing with Apache Flink®: A Simple Walkthrough http://data-artisans.com/robust-stream-processing-flink-walkthrough/
Recap: Event-Time
§ IoT use cases need event-time processing
§ Even a small mismatch between event time and processing time will lead to wrong results
Use-Case Examples
30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second
King
§ Challenges:
• Many games (Candy Crush, Farm Heroes, Pet Rescue, and Bubble Witch…)
• 300 million monthly unique users
• 30 billion events received every day
§ Need Event-Time Based statistics
https://techblog.king.com/rbea-scalable-real-time-analytics-king/
Solution: RBEA
§ Multiplexing of multiple data scientist requests into a single Flink job
§ Groovy as language for analysis scripts
§ Event-time windowing
Bouygues Telecom
http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/
                         2015         2016
Users*                   ~120         ~300
Flink Production Apps    5            30
Storage                  750 TB       2 PB
Events/day               4 billion    10 billion
* Users of the information system
Bouygues: Challenges
§ Low latency & streaming fashion counters
§ Massive amounts of data + bursty loads
§ Reliability
§ Multiple flow correlation
§ Time management:
• Out of order & late events → our worst enemies
• Flexible window management
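One common way to deal with late events is to give each window a grace period after the watermark has passed it. The sketch below shows only the decision logic, under assumed values; `WINDOW`, `ALLOWED_LATENESS`, and `classify` are hypothetical names, and the exact boundary rules in a real stream processor may differ.

```python
WINDOW = 10           # assumed tumbling window length, in event-time units
ALLOWED_LATENESS = 5  # assumed grace period after the watermark passes a window


def classify(event_ts, watermark):
    """Decide how to treat an event, given the current watermark."""
    window_end = event_ts - event_ts % WINDOW + WINDOW
    if watermark < window_end:
        return "on_time"       # the window is still open
    if watermark < window_end + ALLOWED_LATENESS:
        return "late_update"   # the window already fired; its result can be corrected
    return "dropped"           # too late; could be routed elsewhere for inspection


examples = [classify(7, 8), classify(7, 12), classify(7, 15)]
```

A longer grace period tolerates more delay but keeps window state around longer, so the value is a trade-off between accuracy and resource use.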
In Summary
§ If you need to ask: you already have a streaming use case!
§ IoT requires Proper Time Management
§ Apache Flink has done that for a long time now*
* Since version 0.10
Thank you! @aljoscha @ApacheFlink @dataArtisans
One day of hands-on Flink training
One day of conference
Tickets are on sale
Call for Papers is already open
Please visit our website: http://sf.flink-forward.org
Follow us on Twitter: @FlinkForward
We are hiring! data-artisans.com/careers