Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha...

39

Transcript of Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha...

Page 1: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek
Page 2: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

1!

Aljoscha Krettek @aljoscha

Big Data Spain November 17, 2016

Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics

Page 3: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What I’d Like to Talk About

2

§  Streaming Architecture and Flink

§  IoT and Event-Time based stream processing

§  Use-Case Examples

Page 4: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

3

Original creators of Apache Flink®

Providers of the dA Platform, a supported

Flink distribution

Page 5: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Intro: The Streaming Architecture

4

Page 6: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Rethinking Data Architecture

§  Better app isolation

§  Real-time reaction to events

§  Robust continuous applications

§  Process both real-time and historical data 5

Page 7: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

6

app state

app state

app state

event log

Query service

Page 8: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What is (Distributed) Streaming §  Streaming:

Computations on never-ending “streams” of data records (“events”)

§  Distributed: Computation spread across many machines

7

Your code

Your code

Your code

Your code

Page 9: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What is Stateful Streaming §  Computation and state

•  E.g., counters, windows of past events, state machines, trained ML models

§  Result depends on history of stream

§  A stateful stream processor should gives the tools to manage state •  Recover, roll back, version,

upgrade, etc 8

Your code

state

Page 10: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What is Event-Time Streaming §  Data records associated with

timestamps (time series data)

§  Processing depends on timestamps

§  An event-time stream processor should give you the tools to reason about time •  Handle streams that are out of order •  Core feature is watermarks – a clock

to measure event time

9

Your code

state

t3 t1 t2 t4 t1-t2 t3-t4

Page 11: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Recap: What is Streaming? §  Continuous processing on data that is

continuously generated §  I.e., pretty much all “big” data §  It’s all about state and time §  Flink does all of what we just saw

10

Page 12: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

IoT and Event-time Stream Processing

11

Page 13: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

12 1read.bi/1yDOQQ3

The 'Internet Of Everything' Will Generate $14.4 Trillion Of Value Over The Next Decade.1

Page 14: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Example Event Sources

13

Page 15: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

A Simple Definition

14

IoT use cases from the system’s perspective: A large number of (distributed) things generating a large amount of data.

Page 16: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Important Properties

15

§  Data is continuously produced → Stream Processing

§  Events have a timestamp that has to be considered → Event-time based processing

§  Data/Events can arrive with huge delays §  Most analyses happen on time windows

Page 17: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Remember: Streaming technology is enabling the obvious: continuous

processing on data that is continuously produced

Hint: you already have streaming data 16

Page 18: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What Is Event-Time Processing

17

13127359611 12

1 2 3 4 5 6 7 8 9 10 11 12 13 14 Processing Time

Event timestamp

Message Queue

Page 19: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

What’s The Problem?

18

13

12

7359611 12

1 2 3 4 5 6 7 8 9 10 11 12 13 14 Processing Time

Processing-Time Windows 137356

12 137 356Event-Time Windows

12

11 12

Mismatch between event time and processing time.

Page 20: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Sources of Time Mismatch §  Big Mismatch •  Network disconnects •  Slow network

§  Small Mismatch •  The nature of distributed systems •  Differing system clock time

19

Page 21: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Big Event-Time Mismatch

20

1977 1980 1983 1999 2002 2005 2015

Processing Time

Episode IV

Episode V

Episode VI

Episode I

Episode II

Episode III

Episode VII

Event Time

Page 22: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Small Event-Time Mismatch

21

Robust Stream Processing with Apache Flink®: A Simple Walkthrough http://data-artisans.com/robust-stream-processing-flink-walkthrough/

Page 23: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

22

Page 24: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

23

Page 25: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

24

Page 26: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Recap: Event-Time §  IoT use cases need event-time

processing §  Even small mismatch of event time/

processing time will lead to wrong results

25

Page 27: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Use-Case Examples

26

Page 28: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily

Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees

Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second

27

Page 29: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

King §  Challenges: •  Many games (Candy Crush, Farm Heroes, Pet

Rescue, and Bubble Witch…) •  300 million monthly unique users •  30 billion events received every day

§  Need Event-Time Based statistics

28 https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 30: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Solution: RBEA

29 https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 31: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Solution: RBEA §  Multiplexing of multiple data scientist

requests into a single Flink job §  Groovy as language for analysis scripts §  Event-time windowing

30 https://techblog.king.com/rbea-scalable-real-time-analytics-king/

Page 32: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Bouygues Telecom

31 http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

~120 users*

5 Flink Production Apps

750 TB Storage

4 billion Events/ day

2015

~300 users*

30 Flink Production Apps

2 PB Storage5

10 billion Events/ day

2016

* Users of the information system

Page 33: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Bouygues: Challenges §  Low latency & streaming fashion counters §  Massive amounts of data + bursty loads §  Reliability §  Multiple flow correlation §  Time management: •  Out of order & late events → our worst enemies •  Flexible window management

32 http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

Page 34: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

33 http://flink-forward.org/kb_sessions/a-brief-history-of-time-with-apache-flink-real-time-monitoring-and-analysis-with-flink-kafka-hb/

Page 35: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

In Summary

34

§  If you need to ask: you already have a streaming use case!

§  IoT requires Proper Time Management §  Apache Flink has done that for a long

time now*

* Since version 0.10

Page 36: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

35!

Thank you! @aljoscha @ApacheFlink @dataArtisans

Page 37: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

36

One day of hands-on Flink training One day of conference Tickets are on sale Call for Papers is already open Please visit our website: http://sf.flink-forward.org Follow us on Twitter: @FlinkForward

Page 38: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

We are hiring!data-artisans.com/careers

Page 39: Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Analytics by Aljoscha Krettek

Appendix

38