Log Event Stream Processing In Flink Way

George T. C. Lai

Who Am I?

● BlueWell Technology
  ○ Big Data Architect
  ○ Focuses
    ■ Open DC/OS
    ■ CoreOS
    ■ Kubernetes
    ■ Apache Flink
    ■ Data Science

What Have We Done?

A Glance of Log Events

● Event Timestamp
● Compound Key
● Event Body

Brief Introduction To Apache Flink

Basic Structure

● For each Apache Flink DataStream program
  ○ Obtain an execution environment.
    ■ StreamExecutionEnvironment.getExecutionEnvironment()
    ■ Determine how timestamps are handled (time characteristic).
  ○ Load/create data sources.
    ■ Read from a file
    ■ Read from a socket
    ■ Read from built-in sources (Kafka, RabbitMQ, etc.)
  ○ Execute transformations on them.
    ■ filter, map, reduce, etc. (task chaining)
  ○ Specify where to save the results of the computations.
    ■ stdout (print)
    ■ Write to files
    ■ Write to built-in sinks (Elasticsearch, Kafka, etc.)
  ○ Trigger the program execution.

Time Handling & Time Characteristics

Log events carry their own timestamps (event time).

Windows

● The concept of windows
  ○ Cut an infinite stream into slices with finitely many elements.
  ○ Slices are based on timestamps or other criteria.
● Construction of windows
  ○ Keyed windows
    ■ An infinite DataStream is divided by both window and key.
    ■ Elements with different keys can be processed concurrently.
  ○ Non-keyed windows
● We focus on keyed windowing, with the host as the key.

Windows

● Basic structure
  ○ Key
  ○ Window assigner
  ○ Window function
    ■ reduce()
    ■ fold()
    ■ apply()

val input: DataStream[T] = ...

input
  .keyBy(<key selector>)
  .window(<window assigner>)
  .<windowed transformation>(<window function>)

Window Assigner - Global Windows

● Single per-key global window.
● Only useful if a custom trigger is specified.
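
Why a global window needs a trigger can be sketched in plain Scala. This is a toy model, not the Flink API: `GlobalWindowSketch` and `fireEvery` are hypothetical names, and the "trigger" here is assumed to fire and purge every `n` elements per key.

```scala
// Plain-Scala model (not Flink API): one never-ending window per key, plus a
// hypothetical count trigger that fires and purges every `n` elements.
object GlobalWindowSketch {
  /** Returns the batches emitted by the trigger, in firing order. */
  def fireEvery[K, V](n: Int, events: Seq[(K, V)]): Seq[(K, Seq[V])] = {
    require(n > 0, "n must be positive")
    val buffers = scala.collection.mutable.Map.empty[K, Vector[V]]
    val fired = Vector.newBuilder[(K, Seq[V])]
    for ((key, value) <- events) {
      val buffered = buffers.getOrElse(key, Vector.empty[V]) :+ value
      if (buffered.size == n) {
        fired += (key -> buffered)    // trigger fires: emit the window contents
        buffers.remove(key)           // purge, so the next batch starts empty
      } else {
        buffers.update(key, buffered) // no trigger yet: elements only accumulate
      }
    }
    fired.result()
  }
}
```

Elements that never complete a batch are never emitted in this model, which is the reason a global window on its own produces no output.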

Window Assigner - Tumbling Windows

● Defined by window size.
● Windows are disjoint.

The window assigner we adopted!!
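
A minimal plain-Scala sketch of keyed tumbling windows, assuming epoch-millisecond timestamps and windows aligned to the epoch. The `LogEvent` shape and all names below are hypothetical, and this models the idea rather than calling the Flink API.

```scala
object TumblingWindowSketch {
  // Hypothetical event shape: `timestamp` is epoch milliseconds.
  final case class LogEvent(timestamp: Long, host: String, body: String)

  /** Start of the epoch-aligned tumbling window that contains `ts`. */
  def windowStart(ts: Long, sizeMs: Long): Long =
    ts - java.lang.Math.floorMod(ts, sizeMs)

  /** Counts events per (host, window start): keyed, disjoint windows. */
  def countPerHostAndWindow(events: Seq[LogEvent], sizeMs: Long): Map[(String, Long), Int] =
    events
      .groupBy(e => (e.host, windowStart(e.timestamp, sizeMs)))
      .map { case (key, group) => key -> group.size }
}
```

Because every timestamp maps to exactly one window start, each event is counted once, which is what "disjoint" means here.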

Window Assigner - Sliding Windows

● Defined by both window size and slide interval.
● Windows may overlap.
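
The overlap can be made concrete with a small plain-Scala sketch that lists every window containing a given timestamp. It assumes epoch-aligned windows with half-open `[start, end)` bounds; `SlidingWindowSketch` and `windowsFor` are hypothetical names, not Flink API.

```scala
object SlidingWindowSketch {
  /** All [start, end) windows of length `size`, advancing by `slide`, that contain `ts`. */
  def windowsFor(ts: Long, size: Long, slide: Long): List[(Long, Long)] = {
    require(size > 0 && slide > 0, "size and slide must be positive")
    // Latest window that starts at or before ts.
    val lastStart = ts - java.lang.Math.floorMod(ts, slide)
    Iterator
      .iterate(lastStart)(_ - slide)          // walk back one slide at a time
      .takeWhile(start => start > ts - size)  // stop once the window no longer covers ts
      .map(start => (start, start + size))
      .toList
  }
}
```

With size 10 and slide 5, each timestamp falls into two windows; when the slide equals the size, the result degenerates to a single tumbling window.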

Window Assigner - Session Windows

● Defined by a gap of time.
● Window time
  ○ starts at individual time points.
  ○ ends once there has been a certain period of inactivity.
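
A simplified plain-Scala model of session windows for a single key: an event extends the current session if it arrives before the session's end (last timestamp + gap), otherwise it opens a new one. The names are hypothetical, and the handling of an event that lands exactly on the session end is a choice made for this sketch.

```scala
object SessionWindowSketch {
  /** Groups timestamps into [start, end) sessions, where end = last timestamp + gap. */
  def sessions(timestamps: Seq[Long], gap: Long): List[(Long, Long)] = {
    require(gap > 0, "gap must be positive")
    timestamps.sorted.foldLeft(List.empty[(Long, Long)]) {
      case ((start, end) :: rest, ts) if ts < end =>
        (start, math.max(end, ts + gap)) :: rest // still active: extend the session
      case (acc, ts) =>
        (ts, ts + gap) :: acc                    // inactive for at least `gap`: new session
    }.reverse
  }
}
```

Unlike tumbling and sliding windows, the boundaries here depend on the data itself, so window start and end times differ from key to key.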

Window Functions

● WindowFunction
  ○ Caches elements internally
  ○ Provides window meta information (e.g., start time, end time, etc.)
● ReduceFunction
  ○ Incremental aggregation
  ○ No access to window meta information
● FoldFunction
  ○ Incremental aggregation
  ○ No access to window meta information
● WindowFunction with ReduceFunction / FoldFunction
  ○ Incremental aggregation
  ○ Has access to window meta information
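
The trade-off between the three styles can be modeled in plain Scala. This is an illustrative sketch only: `Window`, `applyBuffered`, `reduceIncrementally`, and `reduceThenApply` are hypothetical names and not Flink's interfaces.

```scala
object WindowFunctionSketch {
  final case class Window(start: Long, end: Long)

  /** WindowFunction style: all elements are kept, then evaluated with the window metadata. */
  def applyBuffered[A, R](window: Window, elements: Seq[A])(f: (Window, Seq[A]) => R): R =
    f(window, elements)

  /** ReduceFunction style: only one running value is kept, with no window metadata. */
  def reduceIncrementally[A](elements: Seq[A])(reduce: (A, A) => A): Option[A] =
    elements.reduceOption(reduce)

  /** Combined style: reduce incrementally, then pass the single result plus metadata to `f`. */
  def reduceThenApply[A, R](window: Window, elements: Seq[A])(reduce: (A, A) => A)(
      f: (Window, A) => R): Option[R] =
    reduceIncrementally(elements)(reduce).map(result => f(window, result))
}
```

In the combined style the function `f` sees one pre-aggregated value rather than the whole buffer, so per-window state stays small while the start and end times remain available.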

Dealing with Data Lateness

● Set an allowed lateness on windows
  ○ New in Flink 1.1.0.
  ○ Elements are dropped once the watermark passes the window's end timestamp + allowedLateness.
  ○ Defaults to 0, so an event is dropped as soon as it is late.
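
The lateness rule can be sketched as a small decision function in plain Scala. This is a simplified model with hypothetical names; the exact boundary convention (whether "passes" is inclusive) is an assumption of the sketch, not a statement about Flink's internals.

```scala
object AllowedLatenessSketch {
  sealed trait Decision
  case object OnTime extends Decision       // watermark has not reached the window end yet
  case object LateAccepted extends Decision // window end passed, but still within allowed lateness
  case object Dropped extends Decision      // watermark is past window end + allowed lateness

  /** Classifies an element for a window ending at `windowEnd`, given the current watermark. */
  def classify(windowEnd: Long, watermark: Long, allowedLateness: Long = 0L): Decision =
    if (watermark < windowEnd) OnTime
    else if (watermark < windowEnd + allowedLateness) LateAccepted
    else Dropped
}
```

With the default of 0 the `LateAccepted` branch can never be taken, which matches the bullet above: any late event is dropped.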

Flink VS Spark Based On My Limited Experience

● Data stream manipulation through built-in APIs
  ○ Flink DataStream API (fine-grained)
  ○ Spark Streaming (coarse-grained)
● Intrinsic model
  ○ Flink is a stream-intrinsic engine: time windows
  ○ Spark is a batch-intrinsic engine: mini-batches

We’re all set. Thank you!!!

Just Flink It!