Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19...

25
© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS Die Lambda Architektur als Grundmuster für Big Data Projekte Karl-Heinz Sylla, Dr. Michael Mock Fraunhofer IAIS 53757 Sankt Augustin [email protected] [email protected] GI Workshop Architekturen 2016 Hildesheim 23.6.2016 FP7 Grant 619461 http://www.ferari-project.eu Lambda Architecture: A blueprint for Big Data Applications 1

Transcript of Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19...

Page 1: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Die Lambda Architektur als Grundmuster für Big Data Projekte

Karl-Heinz Sylla, Dr. Michael Mock

Fraunhofer IAIS 53757 Sankt Augustin

[email protected] [email protected]

GI Workshop Architekturen 2016 Hildesheim 23.6.2016

FP7 Grant 619461 http://www.ferari-project.eu

Lambda Architecture: A blueprint for Big Data Applications

1

Page 2: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Big Data Projects – Big Data Components

"Big Data"-topic is the cause to

• understand, select, apply and experience analytic algorithms

• learn the features and the impacts of new technology

• start pilot projects for proof of concept

Select

• Requirements, Ideas, Tasks

• Data

• Tools

• Big Data Components

Understand Big Data Technology

Setup a justifiable, useful application architecture

2

Page 3: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

3

Nathan Marz with James Warren Big Data: Principles and best practices

of scalable real-time data systems Manning 2015

Batch view = Function(All Data)

Realtime view

= Function(Realtime view, New data)

Query = Function(Batch view, Realtime view)

Lambda Architecture

Page 4: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Answer queries consistently and fast by using precomputed views

4

precomputed Batch v iew

Information: General collection of knowledge relevant for the Big Data System.

Data: Information that can't be derived from anything else.

Query: Questions that are asked about the data.

Views : Information that has been derived from the base data. Views help to answer questions fast and cons istently

The master dataset holds base data-records that are:

raw

immutable

timestamped and eternally true

Transformation Fast, consistent Query

Batch view = Function(All Data)

Query = Function(Batch view)

Master dataset

Page 5: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Information available for queries has high latency

5

Included within Batch View

Time now

Not included

Batch transformation takes time

Views are out of date

Not included under process by batch workflow new Included within Batch View

Batch v iew Query Master dataset

Page 6: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Bridge the latency gap of batch processing by immediate processing of incoming data.

6

New Data

Realtime v iew #1

Realtime v iew #2

Stream Processor

Stream processing may compute an approximation of exact values, that are computed in batch runs.

Information that become available in Batch view may be removed from Realtime view.

"Eventual Accuracy"

Page 7: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Avoid complexity in core components "Complexity Isolation" into the Real-time view

7

Master dataset

Batch v iew

Realtime v iew

New data

Random write/update/read access

is very complex for scalable data stores.

(eventually consistent – vector clocks …)

no update

fast random read

Highly available

Immutable

Scalable

Simple

Query

Create / Read - append only

Event Sourcing

Immutable

Scalable

Simple

Page 8: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Components in a Big Data "Lambda" Architecture

8

Batch function

Speed function

Master dataset

log messages

sensor data

process data

Batch v iew

Realtime view

me

ssa

ge

pa

ssin

g

message passing

merge

query

New Data

Page 9: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Component Selection / Horizontal Scalability

9

log messages

sensor data

process data

> volume

> users

> users, volume

> velocity > volume, velocity

> parallelization

Batch function

Speed function

Batch v iew

Realtime view

me

ssa

ge

pa

ssin

g

message passing

Master dataset

New Data

Page 10: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

ETL Extract-Transform-Load + flexible Analysis

Batch Function

Speed Function

Master Dataset

log messages

sensor data

process data

Batch View

Realtime View m

ess

ag

e p

ass

ing

message passing

process

query

New Data

10

In Memory / RDBMS

Page 11: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Reporting

11

Batch function

Master dataset

log messages

sensor data

process data

report 2016-01-12 report 2016-01-13

report 2016-01-14

….

me

ssa

ge

pa

ssin

g

report shipment

New Data

Page 12: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Monitoring

12

Speed function

log messages

sensor data

process data

RDBMS message passing

control and analysis panel

New Data

Page 13: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Model learning and realtime model-application

13

Batch function

Speed function

Master dataset

log messages

sensor data

process data

(predictive) models

Realtime view

me

ssa

ge

pa

ssin

g

message passing

monitor and control dashboard

New Data

Page 14: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

New Data acquisition, filtering, cleaning … and ingestion is stream processing

14

Batch function

Speed function

Master

dataset

(predictive) models

Realtime v iew

me

ssa

ge

pa

ssin

g

message passing

monitor and control dashboard

New Data

Log

data

sensor

sensor

process

data

Page 15: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Lambda Architecture or Stream Processing

http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple

15

Page 16: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Data Stream

Data Stream

Data Stream

Bottlenecks

Massive Data Streams Create Bottlenecks

16

Page 17: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Anonymized CDR Data from HT (Croatia Telekom)

Fraud Rules provided by HT

Standard Approach: CDR data is imported into separate database where fraud rules are applied

=> import delays analysis

Challenge: Apply Rules to Data Streams

=> faster detection of fraud

=> Express fraud rules with Complex Event Patterns !

Use Case: Fraud Detection on CDR Data

17

Page 18: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Data Stream

Data Stream

Data Stream

In-Stream Processing

In-Situ Processing

Complex Event Processing

Use Cases:

Fraud Mining

Health-Monitoring

“global condition”

but monitor by local conditions

Express a

In-Situ and In-Stream Processing

18

Page 19: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Complex Event Processing

19

streaming with higher level abstractions

Event Process ing in Action, O. Etzion and P. Niblett, Manning 2010

Event Producers Complex Event Processing System

Event Consumers

Event Process ing Logic

Systems, Sensors, Business Process, Networks, Web, Industry, …

Dashboards, Data stores, Applications, Actuators, Business Processes, …

Page 20: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Complex Event Processing Principles

20

Input (simple) Events Event Process ing Logic Output (complex, derived) Events

- Timestamp

- Event Source

- Attributes

Set of patterns that define conditions on one or multiple events. If events occur that match a pattern, a new event is generated (derived event).

- Timestamp

- (Event Sources)

- Attributes

Page 21: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Input Events and Example Complex Event for Fraud

21

Input Event: CDR (call detail record) data

subscriber number, called number, timestamp, duration, start cell, end cell, …

Complex Event 1: LongCallAtNight A long call to premium distance is made during night hours.

Complex Event 2: FrequentLongCallsAtNight At least three of these LongCallAtNight-events per calling number

Page 22: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

FERARI: towards In-Situ Complex Event Processing

CEP Optimizer

logical plan

physical plan

event stream

analyzer

cost

Site Configurations

Fraud Detection

Dashboard

runtime statistics

CEP Model

Application

Rules

real-time input streams

Realtime

view

Push Pull

Page 23: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Proton on Storm

23

Proton - IBM Proactive Technology Online

research asset developed by IBM Research Haifa https://github.com/ishkin/Proton

Used in many EU Projects (FIWARE, Fispace, Psymbiosis, SPEEDD)

Patterns defined by Event Process ing Networks (EPN)

Proton on Storm

Developed by IBM in the FERARI EU Project Grant No 619491

http://www.ferari-project.eu

Distributed, scalable CEP on Storm

Use case example: fraud mining on telekom data

https://bitbucket.org/sbothe-iais/ferari

Page 24: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

Selected FERARI References

I. Flouris, V. Manikaki, N. Giatrakos, A. Deligiannakis, M. Garofalakis, M. Mock, S, Bothe, I. Skarbovsky, F. Fournier, M.Stajcer, T. Krizan, J. Yom-Tov T. Curin :FERARI: A Prototype for Complex Event Process ing over Streaming Multi -cloud Platforms , ACM SIGMOD 2016, Demo

I. Flouris, V. Manikaki, N. Giatrakos, A. Deligiannakis, M. Garofalakis, M. Mock, S, Bothe, I. Skarbovsky, F. Fournier, M.Stajcer, T. Krizan, J. Yom-Tov M. Volarevic : Complex Event Process ing over Streaming Multi-cloud Platforms - The FERARI Approach, ACM DEBS 2016, Demo

N. Giatrakos, A.Deligiannakis, M.Garofalakis: Scalable Approximate Query Tracking over Highly Distributed Data Stream , ACM SIGMOD'2016

O. Etzion, F. Fournier, I. Skarbosky: A Model Driven Approach for Event Process ing Applications , ACM DEBS 2016

http://www.ferari-project.eu

Open Source Repository https://bitbucket.org/sbothe-iais/ferari

24

Page 25: Die Lambda Architektur als Grundmuster für Big Data Projekte · Complex Event Processing 19 streaming with higher level abstractions Event Processing in Action, O. Etzion and P.

© Fraunhofer-Institut für Intelligente Analyse- und Informationssysteme IAIS

wrap up

25

Big Data - Proof of concept - Experimental applications

The Lambda Architecture

Principles of data process ing

Architecture template to recognize Big Data issues

Guide to Component Selection

Examples of Architecture instantiations

Streaming and the Internet of things - Industrie 4.0

Big Data process ing becomes Stream Process ing

In-S itu complex event process ing