
Big Data series : Apache Flink

Jérôme Blachon, Laurent Tardif, Stéphane Thiers

June 2015: JUG Grenoble. September 2015: JUG Lausanne

About us

Jérôme Blachon, Laurent Tardif, Stéphane Thiers

Tonight's program

A bit of history
The stack
Flink demo
How it works
Strengths
Roadmap

History

Big Data success story

Map/Reduce (OSDI '04) → Hadoop 1: small recoverable tasks, sequential code
Dryad (EuroSys '07) → Tez: Map/Reduce extended to DAG, backtracking recovery
RDDs (HotCloud '10, NSDI '12) → Spark: functional implementation of Dryad recovery
PACTs (SoCC '10, VLDB '12) → Flink: cyclic graph (and incremental construction), query processing runtime embedded in the DAG engine

Stonebraker / Cetintemel / Zdonik, 2005

Criteria for stream processing (1/2)

● Keep data moving
  ● Low latency on critical path
● Query on stream
  ● High level language
● Handle stream imperfection
  ● Timeout (ex: avg of last 25 securities)
  ● Out of order (must leave window open)
● Generate predictable outcomes
  ● Time ordered

Criteria for stream processing (2/2)

● Integrate stored / streaming data
  ● Uniform language for both stored and streamed data
  ● Combine streamed and stored data
● Data safety / availability
  ● Resistant to failure
● Partition and scale automatically
● Process and respond instantaneously
  ● 100 000 msg / s

Big data stack

The stack

Data processing engine
User requirement
App and resource management
Storage / stream

Ecosystem

Applications
Data processing engines
App and resource management
Storage / stream

Another view

http://practicalanalytics.wordpress.com

Word count

The hello world

// read the text from a file or from memory, and generate a set of String
DataSet<String> text = getTextDataSet(env);

DataSet<Tuple2<String, Integer>> counts =
    // split up the lines in pairs (2-tuples) containing: (word,1)
    text.flatMap(new Tokenizer())
        // group by the tuple field "0" and sum up tuple field "1"
        .groupBy(0)
        .sum(1);
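The flatMap, groupBy and sum steps of the hello world can be tried without a cluster. The following is a minimal plain-Java sketch, not Flink code: the class `WordCountSketch` and its `tokenize` helper are hypothetical names, and the tokenizing rule (lower-case, split on non-word characters) is an assumption about what the slide's `Tokenizer` does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the word-count pipeline; no Flink classes are used.
class WordCountSketch {

    // Hypothetical stand-in for the Tokenizer: lower-case, then split on non-word characters.
    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Each token stands for a (word, 1) pair; merging by key plays the role of groupBy(0).sum(1).
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : tokenize(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "To be, or not to be,--that is the question:--",
            "Whether 'tis nobler in the mind to suffer");
        System.out.println(count(lines));
    }
}
```

In the real job the grouping and summing are distributed across parallel instances; here a single sorted map is enough to show the data flow.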

Word count

Input lines:
"To be, or not to be,--that is the question:--"
"Whether 'tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune"

flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)

Data in memory

public static final String[] WORDS = new String[] {
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,",
    "And by opposing end them?--To die,--to sleep,--",
    "No more; and by a sleep to say we end",
    "The heartache, and the thousand natural shocks",
    "That flesh is heir to,--'tis a consummation",
    "Devoutly to be wish'd. To die,--to sleep;--",
    // …
};

File

private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {

return env.readTextFile(textPath);

}

With POJO

public static class Word {

    // fields
    private String word;
    private Integer frequency;

    // constructors
    public Word() { }

    public Word(String word, int i) {
        this.word = word;
        this.frequency = i;
    }

    // getters setters

    // to String
    @Override
    public String toString() {
        return "Word=" + word + " freq=" + frequency;
    }
}

POJO

Input lines:
"To be, or not to be,--that is the question:--"
"Whether 'tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune"

flatMap(Tokenizer): Word 1 {to,1}  Word 2 {be,1}  Word 3 {or,1} ...
groupBy: Word 1 {to,1}, Word 5 {to,1} | Word 2 {be,1}, Word 6 {be,1} | Word 3 {or,1}
sum: Word 7 {to,2}  Word 8 {be,2}  Word 9 {or,1}

JDBC

Source table "hamlet", one row per line:
("To be, or not to be,--that is the question:--")
("Whether 'tis nobler in the mind to suffer")

map + flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)

Stream

Input lines, arriving one at a time:
"To be, or not to be,--that is the question:--"
"Whether 'tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune"

flatMap(Tokenizer): (to,1) (be,1) (or,1) ...
groupBy: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)

A new line arrives: "Or to take arms against a sea of troubles,"
flatMap(Tokenizer) emits (or,1), groupBy routes it to the existing "or" group, and sum updates the result to (or,2). The results (to,2) and (be,2) are unchanged.

Multiple (parallel instances)

Input lines:
"To be, or not to be,--that is the question:--"
"Whether 'tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune"

Each parallel instance runs flatMap(Tokenizer) → groupBy → sum on its share of the input and emits partial counts:
(to,2) (be,2) (or,1) ...

With several copies of the line "To be, or not to be,--that is the question:--" processed in parallel, a final groupBy + sum merges the partial counts:
(to,6) (be,6) (or,3) ...
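The final "groupBy + sum" step can be pictured as merging the partial counts produced by each parallel instance. This is a small plain-Java sketch of that merge; `PartialCountMerge` is a hypothetical name, and using three instances that each saw the same line is an assumption chosen to reproduce the (to,6) (be,6) (or,3) totals.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration: merge per-instance partial counts into global counts.
class PartialCountMerge {

    // Partial result of one instance that counted "To be, or not to be".
    static Map<String, Integer> partial() {
        Map<String, Integer> counts = new TreeMap<>();
        counts.put("to", 2);
        counts.put("be", 2);
        counts.put("or", 1);
        return counts;
    }

    // Second-level groupBy + sum: add up the partial counts word by word.
    static Map<String, Integer> merge(List<Map<String, Integer>> partials) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> partialCounts : partials) {
            partialCounts.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(merge(Arrays.asList(partial(), partial(), partial())));
    }
}
```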

Demo

Input (product, price, date):
Produit1 , 14 , 1/6/2015
Produit2 , 13.5 , 1/6/2015
Produit3 , 24 , 1/6/2015
Produit1 , 14 , 30/5/2015
Produit2 , 13 , 30/5/2015
Produit3 , 24 , 30/5/2015
Produit4 , 124 , 30/5/2015

Output, average price per product:
Produit1: average price (over 7 days): 14, average price (over 30 days): 14, average price (over 365 days): 13.5

Output, price history per product:
Produit1 : 14 , 1/6/2015 14 , 30/5/2015 13 , 29/5/2015
Produit2 : 13.5 , 1/6/2015 13 , 30/5/2015 13 , 29/5/2015
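The per-product averages over 7, 30 and 365 days amount to filtering the price records by product and date range, then averaging. Here is a minimal plain-Java sketch of that computation, not the demo's Flink job: `PriceAverageSketch`, `Price` and `sample()` are hypothetical, and the sample rows are illustrative, so the printed values are not meant to reproduce the figures on the slide.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

// Plain-Java illustration of an average price over the last N days.
class PriceAverageSketch {

    // One input record: product, price, date.
    static final class Price {
        final String product;
        final double price;
        final LocalDate date;

        Price(String product, double price, LocalDate date) {
            this.product = product;
            this.price = price;
            this.date = date;
        }
    }

    // Average of the product's prices dated within the last `days` days, `today` included.
    static OptionalDouble average(List<Price> prices, String product, LocalDate today, int days) {
        LocalDate oldest = today.minusDays(days - 1);
        return prices.stream()
            .filter(p -> p.product.equals(product))
            .filter(p -> !p.date.isBefore(oldest) && !p.date.isAfter(today))
            .mapToDouble(p -> p.price)
            .average();
    }

    // Illustrative sample rows (assumption, loosely based on the slide).
    static List<Price> sample() {
        return Arrays.asList(
            new Price("Produit1", 14.0, LocalDate.of(2015, 6, 1)),
            new Price("Produit1", 14.0, LocalDate.of(2015, 5, 30)),
            new Price("Produit1", 13.0, LocalDate.of(2015, 5, 29)),
            new Price("Produit2", 13.5, LocalDate.of(2015, 6, 1)));
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2015, 6, 1);
        for (int days : new int[] {7, 30, 365}) {
            System.out.println(days + " days: " + average(sample(), "Produit1", today, days));
        }
    }
}
```

An empty `OptionalDouble` is returned when a product has no price in the window, which avoids inventing an average for missing data.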

Demo 2: Twitter

Input (type, text, date):
tweet, Flink is…, 1/6/2015
tweet, #Flink, 1/6/2015

Outputs:
Tag cloud
@writer1: #Flink, 1/6/2015 #Flink, 30/5/2015 #Flink, 29/5/2015

Other sources: Jira, Stack Overflow

Demo 3: Scala shell

Word count demo from the Flink Scala shell

Demo 4: ML demo

Classifier (SVM) from MLLib (Scala only)

Learn + Predict

Some basics (covered by the demos): types, streaming, loops, …

Data

Tuples with primitive types:

DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
    new Tuple2<String, Integer>("hello", 1),
    new Tuple2<String, Integer>("world", 2));

POJO (constructor + get/set):

public class WordWithCount {
    public String word;
    public int count;

    public WordWithCount() {}

    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

Hadoop: types implementing the org.apache.hadoop.io.Writable interface

Data sources: file based

// read a text file from the local file system
DataSet<String> localLines = env.readTextFile("file:///path/to/my/textfile");

// read a text file from a HDFS running at nnHost:nnPort
DataSet<String> hdfsLines = env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");

// read a CSV file with three fields
DataSet<Tuple3<Integer, String, Double>> csvInput = env.readCsvFile("hdfs:///the/CSV/file")
    .types(Integer.class, String.class, Double.class);

// create a set from some given elements
DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");

Data sources

// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData =
    env.createInput(
        // create and configure input format
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
            .setDBUrl("jdbc:derby:memory:persons")
            .setQuery("select name, age from persons")
            .finish(),
        // specify type information for DataSet
        new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO)
    );

Data sinks

// text data
DataSet<String> textData = // [...]

// write DataSet to a file on the local file system
textData.writeAsText("file:///my/result/on/localFS");

// write DataSet to a file on a HDFS with a namenode running at nnHost:nnPort
textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");

// write DataSet to a file and overwrite the file if it exists
textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);

// tuples as lines with pipe as the separator "a|b|c"
DataSet<Tuple3<String, Integer, Double>> values = // [...]
values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");

Variable and storage

DataSet<Tuple...> large = env.readCsvFile(...);
DataSet<Tuple...> medium = env.readCsvFile(...);
DataSet<Tuple...> small = env.readCsvFile(...);

DataSet<Tuple...> largeAndMedium = large.join(medium)
    .where(3).equalTo(1)
    .with(new JoinFunction() { ... });

DataSet<Tuple...> largeMediumAndSmall = small.join(largeAndMedium)
    .where(0).equalTo(2)
    .with(new JoinFunction() { ... });

DataSet<Tuple...> result = largeMediumAndSmall.groupBy(3).aggregate(Aggregations.MAX, 2);
DataSet<Tuple...> otherResult = largeAndMedium.groupBy(3).aggregate(Aggregations.MAX, 2);
DataSet<Tuple...> oneMoreResult = large.groupBy(3).aggregate(Aggregations.MAX, 2);

Map

Filter

Reduce

Join

Cross

Union

First-n

….

Lazy Evaluation

Operators
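Lazy evaluation means that declaring operators only builds a plan; nothing is computed until the program is actually executed. The effect can be observed with the JDK's own `java.util.stream` pipelines, which are also lazy. This is an analogy in plain Java, not Flink's mechanism; `LazyEvaluationSketch` and its `log` field are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy for lazy evaluation, using java.util.stream instead of Flink.
class LazyEvaluationSketch {

    // Records the order in which things actually happen.
    static final List<String> log = new ArrayList<>();

    static List<String> run() {
        log.clear();
        // Declaring the map operator only describes the work; the lambda has not run yet.
        Stream<String> plan = Stream.of("To", "be", "or")
            .map(word -> {
                log.add("map " + word);
                return word.toLowerCase();
            });
        log.add("plan built");
        // The terminal operation triggers the execution of the whole pipeline.
        List<String> result = plan.collect(Collectors.toList());
        log.add("executed");
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run());
        System.out.println(log);
    }
}
```

Because the whole plan is known before anything runs, an engine has the opportunity to reorder or combine operators, which is what the optimizer described later relies on.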

Datastream

continuous, parallel, immutable stream of data

Socket stream (twitter, …)

Message Queue connector (RabbitMQ)

FileStream

Streaming

Iterative

Algorithms that need iterations:

● Clustering (K-Means, Canopy, …)
● Gradient descent (e.g., Logistic Regression, Matrix Factorization)
● Graph algorithms (e.g., PageRank, Line-Rank, components, paths, reachability, centrality)
● Graph communities / dense sub-components
● Inference (belief propagation)
● …

A loop makes multiple passes over the data
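As an illustration of "multiple passes over the data", here is a minimal plain-Java sketch of one of the graph algorithms listed above, connected components by label propagation. It is not Flink's iteration API: `LabelPropagationSketch` is a hypothetical name, and the graph is a small in-memory edge array chosen for the example.

```java
import java.util.Arrays;

// Plain-Java illustration of an iterative algorithm: connected components.
class LabelPropagationSketch {

    // Returns, for each vertex, the smallest vertex id reachable from it.
    static int[] components(int vertexCount, int[][] edges) {
        int[] label = new int[vertexCount];
        for (int v = 0; v < vertexCount; v++) {
            label[v] = v; // every vertex starts in its own component
        }
        boolean changed = true;
        while (changed) { // one loop iteration is one full pass over the edges
            changed = false;
            for (int[] edge : edges) {
                int min = Math.min(label[edge[0]], label[edge[1]]);
                if (label[edge[0]] != min) {
                    label[edge[0]] = min;
                    changed = true;
                }
                if (label[edge[1]] != min) {
                    label[edge[1]] = min;
                    changed = true;
                }
            }
        }
        return label;
    }

    public static void main(String[] args) {
        int[][] edges = {{0, 1}, {1, 2}, {3, 4}};
        System.out.println(Arrays.toString(components(5, edges)));
    }
}
```

The loop stops when a pass changes no label, which is the usual convergence test for this kind of algorithm.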


Windowing

Stream of counts: (to,2) (be,2) (or,1) (lord,2) (my,2) (king,1) …

.window(Count.of(4)).every(Count.of(2))

Window policies: Count, Time, …

As elements arrive, the stream seen so far grows from (to,2) (be,2), to (to,2) (be,2) (or,1) (lord,2), then to (to,2) (be,2) (or,1) (lord,2) (my,2) (king,1). The window covers the last 4 elements and is evaluated every 2 elements.
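The count-based policy above (a window of 4 elements, evaluated every 2 elements) can be simulated on a list. This is a plain-Java sketch, not the Flink windowing API: `CountWindowSketch` is a hypothetical name, and emitting a shorter first window while fewer than 4 elements have arrived is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration of a sliding count window.
class CountWindowSketch {

    // After every `slide` arrivals, emit the last `size` elements seen so far
    // (fewer while the stream is still shorter than `size`).
    static <T> List<List<T>> slidingCountWindows(List<T> stream, int size, int slide) {
        List<List<T>> windows = new ArrayList<>();
        for (int seen = slide; seen <= stream.size(); seen += slide) {
            int from = Math.max(0, seen - size);
            windows.add(new ArrayList<>(stream.subList(from, seen)));
        }
        return windows;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "lord", "my", "king");
        for (List<String> window : slidingCountWindows(words, 4, 2)) {
            System.out.println(window);
        }
    }
}
```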

Go inside Flink

How it works: naive idea

Code → Flink (Job Manager) → Execution Plan
Data → Execution Plan → Results

Execution plan

We have resources, let's optimize it!

Code → Flink (Job Manager) → Execution Plan
The execution plan runs in parallel: four instances, each with its own Data → Result.

Distributed Runtime

Master (Job Manager) handles job submission, scheduling, and metadata

Workers (Task Managers) execute operations

Data can be streamed between nodes

All operators start in-memory and gradually go out-of-core

How the magic happens: Flink Runtime, Flink Optimizer

Flink Optimizer

The optimizer is the component that selects an execution plan for a Common API program

Think of an AI system manipulating your program for you

But don't be scared: it works. Relational databases have been doing this for decades; Flink ports the technology to API-based systems

Program lifecycle

val source1 = …
val source2 = …
val maxed = source1
  .map(v => (v._1, v._2, math.max(v._1, v._2)))
val filtered = source2
  .filter(v => (v._1 > 4))
val result = maxed
  .join(filtered).where(0).equalTo(0)
  .filter(_1 > 3)
  .groupBy(0)
  .reduceGroup {……}

Some fancy stuff to help the optimizer

Forwarded fields

@ForwardedFields("f0->f2")
public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {

    @Override
    public Tuple3<…> map(Tuple2<…> val) {
        return new Tuple3<…>("foo", val.f1 / 2, val.f0);
    }
}

Some fancy stuff to help the optimizer

Partitioning

Partitioning controls how the individual data points of a stream are distributed and ordered among the parallel instances of the transformation operators. Flink Streaming supports several partitioning types, for example:

Forward (default): forward partitioning directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O.

Shuffle: shuffle partitioning randomly partitions the output data stream to the next operator using a uniform distribution.

Rebalance: rebalance partitioning directs the output data stream to the next operator in a round-robin fashion.

Broadcast: broadcast partitioning sends the output data stream to all parallel instances of the next operator. Usage: dataStream.broadcast()
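The four partitioning types differ only in how the target instance of the next operator is chosen for each record. The following plain-Java sketch expresses each choice as a function returning target indices; it is not Flink code, and `PartitioningSketch`, its method names and the index-based model are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java illustration: which parallel instance receives a record.
class PartitioningSketch {

    // Forward: keep the record on the instance with the same index as the sender.
    static int forward(int senderIndex) {
        return senderIndex;
    }

    // Shuffle: pick a target uniformly at random.
    static int shuffle(Random random, int parallelism) {
        return random.nextInt(parallelism);
    }

    // Rebalance: round-robin over the targets, based on the record's sequence number.
    static int rebalance(long recordNumber, int parallelism) {
        return (int) (recordNumber % parallelism);
    }

    // Broadcast: every target instance receives a copy of the record.
    static List<Integer> broadcast(int parallelism) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            targets.add(i);
        }
        return targets;
    }

    public static void main(String[] args) {
        for (long n = 0; n < 5; n++) {
            System.out.println("record " + n + " -> instance " + rebalance(n, 3));
        }
        System.out.println("broadcast -> " + broadcast(3));
    }
}
```

Forward avoids moving data at all, rebalance and shuffle even out the load, and broadcast multiplies the data by the parallelism, which is why the choice matters for performance.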

Performance

● More info soon

● Demo on 100,000 products / 3 years of prices => ~20 minutes

● On a "small cluster" of 3 nodes: 4 processors, 8 GB of RAM, virtualized

Limits

API still moving

Diagnosis is hard: Flink, Hadoop, network, OS, JVM …

Heap usage is (too?) high

API & Big Data ecosystem

The growing Flink stack

APIs: Scala API, Java API, Python API (upcoming), Graph API, Apache MRQL
Common API
Flink Optimizer, Flink Stream Builder
Runtimes: Flink Local Runtime, Apache Tez
Environments: embedded environment (Java collections), local environment (for debugging), remote environment (regular cluster execution)
Execution: single node execution, standalone or YARN cluster
Data storage: HDFS, Files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables, …

Roadmap


Flink Roadmap

Currently being discussed by the Flink community

Flink has a major release every 3 months, and one or more bug-fixing releases between major releases

Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes


Roadmap for 2015 (highlights), Q1 to Q3

APIs: Logical Query integration, Additional operators, Interactive programs, Interactive Scala shell, SQL-on-Flink
Optimizer: Semantic annotations, HCatalog integration, Optimizer hints
Runtime: Dual engine (blocking & pipelining), Fine-grained fault tolerance, Dynamic memory allocation
Streaming: Better memory management, More operators in API, At-least-once processing guarantees, Unify batch and streaming, Exactly-once processing guarantees
ML library: First version, Additional algorithms, Mahout integration
Graph library: First version
Integration: Tez, Samoa, Mahout

Integration with other projects

Machine Learning: Samoa (incubating), a distributed streaming machine learning (ML) framework

Apache Tez: runs complex directed acyclic graphs of tasks for processing data; simplifies Pig and Hive task definition

Storage: Tachyon, a memory-centric distributed storage system

Mahout (data analytics): H2O, a distributed scalable machine learning system

Apache Hive: high-level language for data processing
● Expected Q3/Q4 2015

Apache Zeppelin (inc.): a web-based notebook that enables interactive data analytics

And many more…

Runtime: even better performance and robustness
Using off-heap memory, dynamic memory allocation

Improvements to the Flink optimizer
Integration with HCatalog, better statistics

Runtime optimization

Streaming graph and ML pipeline libraries

Summary and conclusion

Flink in one slide

Flink is optimized for cyclic or iterative processes by using iterative transformations on collections.

Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This allows flexible window operations on streams.

Built-in optimizer

flink.apache.org

http://flink-forward.org/ : 15 Oct, Berlin