
Transcript
Page 1: Flink4 jug

Big Data series : Apache Flink

Jérôme Blachon, Laurent Tardif, Stéphane Thiers

June 2015: JUG Grenoble; September 2015: JUG Lausanne

Page 2: Flink4 jug

Who we are

Jérôme Blachon, Laurent Tardif, Stéphane Thiers

Page 3: Flink4 jug

A bit of history

The stack

Flink demo

How it works

Strengths

Roadmap

The evening

Page 4: Flink4 jug

History

Page 5: Flink4 jug

BigData success story

Map / Reduce (OSDI '04) -> Hadoop 1: small recoverable tasks, sequential code

Dryad (EuroSys '07) -> Tez: Map/Reduce extended to DAG, backtracking recovery

RDDs (HotCloud '10, NSDI '12) -> Spark: functional implementation of Dryad recovery

PACTs (SoCC '10, VLDB '12) -> Flink: cyclic graph (and incremental construction), query-processing runtime embedded in a DAG engine

Stonebraker / Cetintemel / Zdonik (2005)

Page 6: Flink4 jug

Criteria for stream processing (1/2)

● Keep data moving
  ● Low latency on the critical path
● Query on streams
  ● High-level language
● Handle stream imperfections
  ● Timeout (e.g., average of the last 25 securities)
  ● Out of order (must leave the window open)
● Generate predictable outcomes
  ● Time ordered

Page 7: Flink4 jug

Criteria for stream processing (2/2)

● Integrate stored and streaming data
  ● Uniform language for both stored and streamed data
  ● Combine streamed and stored data
● Data safety / availability
  ● Resistant to failure
● Partition and scale automatically
● Process and respond instantaneously
  ● 100,000 msg/s

Page 8: Flink4 jug

Big data stack

Page 9: Flink4 jug

The stack

Data processing engine

User requirement

App and resource management

Storage / stream

Page 10: Flink4 jug

Ecosystem

Applications

Data processing engines

App and resource management

Storage/Stream

Page 11: Flink4 jug

Another view

http://practicalanalytics.wordpress.com

Page 12: Flink4 jug

Demo

Page 13: Flink4 jug

Word count

The hello world

// read a text file or in-memory data, and generate a set of Strings
DataSet<String> text = getTextDataSet(env);

DataSet<Tuple2<String, Integer>> counts =
    // split up the lines in pairs (2-tuples) containing: (word,1)
    text.flatMap(new Tokenizer())
    // group by the tuple field "0" and sum up tuple field "1"
    .groupBy(0)
    .sum(1);
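The slide relies on a `Tokenizer` class that it does not show. As a minimal sketch, assuming a tokenizer that lower-cases each line and splits it on non-word characters (a hypothetical choice), the plain-Java code below mirrors what flatMap -> groupBy(0) -> sum(1) computes on an in-memory list. It does not use the Flink API; `WordCountSketch`, `tokenize` and `wordCount` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch (no Flink dependency) of the flatMap -> groupBy(0) -> sum(1)
// word count, applied to an in-memory list of lines.
class WordCountSketch {

    // Hypothetical stand-in for the slide's Tokenizer: lower-cases a line,
    // splits it on non-word characters and emits one (word, 1) pair per token.
    static List<Map.Entry<String, Integer>> tokenize(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) { // a leading separator yields an empty token
                out.add(Map.entry(token, 1));
            }
        }
        return out;
    }

    // "groupBy(0)": use the word as key; "sum(1)": add up the counts per key.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : tokenize(line)) {
                counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return counts;
    }
}
```

Applied to the single line "To be, or not to be,--that is the question:--", this yields to=2, be=2 and or=1, the same counts as in the word-count figure.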

Page 14: Flink4 jug

Word count

Input:
"To be, or not to be,--that is the question:--"
"Whether 'tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune"

Flatmap(tokenizer): (to,1) (be,1) (or,1) (to,1) (be,1) …
groupby: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)

Page 15: Flink4 jug

Data in memory

public static final String[] WORDS = new String[] {
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,",
    "And by opposing end them?--To die,--to sleep,--",
    "No more; and by a sleep to say we end",
    "The heartache, and the thousand natural shocks",
    "That flesh is heir to,--'tis a consummation",
    "Devoutly to be wish'd. To die,--to sleep;--",
    …

Page 16: Flink4 jug

File

private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {

return env.readTextFile(textPath);

}

Page 17: Flink4 jug

With POJO

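public static class Word {

    // fields
    private String word;
    private Integer frequency;

    // constructors
    public Word() { }

    public Word(String word, int i) {
        this.word = word;
        this.frequency = i;
    }

    // getters and setters (private fields of a POJO need them)
    public String getWord() { return word; }
    public void setWord(String word) { this.word = word; }
    public Integer getFrequency() { return frequency; }
    public void setFrequency(Integer frequency) { this.frequency = frequency; }

    // toString
    @Override
    public String toString() {
        return "Word=" + word + " freq=" + frequency;
    }
}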
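To make the groupby and sum steps concrete for Word objects, here is a plain-Java sketch (no Flink) that groups POJOs by their `word` field and sums their `frequency` field. It copies the slide's `Word` class as a nested class; `PojoWordCount` and `sumByWord` are hypothetical names, and this is not how Flink itself executes the grouping.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Plain-Java sketch (no Flink): group Word POJOs by "word" and sum "frequency".
class PojoWordCount {

    // Mirrors the slide's Word POJO: no-arg constructor, getters and setters.
    static class Word {
        private String word;
        private Integer frequency;

        Word() { }

        Word(String word, int frequency) {
            this.word = word;
            this.frequency = frequency;
        }

        String getWord() { return word; }
        void setWord(String word) { this.word = word; }
        Integer getFrequency() { return frequency; }
        void setFrequency(Integer frequency) { this.frequency = frequency; }

        @Override
        public String toString() {
            return "Word=" + word + " freq=" + frequency;
        }
    }

    // Group by the key field, sum the count field, and rebuild one Word per key
    // (sorted by word because the grouping map is a TreeMap).
    static List<Word> sumByWord(List<Word> words) {
        Map<String, Integer> totals = words.stream()
            .collect(Collectors.groupingBy(Word::getWord, TreeMap::new,
                     Collectors.summingInt(Word::getFrequency)));
        return totals.entrySet().stream()
            .map(e -> new Word(e.getKey(), e.getValue()))
            .collect(Collectors.toList());
    }
}
```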

Page 18: Flink4 jug

Pojo

Same input lines and pipeline as the word count, with Word POJOs instead of tuples:

Flatmap(tokenizer): Word {to,1}, Word {be,1}, Word {or,1}, Word {to,1}, Word {be,1}, …
groupby: the Word objects are grouped by word
sum: Word {to,2}, Word {be,2}, Word {or,1}

Page 19: Flink4 jug

JDBC

Input: table hamlet, one line per row:
("To be, or not to be,--that is the question:--")
("Whether 'tis nobler in the mind to suffer")

Map + Flatmap(tokenizer): (to,1) (be,1) (or,1) (to,1) (be,1) …
groupby: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)

Page 20: Flink4 jug

Stream

The same word count runs continuously on a stream of lines:
"To be, or not to be,--that is the question:--"
"Whether 'tis nobler in the mind to suffer"
"The slings and arrows of outrageous fortune"

Flatmap(tokenizer) -> groupby -> sum: (to,2) (be,2) (or,1)

Page 21: Flink4 jug

Stream

A new line arrives on the stream: "Or to take arms against a sea of troubles,"

Current results: (to,2) (be,2) (or,1)

Page 22: Flink4 jug

Stream

The new line "Or to take arms against a sea of troubles," goes through Flatmap(tokenizer), which emits (or,1).

Page 23: Flink4 jug

Stream

In groupby, the new (or,1) is grouped with the existing (or,1).

Page 24: Flink4 jug

Stream

The group (or,1)(or,1) is passed to sum.

Page 25: Flink4 jug

Stream

sum updates the running count: (to,2) (be,2) (or,2)
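The stream slides show counts being updated as soon as a new line arrives, so (or,1) becomes (or,2) without recomputing everything. A minimal plain-Java sketch of that behaviour follows, assuming a tokenizer that lower-cases and splits on non-word characters (a hypothetical choice). It keeps per-word state in a `HashMap` and emits an updated pair for every incoming word; `StreamingWordCountSketch` and `onLine` are illustrative names, not Flink API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Flink) of a streaming word count: every incoming word
// immediately updates its running total, and the updated (word,count) is emitted.
class StreamingWordCountSketch {

    private final Map<String, Integer> state = new HashMap<>(); // running total per word

    // Process one line as it "arrives" and return the updates it produces, in order.
    List<String> onLine(String line) {
        List<String> emitted = new ArrayList<>();
        for (String token : line.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue;
            }
            int count = state.merge(token, 1, Integer::sum);
            emitted.add("(" + token + "," + count + ")");
        }
        return emitted;
    }
}
```

Feeding "To be, or not to be,--that is the question:--" and then "Or to take arms against a sea of troubles," makes the second call emit (or,2) first, followed by (to,3).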

Page 26: Flink4 jug

Multiple

Several copies of the input lines are processed in parallel; each copy goes through Flatmap(tokenizer) -> groupby -> sum and produces (to,2) (be,2) (or,1).

Page 27: Flink4 jug

Multiple

The partial results of the parallel instances, e.g. (to,1)(to,1) (be,1)(be,1) (or,1) for each copy, are combined by a final Groupby + sum: (to,6) (be,6) (or,3) …

Page 28: Flink4 jug

Demo

Input (product, price, date):
Produit1, 14, 1/6/2015
Produit2, 13.5, 1/6/2015
Produit3, 24, 1/6/2015
Produit1, 14, 30/5/2015
Produit2, 13, 30/5/2015
Produit3, 24, 30/5/2015
Produit4, 124, 30/5/2015

Output:
Produit1: average price (over 7 days): 14; average price (over 30 days): 14; average price (over 365 days): 13.5
Produit1: 14, 1/6/2015; 14, 30/5/2015; 13, 29/5/2015
Produit2: 13.5, 1/6/2015; 13, 30/5/2015; 13, 29/5/2015
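The demo reports, per product, the average price over the last 7, 30 and 365 days. The slides do not show how this is computed, so the following is only a plain-Java sketch (no Flink) under stated assumptions: a hypothetical `Price` row type, and a window defined as the dates in (asOf - days, asOf]. `PriceAverageSketch` and `averagePrice` are illustrative names.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.OptionalDouble;

// Plain-Java sketch (no Flink) of "average price over the last N days" per product.
class PriceAverageSketch {

    // One price observation (hypothetical shape of the demo's input rows).
    static final class Price {
        final String product;
        final double price;
        final LocalDate date;

        Price(String product, double price, LocalDate date) {
            this.product = product;
            this.price = price;
            this.date = date;
        }
    }

    // Average price of `product` for dates in (asOf - days, asOf];
    // empty if the product has no price in that window.
    static OptionalDouble averagePrice(List<Price> prices, String product,
                                       LocalDate asOf, int days) {
        LocalDate start = asOf.minusDays(days);
        return prices.stream()
            .filter(p -> p.product.equals(product))
            .filter(p -> p.date.isAfter(start) && !p.date.isAfter(asOf))
            .mapToDouble(p -> p.price)
            .average();
    }
}
```

Calling `averagePrice` three times with 7, 30 and 365 gives the three figures of the demo output for one product; the real demo may well use a different windowing scheme.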

Page 29: Flink4 jug

Demo 2: Twitter

Input (source, content, date):
tweet, "Flink is…", 1/6/2015
tweet, #Flink, 1/6/2015
tweet, #Flink, 1/6/2015
tweet, #Flink, 30/5/2015
tweet, #Flink, 30/5/2015
tweet, #Flink, 30/5/2015
tweet, #Flink, 30/5/2015

Output: tag cloud
@writer1: #Flink, 1/6/2015; #Flink, 30/5/2015; #Flink, 29/5/2015
@writer3: #Flink, 1/6/2015; #Flink, 30/5/2015; #Flink, 29/5/2015

Other sources: Jira, Stack Overflow

Page 30: Flink4 jug

Demo 3: Scala shell

Word count demo from the Flink Scala shell

Page 31: Flink4 jug

Demo 4: ML demo

Classifier (SVM) from MLLib (Scala only)

Learn + Predict

Page 32: Flink4 jug

Some basics (covered by the demo): types, streaming, loops, …

Page 33: Flink4 jug

Data

Tuples with primitive types:

DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
    new Tuple2<String, Integer>("hello", 1),
    new Tuple2<String, Integer>("world", 2));

POJOs (public no-argument constructor + public fields or getters/setters):

public class WordWithCount {
    public String word;
    public int count;

    public WordWithCount() {}

    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

Hadoop types: the org.apache.hadoop.io.Writable interface

Page 34: Flink4 jug

Data sources: file based

// local file system
DataSet<String> localLines = env.readTextFile("file:///path/to/my/textfile");

// read text file from an HDFS running at nnHost:nnPort
DataSet<String> hdfsLines = env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");

// read a CSV file with three fields
DataSet<Tuple3<Integer, String, Double>> csvInput = env.readCsvFile("hdfs:///the/CSV/file")
    .types(Integer.class, String.class, Double.class);

// create a set from some given elements
DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");

Page 35: Flink4 jug

Data sources

// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData =
    env.createInput(
        // create and configure input format
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
            .setDBUrl("jdbc:derby:memory:persons")
            .setQuery("select name, age from persons")
            .finish(),
        // specify type information for DataSet
        new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO)
    );

Page 36: Flink4 jug

Data Sinks

// text data
DataSet<String> textData = // [...]

// write DataSet to a file on the local file system
textData.writeAsText("file:///my/result/on/localFS");

// write DataSet to a file on an HDFS with a namenode running at nnHost:nnPort
textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");

// write DataSet to a file and overwrite the file if it exists
textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);

// tuples as lines with pipe as the separator "a|b|c"
DataSet<Tuple3<String, Integer, Double>> values = // [...]
values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");

Page 37: Flink4 jug

Variable and storage

DataSet<Tuple...> large = env.readCsvFile(...);
DataSet<Tuple...> medium = env.readCsvFile(...);
DataSet<Tuple...> small = env.readCsvFile(...);

DataSet<Tuple...> largeAndMedium = large.join(medium).where(3).equalTo(1)
    .with(new JoinFunction() { ... });

DataSet<Tuple...> largeMediumAndSmall = small.join(largeAndMedium).where(0).equalTo(2)
    .with(new JoinFunction() { ... });

// intermediate results held in variables can be reused by several computations
DataSet<Tuple...> result = largeMediumAndSmall.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> otherResult = largeAndMedium.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> oneMoreResult = large.groupBy(3).aggregate(MAX, 2);
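In a join such as `large.join(medium).where(3).equalTo(1)`, each row of `large` is paired with every row of `medium` whose field 1 equals the `large` row's field 3, and each pair is handed to the join function. The plain-Java sketch below (no Flink) shows one possible strategy, a hash join, on rows modelled as `Object[]`. The concatenating "join function" and the names `EquiJoinSketch` and `join` are hypothetical; Flink's optimizer chooses the actual join strategy itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch (no Flink) of an equi-join implemented as a hash join.
class EquiJoinSketch {

    static List<Object[]> join(List<Object[]> left, int leftKey,
                               List<Object[]> right, int rightKey) {
        // Build phase: hash the right input on its key field.
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : right) {
            table.computeIfAbsent(row[rightKey], k -> new ArrayList<>()).add(row);
        }
        // Probe phase: look up each left row; every match is "joined" by
        // concatenating both rows (a stand-in for a real JoinFunction).
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            for (Object[] r : table.getOrDefault(l[leftKey], List.of())) {
                Object[] joined = new Object[l.length + r.length];
                System.arraycopy(l, 0, joined, 0, l.length);
                System.arraycopy(r, 0, joined, l.length, r.length);
                out.add(joined);
            }
        }
        return out;
    }
}
```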

Page 38: Flink4 jug

Map

Filter

Reduce

Join

Cross

Union

First-n

….

Lazy Evaluation

Operators
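Lazy evaluation means that declaring operators only builds a plan; nothing runs until execution is triggered (in Flink, for example, when the program calls `execute()` on its execution environment). As an analogy only, Java's own `java.util.stream` is lazy in the same way: the sketch below counts how often the `map` lambda runs, showing zero calls after the pipeline is declared and calls only once the terminal `collect` runs. `LazyEvaluationSketch` is an illustrative name; this is not Flink code.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Analogy (no Flink): java.util.stream pipelines are lazily evaluated too.
class LazyEvaluationSketch {

    static final AtomicInteger mapCalls = new AtomicInteger(); // counts map invocations

    // Declare map + filter; this builds the pipeline but processes no element.
    static Stream<Integer> buildPipeline(List<Integer> input) {
        return input.stream()
            .map(x -> { mapCalls.incrementAndGet(); return x * 2; })
            .filter(x -> x > 2);
    }

    // The terminal operation triggers the actual work.
    static List<Integer> run(Stream<Integer> pipeline) {
        return pipeline.collect(Collectors.toList());
    }
}
```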

Page 39: Flink4 jug

Datastream

continuous, parallel, immutable stream of data

Socket stream (twitter, …)

Message Queue connector (RabbitMQ)

FileStream

Streaming

Page 40: Flink4 jug

Iterative

Algorithms that need iterations:
Clustering (K-Means, Canopy, …)
Gradient descent (e.g., Logistic Regression, Matrix Factorization)
Graph algorithms (e.g., PageRank, Line-Rank, components, paths, reachability, centrality)
Graph communities / dense sub-components
Inference (belief propagation)

The loop makes multiple passes over the data.
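Connected components, from the list of graph algorithms that need iterations, shows why a loop makes multiple passes over the data. Below is a plain-Java sketch (no Flink) using label propagation: each vertex starts with its own id as label, each pass over the edges lets both endpoints adopt the smaller of their labels, and the loop stops when a full pass changes nothing. The edge-array representation and the name `ConnectedComponentsSketch` are assumptions for illustration.

```java
// Plain-Java sketch (no Flink): connected components by iterative label propagation.
class ConnectedComponentsSketch {

    // edges[i] = {u, v}: undirected edge between vertices u and v (ids 0..n-1).
    // Returns, for each vertex, the smallest vertex id in its component.
    static int[] components(int n, int[][] edges) {
        int[] label = new int[n];
        for (int v = 0; v < n; v++) {
            label[v] = v; // every vertex starts in its own component
        }
        boolean changed = true;
        while (changed) { // one iteration = one pass over all edges
            changed = false;
            for (int[] e : edges) {
                int min = Math.min(label[e[0]], label[e[1]]);
                if (label[e[0]] != min) { label[e[0]] = min; changed = true; }
                if (label[e[1]] != min) { label[e[1]] = min; changed = true; }
            }
        }
        return label;
    }
}
```

Labels only ever decrease, so the loop terminates; depending on edge order, a chain of vertices may need several passes before the smallest id reaches every vertex.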

Page 41: Flink4 jug

Windowing

(to,2) (be,2) …

.window(Count.of(4)).every(Count.of(2))

[Figure: window policies (Count, Time, …)]

Page 42: Flink4 jug

Windowing

(to,2) (be,2) (or,1) (lord,2) …

.window(Count.of(4)).every(Count.of(2))

[Figure: window policies (Count, Time, …)]

Page 43: Flink4 jug

Windowing

(to,2) (be,2) (or,1) (lord,2) (my,2) (king,1) …

.window(Count.of(4)).every(Count.of(2))

[Figure: window policies (Count, Time, …)]
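`.window(Count.of(4)).every(Count.of(2))` describes a sliding count window: it covers the last 4 elements and is evaluated every 2 elements. The plain-Java sketch below (no Flink) reproduces that idea on a list. It assumes that windows emitted before 4 elements have arrived are simply partial; Flink's exact handling of those first windows may differ. `SlidingCountWindowSketch` is an illustrative name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Plain-Java sketch (no Flink) of a sliding count window (size, slide).
class SlidingCountWindowSketch {

    static <T> List<List<T>> windows(List<T> stream, int size, int slide) {
        Deque<T> buffer = new ArrayDeque<>(); // the last `size` elements
        List<List<T>> result = new ArrayList<>();
        int seen = 0;
        for (T element : stream) {
            buffer.addLast(element);
            if (buffer.size() > size) {
                buffer.removeFirst(); // evict the oldest element
            }
            seen++;
            if (seen % slide == 0) {
                result.add(new ArrayList<>(buffer)); // trigger: emit current window
            }
        }
        return result;
    }
}
```

With the six tuples of the last windowing slide, (to,2) (be,2) (or,1) (lord,2) (my,2) (king,1), size 4 and slide 2, this emits [to, be], then [to, be, or, lord], then [or, lord, my, king].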

Page 44: Flink4 jug

Go inside Flink

Page 45: Flink4 jug

© 2015 Persistent Systems Ltd

Page 46: Flink4 jug

How it works: the naive idea

Code -> Flink Job Manager -> Execution plan

Data -> Results

Page 47: Flink4 jug

Execution plan

Page 48: Flink4 jug

We have resources, let's optimize!

Code -> Flink Job Manager -> Execution plan

The execution plan runs in parallel on several partitions: Data -> Result (four times in the figure)

Page 49: Flink4 jug

Distributed Runtime

Master (Job Manager) handles job submission, scheduling, and metadata

Workers (Task Managers) execute operations

Data can be streamed between nodes

All operators start in-memory and gradually go out-of-core

Page 50: Flink4 jug

How the magic happens: Flink runtime, Flink optimizer

Page 51: Flink4 jug

Flink Optimizer

The optimizer is the component that selects an execution plan for a Common API program.

Think of an AI system manipulating your program for you.

But don't be scared, it works: relational databases have been doing this for decades, and Flink ports the technology to API-based systems.

Page 52: Flink4 jug

Program lifecycle

val source1 = …
val source2 = …
val maxed = source1
  .map(v => (v._1, v._2, math.max(v._1, v._2)))
val filtered = source2
  .filter(v => (v._1 > 4))
val result = maxed
  .join(filtered).where(0).equalTo(0)
  .filter(_1 > 3)
  .groupBy(0)
  .reduceGroup { … }

Page 53: Flink4 jug

Forwarded fields

@ForwardedFields("f0->f2")
public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {
    @Override
    public Tuple3<…> map(Tuple2<…> val) {
        // input field f0 is copied unchanged into output field f2
        return new Tuple3<…>("foo", val.f1 / 2, val.f0);
    }
}

Some fancy stuff to help it (the optimizer)

Page 54: Flink4 jug

Partitioning

Partitioning controls how the individual elements of a stream are distributed among the parallel instances of the transformation operators. Flink Streaming supports several partitioning types, for example:

Forward (default): directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O.
Shuffle: randomly partitions the output data stream to the next operator using a uniform distribution.
Rebalance: directs the output data stream to the next operator in a round-robin fashion.
Broadcast: sends the output data stream to all parallel instances of the next operator. Usage: dataStream.broadcast()

Some fancy stuff to help it (the optimizer)
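The four partitioning types can be pictured as rules that map one outgoing record to target instances among the `n` parallel instances of the next operator. The plain-Java sketch below (no Flink) models each rule as returning the list of target indexes. Modelling "forward" as "same index as the sender" is a simplifying assumption for a local connection, and `PartitioningSketch` and its methods are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Plain-Java sketch (no Flink): target instance indexes chosen by each strategy.
class PartitioningSketch {

    // Forward: stay on the downstream instance paired with the sender (assumed same index).
    static List<Integer> forward(int senderIndex) {
        return List.of(senderIndex);
    }

    // Shuffle: pick one target uniformly at random.
    static List<Integer> shuffle(Random random, int n) {
        return List.of(random.nextInt(n));
    }

    // Rebalance: cycle through the targets in round-robin order.
    static class Rebalancer {
        private int next = 0;

        List<Integer> target(int n) {
            int t = next;
            next = (next + 1) % n;
            return List.of(t);
        }
    }

    // Broadcast: every parallel instance receives the record.
    static List<Integer> broadcast(int n) {
        List<Integer> all = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            all.add(i);
        }
        return all;
    }
}
```

Round-robin evens out load regardless of the data, while broadcast multiplies traffic by `n`, which is why it suits small data sets that every instance needs.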

Page 55: Flink4 jug

Performance

Page 56: Flink4 jug

● More info soon

● Demo on 100,000 products / 3 years of prices => ~20 minutes

● On a “small cluster” of 3 nodes: 4 processors, 8 GB of virtualized RAM

Performance

Page 57: Flink4 jug

Limits

Page 58: Flink4 jug

API still moving

Diagnosis is hard: Flink, Hadoop, network, OS, JVM …

Heap usage is (too?) high

Limitation

Page 59: Flink4 jug

API & Big Data ecosystem

Page 60: Flink4 jug

The growing Flink stack

APIs and libraries: Scala API, Java API, Python API (upcoming), Graph API, Apache MRQL

Common API

Flink Optimizer, Flink Stream Builder

Runtime: Flink Local Runtime, Apache Tez

Environments: embedded environment (Java collections), local environment (for debugging), remote environment (regular cluster execution)

Deployment: single node execution, standalone or YARN cluster

Data storage: HDFS, Files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables

Page 61: Flink4 jug

Roadmap


Page 62: Flink4 jug

Flink Roadmap

Currently being discussed by the Flink community

Flink has a major release every 3 months, and one or more bug-fixing releases between major releases

Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes


Page 63: Flink4 jug

Roadmap for 2015 (highlights)

Spread over Q1 to Q3:

APIs: logical query integration, additional operators, interactive programs, interactive Scala shell, SQL-on-Flink

Optimizer: semantic annotations, HCatalog integration, optimizer hints

Runtime: dual engine (blocking & pipelining), fine-grained fault tolerance, dynamic memory allocation

Streaming: better memory management, more operators in API, at-least-once processing guarantees, unify batch and streaming, exactly-once processing guarantees

ML library: first version, additional algorithms, Mahout integration

Graph library: first version

Integration: Tez, Samoa, Mahout

Page 64: Flink4 jug

Integration with other projects

Machine learning: Samoa (incubating), a distributed streaming machine learning (ML) framework

Apache Tez: runs complex directed acyclic graphs of tasks for processing data (simplifies Pig and Hive task definition)

Storage: Tachyon, a memory-centric distributed storage system

Mahout (data analytics): H2O, a distributed, scalable machine learning system

Apache Hive (high-level language for data processing)

● Expected Q3/Q4 2015

Apache Zeppelin (incubating): a web-based notebook that enables interactive data analytics

Page 65: Flink4 jug

And many more…

Runtime: even better performance and robustness, using off-heap memory and dynamic memory allocation

Improvements to the Flink optimizer: integration with HCatalog, better statistics

Runtime optimization

Streaming graph and ML pipeline libraries

Page 66: Flink4 jug

Summary and conclusion

Page 67: Flink4 jug

Flink is optimized for cyclic or iterative processes by using iterative transformations on collections.

Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This makes it possible to perform flexible window operations on streams.

Built-in optimizer

Flink in one slide

Page 68: Flink4 jug

flink.apache.org

http://flink-forward.org/: 15 Oct, Berlin