Big Data series : Apache Flink
Jérôme Blachon, Laurent Tardif, Stéphane Thiers
June 2015: JUG Grenoble. September 2015: JUG Lausanne.
Who we are
Jérôme Blachon, Laurent Tardif, Stéphane Thiers
A bit of history
The stack
Flink demo
How it works
Strengths
Roadmap
Tonight's agenda
History
Big Data success story
● Map/Reduce (OSDI '04) → Hadoop 1: small recoverable tasks, sequential code
● Dryad (EuroSys '07) → Tez: Map/Reduce extended to DAGs, backtracking recovery
● RDDs (HotCloud '10, NSDI '12) → Spark: functional implementation of Dryad recovery
● PACTs (SoCC '10, VLDB '12) → Flink: cyclic graphs (and incremental construction), query-processing runtime embedded in the DAG engine
Criteria for stream processing (1/2)
Stonebraker / Cetintemel / Zdonik, 2005
● Keep data moving
  - Low latency on the critical path
● Query on streams
  - High-level language
● Handle stream imperfections
  - Timeouts (e.g. average of the last 25 securities)
  - Out-of-order data (must leave the window open)
● Generate predictable outcomes
  - Time-ordered
Criteria for stream processing (2/2)
● Integrate stored and streaming data
  - Uniform language for both stored and streamed data
  - Combine streamed and stored data
● Data safety / availability
  - Resistant to failure
● Partition and scale automatically
● Process and respond instantaneously
  - 100 000 msg/s
Big data stack
The stack
Data processing engine
User requirement
App and resource management
Storage / stream
Eco system
Applications
Data processing engines
App and resource management
Storage/Stream
Another view
http://practicalanalytics.wordpress.com
Demo
Word count
The hello world
// read text from a file or from memory and produce a DataSet of lines
DataSet<String> text = getTextDataSet(env);

DataSet<Tuple2<String, Integer>> counts =
    // split up the lines in pairs (2-tuples) containing: (word,1)
    text.flatMap(new Tokenizer())
        // group by the tuple field "0" and sum up tuple field "1"
        .groupBy(0)
        .sum(1);
Word count
Input lines:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer): (to,1) (be,1) (or,1) …
groupby: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
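The three steps of the diagram (Flatmap, groupby, sum) can be sketched in plain Java with java.util.stream instead of the Flink API. This is only an illustration: the class name WordCountSketch and the tokenize helper are assumptions made for this example, since the deck does not show the Tokenizer class used in the demo.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class WordCountSketch {
    // Assumed stand-in for the demo's Tokenizer: lower-case, split on non-word characters.
    static String[] tokenize(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(token -> !token.isEmpty())
                .toArray(String[]::new);
    }

    // flatMap (line -> words), then group by word, then sum one per occurrence.
    static Map<String, Integer> count(String... lines) {
        return Arrays.stream(lines)
                .flatMap(line -> Arrays.stream(tokenize(line)))
                .collect(Collectors.groupingBy(w -> w, TreeMap::new, Collectors.summingInt(w -> 1)));
    }

    public static void main(String[] args) {
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
        System.out.println(count("To be, or not to be,--that is the question:--"));
    }
}
```

On the first line of the sample text this gives the same counts as the diagram: to 2, be 2, or 1.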
Data in memory
public static final String[] WORDS = new String[] {
    "To be, or not to be,--that is the question:--",
    "Whether 'tis nobler in the mind to suffer",
    "The slings and arrows of outrageous fortune",
    "Or to take arms against a sea of troubles,",
    "And by opposing end them?--To die,--to sleep,--",
    "No more; and by a sleep to say we end",
    "The heartache, and the thousand natural shocks",
    "That flesh is heir to,--'tis a consummation",
    "Devoutly to be wish'd. To die,--to sleep;--",
    ….
File
private static DataSet<String> getTextDataSet(ExecutionEnvironment env) {
return env.readTextFile(textPath);
}
With POJO
public static class Word {
    // fields
    private String word;
    private Integer frequency;

    // constructors
    public Word() { }

    public Word(String word, int i) {
        this.word = word;
        this.frequency = i;
    }

    // getters setters

    // to String
    @Override
    public String toString() {
        return "Word=" + word + " freq=" + frequency;
    }
}
Pojo
Input lines:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer): Word 1 {to,1} Word 2 {be,1} Word 3 {or,1} …
groupby: Word 1 {to,1} Word 5 {to,1} | Word 2 {be,1} Word 6 {be,1} | Word 3 {or,1}
sum: Word 7 {to,2} Word 8 {be,2} Word 9 {or,1}
JDBC
Table hamlet:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
Rows read as 1-tuples: ("To be, or not to be,--that is the question:--") ("Whether 'tis nobler in the mind to suffer")
Map + Flatmap(tokenizer): (to,1) (be,1) (or,1) …
groupby: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
Stream
Lines already received:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer): (to,1) (be,1) (or,1) …
groupby: (to,1)(to,1) | (be,1)(be,1) | (or,1)
sum: (to,2) (be,2) (or,1)
A new line then arrives on the stream:
"Or to take arms against a sea of troubles,"
Its (or,1) goes through Flatmap(tokenizer) and groupby, and sum updates the running count to (or,2).
Multiple
Three inputs, each containing the same lines:
"To be, or not to be,--that is the question:--",
"Whether 'tis nobler in the mind to suffer",
"The slings and arrows of outrageous fortune",
Flatmap(tokenizer) on each input: (to,1) (be,1) (or,1) ......
Groupby + sum over all inputs: (to,6) (be,6) (or,3) ......
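A minimal sketch of the "Multiple" case in plain Java, without the Flink API. It assumes each input is counted independently (here with a parallel stream) and the partial counts are then summed per word; the diagram itself applies Groupby + sum once over all tokenized inputs, which gives the same totals. The class and method names are made up for this example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ParallelCountSketch {
    // Word counts for a single input, computed independently of the other inputs.
    static Map<String, Integer> countOne(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    // Count every input in parallel, then sum the partial counts per word.
    static Map<String, Integer> countAll(List<List<String>> inputs) {
        List<Map<String, Integer>> partials = inputs.parallelStream()
                .map(ParallelCountSketch::countOne)
                .collect(Collectors.toList());
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> partial : partials) {
            partial.forEach((word, n) -> total.merge(word, n, Integer::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> hamlet = Arrays.asList("To be, or not to be,--that is the question:--");
        Map<String, Integer> total = countAll(Arrays.asList(hamlet, hamlet, hamlet));
        // prints "6 6 3", matching (to,6) (be,6) (or,3) in the diagram
        System.out.println(total.get("to") + " " + total.get("be") + " " + total.get("or"));
    }
}
```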
Demo
Incoming price records (product, price, date):
Product1, 14, 1/6/2015
Product2, 13.5, 1/6/2015
Product3, 24, 1/6/2015
Product1, 14, 30/5/2015
Product2, 13, 30/5/2015
Product3, 24, 30/5/2015
Product4, 124, 30/5/2015
Price history per product:
Product1: 14, 1/6/2015; 14, 30/5/2015; 13, 29/5/2015
Product2: 13.5, 1/6/2015; 13, 30/5/2015; 13, 29/5/2015
Result:
Product1: average price (over 7 days): 14; average price (over 30 days): 14; average price (over 365 days): 13.5
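The averages of this demo can be sketched in plain Java as a filter over a trailing date range followed by a mean. This is not the demo's code, which the deck does not show: the Price type, the method name and the sample values in main are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

class PriceAverageSketch {
    // One price observation (hypothetical type for this sketch).
    static final class Price {
        final String product;
        final double value;
        final LocalDate date;

        Price(String product, double value, LocalDate date) {
            this.product = product;
            this.value = value;
            this.date = date;
        }
    }

    // Average price of one product over the `days` days ending on `asOf` (inclusive).
    // Returns an empty OptionalDouble when the product has no price in that range.
    static OptionalDouble average(List<Price> prices, String product, LocalDate asOf, int days) {
        LocalDate from = asOf.minusDays(days - 1L);
        return prices.stream()
                .filter(p -> p.product.equals(product))
                .filter(p -> !p.date.isBefore(from) && !p.date.isAfter(asOf))
                .mapToDouble(p -> p.value)
                .average();
    }

    public static void main(String[] args) {
        // Made-up sample values, not the figures from the slide.
        List<Price> prices = Arrays.asList(
                new Price("Product1", 14.0, LocalDate.of(2015, 6, 1)),
                new Price("Product1", 12.0, LocalDate.of(2015, 5, 30)),
                new Price("Product1", 10.0, LocalDate.of(2015, 1, 1)));
        LocalDate today = LocalDate.of(2015, 6, 1);
        for (int days : new int[] {7, 30, 365}) {
            System.out.println(days + " days: " + average(prices, "Product1", today, days));
        }
    }
}
```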
Demo 2: Twitter
tweet, Flink is…, 1/6/2015
tweet, #Flink, 1/6/2015
tweet, #Flink, 1/6/2015
tweet, #Flink, 30/5/2015
tweet, #Flink, 30/5/2015
tweet, #Flink, 30/5/2015
tweet, #Flink, 30/5/2015
Tag cloud
@writer1: #Flink, 1/6/2015; #Flink, 30/5/2015; #Flink, 29/5/2015
@writer3: #Flink, 1/6/2015; #Flink, 30/5/2015; #Flink, 29/5/2015
Jira
stackoverflow
Demo 3: Scala shell
… Word count demo from the Flink Scala shell ...
Demo 4: ML demo
Classifier (SVM) from MLLib (Scala only)
Learn + Predict
Some basics (covered by demo): types, streaming, loops, ….
Tuples of primitive types
DataSet<Tuple2<String, Integer>> wordCounts = env.fromElements(
    new Tuple2<String, Integer>("hello", 1),
    new Tuple2<String, Integer>("world", 2));
POJO (constructor + get/set)
public class WordWithCount {
    public String word;
    public int count;
    public WordWithCount() {}
    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}
Hadoop org.apache.hadoop.io.Writable interface
Data
// local file system
DataSet<String> localLines = env.readTextFile("file:///path/to/my/textfile");
// read text file from a HDFS running at nnHost:nnPort
DataSet<String> hdfsLines = env.readTextFile("hdfs://nnHost:nnPort/path/to/my/textfile");
// read a CSV file with three fields
DataSet<Tuple3<Integer, String, Double>> csvInput =
    env.readCsvFile("hdfs:///the/CSV/file")
       .types(Integer.class, String.class, Double.class);
// create a set from some given elements
DataSet<String> value = env.fromElements("Foo", "bar", "foobar", "fubar");
Data sources : File based
// Read data from a relational database using the JDBC input format
DataSet<Tuple2<String, Integer>> dbData =
    env.createInput(
        // create and configure input format
        JDBCInputFormat.buildJDBCInputFormat()
            .setDrivername("org.apache.derby.jdbc.EmbeddedDriver")
            .setDBUrl("jdbc:derby:memory:persons")
            .setQuery("select name, age from persons")
            .finish(),
        // specify type information for DataSet
        new TupleTypeInfo(Tuple2.class, STRING_TYPE_INFO, INT_TYPE_INFO)
    );
Data sources
// text data
DataSet<String> textData = // [...]
// write DataSet to a file on the local file system
textData.writeAsText("file:///my/result/on/localFS");
// write DataSet to a file on a HDFS with a namenode running at nnHost:nnPort
textData.writeAsText("hdfs://nnHost:nnPort/my/result/on/localFS");
// write DataSet to a file and overwrite the file if it exists
textData.writeAsText("file:///my/result/on/localFS", WriteMode.OVERWRITE);
// tuples as lines with pipe as the separator "a|b|c"
DataSet<Tuple3<String, Integer, Double>> values = // [...]
values.writeAsCsv("file:///path/to/the/result/file", "\n", "|");
Data Sinks
Variable and storage
DataSet<Tuple...> large = env.readCsvFile(...);
DataSet<Tuple...> medium = env.readCsvFile(...);
DataSet<Tuple...> small = env.readCsvFile(...);
// join large with medium, then join the result with small
DataSet<Tuple...> LargeAndMedium = large.join(medium)
    .where(3).equalTo(1)
    .with(new JoinFunction() { ... });
DataSet<Tuple...> LargeMediumAndSmall = small.join(LargeAndMedium)
    .where(0).equalTo(2)
    .with(new JoinFunction() { ... });
// three results built on top of the same intermediate data sets
DataSet<Tuple...> result = LargeMediumAndSmall.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> otherresult = LargeAndMedium.groupBy(3).aggregate(MAX, 2);
DataSet<Tuple...> oneMoreresult = large.groupBy(3).aggregate(MAX, 2);
Map
Filter
Reduce
Join
Cross
Union
First-n
….
Lazy Evaluation
Operators
Datastream
continuous, parallel, immutable stream of data
Socket stream (twitter, …)
Message Queue connector (RabbitMQ)
FileStream
Streaming
Iterative
Algorithms that need iterations
● Clustering (K-Means, Canopy, …)
● Gradient descent (e.g., Logistic Regression, Matrix Factorization)
● Graph algorithms (e.g., PageRank, Line-Rank, components, paths, reachability, centrality)
● Graph communities / dense sub-components
● Inference (belief propagation)
● …
Loop makes multiple passes over the data
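To make "multiple passes over the data" concrete, here is a small gradient-descent sketch in plain Java (no Flink API involved). It fits a single slope w so that y ≈ w * x, and each iteration reads the whole data set once. The class name, the learning rate and the sample data are assumptions chosen for this example.

```java
class IterationSketch {
    // Fit y ≈ w * x by gradient descent on the mean squared error.
    // Every iteration of the outer loop is one full pass over the data.
    static double fitSlope(double[] xs, double[] ys, int iterations, double learningRate) {
        double w = 0.0;
        for (int it = 0; it < iterations; it++) {
            double gradient = 0.0;
            for (int i = 0; i < xs.length; i++) {
                gradient += 2.0 * xs[i] * (w * xs[i] - ys[i]);
            }
            w -= learningRate * gradient / xs.length;
        }
        return w;
    }

    public static void main(String[] args) {
        double[] xs = {1.0, 2.0, 3.0};
        double[] ys = {2.0, 4.0, 6.0};
        // The data follows y = 2x, so the result approaches 2.0.
        System.out.println(fitSlope(xs, ys, 100, 0.05));
    }
}
```

A batch engine without native iterations would have to schedule one job per pass; the loop above is the pattern that iterative transformations are meant to express directly.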
Windowing
Incoming stream: (to,2) (be,2) (or,1) (lord,2) (my,2) (king,1) ……
.window(Count.of(4)).every(Count.of(2))
Window policies: Count, Time, ….
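The count-based policy on the slide (a window of 4 elements, evaluated every 2 elements) can be sketched in plain Java as follows. This only illustrates the idea and is not Flink's implementation; in particular, emitting a partially filled window at the first trigger is an assumption of this sketch, and the class and method names are made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class CountWindowSketch {
    // Keep the last `size` elements and emit a snapshot every `slide` arrivals.
    static <T> List<List<T>> slidingCountWindows(List<T> stream, int size, int slide) {
        List<List<T>> windows = new ArrayList<>();
        Deque<T> buffer = new ArrayDeque<>();
        int sinceLastEmit = 0;
        for (T element : stream) {
            buffer.addLast(element);
            if (buffer.size() > size) {
                buffer.removeFirst(); // evict the oldest element
            }
            sinceLastEmit++;
            if (sinceLastEmit == slide) {
                windows.add(new ArrayList<>(buffer));
                sinceLastEmit = 0;
            }
        }
        return windows;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("to", "be", "or", "lord", "my", "king");
        // prints [[to, be], [to, be, or, lord], [or, lord, my, king]]
        System.out.println(slidingCountWindows(words, 4, 2));
    }
}
```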
Go inside Flink
© 2015 Persistent Systems Ltd
How it works: the naive idea
Code
Flink
Job Manager
Execution Plan
Data
Results
Execution plan
We have resources, let's optimize it!
Code
Flink
Job Manager
Execution Plan
Data / Result (four parallel instances)
Distributed Runtime
Master (Job Manager) handles job submission, scheduling, and metadata
Workers (Task Managers) execute operations
Data can be streamed between nodes
All operators start in-memory and gradually go out-of-core
How the magic happens: Flink Runtime, Flink Optimizer
The optimizer is the component that selects an execution plan for a Common API program
Think of an AI system manipulating your program for you
But don't be scared, it works
● Relational databases have been doing this for decades; Flink ports the technology to API-based systems
Flink Optimizer
Program lifecycle
val source1 = …
val source2 = …
val maxed = source1
  .map(v => (v._1, v._2, math.max(v._1, v._2)))
val filtered = source2
  .filter(v => (v._1 > 4))
val result = maxed
  .join(filtered).where(0).equalTo(0)
  .filter(_1 > 3)
  .groupBy(0)
  .reduceGroup {……}
Forwarded fields
@ForwardedFields("f0->f2")
public class MyMap implements MapFunction<Tuple2<…>, Tuple3<…>> {
    @Override
    public Tuple3<…> map(Tuple2<…> val) {
        // input field f0 is copied unchanged to output field f2
        return new Tuple3<…>("foo", val.f1 / 2, val.f0);
    }
}
Some fancy stuff to help the optimizer
Partitioning
Partitioning controls how the individual data points of a stream are distributed and ordered among the parallel instances of the transformation operators. Flink Streaming supports several partitioning types.
Ex:
Forward (default): directs the output data to the next operator on the same machine (if possible), avoiding expensive network I/O.
Shuffle: randomly partitions the output data stream to the next operator using a uniform distribution.
Rebalance: directs the output data stream to the next operator in a round-robin fashion.
Broadcast: sends the output data stream to all parallel instances of the next operator. Usage: dataStream.broadcast()
Some fancy stuff to help the optimizer
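The routing rules behind three of these strategies can be sketched in plain Java as functions that pick the index of the target operator instance. This illustrates the definitions above and is not Flink code; the class and method names are made up, and forward partitioning is left out because it depends on operator placement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class PartitioningSketch {
    // Rebalance: element number i goes to instance i mod parallelism (round-robin).
    static List<Integer> rebalance(int elementCount, int parallelism) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < elementCount; i++) {
            targets.add(i % parallelism);
        }
        return targets;
    }

    // Shuffle: each element goes to an instance drawn uniformly at random.
    static List<Integer> shuffle(int elementCount, int parallelism, Random random) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < elementCount; i++) {
            targets.add(random.nextInt(parallelism));
        }
        return targets;
    }

    // Broadcast: a single element is sent to every instance.
    static List<Integer> broadcast(int parallelism) {
        List<Integer> targets = new ArrayList<>();
        for (int instance = 0; instance < parallelism; instance++) {
            targets.add(instance);
        }
        return targets;
    }

    public static void main(String[] args) {
        System.out.println(rebalance(5, 3)); // prints [0, 1, 2, 0, 1]
        System.out.println(broadcast(3));    // prints [0, 1, 2]
    }
}
```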
Performance
● More info soon
● Demo on 100,000 products with 3 years of prices: about 20 minutes
● On a "small cluster" of 3 nodes: 4 processors, 8 GB of RAM, virtualized
Performance
Limits
API still moving
Diagnosis is hard: Flink, Hadoop, network, OS, JVM, …
Heap usage is (too?) high
Limitation
API & Big Data eco system
The growing Flink stack
Scala API, Java API, Python API (upcoming), Graph API, Apache MRQL
Common API
Flink Optimizer, Flink Stream Builder
Flink Local Runtime
Embedded environment (Java collections)
Local environment (for debugging)
Remote environment (regular cluster execution)
Apache Tez
Single node execution, standalone or YARN cluster
Data storage: HDFS, files, S3, JDBC, Redis, RabbitMQ, Kafka, Azure tables, …
Roadmap
Flink Roadmap
Currently being discussed by the Flink community
Flink has a major release every 3 months, and one or more bug-fixing releases between major releases
Caveat: rough roadmap, depends on volunteer work, outcome of community discussion, and Apache open source processes
Roadmap for 2015 (highlights)
Q1 to Q3, items in slide order
APIs: Logical Query integration; Additional operators; Interactive programs; Interactive Scala shell; SQL-on-Flink
Optimizer: Semantic annotations; HCatalog integration; Optimizer hints
Runtime: Dual engine (blocking & pipelining); Fine-grained fault tolerance; Dynamic memory allocation
Streaming: Better memory management; More operators in API; At-least-once processing guarantees; Unify batch and streaming; Exactly-once processing guarantees
ML library: First version; Additional algorithms; Mahout integration
Graph library: First version
Integration: Tez, Samoa; Mahout
Integration with other projects
● Machine learning: Samoa (incubating), a distributed streaming machine learning (ML) framework
● Apache Tez: runs complex directed acyclic graphs of tasks for processing data (simplifies Pig and Hive task definition)
● Storage: Tachyon, a memory-centric distributed storage system
● Mahout (data analytics), H2O (distributed, scalable machine learning system)
● Apache Hive (high-level language for data processing)
  - Expected Q3/Q4 2015
● Apache Zeppelin (incubating): a web-based notebook that enables interactive data analytics
And many more…
Runtime: even better performance and robustness
  - Using off-heap memory, dynamic memory allocation
Improvements to the Flink optimizer
  - Integration with HCatalog, better statistics
Runtime optimization
Streaming graph and ML pipeline libraries
Summary and conclusion
Flink is optimized for cyclic or iterative processes by using iterative transformations on collections.
Flink streaming processes data streams as true streams, i.e., data elements are immediately "pipelined" through a streaming program as soon as they arrive. This makes it possible to perform flexible window operations on streams.
Built-in optimizer
Flink in one slide
flink.apache.org
http://flink-forward.org/: 15 Oct, Berlin