Bay Area Apache Flink Meetup Community Update August 2015
Bay Area Apache Flink Meetup #2: Distributed Stream and Graph Processing
Community Update August 2015
Henry Saputra, Committer and PMC Member
[email protected]
@Kingwulf
Apache Flink is an open source platform for scalable batch and stream data processing.
Apache Flink is …
• The core of Apache Flink is a distributed streaming dataflow engine:
  – Executing dataflows in parallel on clusters
  – Providing a reliable foundation for various workloads
• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers
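To make "executing dataflows in parallel" concrete, here is a minimal sketch in plain Python, not Flink code. It assumes a toy model in which each parallel task owns one partition of the data, applies a flatMap locally, hash-partitions (shuffles) records by key, and then reduces per key. All function names are hypothetical and only loosely mirror the operators that the DataSet API exposes.

```python
def split_into_partitions(records, parallelism):
    """Round-robin the input across `parallelism` hypothetical source tasks."""
    partitions = [[] for _ in range(parallelism)]
    for i, record in enumerate(records):
        partitions[i % parallelism].append(record)
    return partitions


def flat_map(partition, fn):
    """Apply a one-to-many function to every record of one partition."""
    return [out for record in partition for out in fn(record)]


def hash_shuffle(partitions, key_fn, parallelism):
    """Redistribute records so all records with the same key share a partition."""
    shuffled = [[] for _ in range(parallelism)]
    for partition in partitions:
        for record in partition:
            shuffled[hash(key_fn(record)) % parallelism].append(record)
    return shuffled


def reduce_by_key(partition, key_fn, value_fn, combine):
    """Fold the values of each key within a single partition."""
    result = {}
    for record in partition:
        key, value = key_fn(record), value_fn(record)
        result[key] = combine(result[key], value) if key in result else value
    return result


def word_count(lines, parallelism=4):
    sources = split_into_partitions(lines, parallelism)
    # Stage 1: every task tokenizes its own partition independently.
    tokenized = [
        flat_map(p, lambda line: [(w, 1) for w in line.lower().split()])
        for p in sources
    ]
    # Stage 2: shuffle by word so exactly one task owns each key.
    shuffled = hash_shuffle(tokenized, lambda pair: pair[0], parallelism)
    # Stage 3: each task sums the keys it owns; key sets are disjoint, so merging is safe.
    counts = {}
    for p in shuffled:
        counts.update(
            reduce_by_key(p, lambda pair: pair[0], lambda pair: pair[1], lambda a, b: a + b)
        )
    return counts
```

In a real engine the three stages would run as separate parallel tasks connected by network channels; the sequential loops here only show how the data is partitioned between them.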
One engine for many use cases
Real time streaming topologies
Machine Learning at scale
Graph Analysis
Long batch pipelines
What happened? - 1
• New PMC member: Maximilian Michels
• New committer: Chesnay Schepler
• Discussions for a 0.9.1 release have started
• Apache Flink is becoming more popular:
  – 1000+ Twitter followers
  – 500+ GitHub stars
  – Named one of the open source Big Data projects to watch by ZDNet
  – Flink Forward schedule with great speakers announced
What happened? - 2
• Apache Flink on Wikipedia: https://en.wikipedia.org/wiki/Apache_Flink
• New JobManager Dashboard
• Apache SAMOA 0.3.0-incubating with Flink integration
• New “Features” page
• Contributors list (can you spot your name?): https://cwiki.apache.org/confluence/display/FLINK/List+of+contributors
New JobManager Dashboard
Website redesign and new Features page
New architecture diagram in the 0.10 documentation
More content in the wiki about Flink internals
In master (0.10-SNAPSHOT) - 1
• Gelly Scala API
• More improvements and fixes for YARN
• Flink dropped Java 6 support
• Streaming connector for Elasticsearch
• Sampling operation on the DataSet API
• A lot of bug fixes:
  – Streaming: APIs, general stability, Kafka connector
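The new sampling operation draws a random subset of a DataSet. The sketch below shows, in plain Python, two textbook techniques commonly used for this: Bernoulli sampling for a sampling fraction and reservoir sampling (Algorithm R) for a fixed sample size. This is an assumption-labeled illustration of the general techniques, not Flink's implementation; the function names and the `seed` parameter are hypothetical.

```python
import random


def bernoulli_sample(records, fraction, seed=None):
    """Keep each record independently with probability `fraction`.

    The resulting sample size is random, with expected value fraction * n.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be in [0, 1]")
    rng = random.Random(seed)
    # rng.random() returns a float in [0, 1), so fraction=1.0 keeps everything.
    return [r for r in records if rng.random() < fraction]


def reservoir_sample(records, k, seed=None):
    """Algorithm R: a uniform sample of exactly min(k, n) records in one pass."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Record i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

Both functions need only a single pass and no knowledge of the total input size, which is what makes them suitable for large, partitioned data. In a distributed setting, each partition would typically sample locally before the partial samples are combined.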
In master (0.10-SNAPSHOT) - 2
• Low watermarks / event time
• New JobManager Dashboard
• Akka messages are now aware of leader IDs (for HA)
• ZooKeeper integration (for HA)
• Live accumulators (runtime only)
• Stability improvements
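Low watermarks drive event-time processing. Here is a simplified model in plain Python, not Flink's runtime. It assumes that an operator with several input channels remembers the latest watermark on each channel, that its own event-time clock is the minimum of these values (the "low" watermark), and that a tumbling event-time window fires once that clock passes the window's end. The class and method names, integer timestamps, half-open windows `[start, start + size)`, and dropping of late elements are all illustrative assumptions. Flink's exact watermark and window-boundary semantics may differ.

```python
from collections import defaultdict


class EventTimeWindowOperator:
    """Hypothetical toy operator: tumbling event-time windows driven by low watermarks."""

    def __init__(self, num_inputs, window_size):
        self.window_size = window_size
        # Latest watermark seen per input channel; starts at "minus infinity".
        self.channel_watermarks = [float("-inf")] * num_inputs
        self.current_watermark = float("-inf")
        self.windows = defaultdict(int)  # window start -> element count
        self.emitted = []  # (window_start, count) in firing order

    def on_element(self, timestamp):
        # Assumption: a watermark w promises no more elements with timestamp < w,
        # so anything older is late and simply dropped in this sketch.
        if timestamp < self.current_watermark:
            return
        start = timestamp - (timestamp % self.window_size)
        self.windows[start] += 1

    def on_watermark(self, channel, watermark):
        # Per-channel watermarks only move forward; ignore regressions.
        self.channel_watermarks[channel] = max(self.channel_watermarks[channel], watermark)
        # Event time can only advance as fast as the slowest input allows.
        low = min(self.channel_watermarks)
        if low <= self.current_watermark:
            return
        self.current_watermark = low
        # sorted() copies the keys, so popping from the dict while looping is safe.
        for start in sorted(self.windows):
            if start + self.window_size <= low:
                self.emitted.append((start, self.windows.pop(start)))
```

The minimum over inputs is the key design choice. One slow or idle input holds back event time for the whole operator, so no window fires while data for it may still arrive on another channel.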
Articles and Mentions
• High-throughput, low-latency, and exactly-once stream processing with Apache Flink [1]
• Introducing Gelly: Graph Processing with Apache Flink [2]
• Apache Flink and the case for stream processing [3]
• Crunching Parquet Files with Apache Flink [4]
• The morning paper: Asynchronous Distributed Snapshots for Distributed Dataflows [5]
• Five open source Big Data projects to watch [6]
• Big Data Performance Engineering: Examples from Hadoop, Pig, HBase, Flink and Spark [7]
[1] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] http://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
[3] http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html
[4] https://medium.com/@istanbul_techie/crunching-parquet-files-with-apache-flink-200bec90d8a7
[5] http://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/
[6] http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/
[7] http://www.bigsynapse.com/addressing-big-data-performance
New Meetups and Events
• Chicago: Flink Training @ Capital One
• Bay Area: Stream & Graph Processing @ MapR
GitHub stats
Upcoming
• Sept 15: Washington DC Area Apache Flink Meetup
• Sept 17: StreamProcessing.be meetup
• Sept 28-30: Flink talks at ApacheCon Big Data Budapest
New Meetup groups:
• New York
• Boston
Flink Forward schedule published
• http://flink-forward.org/?post_type=day
• Talks by Google, Data Artisans, Huawei, Capital One, Bouygues, Ericsson, Amadeus, ResearchGate, Red Hat, and many more.
50% off for this meetup's guests with the code:
FlinkMeetupBayArea50