Bay Area Apache Flink Meetup Community Update August 2015

Bay Area Apache Flink Meetup #2 Distributed Stream and Graph Processing

Community Update August 2015

Henry Saputra, Committer and PMC Member

[email protected] @Kingwulf

Apache Flink is an open source platform for scalable batch and stream data processing.

Apache Flink is …


• The core of Apache Flink is a distributed streaming dataflow engine.

• Executing dataflows in parallel on clusters

• Providing a reliable foundation for various workloads

• DataSet and DataStream programming abstractions are the foundation for user programs and higher layers

One engine for many use cases


Real time streaming topologies

Machine Learning at scale

Graph Analysis

Long batch pipelines

What happened? - 1

• New PMC member: Maximilian Michels

• New Committer: Chesnay Schepler

• Discussions for a 0.9.1 release have started

• Apache Flink is becoming more popular:

– 1000+ Twitter followers

– 500+ GitHub stars

– Named an “open source Big Data project to watch” by ZDNet

– Flink Forward schedule with great speakers announced

What happened? - 2

• Apache Flink on Wikipedia: https://en.wikipedia.org/wiki/Apache_Flink

• New JobManager Dashboard

• Apache SAMOA 0.3.0-incubating with Flink integration

• New “Features” page

• Contributors list (can you spot your name?): https://cwiki.apache.org/confluence/display/FLINK/List+of+contributors


New Job Manager Dashboard


New Website Redesign and New Features page


New Architecture diagram in 0.10 documentation


More content in the Wiki on Flink internals


In master (0.10-SNAPSHOT) - 1


• Gelly Scala API

• More improvements and fixes for YARN

• Flink dropped Java 6 support

• Streaming connector for Elasticsearch

• Sampling operation on DataSet API

• A lot of bug fixes:

– Streaming: APIs, general stability, Kafka connector

In master (0.10-SNAPSHOT) - 2

• Low watermarks / Event time

• New JM Dashboard

• Akka messages are now aware of leader IDs (for HA)

• ZooKeeper integration (for HA)

• Live accumulators (runtime only)

• Stability improvements


Articles and Mentions

• High-throughput, low-latency, and exactly-once stream processing with Apache Flink [1]

• Introducing Gelly: Graph Processing with Apache Flink [2]

• Apache Flink and the case for stream processing [3]

• Crunching Parquet Files with Apache Flink [4]

• The morning paper: Asynchronous Distributed Snapshots for Distributed Dataflows [5]

• Five open source Big Data projects to watch [6]

• Big Data Performance Engineering: Examples from Hadoop, Pig, HBase, Flink and Spark [7]


[1] http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
[2] http://flink.apache.org/news/2015/08/24/introducing-flink-gelly.html
[3] http://www.kdnuggets.com/2015/08/apache-flink-stream-processing.html
[4] https://medium.com/@istanbul_techie/crunching-parquet-files-with-apache-flink-200bec90d8a7
[5] http://blog.acolyer.org/2015/08/19/asynchronous-distributed-snapshots-for-distributed-dataflows/
[6] http://www.zdnet.com/article/five-open-source-big-data-projects-to-watch/
[7] http://www.bigsynapse.com/addressing-big-data-performance

New Meetups and Events


• Chicago: Flink Training @ Capital One

• Bay Area: Stream & Graph Processing @ MapR


GitHub stats


Upcoming

• Sept 15: Washington DC Area Apache Flink Meetup

• Sept 17: StreamProcessing.be meetup

• Sept 28-30: Flink Talks at ApacheCon Big Data Budapest

New Meetup groups:

• New York

• Boston


Flink Forward schedule published


• http://flink-forward.org/?post_type=day

• Talks by Google, Data Artisans, Huawei, Capital One, Bouygues, Ericsson, Amadeus, ResearchGate, Red Hat, and many more.

50% off for this meetup’s guests

FlinkMeetupBayArea50
