Apache Flink: The Latest and Greatest

Jamie Grier @jamiegrier

data-artisans.com

Original creators of Apache Flink®

Providers of the dA Platform, a supported Flink distribution

The Latest Features

▪ ProcessFunction API
▪ Queryable State API
▪ Excellent support for advanced applications that are:
    ▪ Flexible
    ▪ Stateful
    ▪ Event Driven
    ▪ Time Driven

The Latest Features - Quick Overview

▪ Rescalable State
▪ Async I/O Support
▪ Flexible Deployment Options
▪ Enhanced Security

Rescalable State

▪ Separates state parallelism from task parallelism

▪ Enables autoscaling integrations while maintaining stateful computations

▪ Handled efficiently via key groups
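The key-group idea can be sketched in plain Scala without any Flink dependency. This is a simplified illustration, not Flink's actual code: the object name `KeyGroupSketch`, the helper names, and the value 128 for `maxParallelism` are assumptions made for the example, and Flink additionally applies a murmur hash to the key's hash code before taking the modulus.

```scala
object KeyGroupSketch {
  // Upper bound on parallelism; it fixes the number of key groups.
  // 128 is an illustrative value chosen for this sketch.
  val maxParallelism: Int = 128

  // Map a key to a key group. This depends only on the key and on
  // maxParallelism, never on the current parallelism.
  // Simplified: Flink also murmur-hashes key.hashCode first.
  def keyGroupFor(key: Any): Int =
    Math.floorMod(key.hashCode, maxParallelism)

  // Map a key group to the index of the parallel operator instance that owns it.
  def operatorIndexFor(keyGroup: Int, parallelism: Int): Int =
    keyGroup * parallelism / maxParallelism

  // All key groups owned by one operator instance at a given parallelism.
  def keyGroupsOwnedBy(operatorIndex: Int, parallelism: Int): List[Int] =
    (0 until maxParallelism)
      .filter(kg => operatorIndexFor(kg, parallelism) == operatorIndex)
      .toList
}
```

Under these assumptions, rescaling from parallelism 2 to 4 never re-hashes individual keys. Instance 0 owned key groups 0 to 63 before; afterwards those same groups are split as contiguous ranges between instances 0 and 1, so state can be moved one key group at a time.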

[Figure sequence: a pipeline of Source, Map, Filter, Window, and Sink in which Map, Filter, and Window each hold their own state, shown with 1, 2, 4, and then 8 parallel instances of the stateful operators.]

State is partitioned by key

Flexible Deployment Options

▪ YARN
▪ Mesos
▪ Docker Swarm
▪ Kubernetes

Flexible Deployment Options

▪ DC/OS
▪ Amazon EMR
▪ Google Dataproc

Asynchronous I/O Support

▪ Make asynchronous calls to external services from a streaming job

▪ Efficiently keeps a configurable number of asynchronous calls in flight

▪ Correctly handles failure scenarios, e.g. restarts failed async calls

Asynchronous I/O Support

[Figure with labels: Async I/O, User Code]

Asynchronous I/O Support

Little’s Law:

throughput = occupancy / latency
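A small worked example shows why this matters for I/O-bound operators. The sketch below is plain Scala; the object name `LittlesLaw`, the function name, and the 10 ms round-trip latency are assumptions chosen for illustration, not figures from the talk.

```scala
object LittlesLaw {
  // throughput (requests per second) = occupancy / latency,
  // where occupancy is the number of requests in flight and
  // latency is the round-trip time of one request in milliseconds.
  def throughputPerSecond(occupancy: Int, latencyMillis: Long): Double =
    occupancy * 1000.0 / latencyMillis
}

object LittlesLawDemo extends App {
  // Synchronous I/O: one request in flight at a time.
  println(LittlesLaw.throughputPerSecond(occupancy = 1, latencyMillis = 10))   // prints 100.0
  // Asynchronous I/O: 100 requests in flight at the same latency.
  println(LittlesLaw.throughputPerSecond(occupancy = 100, latencyMillis = 10)) // prints 10000.0
}
```

With latency fixed by the external service, the only lever left is occupancy, which is what keeping many asynchronous calls in flight raises.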

Asynchronous I/O Support

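[Figure: Sync. I/O vs. Async. I/O against a database. With synchronous I/O, the operator calls sendRequest(x), waits, and continues only after receiveResponse(x); with asynchronous I/O, several requests (a, b, c, d) are in flight at once.]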

Asynchronous I/O Support

    // create the original stream
    val stream: DataStream[String] = ...

    // apply the async I/O transformation
    val resultStream: DataStream[(String, String)] =
      AsyncDataStream.unorderedWait(
        input = stream,
        asyncFunction = new AsyncDatabaseRequest(),
        timeout = 1000,
        timeUnit = TimeUnit.MILLISECONDS,
        concurrentRequests = 100)

Asynchronous I/O Support

    class AsyncDatabaseRequest extends AsyncFunction[String, (String, String)] {

      override def asyncInvoke(str: String, asyncCollector: AsyncCollector[(String, String)]): Unit = {

        // issue the asynchronous request, receive a future for the result
        val resultFuture: Future[String] = client.query(str)

        // set the callback to be executed once the request by the client is complete
        // the callback simply forwards the result to the collector
        resultFuture.onSuccess {
          case result: String => asyncCollector.collect(Iterable((str, result)))
        }
      }
    }

Enhanced Security

▪ SSL
▪ Kerberos
    ▪ Kafka
    ▪ ZooKeeper
    ▪ Hadoop

Advanced Event-Driven Applications

▪ ProcessFunction API
▪ Queryable State API
▪ Excellent support for advanced applications that are:
    ▪ Flexible
    ▪ Stateful
    ▪ Event Driven
    ▪ Time Driven

Example: FlinkTrade

▪ Overall Requirements:
    ▪ Consume “starting position” and “quote” streams
    ▪ Process complex, time-oriented trading rules
    ▪ Trade out of positions to our advantage if possible
    ▪ Provide a dashboard of currently held positions to traders and asset managers

▪ Complex Rules:
    ▪ We only make trades where the Bid Price is above our current Ask Price
    ▪ When a trade is made, we increase our Ask Price to optimize our profits
    ▪ Positions have a set time-to-live, after which we try to trade out of them more aggressively by decreasing the Ask Price over time
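The three complex rules can be sketched as a pair of pure functions, one reacting to quote events and one reacting to timers. This is a hypothetical illustration in plain Scala, not the FlinkTrade code from the talk: the names `Position`, `Trade`, `onQuote`, `onTimer`, the price increments, the reprice interval, and the choice to trade at the bid price are all assumptions. In a Flink job, logic of this shape would typically sit in a ProcessFunction on a keyed stream, with `processElement` handling quotes and `onTimer` handling the time-to-live.

```scala
object TradingRulesSketch {
  // Hypothetical per-symbol position state; nextTimerAt is when the
  // time-to-live (or the next reprice interval) expires, in epoch millis.
  final case class Position(symbol: String, shares: Long, askPrice: BigDecimal, nextTimerAt: Long)
  final case class Trade(symbol: String, shares: Long, price: BigDecimal)

  // Illustrative constants, not taken from the talk.
  val askIncrease: BigDecimal = BigDecimal("0.10")
  val askDecrease: BigDecimal = BigDecimal("0.05")
  val repriceIntervalMillis: Long = 60000L

  // Event-driven rule: trade only when the bid is above our current ask,
  // then raise the ask for the remaining shares.
  def onQuote(pos: Position, bidPrice: BigDecimal, bidShares: Long): (Position, Option[Trade]) =
    if (pos.shares > 0 && bidPrice > pos.askPrice) {
      val traded = math.min(pos.shares, bidShares)
      val updated = pos.copy(shares = pos.shares - traded, askPrice = pos.askPrice + askIncrease)
      (updated, Some(Trade(pos.symbol, traded, bidPrice)))
    } else (pos, None)

  // Time-driven rule: once the timer has expired, lower the ask and
  // schedule the next decrease one interval later.
  def onTimer(pos: Position, now: Long): Position =
    if (now >= pos.nextTimerAt)
      pos.copy(askPrice = pos.askPrice - askDecrease, nextTimerAt = now + repriceIntervalMillis)
    else pos
}
```

Keeping the rules as pure functions over an immutable `Position` makes them easy to unit test independently of the streaming runtime; the surrounding operator would only load the state, call the function, store the result, and emit any trade.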

Example: FlinkTrade

[Figure: the Quotes and Starting Positions streams feed the TradingEngine, which emits Trades and maintains the Positions shown on the Trader Dashboard.]

● Event Driven Processing
● Complex Trading Rules
● Time Based Logic

Trader Dashboard

SYMBOL  SHARES  BUY PRICE  ASK PRICE  LAST TRADE PRICE  PROFIT
AAPL    10,000  140.40     140.50     140.40            $10,921.00
GOOG    20,000  846.81     846.91     846.81            $12,021.00
TWTR     8,000   15.12      15.22      15.12             $4,032.00

Example: FlinkTrade

[Figure: the same pipeline scaled out. Quotes and Starting Positions are distributed across twelve parallel TradingEngine instances, each with its own Positions, all emitting Trades and feeding the Trader Dashboard table shown above.]

Example: FlinkTrade

Let’s look at the code

We are hiring!

data-artisans.com/careers

@jamiegrier @ApacheFlink @dataArtisans