Redesigning Apache Flink’s Distributed Architecture

Post on 20-May-2020

Till Rohrmann trohrmann@apache.org @stsffap

1001 Deployment Scenarios

▪ Many different deployment scenarios
  • YARN
  • Mesos
  • Docker/Kubernetes
  • Standalone
  • Etc.

Different Usage Patterns

▪ Few long-running vs. many short-running jobs
  • Overhead of starting a Flink cluster

▪ Job isolation vs. sharing resources
  • Per-job credentials & secrets can be defined
  • Efficient resource utilization by sharing

Job & Session Mode

▪ Job mode
  • Dedicated cluster for a single job

▪ Session mode
  • Shared cluster for multiple jobs
  • Resources can be shared across jobs

Flink’s Current State

As-Is State (Standalone)

Standalone Flink Cluster

▪ Components: Client, JobManager, TaskManagers
▪ Steps
  • (1) Register TaskManager
  • (2) Submit Job
  • (3) Deploy Tasks

As-Is State (YARN)

▪ Components: Client, YARN Cluster, YARN ResourceManager, Application Master, JobManager, TaskManagers
▪ Steps
  • (1) Submit YARN App. (FLINK)
  • (2) Spawn Application Master
  • (3) Poll status
  • (4) Start TaskManagers
  • (5) Register
  • (6) All TaskManagers started
  • (7) Submit Job
  • (8) Deploy Tasks

Problems

▪ No clear separation of concerns
▪ No dynamic resource allocation
▪ No heterogeneous resources
▪ Not well suited for containerized execution

Flink’s New Distributed Architecture

Flink Improvement Proposal 6

▪ Introduce generic building blocks

▪ Compose blocks for different scenarios

▪ Mainly driven by:

FLIP-6 design document: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147077

The Building Blocks

▪ ResourceManager
  • ClusterManager-specific
  • May live across jobs
  • Manages available Containers/TaskManagers
  • Used to acquire / release resources

▪ TaskManager
  • Registers at ResourceManager
  • Gets tasks from one or more JobManagers

▪ JobManager
  • Single job only, started per job
  • Thinks in terms of "task slots"
  • Deploys and monitors job/task execution

▪ Dispatcher
  • Lives across jobs
  • Touch-point for job submissions
  • Spawns JobManagers
  • May spawn ResourceManager

The Building Blocks

▪ Components: Client, Dispatcher, JobManager, ResourceManager, TaskManager
▪ Steps
  • (1) Submit Job
  • (2) Start JobManager
  • (3) Request slots
  • (4) Start TaskManager
  • (5) Register
  • (6) Offer slots
  • (7) Deploy Tasks
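
The numbered steps on this slide can be traced in a small in-memory sketch. It is written in Python purely for illustration and is not Flink's API: every class and method name is hypothetical, the assignment of steps to components is inferred from the step labels and the component descriptions, and the fixed number of slots per TaskManager is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class TaskManager:
    """Hypothetical worker that offers a fixed number of slots."""
    name: str
    num_slots: int
    tasks: list = field(default_factory=list)

    def deploy(self, task):
        self.tasks.append(task)


class ResourceManager:
    """Hypothetical stand-in that starts TaskManagers on demand."""

    def __init__(self, slots_per_task_manager=2):
        self.slots_per_task_manager = slots_per_task_manager
        self.task_managers = []

    def request_slots(self, job_manager, num_slots):
        started = []
        # (4) Start TaskManagers until the request is covered.
        while sum(tm.num_slots for tm in started) < num_slots:
            name = f"tm-{len(self.task_managers)}"
            tm = TaskManager(name, self.slots_per_task_manager)
            self.task_managers.append(tm)  # (5) Register
            started.append(tm)
        # (6) Offer the new slots to the requesting JobManager.
        for tm in started:
            job_manager.offer_slots(tm)


class JobManager:
    """Hypothetical per-job coordinator that thinks in task slots."""

    def __init__(self, job_name, parallelism, resource_manager):
        self.job_name = job_name
        self.parallelism = parallelism
        self.resource_manager = resource_manager
        self.slots = []  # one entry per offered slot, pointing at its TaskManager

    def offer_slots(self, task_manager):
        self.slots.extend([task_manager] * task_manager.num_slots)

    def run(self):
        # (3) Request slots
        self.resource_manager.request_slots(self, self.parallelism)
        # (7) Deploy one task per unit of parallelism
        for i in range(self.parallelism):
            self.slots[i].deploy(f"{self.job_name}#{i}")


class Dispatcher:
    """Hypothetical touch-point for job submissions."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager
        self.job_managers = {}

    def submit_job(self, job_name, parallelism):
        # (2) Start a dedicated JobManager for this job.
        jm = JobManager(job_name, parallelism, self.resource_manager)
        self.job_managers[job_name] = jm
        jm.run()
        return jm


rm = ResourceManager(slots_per_task_manager=2)
dispatcher = Dispatcher(rm)
jm = dispatcher.submit_job("wordcount", parallelism=3)  # (1) Submit Job
```

With a parallelism of 3 and two slots per TaskManager, the sketch starts two TaskManagers and leaves one slot idle.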

Building Flink-on-YARN

▪ Components: YARN Cluster Client, YARN Cluster, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, JobManager, TaskManagers
▪ Steps
  • (1) Submit YARN App. (JobGraph / JARs)
  • (2) Spawn Application Master
  • (3) Request slots
  • (4) Start TaskManagers
  • (5) Register
  • (6) Deploy Tasks

Differences to old YARN mode

▪ JARs in classpath of all components

▪ Dynamic resource allocation

▪ No two-phase job submission

Building Flink-on-Mesos

▪ Components: Mesos Cluster Client, Mesos Cluster, Mesos Master, Flink Mesos Dispatcher, Master Container, Flink Master Process, Flink Mesos ResourceManager, JobManager, TaskManagers
▪ Steps
  • (1) HTTP POST JobGraph/Jars
  • (2) Allocate container for Flink master
  • (3) Start Process (and supervise)
  • (4) Request slots
  • (5) Start TaskManagers
  • (6) Register
  • (7) Deploy Tasks

Building Flink-on-Docker/K8S

▪ Components: Flink-Container ResourceManager, JobManager, Program Runner, three Worker Containers (one TaskManager each)
▪ Steps
  • (1) Container framework starts Master & Worker Containers
  • (2) Run & Start
  • (3) Register
  • (4) Deploy Tasks

Containerized Execution

▪ Single dedicated Resource- and JobManager container and multiple TaskManager containers

▪ Generalization
  • Start N containers
  • Use leader election to determine JobManager role; remainder take the TaskManager role

▪ Enabling auto-scaling groups by rescaling job to fill all available slots
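
The generalization above (start N identical containers, elect one as JobManager, rescale the job to the available slots) can be sketched as follows. This is a minimal Python illustration under stated assumptions: the election is a deterministic stand-in in which the lowest container id wins, whereas a real deployment would rely on a coordination service; the function names and the fixed slot count per TaskManager are hypothetical.

```python
def assign_roles(container_ids):
    """Elect one container as JobManager; all others become TaskManagers.

    Stand-in election (assumption): the lowest id wins.
    """
    if not container_ids:
        raise ValueError("need at least one container")
    leader = min(container_ids)
    return {
        cid: ("JobManager" if cid == leader else "TaskManager")
        for cid in container_ids
    }


def rescaled_parallelism(roles, slots_per_task_manager):
    """Parallelism that fills every slot of every TaskManager container."""
    task_managers = [cid for cid, role in roles.items() if role == "TaskManager"]
    return len(task_managers) * slots_per_task_manager


# Three containers: one JobManager, two TaskManagers.
roles = assign_roles(["c-3", "c-1", "c-2"])
parallelism = rescaled_parallelism(roles, slots_per_task_manager=4)

# The auto-scaling group grows to five containers; the job is rescaled.
grown = assign_roles(["c-3", "c-1", "c-2", "c-4", "c-5"])
grown_parallelism = rescaled_parallelism(grown, slots_per_task_manager=4)
```

Because every container runs the same image and only the election outcome decides its role, growing the group changes nothing but the number of TaskManagers and therefore the target parallelism.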

Multi Job Sessions

Building Standalone

▪ Components: Flink Cluster Client, Standalone Cluster, Flink Master Process, Standalone ResourceManager, Dispatcher, two JobManagers, two Standby Master Processes, TaskManagers
▪ Steps
  • (1) Register
  • (2) Submit JobGraph/Jars
  • (3) Start JobManager
  • (4) Request slots
  • (5) Deploy Tasks

YARN Session

▪ Components: Client, YARN Cluster, YARN ResourceManager, Application Master, Flink-YARN ResourceManager, Dispatcher, JobManager (A), JobManager (B), TaskManagers
▪ Steps
  • (1) Submit YARN App. (FLINK – session)
  • (2) Spawn Application Master
  • (3) Submit Job A
  • (4) Start JobManager
  • (5) Request slots
  • (6) Start TaskManagers
  • (7) Register
  • (8) Deploy Tasks
  • (9) Submit Job B
  • (10) Start JobManager
  • (11) Request slots
  • (12) Deploy Tasks

Multi Job Sessions

▪ Dispatcher spawns a dedicated JobManager for each job

▪ Jobs run under session user credentials

▪ ResourceManager holds on to resources
  • Reuse of allocated resources
  • Quicker response for successive jobs
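
The effect of a ResourceManager that holds on to resources can be illustrated with a toy slot pool. This is a Python sketch, not Flink code: `SessionResourceManager`, `acquire`, and `release` are hypothetical names, and one slot per container is assumed for simplicity.

```python
class SessionResourceManager:
    """Hypothetical session-scoped manager that keeps released slots."""

    def __init__(self):
        self.free_slots = []          # slots kept after a job finishes
        self.containers_started = 0   # how often a new container was needed

    def acquire(self, num_slots):
        granted = []
        while len(granted) < num_slots:
            if self.free_slots:
                # Reuse an already allocated slot: no container start-up cost.
                granted.append(self.free_slots.pop())
            else:
                # Nothing to reuse: start a new container (one slot each).
                self.containers_started += 1
                granted.append(f"slot-{self.containers_started}")
        return granted

    def release(self, slots):
        # Hold on to the slots instead of returning them to the cluster.
        self.free_slots.extend(slots)


session_rm = SessionResourceManager()

job_a = session_rm.acquire(3)       # job A needs three new containers
session_rm.release(job_a)           # job A finishes
started_after_a = session_rm.containers_started

job_b = session_rm.acquire(2)       # job B arrives afterwards
started_after_b = session_rm.containers_started
```

Job B is served entirely from slots that job A released, so no new container is started for it.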

Miscellaneous

▪ Resource profiles
  • Specify CPU & memory requirements for individual operators
  • ResourceManager allocates containers according to resource profiles

▪ New RPC abstraction similar to Akka’s typed actors
  • Properly defined interface eases development
  • No longer locked in to Akka
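
How a ResourceManager might match resource profiles to containers can be sketched as below. This is an assumption-laden Python illustration, not Flink's implementation: the fields of `ResourceProfile`, the `pick_container` helper, the smallest-fit policy, and the container sizes are all chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceProfile:
    """CPU and memory requirement (field names are hypothetical)."""
    cpu_cores: float
    memory_mb: int

    def fits_into(self, other):
        return self.cpu_cores <= other.cpu_cores and self.memory_mb <= other.memory_mb


def pick_container(required, container_sizes):
    """Return the smallest container size that satisfies the profile, or None."""
    candidates = [size for size in container_sizes if required.fits_into(size)]
    return min(candidates, key=lambda s: (s.memory_mb, s.cpu_cores), default=None)


# Container sizes the cluster manager is assumed to offer.
sizes = [
    ResourceProfile(1.0, 1024),
    ResourceProfile(2.0, 4096),
    ResourceProfile(4.0, 8192),
]

# Per-operator requirements, as a job might declare them.
operator_profiles = {
    "source": ResourceProfile(0.5, 512),
    "window-join": ResourceProfile(2.0, 3072),
}

allocation = {op: pick_container(p, sizes) for op, p in operator_profiles.items()}
```

In this sketch the lightweight source lands in the smallest container while the join gets a larger one, which is the heterogeneous allocation that the old architecture could not express.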

Conclusion

▪ Different cluster environments have different deployment paradigms

▪ Support for “Job” as well as “Session” mode in various environments necessary

▪ FLIP-6 architecture provides the necessary flexibility to achieve both

Thank you! @stsffap @ApacheFlink @dataArtisans

We are hiring!

data-artisans.com/careers