Meson: Heterogeneous Workflows with Spark at Netflix

18
Heterogeneous Workflows with Spark at Netflix 0 Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure

Transcript of Meson: Heterogeneous Workflows with Spark at Netflix

Page 1: Meson: Heterogeneous Workflows with Spark at Netflix

Heterogeneous Workflows with Spark at Netflix

0

Antony Arokiasamy | Kedar Sadekar | Personalization Infrastructure

Page 2: Meson: Heterogeneous Workflows with Spark at Netflix

1

Help members find content to watch and enjoy to maximize member satisfaction and retention

Page 3: Meson: Heterogeneous Workflows with Spark at Netflix

Everything is a Recommendation2

Recommendations are driven by Machine Learning

Ranking

Row

s

Page 4: Meson: Heterogeneous Workflows with Spark at Netflix

Machine Learning Pipeline3

User SelectionFeature

GenerationModel

ValidationPublishModel

Model Training

Page 5: Meson: Heterogeneous Workflows with Spark at Netflix

Machine Learning Pipeline Challenges

4

• Innovation• Heterogeneous Environments

• Spark• Native Support

• Separate Orchestration and Execution

• Multi Tenancy

• Machine Learning Constructs• Parameter Sweep – 30k Dockers

Page 6: Meson: Heterogeneous Workflows with Spark at Netflix

Meson Workflow System5

• General Purpose Workflow Orchestration and Scheduling framework• Delegates execution to resource managers like Mesos

• Optimized for Machine Learning Pipelines and Visualization

• Checkout the Blog• http://bit.ly/mesonws or techblog.netflix.com

• Plan to Open Sourced soon

Page 7: Meson: Heterogeneous Workflows with Spark at Netflix

Meson Architecture6

Page 8: Meson: Heterogeneous Workflows with Spark at Netflix

Standard and Custom Step Types7

Page 9: Meson: Heterogeneous Workflows with Spark at Netflix

Parameter Passing8

Hive Query User DataSet Regional DataSet

Global DataSet

Get Users

Regional Model

Global Model

User DataSet

Wrangle Data

Page 10: Meson: Heterogeneous Workflows with Spark at Netflix

Structured Constructs9

Page 11: Meson: Heterogeneous Workflows with Spark at Netflix

Top Down or Bottom Up10

Page 12: Meson: Heterogeneous Workflows with Spark at Netflix

Two Way Communication11

Page 13: Meson: Heterogeneous Workflows with Spark at Netflix

Spark Step12

Page 14: Meson: Heterogeneous Workflows with Spark at Netflix

Artifacts13

• Step outputs tracked as Artifacts

• Visualization

• Memoization

Page 15: Meson: Heterogeneous Workflows with Spark at Netflix

Multi Tenancy14

• Resource Attributes • spark.cores.max• spark.executor.memory• spark.mesos.constraints• Dynamic Resource Allocation

Page 16: Meson: Heterogeneous Workflows with Spark at Netflix

Cluster Management15

• Red-Black software updates

• Scale up/Scale down

Page 17: Meson: Heterogeneous Workflows with Spark at Netflix

Meson/Spark Cluster16

• 100s of Concurrent Jobs

• 700 Nodes

• 5000 Cores

• 25 TB Memory

• Apps: Meson Workflow System, Spark and Dockers

• Few smaller clusters

Page 18: Meson: Heterogeneous Workflows with Spark at Netflix

17

Antony Arokiasamy

Kedar Sadekar

@aasamy

/aasamy

[email protected]

@kedar_sadekar

/kedar-sadekar

[email protected]