Spark Summit EU talk by William Benton

133
William Benton Red Hat, Inc. @willb [email protected] CONTAINERIZED SPARK ON KUBERNETES

Transcript of Spark Summit EU talk by William Benton

Page 1: Spark Summit EU talk by William Benton

William Benton Red Hat, Inc. @willb • [email protected]

CONTAINERIZED SPARK ON KUBERNETES

Page 2: Spark Summit EU talk by William Benton

BACKGROUND

Page 3: Spark Summit EU talk by William Benton

BACKGROUND

Page 4: Spark Summit EU talk by William Benton

BACKGROUND

Page 5: Spark Summit EU talk by William Benton

BACKGROUND

Page 6: Spark Summit EU talk by William Benton

BACKGROUND

Page 7: Spark Summit EU talk by William Benton

BACKGROUND

Page 8: Spark Summit EU talk by William Benton

BACKGROUND

Page 9: Spark Summit EU talk by William Benton

BACKGROUND

Page 10: Spark Summit EU talk by William Benton

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

Page 11: Spark Summit EU talk by William Benton

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Page 12: Spark Summit EU talk by William Benton

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

Networked POSIX FS

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Page 13: Spark Summit EU talk by William Benton

Mesos

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

Networked POSIX FS

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Page 14: Spark Summit EU talk by William Benton

Mesos

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

Networked POSIX FS

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

1

2

3

4

Page 15: Spark Summit EU talk by William Benton

Mesos

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

Networked POSIX FS

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

1

2

3

4

1

1

2

3

3

4

Page 16: Spark Summit EU talk by William Benton

Mesos

WHAT OUR SPARK CLUSTER LOOKED LIKE IN 2014

Networked POSIX FS

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

1

2

3

4

1

1

2

3

3

4

Page 17: Spark Summit EU talk by William Benton

Analytics is no longer a separate workload.

Page 18: Spark Summit EU talk by William Benton

Analytics is an essential component of modern data-driven applications.

Page 19: Spark Summit EU talk by William Benton

OUR GOALS

Page 20: Spark Summit EU talk by William Benton

OUR GOALS

Page 21: Spark Summit EU talk by William Benton

OUR GOALS

Page 22: Spark Summit EU talk by William Benton

OUR GOALS

Page 23: Spark Summit EU talk by William Benton

OUR GOALS

git

Page 24: Spark Summit EU talk by William Benton

OUR GOALS

git

Page 25: Spark Summit EU talk by William Benton

FORECAST

Motivating containerized microservices

Architectures for analytics and applications

Spark clusters in containers: practicalities and pitfalls

Play along at home

Future work

Page 26: Spark Summit EU talk by William Benton

MOTIVATING MICROSERVICES

Page 27: Spark Summit EU talk by William Benton

A microservice architecture employs lightweight, modular, and typically stateless components with well-defined interfaces and contracts.

Page 28: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 29: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 30: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 31: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 32: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 33: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 34: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 35: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

2 + 2

Page 36: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

2 + 2 5

Page 37: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

2 + 2 5

Page 38: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 39: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

Page 40: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

?

Page 41: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

?

Page 42: Spark Summit EU talk by William Benton

BENEFITS OF MICROSERVICE ARCHITECTURES

?

Page 43: Spark Summit EU talk by William Benton

MICROSERVICES AND SPARK

executor

1 2 3

executor

4 5 6

executor

7 8 9

executor

10 11 12

master

Page 44: Spark Summit EU talk by William Benton

MICROSERVICES AND SPARK

executor

1 2 3

executor

4 5 6

executor

7 8 9

executor

10 11 12

master

λ x: x * 2

Page 45: Spark Summit EU talk by William Benton

MICROSERVICES AND SPARK

executor

1 2 3

executor

4 5 6

executor

7 8 9

executor

10 11 12

master

λ x: x * 22 4 6 8 10 12 14 16 18 20 22 24

λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2

Page 46: Spark Summit EU talk by William Benton

MICROSERVICES AND SPARK

executor

1 2 3

executor

4 5 6

executor

7 8 9

executor

10 11 12

master

λ x: x * 22 4 6 8 10 12 14 16 18 20 22 24

λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2

Page 47: Spark Summit EU talk by William Benton

MICROSERVICES AND SPARK

executor

1 2 3

executor

4 5 6

executor

7 8 9

executor

10 11 12

master

λ x: x * 22 4 6 8 10 12 14 16 18 20 22 24

λ x: x * 2 λ x: x * 2 λ x: x * 2 λ x: x * 2

Page 48: Spark Summit EU talk by William Benton

ARCHITECTURES FOR ANALYTICS AND APPLICATIONS

Page 49: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

transform

transform

transform

events

databases

file, object storage

Page 50: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

transform

transform

transform

aggregate

events

databases

file, object storage

Page 51: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

Page 52: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

Page 53: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

archive

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

Page 54: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

archive

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

Page 55: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

archive

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

web and mobile

reporting

developer UI

Page 56: Spark Summit EU talk by William Benton

APPLICATION RESPONSIBILITIES

archive

trainmodels

transform

transform

transform

aggregate

events

databases

file, object storage

management

web and mobile

reporting

developer UI

Page 57: Spark Summit EU talk by William Benton

LEGACY ARCHITECTURES

Page 58: Spark Summit EU talk by William Benton

CONVENTIONAL DATA WAREHOUSE

events

Page 59: Spark Summit EU talk by William Benton

CONVENTIONAL DATA WAREHOUSE

transformevents

Page 60: Spark Summit EU talk by William Benton

CONVENTIONAL DATA WAREHOUSE

transformevents

UI

Page 61: Spark Summit EU talk by William Benton

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

Page 62: Spark Summit EU talk by William Benton

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS

Page 63: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS

Page 64: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS

Page 65: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS RDBMS

Page 66: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS RDBMS

analysis

Page 67: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS RDBMS

analysis

reporting

Page 68: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS RDBMS

analysis

reporting

Page 69: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS RDBMS

analysis

interactivequeryreporting

Page 70: Spark Summit EU talk by William Benton

transactionprocessing

CONVENTIONAL DATA WAREHOUSE

transformevents

UI business logic

RDBMS analyticprocessing

RDBMS

analysis

interactivequeryreporting

Page 71: Spark Summit EU talk by William Benton

HADOOP-STYLE “DATA LAKE”

HDFS HDFS HDFS

Page 72: Spark Summit EU talk by William Benton

HADOOP-STYLE “DATA LAKE”

HDFS HDFS HDFS HDFS HDFS

Page 73: Spark Summit EU talk by William Benton

HADOOP-STYLE “DATA LAKE”

HDFS

events

HDFS HDFS HDFS HDFS

Page 74: Spark Summit EU talk by William Benton

HADOOP-STYLE “DATA LAKE”

HDFS

events

HDFS HDFS HDFS HDFS

Page 75: Spark Summit EU talk by William Benton

HADOOP-STYLE “DATA LAKE”

HDFS

compute

events

HDFS

compute

HDFS

compute compute compute

HDFS HDFS

Page 76: Spark Summit EU talk by William Benton

MODERN ARCHITECTURES

Page 77: Spark Summit EU talk by William Benton

THE LAMBDA ARCHITECTURE

events

(imprecise)analysistransform

Page 78: Spark Summit EU talk by William Benton

THE LAMBDA ARCHITECTURE

events

(imprecise)analysistransform

DFS

Page 79: Spark Summit EU talk by William Benton

speed layer

THE LAMBDA ARCHITECTURE

events

(imprecise)analysistransform

DFS

Page 80: Spark Summit EU talk by William Benton

speed layer

THE LAMBDA ARCHITECTURE

events

(precise) analysistransform

(imprecise)analysistransform

DFS

Page 81: Spark Summit EU talk by William Benton

speed layer

THE LAMBDA ARCHITECTURE

events

batch layer

(precise) analysistransform

(imprecise)analysistransform

DFS

Page 82: Spark Summit EU talk by William Benton

speed layer

THE LAMBDA ARCHITECTURE

events

batch layer

federate

(precise) analysistransform

(imprecise)analysistransform

DFS

Page 83: Spark Summit EU talk by William Benton

speed layer

THE LAMBDA ARCHITECTURE

events

batch layer

UIfederate

(precise) analysistransform

(imprecise)analysistransform

DFS

Page 84: Spark Summit EU talk by William Benton

serving layerspeed layer

THE LAMBDA ARCHITECTURE

events

batch layer

UIfederate

(precise) analysistransform

(imprecise)analysistransform

DFS

Page 85: Spark Summit EU talk by William Benton

THE KAPPA ARCHITECTURE

events

Page 86: Spark Summit EU talk by William Benton

queue for “raw data” topic

THE KAPPA ARCHITECTURE

events

Page 87: Spark Summit EU talk by William Benton

queue for “raw data” topic

THE KAPPA ARCHITECTURE

events

transform

queue for “preprocessed data” topic

Page 88: Spark Summit EU talk by William Benton

queue for “raw data” topic

THE KAPPA ARCHITECTURE

events

transform analysis

queue for “preprocessed data” topic

queue for “analysis results” topic

Page 89: Spark Summit EU talk by William Benton

queue for “raw data” topic

THE KAPPA ARCHITECTURE

events

transform analysis

queue for “preprocessed data” topic

queue for “analysis results” topic

reporting end-user UI

Page 90: Spark Summit EU talk by William Benton

DATA FEDERATION IN THE COMPUTE LAYER

aggregate

trainmodels

archive

events

databases

file, object storage

management

web and mobile

reporting

developer UItransform

transform

transform

Page 91: Spark Summit EU talk by William Benton

DATA FEDERATION IN THE COMPUTE LAYER

aggregate

trainmodels

archive

events

databases

file, object storage

management

web and mobile

reporting

developer UItransform

transform

transform

Page 92: Spark Summit EU talk by William Benton

DATA FEDERATION IN THE COMPUTE LAYER

aggregate

trainmodels

archive

events

databases

file, object storage

management

web and mobile

reporting

developer UItransform

transform

transform

Page 93: Spark Summit EU talk by William Benton

Cluster scheduler

SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN

Shared FSSpark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Resource manager

app 1 app 2

app 4app 3

Page 94: Spark Summit EU talk by William Benton

Cluster scheduler

SIDEBAR: THE MONOLITHIC SPARK ANTIPATTERN

Shared FSSpark executor

Spark executor

Spark executor

Spark executor

Spark executor

Spark executor

Resource manager

app 1 app 2

app 4app 3

Page 95: Spark Summit EU talk by William Benton

Resource manager

ONE CLUSTER PER APPLICATION

Object storesapp 1 app 2

app 5app 4

app 3

app 6

Databases

Page 96: Spark Summit EU talk by William Benton

Resource manager

ONE CLUSTER PER APPLICATION

Object storesapp 1 app 2

app 5app 4

app 3

app 6

app 2

app 4

Databases

Page 97: Spark Summit EU talk by William Benton

PRACTICALITIES AND POTENTIAL PITFALLS

Page 98: Spark Summit EU talk by William Benton

SCHEDULING

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

Object stores

Databases

Page 99: Spark Summit EU talk by William Benton

SCHEDULING

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1

Page 100: Spark Summit EU talk by William Benton

SCHEDULING

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1

Page 101: Spark Summit EU talk by William Benton

SCHEDULING

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1

Page 102: Spark Summit EU talk by William Benton

SCHEDULING

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1

Page 103: Spark Summit EU talk by William Benton

SCHEDULING

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1

Page 104: Spark Summit EU talk by William Benton

SECURITY

Page 105: Spark Summit EU talk by William Benton

SECURITY

Page 106: Spark Summit EU talk by William Benton

SECURITY

$SPARK_HOME/bin/spark-class \ org.apache.spark.deploy.worker.Worker \ master:7077

pid

root

net

Page 107: Spark Summit EU talk by William Benton

SECURITY

$SPARK_HOME/bin/spark-class \ org.apache.spark.deploy.worker.Worker \ master:7077

pid

root

net

/tmp/foo

Page 108: Spark Summit EU talk by William Benton

SECURITY

Page 109: Spark Summit EU talk by William Benton

SECURITY

k8s namespace

Page 110: Spark Summit EU talk by William Benton

SECURITY

k8s namespace*

Page 111: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

Page 112: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

POSIX filesystem

Page 113: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

POSIX filesystem

✓ familiar interface

✓ interoperability with other programs

Page 114: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

POSIX filesystem

✓ familiar interface

✓ interoperability with other programs

✗ unnecessary semantic guarantees

✗ difficult to manage

Page 115: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

HDFS HDFS

HDFS HDFS

HDFS

HDFS

Page 116: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

HDFS HDFS

HDFS HDFS

HDFS

HDFS

✓ support for legacy Hadoop installations

Page 117: Spark Summit EU talk by William Benton

✗ inelastic ✗ stateful ✗ can’t collocate

compute and data

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

HDFS HDFS

HDFS HDFS

HDFS

HDFS

✓ support for legacy Hadoop installations

Page 118: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

object store

Page 119: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

object store

✓ interoperability

✓ fine-grained AC

✓ many implementations

Page 120: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

object store

✓ interoperability

✓ fine-grained AC

✓ many implementations

✗ consistency model ✗ performance (?)

Page 121: Spark Summit EU talk by William Benton

STORAGE

Kubernetes

app 1 app 2

app 5app 4

app 3

app 6

app 1 app 2

app 5app 4

app 3

app 6

object store

✓ interoperability

✓ fine-grained AC

✓ many implementations

✗ consistency model ✗ performance

Page 122: Spark Summit EU talk by William Benton

“…in a cloud native architecture, the benefit of HDFS is actually very small and that is why many cloud-first organizations no longer run HDFS, or only run it as a caching layer for S3.”

—Reynold Xin on Quora (http://qr.ae/TAF4cN)

Page 123: Spark Summit EU talk by William Benton

NETWORKING

Page 124: Spark Summit EU talk by William Benton

NETWORKING

Page 125: Spark Summit EU talk by William Benton

NETWORKING

Page 126: Spark Summit EU talk by William Benton

NETWORKINGhttp://app1:8080

Page 127: Spark Summit EU talk by William Benton

NETWORKINGhttp://app1:8080

✗ can’t access worker web UI (but wait for Spark 2.1!)

Page 128: Spark Summit EU talk by William Benton

NETWORKINGhttp://app1:8080

✗ can’t access worker web UI (but wait for Spark 2.1!)

Page 129: Spark Summit EU talk by William Benton

NETWORKINGhttp://app1:8080

✗ can’t access worker web UI (but wait for Spark 2.1!)

http://app1:80

Page 130: Spark Summit EU talk by William Benton

NEXT STEPS: FUTURE WORK & PLAYING ALONG AT HOME

Page 131: Spark Summit EU talk by William Benton

NEXT STEPS

Further performance evaluation

Better developer experience

Improved scheduling of Spark tasks on Kubernetes

Page 132: Spark Summit EU talk by William Benton

TRY IT OUT YOURSELF

Kubernetes standalone Spark example: https://github.com/kubernetes/kubernetes/tree/master/examples/spark

Enabling Spark on OpenShift: https://github.com/radanalyticsio

Native Spark on Kubernetes proposal: https://github.com/kubernetes/kubernetes/issues/34377

Page 133: Spark Summit EU talk by William Benton

@willb • [email protected]://chapeau.freevariable.com

THANKS!