Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik...

34
Teaching Apache Spark Applications to Manage Their Workers Elastically Erik Erlandson Trevor McKay Red Hat, Inc.

Transcript of Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik...

Page 1: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Teaching Apache Spark Applications to Manage Their Workers ElasticallyErik ErlandsonTrevor McKayRed Hat, Inc.

Page 2: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

IntroductionErik ErlandsonSenior SW Engineer; Red Hat, Inc.Emerging Technologies Group

Internal Data ScienceInsightful Applications

Trevor McKayPrincipal SW Engineer; Red Hat, Inc.Emerging Technologies Group

Oshinko DevelopmentInsightful Applications

Page 3: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Outline• Trevor

– container orchestration– containerizing spark

• Erik– spark dynamic allocation– metrics– elastic worker daemon

• Demo

Page 4: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Containerizing Spark• Container 101

– What is a container?– Docker, Kubernetes and OpenShift

• Why Containerize Spark?• Oshinko

– features– components– cluster creation example

Page 5: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

What is a container?

• A process running in a namespace on a container host– separate process table, file system, and routing table– base operating system elements– application-specific code

• Resources can be limited through cgroups

Page 6: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Docker and Kubernetes• “Docker is the world's leading software

containerization platform” www.docker.com

– Open-source tool to build, store, and run containers– Images can be stored in shared registries

• “Kubernetes is an open-source platform for automating deployment, scaling, and operations of application containers across clusters of hosts” kubernetes.io

Page 7: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

OpenShift Origin• Built around a core of Docker and Kubernetes• Adds application lifecycle management

functionality and DevOps tooling. www.openshift.org/

– multi-tenancy– Source-to-Image (S2I)

• Runs on your laptop with “oc cluster up”

Page 8: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Why Containerize Spark?• Repeatable clusters with no mandatory config• Normal users can create a cluster

– No special privileges, just an account on a management platform

Page 9: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Why Containerize Spark?• Containers allow a cluster-per-app model

– Quick to spin up and spin down– Isolation == multiple clusters on the same host– Data can still be shared through common endpoints– Do I need to share a large dedicated cluster?

Page 10: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Why containerize Spark?• Ephemeral clusters conserve resources• Kubernetes makes horizontal scale out simple

– Elastic Worker daemon builds on this foundation– Elasticity further conserves resources

Page 11: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Deeper on Spark + Containers

Optimizing Spark Deployments for Containers: Isolation, Safety, and Performance● William Benton (Red Hat)● Thursday, February 9● 11:40 AM – 12:10 PM● Ballroom A

Page 12: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Oshinko: Simplifying further• Containers simplify deployment but still lots to do ...

– Create the master and worker containers– Handle spark configuration– Wire the cluster together– Allow access to http endpoints– Tear it all down when you’re done

• Oshinko treats clusters as abstractions and does this work for us

Page 13: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Oshinko Features• CLI, web UI, and REST interfaces• Cluster creation with sane defaults (name only)• Scale and delete with simple commands• Advanced configuration

– Enable features like Elastic Workers with a flag– Specify images and spark configuration files– Cluster configurations persisted in Kubernetes

• Source-to-Image integration (pyspark, java, scala)

Page 14: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

pod

Oshinko ComponentsOshinko web UI

Oshinko REST

Oshinko Core

Oshinko CLI

Oshinko Core

Oshinko OpenShift console OpenShift and Kubernetes API

servers

Oshinko CLI

Oshinko Core

pod

s2iimage pod

Oshinko CLI

Oshinko Core

Launch script and user code

Page 15: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Creating a ClusterCLI from a shell …

$ oshinko-cli create mycluster --storedconfig=clusterconfig \ --insecure-skip-tls-verify=true --token=$TOKEN

Using REST from Python …

import requestsr = requests.post("http://oshinko-rest/clusters", json={"name": clustername, "config": {"name": "clusterconfig"} })

Page 16: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

What is a cluster config?$ oc export configmap clusterconfig

apiVersion: v1data: metrics.enable: "true" scorpionstare.enable: "true" sparkimage: docker.io/manyangled/var-spark-worker:latest sparkmasterconfig: masterconfig sparkworkerconfig: workerconfigkind: ConfigMapmetadata: creationTimestamp: null name: clusterconfig

Page 17: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Source for demo’s oshinko-rest• Metrics implementation is being reviewed

– using carbon and graphite today– investigating jolokia metrics

• Metrics and elastic workers currently supported athttps://github.com/tmckayus/oshinko-rest/tree/metrics

• Both features will be merged to oshinko master soon

Page 18: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Dynamic Allocation

CountBacklogJobs

> E ?Requestmin(2x current, backlog)

WaitInterval

ReportTarget asMetric

Shut DownIdle Executors

Yes

No

spark .dynamicAllocation .enabled

Page 19: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Dynamic Allocation

CountBacklogJobs

> E ?Requestmin(2x current, backlog)

WaitInterval

ReportTarget asMetric

Shut DownIdle Executors

Yes

Nospark .dynamicAllocation .maxExecutors

Page 20: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Dynamic Allocation

CountBacklogJobs

> E ?Requestmin(2x current, backlog)

WaitInterval

ReportTarget asMetric

Shut DownIdle Executors

Yes

No

spark .dynamicAllocation .schedulerBacklogTimeout

*.sink.graphite.host

Page 21: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Dynamic Allocation

CountBacklogJobs

> E ?Requestmin(2x current, backlog)

WaitInterval

ReportTarget asMetric

Shut DownIdle Executors

Yes

Nospark .dynamicAllocation .executorIdleTimeout

Page 22: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Executor Scalingspark.dynamicAllocation.initialExecutors

>= spark.dynamicAllocation.minExecutors

<= spark.dynamicAllocation.maxExecutors

<= backlog jobs (<= RDD partitions)

Page 23: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Shuffle Service• Caches shuffle results independent of Executor• Saves results if Executor is shut down• Required for running Dynamic Allocation• spark.shuffle.service.enabled = true

Page 24: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Dynamic Allocation MetricsPublished by the ExecutorAllocationManager

numberExecutorsToAdd Additional executors requested

numberExecutorsPendingToRemove Executors being shut down

numberAllExecutors Executors in any state

numberTargetExecutors Total requested (current+additional)

numberMaxNeededExecutors Maximum that could be loaded

Page 25: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Elastic Worker Daemon

driver

metricservice

elasticdaemon

oshinkoAPIserver

sparkworkers

sparkworkers

sparkworkersspark

master

RE

ST

RE

ST

openshiftAPIserver

Spark Master Pod Spark Worker PodsExecutor Request

Page 26: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Elastic Worker Daemon

driver

metricservice

elasticdaemon

oshinkoAPIserver

sparkworkers

sparkworkers

sparkworkersspark

master

RE

ST

RE

ST

openshiftAPIserver

numberTargetExecutors

Spark Master Pod Spark Worker Pods

Page 27: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Elastic Worker Daemon

driver

metricservice

elasticdaemon

oshinkoAPIserver

sparkworkers

sparkworkers

sparkworkersspark

master

REST

REST

openshiftAPIserver

numberTargetExecutors

Spark Master Pod Spark Worker Pods

Page 28: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Elastic Worker Daemon

driver

metricservice

elasticdaemon

oshinkoAPIserver

sparkworkers

sparkworkers

sparkworkersspark

master

RE

ST

RE

ST

openshiftAPIserver

Spark Master Pod Spark Worker Pods

Page 29: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Elastic Worker Daemon

driver

metricservice

elasticdaemon

oshinkoAPIserver

sparkworkers

sparkworkers

sparkworkersspark

master

RE

ST

RE

ST

openshiftAPIserver

Spark Master Pod Spark Worker Pods

PodRep

licati

on

Page 30: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Demo

Demo

Page 31: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Radanalytics.ioNew community landing page at

http://radanalytics.io/

Page 32: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Where to Find OshinkoOshinko and related bits:

http://github.com/radanalyticsio/

Docker images:https://hub.docker.com/u/radanalyticsio/

Images and notebook for today’s demo:https://hub.docker.com/u/tmckay/

https://hub.docker.com/u/manyangled/https://github.com/erikerlandson/var-notebook/pulls

Page 33: Teaching Apache Spark Clusters to Manage Their Workers Elastically: Spark Summit East talk by Erik Erlandson and Trevor McKay

Related Effort: Spark on K8s• Native scheduler backend for Kubernetes• https://github.com/apache-spark-on-k8s/spark• Developer Community Collaboration