New NVIDIA GPU Support for Apache Mesos and...

© 2017 Mesosphere, Inc. All Rights Reserved.

NVIDIA GPU Support for Apache Mesos and DC/OS
GPU Technology Conference - 2017
Kevin Klues
[email protected]


Page 1: New NVIDIA GPU Support for Apache Mesos and DC/OSon-demand.gputechconf.com/gtc/2017/presentation/s7160... · 2017. 5. 18. · Overview of Talk Brief intro to docker and nvidia-docker



Kevin Klues is a Tech Lead Manager at Mesosphere, working with both the Mesos core team and the DC/OS Provisioning and Management team. Since joining Mesosphere, Kevin has been involved in the design and implementation of a number of Mesos’s core subsystems, including GPU isolation, Pods, and Attach/Exec support. Prior to joining Mesosphere, Kevin worked at Google on Akaros, an experimental operating system for data centers. He and a few others founded the Akaros project while working on their Ph.D.s at UC Berkeley. In a past life, Kevin was a lead developer of the TinyOS project, working at Stanford, the Technical University of Berlin, and the CSIRO in Australia. When not working, you can usually find Kevin on a snowboard or up in the mountains in one capacity or another.


What is Apache Mesos?

● An open-source, distributed systems kernel (a.k.a. cluster manager) for fine-grained management of cluster resources and tasks


● Mesos provides its own containerization technology (called the Mesos containerizer)

● It supports the standard docker image format, but uses its own internal implementation to build and run containers

● A separate docker containerizer is also available, but it is not relevant to this presentation


What is DC/OS?


● DC/OS (Data Center Operating System) takes the Mesos “kernel” and builds upon it with additional services and functionality:
○ Built-in support for service discovery, load balancing, security, and ease of installation
○ Extra tooling (e.g. a comprehensive CLI and a GUI)
○ Built-in frameworks for launching long-running services (Marathon) and batch jobs (Metronome)
○ A repository (app-store) for installing other common packages and frameworks (e.g. Spark, Kafka, Cassandra)


Overview of Talk

● Brief intro to docker and nvidia-docker
● Challenges of supporting Nvidia GPUs in docker containers
● How nvidia-docker addresses these challenges
● How Apache Mesos addresses these challenges
● DC/OS GPU Demos
● Future Work


Docker

● Extremely popular image format for containers
○ Build once → run everywhere
○ Configure once → run anything

Source: DockerCon 2016 Keynote by Docker’s CEO Ben Golub


Nvidia-docker

Wrapper around docker to allow GPUs to be used inside docker containers


Machine Learning Frameworks

Many popular machine learning frameworks (including TensorFlow, Caffe, and CNTK) can be run with nvidia-docker

Source: https://data-shaker.com/docker-tensorflow-with-jupyter-notebook-on-windows/


Overall Goal

Test locally with nvidia-docker

Deploy to production with DC/OS


Challenges of Supporting Nvidia GPUs in Docker containers

● Before containers it was easy:
○ Buy some GPUs
○ Install them on your box
○ Install the base nvidia drivers
○ Install some advanced toolkit libraries
○ Link a GPU-accelerated application against these libraries
○ Run your application

[Diagram, bottom to top: Linux Kernel with nvidia-kernel-module → nvidia base libraries → CUDA / TensorFlow libraries → Application]


● So what about containers?
○ Buy some GPUs
○ Install them on your box
○ Install the nvidia-kernel-module
○ Build a docker image:
■ Bundle the base nvidia libraries
■ Bundle some advanced toolkit libraries
■ Bundle a GPU-accelerated application to use these libraries
○ Run your docker container

[Diagram: a Container holding nvidia base libraries → CUDA / TensorFlow libraries → Application, running on the host's Linux Kernel with nvidia-kernel-module]


● Straightforward, right?


● This only works if the kernel-level and user-level driver versions match


● Won’t work if they don’t


● Either way, you have to map in the GPU devices somehow


[Diagram: one Container bundling nvidia base libraries (v1), CUDA / TensorFlow libraries and an Application, shown against two hosts: Linux Kernel with nvidia-kernel-module (v1), and Linux Kernel with nvidia-kernel-module (v2)]
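To make the version constraint concrete, here is a minimal Python sketch (not code from nvidia-docker or Mesos) that compares the host kernel module's version with the version suffixes of the user-space driver libraries bundled in an image. It assumes the kernel module version can be read from text in the format of /proc/driver/nvidia/version and that each bundled library carries the driver version as a file-name suffix (e.g. libcuda.so.375.39); the sample string, function names and version numbers are illustrative only.

```python
import re
from typing import Iterable

# Illustrative first line of /proc/driver/nvidia/version (assumed format;
# version and date are example values, not taken from the demo hosts).
SAMPLE_PROC_VERSION = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module  375.39  "
    "Tue Jan 31 20:47:00 PST 2017"
)

_VERSION = r"(\d+\.\d+(?:\.\d+)?)"


def kernel_module_version(proc_text: str) -> str:
    """Return the kernel module version found in /proc/driver/nvidia/version text."""
    match = re.search(r"Kernel Module\s+" + _VERSION, proc_text)
    if match is None:
        raise ValueError("no NVIDIA kernel module version found")
    return match.group(1)


def library_version(filename: str) -> str:
    """Return the version suffix of a versioned library file name,
    e.g. 'libnvidia-ml.so.375.39' -> '375.39'."""
    match = re.search(r"\.so\." + _VERSION + r"$", filename)
    if match is None:
        raise ValueError(f"no version suffix in {filename!r}")
    return match.group(1)


def driver_versions_match(proc_text: str, bundled_libs: Iterable[str]) -> bool:
    """True only if every user-space driver library bundled in the image has
    exactly the same version as the host's kernel module."""
    host_version = kernel_module_version(proc_text)
    libs = list(bundled_libs)
    return bool(libs) and all(library_version(lib) == host_version for lib in libs)


if __name__ == "__main__":
    # Image built against the same driver as the host.
    print(driver_versions_match(SAMPLE_PROC_VERSION,
                                ["libcuda.so.375.39", "libnvidia-ml.so.375.39"]))
    # Image built against an older driver.
    print(driver_versions_match(SAMPLE_PROC_VERSION, ["libcuda.so.367.48"]))
```

The first call reports a match and the second does not, which is exactly the v1-image-on-a-v2-host situation in the diagram.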


nvidia-docker and GPUs


● Components of nvidia-docker:
○ A set of docker images that set custom labels / environment variables
○ nvidia-docker-plugin (a standard docker volume plugin)
○ nvidia-docker (a wrapper script around docker itself)

docker run ...

nvidia-docker run ...


● nvidia-docker-plugin

Finds all standard nvidia libraries / binaries on the host and consolidates them into a single place as a docker volume

/var/lib/docker/volumes
└── nvidia_XXX.XX (version number)
    ├── bin
    ├── lib
    └── lib64
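The consolidation step can be sketched as follows. This is a hypothetical illustration, not the plugin's code: it assumes the driver files have already been discovered on the host and that the first bytes of each file are available. It places anything under a /bin/ directory into bin, and uses the ELF class byte (offset 4 of the ELF identification: 1 = 32-bit, 2 = 64-bit) to split libraries between lib and lib64. The function names and example host paths are assumptions.

```python
import posixpath
from typing import Dict, Iterable, List, Tuple

ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4  # offset of the ELF class byte: 1 = 32-bit, 2 = 64-bit


def elf_bits(header: bytes) -> int:
    """Return 32 or 64 based on the ELF identification bytes of a file."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[EI_CLASS])
    if bits is None:
        raise ValueError("unknown ELF class")
    return bits


def plan_driver_volume(version: str,
                       files: Iterable[Tuple[str, bytes]]) -> Dict[str, List[str]]:
    """Sort (host path, ELF header bytes) pairs into the layout
    nvidia_<version>/{bin,lib,lib64}, returning file names per directory."""
    root = f"nvidia_{version}"
    plan: Dict[str, List[str]] = {f"{root}/{d}": [] for d in ("bin", "lib", "lib64")}
    for path, header in files:
        if "/bin/" in path:
            subdir = "bin"  # executables such as nvidia-smi
        else:
            subdir = "lib64" if elf_bits(header) == 64 else "lib"
        plan[f"{root}/{subdir}"].append(posixpath.basename(path))
    return plan


if __name__ == "__main__":
    hdr64 = ELF_MAGIC + b"\x02" + bytes(11)
    hdr32 = ELF_MAGIC + b"\x01" + bytes(11)
    found = [
        ("/usr/bin/nvidia-smi", hdr64),
        ("/usr/lib/x86_64-linux-gnu/libcuda.so.375.39", hdr64),
        ("/usr/lib/i386-linux-gnu/libcuda.so.375.39", hdr32),
    ]
    for directory, names in plan_driver_volume("375.39", found).items():
        print(directory, names)
```

Keying the volume on the driver version means a host with a different driver produces a differently named volume, so a container always receives the libraries that match the host it lands on.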


● nvidia-docker wrapper script

Looks for the label: com.nvidia.volumes.needed = nvidia_driver

When found, it maps the nvidia_XXX.XX volume into the container at: /usr/local/nvidia

Enumerates all GPUs on the machine and maps them into the container as available devices

Passes all other docker options straight through to docker
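The wrapper's behaviour can be sketched as an argument rewrite. This is a minimal sketch, not nvidia-docker's actual code. It assumes the image's labels have already been read (for example with docker inspect) and that the GPU indices to expose are known. The function name, the nvidia_<version> volume name and the choice of /dev/nvidiactl and /dev/nvidia-uvm as control devices are assumptions for illustration. It also omits details of the real tool, such as how the plugin provisions the volume. Only the standard docker run options --volume and --device are emitted.

```python
from typing import Dict, List, Sequence

NVIDIA_LABEL = "com.nvidia.volumes.needed"
# Control devices assumed to be needed alongside the per-GPU /dev/nvidiaN nodes.
CONTROL_DEVICES = ["/dev/nvidiactl", "/dev/nvidia-uvm"]


def rewrite_run_args(image_labels: Dict[str, str], driver_version: str,
                     gpu_indices: Sequence[int], user_args: Sequence[str]) -> List[str]:
    """Turn `nvidia-docker run <user_args>` into a plain `docker run` command line.

    GPU-specific options are only added when the image carries the
    com.nvidia.volumes.needed = nvidia_driver label; all user options are
    passed through unchanged."""
    args = ["docker", "run"]
    if image_labels.get(NVIDIA_LABEL) == "nvidia_driver":
        # Mount the consolidated driver volume read-only at /usr/local/nvidia.
        args += ["--volume", f"nvidia_{driver_version}:/usr/local/nvidia:ro"]
        # Expose the control devices plus one /dev/nvidiaN node per GPU.
        devices = CONTROL_DEVICES + [f"/dev/nvidia{i}" for i in gpu_indices]
        for device in devices:
            args += ["--device", device]
    return args + list(user_args)


if __name__ == "__main__":
    labels = {NVIDIA_LABEL: "nvidia_driver"}
    print(" ".join(rewrite_run_args(labels, "375.39", [0, 1],
                                    ["--rm", "nvidia/cuda", "nvidia-smi"])))
```

Because the injected options come before the user's arguments, the image name and command in user_args still end up in the positions docker run expects.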


[Diagram: two hosts, one with nvidia-kernel-module (v1) and one with nvidia-kernel-module (v2); each runs a Container whose nvidia base libraries match its host (v1 and v2 respectively), with CUDA / TensorFlow libraries and an Application on top]


Apache Mesos and GPUs

● Mimics the functionality of nvidia-docker:
○ Supports nvidia docker images with custom labels
○ Maps a consolidated volume of binaries / libraries into /usr/local/nvidia
○ Enumerates GPUs and injects them into containers
○ Isolates access to GPUs between tasks


[Diagram: the Mesos Agent talks to the (Unified) Mesos Containerizer through the Containerizer API; the containerizer plugs in CPU, Memory and GPU isolators through the Isolator API. The GPU isolator is the Nvidia GPU Isolator, built from three components:]

● Nvidia GPU Allocator: allocates GPUs to tasks
● Nvidia Volume Manager: mimics the functionality of nvidia-docker-plugin
● Linux devices cgroup: isolates access to GPUs between tasks
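The allocator and devices-cgroup roles can be sketched together in Python. This is a simplified illustration, not Mesos's C++ implementation. The class and function names are hypothetical, and GPUs are handed out whole by index. The cgroup entries use the cgroup-v1 devices.allow syntax with NVIDIA's character-device major number 195 (minor N for /dev/nvidiaN, 255 for /dev/nvidiactl). The sketch assumes the task's cgroup otherwise denies access to NVIDIA devices. It leaves out /dev/nvidia-uvm, whose major number is assigned dynamically.

```python
from typing import Dict, List, Set

NVIDIA_MAJOR = 195     # character-device major number of /dev/nvidia* GPU nodes
NVIDIACTL_MINOR = 255  # minor number of /dev/nvidiactl


class GpuAllocator:
    """Hands out whole GPUs, identified by index, to tasks (no sharing)."""

    def __init__(self, total: int) -> None:
        self._free: Set[int] = set(range(total))
        self._owned: Dict[str, Set[int]] = {}

    def allocate(self, task_id: str, count: int) -> Set[int]:
        if task_id in self._owned:
            raise ValueError(f"{task_id} already holds GPUs")
        if count > len(self._free):
            raise RuntimeError(f"{count} GPUs requested, only {len(self._free)} free")
        chosen = set(sorted(self._free)[:count])  # lowest free indices first
        self._free -= chosen
        self._owned[task_id] = chosen
        return chosen

    def release(self, task_id: str) -> None:
        self._free |= self._owned.pop(task_id, set())


def devices_cgroup_entries(gpus: Set[int]) -> List[str]:
    """devices.allow lines (cgroup v1 syntax) granting a task read/write/mknod
    access to /dev/nvidiactl and to only the GPUs it was allocated."""
    entries = [f"c {NVIDIA_MAJOR}:{NVIDIACTL_MINOR} rwm"]
    entries += [f"c {NVIDIA_MAJOR}:{i} rwm" for i in sorted(gpus)]
    return entries


if __name__ == "__main__":
    allocator = GpuAllocator(total=8)
    for task in ("task-a", "task-b"):
        gpus = allocator.allocate(task, 4)
        print(task, sorted(gpus), devices_cgroup_entries(gpus))
```

With 8 GPUs, two tasks asking for 4 each receive disjoint sets, and each task's cgroup lists only its own GPU minors, mirroring the setup of the Simple Isolation Demo.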


DC/OS GPU Demos


● Simple isolation demo

● Single node TensorFlow demo

● Distributed TensorFlow demo (future work)


Simple Isolation Demo

● 1 master, 1 agent with 8 GPUs

● ssh into the agent and run nvidia-smi locally to show all 8 GPUs present

● Launch 2 container instances and allocate 4 GPUs to each

● Run nvidia-smi in each container to show the allocation of 4 GPUs to each
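The demo's launch step could be expressed as a Marathon app definition. The sketch below builds one as a Python dict. It assumes Marathon's app-definition fields (id, cmd, cpus, mem, gpus, instances, and a MESOS container with a docker image), as covered by the GPU documentation links at the end of this talk. The app id, command, resource sizes and the nvidia/cuda image are illustrative choices, not the ones used in the recorded demo.

```python
import json


def gpu_app(app_id: str, gpus: int, instances: int) -> dict:
    """Build a Marathon app definition running nvidia-smi with `gpus` GPUs
    per instance (field names assumed from Marathon's app-definition format)."""
    return {
        "id": app_id,
        "cmd": "nvidia-smi && sleep 3600",  # print visible GPUs, then stay up
        "cpus": 0.1,
        "mem": 128,
        "gpus": gpus,
        "instances": instances,
        "container": {
            # The Mesos containerizer, which provides GPU support in this talk.
            "type": "MESOS",
            "docker": {"image": "nvidia/cuda"},
        },
    }


if __name__ == "__main__":
    # Two instances with four GPUs each, as in the isolation demo.
    print(json.dumps(gpu_app("/gpu-isolation-demo", gpus=4, instances=2), indent=2))
```

The resulting JSON could be saved to a file and submitted with dcos marathon app add; each instance's nvidia-smi output should then list only the 4 GPUs allocated to it.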

Video: https://youtu.be/z9gzzbjE-JE


Single Node TensorFlow Demo

● 1 master, 1 agent with 8 GPUs

● Launch the standard TensorFlow docker image

● Show the standard Jupyter Notebook running

● Exec into the running container and download the TensorFlow Examples: https://github.com/aymericdamien/TensorFlow-Examples

● Run the Multi-GPU example

Video: https://youtu.be/wumsAoUy0cQ


Future Work

● Integrate different machine learning frameworks with the DC/OS SDK
○ https://github.com/mesosphere/dcos-commons

● Distributed TensorFlow with TFMesos
○ https://github.com/douban/tfmesos

● Distributed MXNet
○ https://github.com/dmlc/mxnet

● One-click install in the DC/OS Universe (the DC/OS notion of an app-store)


● Topology-aware scheduling
● GPU sharing (virtual GPUs)
● GPU consumption metrics

Contributions Welcome!


Special Thanks to All Collaborators


Vikram Ditya

Andrew Iles

Jonathan Calmels

Felix Abecassis

Rob Todd

Rajat Phull

Shivi Fotedar

Yubo Li

Seetharami Seelam

Yong Feng

Guangya Liu

Ian Downes

Niklas Nielson

Connor Doyle

Benjamin Mahler

Tim Chen


Questions and Links

● Apache Mesos
○ http://mesos.apache.org/

● Open DC/OS
○ https://dcos.io/

● Enterprise DC/OS
○ https://mesosphere.com/product/

● GPU Related Documentation for Mesos and DC/OS
○ https://github.com/apache/mesos/blob/master/docs/gpu-support.md
○ https://dcos.io/docs/1.9/deploying-services/gpu/config/