© 2019 One Convergence, Inc. All rights reserved. GTC 2019
Composable Infrastructure for On-Prem Kubernetes-Based Systems
S9572
Subrahmanyam Ongole Architect One Convergence, Inc.
Agenda
● Introduction
● State of the art
● Problem description
● Proposal
● Scale-Out performance
Introduction
● One Convergence Products
  o http://www.oneconvergence.com
● Topic
  o GPU Composition for Kubernetes workloads
DFabric
Scale-Out Architecture
● Why Scale-out?
  o Scale-up vs Scale-out
    ▪ Affordable GPU servers
    ▪ Incrementally add new GPU hardware
    ▪ Resiliency - No single point of failure
    ▪ Higher network speeds via RDMA NICs
  o Challenges
    ▪ Cluster management
    ▪ Workload orchestration
    ▪ Resource management
    ▪ Achieving best performance
  o On-Prem
    ▪ Cloud providers address this
    ▪ On-Prem needs to be solved
[Figure: a scale-up system (8-16 GPUs) compared with scale-out systems (2-4 GPUs each) connected through RDMA NICs over a high-speed interconnect]
Platform of Choice
● Kubernetes
  o Cluster management
  o Container orchestration
  o Standard interfaces for Network and Storage
    ▪ CNI & CSI
  o Node-specific resource management
    ▪ Device plugins for GPUs, RDMA, etc.
GPU Allocation
● POD Spec
    resources:
      limits:
        nvidia.com/gpu: 2  # requesting 2 GPUs
● Different types of GPUs
  o Label each node with the type of GPU
      kubectl label nodes <node-with-k80> accelerator=nvidia-tesla-k80
      kubectl label nodes <node-with-p100> accelerator=nvidia-tesla-p100
  o Specify using node selectors in the POD spec
      nodeSelector:
        accelerator: nvidia-tesla-p100  # or nvidia-tesla-k80 etc.
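The resource limit and node selector fragments above can be combined into one manifest. The following is a minimal sketch, not taken from the slides: the Pod name, container name, image, and command are placeholders chosen for illustration, and it assumes the NVIDIA device plugin is running and the nodes were labeled as shown.

```yaml
# Hypothetical Pod manifest; name, image, and command are illustrative placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-training-example        # hypothetical name
spec:
  restartPolicy: Never
  nodeSelector:
    accelerator: nvidia-tesla-p100  # matches the node label set with kubectl label
  containers:
    - name: trainer                 # hypothetical container name
      image: nvidia/cuda:9.0-base   # assumed image; substitute your own
      command: ["nvidia-smi"]       # lists the GPUs visible inside the container
      resources:
        limits:
          nvidia.com/gpu: 2         # requesting 2 GPUs
```

Note that the user still has to know the GPU type and the node label, which is the coupling the rest of the talk sets out to remove.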
Challenges
● User needs to be aware of
  o GPU vendor, Type of GPU and GPU nodes
● Resource segmentation
  o Experimental vs Production jobs
● Better utilization of GPUs
  o Schedule by mutual agreement
● Multi-user
  o Isolation of workloads
● Cluster changes
  o Scale-out/scale-down
  o GPU health
● Topology
  o RDMA, NVLink®, etc.
● Complex with increasing number of users/nodes
Extending Kubernetes
● Custom Resources
  o Dynamically extend Kubernetes API
  o CRDs - Custom Resource Definitions
    ▪ Handled by API server
    ▪ Uses Kubernetes storage
    ▪ Custom Controller provides Declarative API
  o Aggregated APIs
    ▪ Separate service, Complex
    ▪ Custom storage
● Operators
  o Combine Custom Resources & Custom Controllers
  o Domain knowledge
  o Examples
    ▪ etcd, Prometheus operators
    ▪ TF operator in Kubeflow
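To make the CRD mechanism concrete, here is a hedged sketch of a custom resource definition and one instance of it. Everything named here is hypothetical (the `example.com` group, the `GPUPool` kind, and the `gpuType` and `gpuCount` fields); it is not DFabric's actual schema. It uses the `apiextensions.k8s.io/v1` API found in current Kubernetes releases, while clusters from the time of this talk would have used `v1beta1`.

```yaml
# Hypothetical CRD: registers a cluster-scoped GPUPool resource with the API server.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: gpupools.example.com   # must be <plural>.<group>
spec:
  group: example.com           # hypothetical API group
  scope: Cluster
  names:
    plural: gpupools
    singular: gpupool
    kind: GPUPool
  versions:
    - name: v1alpha1
      served: true
      storage: true            # objects are persisted in Kubernetes storage (etcd)
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                gpuType:
                  type: string
                gpuCount:
                  type: integer
                  minimum: 0
---
# Hypothetical instance: a custom controller would watch objects like this
# and reconcile the cluster toward the declared state.
apiVersion: example.com/v1alpha1
kind: GPUPool
metadata:
  name: experimental
spec:
  gpuType: nvidia-tesla-p100
  gpuCount: 4
```

The CRD alone only stores and validates objects. The declarative behavior comes from the custom controller, and packaging the two together is what the slide calls an operator.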
DFabric
DFabric Architecture
[Figure: DFabric architecture diagram. Labeled components: APIs/Operators, User/Group Mgmt, Microsegmentation (Pools), Resource Management & Optimization, Compose & Monitor, K8S Scheduling, Device Plug-in, PCIe, Ethernet]
Pool & Group Benefits
● Abstracts resources
  o User doesn't need to be aware of GPU hardware
  o Groups determine GPU association
● Better utilization of GPUs
  o Better distribution of workload
● Isolation of workloads
  o Separate Namespace per user
● Topology awareness
  o Schedules RDMA/GD wherever applicable
● Monitors changes to cluster
  o Scale-out/Scale-down
  o GPU health
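The per-user namespace idea can be approximated with stock Kubernetes objects. The sketch below is an assumption for illustration, not a description of how DFabric implements pools or groups: the namespace name `user-alice` and the quota name are hypothetical, and the quota caps the total GPUs that Pods in that namespace may request.

```yaml
# Hypothetical per-user namespace; the name is a placeholder.
apiVersion: v1
kind: Namespace
metadata:
  name: user-alice
---
# Caps the number of GPUs that all Pods in the namespace can request together.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: gpu-quota               # hypothetical name
  namespace: user-alice
spec:
  hard:
    requests.nvidia.com/gpu: "2"  # extended resources are quota'd via the requests. prefix
```

A quota like this limits how much a user can consume but does not choose which GPUs they get. The pool and group association described above is the part that needs logic beyond plain Kubernetes.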
Disaggregated PCIe JBOG
● Introduction
● Static composition
  o Fixed at node composition time
● Dynamic composition
  o Dynamically attaches to POD
  o GPUs move across nodes
  o Device plugin requirements
[Figure: disaggregated PCIe JBOG diagram. Labels: Hosts, PCIe Fabric, PCIe Switches with NIC/SSD/GPU devices, and a group of GPUs]
Scale-Out Performance
3 Node Cluster
Each node contains:
  o Lenovo™ ThinkSystem™ SD530
  o Intel® Xeon® Gold 6148 @ 2.4 GHz
    ▪ 384 GB RAM
    ▪ 20 Cores
  o 2 NVIDIA® V100 GPUs / 16GB
  o Mellanox® 100Gbps ConnectX®-5
    ▪ RDMA NIC
  o CUDA 9.0
  o cuDNN 7.4.1.5-1
  o TensorFlow 1.12
  o Mellanox OFED 4.5-1.0.1.0
  o NCCL openmpi-3.0.0
  o Horovod: 0.15.2
  o DKube/DFabric™ 1.0.3
[Figure: performance charts comparing runs with RDMA and without RDMA]
Summary
● Scale out architecture
  o http://www.oneconvergence.com/blogs/
● Platform requirements
  o DFabric
  o http://www.oneconvergence.com/dfab
Thank You
Questions?