Interconnect Your Future - PRACE Agenda Systems (Indico) · © 2019 Mellanox Technologies |...

Interconnect Your Future
Paving the Road to Exascale
March 2019

© 2019 Mellanox Technologies | Confidential

HPC and AI Need the Most Intelligent Interconnect

Higher Data Speeds
Faster Data Processing
Better Data Security

Adapters, Switches, Cables & Transceivers, SmartNIC, System on a Chip

The Need for Intelligent and Faster Interconnect

CPU-Centric (Onload): the CPU must wait for the data, which creates performance bottlenecks.
Data-Centric (Offload): faster data speeds and in-network computing enable higher performance and scale.

[Figure: CPU and GPU nodes connected by an onload network, compared with in-network computing]

Analyze Data as It Moves! Higher Performance and Scale

Data-Centric Architecture to Overcome Latency Bottlenecks

CPU-Centric (Onload): communications latencies of 30-40 µs
Data-Centric (Offload): communications latencies of 3-4 µs

[Figure: CPU and GPU nodes in CPU-centric and data-centric configurations]

Intelligent Interconnect Paves the Road to Exascale Performance

Accelerating All Levels of HPC/AI Frameworks

Application Framework: Data Analysis
Communication Framework: SHARP, MPI Tag Matching, MPI Rendezvous, Software Defined Virtual Devices
Network Framework: Network Transport Offload, RDMA, GPUDirect, SHIELD (self-healing network)

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)

Reliable, scalable, general-purpose primitive: an in-network, tree-based aggregation mechanism; a large number of groups; multiple simultaneous outstanding operations.

Applicable to multiple use cases: HPC applications using MPI/SHMEM; distributed machine learning applications.

Scalable, high-performance collective offload: Barrier, Reduce, All-Reduce, Broadcast, and more; operations Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND; integer and floating-point data, 16/32/64 bits.
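The operator list above mixes arithmetic, bitwise, and location-tracking reductions. As a rough illustration of what each operator computes (not of how SHARP hardware executes it), the Python sketch below defines them as binary combiners that an aggregation node could apply to pairs of incoming values. The names `OPS`, `minloc`, `maxloc`, and `reduce_all` are hypothetical, and Min-loc/Max-loc are assumed to follow the MPI `MPI_MINLOC`/`MPI_MAXLOC` convention of reducing (value, index) pairs and breaking ties toward the lower index.

```python
import operator
from functools import reduce


def minloc(a, b):
    """Combine (value, index) pairs, keeping the smaller value; ties go to the lower index."""
    if a[0] != b[0]:
        return a if a[0] < b[0] else b
    return a if a[1] <= b[1] else b


def maxloc(a, b):
    """Combine (value, index) pairs, keeping the larger value; ties go to the lower index."""
    if a[0] != b[0]:
        return a if a[0] > b[0] else b
    return a if a[1] <= b[1] else b


# Hypothetical operator table: each entry combines two partial results into one.
OPS = {
    "sum": operator.add,
    "min": min,
    "max": max,
    "minloc": minloc,
    "maxloc": maxloc,
    "or": operator.or_,
    "xor": operator.xor,
    "and": operator.and_,
}


def reduce_all(op_name, contributions):
    """Fold one contribution per rank into a single result with the named operator."""
    return reduce(OPS[op_name], contributions)


if __name__ == "__main__":
    print(reduce_all("sum", [3, 1, 4, 1]))                 # 9
    print(reduce_all("xor", [0b1100, 0b1010]))             # 6
    print(reduce_all("maxloc", [(5, 0), (7, 1), (7, 3)]))  # (7, 1)
```

Because a tree may group contributions in any order, these combiners need to be associative. The integer and bitwise operators are, but floating-point addition is not exactly associative, so a floating-point Sum can differ in the last bits depending on how the contributions are grouped.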

[Figure: SHARP tree, showing the SHARP Tree Root, SHARP Tree Aggregation Nodes (process running on HCA), and SHARP Tree Endnodes (process running on HCA)]
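The tree in the figure can be illustrated with a small simulation. This is a minimal Python sketch, assuming a regular tree in which each aggregation node has at most `radix` children: values move up the tree and are combined at each aggregation node, so every parent receives one partial result per child rather than every raw contribution. The root's final result is then sent back down to all end nodes, which yields the All-Reduce pattern. The `Node` class, `build_tree`, `radix`, and the other helpers are hypothetical; real SHARP trees are configured by the fabric software and follow the physical network topology, which this sketch does not model.

```python
import operator
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: end nodes hold a value, aggregation nodes hold children."""
    name: str
    value: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def build_tree(values, radix=2):
    """Group nodes under parents, `radix` children each, until a single root remains."""
    level = [Node(f"end{i}", v) for i, v in enumerate(values)]
    level_no = 0
    while len(level) > 1:
        level_no += 1
        level = [
            Node(f"agg{level_no}.{j}", children=level[i:i + radix])
            for j, i in enumerate(range(0, len(level), radix))
        ]
    return level[0]


def tree_depth(node):
    """Number of aggregation levels between the end nodes and the root."""
    if not node.children:
        return 0
    return 1 + max(tree_depth(child) for child in node.children)


def aggregate(node, op):
    """Upward pass: each aggregation node forwards one combined value to its parent."""
    if not node.children:
        return node.value
    result = aggregate(node.children[0], op)
    for child in node.children[1:]:
        result = op(result, aggregate(child, op))
    return result


def distribute(node, result, out):
    """Downward pass: the root's result reaches every end node."""
    if not node.children:
        out[node.name] = result
    for child in node.children:
        distribute(child, result, out)


def tree_allreduce(values, op=operator.add, radix=2):
    """All-Reduce as tree aggregation up to the root, then distribution back down."""
    root = build_tree(values, radix)
    total = aggregate(root, op)
    out = {}
    distribute(root, total, out)
    return out


if __name__ == "__main__":
    result = tree_allreduce([1, 2, 3, 4, 5, 6, 7, 8])
    print(result["end0"], result["end7"])  # 36 36
```

With `radix=2`, eight end nodes need three aggregation levels (8 → 4 → 2 → 1); in general the depth is ⌈log_radix N⌉, so adding end nodes adds levels only slowly. The sketch counts levels only and says nothing about actual latencies.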

SHARP AllReduce Performance Advantages (128 Nodes)

SHARP enables a 75% reduction in latency, providing scalable, flat latency.

[Chart: Scalable Hierarchical Aggregation and Reduction Protocol]

SHARP AllReduce Performance Advantages (1500 Nodes, 60K MPI Ranks, Dragonfly+ Topology)

SHARP enables the highest performance.

[Chart: Scalable Hierarchical Aggregation and Reduction Protocol]

SHARP Performance Advantage for AI

SHARP provides a 16% performance increase for deep learning (initial results: TensorFlow with Horovod running the ResNet50 benchmark over HDR InfiniBand, with ConnectX-6 adapters and Quantum switches).

InfiniBand Roadmap

Thank You