Interconnect Your Future – HPC Advisory Council

Transcript of "Interconnect Your Future – Paving the Road to Exascale"
September 2016, HPC Advisory Council Spain Conference

Page 1:

Paving the Road to Exascale

September 2016, HPC Advisory Council Spain Conference

Interconnect Your Future

Page 2:

Mellanox Connects the World’s Fastest Supercomputer

93 Petaflop performance, 3X higher versus #2 on the TOP500

40K nodes, 10 million cores, 256 cores per CPU

Mellanox adapter and switch solutions

National Supercomputing Center in Wuxi, China

#1 on the TOP500 Supercomputing List

The TOP500 list has evolved and now includes HPC and Cloud / Web2.0 hyperscale systems

Mellanox connects 41.2% of overall TOP500 systems

Mellanox connects 70.4% of the TOP500 HPC platforms

Mellanox connects 46 Petascale systems – nearly 50% of the total Petascale systems

InfiniBand is the Interconnect of Choice for HPC Compute and Storage Infrastructures

Page 3:

Accelerating HPC Leadership Systems

“Summit” and “Sierra” Systems

Proud to Pave the Path to Exascale

Page 4:

The Ever Growing Demand for Higher Performance

[Figure: Performance Development timeline, 2000–2020 – Terascale to Petascale to Exascale; "Roadrunner" was the 1st Petascale system; transitions from SMP to Clusters and from Single-Core to Many-Core]

[Figure: Co-Design triangle – Hardware (HW), Software (SW), Application (APP)]

The Interconnect is the Enabling Technology

Page 5:

The Intelligent Interconnect to Enable Exascale Performance

CPU-Centric
• Must wait for the data – creates performance bottlenecks
• Limited to main CPU usage – results in performance limitation

Co-Design
• Works on the data as it moves – enables performance and scale
• Creates synergies – enables higher performance and scale

Page 6:

Breaking the Application Latency Wall

Today: Network device latencies are on the order of 100 nanoseconds

Challenge: Enabling the next order of magnitude improvement in application performance

Solution: Creating synergies between software and hardware – intelligent interconnect

Intelligent Interconnect Paves the Road to Exascale Performance

Latency of the network and the communication framework over time:

• 10 years ago: network ~10 microseconds, communication framework ~100 microseconds
• Today: network ~0.1 microseconds, communication framework ~10 microseconds
• Future (Co-Design): network ~0.05 microseconds, communication framework ~1 microsecond

Page 7:

Mellanox Smart Interconnect Solutions: Switch-IB 2 and ConnectX-5

Page 8:

Switch-IB 2 and ConnectX-5 Smart Interconnect Solutions

Delivering 10X Performance Improvement for MPI and SHMEM/PGAS Communications

• Collectives: Barrier, Reduce, All-Reduce, Broadcast
• Operations: Sum, Min, Max, Min-loc, Max-loc, OR, XOR, AND
• Data types: Integer and Floating-Point, 32 / 64 bit

SHArP Enables Switch-IB 2 to Execute Data Aggregation / Reduction Operations in the Network (a host-side MPI sketch of such a collective follows below)

PCIe Gen3 and Gen4

Integrated PCIe Switch

Advanced Dynamic Routing

MPI Collectives in Hardware

MPI Tag Matching in Hardware

In-Network Memory

100Gb/s Throughput

0.6usec Latency (end-to-end)

200M Messages per Second
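The collectives listed above map onto standard MPI calls. Below is a minimal sketch (illustrative only, not Mellanox code) of an MPI_Allreduce with MPI_SUM on doubles – the host-side call whose reduction a SHArP-capable Switch-IB 2 fabric can execute in the network. Enabling SHArP is a property of the MPI/HPC-X stack and fabric configuration, not of the application code.

```c
/* Minimal MPI_Allreduce sketch (illustrative only). With SHArP enabled in the
 * MPI/HPC-X stack, this same call can be executed as an in-network reduction. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double local = (double)(rank + 1);   /* each rank contributes one value */
    double global = 0.0;

    /* Global sum across all ranks; every rank receives the result */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %g\n", size, global);

    MPI_Finalize();
    return 0;
}
```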

Page 9:

Highest-Performance 100Gb/s Interconnect Solutions

Cables and transceivers: active optical and copper cables (10 / 25 / 40 / 50 / 56 / 100Gb/s); VCSELs, silicon photonics and copper

InfiniBand switch: 36 EDR (100Gb/s) ports, <90ns latency, 7.2Tb/s throughput, 7.02 billion msg/sec (195M msg/sec/port)

Adapter: 100Gb/s (10 / 25 / 40 / 50 / 56 / 100Gb/s), 0.6us latency, 200 million messages per second

Ethernet switch: 32 100GbE ports or 64 25/50GbE ports (10 / 25 / 40 / 50 / 100GbE), 6.4Tb/s throughput

Communication software: MPI, SHMEM/PGAS, UPC – for commercial and open source applications, leveraging hardware accelerations
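To show what the SHMEM/PGAS side of that software stack looks like at the application level, here is a minimal OpenSHMEM sketch (illustrative only) of a sum reduction across all PEs, assuming an OpenSHMEM 1.x library such as the one bundled with HPC-X; variable names are arbitrary.

```c
/* Minimal OpenSHMEM sum-to-all sketch (illustrative only; assumes OpenSHMEM 1.x). */
#include <shmem.h>
#include <stdio.h>

int main(void)
{
    /* Symmetric (statically allocated) data objects, visible to all PEs */
    static long value, sum;
    static long pWrk[SHMEM_REDUCE_MIN_WRKDATA_SIZE];
    static long pSync[SHMEM_REDUCE_SYNC_SIZE];

    shmem_init();

    for (int i = 0; i < SHMEM_REDUCE_SYNC_SIZE; i++)
        pSync[i] = SHMEM_SYNC_VALUE;
    shmem_barrier_all();                      /* ensure pSync is initialized everywhere */

    value = shmem_my_pe() + 1;                /* each PE contributes its PE number + 1 */
    shmem_long_sum_to_all(&sum, &value, 1, 0, 0, shmem_n_pes(), pWrk, pSync);

    if (shmem_my_pe() == 0)
        printf("sum over %d PEs = %ld\n", shmem_n_pes(), sum);

    shmem_finalize();
    return 0;
}
```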

Page 10:

The Performance Advantage of EDR 100G InfiniBand (28-80%)


Page 11:

Switch-IB 2 SHArP Performance Advantage

MiniFE is a finite element mini-application
• Implements kernels that represent implicit finite-element applications

10X to 25X Performance Improvement for the MPI Allreduce Collective

Page 12:

SHArP Performance Advantage with Intel Xeon Phi Knights Landing

Maximizing KNL Performance – 50% Reduction in Run Time

(Customer Results)

Lower is better (y-axis: time)

OSU – OSU MPI Benchmarks; IMB – Intel MPI Benchmarks
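For context on how such results are measured, benchmarks like osu_allreduce and the IMB Allreduce test time a collective over many iterations and report the average latency. The following is a simplified sketch of that measurement pattern (not the actual benchmark code; message size and iteration counts are arbitrary).

```c
/* Simplified allreduce latency measurement in the spirit of the OSU/IMB
 * benchmarks (illustrative only; sizes and iteration counts are arbitrary). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    enum { COUNT = 1024, WARMUP = 100, ITER = 1000 };
    double sendbuf[COUNT], recvbuf[COUNT];
    for (int i = 0; i < COUNT; i++)
        sendbuf[i] = (double)i;

    for (int i = 0; i < WARMUP; i++)          /* warm-up iterations */
        MPI_Allreduce(sendbuf, recvbuf, COUNT, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < ITER; i++)
        MPI_Allreduce(sendbuf, recvbuf, COUNT, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg allreduce latency: %.2f us\n", (t1 - t0) * 1e6 / ITER);

    MPI_Finalize();
    return 0;
}
```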

Page 13:

HPC-X with SHArP Technology

HPC-X with SHArP Delivers 2.2X Higher Performance over Intel MPI

OpenFOAM is a popular computational fluid dynamics application

Page 14:

Leading Supplier of End-to-End Interconnect Solutions

Enabling the Use of Data – Store, Analyze

ICs | Adapter Cards | Switches/Gateways | Cables/Modules | Software | Metro / WAN | NPU & Multicore (NPS, TILE)

Comprehensive End-to-End InfiniBand and Ethernet Portfolio (VPI)

Page 15:

BlueField System-on-a-Chip (SoC) Solution

Integration of ConnectX-5 + multicore ARM

State-of-the-art capabilities
• 10 / 25 / 40 / 50 / 100G Ethernet & InfiniBand
• PCIe Gen3 / Gen4
• Hardware acceleration offloads – RDMA, RoCE, NVMe-oF, RAID

Family of products
• Range of ARM core counts and I/O ports/speeds
• Price/performance points

NVMe Flash Storage Arrays

Scale-Out Storage (NVMe over Fabrics)

Accelerating & Virtualizing VNFs

Open vSwitch (OVS), SDN

Overlay networking offloads

Page 16:

InfiniBand Router Solutions

SB7780 Router 1U

Supports up to 6 Different Subnets

Isolation Between Different InfiniBand Networks (Each Network Can Be Managed Separately)

Native InfiniBand Connectivity Between Different Network Topologies (Fat-Tree, Torus, Dragonfly, etc.)

Page 17:

Multi-Host Socket Direct – Low Latency Socket Communication

Each CPU with direct network access

QPI avoidance for I/O – improves performance

Enables GPU / peer direct on both sockets

Solution is transparent to software

[Diagram: dual-socket CPUs connected via QPI, each CPU with direct network access]

Multi-Host Socket Direct Performance

50% Lower CPU Utilization

20% lower Latency

Multi Host Evaluation Kit

Lower Application Latency, Free-up CPU

Page 18:

Technology Roadmap – One-Generation Lead over the Competition

[Figure: Mellanox technology roadmap, 2000–2020 – 20G, 40G, 56G, 100G, 200G and Mellanox 400G generations across the Terascale, Petascale and Exascale eras; milestones include the 3rd-ranked Virginia Tech (Apple) system on the 2003 TOP500 and the 1st Petascale system, "Roadrunner" (Mellanox Connected)]

Page 19:

Maximize Performance via Accelerator and GPU Offloads

GPUDirect RDMA Technology

Page 20:

GPUs are Everywhere!

Page 22:

Mellanox GPUDirect RDMA Performance Advantage

HOOMD-blue is a general-purpose Molecular Dynamics simulation code accelerated on GPUs

GPUDirect RDMA allows direct peer-to-peer GPU communications over InfiniBand
• Unlocks performance between GPU and InfiniBand
• Provides a significant decrease in GPU-GPU communication latency
• Provides complete CPU offload from all GPU communications across the network (a minimal code sketch follows below)

2X Application Performance! (102% improvement)
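The application-level pattern behind these results is simply passing GPU device buffers to MPI. Below is a minimal sketch (illustrative only) assuming a CUDA-aware MPI library built with GPUDirect RDMA support; buffer names and sizes are arbitrary.

```c
/* Minimal sketch of a GPU-to-GPU exchange with a CUDA-aware MPI (illustrative
 * only). With GPUDirect RDMA, the HCA reads/writes GPU memory directly instead
 * of staging the data through host memory. */
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    double *d_buf;                                     /* GPU device buffer */
    cudaMalloc((void **)&d_buf, n * sizeof(double));

    /* A CUDA-aware MPI accepts the device pointer directly in MPI calls */
    if (rank == 0)
        MPI_Send(d_buf, n, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, n, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```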

Page 23:

GPUDirect Sync (GPUDirect 4.0)

GPUDirect RDMA (3.0) – direct data path between the GPU and Mellanox interconnect

• Control path still uses the CPU

- CPU prepares and queues communication tasks on GPU

- GPU triggers communication on HCA

- Mellanox HCA directly accesses GPU memory

GPUDirect Sync (GPUDirect 4.0)

• Both data path and control path go directly between the GPU and the Mellanox interconnect

[Figure: 2D stencil benchmark – average time per iteration (us) vs. number of nodes/GPUs (2 and 4); RDMA+PeerSync runs 27% and 23% faster than RDMA only]

Maximum Performance for GPU Clusters

Page 24:

NVIDIA DGX-1 Deep Learning Appliance

Deep Learning Supercomputer in a Box

8 x Pascal GPUs (P100): 5.3 TFlops, 16nm FinFET, NVLink

4 x ConnectX-4 EDR 100G InfiniBand Adapters

Page 25:

Interconnect Architecture Comparison

Offload versus Onload (Non-Offload)

Page 26:

Offload versus Onload (Non-Offload)

Two interconnect architectures exist – offload-based and onload-based

Offload architecture
• The interconnect manages and executes all network operations
• The interconnect can include application acceleration engines
• Offloads the CPU and therefore frees CPU cycles for the applications (see the sketch after this list)
• Development requires a large R&D investment
• Delivers higher data center ROI

Onload architecture
• A CPU-centric approach – everything must be executed on and by the CPU
• The CPU is responsible for all network functions; the interconnect only pushes the data onto the wire
• Cannot support acceleration engines, no support for RDMA; network transport is done by the CPU
• Loads the CPU and reduces the CPU cycles available for the applications
• Does not require R&D investment or interconnect expertise
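To make the offload point concrete, here is a minimal, illustrative sketch of communication/computation overlap with non-blocking MPI: on an offload-capable, RDMA-enabled interconnect, the transfers posted by MPI_Isend/MPI_Irecv can progress in the adapter while the CPU runs the compute loop, whereas on an onload interconnect the CPU itself must drive the transfer, eating into that compute time. Buffer sizes and the compute kernel are arbitrary.

```c
/* Overlap sketch: post non-blocking transfers, compute, then wait.
 * Illustrative only – actual overlap depends on the MPI library and interconnect. */
#include <mpi.h>
#include <stdio.h>

#define N (1 << 20)

static double send_buf[N], recv_buf[N], work[N];

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    MPI_Request reqs[2];
    MPI_Irecv(recv_buf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(send_buf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* The CPU keeps computing while the network (ideally the adapter) moves data */
    for (int i = 0; i < N; i++)
        work[i] = work[i] * 0.5 + 1.0;

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    if (rank == 0)
        printf("exchange of %d doubles completed on %d ranks\n", N, size);

    MPI_Finalize();
    return 0;
}
```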

Page 27:

Sandia National Laboratories Paper – Offloading versus Onloading

Page 28:

Application Performance Comparison – LS-DYNA

InfiniBand Delivers 42-63% Higher Performance With Only 12 Nodes

LS-DYNA is structural and fluid analysis software, used for automotive, aerospace and manufacturing simulations, among others

[Charts: higher is better; InfiniBand vs. a proprietary interconnect]

Page 29:

Application Performance Comparison – SPEC MPI Benchmark Suite

The SPEC MPI benchmark suite evaluates MPI-parallel, floating-point, compute-intensive performance across a wide range of compute-intensive applications using the Message-Passing Interface (MPI)

InfiniBand Delivers Superior Performance and Scaling

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation formed to establish, maintain and endorse a standardized set of relevant benchmarks that can be applied to the newest generation of high-performance computers

[Charts: InfiniBand vs. a proprietary interconnect]

Page 30:

Offload versus Onload – CPU Overhead

Operation: 100Gb/s Data Throughput (Send-Receive)
• InfiniBand (offload): 0.8% CPU utilization, CPU at 59% of nominal frequency during the operation
• Proprietary (onload): 59.6% CPU utilization, CPU at 100% of nominal frequency during the operation

Intel Performance Counter Monitor Tool – Output
• InfiniBand: 99.5 Gb/s data throughput, AFREQ 0.59 (ratio to nominal CPU frequency while in active state), 39M CPU instructions, 163M active cycles
• Proprietary: 95 Gb/s data throughput, AFREQ 1.0, 3,725M CPU instructions, 12,000M active cycles

InfiniBand Guarantees the Lowest CPU Overhead and Enables OPEX Savings!

Page 31:

Mellanox InfiniBand Leadership
• Application Performance: 34-62% higher
• Latency: 20% lower
• Message Rate: 68% higher
• Cost/Performance (CPAR – $/Performance): 20-35% lower

100Gb/s Link Speed (2014) → 200Gb/s Link Speed (2017)

Gain Competitive Advantage Today – Protect Your Future

Smart Network for Smart Systems: RDMA, Acceleration Engines, Programmability
Higher Performance, Unlimited Scalability, Higher Resiliency – Proven!

Page 32:

Thank You