ACCELERATED COMPUTING: THE PATH FORWARD
2
Every Computer Maker
Every Cloud
World’s Most Powerful AI Supercomputers
Volta in Production: 21B xtors | 5,120 CUDA cores | 7.8 FP64 TFLOPS | 15.7 FP32 TFLOPS | 125 Tensor TFLOPS
VOLTA TAKING OFF
NVIDIA ACCELERATED COMPUTING
3
500+ GPU-ACCELERATED APPLICATIONS
All Top 15 HPC Apps Accelerated
GROMACS
ANSYS Fluent
Gaussian
VASP
NAMD
Simulia Abaqus
WRF
OpenFOAM
ANSYS
LS-DYNA
BLAST
LAMMPS
AMBER
Quantum Espresso
GAMESS
HOW WE GOT HERE
APPLICATIONS
SYSTEMS
ALGORITHMS
CUDA
ARCHITECTURE
EVERY DEEP LEARNING FRAMEWORK ACCELERATED
INVESTING FROM TOP TO BOTTOM
4
ARCHITECTING MODERN DATACENTERS
[Chart: % Workload of datacenter APPS by class: % Sequential (“Amdahl’s Law”), Strong Scaling, Weak Scaling, Deep Learning. Annotations: Apps Accelerated, Parallel Speed-Up]
APPLICATION WORKLOAD IN MODERN DATACENTERS
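The “Amdahl’s Law” label on the chart above refers to the standard bound on parallel speed-up: if a fraction p of the runtime can be parallelized across n workers, the overall speed-up is 1 / ((1 - p) + p / n). The following is a minimal sketch of that formula; the function names and the sample fractions are illustrative assumptions, not figures from the slides.

```python
import math


def amdahl_speedup(parallel_fraction: float, workers: float) -> float:
    """Overall speed-up when `parallel_fraction` of the runtime is spread over `workers`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    sequential_fraction = 1.0 - parallel_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Limit of the speed-up as the number of workers grows without bound."""
    sequential_fraction = 1.0 - parallel_fraction
    return math.inf if sequential_fraction == 0.0 else 1.0 / sequential_fraction


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the trend.
    for p in (0.50, 0.90, 0.99):
        print(f"p={p:.2f}  speedup@5120={amdahl_speedup(p, 5120):8.2f}  limit={max_speedup(p):8.2f}")
```

The sequential fraction caps the speed-up no matter how many parallel cores are added, which is the usual argument for pairing a strong CPU core for sequential code with a many-core accelerator.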
5
ARCHITECTING MODERN DATACENTERS
Strong Core CPU for Sequential code
Volta 5,120 CUDA Cores
125 TFLOPS Tensor Core
NVLink for Strong Scaling
6
[Chart: AMBER throughput in ns/day (y-axis, 0 to 80) vs. # of CPUs (x-axis, 0 to 100). Series: 1 Node with 4x V100 GPUs; 48 CPU Nodes Comet Supercomputer]
AMBER Simulation of CRISPR
THE POWER OF ACCELERATED COMPUTING
7
Intersect360, Nov 2017, “HPC Application Support for GPU Computing”
GROMACS
ANSYS Fluent
Gaussian
VASP
NAMD
Simulia Abaqus
WRF
OpenFOAM
ANSYS
LS-DYNA
NCBI-BLAST
LAMMPS
AMBER
Quantum Espresso
GAMESS
500+ Accelerated Applications
Top 15 HPC Applications
70% OF THE WORLD’S SUPERCOMPUTING WORKLOAD ACCELERATED
8
Mixed Workload: Materials Science (VASP), Life Sciences (AMBER), Physics (MILC), Deep Learning (ResNet-50)
160 Self-hosted Servers
96 kW
4X BETTER HPC SYSTEM TCO
9
Mixed Workload: Materials Science (VASP), Life Sciences (AMBER), Physics (MILC), Deep Learning (ResNet-50)
12 Accelerated Servers w/4 V100 GPUs
20 kW
1/3 the Cost
1/4 the Space
1/5 the Power
4X BETTER HPC SYSTEM TCO
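The power claim on these two slides can be checked from the figures shown: 160 self-hosted servers at 96 kW against 12 accelerated servers at 20 kW. The sketch below works through that arithmetic; the variable names are illustrative. The cost and space ratios cannot be derived from the numbers on the slides, so they are not modeled here.

```python
# Figures taken from the two TCO slides.
cpu_servers, cpu_power_kw = 160, 96.0
gpu_servers, gpu_power_kw = 12, 20.0

# How many self-hosted servers each accelerated server stands in for.
servers_replaced_per_gpu_server = cpu_servers / gpu_servers

# Fraction of the original power budget that the accelerated system draws.
power_fraction = gpu_power_kw / cpu_power_kw

# Average draw per server in each configuration.
cpu_kw_per_server = cpu_power_kw / cpu_servers
gpu_kw_per_server = gpu_power_kw / gpu_servers

if __name__ == "__main__":
    print(f"servers replaced per accelerated server: {servers_replaced_per_gpu_server:.1f}")
    print(f"power fraction: {power_fraction:.3f}")
    print(f"kW per server: CPU {cpu_kw_per_server:.2f}, GPU {gpu_kw_per_server:.2f}")
```

The power fraction comes out close to one fifth, consistent with the "1/5 the Power" line, even though each accelerated server draws more power than each self-hosted server.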
10
POWERING NEXT-GENERATION SUPERCOMPUTERS
11
AI Exascale Today: 3+ EFLOPS Tensor Ops
Performance Leadership: 10X Perf Over Titan (20 PF vs. 200 PF)
Accelerated Science: 5-10X Application Perf Over Titan
ACME, DIRAC, FLASH, GTC, HACC, LSDALTON, NAMD, NUCCOR, NWCHEM, QMCPACK, RAPTOR, SPECFEM, XGC
VOLTA TO FUEL SUMMIT
Next Milestone In AI Supercomputing
12
Most Powerful AI Supercomputer in Japan
4,352 Tesla V100 GPUs
37 PetaFLOPS FP64 HPC Performance
0.55 ExaFLOPS AI Performance
ANNOUNCING JAPAN’S AIST ADOPTS NVIDIA VOLTA FOR ABCI SUPERCOMPUTER
13
40 PetaFLOPS Peak FP64 Performance | 660 PetaFLOPS DL FP16 Performance | 660 NVIDIA DGX-1 Server Nodes
ANNOUNCING NVIDIA SATURNV WITH VOLTA
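The aggregate figures quoted for ABCI and SATURNV follow from multiplying the GPU count by the per-GPU Volta peaks given earlier in the deck (7.8 FP64 TFLOPS and 125 Tensor TFLOPS). The sketch below reproduces that arithmetic. It assumes eight GPUs per DGX-1 node, which is not stated on the slide, and the helper name is illustrative.

```python
V100_FP64_TFLOPS = 7.8      # per-GPU peak, from the Volta slide
V100_TENSOR_TFLOPS = 125.0  # per-GPU peak, from the Volta slide
GPUS_PER_DGX1 = 8           # assumption: not stated on the SATURNV slide


def aggregate_pflops(gpu_count: int, tflops_per_gpu: float) -> float:
    """Peak aggregate throughput in PetaFLOPS for `gpu_count` identical GPUs."""
    return gpu_count * tflops_per_gpu / 1000.0


# ABCI: 4,352 Tesla V100 GPUs.
abci_ai_pflops = aggregate_pflops(4352, V100_TENSOR_TFLOPS)

# SATURNV: 660 DGX-1 server nodes.
saturnv_gpus = 660 * GPUS_PER_DGX1
saturnv_dl_pflops = aggregate_pflops(saturnv_gpus, V100_TENSOR_TFLOPS)
saturnv_fp64_pflops = aggregate_pflops(saturnv_gpus, V100_FP64_TFLOPS)

if __name__ == "__main__":
    print(f"ABCI AI peak:      {abci_ai_pflops / 1000.0:.3f} ExaFLOPS")
    print(f"SATURNV DL peak:   {saturnv_dl_pflops:.1f} PetaFLOPS")
    print(f"SATURNV FP64 peak: {saturnv_fp64_pflops:.1f} PetaFLOPS")
```

Under these assumptions the products land on or near the quoted 0.55 ExaFLOPS, 660 PetaFLOPS, and 40 PetaFLOPS. The same GPU-only product for ABCI FP64 is about 33.9 PetaFLOPS, lower than the 37 PetaFLOPS on the slide, so that figure presumably includes contributions not modeled here.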
14
ERRORS
REGRESSION TESTING (FP16/INT8)
INFERENCE (FP16/INT8)
TRAINING (FP32/FP16)
SIMULATION (FP64/FP32)
NEW DATA
TRAINING SET | REGRESSION SET | NEW DATA
DEEP LEARNING COMES TO HPC
INSIGHTS
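The precision labels in the workflow above (FP64/FP32 for simulation, FP32/FP16 for training) trade accuracy for throughput. As a rough illustration of what each format can represent, the sketch below round-trips a value through IEEE 754 half, single, and double precision using Python's standard `struct` module. The helper names are illustrative, and INT8 quantization is not covered.

```python
import struct

# struct format codes for IEEE 754 binary16, binary32, and binary64.
FORMATS = {"FP16": "<e", "FP32": "<f", "FP64": "<d"}


def round_trip(value: float, fmt: str) -> float:
    """Store `value` in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def rounding_error(value: float, precision: str) -> float:
    """Absolute error introduced by storing `value` at the named precision."""
    return abs(round_trip(value, FORMATS[precision]) - value)


if __name__ == "__main__":
    sample = 0.1  # not exactly representable in binary floating point
    for name in FORMATS:
        print(f"{name}: stored={round_trip(sample, FORMATS[name])!r} "
              f"error={rounding_error(sample, name):.3e}")
```

The error grows as the format narrows, which is why the lower-precision stages in the diagram are paired with regression testing.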
15
UIUC & NCSA: ASTROPHYSICS
5,000X LIGO Signal Processing
U. FLORIDA & UNC: DRUG DISCOVERY
300,000X Molecular Energetics Prediction
SLAC: ASTROPHYSICS
Gravitational Lensing: From Weeks to 10ms
AI ACCELERATES SCIENCE
U.S. DoE: CLEAN ENERGY
33% More Accurate Neutrino Detection
PRINCETON & ITER: PARTICLE PHYSICS
50% Higher Accuracy for Fusion Sustainment
U. PITT: DRUG DISCOVERY
35% Higher Accuracy for Protein Scoring
DEEP LEARNING COMES TO HPC
16
AI FOR STEADY FLOW APPROXIMATION
CHALLENGE
CFD is computationally expensive.
CFD is time-consuming to complete.
Autodesk Research & University of Michigan
SOLUTION
A Convolutional Neural Network approximates the velocity field in 2D & 3D, standing in for the CFD solve
IMPACT
Rapid time to solution
Re-use of existing data.
[Figure panels: AI | TRADITIONAL CFD | ERROR]
Xiaoxiao Guo, Wei Li, Francesco Iorio (2016). Convolutional Neural Networks for Steady Flow Approximation.
17
SIMULATING LIQUID SURFACES WITH GENERATIVE ADVERSARIAL NETWORKS
CHALLENGE
Liquid modelling is expensive.
Real-time simulations require a GPU, multiple CPUs, or both.
Technical University of Munich
SOLUTION
A Generative Adversarial Network simulates the properties of liquids with a 10X reduction in compute
IMPACT
Simulation is portable and available to any user; a phone can run inference.
The GAN can be retrained for any liquid in any situation, given enough data.
Lukas Prantl, Boris Bonev, and Nils Thuerey (2017). Pre-computed Liquid Spaces with Generative Neural Networks and Optical Flow.
18
PREDICTING DISRUPTIONS IN FUSION REACTOR USING DL
CHALLENGE
Controlling plasma at high power is a multi-dimensional challenge with high consequences.
Time to act is ~30ms
Princeton University
SOLUTION
Fusion Recurrent Neural Network (FRNN) achieves 80-90% true positives with 5% false positives
Trained on a relatively small dataset and an old GPU
IMPACT
Reduced downtime on the largest experiments
Within sight of an actively controlled model
William Tang, Alexey Svyatkovskiy, Julian Kates-Harbeck, Kyle Felker, Eliot Feibush, Michael Churchill
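The FRNN figures above are detection rates: the true-positive rate is the share of disruptive shots that were flagged, and the false-positive rate is the share of non-disruptive shots that raised a false alarm. The sketch below shows how such rates are computed from a confusion matrix. The shot counts are hypothetical, chosen only to land inside the quoted ranges; they are not data from the Princeton work.

```python
from typing import NamedTuple


class DetectionRates(NamedTuple):
    true_positive_rate: float
    false_positive_rate: float


def detection_rates(tp: int, fn: int, fp: int, tn: int) -> DetectionRates:
    """Compute TPR and FPR from confusion-matrix counts."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("need at least one positive and one negative example")
    return DetectionRates(
        true_positive_rate=tp / (tp + fn),   # flagged disruptions / all disruptions
        false_positive_rate=fp / (fp + tn),  # false alarms / all non-disruptive shots
    )


if __name__ == "__main__":
    # Hypothetical counts: 100 disruptive shots, 1,000 non-disruptive shots.
    rates = detection_rates(tp=85, fn=15, fp=50, tn=950)
    print(f"TPR={rates.true_positive_rate:.0%}  FPR={rates.false_positive_rate:.0%}")
```

The two rates are computed over different populations, so a low false-positive rate can still mean many false alarms when non-disruptive shots far outnumber disruptive ones.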
19
Containerized in NVDocker
Optimized for GPU-accelerated Systems
Up-to-Date Containers
Available NOW
Sign up at nvidia.com/gpu-cloud
ANNOUNCING NVIDIA GPU CLOUD FOR HPC
CLOUD CONTAINER REGISTRY FOR ACCELERATED HPC APPS
20
HPC APPS COMING TO NVIDIA GPU CLOUD
21
VISUALIZATION IS VITAL TO SCIENCE
22
Large-scale Volumetric Rendering
Physically Accurate Ray Tracing
Production-quality Images
Seamless integration with ParaView
Early Access NOW
Sign up now at nvidia.com/gpu-cloud
UNIFIED VISUALIZATION FOR LARGE DATA SETS
ParaView with NVIDIA OptiX
ParaView with NVIDIA Holodeck
ParaView with NVIDIA IndeX
NVIDIA GPU CLOUD FOR HPC VISUALIZATION
23
NVIDIA GPU CLOUD: SIMPLIFYING AI & HPC
DEEP LEARNING | HPC APPS | HPC VIZ