Feb. 2, 2017
Molecular Dynamics (MD) on GPUs
Accelerating Discoveries
Using a supercomputer powered by the Tesla platform with over 3,000 Tesla accelerators, University of Illinois scientists performed the first all-atom simulation of the HIV virus and discovered the chemical structure of its capsid, "the perfect target for fighting the infection."
Without GPUs, the supercomputer would need to be 5x larger for similar performance.
Overview of Life & Material Accelerated Apps
MD: All key codes are GPU-accelerated
• Great multi-GPU performance
• Focus on dense GPU nodes (up to 16 GPUs) and/or large numbers of GPU nodes
• ACEMD*, AMBER (PMEMD)*, BAND, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUgrid.net, GROMACS, HALMD, HOOMD-Blue*, LAMMPS, Lattice Microbes*, mdcore, MELD, miniMD, NAMD, OpenMM, PolyFTS, SOP-GPU* & more

QC: All key codes are ported or being optimized
• Focus on GPU-accelerated math libraries and OpenACC directives
• GPU-accelerated and available today: ABINIT, ACES III, ADF, BigDFT, CP2K, GAMESS, GAMESS-UK, GPAW, LATTE, LSDalton, LSMS, MOLCAS, MOPAC2012, NWChem, OCTOPUS*, PEtot, QUICK, Q-Chem, QMCPack, Quantum Espresso/PWscf, TeraChem*
• Active GPU acceleration projects: CASTEP, GAMESS, Gaussian, ONETEP, Quantum Supercharger Library*, VASP & more

green* = application where >90% of the workload is on GPU
MD vs. QC on GPUs
"Classical" Molecular Dynamics vs. Quantum Chemistry (MO, PW, DFT, semi-empirical):
• What it does: MD simulates the positions of atoms over time (chemical-biological or chemical-material behaviors); QC calculates electronic properties (ground state, excited states, spectral properties, making/breaking bonds, physical properties).
• Forces: MD uses simple empirical formulas (bond rearrangement generally forbidden); QC derives forces from the electron wave function (bond rearrangement OK, e.g., bond energies).
• System size: MD handles up to millions of atoms; QC up to a few thousand atoms.
• Solvent: MD includes solvent without difficulty; QC generally runs in vacuum, with solvent (if needed) treated classically (QM/MM) or via implicit methods.
• Precision: MD is single-precision dominated; double precision is important for QC.
• Libraries: MD uses cuBLAS, cuFFT, CUDA; QC uses cuBLAS, cuFFT, OpenACC.
• Hardware: GeForce (workstations) or Tesla (servers) for MD; Tesla recommended for QC.
• ECC: off for MD; on for QC.
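The ns/day figures quoted throughout this deck follow directly from the integration timestep and how many timesteps the code completes per wall-clock second. A minimal sketch of that arithmetic (the steps-per-second figure below is an illustrative assumption, not from the deck):

```python
# Convert an MD stepping rate to simulated nanoseconds per wall-clock day:
# ns/day = timestep_fs * steps_per_second * 86400 s/day / 1e6 fs/ns
def ns_per_day(timestep_fs: float, steps_per_second: float) -> float:
    return timestep_fs * steps_per_second * 86400 / 1e6

# A 2 fs timestep at a hypothetical ~1852 steps/s gives about 320 ns/day,
# the order of magnitude reported for the JAC benchmark later in this deck.
print(round(ns_per_day(2.0, 1852.0), 1))
```

This is also why the 4 fs (hydrogen mass repartitioning) JAC runs later in the deck are roughly twice as fast as the 2 fs runs: the cost per step is nearly unchanged, but each step covers twice the simulated time.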
GPU-Accelerated Molecular Dynamics Apps
ACEMD, AMBER, CHARMM, DESMOND, ESPResSo, Folding@Home, GPUGrid.net, GROMACS, HALMD, HOOMD-Blue, LAMMPS, mdcore, MELD, NAMD, OpenMM, PolyFTS

Green lettering indicates performance slides are included.
GPU performance is compared against a dual-socket multi-core x86 CPU.
Benefits of MD GPU-Accelerated Computing
• 3x-8x faster than CPU-only systems across all tests (on average)
• Most major compute-intensive aspects of classical MD are ported
• Large performance boost with a marginal price increase
• Energy usage cut by more than half
• GPUs scale well within a node and across multiple nodes
• The K80 is our fastest and lowest-power high-performance GPU yet
Try GPU accelerated MD apps for free – www.nvidia.com/GPUTestDrive
Why wouldn’t you want to turbocharge your research?
ACEMD
www.acellera.com
470 ns/day on 1 GPU for L-Iduronic acid (1362 atoms)
116 ns/day on 1 GPU for DHFR (23K atoms)
M. Harvey, G. Giupponi and G. De Fabritiis, ACEMD: Accelerated molecular dynamics simulations in the microsecond timescale, J. Chem. Theory Comput. 5, 1632 (2009)
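A useful way to read these throughput numbers is as wall-clock time needed to reach a target simulated timescale. A short sketch using the two figures from the slide above:

```python
# Wall-clock days needed to reach a target simulated time
# at a given throughput (ns of simulated time per day).
def days_to_target(target_ns: float, throughput_ns_per_day: float) -> float:
    return target_ns / throughput_ns_per_day

# DHFR at 116 ns/day: about 8.6 days per simulated microsecond (1000 ns).
print(round(days_to_target(1000, 116), 1))  # -> 8.6
# L-Iduronic acid at 470 ns/day: about 2.1 days per microsecond.
print(round(days_to_target(1000, 470), 1))  # -> 2.1
```

That is the sense in which a single GPU puts the microsecond timescale within reach for small systems.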
www.acellera.com
Features: NVT, NPT, PME, TCL, PLUMED, CAMSHIFT¹

¹ M. J. Harvey and G. De Fabritiis, An implementation of the smooth particle-mesh Ewald (PME) method on GPU hardware, J. Chem. Theory Comput. 5, 2371–2377 (2009)
² For a list of selected references see http://www.acellera.com/acemd/publications
June 2017
AMBER 16
JAC_NVE on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
JAC (23,558 atoms, PME, 2 fs), ns/day:
1x GP100 (PCIe): 320.19; 1x GP100 (NVLink): 320.14; 2x GP100 (PCIe): 370.32; 2x GP100 (NVLink): 404.09
JAC_NVE on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
JAC (23,558 atoms, PME, 4 fs), ns/day:
1x GP100 (PCIe): 614.42; 1x GP100 (NVLink): 613.16; 2x GP100 (PCIe): 714.23; 2x GP100 (NVLink): 782.11
JAC_NPT on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
JAC (23,558 atoms, PME, 2 fs), ns/day:
1x GP100 (PCIe): 295.75; 1x GP100 (NVLink): 295.42; 2x GP100 (PCIe): 333.03; 2x GP100 (NVLink): 360.64
JAC_NPT on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
JAC (23,558 atoms, PME, 4 fs), ns/day:
1x GP100 (PCIe): 580.47; 1x GP100 (NVLink): 578.48; 2x GP100 (PCIe): 654.66; 2x GP100 (NVLink): 706.53
FactorIX_NVE on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
FactorIX (90,906 atoms, PME), ns/day:
1x GP100 (PCIe): 106.23; 1x GP100 (NVLink): 105.98; 2x GP100 (PCIe): 142.45; 2x GP100 (NVLink): 166.61
FactorIX_NPT on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
FactorIX (90,906 atoms, PME), ns/day:
1x GP100 (PCIe): 102.27; 1x GP100 (NVLink): 102.26; 2x GP100 (PCIe): 126.75; 2x GP100 (NVLink): 146.34
Cellulose_NVE on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
Cellulose (408,609 atoms, PME), ns/day:
1x GP100 (PCIe): 24.01; 1x GP100 (NVLink): 24.02; 2x GP100 (PCIe): 31.35; 2x GP100 (NVLink): 36.91
Cellulose_NPT on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
Cellulose (408,609 atoms, PME), ns/day:
1x GP100 (PCIe): 22.76; 1x GP100 (NVLink): 22.8; 2x GP100 (PCIe): 28.76; 2x GP100 (NVLink): 32.37
STMV_NPT on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
STMV (1,067,095 atoms, PME), ns/day:
1x GP100 (PCIe): 15.64; 1x GP100 (NVLink): 15.43; 2x GP100 (PCIe): 20.22; 2x GP100 (NVLink): 23.13
TRPCAGE on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
TRPCAGE (304 atoms, GB), ns/day:
1x GP100 (PCIe): 1216.56; 1x GP100 (NVLink): 1187.3
Myoglobin on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
Myoglobin (2,492 atoms, GB), ns/day:
1x GP100 (PCIe): 470.41; 1x GP100 (NVLink): 458.28; 2x GP100 (PCIe): 443.49; 2x GP100 (NVLink): 447.23
Nucleosome on GP100s
Running AMBER version 16. The green nodes contain dual Intel Core i7-4820K @ 3.70GHz CPUs + Quadro GP100 GPUs (PCIe and NVLink).
Nucleosome (25,095 atoms, GB), ns/day:
1x GP100 (PCIe): 11.47; 1x GP100 (NVLink): 11.29; 2x GP100 (PCIe): 21.29; 2x GP100 (NVLink): 20.51
February 2017
AMBER 16
PME-Cellulose_NPT on K80s
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla K80 (autoboost) GPUs, each K80 paired with a single E5-2699 v4.
PME-Cellulose_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 2.35; +1x K80: 11.36 (4.8X); +2x K80: 15.43 (6.6X)
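The speedup labels on these charts are simply each GPU configuration's ns/day divided by the CPU-only node's ns/day. A quick sketch that recomputes the labels on the K80 chart above from its own values:

```python
# Recompute the chart's speedup labels from its ns/day values.
cpu_only = 2.35  # 1 Broadwell node, ns/day
gpu_runs = {"1x K80": 11.36, "2x K80": 15.43}  # ns/day with GPUs added

for config, nsday in gpu_runs.items():
    print(f"{config}: {nsday / cpu_only:.1f}X")
# prints 4.8X and 6.6X, matching the labels above
```

The same division reproduces the speedup labels on the P100 charts that follow.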
PME-Cellulose_NPT on P100s PCIe
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla P100 PCIe (16GB) GPUs, each P100 paired with a single E5-2699 v4.
PME-Cellulose_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 2.35; +1x P100 PCIe: 21.85 (9.3X); +2x P100 PCIe: 30.00 (12.8X)
PME-Cellulose_NPT on P100s SXM2
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs, each P100 paired with a single E5-2698 v4.
PME-Cellulose_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 2.35; +1x P100 SXM2: 23.37 (9.9X); +2x P100 SXM2: 32.22 (13.7X); +4x P100 SXM2: 36.65 (15.6X)
PME-Cellulose_NVE on K80s
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla K80 (autoboost) GPUs, each K80 paired with a single E5-2699 v4.
PME-Cellulose_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 2.47; +1x K80: 11.85 (4.8X); +2x K80: 16.53 (6.7X)
PME-Cellulose_NVE on P100s PCIe
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla P100 PCIe (16GB) GPUs, each P100 paired with a single E5-2699 v4.
PME-Cellulose_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 2.47; +1x P100 PCIe: 23.34 (9.4X); +2x P100 PCIe: 32.55 (13.2X)
PME-Cellulose_NVE on P100s SXM2
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs, each P100 paired with a single E5-2698 v4.
PME-Cellulose_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 2.47; +1x P100 SXM2: 24.94 (10.1X); +2x P100 SXM2: 35.16 (14.2X); +4x P100 SXM2: 40.88 (16.6X)
PME-FactorIX_NPT on K80s
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla K80 (autoboost) GPUs, each K80 paired with a single E5-2699 v4.
PME-FactorIX_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 11.43; +1x K80: 48.54 (4.2X); +2x K80: 66.68 (5.8X)
PME-FactorIX_NPT on P100s PCIe
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla P100 PCIe (16GB) GPUs, each P100 paired with a single E5-2699 v4.
PME-FactorIX_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 11.43; +1x P100 PCIe: 98.77 (8.6X); +2x P100 PCIe: 132.86 (11.6X)
PME-FactorIX_NPT on P100s SXM2
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs, each P100 paired with a single E5-2698 v4.
PME-FactorIX_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 11.43; +1x P100 SXM2: 106.25 (9.3X); +2x P100 SXM2: 144.11 (12.6X); +4x P100 SXM2: 159.80 (14.0X)
PME-FactorIX_NVE on K80s
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla K80 (autoboost) GPUs, each K80 paired with a single E5-2699 v4.
PME-FactorIX_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 11.98; +1x K80: 51.14 (4.3X); +2x K80: 71.49 (6.0X)
PME-FactorIX_NVE on P100s PCIe
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla P100 PCIe (16GB) GPUs, each P100 paired with a single E5-2699 v4.
PME-FactorIX_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 11.98; +1x P100 PCIe: 105.86 (8.8X); +2x P100 PCIe: 145.83 (12.2X)
PME-FactorIX_NVE on P100s SXM2
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs, each P100 paired with a single E5-2698 v4.
PME-FactorIX_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 11.98; +1x P100 SXM2: 114.88 (9.6X); +2x P100 SXM2: 159.24 (13.3X); +4x P100 SXM2: 178.02 (14.9X)
PME-JAC_NPT on K80s
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla K80 (autoboost) GPUs, each K80 paired with a single E5-2699 v4.
PME-JAC_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 45.89; +1x K80: 162.09 (3.5X); +2x K80: 216.78 (4.7X)
PME-JAC_NPT on P100s PCIe
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla P100 PCIe (16GB) GPUs, each P100 paired with a single E5-2699 v4.
PME-JAC_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 45.89; +1x P100 PCIe: 283.60 (6.2X); +2x P100 PCIe: 327.69 (7.1X)
PME-JAC_NPT on P100s SXM2
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs, each P100 paired with a single E5-2698 v4.
PME-JAC_NPT, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 45.89; +1x P100 SXM2: 310.52 (6.8X); +2x P100 SXM2: 360.64 (7.9X); +4x P100 SXM2: 423.09 (9.2X)
PME-JAC_NVE on K80s
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla K80 (autoboost) GPUs, each K80 paired with a single E5-2699 v4.
PME-JAC_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 47.90; +1x K80: 173.20 (3.6X); +2x K80: 234.99 (4.9X)
PME-JAC_NVE on P100s PCIe
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla P100 PCIe (16GB) GPUs, each P100 paired with a single E5-2699 v4.
PME-JAC_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 47.90; +1x P100 PCIe: 308.46 (6.4X); +2x P100 PCIe: 363.79 (7.6X)
PME-JAC_NVE on P100s SXM2
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes contain dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs, each P100 paired with a single E5-2698 v4.
PME-JAC_NVE, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 47.90; +1x P100 SXM2: 339.81 (7.1X); +2x P100 SXM2: 402.18 (8.4X); +4x P100 SXM2: 473.10 (9.9X)
GB-Myoglobin on K80s
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla K80 (autoboost) GPUs, each K80 paired with a single E5-2699 v4.
GB-Myoglobin, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 28.86; +1x K80: 288.47 (10.0X); +2x K80: 339.45 (11.8X)
GB-Myoglobin on P100s PCIe
Running AMBER version 16.3. The blue node contains dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs; the green nodes add Tesla P100 PCIe (16GB) GPUs, each P100 paired with a single E5-2699 v4.
GB-Myoglobin, ns/day (speedup vs. the CPU-only node):
1 Broadwell node: 28.86; +1x P100 PCIe: 483.37 (16.7X); +4x P100 PCIe: 561.94 (19.5X)
44
GB-Myoglobin on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: GB-Myoglobin (ns/day, higher is better)
  1 Broadwell node: 28.86
  1 node + 1x P100 SXM2: 534.28 (18.5X)
  1 node + 4x P100 SXM2: 639.37 (22.2X)
45
GB-Nucleosome on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: GB-Nucleosome (ns/day, higher is better)
  1 Broadwell node: 0.40
  1 node + 1x K80: 5.84 (14.6X)
  1 node + 2x K80: 11.31 (28.3X)
  1 node + 4x K80: 20.55 (51.4X)
46
GB-Nucleosome on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: GB-Nucleosome (ns/day, higher is better)
  1 Broadwell node: 0.40
  1 node + 1x P100 PCIe (16GB): 11.91 (29.8X)
  1 node + 2x P100 PCIe (16GB): 22.77 (56.9X)
  1 node + 4x P100 PCIe (16GB): 39.91 (99.8X)
  1 node + 8x P100 PCIe (16GB): 45.92 (114.8X)
47
GB-Nucleosome on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: GB-Nucleosome (ns/day, higher is better)
  1 Broadwell node: 0.40
  1 node + 1x P100 SXM2: 13.36 (33.4X)
  1 node + 2x P100 SXM2: 25.53 (63.8X)
  1 node + 4x P100 SXM2: 46.29 (115.7X)
  1 node + 8x P100 SXM2: 48.29 (120.7X)
48
Rubisco-75K on K80s
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: Rubisco-75K (ns/day, higher is better)
  1 Broadwell node: 0.01
  1 node + 1x K80: 0.35 (35.0X)
  1 node + 2x K80: 0.69 (69.0X)
  1 node + 4x K80: 1.34 (134.0X)
49
Rubisco-75K on P100s PCIe
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: Rubisco-75K (ns/day, higher is better)
  1 Broadwell node: 0.01
  1 node + 1x P100 PCIe (16GB): 0.71 (71.0X)
  1 node + 2x P100 PCIe (16GB): 1.40 (140.0X)
  1 node + 4x P100 PCIe (16GB): 2.69 (269.0X)
  1 node + 8x P100 PCIe (16GB): 4.20 (420.0X)
50
Rubisco-75K on P100s SXM2
Running AMBER version 16.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: Rubisco-75K (ns/day, higher is better)
  1 Broadwell node: 0.01
  1 node + 1x P100 SXM2: 0.80 (80.0X)
  1 node + 2x P100 SXM2: 1.57 (157.0X)
  1 node + 4x P100 SXM2: 3.06 (306.0X)
  1 node + 8x P100 SXM2: 4.46 (446.0X)
AMBER 14
52
AMBER 14 vs. AMBER 12
Courtesy of Scott Le Grand, from his GTC 2014 presentation
53
AMBER 14: large impact from P2P, small impact from Boost Clocks
Chart: AMBER 14 DHFR NVE PME, 2 fs benchmark (CUDA 6.0, ECC off), ns/day on 2x Xeon E5-2690 v2 @ 3.0GHz + 4x Tesla K40
  No Boost (745MHz), no P2P: 125.77
  Boost (875MHz), no P2P: 132.97
  No Boost (745MHz), P2P: 196.68
  Boost (875MHz), P2P: 215.18
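The four bars above can be decomposed to show how much each factor contributes on its own; a quick calculation (arithmetic on the reported values, not from the original deck):

```python
# Decompose the 4x K40 DHFR chart above: how much of the gain comes from
# peer-to-peer (P2P) GPU transfers vs. from raising the boost clock.
base       = 125.77  # no boost, no P2P (ns/day)
boost_only = 132.97  # 875MHz, no P2P
p2p_only   = 196.68  # 745MHz, P2P
both       = 215.18  # 875MHz, P2P

print(f"Boost alone: {boost_only / base:.2f}x")  # ~1.06x
print(f"P2P alone:   {p2p_only / base:.2f}x")    # ~1.56x
print(f"Combined:    {both / base:.2f}x")        # ~1.71x
```

P2P contributes roughly 56% by itself, versus about 6% for the higher boost clock, which is the point of the slide title.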
54
AMBER Performance Over Time
Courtesy of Scott Le Grand, from his GTC 2014 presentation
55
Cellulose on K40s, K80s and M6000s
Running AMBER version 14
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875MHz, Tesla K80 @ 562MHz (autoboost), or Quadro M6000 @ 987MHz GPUs
Chart: PME-Cellulose_NVE, simulated time (ns/day)
  1 Haswell node: 1.93
  1 CPU node + 1x K40: 8.96 (4.6X)
  1 CPU node + 0.5x K80: 7.87 (4.1X)
  1 CPU node + 1x K80: 11.76 (6.1X)
  1 CPU node + 1x M6000: 10.49 (5.4X)
  1 CPU node + 2x K40: 13.67 (7.1X)
  1 CPU node + 2x K80: 15.38 (8.0X)
  1 CPU node + 2x M6000: 14.90 (7.7X)
56
Factor IX on K40s, K80s and M6000s
Running AMBER version 14
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875MHz, Tesla K80 @ 562MHz (autoboost), or Quadro M6000 @ 987MHz GPUs
Chart: PME-FactorIX_NVE, simulated time (ns/day)
  1 Haswell node: 9.68
  1 CPU node + 1x K40: 40.48 (4.2X)
  1 CPU node + 0.5x K80: 33.59 (3.5X)
  1 CPU node + 1x K80: 50.70 (5.2X)
  1 CPU node + 1x M6000: 47.80 (5.0X)
  1 CPU node + 2x K40: 61.18 (6.4X)
  1 CPU node + 2x K80: 60.93 (6.3X)
  1 CPU node + 2x M6000: 66.89 (7.0X)
57
JAC on K40s, K80s and M6000s
Running AMBER version 14
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.3GHz, 3.6GHz Turbo CPUs + either NVIDIA Tesla K40 @ 875MHz, Tesla K80 @ 562MHz (autoboost), or Quadro M6000 @ 987MHz GPUs
Chart: PME-JAC_NVE, simulated time (ns/day)
  1 Haswell node: 37.38
  1 CPU node + 1x K40: 134.82 (3.6X)
  1 CPU node + 0.5x K80: 121.30 (3.2X)
  1 CPU node + 1x K80: 174.34 (4.7X)
  1 CPU node + 1x M6000: 161.53 (4.3X)
  1 CPU node + 2x K40: 200.34 (5.4X)
  1 CPU node + 2x K80: 225.34 (6.0X)
  1 CPU node + 2x M6000: 219.83 (5.9X)
58
Cellulose on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: PME - Cellulose_NPT, simulated time (ns/day)
  1 node: 1.07
  1 node + 1x M40: 10.12 (9.5X)
  1 node + 2x M40: 14.40 (13.5X)
  1 node + 4x M40: 15.90 (14.9X)
59
Cellulose on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: PME - Cellulose_NVE, simulated time (ns/day)
  1 node: 1.07
  1 node + 1x M40: 10.50 (9.8X)
  1 node + 2x M40: 15.41 (14.4X)
  1 node + 4x M40: 17.13 (16.0X)
60
FactorIX on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: PME - FactorIX_NPT, simulated time (ns/day)
  1 node: 5.38
  1 node + 1x M40: 46.90 (8.7X)
  1 node + 2x M40: 67.37 (12.5X)
  1 node + 4x M40: 72.96 (13.6X)
61
FactorIX on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: PME - FactorIX_NVE, simulated time (ns/day)
  1 node: 5.47
  1 node + 1x M40: 49.33 (9.0X)
  1 node + 2x M40: 73.00 (13.3X)
  1 node + 4x M40: 80.04 (14.6X)
62
JAC on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: PME - JAC_NPT, simulated time (ns/day)
  1 node: 20.88
  1 node + 1x M40: 149.40 (7.2X)
  1 node + 2x M40: 211.97 (10.2X)
  1 node + 4x M40: 226.63 (10.9X)
63
JAC on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: PME - JAC_NVE, simulated time (ns/day)
  1 node: 21.11
  1 node + 1x M40: 157.68 (7.5X)
  1 node + 2x M40: 230.18 (10.9X)
  1 node + 4x M40: 246.15 (11.7X)
64
Myoglobin on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: GB - Myoglobin, simulated time (ns/day)
  1 node: 9.83
  1 node + 1x M40: 232.20 (23.6X)
  1 node + 2x M40: 300.86 (30.6X)
  1 node + 4x M40: 322.09 (32.8X)
65
Nucleosome on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: GB - Nucleosome, simulated time (ns/day)
  1 node: 0.13
  1 node + 1x M40: 4.67 (35.9X)
  1 node + 2x M40: 9.05 (69.6X)
  1 node + 4x M40: 16.11 (123.9X)
66
TrpCage on M40s
Running AMBER version 14
The blue node contains a single Intel Xeon E5-2698 v3 @ 2.3GHz (Haswell) CPU
The green nodes contain a single Intel Xeon E5-2697 v2 @ 2.7GHz (IvyBridge) CPU + Tesla M40 (autoboost) GPUs
Chart: GB - TrpCage, simulated time (ns/day)
  1 node: 408.88
  1 node + 1x M40: 831.91 (2.0X)
  1 node + 2x M40: 551.36 (1.3X)
  1 node + 4x M40: 464.63 (1.1X)
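TrpCage is the outlier in this series: for a system this small, adding GPUs beyond the first actually reduces throughput, because multi-GPU communication overhead outweighs the extra compute. A quick check on the chart values (arithmetic added here, not from the original deck):

```python
# For a tiny system like TrpCage, multi-GPU overhead outweighs the extra
# compute: throughput *drops* as GPUs are added beyond the first.
nsday = {"CPU node": 408.88, "1x M40": 831.91, "2x M40": 551.36, "4x M40": 464.63}
base = nsday["CPU node"]

for cfg in ("1x M40", "2x M40", "4x M40"):
    print(f"{cfg}: {nsday[cfg] / base:.2f}x vs CPU node")

# The best configuration for this system is a single GPU.
best = max(nsday, key=nsday.get)
print(f"best: {best}")
```

This is the practical takeaway: match the GPU count to the system size rather than assuming more GPUs always help.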
67
Recommended GPU Node Configuration for AMBER Computational Chemistry
Workstation or Single Node Configuration:
  # of CPU sockets: 2
  Cores per CPU socket: 6+ (1 CPU core drives 1 GPU)
  CPU speed (GHz): 2.66+
  System memory per node (GB): 16
  GPUs: Kepler K20, K40, K80, P100
  # of GPUs per CPU socket: 1-4
  GPU memory preference (GB): 6
  GPU to CPU connection: PCIe 3.0 16x or higher
  Server storage: 2 TB
  Network configuration: InfiniBand QDR or better
Scale to multiple nodes with the same single-node configuration
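The recommendations above can be encoded as a quick sanity check for a planned node. A hypothetical helper (the function name and the example configurations are illustrative, not from the deck; the thresholds come from the table):

```python
# Hypothetical sanity check of a planned AMBER GPU node against the
# recommendations in the table above.
def check_amber_node(cores_per_socket, cpu_ghz, ram_gb, gpus_per_socket):
    problems = []
    if cores_per_socket < 6:
        problems.append("want 6+ cores per socket (1 CPU core drives 1 GPU)")
    if cpu_ghz < 2.66:
        problems.append("want CPU speed of 2.66 GHz or higher")
    if ram_gb < 16:
        problems.append("want 16+ GB system memory per node")
    if not 1 <= gpus_per_socket <= 4:
        problems.append("want 1-4 GPUs per CPU socket")
    return problems

# An 8-core, 3.0 GHz socket with 32 GB RAM and 2 GPUs passes cleanly:
print(check_amber_node(cores_per_socket=8, cpu_ghz=3.0, ram_gb=32, gpus_per_socket=2))
```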
July 2016
CHARMM DOMDEC-GUI
69
CHARMM DOMDEC-GUI 465 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking error.
Chart: 465 K System (Her1_HER1_membrane), ns/day, higher is better
  1 Haswell node: 0.36
  1 node + 1x K80: 2.15 (6.0X)
70
CHARMM DOMDEC-GUI 534 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking error.
Chart: 534 K System (POPC_PSPC_CHL1 mixture), ns/day, higher is better
  1 Haswell node: 0.18
  1 node + 1x K80: 1.43 (8.0X)
71
CHARMM DOMDEC-GUI 20 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking error.
Chart: 20 K System (Crambin), ns/day, higher is better
  1 Haswell node: 16.00
  1 node + 1x M40: 59.68 (3.7X)
72
CHARMM DOMDEC-GUI 61 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking error.
Chart: 61 K System (GlnBP), ns/day, higher is better
  1 Haswell node: 3.90
  1 node + 1x M40: 25.08 (6.4X)
73
CHARMM DOMDEC-GUI 465 K System Benchmark
Running CHARMM version c40a1
The blue node contains Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v3 @ 2.30 GHz (Haswell) CPUs + Tesla M40 GPUs
Benchmarks were run with the standard CHARMM c40a1 version by the Yang group (FSU), which is responsible for any benchmarking error.
Chart: 465 K System (Her1_HER1_membrane), ns/day, higher is better
  1 Haswell node: 0.36
  1 node + 1x M40: 2.27 (6.3X)
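The 465 K Her1_HER1_membrane system appears on both the K80 and M40 slides with the same host and CHARMM version, so the two single-GPU results can be compared directly (arithmetic added here, not from the original deck):

```python
# Same 465 K Her1_HER1_membrane system, same c40a1 code, same Haswell host:
# compare the single-GPU results reported on the K80 and M40 slides.
cpu_node = 0.36   # ns/day, dual-Haswell baseline
k80      = 2.15   # 1x K80
m40      = 2.27   # 1x M40

print(f"K80 speedup: {k80 / cpu_node:.1f}X")   # 6.0X
print(f"M40 speedup: {m40 / cpu_node:.1f}X")   # 6.3X
print(f"M40 vs K80:  {m40 / k80:.2f}x")        # ~1.06x
```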
October 2016
GROMACS 2016
75
Erik Lindahl (GROMACS developer) video
76
Water 1.5M on K80s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
Chart: Water 1.5M (ns/day, higher is better)
  1 Broadwell node: 2.79
  1 node + 2x K80: 5.22 (1.9X)
  1 node + 4x K80: 6.14 (2.2X)
77
Water 3M on K80s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
Chart: Water 3M (ns/day, higher is better)
  1 Broadwell node: 1.32
  1 node + 2x K80: 2.66 (2.0X)
  1 node + 4x K80: 3.05 (2.3X)
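Note the diminishing returns in these GROMACS water runs: doubling from 2 to 4 K80 boards adds far less than 2x. Per-board scaling efficiency can be computed from the chart values (arithmetic added here, not from the original deck):

```python
# Per-board scaling efficiency for the GROMACS 2016 water runs on K80s:
# (speedup over the CPU node) divided by the number of K80 boards.
runs = {
    # system: (CPU-node ns/day, {number of K80s: ns/day})
    "Water 1.5M": (2.79, {2: 5.22, 4: 6.14}),
    "Water 3M":   (1.32, {2: 2.66, 4: 3.05}),
}

for system, (cpu, gpu) in runs.items():
    for n, nsday in sorted(gpu.items()):
        speedup = nsday / cpu
        print(f"{system}, {n}x K80: {speedup:.1f}X speedup, "
              f"{speedup / n:.0%} per-board efficiency")
```

Efficiency drops sharply at 4 boards, which is consistent with these water systems becoming communication-bound.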
78
Water 1.5M on M40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs
Chart: Water 1.5M (ns/day, higher is better)
  1 Broadwell node: 2.79
  1 node + 2x M40: 6.15 (2.2X)
  1 node + 4x M40: 7.60 (2.7X)
79
Water 3M on M40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla M40 (autoboost) GPUs
Chart: Water 3M (ns/day, higher is better)
  1 Broadwell node: 1.32
  1 node + 2x M40: 2.97 (2.3X)
  1 node + 4x M40: 3.94 (3.0X)
80
Water 1.5M on P40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs
Chart: Water 1.5M (ns/day, higher is better)
  1 Broadwell node: 2.79
  1 node + 2x P40: 6.60 (2.4X)
  1 node + 4x P40: 8.07 (2.9X)
81
Water 3M on P40s
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P40 GPUs
Chart: Water 3M (ns/day, higher is better)
  1 Broadwell node: 1.32
  1 node + 2x P40: 3.36 (2.5X)
  1 node + 4x P40: 4.19 (3.2X)
82
Water 1.5M on P100 PCIes
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Chart: Water 1.5M (ns/day, higher is better)
  1 Broadwell node: 2.79
  1 node + 2x P100 PCIe (16GB): 6.34 (2.3X)
  1 node + 4x P100 PCIe (16GB): 7.11 (2.5X)
83
Water 3M on P100 PCIes
Running GROMACS version 2016
The blue node contains Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
Chart: Water 3M (ns/day, higher is better)
  1 Broadwell node: 1.32
  1 node + 2x P100 PCIe (16GB): 3.16 (2.4X)
  1 node + 4x P100 PCIe (16GB): 3.43 (2.6X)
February 2017
GROMACS 5.1.2
85
Water 1.5M on K80s
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: Water 1.5M (ns/day, higher is better)
  1 Broadwell node: 3.04
  1 node + 1x K80: 3.49 (1.1X)
  1 node + 2x K80: 5.75 (1.9X)
86
Water 1.5M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with a single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: Water 1.5M (ns/day, higher is better)
  1 Broadwell node: 3.04
  1 node + 1x P100 PCIe (16GB): 4.39 (1.4X)
  1 node + 2x P100 PCIe (16GB): 6.96 (2.3X)
  1 node + 4x P100 PCIe (16GB): 7.21 (2.4X)
87
Water 1.5M on P100s SXM2
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with a single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
Chart: Water 1.5M (ns/day, higher is better)
  1 Broadwell node: 3.04
  1 node + 1x P100 SXM2: 4.11 (1.4X)
  1 node + 2x P100 SXM2: 6.70 (2.2X)
  1 node + 4x P100 SXM2: 7.18 (2.4X)
  1 node + 8x P100 SXM2: 7.88 (2.6X)
88. Water 3M on K80s
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] Water 3M (ns/day; speedup vs. CPU-only node):
  1 Broadwell node: 1.38
  1 node + 1x K80 per node: 1.59 (1.2X)
  1 node + 2x K80 per node: 2.98 (2.2X)
89. Water 3M on P100s PCIe
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] Water 3M (ns/day; speedup vs. CPU-only node):
  1 Broadwell node: 1.38
  1 node + 1x P100 PCIe (16GB) per node: 1.96 (1.4X)
  1 node + 2x P100 PCIe (16GB) per node: 3.43 (2.5X)
  1 node + 4x P100 PCIe (16GB) per node: 3.80 (2.8X)
90. Water 3M on P100s SXM2
Running GROMACS version 5.1.2
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] Water 3M (ns/day; speedup vs. CPU-only node):
  1 Broadwell node: 1.38
  1 node + 1x P100 SXM2 per node: 1.84 (1.3X)
  1 node + 2x P100 SXM2 per node: 3.50 (2.5X)
  1 node + 4x P100 SXM2 per node: 3.82 (2.8X)
91. Recommended GPU Node Configuration for GROMACS Computational Chemistry
Workstation or Single Node Configuration:
  # of CPU sockets: 2
  Cores per CPU socket: 6+
  CPU speed (GHz): 2.66+
  System memory per socket (GB): 32
  GPUs: Kepler K20, K40, K80
  # of GPUs per CPU socket: 1x
    (Kepler GPUs need a fast Sandy Bridge or Ivy Bridge, or high-end AMD Opterons)
  GPU memory preference (GB): 6
  GPU to CPU connection: PCIe 3.0 or higher
  Server storage: 500 GB or higher
  Network configuration: Gemini, InfiniBand
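The speedup factors annotated on the GROMACS charts above are simply the ratio of accelerated throughput to the CPU-only baseline. A minimal Python sketch of that arithmetic, using the Water 1.5M and Water 3M ns/day figures from the preceding slides:

```python
# ns/day for the CPU-only dual-socket Broadwell baseline (from the charts above).
cpu_only = {"Water 1.5M": 3.04, "Water 3M": 1.38}

def speedup(benchmark, ns_per_day):
    """Throughput ratio of a GPU-accelerated run vs. the CPU-only node."""
    return ns_per_day / cpu_only[benchmark]

# A few GPU configurations taken from the slides above.
gpu_runs = [
    ("Water 1.5M", "2x K80",        5.75),
    ("Water 1.5M", "8x P100 SXM2",  7.88),
    ("Water 3M",   "4x P100 SXM2",  3.82),
]

for bench, config, nsday in gpu_runs:
    print(f"{bench} + {config}: {speedup(bench, nsday):.1f}X")
```

Running this reproduces the annotated ratios (1.9X, 2.6X, 2.8X), confirming the charts report throughput relative to the dual-socket CPU node rather than to a single GPU.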
February 2017
HOOMD-Blue 1.3.3
![Page 93: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics](https://reader034.fdocuments.net/reader034/viewer/2022042708/5f39a9a9c285d033d030d0e3/html5/thumbnails/93.jpg)
93
lj-liquid on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]
(Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz
Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 [email protected] [3.6GHz
Turbo] (Broadwell)326.52
1324.84
1594.37
1942.12
0
500
1000
1500
2000
2500
1 Broadwell node 1 node +1x K80 per node
1 node +2x K80 per node
1 node +4x K80 per node
avg
tim
e s
teps/
sec
lj-liquid
4.1X
4.9X
5.9X
![Page 94: Molecular Dynamics (MD) on GPUs · Supercharger Library*, VASP & more green* = application where >90% of the workload is on GPU. 4 MD vs. QC on GPUs “Classical” Molecular Dynamics](https://reader034.fdocuments.net/reader034/viewer/2022042708/5f39a9a9c285d033d030d0e3/html5/thumbnails/94.jpg)
94
lj-liquid on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 [email protected] [3.6GHz Turbo]
(Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 [email protected] [3.6GHz
Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 [email protected]
[3.6GHz Turbo] (Broadwell)326.52
2912.66
3217.68
0
500
1000
1500
2000
2500
3000
3500
1 Broadwell node 1 node +1x P100 PCIe (16GB)
per node
1 node +8x P100 PCIe (16GB)
per node
avg
tim
est
eps/
sec
lj-liquid
8.9X
9.9X
95. lj-liquid on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] lj-liquid (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 326.52
  1 node + 1x P100 SXM2 per node: 3129.11 (9.6X)
  1 node + 8x P100 SXM2 per node: 3397.74 (10.4X)
96. lj_liquid_512k on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] lj_liquid_512k (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 43.43
  1 node + 1x K80 per node: 220.10 (5.1X)
  1 node + 2x K80 per node: 334.59 (7.7X)
  1 node + 4x K80 per node: 526.47 (12.1X)
97. lj_liquid_512k on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] lj_liquid_512k (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 43.43
  1 node + 1x P100 PCIe (16GB) per node: 398.12 (9.2X)
  1 node + 2x P100 PCIe (16GB) per node: 534.54 (12.3X)
  1 node + 4x P100 PCIe (16GB) per node: 770.18 (17.7X)
  1 node + 8x P100 PCIe (16GB) per node: 1045.50 (24.1X)
98. lj_liquid_512k on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] lj_liquid_512k (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 43.43
  1 node + 1x P100 SXM2 per node: 443.74 (10.2X)
  1 node + 2x P100 SXM2 per node: 568.51 (13.1X)
  1 node + 4x P100 SXM2 per node: 793.36 (18.3X)
  1 node + 8x P100 SXM2 per node: 1119.76 (25.8X)
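Multi-GPU scaling in these HOOMD-Blue runs is sub-linear: going from 1 to 8 P100s raises throughput by roughly 2.5x, not 8x. A small Python sketch that quantifies per-GPU parallel efficiency from the lj_liquid_512k P100 SXM2 numbers above:

```python
# lj_liquid_512k avg timesteps/sec on P100 SXM2 (from the chart above).
steps_per_sec = {1: 443.74, 2: 568.51, 4: 793.36, 8: 1119.76}

def scaling_efficiency(n_gpus):
    """Speedup over the 1-GPU run divided by the GPU count (1.0 = perfect scaling)."""
    return (steps_per_sec[n_gpus] / steps_per_sec[1]) / n_gpus

for n in (2, 4, 8):
    print(f"{n}x P100 SXM2: {scaling_efficiency(n):.0%} parallel efficiency")
```

This yields roughly 64%, 45%, and 32% efficiency at 2, 4, and 8 GPUs, which is why the slides emphasize speedup over the CPU baseline rather than strong scaling across GPUs.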
99. lj_liquid_1m on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] lj_liquid_1m (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 22.07
  1 node + 1x K80 per node: 109.54 (5.0X)
  1 node + 2x K80 per node: 181.42 (8.2X)
  1 node + 4x K80 per node: 303.00 (13.7X)
100. lj_liquid_1m on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] lj_liquid_1m (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 22.07
  1 node + 1x P100 PCIe (16GB) per node: 204.67 (9.3X)
  1 node + 2x P100 PCIe (16GB) per node: 294.88 (13.4X)
  1 node + 4x P100 PCIe (16GB) per node: 465.58 (21.1X)
  1 node + 8x P100 PCIe (16GB) per node: 672.46 (30.5X)
101. lj_liquid_1m on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] lj_liquid_1m (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 22.07
  1 node + 1x P100 SXM2 per node: 221.02 (10.0X)
  1 node + 2x P100 SXM2 per node: 315.07 (14.3X)
  1 node + 4x P100 SXM2 per node: 488.04 (22.1X)
  1 node + 8x P100 SXM2 per node: 707.73 (32.1X)
102. Microsphere on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] microsphere (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 17.53
  1 node + 1x K80 per node: 64.87 (3.7X)
  1 node + 2x K80 per node: 98.43 (5.6X)
  1 node + 4x K80 per node: 166.74 (9.5X)
103. Microsphere on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] microsphere (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 17.53
  1 node + 1x P100 PCIe (16GB) per node: 145.71 (8.3X)
  1 node + 2x P100 PCIe (16GB) per node: 179.54 (10.2X)
  1 node + 4x P100 PCIe (16GB) per node: 257.58 (14.7X)
  1 node + 8x P100 PCIe (16GB) per node: 371.24 (21.2X)
104. Microsphere on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] microsphere (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 17.53
  1 node + 1x P100 SXM2 per node: 151.51 (8.6X)
  1 node + 2x P100 SXM2 per node: 186.01 (10.6X)
  1 node + 4x P100 SXM2 per node: 271.21 (15.5X)
  1 node + 8x P100 SXM2 per node: 384.72 (21.9X)
105. Polymer on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] polymer (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 362.19
  1 node + 1x K80 per node: 975.14 (2.7X)
  1 node + 2x K80 per node: 1209.45 (3.3X)
  1 node + 4x K80 per node: 1518.99 (4.2X)
106. Polymer on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] polymer (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 362.19
  1 node + 1x P100 PCIe (16GB) per node: 1999.64 (5.5X)
  1 node + 4x P100 PCIe (16GB) per node: 2143.15 (5.9X)
  1 node + 8x P100 PCIe (16GB) per node: 2480.70 (6.8X)
107. Polymer on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] polymer (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 362.19
  1 node + 1x P100 SXM2 per node: 2111.99 (5.8X)
  1 node + 4x P100 SXM2 per node: 2272.27 (6.3X)
  1 node + 8x P100 SXM2 per node: 2651.56 (7.3X)
108. Quasicrystal on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] quasicrystal (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 78.32
  1 node + 1x K80 per node: 502.53 (6.4X)
  1 node + 2x K80 per node: 767.90 (9.8X)
  1 node + 4x K80 per node: 1280.44 (16.3X)
109. Quasicrystal on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] quasicrystal (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 78.32
  1 node + 1x P100 PCIe (16GB) per node: 851.29 (10.9X)
  1 node + 2x P100 PCIe (16GB) per node: 1199.64 (15.3X)
  1 node + 4x P100 PCIe (16GB) per node: 1791.41 (22.9X)
  1 node + 8x P100 PCIe (16GB) per node: 2261.72 (28.9X)
110. Quasicrystal on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] quasicrystal (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 78.32
  1 node + 1x P100 SXM2 per node: 939.53 (12.0X)
  1 node + 2x P100 SXM2 per node: 1249.90 (16.0X)
  1 node + 4x P100 SXM2 per node: 1940.29 (24.8X)
  1 node + 8x P100 SXM2 per node: 2429.68 (31.0X)
111. Triblock-copolymer on K80s
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] triblock-copolymer (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 361.42
  1 node + 1x K80 per node: 953.01 (2.6X)
  1 node + 2x K80 per node: 1170.47 (3.2X)
  1 node + 4x K80 per node: 1492.01 (4.1X)
112. Triblock-copolymer on P100s PCIe
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] triblock-copolymer (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 361.42
  1 node + 1x P100 PCIe (16GB) per node: 1999.14 (5.5X)
  1 node + 4x P100 PCIe (16GB) per node: 2155.27 (6.0X)
  1 node + 8x P100 PCIe (16GB) per node: 2456.09 (6.8X)
113. Triblock-copolymer on P100s SXM2
Running HOOMD-Blue version 1.3.3
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] triblock-copolymer (avg timesteps/sec; speedup vs. CPU-only node):
  1 Broadwell node: 361.42
  1 node + 1x P100 SXM2 per node: 2132.92 (5.9X)
  1 node + 4x P100 SXM2 per node: 2253.83 (6.2X)
  1 node + 8x P100 SXM2 per node: 2587.91 (7.2X)
February 2017
LAMMPS 2016
115. Atomic-Fluid Lennard-Jones 2.5 Cutoff on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
[Chart] Atomic-Fluid Lennard-Jones 2.5 Cutoff (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.37
  1 node + 2x K80 per node: 0.57 (1.5X)
116. Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
[Chart] Atomic-Fluid Lennard-Jones 2.5 Cutoff (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.37
  1 node + 2x P100 PCIe (16GB) per node: 0.62 (1.7X)
117. Atomic-Fluid Lennard-Jones 2.5 Cutoff on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 (autoboost) GPUs
[Chart] Atomic-Fluid Lennard-Jones 2.5 Cutoff (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.37
  1 node + 2x P100 SXM2 per node: 0.64 (1.7X)
118. Atomic-Fluid Lennard-Jones 5.0 Cutoff on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] Atomic-Fluid Lennard-Jones 5.0 Cutoff (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.10
  1 node + 1x K80 per node: 0.14 (1.4X)
  1 node + 2x K80 per node: 0.26 (2.6X)
  1 node + 4x K80 per node: 0.36 (3.6X)
119. Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
➢ 1x P100 PCIe is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] Atomic-Fluid Lennard-Jones 5.0 Cutoff (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.10
  1 node + 1x P100 PCIe (16GB) per node: 0.22 (2.2X)
  1 node + 2x P100 PCIe (16GB) per node: 0.35 (3.5X)
  1 node + 4x P100 PCIe (16GB) per node: 0.37 (3.7X)
  1 node + 8x P100 PCIe (16GB) per node: 0.38 (3.8X)
120. Atomic-Fluid Lennard-Jones 5.0 Cutoff on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
➢ 1x P100 SXM2 is paired with Single Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] Atomic-Fluid Lennard-Jones 5.0 Cutoff (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.10
  1 node + 1x P100 SXM2 per node: 0.22 (2.2X)
  1 node + 2x P100 SXM2 per node: 0.36 (3.6X)
  1 node + 4x P100 SXM2 per node: 0.41 (4.1X)
121. Coarse-grain Water on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
[Chart] Coarse-grain Water (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.00437
  1 node + 4x K80 per node: 0.00444 (1.0X)
122. Coarse-grain Water on P100s PCIe
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16GB) GPUs
[Chart] Coarse-grain Water (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.0044
  1 node + 4x P100 PCIe (16GB) per node: 0.0061 (1.4X)
  1 node + 8x P100 PCIe (16GB) per node: 0.0093 (2.1X)
123. Coarse-grain Water on P100s SXM2
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs
[Chart] Coarse-grain Water (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.0044
  1 node + 4x P100 SXM2 per node: 0.0069 (1.6X)
  1 node + 8x P100 SXM2 per node: 0.0110 (2.5X)
124. EAM on K80s
Running LAMMPS version 2016
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs
➢ 1x K80 is paired with Single Intel Xeon E5-2699 v4 @ 2.2GHz [3.6GHz Turbo] (Broadwell)
[Chart] EAM (1/seconds; speedup vs. CPU-only node):
  1 Broadwell node: 0.01
  1 node + 1x K80 per node: 0.02 (2.0X)
  1 node + 2x K80 per node: 0.04 (4.0X)
  1 node + 4x K80 per node: 0.07 (7.0X)
EAM on P100s PCIe (slide 125)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16 GB) GPUs; each P100 PCIe is paired with a single E5-2699 v4 CPU.
EAM performance (1/seconds, higher is better):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 PCIe (16 GB) per node: 0.03 (3.0X)
- 1 node + 2x P100 PCIe (16 GB) per node: 0.05 (5.0X)
- 1 node + 4x P100 PCIe (16 GB) per node: 0.08 (8.0X)
- 1 node + 8x P100 PCIe (16 GB) per node: 0.13 (13.0X)
EAM on P100s SXM2 (slide 126)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 SXM2 is paired with a single E5-2698 v4 CPU.
EAM performance (1/seconds, higher is better):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 SXM2 per node: 0.03 (3.0X)
- 1 node + 2x P100 SXM2 per node: 0.05 (5.0X)
- 1 node + 4x P100 SXM2 per node: 0.08 (8.0X)
- 1 node + 8x P100 SXM2 per node: 0.13 (13.0X)
Gay-Berne on K80s (slide 127)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4 CPU.
Gay-Berne performance (1/seconds, higher is better):
- 1 Broadwell node: 0.01
- 1 node + 1x K80 per node: 0.02 (2.0X)
- 1 node + 2x K80 per node: 0.03 (3.0X)
- 1 node + 4x K80 per node: 0.04 (4.0X)
Gay-Berne on P100s PCIe (slide 128)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16 GB) GPUs; each P100 PCIe is paired with a single E5-2699 v4 CPU.
Gay-Berne performance (1/seconds, higher is better):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 PCIe (16 GB) per node: 0.02 (2.0X)
- 1 node + 2x P100 PCIe (16 GB) per node: 0.04 (4.0X)
- 1 node + 4x P100 PCIe (16 GB) per node: 0.05 (5.0X)
Gay-Berne on P100s SXM2 (slide 129)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 SXM2 is paired with a single E5-2698 v4 CPU.
Gay-Berne performance (1/seconds, higher is better):
- 1 Broadwell node: 0.01
- 1 node + 1x P100 SXM2 per node: 0.02 (2.0X)
- 1 node + 2x P100 SXM2 per node: 0.04 (4.0X)
- 1 node + 4x P100 SXM2 per node: 0.05 (5.0X)
Rhodopsin on K80s (slide 130)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs; each K80 is paired with a single E5-2699 v4 CPU.
Rhodopsin performance (1/seconds, higher is better):
- 1 Broadwell node: 0.22
- 1 node + 1x K80 per node: 0.22
- 1 node + 2x K80 per node: 0.31 (1.4X)
- 1 node + 4x K80 per node: 0.38 (1.7X)
Rhodopsin on P100s PCIe (slide 131)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16 GB) GPUs; each P100 PCIe is paired with a single E5-2699 v4 CPU.
Rhodopsin performance (1/seconds, higher is better):
- 1 Broadwell node: 0.22
- 1 node + 1x P100 PCIe (16 GB) per node: 0.29 (1.3X)
- 1 node + 2x P100 PCIe (16 GB) per node: 0.33 (1.5X)
- 1 node + 4x P100 PCIe (16 GB) per node: 0.48 (2.2X)
- 1 node + 8x P100 PCIe (16 GB) per node: 0.52 (2.4X)
Rhodopsin on P100s SXM2 (slide 132)
Running LAMMPS version 2016.
The blue node contains Dual Intel Xeon E5-2699 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2698 v4 @ 2.2 GHz [3.6 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs; each P100 SXM2 is paired with a single E5-2698 v4 CPU.
Rhodopsin performance (1/seconds, higher is better):
- 1 Broadwell node: 0.22
- 1 node + 1x P100 SXM2 per node: 0.30 (1.4X)
- 1 node + 2x P100 SXM2 per node: 0.38 (1.7X)
- 1 node + 4x P100 SXM2 per node: 0.49 (2.2X)
- 1 node + 8x P100 SXM2 per node: 0.50 (2.3X)
Recommended GPU Node Configuration for LAMMPS Computational Chemistry (slide 133)
Workstation or single-node configuration:
- # of CPU sockets: 2
- Cores per CPU socket: 6+
- CPU speed (GHz): 2.66+
- System memory per socket (GB): 32
- GPUs: GTX Titan X; Kepler K20, K40, K80; M40
- # of GPUs per CPU socket: 1-2
- GPU memory preference (GB): 6+
- GPU-to-CPU connection: PCIe 3.0 or higher
- Server storage: 500 GB or higher
- Network configuration: Gemini, InfiniBand
Scale to thousands of nodes with the same single-node configuration.
July 2017
NAMD 2.12
APOA1 on K80s (slide 135)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
APOA1 performance (ns/day):
- 1 Broadwell node: 3.45
- 1 node + 1x K80 per node: 14.92
- 1 node + 2x K80 per node: 17.73
APOA1 on P100s PCIe (slide 136)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16 GB) GPUs.
APOA1 performance (ns/day):
- 1 Broadwell node: 3.45
- 1 node + 1x P100 PCIe (16 GB) per node: 22.58
- 1 node + 2x P100 PCIe (16 GB) per node: 22.85
APOA1 on P100s SXM2 (slide 137)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
APOA1 performance (ns/day):
- 1 Broadwell node: 3.45
- 1 node + 1x P100 SXM2 per node: 22.98
- 1 node + 2x P100 SXM2 per node: 23.44
- 1 node + 4x P100 SXM2 per node: 23.87
F1ATPASE on K80s (slide 138)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
F1ATPASE performance (ns/day):
- 1 Broadwell node: 1.15
- 1 node + 1x K80 per node: 4.81
- 1 node + 2x K80 per node: 6.27
F1ATPASE on P100s PCIe (slide 139)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16 GB) GPUs.
F1ATPASE performance (ns/day):
- 1 Broadwell node: 1.15
- 1 node + 1x P100 PCIe (16 GB) per node: 7.34
- 1 node + 2x P100 PCIe (16 GB) per node: 6.99
- 1 node + 4x P100 PCIe (16 GB) per node: 7.40
F1ATPASE on P100s SXM2 (slide 140)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
F1ATPASE performance (ns/day):
- 1 Broadwell node: 1.15
- 1 node + 1x P100 SXM2 per node: 7.11
- 1 node + 2x P100 SXM2 per node: 6.85
- 1 node + 4x P100 SXM2 per node: 7.11
STMV on K80s (slide 141)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla K80 (autoboost) GPUs.
STMV performance (ns/day):
- 1 Broadwell node: 0.292
- 1 node + 1x K80 per node: 1.274
- 1 node + 2x K80 per node: 2.085
STMV on P100s PCIe (slide 142)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla P100 PCIe (16 GB) GPUs.
STMV performance (ns/day):
- 1 Broadwell node: 0.29
- 1 node + 1x P100 PCIe (16 GB) per node: 2.15
- 1 node + 2x P100 PCIe (16 GB) per node: 2.32
STMV on P100s SXM2 (slide 143)
Running NAMD version 2.12.
The blue node contains Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs.
The green nodes contain Dual Intel Xeon E5-2690 v4 @ 2.6 GHz [3.5 GHz Turbo] (Broadwell) CPUs + Tesla P100 SXM2 GPUs.
STMV performance (ns/day):
- 1 Broadwell node: 0.292
- 1 node + 1x P100 SXM2 per node: 2.077
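The NAMD charts report throughput in simulated nanoseconds per wall-clock day, so the practical payoff is time-to-solution. A small sketch of that conversion, using the STMV rates from the chart above (the 100 ns target is an illustrative choice, not from the slides):

```python
def days_to_simulate(target_ns, ns_per_day):
    """Wall-clock days needed to reach target_ns at a measured ns/day rate."""
    return target_ns / ns_per_day

# Time to accumulate 100 ns of STMV trajectory at the measured rates:
print(f"CPU-only node:  {days_to_simulate(100, 0.292):.0f} days")
print(f"+1x P100 SXM2:  {days_to_simulate(100, 2.077):.0f} days")
```

At these rates the GPU node finishes in roughly seven weeks what the CPU-only node would need nearly a year to produce.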
NAMD 2.11 – Up to 2X Faster
New GPU Features in NAMD 2.11 (slide 145)
- GPU-accelerated simulations up to twice as fast as NAMD 2.10
- Pressure calculation with fixed atoms on the GPU works as on the CPU
- Improved scaling for the GPU-accelerated particle-mesh Ewald calculation: CPU-side operations overlap better and are parallelized across cores
- Improved scaling for GPU-accelerated simulations: nonbonded force calculation results are streamed from the GPU for better overlap
- NVIDIA CUDA GPU-acceleration binaries for Mac OS X
Selected text from the NAMD website.
NAMD 2.11 is up to 2x faster (slide 146)
APoA1 (92,224 atoms), Simulated Time (ns/day). Speedups of NAMD 2.11 over NAMD 2.10: 1.2X on 1 node, 1.6X on 2 nodes, 2.0X on 4 nodes.
Both NAMD 2.10 and NAMD 2.11 runs used nodes with Dual Intel E5-2697 v2 @ 2.7 GHz (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs.
NAMD 2.11 APoA1 on 1 and 2 nodes (slide 147)
Running NAMD version 2.11.
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs.
The green nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
APoA1 (92,224 atoms), Simulated Time (ns/day):
- 1 node: 2.77
- 1 node + 1x K80: 11.67 (4.2X)
- 1 node + 2x K80: 16.99 (6.1X)
- 2 nodes: 5.22
- 2 nodes + 1x K80: 19.73 (3.8X)
- 2 nodes + 2x K80: 24.31 (4.7X)
NAMD 2.11 APoA1 on 4 and 8 nodes (slide 148)
Running NAMD version 2.11.
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs.
The green nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
APoA1 (92,224 atoms), Simulated Time (ns/day):
- 4 nodes: 10.27
- 4 nodes + 1x K80: 20.64 (2.0X)
- 4 nodes + 2x K80: 23.52 (2.3X)
- 8 nodes: 16.85
- 8 nodes + 1x K80: 27.83 (1.7X)
- 8 nodes + 2x K80: 27.74 (1.6X)
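Another way to read the 1-to-8-node APoA1 numbers is as parallel efficiency: the speedup over a single node divided by the node count, where 1.0 would be perfect scaling. A minimal sketch using the +2x K80 APoA1 rates reported on this slide and the previous one:

```python
def parallel_efficiency(rate, base_rate, nodes):
    """Speedup over the 1-node rate, divided by node count (1.0 = perfect scaling)."""
    return (rate / base_rate) / nodes

base = 16.99  # ns/day, APoA1 on 1 node + 2x K80
for nodes, rate in [(2, 24.31), (4, 23.52), (8, 27.74)]:
    print(f"{nodes} nodes: efficiency {parallel_efficiency(rate, base, nodes):.2f}")
```

The falling efficiency (about 0.72 at 2 nodes down to about 0.20 at 8) is why dense GPU nodes are often preferred over spreading a 92k-atom system thinly across many nodes.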
NAMD 2.11 is up to 1.8x faster (slide 149)
F1-ATPase (327,506 atoms), Simulated Time (ns/day). Speedups of NAMD 2.11 over NAMD 2.10: 1.1X on 1 node, 1.8X on 2 nodes, 1.4X on 4 nodes.
Both NAMD 2.10 and NAMD 2.11 runs used nodes with Dual Intel E5-2697 v2 @ 2.7 GHz (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs.
NAMD 2.11 F1-ATPase on 1 and 2 nodes (slide 150)
Running NAMD version 2.11.
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs.
The green nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
F1-ATPase (327,506 atoms), Simulated Time (ns/day):
- 1 node: 0.94
- 1 node + 1x K80: 3.87 (4.1X)
- 1 node + 2x K80: 6.11 (6.5X)
- 2 nodes: 1.86
- 2 nodes + 1x K80: 7.23 (3.9X)
- 2 nodes + 2x K80: 10.58 (5.7X)
NAMD 2.11 F1-ATPase on 4 and 8 nodes (slide 151)
Running NAMD version 2.11.
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs.
The green nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
F1-ATPase (327,506 atoms), Simulated Time (ns/day):
- 4 nodes: 3.63
- 4 nodes + 1x K80: 11.66 (3.2X)
- 4 nodes + 2x K80: 12.62 (3.5X)
- 8 nodes: 6.88
- 8 nodes + 1x K80: 14.22 (2.1X)
- 8 nodes + 2x K80: 15.74 (2.3X)
NAMD 2.11 is up to 1.5x faster (slide 152)
STMV (1,066,628 atoms), Simulated Time (ns/day). Speedups of NAMD 2.11 over NAMD 2.10: 1.5X on 1 node, 1.1X on 2 nodes, 1.5X on 4 nodes.
Both NAMD 2.10 and NAMD 2.11 runs used nodes with Dual Intel E5-2697 v2 @ 2.7 GHz (IvyBridge) CPUs + 2 Tesla K80 (autoboost) GPUs.
NAMD 2.11 STMV on 1 and 2 nodes (slide 153)
Running NAMD version 2.11.
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs.
The green nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
STMV (1,066,628 atoms), Simulated Time (ns/day):
- 1 node: 0.23
- 1 node + 1x K80: 1.03 (4.5X)
- 1 node + 2x K80: 1.75 (7.6X)
- 2 nodes: 0.46
- 2 nodes + 1x K80: 1.98 (4.3X)
- 2 nodes + 2x K80: 3.27 (7.1X)
NAMD 2.11 STMV on 4 and 8 nodes (slide 154)
Running NAMD version 2.11.
The blue nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs.
The green nodes contain Dual Intel E5-2698 v3 @ 2.3 GHz (Haswell) CPUs + Tesla K80 (autoboost) GPUs.
STMV (1,066,628 atoms), Simulated Time (ns/day):
- 4 nodes: 0.90
- 4 nodes + 1x K80: 3.61 (4.0X)
- 4 nodes + 2x K80: 4.54 (5.0X)
- 8 nodes: 1.74
- 8 nodes + 1x K80: 5.86 (3.4X)
- 8 nodes + 2x K80: 6.24 (3.6X)
Benefits of MD GPU-Accelerated Computing (slide 155)
- 3x-8x faster than CPU-only systems in all tests (on average)
- Most major compute-intensive aspects of classical MD are ported
- Large performance boost with a marginal price increase
- Energy usage cut by more than half
- GPUs scale well within a node and/or across multiple nodes
- The K80 GPU is our fastest and lowest-power high-performance GPU yet
Try GPU-accelerated MD apps for free: www.nvidia.com/GPUTestDrive
Why wouldn't you want to turbocharge your research?
Dec. 19, 2016
Molecular Dynamics (MD) on GPUs
GPU-Accelerated Quantum Chemistry Apps (slide 157)
Abinit
ACES III
ADF
BigDFT
CP2K
GAMESS-US
Gaussian
GPAW
LATTE
LSDalton
MOLCAS
Mopac2012
NWChem
Octopus
ONETEP
Petot
Q-Chem
QMCPACK
Quantum Espresso
Quantum SuperCharger Library
RMG
TeraChem
UNM
VASP
WL-LSMS
Green lettering indicates performance slides included. GPU performance is compared against a dual multi-core x86 CPU socket.