SAN DIEGO SUPERCOMPUTER CENTER
Speed Without Compromise: Rethinking Precision in MD Calculations in the Era of Vanishing Double Precision FLOPs
Ross Walker, Associate Professor and NVIDIA CUDA Fellow
San Diego Supercomputer Center
UC San Diego Department of Chemistry & Biochemistry
http://www.wmd-lab.org/
Researchers / Postdocs: Age Skjevik, Andreas Goetz, Perri Needham
Graduate Students: Ben Madej, Longhua Yang, Maria Rosaria-ferrero, Charles Lin, Daniel Mermelstein
Undergraduate Researchers: Robin Betz
GPU Acceleration
Lipid Force Fields
Enzyme Activation
QM/MM MD Automated Refinement
Walker Molecular Dynamics Lab
Molecular Dynamics for the 99%
• Develop a GPU-accelerated version of AMBER’s PMEMD.
San Diego Supercomputer Center: Ross C. Walker
NVIDIA: Scott Le Grand
Partly funded under NSF SI2-SSE Program
Taking MD to 11
Project Info
• AMBER Website: http://ambermd.org/gpus/
Publications
1. Salomon-Ferrer, R.; Goetz, A.W.; Poole, D.; Le Grand, S.; Walker, R.C.* "Routine microsecond molecular dynamics simulations with AMBER - Part II: Particle Mesh Ewald", J. Chem. Theory Comput., 2013, 9 (9), pp 3878-3888, DOI: 10.1021/ct400314y
2. Goetz, A.W.; Williamson, M.J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R.C. "Routine microsecond molecular dynamics simulations with AMBER - Part I: Generalized Born", J. Chem. Theory Comput., 2012, 8 (5), pp 1542-1555, DOI: 10.1021/ct200909j
3. Pierce, L.C.T.; Salomon-Ferrer, R.; de Oliveira, C.A.F.; McCammon, J.A.; Walker, R.C. "Routine access to millisecond timescale events with accelerated molecular dynamics", J. Chem. Theory Comput., 2012, 8 (9), pp 2997-3002, DOI: 10.1021/ct300284c
4. Salomon-Ferrer, R.; Case, D.A.; Walker, R.C. "An overview of the Amber biomolecular simulation package", WIREs Comput. Mol. Sci., 2012, in press, DOI: 10.1002/wcms.1121
5. Le Grand, S.; Goetz, A.W.; Walker, R.C. "SPFP: Speed without compromise - a mixed precision model for GPU accelerated molecular dynamics simulations", Comp. Phys. Comm., 2013, 184, pp 374-380, DOI: 10.1016/j.cpc.2012.09.022
Design Goals
Overriding Design Goal: Sampling for the 99%
• Focus on systems of roughly 4 million atoms or fewer.
• Maximize single workstation performance.
• Focus on minimizing costs.
• Be able to use very cheap nodes.
• Both gaming and Tesla cards.
• Ease of use (same input, same output).
[Figure labels: The <0.0001%, The 1.0%, The 99.0%]
Map problem onto GPU hardware
Example: Nonbonded forces
• Subdivide force matrix into 3 classes of independent tiles: off-diagonal, on-diagonal, redundant.
• Map non-redundant tiles to warps.
• SMs consume tiles.
• Avoid race conditions by dividing the calculation in both space (tiles) and time (warps).
[Figure: force matrix (atom i vs. atom j) split into tiles; Warp 0 ... Warp n assigned to SM 0 ... SM m; labels: Shared Memory, Registers]
Patent: US 8473948 B1
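The three tile classes can be illustrated with a small host-side sketch. It assumes 32-atom tiles (one warp width) and that the "redundant" tiles are the mirror images below the diagonal; the names `TILE`, `TileClass`, `classify_tile`, and `count_nonredundant_tiles` are hypothetical and not taken from the AMBER source.

```cpp
#include <cstddef>

// Hypothetical tile width: one warp (32 threads) covers 32 x 32 atom pairs.
constexpr std::size_t TILE = 32;

enum class TileClass { OnDiagonal, OffDiagonal, Redundant };

// Classify tile (ti, tj) of the atom-i x atom-j force matrix.
// Assumption: pair forces are antisymmetric (F_ij = -F_ji), so a tile with
// tj < ti repeats the work of its mirror image above the diagonal.
TileClass classify_tile(std::size_t ti, std::size_t tj) {
    if (ti == tj) return TileClass::OnDiagonal;
    return (tj > ti) ? TileClass::OffDiagonal : TileClass::Redundant;
}

// Number of tiles that need to be handed to warps:
// the diagonal plus the upper triangle.
std::size_t count_nonredundant_tiles(std::size_t n_atoms) {
    const std::size_t n_tiles = (n_atoms + TILE - 1) / TILE;  // tiles per side
    return n_tiles * (n_tiles + 1) / 2;
}
```

Because each non-redundant tile touches a distinct block of atom pairs, tiles can be processed independently; how the real kernel schedules them across SMs is not shown on the slide.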
Version History
• AMBER 10 – Released Apr 2008
  • Implicit Solvent GB GPU support released as patch Sept 2009.
• AMBER 11 – Released Apr 2010
  • Implicit and Explicit solvent supported internally on single GPU.
  • Oct 2010 – Bugfix.9 doubled performance on single GPU, added multi-GPU support.
• AMBER 12 – Released Apr 2012
  • Added Umbrella Sampling Support, REMD, Simulated Annealing, aMD, IPS and Extra Points.
  • Aug 2012 – Bugfix.9 new SPFP precision model, support for Kepler I, GPU-accelerated NMR restraints, improved performance.
  • Jan 2013 – Bugfix.14 support for CUDA 5.0, Jarzynski on GPU, GBSA. Kepler II support.
New in AMBER 14 (GPU), Apr 2014
• ~20-30% performance improvement for single GPU runs.
• Peer to peer support for multi-GPU runs providing enhanced multi-GPU scaling.
• Hybrid bitwise reproducible fixed point precision model as standard (SPFP).
• Support for Extra Points in Multi-GPU runs.
• Jarzynski Sampling.
• GBSA support.
• Support for off-diagonal modifications to VDW parameters.
• Multi-dimensional Replica Exchange (Temperature and Hamiltonian).
• Support for CUDA 5.0, 5.5 and 6.0.
• Support for latest generation GPUs.
• Monte Carlo barostat support providing NPT performance equivalent to NVT.
• ScaledMD support.
• Improved accelerated MD (aMD) support.
• Explicit solvent constant pH support.
• NMR restraint support on multiple GPUs.
• Improved error messages and checking.
• Hydrogen mass repartitioning support (4fs time steps).
A Question of Dynamic Range
32-bit floating point has approximately 7 significant figures.
When it happens: PBC, SHAKE, and Force Accumulation.

  1.456702
+ 0.3046714
-----------
  1.761373
- 1.456702
-----------
  0.3046710    Lost a sig fig

  1456702.0000000
+       0.3046714
-----------------
  1456702.0000000
- 1456702.0000000
-----------------
        0.0000000    Lost everything.
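The same effect can be reproduced in actual IEEE arithmetic. The sketch below is an illustration only; `round_trip_error` is a hypothetical helper name. The slide works in 7 decimal digits, whereas binary32 floats near 1456702 are spaced 0.125 apart, so in practice only a coarse multiple of that spacing survives rather than exactly zero.

```cpp
#include <cmath>

// Computes (big + small) - big in type T and returns how far the result
// is from the original small value.
template <typename T>
T round_trip_error(T big, T small) {
    volatile T sum = big + small;  // volatile keeps the intermediate in type T
    T recovered = sum - big;
    return std::fabs(recovered - small);
}
```

Calling `round_trip_error<float>(1456702.0f, 0.3046714f)` gives an error visible in the second decimal place, while the `double` instantiation with the same inputs recovers the small value to better than 1e-9.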
Precision Models
SPSP - Use single precision for the entire calculation with the exception of SHAKE, which is always done in double precision.
SPDP - Use a combination of single precision for calculation and double precision for accumulation (default < AMBER 12.9)
DPDP - Use double precision for the entire calculation.
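The difference between SPSP-style and SPDP-style accumulation can be shown with a toy sum. This is a sketch under simple assumptions (the same single-precision term added repeatedly); `accumulate_sp` and `accumulate_dp` are hypothetical names, not AMBER routines.

```cpp
#include <cstddef>

// SPSP-style: single-precision terms, single-precision accumulator.
float accumulate_sp(float term, std::size_t n) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) sum += term;
    return sum;
}

// SPDP-style: single-precision terms, double-precision accumulator.
double accumulate_dp(float term, std::size_t n) {
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) sum += static_cast<double>(term);
    return sum;
}
```

Adding 0.1f one million times, the double accumulator stays within a small fraction of a unit of 100000, while the float accumulator drifts by hundreds because each addition is rounded to the spacing of the growing sum. Compile without `-ffast-math`, which would allow the compiler to reorder the additions.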
Validation and Precision Testing
• Measure a combination of elements that depend on both static energies / forces and ensemble averages.
  • Energy conservation.
  • Optimized structures.
  • Free energy surfaces.
  • Order parameters.
  • RMSF.
  • Radial distribution functions, etc.
• 2 aims
  • Is our implementation valid/correct?
  • What level of approximation with precision is acceptable?
But then…
GTX680 and K10 Ruined the Party.
DP performance REALLY sucked.
4 month delay in usefulness while we developed and tested a new precision model.
SPFP
• Single / Double / Fixed precision hybrid. Designed for optimum performance on Kepler I. Uses fire-and-forget atomic ops. Fully deterministic, faster and more precise than SPDP, minimal memory overhead. (default >= AMBER 12.9)
Q24.40 for Forces, Q34.30 for Energies / Virials
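Why fixed-point accumulation is deterministic can be sketched on the host. The code below assumes the Q24.40 force format means a 64-bit integer with 40 fractional bits; `FORCE_SCALE`, `to_fixed`, `from_fixed`, and `accumulate_fixed` are hypothetical names, and the plain loop stands in for the GPU atomic adds, whose actual implementation is not shown on the slides.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Assumed Q24.40 layout: 40 fractional bits in a signed 64-bit integer.
constexpr double FORCE_SCALE = 1099511627776.0;  // 2^40

// Convert a single-precision force term to 64-bit fixed point.
std::int64_t to_fixed(float f) {
    assert(std::fabs(f) < 8388608.0f);  // must fit the 24 integer bits (incl. sign)
    return std::llrint(static_cast<double>(f) * FORCE_SCALE);
}

// Convert an accumulated fixed-point sum back to double.
double from_fixed(std::int64_t q) {
    return static_cast<double>(q) / FORCE_SCALE;
}

// Integer addition is associative, so the total is the same no matter
// in which order the terms arrive.
std::int64_t accumulate_fixed(const std::vector<float>& terms) {
    std::int64_t sum = 0;
    for (float t : terms) sum += to_fixed(t);
    return sum;
}
```

Summing the same terms forwards and backwards yields bit-identical totals, which floating-point accumulation does not guarantee. That order independence is what allows atomic adds issued in arbitrary order to stay reproducible.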
Worked Great Until Maxwell
DHFR (NVE) HMR 4fs, 23,558 Atoms

Configuration                       Performance (ns/day)
2xE5-2660v2 CPU (16 Cores)           30.21
1X C2075                             81.26
2X C2075                            129.79
1X GTX 780                          251.43
1X GTX 980                          262.39
1X GTX Titan Black                  280.54
2X GTX Titan Black                  383.32
GTX-Titan-Z (1 GPU, 1/2 board)      261.82
GTX-Titan-Z (2 GPU, full board)     356.48
1X K8                               116.09
1X K20                              196.99
2X K20                              263.85
1X K40                              266.07
2X K40                              364.67
4X K40                              489.68
1/2x K80 board (1 GPU)              229.29
1x K80 board (2 GPUs)               334.05
2x K80 boards (4 GPUs)              423.69
Yet another solution needed
SPXP
Use 2 x 32 bits (~48-bit FP)
• Extended-Precision Floating-Point Numbers for GPU Computation - Andrew Thall, Alma College, http://andrewthall.org/papers/df64_qf128.pdf
• High-Performance Quasi Double-Precision Method Using Single-Precision Hardware for Molecular Dynamics on GPUs - Tetsuo Narumi et al., HPC Asia and JAPAN 2009
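The basic building block behind "2 x 32 bits" schemes is an error-free addition that returns the rounded sum together with its rounding error. The sketch below uses Knuth's branch-free TwoSum as an illustration; `FloatPair` and `two_sum` are hypothetical names and this is not code from either reference.

```cpp
// A value held as an unevaluated sum hi + lo of two floats.
struct FloatPair {
    float hi;  // rounded single-precision sum
    float lo;  // rounding error that hi could not hold
};

// Knuth's TwoSum: hi + lo equals a + b exactly (barring overflow).
// Compile without -ffast-math, which would simplify the error term away.
FloatPair two_sum(float a, float b) {
    float s = a + b;
    float bb = s - a;                  // the part of b that made it into s
    float err = (a - (s - bb)) + (b - bb);
    return {s, err};
}
```

For the earlier example, `two_sum(1456702.0f, 0.3046714f)` keeps in `lo` the digits that a plain float addition throws away.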
Narumi Summation
Represented as a float and an int

const int NARUMI_LARGE_SHIFT = 21;
const float NARUMI_LARGE = (float)(1 << (NARUMI_LARGE_SHIFT - 1));

struct Accumulator {
    float hs;  // high part, starts at NARUMI_LARGE
    int li;    // low part, held as a scaled integer
    Accumulator() : hs(NARUMI_LARGE), li(0) {}
};

Addition

// Note: NARUMI_LOWER_FACTOR is not defined on the slides.
void add_narumi(Accumulator& a, float ys)
{
    float hs, ls, ws;

    // Knuth and Dekker addition
    hs = a.hs + ys;
    ws = hs - a.hs;
    ls = ys - ws;

    // Inner Narumi correction
    a.hs = hs;
    a.li += (int)(ls * NARUMI_LOWER_FACTOR);
}

Conversion to double

// Note: NARUMI_LOWER_FACTOR_1_D is not defined on the slides.
double upcast_narumi(Accumulator& a)
{
    double d = (double)(a.hs - NARUMI_LARGE);
    d += a.li * NARUMI_LOWER_FACTOR_1_D;
    return d;
}
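To try the slide code end to end, the two undefined constants have to be filled in. The sketch below assumes a power-of-two scale of 2^24 for `NARUMI_LOWER_FACTOR` and its reciprocal for `NARUMI_LOWER_FACTOR_1_D`; these values are guesses for illustration, not the ones used in AMBER, and `sum_narumi` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>

const int NARUMI_LARGE_SHIFT = 21;
const float NARUMI_LARGE = (float)(1 << (NARUMI_LARGE_SHIFT - 1));  // 2^20

// ASSUMPTION: not given on the slides; chosen here for illustration only.
const float NARUMI_LOWER_FACTOR = 16777216.0f;            // 2^24 (assumed)
const double NARUMI_LOWER_FACTOR_1_D = 1.0 / 16777216.0;  // 2^-24 (assumed)

struct Accumulator {
    float hs;  // high part, biased by NARUMI_LARGE
    int li;    // low part, scaled by NARUMI_LOWER_FACTOR
    Accumulator() : hs(NARUMI_LARGE), li(0) {}
};

void add_narumi(Accumulator& a, float ys) {
    assert(std::fabs(ys) < NARUMI_LARGE);  // needed for the error term to be exact
    float hs = a.hs + ys;   // rounded to the float spacing near 2^20
    float ws = hs - a.hs;   // what was actually added
    float ls = ys - ws;     // what was rounded away
    a.hs = hs;
    a.li += (int)(ls * NARUMI_LOWER_FACTOR);  // keep the remainder as an integer
}

double upcast_narumi(const Accumulator& a) {
    double d = (double)(a.hs - NARUMI_LARGE);
    d += a.li * NARUMI_LOWER_FACTOR_1_D;
    return d;
}

// Hypothetical helper: add the same term n times and return the total.
double sum_narumi(float term, int n) {
    Accumulator acc;
    for (int i = 0; i < n; ++i) add_narumi(acc, term);
    return upcast_narumi(acc);
}
```

With these assumed constants the integer part can overflow after a few thousand additions of same-signed remainders, so the sketch only suits short inner-loop sums. That limit is a property of the assumed scale, not a statement about the production code.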
Something for Everyone
• DPFP 64-bit everything
• SPFP 32-bit forces, U64 force summation, 64-bit state
• SPXP 32-bit forces, Narumi force summation for inner loops, U64 summation, 64-bit state
Side by Side
DP:   22.855216396810960
DPFP: 22.855216396810960
SPFP: 22.855216396810xxx
SPXP: 22.8552163xxxxxxxx
SP:   22.855xxxxxxxxxxxx
AMBER GPU MD Workbench
Collaboration with Altintas, Amaro, & Walker
• Facilitates publication-quality MD-based research & training.
• Automated multistep minimization, heating, equilibration protocols.
• Automated multi-copy job submission on GPU clusters.
• Addresses reproducibility, comparison of results, rapid parameter testing / exploration.
[Figure labels: Minimization Actor, Equilibration Actor]
Asymmetric Periodic Boundary Conditions
• Mapping:
  • Even: Same as PBC: corr=corr-corr/length
  • Odd:
    • x’=x-1/2Lx-x/Lx
    • y’=y-1/2Ly-y/Ly
    • z’=(Lz-z)-(z-z/Lz)
Reducing computational costs through reducing the need of multiple lipid bilayers in ionic gradient simulations.
[Figure labels: 1 (odd), 0 (even), -1 (odd)]
APBC Considerations
• Ewald Forces (WIP)
  • Even
  • Odd
• Other considerations
  • van der Waals (not included in Ewald)
  • Visualization / post imaging?
Investigating ligand permeability across membranes using COM distance restraints and Lipid14
[Figure label: Glycerol Ligand]
Improved performance of COM distance restraints using pmemd.cuda
Lipid bilayer + glycerol ligands

Configuration                       ns/day    Speed-up
Intel i7-5930K CPU (6 cores)          4.92         1.0
Intel i7-5930K CPU (2 x 6 cores)      5.42         1.1
1 x K40                              42.23         8.6
2 x K40                              61.15        12.4
New Features: Constant pH MD
• Solution pH often has a dramatic impact on biomolecular systems.
  • Affects the charge distribution within the biomolecule.
  • Affects the fundamental structure and function of biomolecules.
  • Some proteins’ native states are stable only in a narrow pH range.
• Traditional approach = constant protonation state
  • Fixed protonation states of titratable residues at the start of each simulation.
  • Does not sample changes in protonation state.
  • Only samples conformational space of fixed protonation state(s).
  • pH effects accounted for qualitatively.
• Constant pH approach
  • Samples conformational space AND samples protonation states.
  • Constant chemical potential of hydronium ions.
  • Periodic changes in discrete protonation states of titratable residues using Monte Carlo.
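The Monte Carlo step for a proposed protonation-state change can be sketched as a standard Metropolis test. The slide only says "Monte Carlo", so the Metropolis criterion is an assumption here; how the free energy change is obtained (pH, reference pKa, electrostatics) is not given on the slide and is left as an input. The name `accept_protonation_change` is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Metropolis test for a proposed protonation-state change.
//   delta_g : free energy change of the proposed move (same units as kT)
//   kT      : thermal energy
//   u       : uniform random number in [0, 1), supplied by the caller
bool accept_protonation_change(double delta_g, double kT, double u) {
    assert(kT > 0.0);
    if (delta_g <= 0.0) return true;        // downhill moves always accepted
    return u < std::exp(-delta_g / kT);     // uphill moves accepted with Boltzmann probability
}
```

Passing the random number in explicitly keeps the function deterministic and easy to test; a caller would draw `u` from its own generator once per attempted change.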
Constant pH REMD in Implicit Solvent
These projects were impossible to pursue before GPUs allowed routine access to 100 ns long pH-REMD simulations in one day.
• Nitrophorin 4: Well characterized pH-dependent NO delivery into blood vessels
• Talin: pH-dependent binding to actin implicated in cancer metastasis
• Human Folate receptor: pH-dependent intracellular delivery, potential target for drug transport
Constant pH MD in Explicit Solvent
[Flowchart boxes: Input parameters and initial prot. states (n); Explicit solvent MD with prot. states (n); Implicit (GB) solvent MD with random prot. state change (n); Accept prot. state change? (NO / YES); Strip solvent; Restore solvent; Run solvent relaxation MD]
Current implementation only makes use of the GPU for part of the algorithm
Predicted Performance of GPU-CpH-Ex
[Figure: two bar charts (ns/day and speed-up) for the 3LZT crystal structure of the hen egg-white lysozyme (HEWL), comparing AMD FX-8150 CPU (8 core), AMD FX-8150 CPU + GeForce GTX TITAN Z GPU, and GeForce GTX TITAN Z GPU]
Summary
• GPUs are awesome but continual ‘internal’ performance changes are crippling development.
• Lots of new things in the pipeline – would be more if we didn’t have to keep rewriting the guts.
Acknowledgements
San Diego Supercomputer Center
University of California San Diego
National Science Foundation: NSF SI2-SSE Program
NVIDIA Corporation: Hardware + People
People
Perri Needham, Romelia Salomon, Scott Le Grand
Robin Betz, Ben Madej, Simon Layton
Duncan Poole, Mark Berger, Sarah Tariq
Andreas Goetz, Age Skjevik