AMBER14 & GPUs


Description

Listen to Professors Ross Walker and Adrian Roitberg explain the new GPU features of AMBER version 14. With these features, AMBER is now the world's fastest molecular dynamics package.

Transcript of AMBER14 & GPUs

Page 1: AMBER14 & GPUs

SAN DIEGO SUPERCOMPUTER CENTER

AMBER 14 and GPUs: Creating the World's Fastest MD Package on Commodity Hardware

Ross Walker, Associate Research Professor and NVIDIA CUDA Fellow, San Diego Supercomputer Center and Department of Chemistry & Biochemistry, UC San Diego
Adrian Roitberg, Professor, Quantum Theory Project, Department of Chemistry, University of Florida

Page 2: AMBER14 & GPUs

Presenters

• Ross Walker – San Diego Supercomputer Center & Department of Chemistry and Biochemistry, UC San Diego
• Adrian Roitberg – Quantum Theory Project, Department of Chemistry, University of Florida
• Scott Le Grand – Senior Engineer, Amazon Web Services

Page 3: AMBER14 & GPUs

Webinar Overview

• Part 1
  • Overview of Amber & AmberTools
  • New features in Amber v14
  • Overview of the Amber GPU project
  • New GPU features in Amber v14
  • Recommended hardware configurations
  • Performance

• Part 2
  • Hydrogen mass repartitioning
  • Constant pH MD
  • Multi-dimensional replica exchange MD

Page 4: AMBER14 & GPUs

What is AMBER?

An MD simulation package
• Latest version is 14.0 as of April 2014
• Distributed in two parts:
  • AmberTools – preparatory and analysis programs, free under the GPL
  • Amber – the main simulation programs, under academic licensing
• Independent of the accompanying force fields

A set of MD force fields
• Fixed-charge biomolecular force fields: ff94, ff99, ff99SB, ff03, ff11, ff12SB, ...
• Experimental polarizable force fields, e.g. ff02EP
• Parameters for general organic molecules, solvents, carbohydrates (Glycam), etc.
• In the public domain

Page 5: AMBER14 & GPUs

Development

AMBER is developed by a loose community of academic groups, led by Dave Case at Rutgers University, New Jersey, building on the earlier work of Peter Kollman at UCSF.

Page 6: AMBER14 & GPUs

Development

Page 7: AMBER14 & GPUs

The AMBER Development Team: A Multi-Institutional Research Collaboration

Principal contributors to the current codes:

David A. Case (Rutgers), Tom Darden (OpenEye), Thomas E. Cheatham III (Utah), Carlos Simmerling (Stony Brook), Adrian Roitberg (Florida), Junmei Wang (UT Southwestern Medical Center), Robert E. Duke (NIEHS and UNC-Chapel Hill), Ray Luo (UC Irvine), Daniel R. Roe (Utah), Ross C. Walker (SDSC, UCSD), Scott Le Grand (Amazon), Jason Swails (Rutgers), David Cerutti (Rutgers), Joe Kaus (UCSD), Robin Betz (UCSD), Romain M. Wolf (Novartis), Kenneth M. Merz (Michigan State), Gustavo Seabra (Recife, Brazil), Pawel Janowski (Rutgers), Andreas W. Götz (SDSC, UCSD), István Kolossváry (Budapest and D. E. Shaw), Francesco Paesani (UCSD), Jian Liu (Berkeley), Xiongwu Wu (NIH), Thomas Steinbrecher (Karlsruhe), Holger Gohlke (Düsseldorf), Nadine Homeyer (Düsseldorf), Qin Cai (UC Irvine), Wes Smith (UC Irvine), Dave Mathews (Rochester), Romelia Salomon-Ferrer (SDSC, UCSD), Celeste Sagui (North Carolina State), Volodymyr Babin (North Carolina State), Tyler Luchko (Rutgers), Sergey Gusarov (NINT), Andriy Kovalenko (NINT), Josh Berryman (U. of Luxembourg), Peter A. Kollman (UC San Francisco)

Page 8: AMBER14 & GPUs


Page 9: AMBER14 & GPUs

Availability

• AmberTools v14 – available as a free download via the Amber website
• Amber v14 – available for download after purchase from the Amber website

http://ambermd.org/

Page 10: AMBER14 & GPUs

Distribution

v14 released April 15, 2014
• AmberTools v14 – free, open source (BSD, LGPL, etc.)
• Amber v14 – Academic ($500), Industry ($20,000 / $15,000), HPC Center ($2,000)

Amber v14 adds the PMEMD MD engine on top of AmberTools v14.

Page 11: AMBER14 & GPUs

AmberTools v14: New Features

• The sander module, our workhorse simulation program, is now part of AmberTools
• Greatly expanded and improved cpptraj program for analyzing trajectories
• New documentation and tools for inspecting and modifying Amber parameter files
• Improved workflow for setting up and analyzing simulations
• New capability for semi-empirical Born-Oppenheimer molecular dynamics
• EMIL: a new absolute free energy method using TI
• New Free Energy Workflow (FEW) tool automates free energy calculations (LIE, TI, and MM/PBSA-type calculations)
• QM/MM improvements – external interface to many QM packages including TeraChem (for GPU-accelerated QM/MM)
• Completely reorganized Reference Manual

Page 12: AMBER14 & GPUs

Amber v14: New Features

• Major updates to PMEMD GPU support
• Expanded methods for free energy calculations
• Soft-core TI supported in PMEMD
• Support for Intel MIC (experimental)
• Support for Intel MPI
• ScaledMD
• Improved aMD support
• Self-guided Langevin support

Page 13: AMBER14 & GPUs

GPU Accelerated MD

Page 14: AMBER14 & GPUs

The Project

Develop a GPU accelerated version of AMBER's PMEMD.

• San Diego Supercomputer Center – Ross C. Walker
• University of Florida – Adrian Roitberg
• NVIDIA (now AWS) – Scott Le Grand

Funded in part by the NSF SI2-SSE Program.

Page 15: AMBER14 & GPUs

Project Info

• AMBER website: http://ambermd.org/gpus/

Publications
1. Salomon-Ferrer, R.; Goetz, A. W.; Poole, D.; Le Grand, S.; Walker, R. C. "Routine microsecond molecular dynamics simulations with AMBER - Part II: Particle Mesh Ewald", J. Chem. Theory Comput., in press, 2013, DOI: 10.1021/ct400314y
2. Goetz, A. W.; Williamson, M. J.; Xu, D.; Poole, D.; Le Grand, S.; Walker, R. C. "Routine microsecond molecular dynamics simulations with AMBER - Part I: Generalized Born", J. Chem. Theory Comput., 2012, 8 (5), pp 1542-1555, DOI: 10.1021/ct200909j
3. Pierce, L. C. T.; Salomon-Ferrer, R.; de Oliveira, C. A. F.; McCammon, J. A.; Walker, R. C. "Routine access to millisecond timescale events with accelerated molecular dynamics", J. Chem. Theory Comput., 2012, 8 (9), pp 2997-3002, DOI: 10.1021/ct300284c
4. Salomon-Ferrer, R.; Case, D. A.; Walker, R. C. "An overview of the Amber biomolecular simulation package", WIREs Comput. Mol. Sci., 2012, in press, DOI: 10.1002/wcms.1121
5. Le Grand, S.; Goetz, A. W.; Walker, R. C. "SPFP: Speed without compromise - a mixed precision model for GPU accelerated molecular dynamics simulations", Comput. Phys. Commun., 2013, 184, pp 374-380, DOI: 10.1016/j.cpc.2012.09.022

Page 16: AMBER14 & GPUs

Motivations

Page 17: AMBER14 & GPUs

Maximizing Impact

Supercomputers
• Require expensive low-latency interconnects
• Large node counts needed
• Vast power consumption
• Politics controls access

Desktops and small clusters
• Cheap, ubiquitous
• Easily maintained
• Simple to use

Page 18: AMBER14 & GPUs

Better Science?

• Bringing the tools the researcher needs into their own lab
  • Can we make a researcher's desktop look like a small cluster (remove the queue wait)?
  • Can we make MD truly interactive (real-time feedback / experimentation)?
  • Can we find a way for a researcher to cheaply increase the power of all their graduate students' workstations?
    • Without having to worry about available power (power cost)?
    • Without having to worry about applying for cluster time?
    • Without having to have a full-time 'student' to maintain the group's clusters?

• GPUs offer a possible cost-effective solution.

Page 19: AMBER14 & GPUs

Design Goals

Overriding design goal: sampling for the 99%

• Focus on < 2 million atoms
• Maximize single-workstation performance
• Focus on minimizing costs
  • Be able to use very cheap nodes
  • Both gaming and Tesla cards
• Transparent to the user (same input, same output)

[Figure: user tiers – the <0.0001%, the 1.0%, the 99.0%]

Page 20: AMBER14 & GPUs

Version History

• AMBER 10 – released Apr 2008
  • Implicit solvent GB GPU support released as a patch, Sept 2009
• AMBER 11 – released Apr 2010
  • Implicit and explicit solvent supported internally on a single GPU
  • Oct 2010 – Bugfix.9 doubled performance on a single GPU, added multi-GPU support
• AMBER 12 – released Apr 2012
  • Added umbrella sampling support, 1D-REMD, simulated annealing, aMD, IPS and extra points
  • Aug 2012 – Bugfix.9: new SPFP precision model, support for Kepler I, GPU-accelerated NMR restraints, improved performance
  • Jan 2013 – Bugfix.14: support for CUDA 5.0, Jarzynski on GPU, GBSA, Kepler II support

Page 21: AMBER14 & GPUs

New in AMBER 14 (GPU)

• ~20-30% performance improvement for single-GPU runs
• Peer-to-peer support for multi-GPU runs, providing enhanced multi-GPU scaling
• Hybrid bitwise-reproducible fixed-point precision model as standard (SPFP)
• Support for extra points in multi-GPU runs
• Jarzynski sampling
• GBSA support
• Support for off-diagonal modifications to VDW parameters
• Multi-dimensional replica exchange (temperature and Hamiltonian)
• Support for CUDA 5.0, 5.5 and 6.0
• Support for latest-generation GPUs
• Monte Carlo barostat support providing NPT performance equivalent to NVT
• ScaledMD support
• Improved accelerated MD (aMD) support
• Explicit solvent constant pH support
• NMR restraint support on multiple GPUs
• Improved error messages and checking
• Hydrogen mass repartitioning support (4 fs time steps)

Page 22: AMBER14 & GPUs

Reproducibility: Critical for Debugging Software and Hardware

• The SPFP precision model is bitwise reproducible.
• The same simulation from the same random seed gives the same result.
• This is used to validate hardware (misbehaving GPUs) in Exxact AMBER Certified machines (a small check is sketched below).
• It has successfully identified 2 GPU models with underlying hardware issues.
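As an illustration of how this property can be used (not from the slides; the file names and the parsing pattern are assumptions), one can extract the final Etot from two runs started from the same random seed and check that they match bitwise:

```python
# Minimal sketch (assumed workflow): compare the final total energy reported in
# two AMBER mdout files from runs that used the same random seed.  With the
# bitwise-reproducible SPFP model the two values should be identical; a
# mismatch points at a misbehaving GPU or a bad build.
import re

def final_etot(mdout_path):
    """Return the last 'Etot = ...' value found in an mdout file, as a string."""
    etot = None
    with open(mdout_path) as fh:
        for line in fh:
            m = re.search(r"\bEtot\s*=\s*(-?\d+\.\d+)", line)
            if m:
                etot = m.group(1)  # keep the string so the comparison is exact
    return etot

run_a = final_etot("run_seed1_a.mdout")  # hypothetical file names
run_b = final_etot("run_seed1_b.mdout")
print("reproducible" if run_a == run_b else "MISMATCH: suspect the GPU or build")
```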

Page 23: AMBER14 & GPUs

Reproducibility

Final energy after 10^6 MD steps (~45 min per run):

Run    Good GPU (Etot)    Bad GPU (Etot)
0.0    -58229.3134        -58229.3134
0.1    -58229.3134        -58227.1072
0.2    -58229.3134        -58229.3134
0.3    -58229.3134        -58218.9033
0.4    -58229.3134        -58217.2088
0.5    -58229.3134        -58229.3134
0.6    -58229.3134        -58228.3001
0.7    -58229.3134        -58229.3134
0.8    -58229.3134        -58229.3134
0.9    -58229.3134        -58231.6743
0.10   -58229.3134        -58229.3134

Page 24: AMBER14 & GPUs

Supported GPUs

Hardware version 3.0 / 3.5 (Kepler I / Kepler II)
• Tesla K20 / K20X / K40
• Tesla K10
• GTX Titan / GTX Titan Black
• GTX 770 / 780 / 780Ti
• GTX 670 / 680 / 690
• Quadro cards supporting SM 3.0* or 3.5*

Hardware version 2.0 (Fermi)
• Tesla M2090
• Tesla C2050 / C2070 / C2075 (and M variants)
• GTX 560 / 570 / 580 / 590
• GTX 465 / 470 / 480
• Quadro cards supporting SM 2.0*

Page 25: AMBER14 & GPUs

Supported System Sizes

[Table: supported system sizes for Explicit Solvent PME and for Implicit Solvent GB]

Page 26: AMBER14 & GPUs

Running on GPUs

• Details provided on: http://ambermd.org/gpus/

• Compile (assuming nvcc >= 5.0 is installed):
  • cd $AMBERHOME
  • ./configure -cuda gnu
  • make install
  • make test

• Running on a GPU:
  • Just replace the executable pmemd with pmemd.cuda
  • $AMBERHOME/bin/pmemd.cuda -O -i mdin ...

If GPUs are set to process-exclusive mode, pmemd just 'does the right thing'.

Page 27: AMBER14 & GPUs

Running on Multiple GPUs

• Compile (assuming nvcc >= 5.0 is installed):
  • cd $AMBERHOME
  • ./configure -cuda -mpi gnu
  • make install

• Running on multiple GPUs:
  • export CUDA_VISIBLE_DEVICES=0,1
  • mpirun -np 2 $AMBERHOME/bin/pmemd.cuda.MPI -O -i mdin ...

If the selected GPUs can 'talk' to each other via peer to peer, the code will use it automatically (look for "Peer to Peer : ENABLED" in mdout).

Page 28: AMBER14 & GPUs

Recommended Hardware Configurations

• DIY systems

• Exxact AMBER Certified Systems
  • Come with AMBER preinstalled
  • Tesla or GeForce – 3-year warranty
  • Free test drives available

• See the following page for updates:
  http://ambermd.org/gpus/recommended_hardware.htm#hardware

Page 29: AMBER14 & GPUs

DIY 2-GPU System

• Antec P280 Black ATX Mid Tower Case – http://tinyurl.com/lh4d96s – $109.99
• Corsair RM Series 1000 Watt ATX/EPS 80PLUS Gold-Certified Power Supply – http://tinyurl.com/l3lchqu – $184.99
• Intel i7-4820K LGA 2011 64 Technology CPU BX80633I74820K – http://tinyurl.com/l3tql8m – $324.99
• Corsair Vengeance 16GB (4x4GB) DDR3 1866 MHz (CMZ16GX3M4X1866C9) – http://tinyurl.com/losocqr – $182.99
• Asus DDR3 2400 Intel LGA 2011 Motherboard P9X79-E WS – http://tinyurl.com/kbqknvv – $460.00
• Seagate Barracuda 7200 3 TB 7200RPM Internal Bare Drive ST3000DM001 – http://tinyurl.com/m9nyqtk – $107.75
• 2x EVGA GeForce GTX780 SuperClocked 3GB GDDR5 384-bit – http://tinyurl.com/m5e6c43 – $509.99 each (or K20C or K40C)
• Intel Thermal Solution Air – http://tinyurl.com/mjwdp5e – $21.15

Total price: $2,411.84

Page 30: AMBER14 & GPUs

AMBER Certified Desktops

Joint program with Exxact Inc.
http://exxactcorp.com/index.php/solution/solu_list/65

• Custom solutions available
• Free test drives
• GSA compliant
• Sole-source compliant
• Peer-to-peer ready
• Turnkey AMBER GPU solutions

Contact Ross Walker ([email protected]) or Mike Chen ([email protected]) for more info.

Page 31: AMBER14 & GPUs

AMBER Exxact Certified Clusters

Contact Ross Walker ([email protected]) or Mike Chen ([email protected]) for more info.

Page 32: AMBER14 & GPUs

Performance: AMBER 14

Realistic benchmark suite available from: http://ambermd.org/gpus/benchmarks.htm

Page 33: AMBER14 & GPUs

DHFR (NVE) HMR 4 fs, 23,558 atoms – Performance (ns/day)

2xE5-2660v2 CPU (16 cores): 30.21
1X C2075: 81.26
2X C2075: 129.79
1X GTX 680: 184.67
2X GTX 680: 270.99
1X GTX 780: 251.43
2X GTX 780: 361.33
1X GTX 780Ti: 280.01
2X GTX 780Ti: 389.63
1X GTX Titan Black: 280.54
2X GTX Titan Black: 383.32
1X K20: 196.99
2X K20: 263.85
1X K40: 266.07
2X K40: 364.67

Page 34: AMBER14 & GPUs

DHFR (NPT) HMR 4 fs, 23,558 atoms – Performance (ns/day)

2xE5-2660v2 CPU (16 cores): 29.01
1X C2075: 79.49
2X C2075: 126.26
1X GTX 680: 179.45
2X GTX 680: 261.23
1X GTX 780: 239.14
2X GTX 780: 334.29
1X GTX 780Ti: 271.68
2X GTX 780Ti: 366.51
1X GTX Titan Black: 270.39
2X GTX Titan Black: 359.91
1X K20: 190.01
2X K20: 248.40
1X K40: 245.00
2X K40: 335.25

Page 35: AMBER14 & GPUs

DHFR (NVE) 2 fs, 23,558 atoms – Performance (ns/day)

2xE5-2660v2 CPU (16 cores): 15.81
1X C2075: 41.66
2X C2075: 66.84
1X GTX 680: 95.45
2X GTX 680: 142.37
1X GTX 780: 128.37
2X GTX 780: 181.96
1X GTX 780Ti: 146.46
2X GTX 780Ti: 204.13
1X GTX Titan Black: 146.12
2X GTX Titan Black: 200.63
1X K20: 102.67
2X K20: 137.67
1X K40: 138.33
2X K40: 189.41

Page 36: AMBER14 & GPUs

Cellulose (NVE) 2 fs, 408,609 atoms – Performance (ns/day)

2xE5-2660v2 CPU (16 cores): 0.85
1X C2075: 2.39
2X C2075: 3.46
1X GTX 680: 5.72
2X GTX 680: 8.19
1X GTX 780: 7.40
2X GTX 780: 10.47
1X GTX 780Ti: 9.59
2X GTX 780Ti: 13.64
1X GTX Titan Black: 9.60
2X GTX Titan Black: 13.36
1X K20: 6.73
2X K20: 8.80
1X K40: 9.46
2X K40: 13.13

Page 37: AMBER14 & GPUs

Cumulative Performance (the real differentiator)

> 1.1 microseconds per day on a $6,000 desktop.

CPUs are irrelevant to AMBER GPU performance: buy $100 CPUs. A < $6,000 workstation gets you > 1,100 ns/day aggregate performance for the DHFR NVE HMR benchmark.

Amber cumulative performance, DHFR (NVE) HMR 4 fs, 23,558 atoms (ns/day):
• 2xE5-2660v2 CPU (16 cores): 30.21
• 1X GTX Titan Black: 280.54
• 2X individual GTX Titan Black: 2 x 280.54
• 3X individual GTX Titan Black: 3 x 280.54
• 4X individual GTX Titan Black: 4 x 280.54

Page 38: AMBER14 & GPUs

Historical Single Node / Single GPU Performance (DHFR Production NVE 2 fs)

[Chart: PMEMD performance (ns/day), 2007-2014, GPU vs. CPU, with linear trend lines for each]

Page 39: AMBER14 & GPUs

Cost Comparison

4 simultaneous simulations, 23,000 atoms, 250 ns each, 5 days maximum time to solution.

Traditional cluster:
• Nodes required: 12
• Interconnect: QDR IB
• Time to complete simulations: 4.98 days
• Power consumption: 5.7 kW (681.3 kWh)
• System cost (per day): $96,800 ($88.40)
• Simulation cost: (681.3 * 0.18) + (88.40 * 4.98) = $562.87

Page 40: AMBER14 & GPUs

Cost Comparison

4 simultaneous simulations, 25,000 atoms, 250 ns each, 5 days maximum time to solution.

                               Traditional Cluster               GPU Workstation
Nodes required                 12                                1 (4 GPUs)
Interconnect                   QDR IB                            None
Time to complete simulations   4.98 days                         2.25 days
Power consumption              5.7 kW (681.3 kWh)                1.0 kW (54.0 kWh)
System cost (per day)          $96,800 ($88.40)                  $5,200 ($4.75)
Simulation cost                (681.3 * 0.18) + (88.40 * 4.98)   (54.0 * 0.18) + (4.75 * 2.25)
                               = $562.87                         = $20.41

> 25x cheaper AND the solution is obtained in less than half the time.

Page 41: AMBER14 & GPUs

Q&A

Page 42: AMBER14 & GPUs

Replica Exchange Molecular Dynamics in the Age of Heterogeneous Architectures.

The stuff dreams are made of…

Adrian E. Roitberg, Quantum Theory Project, Department of Chemistry, University of Florida

Page 43: AMBER14 & GPUs

• Conformational space has many local minima
• Barriers hinder rapid escape from high-energy minima
• How can we search and score all conformations?

[Schematic: energy vs. coordinate]

Most of chemistry, and certainly all of biology, happens at temperatures substantially above 0 K.

Page 44: AMBER14 & GPUs

Why is convergence so slow?

• We want the average of observable properties, so we need to include in that weighted average all of the important values the property might take.
• Structural averages can be slow to converge: you need to wait for the structure to change, and do it many, many times.
• Sampling an event only once does not tell you the correct probability (or weight) for that event.

What is the proper weight? Boltzmann's equation:

$$P_i = \frac{e^{-E_i / k_B T}}{\sum_j e^{-E_j / k_B T}}$$
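As a quick numerical illustration of this weighting (not from the slides; the energies are made up):

```python
# Minimal sketch: normalized Boltzmann weights P_i = exp(-E_i/kBT) / sum_j exp(-E_j/kBT).
import math

KB = 0.0019872041           # Boltzmann constant in kcal/(mol*K)
T = 300.0                   # temperature in K
energies = [0.0, 1.0, 2.5]  # hypothetical conformer energies, kcal/mol

boltz = [math.exp(-e / (KB * T)) for e in energies]
Z = sum(boltz)                     # partition function (the denominator)
probs = [w / Z for w in boltz]
print(probs)  # the low-energy conformers dominate the weighted average
```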

Page 45: AMBER14 & GPUs

We need to sample phase space for a biomolecule. Our tool of choice is molecular dynamics: loop over time steps, and at each step compute forces and energies.

Within regular MD methodology there is only one way to sample more conformational space: run longer.

The usual (old) way to port code to new architectures was to focus on bottlenecks. In MD, this meant the non-bonded interactions (vdW + electrostatics), which scale as O(N^2) (potentially O(N ln N)) and took ~90% of the time.

Page 46: AMBER14 & GPUs

With GPUs we can run MUCH faster, for lower cost. So what? It is still a microsecond every 2 days, which is not enough to sample conformational space, except locally.

So we have code that runs FAST on one or two GPUs, and we have the emergence of large machines (Keeneland, Blue Waters, Stampede, etc.) with many GPUs. Are there methods we can use to leverage these two components?

The simple way: run MANY independent simulations, each fast. The probability of something "interesting" happening increases linearly with the number of runs. Good, but not good enough.

Page 47: AMBER14 & GPUs

Can we run independent simulations, and have them 'talk' to each other and increase overall sampling? Yes! Can we do it according to the laws of statistical mechanics? Yes!

Page 48: AMBER14 & GPUs

Temperature Replica Exchange (Parallel Tempering)

[Diagram: replicas at 300 K, 325 K, 350 K, 375 K; '?' marks candidate exchanges between neighbors]

REMD: Hansmann, U., CPL 1997; Sugita & Okamoto, CPL 1999.

You do not EXCHANGE, you ask PERMISSION to exchange and then decide.

Page 49: AMBER14 & GPUs

Exchange acceptance probability:
$$\rho = \min\left(1,\ \exp\{(\beta_m - \beta_n)\,[E(q_i) - E(q_j)]\}\right)$$

Velocity rescaling for the swapped replicas:
$$v_{\mathrm{new}} = v_{\mathrm{old}}\,\sqrt{\frac{T_{\mathrm{new}}}{T_{\mathrm{old}}}}$$

Detailed balance:
$$P(X)\,w(X \to X') = P(X')\,w(X' \to X), \qquad P(X) = e^{-\beta E_X}$$

Impose the desired weighting (limiting distribution) on the exchange calculation.

[Diagram: replica ladder at 300 K, 325 K, 350 K, 375 K – Exchange?]

• Impose reversibility / detailed balance.
• Rewrite using only potential energies, by rescaling the kinetic energy of both replicas (a minimal sketch of the exchange test follows).
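A minimal sketch of an exchange attempt between two neighboring temperature replicas (illustrative only, not the AMBER implementation; units are kcal/mol and kelvin):

```python
# Metropolis test from the slide: rho = min(1, exp[(beta_m - beta_n)(E(q_i) - E(q_j))]),
# followed by velocity rescaling when the swap is accepted.
import math
import random

KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def attempt_exchange(T_m, E_i, T_n, E_j):
    """Return True if the configurations at temperatures T_m and T_n swap."""
    beta_m, beta_n = 1.0 / (KB * T_m), 1.0 / (KB * T_n)
    delta = (beta_m - beta_n) * (E_i - E_j)
    if delta >= 0.0:               # downhill swaps are always accepted
        return True
    return random.random() < math.exp(delta)

def rescale_velocities(velocities, T_old, T_new):
    """Rescale kinetic energy to the new bath: v_new = v_old * sqrt(T_new / T_old)."""
    scale = math.sqrt(T_new / T_old)
    return [v * scale for v in velocities]
```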

Page 50: AMBER14 & GPUs

Replica Exchange Methods = Treasure hunt + bartering

Only works when the replicas explore phase space ‘well’ and exchange configurations. The only gain comes from exchanging configurations.

Page 51: AMBER14 & GPUs

Large gains in sampling efficiency using REMD.

Perfect for heterogeneous architectures: you need FAST execution within a replica, and the exchanges involve very little data (i.e., communication overhead is effectively zero). Scaling is very close to perfect.

Important questions: How many replicas? Where do I put the replicas? How often do I try to exchange? How do I measure convergence and speedup?

"Exchange frequency in replica exchange molecular dynamics". Daniel Sindhikara, Yilin Meng and Adrian E. Roitberg. Journal of Chemical Physics, 128, 024103 (2008).

"Optimization of Umbrella Sampling Replica Exchange Molecular Dynamics by Replica Positioning". Sabri-Dashti, D.; Roitberg, A. E. Journal of Chemical Theory and Computation, 9, 4692 (2013).

Page 52: AMBER14 & GPUs

REMD does not have to be restricted to T-space!

H-REMD is a powerful technique that uses a different Hamiltonian for each replica (H1, H2, H3, H4).

Page 53: AMBER14 & GPUs

Within H-REMD, there is another 'split':

A. Replicas in 'space' == umbrella sampling: each replica has a different umbrella, and replicas exchange. You can also use different restraints, different restraint weights for NMR refinement, tighter restraints... Be creative.

B. Replicas in true Hamiltonian space: each replica is a different molecule, or has small changes in protonation, or different force fields. Imagine replicas where the rotational barriers (bottlenecks for sampling in polymers) are reduced from their regular values.

Page 54: AMBER14 & GPUs

$$\mathrm{p}K_a(\text{protein}) = \mathrm{p}K_a(\text{model}) + \frac{\Delta G(\text{protein-AH} \to \text{protein-A}^-) - \Delta G(\text{AH} \to \text{A}^-)}{2.303\,k_B T}$$

Application to a 'hard' pKa prediction: ASP 26 in thioredoxin, pKa = 7.5 (a regular ASP has a pKa of 4.0).
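As a rough numerical illustration of the cycle above (the free energies below are made up; only the formula comes from the slide):

```python
# Minimal sketch: pKa(protein) = pKa(model) + [dG_protein - dG_model] / (2.303 kB T)
KB = 0.0019872041  # Boltzmann constant in kcal/(mol*K)

def pka_protein(pka_model, dG_protein, dG_model, T=300.0):
    """Shift the model-compound pKa by the difference in deprotonation free energies (kcal/mol)."""
    return pka_model + (dG_protein - dG_model) / (2.303 * KB * T)

# Hypothetical numbers: a 4.8 kcal/mol extra penalty for deprotonation inside the
# protein shifts a regular ASP pKa of 4.0 up to roughly 7.5.
print(round(pka_protein(4.0, dG_protein=60.0, dG_model=55.2), 1))
```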

Page 55: AMBER14 & GPUs

Meng, Y.; Dashti, D. S.; Roitberg, A. E. Journal of Chemical Theory and Computation, 7, 2721-2727 (2011).

Page 56: AMBER14 & GPUs

REMD methods lend themselves to Lego-style mix and match. There is no restriction to use only one replica ladder; one could do multidimensional REMD:

• Multiscale REMD
• T-umbrella
• Umbrella-umbrella-umbrella
• Radius of gyration-T
• T-umbrella (QM/MM) (get enthalpy/entropy!)
• Accelerated MD

Page 57: AMBER14 & GPUs

Computational Studies of RNA Dynamics

"Multi-dimensional Replica Exchange Molecular Dynamics Yields a Converged Ensemble of an RNA Tetranucleotide". Bergonzo, C.; Henriksen, N.; Roe, D.; Swails, J.; Roitberg, A. E.; Cheatham III, T. Journal of Chemical Theory and Computation, 10, 492-499 (2014).

Page 58: AMBER14 & GPUs

Experimental rGACC ensemble

• NMR spectroscopy for single-stranded rGACC
• Major and minor A-form-like structures (NMR major: 70-80%; NMR minor: 20-30%; compared with an A-form reference)

Yildirim et al., J. Phys. Chem. B, 2011, 115, 9261.

Page 59: AMBER14 & GPUs

Complete Sampling is Difficult

• Conformational variability is high, even though the structure is small
• The populations of different structures (shown blue, green, red, and yellow) are still changing, even after > 5 µs of simulation

Henriksen et al., 2013, J. Phys. Chem. B, 117, 4914.

Niel Henriksen

Page 60: AMBER14 & GPUs

Enhanced Sampling using Multi-Dimensional Replica Exchange MD (M-REMD)

• Parallel simulations ("replicas") run at different temperatures (277 K - 400 K) and Hamiltonians (full dihedrals - 30% dihedrals)
• Exchanges are attempted at specific intervals
• 192 replicas, each run on its own GPU

Sugita, Y.; Kitao, A.; Okamoto, Y., 2000, J. Chem. Phys., 113, 6042. Henriksen et al., J. Phys. Chem. B, 2013, 117, 4014-27. Bergonzo et al., 2014, JCTC, 10, 492.

Page 61: AMBER14 & GPUs

Criteria for Convergence

• Independent runs with different starting structures sample the same conformations with the same probabilities
• The Kullback-Leibler divergence of the final probability distributions (of any simulation observable) between 2 runs reaches zero (see the sketch below)

Bergonzo et al., 2014, JCTC, 10, 492.

Cyan indicates a 95% confidence level, where slope and y-intercept ranges are denoted by α and β.
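A minimal sketch of the second criterion (illustrative only; the binning scheme and the smoothing constant are assumptions, not the protocol of the paper):

```python
# Kullback-Leibler divergence between probability distributions of the same
# observable taken from two independent runs; it approaches zero as the two
# ensembles converge to the same distribution.
import math

def normalized_histogram(samples, nbins, lo, hi):
    """Normalized histogram of an observable (e.g. an RMSD or a dihedral angle)."""
    counts = [0] * nbins
    width = (hi - lo) / nbins
    for x in samples:
        i = min(max(int((x - lo) / width), 0), nbins - 1)
        counts[i] += 1
    total = float(len(samples))
    return [c / total for c in counts]

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i p_i * ln(p_i / q_i), with eps to avoid log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0.0)

# p = normalized_histogram(run1_observable, 50, lo, hi)
# q = normalized_histogram(run2_observable, 50, lo, hi)
# convergence criterion: kl_divergence(p, q) -> 0
```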

Page 62: AMBER14 & GPUs

Top 3 clusters (populations: ff12, DAC)

• NMR Major: 18.1%, 31.7%
• NMR Minor: 24.6%, 23.4%
• Intercalated anti: 15.9%, 10.7%

Page 63: AMBER14 & GPUs

NMR Major, 1-syn*: 48.6%

*This structure is never sampled in Amber FF simulations.

Page 64: AMBER14 & GPUs

pH-Replica Exchange Molecular Dynamics in proteins using a discrete protonation method

Each replica runs MD at a different pH (talk to me if interested), with exchanges between replicas. One can titrate biomolecules very efficiently.
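As an illustration of what "titrating" means in practice (not the AMBER workflow; the per-pH fractions below are made up), the pKa can be read off by fitting a Hill / Henderson-Hasselbalch curve to the fraction of deprotonated states observed at each simulation pH:

```python
# Minimal sketch: fit fraction-deprotonated vs. pH data from the replicas to
# f(pH) = 1 / (1 + 10^(n * (pKa - pH))) and report the fitted pKa.
import numpy as np
from scipy.optimize import curve_fit

def hill(ph, pka, n):
    return 1.0 / (1.0 + 10.0 ** (n * (pka - ph)))

ph_values = np.array([2.0, 3.0, 4.0, 5.0])        # the replica pH ladder
frac_deprot = np.array([0.03, 0.24, 0.76, 0.97])  # hypothetical observed fractions

(pka_fit, n_fit), _ = curve_fit(hill, ph_values, frac_deprot, p0=[3.5, 1.0])
print(f"pKa ~ {pka_fit:.2f}, Hill coefficient ~ {n_fit:.2f}")
```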

Page 65: AMBER14 & GPUs

Acid-range HEWL Titration (constant pH MD, implicit solvent)

Residue    Calculated pKa    Exp. pKa
Glu 7      3.6               2.6
His 15     6.0               5.5
Asp 18     2.3               2.8
Glu 35     5.7               6.1
Asp 48     1.2               1.4
Asp 52     2.7               3.6
Asp 66     -18.1             1.2
Asp 87     2.5               2.2
Asp 101    3.7               4.5
Asp 119    2.3               3.5
RMSE       6.5               ------

(1) Webb, H.; Tynan-Connolly, B. M.; Lee, G. M.; Farrell, D.; O'Meara, F.; Sondergaard, C. R.; Teilum, K.; Hewage, C.; McIntosh, L. P.; Nielsen, J. E. Proteins 2011, 79, 685-702.

Page 66: AMBER14 & GPUs

pH Replica Exchange MD

[Diagram: replicas at pH = 2, 3, 4, 5; '?' marks candidate exchanges between neighboring pH values]

Itoh, S. G.; Damjanovic, A.; Brooks, B. R. Proteins 2011, 79, 3420-3436.

Page 67: AMBER14 & GPUs

Protonation State Sampling

• CpHMD: <RSS> = 2.33 × 10^-2
• pH-REMD: <RSS> = 3.60 × 10^-4

REMD wins, not by much.

Page 68: AMBER14 & GPUs

Protonation State Sampling

• CpHMD: <RSS> = 1.33 × 10^-1
• pH-REMD: <RSS> = 2.51 × 10^-4

REMD wins by a lot.

Page 69: AMBER14 & GPUs

Residue    CpHMD pKa    pH-REMD pKa    Experimental pKa
Glu 7      3.6          3.8            2.6
His 15     6.0          5.9            5.5
Asp 18     2.3          2.0            2.8
Glu 35     5.7          5.0            6.1
Asp 48     1.2          2.0            1.4
Asp 52     2.7          2.3            3.6
Asp 66     -18.1        1.5            1.2
Asp 87     2.5          2.7            2.2
Asp 101    3.7          3.6            4.5
Asp 119    2.3          2.4            3.5
RMSE       6.5          0.89           ------

Substantially better results with pH-REMD.

Page 70: AMBER14 & GPUs

Acknowledgements: David Case, Andy McCammon, John Mongan, Ross Walker, Tom Cheatham III, Dario Estrin, Marcelo Marti, Leo Boechi, Dr. Yilin Meng, Natali Di Russo, Dr. Jason Swails, Dr. Danial Sabri-Dashti

$ + computer time: NSF/Blue Waters PRAC, NSF SI2, Keeneland Director's allocation, Titan, NVIDIA

Page 71: AMBER14 & GPUs

Mass Repartitioning to Increase the MD Timestep
(double the fun, double the pleasure)

Chad Hopkins, Adrian Roitberg

Page 72: AMBER14 & GPUs

Background

• Work based on: "Improving Efficiency of Large Time-scale Molecular Dynamics Simulations of Hydrogen-rich Systems", Feenstra, Hess, Berendsen, JCC 1999
• That work recommended increasing the H mass by a factor of 4, repartitioning heavy-atom masses to keep the total system mass constant
• For technical reasons with the Amber code, we increase by a factor of 3
• Waters are not repartitioned (unneeded due to different constraint handling)

Page 73: AMBER14 & GPUs

Repartitioning Example

Original masses: H 1.008, C 12.01, N 14.01, O 16.00.

After repartitioning (each hydrogen is raised by 2 x 1.008, and the same amount is removed from the heavy atom it is bonded to):
• H: 1.008 → 3.024
• C with three bonded H: 12.01 → 5.962
• C with one bonded H: 12.01 → 9.994
• C with no bonded H: 12.01 → 12.01
• N with one bonded H: 14.01 → 11.99
• O with no bonded H: 16.00 → 16.00

(The bookkeeping is sketched below.)
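A minimal sketch of the bookkeeping behind this example (illustrative only; in practice the topology is edited with a tool such as ParmEd, and water molecules are skipped as noted earlier):

```python
# Hydrogen mass repartitioning by a factor of 3: each hydrogen gains 2 x 1.008,
# and the same mass is removed from the heavy atom it is bonded to, so the
# total mass of the molecule is unchanged.
H_FACTOR = 3.0

def repartition(masses, bonds, is_water):
    """masses: per-atom masses; bonds: (i, j) index pairs; is_water: per-atom flags."""
    new = list(masses)
    for i, j in bonds:
        # orient each bonded pair as (hydrogen, heavy atom)
        if masses[i] < 1.5 <= masses[j]:
            h, heavy = i, j
        elif masses[j] < 1.5 <= masses[i]:
            h, heavy = j, i
        else:
            continue
        if is_water[h] or is_water[heavy]:
            continue                          # waters keep their original masses
        delta = (H_FACTOR - 1.0) * masses[h]  # 2 x 1.008 = 2.016 per hydrogen
        new[h] += delta                       # 1.008 -> 3.024
        new[heavy] -= delta                   # e.g. 12.01 -> 9.994 for a C-H carbon
    return new
```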

Page 74: AMBER14 & GPUs

Small Peptide Sampling (capped 3-ALA), 10 x 20 (40) ns simulations

[Figures: middle phi/psi populations for the normal topology (2 fs timestep) and for the repartitioned topology (4 fs timestep)]

Average timings (Tesla M2070s):
• Normal topology, 2 fs timestep: 160 ns/day
• Repartitioned topology, 4 fs timestep: 305 ns/day

Page 75: AMBER14 & GPUs

Dynamics Test (peptide)

Free energy map generated from the MD trajectory for phi/psi near the N-terminus.

Counting hops (for equal numbers of frames in each case) across the barrier between the minima centered at (-60, 150) and (-60, -40):

               Norm 2 fs    Repart 2 fs    Repart 4 fs
Average hops   112          114            200

Page 76: AMBER14 & GPUs

Per-residue RMSDs (HEWL)

Page 77: AMBER14 & GPUs

C-N termini distance profile

Page 78: AMBER14 & GPUs

Free energy calculation with mass repartitioning

•  PMF from umbrella sampling for disulfide linker dihedral in MTS spin label used in EPR experiments

Page 79: AMBER14 & GPUs

Acknowledgements

San Diego Supercomputer Center, University of California San Diego, University of Florida

National Science Foundation – NSF SI2-SSE Program

NVIDIA Corporation – hardware + people

People: Scott Le Grand, Duncan Poole, Mark Berger, Sarah Tariq, Romelia Salomon, Andreas Goetz, Robin Betz, Ben Madej, Jason Swails, Chad Hopkins

Page 80: AMBER14 & GPUs

TEST DRIVE K40 GPU - WORLD'S FASTEST GPU

"I am very glad that NVIDIA has provided the GPU Test Drive. Both my professor and I appreciate the opportunity as it profoundly helped our research." – Mehran Maghoumi, Brock Univ., Canada

www.nvidia.com/GPUTestDrive

Page 81: AMBER14 & GPUs

SPECIAL OFFER FOR AMBER USERS: Buy 3 K40, get the 4th GPU free

Contact your HW vendor or NVIDIA representative for more details.

www.nvidia.com/GPUTestDrive

Page 82: AMBER14 & GPUs

UPCOMING GTC EXPRESS WEBINARS

• May 14: CUDA 6: Performance Overview
• May 20: Using GPUs to Accelerate Orthorectification, Atmospheric Correction, and Transformations for Big Data
• May 28: An Introduction to CUDA Programming
• June 3: The Next Steps for Folding@home

www.gputechconf.com/gtcexpress

Page 83: AMBER14 & GPUs

NVIDIA GLOBAL IMPACT AWARD

Recognizing groundbreaking work with GPUs in tackling key social and humanitarian problems.

• $150,000 annual award
• Categories include: disease research, automotive safety, weather prediction
• Submission deadline: Dec. 12, 2014
• Winner announced at GTC 2015

impact.nvidia.com