
  • Brownian Dynamics Simulations with Hydrodynamic Interactions on GPUs

    Graham Lopez – Keeneland (http://keeneland.gatech.edu), 17 July 2014

  • Acknowledgements

    Brownian dynamics / SPME:
    Xing Liu and Edmond Chow, "Large-scale Hydrodynamic Brownian Simulations on
    Multicore and Manycore Architectures," 28th IEEE International Parallel &
    Distributed Processing Symposium (IPDPS), Phoenix, AZ, May 19-23, 2014.

    Mitch Horton

    Keeneland Project
    http://keeneland.gatech.edu

    Jeff Vetter

    Dick Glassbrook

  • What is Brownian dynamics?

    Simulate motion of molecules, macromolecules, and
    nanoparticles in a fluid environment

    Figure from http://en.wikipedia.org/wiki/Brownian_motion

    Solute molecules treated explicitly; fluid is treated implicitly as
    random forces

    Add hydrodynamic interactions to BD: long-range interactions
    of each particle on all other particles, mediated by the fluid
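    A sketch of where that coupling enters (the standard Ermak–McCammon update,
    written in notation of my own choosing, not taken from the slides): each BD step
    moves all particles at once via

        x(t+Δt) = x(t) + Δt · M(x) F(x) + sqrt(2 k_B T Δt) · B z,   where M = B Bᵀ

    Here M is the 3n x 3n mobility matrix coupling every particle pair through the
    fluid, F collects the deterministic forces, and z is a vector of independent
    standard normal numbers; the correlated random displacement B z is what makes
    hydrodynamic interactions costly.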

  • Matrix-free BD → PME

    BD with hydrodynamic interactions requires a 3n x 3n dense
    mobility matrix (n = number of particles) –
    only ~10k particles fit in 32GB (Liu, Chow; see the estimate below)

    Replace the mobility matrix with a particle-mesh Ewald (PME)
    summation –
    the same* PME used in MD for electrostatic interactions

    Liu, Chow also showed that matrix-free was faster for as few
    as 1k particles

    *MD/PME operates on scalar (electrostatic) fields; BD/PME involves a vector field of forces
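    A rough memory estimate (my own arithmetic, for illustration): at n = 10,000 the
    mobility matrix has (3n)² = 9×10⁸ double-precision entries ≈ 7 GB, and generating
    correlated Brownian displacements also needs a Cholesky-type factor of it (or an
    approximation to M^(1/2)) plus working storage, so a 32 GB node is already near
    its limit; memory grows as O(n²), so each doubling of n quadruples the requirement.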

  • Smooth Particle Mesh Ewald (SPME)

    Ewald summation (Paul Peter Ewald) is used to evaluate slowly
    converging, infinite sums (e.g. the Coulomb interaction in a crystal)

    Short-range interactions are summed in real space

    Long-range interactions are summed in reciprocal space

    Would like to use an FFT for the reciprocal-space part, but this is easier with a
    regular mesh: place particles on a regular mesh and use the FFT

    Use cardinal B-splines to spread the particle data from off-grid positions onto
    grid points, and then interpolate back to the original positions at the
    end of the algorithm
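    To illustrate the split with the classic electrostatic kernel (for BD the same
    idea is applied to the hydrodynamic mobility tensor rather than to 1/r):

        1/r = erfc(α r)/r + erf(α r)/r

    The first term decays rapidly and is summed directly over nearby pairs in real
    space; the second is smooth everywhere, so its Fourier series converges after a
    few reciprocal-space terms, and the splitting parameter α trades work between
    the two sums.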

  • SPME algorithm

    1. Construct 'P' matrix
    2. Spread particle forces
    3. FFT the interpolation grid
    4. Apply the influence function
    5. iFFT interpolation grid back to real space
    6. Interpolate velocities back to original particle positions
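    A minimal host-side sketch of steps 1–2 (my own illustration, not the authors'
    code, assuming order-4 splines): the nonzeros of P are cardinal B-spline weights,
    generated on the fly for each particle and used to spread its force onto the
    4x4x4 surrounding grid points.

    // spme_spread_sketch.cu -- host-only CUDA C++ illustration of spreading
    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    // Order-4 (cubic) cardinal B-spline, nonzero on (0,4)
    static double M4(double u) {
        if (u <= 0.0 || u >= 4.0) return 0.0;
        if (u < 1.0) return u * u * u / 6.0;
        if (u < 2.0) return (-3*u*u*u + 12*u*u - 12*u + 4) / 6.0;
        if (u < 3.0) return ( 3*u*u*u - 24*u*u + 60*u - 44) / 6.0;
        double v = 4.0 - u;  return v * v * v / 6.0;
    }

    // s: particle coordinates in grid units (3 per particle, 0 <= s < K)
    // f: particle forces (3 per particle)
    // grid: 3 force components per cell of a periodic K x K x K mesh
    void spread_forces(int n, int K, const double* s, const double* f,
                       std::vector<double>& grid) {            // size 3*K*K*K
        std::fill(grid.begin(), grid.end(), 0.0);
        for (int i = 0; i < n; ++i) {
            int base[3];  double w[3][4];
            for (int d = 0; d < 3; ++d) {
                base[d] = (int)std::floor(s[3*i + d]);
                double frac = s[3*i + d] - base[d];
                for (int a = 0; a < 4; ++a)                     // 4 nonzero weights
                    w[d][a] = M4(frac + a);                     // per dimension: these
            }                                                   // are particle i's
            for (int a = 0; a < 4; ++a)                         // entries of P
            for (int b = 0; b < 4; ++b)
            for (int c = 0; c < 4; ++c) {
                int kx = ((base[0] - a) % K + K) % K;           // periodic wrap-around
                int ky = ((base[1] - b) % K + K) % K;
                int kz = ((base[2] - c) % K + K) % K;
                double wabc = w[0][a] * w[1][b] * w[2][c];
                long cell = ((long)kx * K + ky) * K + kz;
                for (int d = 0; d < 3; ++d)
                    grid[3 * cell + d] += wabc * f[3*i + d];    // the '+=' is what
            }                                                   // needs atomics on GPU
        }
    }

    int main() {
        const int K = 8;
        double s[3] = {2.3, 4.7, 1.1}, f[3] = {1.0, 0.0, 0.0};  // one test particle
        std::vector<double> grid(3 * K * K * K);
        spread_forces(1, K, s, f, grid);
        double sum = 0.0;                       // B-spline weights sum to 1, so the
        for (std::size_t c = 0; c < grid.size(); c += 3)        // spread x-force is
            sum += grid[c];                                     // conserved
        std::printf("total spread x-force = %.6f\n", sum);
        return 0;
    }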

  • SPME GPU implementation

    1. Construct P – custom kernel

    2. Spread forces – SPMV (custom or CUSPARSE)

    3. FFT – CUFFT

    4. Influence function – custom kernel

    5. iFFT – CUFFT

    6. Interpolation – SPMV (custom or CUSPARSE)

    Note: Spread → grid = P * forces;

    Interpolation → vels = P^T * grid
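    A sketch of the two SPMVs as custom kernels (my own illustration; it assumes P's
    nonzeros are stored particle-by-particle, 4^3 = 64 grid cells and weights per
    particle, which is one natural layout and not necessarily the one used here; the
    kernel and array names are mine):

    // spmv_sketch.cu
    #include <cuda_runtime.h>

    // Spreading, grid = P * forces: one thread per particle scatters 64 weighted
    // contributions into the grid, so overlapping writes from nearby particles need
    // atomicAdd. (Native double-precision atomicAdd requires sm_60+; on Fermi/Kepler
    // it has to be emulated with atomicCAS -- one reason atomics are a gotcha here.)
    __global__ void spread(int n, const int* col, const double* w,
                           const double* force,   // 3 components per particle
                           double* grid)          // 3 components per grid cell
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        for (int j = 0; j < 64; ++j) {
            int    c  = col[64 * i + j];          // grid cell touched by particle i
            double wj = w[64 * i + j];            // its B-spline weight
            for (int d = 0; d < 3; ++d)
                atomicAdd(&grid[3 * c + d], wj * force[3 * i + d]);
        }
    }

    // Interpolation, vels = P^T * grid: one thread per particle gathers the same 64
    // cells. No atomics, but the loads are scattered (memory access patterns).
    __global__ void interpolate(int n, const int* col, const double* w,
                                const double* grid, double* vel)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        double v[3] = {0.0, 0.0, 0.0};
        for (int j = 0; j < 64; ++j) {
            int    c  = col[64 * i + j];
            double wj = w[64 * i + j];
            for (int d = 0; d < 3; ++d) v[d] += wj * grid[3 * c + d];
        }
        for (int d = 0; d < 3; ++d) vel[3 * i + d] = v[d];
    }

    Both would be launched with one thread per particle, e.g.
    spread<<<(n + 255) / 256, 256>>>(n, col, w, force, grid), after col and w have
    been filled by the P-construction kernel of step 1.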

  • SPME GPU implementation - gotchas

    1. Construct P

    a. register/memory usage

    b. device utilization

    2. Spread forces

    a. memory access patterns

    b. atomic operations

    3. FFT

    a. batched
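    The batching presumably covers the three vector components of the force grid
    (my assumption about the layout, as is the grid size below); a minimal CUFFT
    sketch of step 3 done as one batched call:

    // batched_fft_sketch.cu
    #include <cstdio>
    #include <cuda_runtime.h>
    #include <cufft.h>

    int main() {
        const int K = 64, nreal = K * K * K, ncplx = K * K * (K / 2 + 1);
        double* grid;                 // 3 real K^3 grids, one per force component
        cufftDoubleComplex* grid_k;   // 3 half-spectrum complex grids
        cudaMalloc(&grid,   sizeof(double) * 3 * nreal);
        cudaMalloc(&grid_k, sizeof(cufftDoubleComplex) * 3 * ncplx);
        cudaMemset(grid, 0, sizeof(double) * 3 * nreal);

        // One plan with batch = 3: a single call transforms all three components
        // instead of launching three separate K^3 FFTs.
        int n[3] = {K, K, K}, inembed[3] = {K, K, K}, onembed[3] = {K, K, K / 2 + 1};
        cufftHandle plan;
        cufftPlanMany(&plan, 3, n,
                      inembed, 1, nreal,    // input:  tightly packed real grids
                      onembed, 1, ncplx,    // output: tightly packed complex grids
                      CUFFT_D2Z, 3);
        cufftExecD2Z(plan, grid, grid_k);   // step 3 of the algorithm
        cudaDeviceSynchronize();
        std::printf("forward FFTs done\n");

        cufftDestroy(plan);
        cudaFree(grid);
        cudaFree(grid_k);
        return 0;
    }

    The inverse transforms of step 5 would use a matching CUFFT_Z2D plan and
    cufftExecZ2D.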

  • SPME GPU implementation - gotchas (continued)

    4. Influence function

    a. memory bound

    5. iFFT

    a. batched

    6. Interpolation (current bottleneck)

    a. memory access patterns

    b. atomic operations

    c. CUSPARSE formatting
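    A sketch of why step 4 is memory bound (my simplification: a scalar influence
    value g(k) per wavevector, whereas the hydrodynamic version applies a small
    k-dependent tensor across the three components; either way there are only a few
    flops per value moved):

    // influence_sketch.cu
    #include <cuda_runtime.h>
    #include <cufft.h>          // for cufftDoubleComplex

    // One thread per wavevector: load three complex grid values and one influence
    // value, multiply, store three complex values back. Almost no arithmetic per
    // byte of traffic, so the kernel runs at memory bandwidth.
    __global__ void apply_influence(cufftDoubleComplex* grid_k,  // 3 * nk values,
                                    const double* g,             // component-major
                                    int nk)                      // nk = K*K*(K/2+1)
    {
        int k = blockIdx.x * blockDim.x + threadIdx.x;
        if (k >= nk) return;
        double gk = g[k];
        for (int d = 0; d < 3; ++d) {
            cufftDoubleComplex v = grid_k[d * nk + k];
            v.x *= gk;
            v.y *= gk;
            grid_k[d * nk + k] = v;
        }
    }
    // launch: apply_influence<<<(nk + 255) / 256, 256>>>(grid_k, g, nk);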

  • SPME GPU performance

    [Figure: time (sec., 0.01–1, log scale) vs. number of particles (1,000–100,000)
    for CPU, GPU Fermi, GPU Kepler, CUSPARSE Fermi, and CUSPARSE Kepler]

  • SPME GPU performance

    2x Xeon X5660 – 12 cores / 12 threads; 1x M2090 Fermi OR 1x K20 Kepler
    2x Xeon X5680 – 12 cores / 24 threads; 1x Xeon Phi 7120 – native mode

  • SPME GPU performance

    2x Xeon X5660 – 12 cores / 12 threads; 1x M2090 Fermi OR 1x K20 Kepler
    2x Xeon X5680 – 12 cores / 24 threads; 2x Xeon Phi 7120 – hybrid algorithm

  • SPME GPU performance

    Why does the GPU version seem to struggle at larger input
    sizes?

    Look at the performance of the individual components of
    the SPME algorithm

  • SPME GPU performance

    [Figures: per-component timings of the SPME algorithm]

  • SPME GPU performance

    Interpolation is just an SPMV, so what about CUSPARSE?

    [Performance figures]