The 2D Ising Model on GPU Clusters

I gave this talk today at the Spring Meeting 2010 of the DPG in Regensburg, in the division "Dynamics and Statistical Physics".

Transcript of The 2D Ising Model on GPU Clusters

The 2D Ising Model on GPU Clusters

Benjamin Block, University of Mainz, Institute for Physics

Thanks to: Tobias Preis, Peter Virnau

Overview

• GPUs: Optimized for massively parallel processing

• Previous work: GPU Accelerated Ising Model

• Architecture specific optimization

• GPU clusters are becoming established – a multi-GPU implementation is useful

T. Preis, P. Virnau, W. Paul, J. J. Schneider: GPU accelerated Monte Carlo simulation of the 2D and 3D Ising model, J. Comput. Phys. 228 (2009)

Ising Model (Ferromagnetism)

Lattice of spins: snapshots at T >> T_C, T ~ T_C, and T << T_C

Metropolis Monte Carlo

Perform successive spin flips!

Probability: Metropolis criterion
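As a reminder (the formula is not spelled out on the slide), a proposed flip of spin s_i in the nearest-neighbour Ising model with coupling J is accepted with the standard Metropolis probability

\[
  P_{\mathrm{acc}} = \min\!\left(1,\; e^{-\Delta E / k_B T}\right),
  \qquad
  \Delta E = 2 J\, s_i \sum_{j \in \mathrm{nn}(i)} s_j ,
\]

where the sum runs over the nearest neighbours of site i.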

Parallelization of Metropolis Updates

Idea: Update non-interacting domains in parallel

Checkerboard Update
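A minimal CUDA sketch of one checkerboard half-sweep is given below. This is an illustration of the scheme, not the authors' code; it assumes spins are stored as ±1 in a flat int array with periodic boundaries, N even, and one pre-generated uniform random number in [0,1) per updated site.

    __global__ void metropolis_sublattice(int *spins, const float *rnd,
                                          int N, int colour, float beta)
    {
        // One thread per lattice site of one checkerboard colour
        int idx = blockIdx.x * blockDim.x + threadIdx.x;
        if (idx >= N * N / 2) return;

        // Map the sub-lattice index to a full-lattice site of the given colour
        int row = (2 * idx) / N;
        int col = (2 * idx) % N + ((row + colour) & 1);

        int s     = spins[row * N + col];
        int up    = spins[((row + N - 1) % N) * N + col];
        int down  = spins[((row + 1) % N) * N + col];
        int left  = spins[row * N + (col + N - 1) % N];
        int right = spins[row * N + (col + 1) % N];

        // Energy change of flipping s (coupling J = 1, k_B = 1)
        float dE = 2.0f * s * (up + down + left + right);

        // Metropolis criterion: accept with probability min(1, exp(-beta * dE))
        if (rnd[idx] < expf(-beta * dE))
            spins[row * N + col] = -s;
    }

One full sweep launches this kernel once for colour 0 and once for colour 1; because sites of one colour interact only with sites of the other colour, all threads of a launch can update their spins independently.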

Programming the GPU

Slow global memory: store the spin lattice

Fast shared memory: use for local computations

Execute the same code for different data in parallel

Utilize different kinds of memory

Reduce slow memory access
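As an illustration of this memory hierarchy (an assumption-laden sketch, not the talk's kernel), a thread block can stage a tile of the lattice from slow global memory into fast shared memory once and then work on it there; blockDim is assumed to be 16×16 and N a multiple of 16.

    __global__ void stage_tile(const int *g_spins, int N)
    {
        __shared__ int tile[16][16];                  // fast, per-block shared memory

        int gx = blockIdx.x * 16 + threadIdx.x;       // global lattice coordinates
        int gy = blockIdx.y * 16 + threadIdx.y;

        tile[threadIdx.y][threadIdx.x] = g_spins[gy * N + gx];  // one slow global read per spin
        __syncthreads();                              // tile fully loaded

        // ... local computations on tile[][] run entirely in shared memory ...
    }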

Idea: Store 4×4 spin blocks in one unit of GPU memory

For each parallel thread:

Access 16 spins with one memory lookup

Perform local computations in (fast) shared memory
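One possible realization of "16 spins per memory lookup" (the exact encoding used in the talk is not given here, so this is an assumption) is to pack the 4×4 block into the low 16 bits of a single word, one bit per spin:

    // Bit 4*y + x holds spin (x, y) of the block: 1 = spin up (+1), 0 = spin down (-1).
    __host__ __device__ inline int get_spin(unsigned int block, int x, int y)
    {
        return ((block >> (4 * y + x)) & 1u) ? +1 : -1;
    }

    __host__ __device__ inline unsigned int flip_spin(unsigned int block, int x, int y)
    {
        return block ^ (1u << (4 * y + x));   // flipping a spin toggles its bit
    }

A single read of the word then delivers all 16 spins of the block; individual spins are extracted with cheap bit operations once the word sits in shared memory or registers.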

Update scheme in shared memory

Integer array in shared memory

Perform computations (draw random number, evaluate Metropolis criterion)

New spins = old spins XOR update pattern
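With the bit-packed representation sketched above, "new spins = old spins XOR update pattern" is a single bitwise operation: the update pattern has a 1 bit for every spin whose flip was accepted. A tiny self-contained illustration (values are illustrative only):

    #include <cstdio>

    int main()
    {
        unsigned int old_block = 0xF0F0u;  // some 4x4 spin configuration (low 16 bits used)
        unsigned int pattern   = 0x0018u;  // bits set where the Metropolis test accepted a flip
        unsigned int new_block = old_block ^ pattern;   // new spins = old spins XOR pattern

        printf("old = %04x  pattern = %04x  new = %04x\n", old_block, pattern, new_block);
        return 0;   // new_block differs from old_block exactly at the flipped bits
    }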

Performance measurement

How to measure performance? Single spin flips per time unit!

Fair comparison: heavily optimized CPU implementation

[Plot: spin flips per time unit for the previous CPU, optimized CPU, previous GPU, and optimized GPU implementations, with speedup factors of roughly 20x and 200x]
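The metric is attempted flips divided by wall-clock time; a small host-side helper for an N×N lattice (names are illustrative):

    // Spin flips per nanosecond for `sweeps` full Metropolis sweeps of an N x N lattice.
    double spin_flips_per_ns(long long N, long long sweeps, double elapsed_seconds)
    {
        double attempted_flips = (double)N * (double)N * (double)sweeps;  // one attempt per site per sweep
        return attempted_flips / (elapsed_seconds * 1e9);
    }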

Multi GPU communication

Distribute spin lattice among many GPUs

Border information has to be passed between GPUs after each complete update step
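A minimal sketch of such a border exchange, assuming a 1D decomposition into horizontal strips, one MPI rank per GPU, one int per spin, and two ghost rows per strip (function and variable names are illustrative, not the authors' code):

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <vector>

    // d_strip holds rows+2 rows of N ints: row 0 and row rows+1 are ghost rows.
    void exchange_borders(int *d_strip, int N, int rows, int rank, int nprocs)
    {
        std::vector<int> send_top(N), send_bottom(N), recv_top(N), recv_bottom(N);

        // Copy the first and last owned row from the GPU to the host
        cudaMemcpy(send_top.data(),    d_strip + 1 * N,    N * sizeof(int), cudaMemcpyDeviceToHost);
        cudaMemcpy(send_bottom.data(), d_strip + rows * N, N * sizeof(int), cudaMemcpyDeviceToHost);

        int up   = (rank + nprocs - 1) % nprocs;   // periodic boundaries across GPUs
        int down = (rank + 1) % nprocs;

        // Exchange border rows with the neighbouring ranks
        MPI_Sendrecv(send_top.data(),    N, MPI_INT, up,   0,
                     recv_bottom.data(), N, MPI_INT, down, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        MPI_Sendrecv(send_bottom.data(), N, MPI_INT, down, 1,
                     recv_top.data(),    N, MPI_INT, up,   1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

        // Write the received rows into the ghost rows
        cudaMemcpy(d_strip,                  recv_top.data(),    N * sizeof(int), cudaMemcpyHostToDevice);
        cudaMemcpy(d_strip + (rows + 1) * N, recv_bottom.data(), N * sizeof(int), cudaMemcpyHostToDevice);
    }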

Multi-GPU Performance

Measure: single spin flips per time unit, per GPU

Communication overhead

Bottleneck for small system sizes

Simulation on GPU Clusters

• On 64 GPUs: 256 GB video memory!

• A lattice of 800,000 × 800,000 spins could be processed.

• Processing the whole lattice on 64 GPUs: 3 seconds!
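A quick consistency check of these numbers (assuming the standard 4 GB of memory per Tesla S1070 GPU and, illustratively, one bit of storage per spin):

\[
  64 \times 4\,\mathrm{GB} = 256\,\mathrm{GB},
  \qquad
  (8 \times 10^{5})^{2} = 6.4 \times 10^{11}\ \text{spins},
  \qquad
  \frac{6.4 \times 10^{11}\ \text{bit}}{8} \approx 80\,\mathrm{GB},
\]

so such a lattice fits comfortably when spins are bit-packed, whereas one byte per spin (640 GB) would not fit.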

Tesla S1070 units at the NEC Nehalem Cluster Stuttgart: 128 GPUs

Conclusion

• Optimization is important on both CPU and GPU for a fair comparison

• The 2D Ising model is a good candidate for parallel processing on GPU clusters

• Submitted for publication in Computer Physics Communications

• Source code will be made available at www.tobiaspreis.de