The GPU Supercomputer of CQSE
Workshop on GPU Supercomputing January 16, 2009
Ting-Wai Chiu (趙挺偉)
Department of Physics, and
Center for Quantum Science and Engineering
National Taiwan University
Graphics Processing Unit (GPU) Supercomputing
A graphics card (e.g., Nvidia GTX280) can deliver > 100 Gflops (sustained)
at a price below NT$12,000.
It gives a 10x–100x speedup compared with a single CPU.
[Figure: two GTX280 cards in one motherboard]
• This opens up a great opportunity for many scientific
and engineering problems (in CQSE) which require
an enormous amount of number-crunching power.
• Recall that in the past 50 years, each 10x jump in
computing power motivated new ways of computing,
which in turn led to many scientific breakthroughs.
T.W. Chiu, GPU Workshop, Jan 16, '09
Basic criteria for acquiring computing hardware
• What is its half-life?
(How long does it take before its worth drops to half of its value
at the time of purchase?)
• What scientific/educational impact can it produce within its half-life?
(Note that, in science, only the first result counts.)
• Are codes ready for production runs when the hardware is installed?
(Never buy any hardware before your code for production runs is ready!)
• Is the price/performance ratio optimal?
(Taking into account the power consumption and the air-conditioning.)
GPU Supercomputer of CQSE
• It consists of 16 units of Nvidia
Tesla S1070 (64 GPUs in total, 64 x 4 GB),
with 16 servers (32 quad-core CPUs in total,
16 x 32 GB).
• Peak performance is 64 Tflops
(50 times higher than that of any PC
cluster with the same price tag).
• With our GPU supercomputer, we can
tackle many large-scale computations
without using prohibitively expensive
supercomputers like IBM BlueGene.
• We have developed highly efficient CUDA
(Compute Unified Device Architecture)
codes for our computationally intensive
problems (quantum chromodynamics,
quantum spin systems, and astrophysics).
Projects for the GPU Supercomputer of CQSE
• Lattice QCD with Optimal Domain-Wall Fermion
(PI: Ting-Wai Chiu)
• Self-gravitating Gas Dynamics (PI: Tzihong Chiueh)
• Quantum Phase Transition in Strongly Correlated
Systems (PI: Ying-Jer Kao)
Lattice QCD with Optimal Domain-Wall Fermion
Quantum ChromoDynamics
The quantum field theory for the
strong interaction, e.g., the strong
nuclear force.
Formulating QCD on a 4-d space-time
lattice, so that numerical solutions of QCD
can be obtained, is called Lattice QCD. [K. Wilson, PRD, 1974]
Optimal Domain-Wall Fermion [T.W. Chiu, Phys. Rev. Lett., 90 (2003) 071601]
For computing the quark propagator in lattice QCD with ODWF,
Nvidia GTX280 (C1060) attains 120 Gflops, 85x faster than
Intel QuadCore CPU [email protected].
Nonperturbative Strong Interaction Physics
to be tackled with GPU Supercomputer
• To understand the QCD vacuum fluctuations, and their
role in color confinement and chiral symmetry breaking.
• To obtain the mass spectra of mesons and baryons,
their decay constants, and weak matrix elements.
• To obtain the mass spectra of exotic hadrons, e.g.,
hybrid mesons, 4-quark mesons, and pentaquark baryons.
This is the first large-scale (state-of-the-art) lattice QCD
computation with Tesla S1070, without using any
expensive supercomputers like IBM BlueGene.
Self-gravitating Gas Dynamics (PI: Tzihong Chiueh)
3D Navier-Stokes solver with Adaptive Mesh Refinement
(AMR) scheme. GTX280 (C1060) is 15x faster than
Intel QuadCore CPU [email protected]
[Figure: one slice of 3D turbulence, with AMR]
For details, see Justin Schive’s talk in the afternoon.
Cosmology problems to be tackled with
GPU Supercomputer
Highest resolution cosmology simulations to address the galaxy formation problem
Highest resolution MHD simulations to address the black-hole accretion problem
Highest resolution MHD simulations to address the star formation problem
Quantum Phase Transition in Strongly
Correlated Systems (PI: Ying-Jer Kao)
[Diagram labels: Computational Physics (simulation algorithms); Quantum Information Theory (entanglement); Science and Engineering; quantum many-body systems]
Entanglement: tensor networks
Simulation of quantum many-body problems on a classical computer is hard.
Use tensor networks (matrix/tensor product states) to reduce the number of degrees of freedom.
Possible solution for simulating frustrated systems.
GTX280 (C1060) attains 92 Gflops,
92x faster than Intel QuadCore CPU [email protected]
(For details, see Ying-Jer Kao’s talk)
Impacts of the CQSE GPU Supercomputer
• CQSE is playing the leading role in GPU
supercomputing in Taiwan, achieving
world-class contributions at the frontiers
of QCD, quantum spin systems, and
cosmology.
• CQSE will offer a graduate
course on CUDA
programming for science
and engineering students
at NTU in the next
semester.
• CQSE is designing state-of-the-art
CUDA codes for a wide range of physics
and engineering problems, which will lead
to exciting scientific discoveries and
cutting-edge technologies.
[Figure panels with GPU speedups: Quantum ChromoDynamics (85x), One slice of 3D Turbulence (15x), Quantum Spin System (92x)]
Simulating Lattice QCD with GPU supercomputer
Workshop on GPU Supercomputing January 16, 2009
Ting-Wai Chiu (趙挺偉)
Department of Physics, and
Center for Quantum Science and Engineering
National Taiwan University
Quantum Chromodynamics (QCD)
The quantum field theory for the strong interaction between
quarks and gluons.
Salient features:
Gauge group $SU(3)$: gluons have self-interactions.
Asymptotic freedom: $g(r) \to 0$ as $r \to 0$.
IR slavery: $g(r) \gtrsim 1$ as $r \gtrsim 10^{-15}$ m, hence quark/color confinement.
No exact analytic solutions.
Quarks
Quarks are spin-$\frac{1}{2}$ fermions carrying color,
and there are 6 species (flavors) of quarks:
$$\begin{pmatrix} u & c & t \\ d & s & b \end{pmatrix}$$
(shown three times on the slide, once per color).
Hadrons are color singlets composed of quarks:
$P = (uud)$, antisym. in color
$N = (udd)$, antisym. in color
Mesons: $u\bar{u}$, $u\bar{d}$, $d\bar{d}$
The nuclear force between nucleons emerges as
residual interactions of QCD.
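The action of QCD
$$S_{QCD} = \int d^4x \left\{ -\frac{1}{2} \mathrm{tr}\left( F_{\mu\nu} F^{\mu\nu} \right) + \sum_{f \in \text{flavors}} \bar\psi_f \left[ i\gamma^\mu \left( \partial_\mu + i g A_\mu \right) - m_f \right] \psi_f \right\}$$
$A_\mu = A_\mu^a T^a$, $\quad [T^a, T^b] = i f^{abc} T^c$, $\quad \mathrm{tr}(T^a T^b) = \delta^{ab}/2$
$F_{\mu\nu} = F_{\mu\nu}^a T^a$, $\quad F_{\mu\nu}^a = \partial_\mu A_\nu^a - \partial_\nu A_\mu^a - g f^{abc} A_\mu^b A_\nu^c$
$T^a$, $a = 1, \dots, 8$: generators of $SU(3)$, $3 \times 3$ Hermitian matrices.
Here the color and Dirac indices of quark fields are suppressed.
Explicitly, the $u$ quark field at $x = (t, x, y, z)$ is $u_{\alpha c}(x)$, with Dirac index $\alpha = 1, 2, 3, 4$ and a color index $c$ taking three values.
$$Z[J, \bar\eta, \eta] = \int dA\, d\bar\psi\, d\psi\, \exp\left\{ i S_{QCD} + i \int d^4x \left[ J_\mu^a A^{a\mu} + \bar\eta \psi + \bar\psi \eta \right] \right\}$$
Nobody can solve the ground state (vacuum) of QCD!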
The Challenge of QCD
At the hadronic scale, $g(r) \gtrsim 1$, perturbation theory is incapable of extracting any quantities from QCD, or of tackling the most interesting physics, namely, spontaneous chiral symmetry breaking and color confinement.
To extract any physical quantities from the first principles of QCD, one has to solve QCD nonperturbatively.
A viable nonperturbative formulation of QCD, i.e., Lattice QCD, was first proposed by K. G. Wilson in 1974.
But the problem of lattice fermions, and of formulating exact chiral symmetry on the lattice, was not resolved until 1992-98.
Basic notions of Lattice QCD
1. Perform Wick rotation: $t \to -i x_4$, then $\exp(iS) \to \exp(-S_E)$, and the expectation value of any observable $O$ is
$$\langle O \rangle = \frac{1}{Z} \int dA\, d\bar\psi\, d\psi\; O(A, \bar\psi, \psi)\, e^{-S_E}, \qquad Z = \int dA\, d\bar\psi\, d\psi\; e^{-S_E}$$
Recall that the divergences in QFT, which require regularization and renormalization, stem from the infinite number of d.o.f. and the proximity of field operators.
2. Discretize the space-time as a 4-d lattice of volume $L^4 = (Na)^4$ with lattice spacing $a$. Then the path integral in QFT becomes a well-defined multiple integral which can be evaluated via Monte Carlo:
$$\langle O \rangle = \frac{1}{Z} \int \prod_i dA_i \prod_j d\bar\psi_j \prod_k d\psi_k\; O(A, \bar\psi, \psi)\, e^{-S_E}$$
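Gluon fields on the Lattice
The color gluon fields $A_\mu(x)$ are defined on each link connecting $x$ and $x + a\hat\mu$, through the $SU(3)$ link variable
$$U_\mu(x) = \exp\left[ i a g A_\mu\left( x + \frac{a}{2}\hat\mu \right) \right]$$
[Figure: a plaquette with corners $x$, $x + a\hat\mu$, $x + a\hat\mu + a\hat\nu$, $x + a\hat\nu$]
Then the gluon action on the lattice can be written as
$$S_g[U] = \frac{6}{g^2} \sum_{\text{plaquette}} \left[ 1 - \frac{1}{3} \mathrm{Re}\, \mathrm{tr}(U_p) \right] \xrightarrow{a \to 0} \frac{1}{2} \int d^4x\; \mathrm{tr}\left[ F_{\mu\nu}(x) F_{\mu\nu}(x) \right]$$
where
$$U_p = U_\mu(x)\, U_\nu(x + a\hat\mu)\, U_\mu^\dagger(x + a\hat\nu)\, U_\nu^\dagger(x)$$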
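As a minimal sketch of the plaquette construction on this slide, the snippet below builds random $SU(3)$ link variables around a single plaquette and checks that the per-plaquette term $1 - \frac{1}{3}\mathrm{Re\,tr}(U_p)$ is unchanged by a local gauge transformation $U_\mu(x) \to g(x) U_\mu(x) g^\dagger(x + a\hat\mu)$. The helper names (`random_su3`, `plaquette`, `plaquette_action_term`) are hypothetical and are not taken from the CQSE codes; the random-matrix recipe (QR factorization followed by a phase fix) is an assumption made for illustration.

```python
import numpy as np

def random_su3(rng):
    """Random SU(3) matrix: QR-factorize a complex Gaussian matrix, then fix det = 1."""
    z = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
    q, _ = np.linalg.qr(z)                       # q is unitary
    return q / np.linalg.det(q) ** (1.0 / 3.0)   # divide out the phase so det = 1

def plaquette(u_mu_x, u_nu_xmu, u_mu_xnu, u_nu_x):
    """U_p = U_mu(x) U_nu(x+mu) U_mu(x+nu)^dagger U_nu(x)^dagger."""
    return u_mu_x @ u_nu_xmu @ u_mu_xnu.conj().T @ u_nu_x.conj().T

def plaquette_action_term(u_p):
    """Per-plaquette contribution 1 - (1/3) Re tr(U_p) to the gluon action."""
    return 1.0 - np.trace(u_p).real / 3.0

rng = np.random.default_rng(0)
# Links in the order U_mu(x), U_nu(x+mu), U_mu(x+nu), U_nu(x).
links = [random_su3(rng) for _ in range(4)]
# Gauge transformation matrices at the corners x, x+mu, x+nu, x+mu+nu.
g_x, g_xmu, g_xnu, g_xmunu = (random_su3(rng) for _ in range(4))
transformed = [
    g_x @ links[0] @ g_xmu.conj().T,       # U_mu(x)
    g_xmu @ links[1] @ g_xmunu.conj().T,   # U_nu(x+mu)
    g_xnu @ links[2] @ g_xmunu.conj().T,   # U_mu(x+nu)
    g_x @ links[3] @ g_xnu.conj().T,       # U_nu(x)
]
s_before = plaquette_action_term(plaquette(*links))
s_after = plaquette_action_term(plaquette(*transformed))
print(s_before, s_after)  # the two values agree up to rounding
```

With all four links set to the identity the term vanishes, which corresponds to the zero-field configuration.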
Lattice QCD
The QCD action
$$S = S_G(U) + \sum_{f \in \text{flavor}} \bar\psi_f D_f(U) \psi_f, \qquad f = u, d, s, c, b, t$$
where $S_G(U)$ is the action of the gluon fields, and
$$\bar\psi D(U) \psi = \bar\psi_{a \alpha x}\, D(U)_{a \alpha x,\, b \beta y}\, \psi_{b \beta y}$$
color index: $a, b = 1, 2, 3$
Dirac index: $\alpha, \beta = 1, 2, 3, 4$
site index: $x, y = 1, \dots, N_s$, $\quad N_s = N_x N_y N_z N_t$
For example, on the $16^3 \times 32$ lattice, $D$ is a complex matrix
of size 1,572,864 $\times$ 1,572,864.
$$\langle O \rangle = \frac{\int dU\, d\bar\psi\, d\psi\; O(U, \bar\psi, \psi)\, e^{-S}}{\int dU\, d\bar\psi\, d\psi\; e^{-S}} = \frac{\int dU\, \det(D)\; O(U, D^{-1})\, e^{-S_G}}{\int dU\, \det(D)\; e^{-S_G}}$$
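The matrix size quoted on this slide follows from multiplying the number of lattice sites by the 3 color and 4 Dirac components. A short sketch of that arithmetic (the function name `dirac_matrix_dim` is hypothetical):

```python
N_COLOR = 3   # color index a = 1, 2, 3
N_DIRAC = 4   # Dirac index alpha = 1, 2, 3, 4

def dirac_matrix_dim(nx, ny, nz, nt):
    """Row (= column) dimension of the lattice Dirac matrix D."""
    n_sites = nx * ny * nz * nt
    return N_COLOR * N_DIRAC * n_sites

dim = dirac_matrix_dim(16, 16, 16, 32)
print(dim)  # 1572864

# One single-precision complex vector of this length needs 8 bytes per entry.
vector_bytes = dim * 8
print(vector_bytes / 2**20, "MiB per single-precision vector")
```

A dense matrix of this dimension could not be stored, which is why only its action on a vector is ever computed.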
The Challenge of Lattice The Challenge of Lattice QQCCDD
So far, the lightest u/d quark cannot be put on the lattice.
To use ChPT to extrapolate lattice results to physical ones.
To have lattice volume large enough such that 1. m L
l To have attice spacing small enough such that 1.qm a
3
The lattice size should be at least .
The required computing power i
s around
To meet the above two conditions:
100 200
Petaflops !
HMC Algorithm for 2 flavor QCD
$$H = \frac{1}{2} \sum_l \mathrm{tr}(P_l^2) + S_G(U) + \phi^\dagger (D^\dagger D)^{-1} \phi$$
1. Initial gauge configuration $\{U_l\}$.
2. Generate $P_l^a$ with probability distribution $\exp[-(P_l^a)^2/2]$.
3. Generate $\eta$ with probability distribution $\exp(-\eta^\dagger \eta)$.
4. Fixing the pseudofermion field $\phi = D^\dagger \eta$.
Recall: $\exp[-\phi^\dagger (D^\dagger D)^{-1} \phi] = \exp[-\eta^\dagger \eta]$
5. Molecular dynamics (leap-frog/Omelyan integrator), the most expensive part of HMC:
$$\dot U_l = i P_l U_l, \qquad P_l = P_l^a T^a$$
$$\dot P_l^a = -D_l^a S_G(U) + \phi^\dagger (D^\dagger D)^{-1}\, D_l^a (D^\dagger D)\, (D^\dagger D)^{-1} \phi$$
6. Accept $\{U_l'\}$ with the probability $P_A = \min(1, \exp(-\Delta H))$, where $\Delta H = H(U', P') - H(U, P)$.
7. Go to 2.
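The structure of the HMC loop (momentum refresh, leap-frog molecular dynamics, Metropolis accept/reject) can be illustrated on a toy problem. The sketch below is not lattice QCD: it assumes a real vector `x` in place of the gauge links and a Gaussian toy action $S(x) = x \cdot x / 2$ in place of $S_G$ plus the pseudofermion term, so no pseudofermion heat-bath step is needed. All names and parameter values are hypothetical.

```python
import numpy as np

def action(x):
    """Toy action S(x) = x.x / 2 (stand-in for S_G plus the pseudofermion term)."""
    return 0.5 * np.dot(x, x)

def force(x):
    """Force -dS/dx for the toy action."""
    return -x

def leapfrog(x, p, step, n_steps):
    """Leap-frog integration of dx/dt = p, dp/dt = force(x)."""
    p = p + 0.5 * step * force(x)          # initial half kick
    for i in range(n_steps):
        x = x + step * p                   # drift
        if i < n_steps - 1:
            p = p + step * force(x)        # full kick between drifts
    p = p + 0.5 * step * force(x)          # final half kick
    return x, p

def hmc(n_traj, dim=10, step=0.2, n_steps=10, seed=0):
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    samples, accepted = [], 0
    for _ in range(n_traj):
        p = rng.normal(size=dim)                       # momenta ~ exp(-p^2/2)
        h_old = 0.5 * np.dot(p, p) + action(x)
        x_new, p_new = leapfrog(x, p, step, n_steps)   # molecular dynamics
        h_new = 0.5 * np.dot(p_new, p_new) + action(x_new)
        # Accept with probability min(1, exp(-Delta H)).
        if rng.random() < np.exp(min(0.0, h_old - h_new)):
            x, accepted = x_new, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_traj

samples, acceptance = hmc(2000)
print("acceptance rate:", acceptance)
print("<x^2> estimate (exact value is 1):", np.mean(samples**2))
```

In the real algorithm the force evaluation requires solving a linear system with $D^\dagger D$ at every molecular dynamics step, which is why that part dominates the cost.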
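Conjugate Gradient algorithm for $(D^\dagger D)$
Solve $Ax = b$, $\quad A = D^\dagger D$
$x_0 = 0$, $\quad r_0 = b - A x_0$, $\quad p_0 = r_0$
$$\alpha_k = \frac{(r_k, r_k)}{(p_k, A p_k)}$$
$$x_{k+1} = x_k + \alpha_k p_k$$
$$r_{k+1} = r_k - \alpha_k A p_k$$
If $|r_{k+1}| < \epsilon |b|$, then stop
$$\beta_k = \frac{(r_{k+1}, r_{k+1})}{(r_k, r_k)}$$
$$p_{k+1} = r_{k+1} + \beta_k p_k$$
The most time-consuming operations
are the matrix-vector multiplications $(D^\dagger D) p_k$.
GPU computes in single precision
much faster than in double precision!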
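The conjugate gradient iteration for $A = D^\dagger D$ can be sketched in NumPy as follows. The matrix `d` here is a small random, well-conditioned stand-in and not a lattice Dirac operator; `conjugate_gradient` and `apply_a` are hypothetical names. The operator is passed as a function so that only matrix-vector products are needed, as on the GPU.

```python
import numpy as np

def conjugate_gradient(apply_a, b, eps=1e-10, max_iter=1000):
    """Solve A x = b for Hermitian positive-definite A, given only v -> A v."""
    x = np.zeros_like(b)
    r = b - apply_a(x)
    p = r.copy()
    rr = np.vdot(r, r).real
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        ap = apply_a(p)                      # the expensive matrix-vector product
        alpha = rr / np.vdot(p, ap).real
        x = x + alpha * p
        r = r - alpha * ap
        rr_new = np.vdot(r, r).real
        if np.sqrt(rr_new) < eps * b_norm:   # stopping criterion |r| < eps |b|
            return x, k + 1
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return x, max_iter

rng = np.random.default_rng(1)
n = 50
# Assumed toy operator: identity plus a small random complex perturbation.
d = np.eye(n) + 0.02 * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
b = rng.normal(size=n) + 1j * rng.normal(size=n)
x, iterations = conjugate_gradient(lambda v: d.conj().T @ (d @ v), b)
print("iterations:", iterations)
print("relative residual:", np.linalg.norm(d.conj().T @ (d @ x) - b) / np.linalg.norm(b))
```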
CG algorithm with mixed precision
1. $r_k = b - A x_k$
2. If $|r_k| < \epsilon |b|$, then stop
3. Solve $A t_k = r_k$ in single precision to an accuracy $\epsilon' < 1$
4. $x_{k+1} = x_k + t_k$
5. Go to 1.
Proof of convergence:
Let $s_k = r_k - A t_k$, then $|s_k| < \epsilon' |r_k|$, and
$$|r_{k+1}| = |b - A x_{k+1}| = |b - A x_k - A t_k| = |r_k - A t_k| = |s_k| < \epsilon' |r_k|$$
(To recount the tuning of our CUDA kernel for CG with
mixed precision, see Kenji Ogawa’s talk)
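A minimal NumPy sketch of the mixed-precision scheme: the residual is computed in double precision, the correction is obtained from a loose single-precision CG solve, and the two are combined in double precision. The matrix `a` is a small random symmetric positive-definite stand-in for $D^\dagger D$, and the tolerances `eps` and `inner_eps` are assumed values chosen for illustration, not the ones used in the CQSE kernels.

```python
import numpy as np

def cg(a, b, eps, max_iter=500):
    """Plain CG in the dtype of a and b; stops when |r| < eps |b|."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = np.dot(r, r)
    b_norm = np.sqrt(rr)
    for _ in range(max_iter):
        ap = a @ p
        alpha = rr / np.dot(p, ap)
        x = x + alpha * p
        r = r - alpha * ap
        rr_new = np.dot(r, r)
        if np.sqrt(rr_new) < eps * b_norm:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

def mixed_precision_solve(a, b, eps=1e-12, inner_eps=1e-3, max_outer=50):
    a32 = a.astype(np.float32)                 # single-precision copy of A
    x = np.zeros_like(b)                       # solution kept in double precision
    for k in range(max_outer):
        r = b - a @ x                          # step 1: residual in double precision
        if np.linalg.norm(r) < eps * np.linalg.norm(b):      # step 2
            return x, k
        t = cg(a32, r.astype(np.float32), inner_eps)         # step 3: single precision
        x = x + t.astype(np.float64)           # step 4: update in double precision
    return x, max_outer

rng = np.random.default_rng(3)
n = 50
m = rng.normal(size=(n, n))
a = m.T @ m / n + np.eye(n)                    # symmetric positive definite
b = rng.normal(size=n)
x, outer_iterations = mixed_precision_solve(a, b)
print("outer iterations:", outer_iterations)
print("relative residual:", np.linalg.norm(b - a @ x) / np.linalg.norm(b))
```

Each outer iteration shrinks the residual by roughly the inner tolerance, so a few outer iterations reach double-precision accuracy while nearly all matrix-vector products run in single precision.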
First results of the QCD Vacuum from GPU
• The vacuum (ground state) of QCD consists of
various quantum fluctuations.
• These quantum fluctuations are the origin of many
interesting and important nonperturbative phenomena.
• In QCD, each gauge configuration possesses a
well-defined topological charge Q with integer value.
• Thus it is important to determine the topological
charge fluctuation in the QCD vacuum.
$$Z(\theta) = \sum_Q e^{i\theta Q} Z_Q, \quad Q = \dots, -1, 0, 1, \dots, \qquad Z_Q = \int [dA\, d\bar\psi\, d\psi]_Q\; e^{-S_E}$$
Quantum fluctuations in the QCD vacuum
[Figure, shown on two consecutive slides: panels labeled $Q_t = 0$ and $Q_t = 1$, on the $16^3 \times 32$ lattice. Labels: $t = 3.3 \times 10^{-24}$ sec; $x$: $a = 1.2 \times 10^{-16}$ m; $L^3 = (2 \times 10^{-15}\ \mathrm{m})^3$]
Outlook
• To clarify the nature of the QCD vacuum: whether it is more
instanton-like, or has a more complicated 2-dim or 3-dim
sheet-like structure. Namely, to test the (anti-)self-duality,
$$F_{\mu\nu} = \pm \tilde F_{\mu\nu} = \pm \frac{1}{2} \epsilon_{\mu\nu\lambda\sigma} F_{\lambda\sigma}$$
• To identify the QCD vacuum fluctuations which are the
most relevant to the mechanism of color confinement.
• GPU supercomputers will play the most important role in
simulating lattice QCD, which will unveil the nonperturbative
strong interaction physics from first principles.
Backup slides
How to avoid computing fermion determinant
The central problem of lattice QCD is to generate a set of
gauge configurations $\{C_1, \dots, C_N\}$ with probability
$$p(C_i) = \frac{1}{Z_{QCD}} \prod_f \det D_f(C_i)\; e^{-S_G(C_i)}$$
But the computation of $\det D_f(U)$ is too costly, since $D_f(U)$ is
a huge matrix.
Introduce pseudofermions $\phi_f$ which carry the same quantum
numbers as the quarks, but obey Bose statistics, i.e.
$$\det D_f = \left( \det D_f^\dagger D_f \right)^{1/2} = \int d\phi_f^\dagger\, d\phi_f\; \exp\left[ -\phi_f^\dagger (D_f^\dagger D_f)^{-1/2} \phi_f \right]$$
Then
$$Z = \int dU \prod_f \det D_f\; \exp[-S_G(U)] = \int dU \prod_f d\phi_f^\dagger\, d\phi_f\; \exp\left[ -S_G(U) - \sum_f \phi_f^\dagger (D_f^\dagger D_f)^{-1/2} \phi_f \right]$$
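HMC for QCD
For each link variable $U_\mu(x) = \exp(i A_\mu(x))$, i.e., $U_l = \exp(i A_l)$,
introduce conjugate momentum $P_l = \dot A_l$.
Then the Hamiltonian for HMC is
$$H = \frac{1}{2} \sum_l \mathrm{tr}(P_l^2) + S_G(U) + \sum_f \phi_f^\dagger (D_f^\dagger D_f)^{-1/2} \phi_f$$
$$Z = \int \prod_l dU_l\, dP_l \prod_f d\phi_f^\dagger\, d\phi_f\; e^{-H}$$
$P_l = P_l^a T^a$, $\quad A_l = A_l^a T^a$,
where $T^a$ are generators of the $SU(3)$ gauge group, satisfying
$$\mathrm{tr}(T^a T^b) = \delta_{ab}, \qquad \sum_{a=1}^{8} T^a_{ij} T^a_{mn} = \delta_{in} \delta_{mj} - \frac{1}{3} \delta_{ij} \delta_{mn}$$
$$\dot U_l = i P_l U_l, \qquad U_l U_l^\dagger = 1 \;\Rightarrow\; P_l^\dagger = P_l, \quad \mathrm{tr}(P_l) = 0$$
$$\dot H = \sum_l \left[ P_l^a \dot P_l^a + \frac{\partial S(U)}{\partial (U_l)_{ij}} (\dot U_l)_{ij} \right] = \sum_l P_l^a \left[ \dot P_l^a + i (T^a U_l)_{ij} \frac{\partial S(U)}{\partial (U_l)_{ij}} \right]$$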
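The two generator identities on this slide can be checked numerically. The sketch below assumes the generators are the Gell-Mann matrices divided by $\sqrt{2}$, which gives the normalization $\mathrm{tr}(T^a T^b) = \delta_{ab}$; that choice of basis is an assumption for illustration, and `gell_mann` is a hypothetical helper name.

```python
import numpy as np

def gell_mann():
    """The eight Gell-Mann matrices lambda^a as an array of shape (8, 3, 3)."""
    lam = np.zeros((8, 3, 3), dtype=complex)
    lam[0][0, 1] = lam[0][1, 0] = 1
    lam[1][0, 1], lam[1][1, 0] = -1j, 1j
    lam[2][0, 0], lam[2][1, 1] = 1, -1
    lam[3][0, 2] = lam[3][2, 0] = 1
    lam[4][0, 2], lam[4][2, 0] = -1j, 1j
    lam[5][1, 2] = lam[5][2, 1] = 1
    lam[6][1, 2], lam[6][2, 1] = -1j, 1j
    lam[7] = np.diag([1, 1, -2]) / np.sqrt(3)
    return lam

t = gell_mann() / np.sqrt(2)                      # assumed normalization T^a = lambda^a / sqrt(2)

gram = np.einsum('aij,bji->ab', t, t)             # tr(T^a T^b)
completeness = np.einsum('aij,amn->ijmn', t, t)   # sum_a T^a_ij T^a_mn

delta = np.eye(3)
expected = (np.einsum('in,mj->ijmn', delta, delta)
            - np.einsum('ij,mn->ijmn', delta, delta) / 3.0)

print(np.allclose(gram, np.eye(8)))               # orthonormality
print(np.allclose(completeness, expected))        # completeness relation
```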
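HMC for QCD (cont)
Impose $\dot H = 0$:
$$\dot P_l^a = -i (T^a U_l)_{ij} \frac{\partial S(U)}{\partial (U_l)_{ij}}$$
Define
$$D_l^a f(U) = i (T^a U_l)_{ij} \frac{\partial f(U)}{\partial (U_l)_{ij}}$$
then
$$\dot P_l^a = -D_l^a S(U)$$
HMC for 2 flavor QCD ($m_u = m_d$)
$$\prod_f \det D_f = \det D_u \det D_d = \det (D_u^\dagger D_u) = \int d\phi^\dagger\, d\phi\; \exp\left[ -\phi^\dagger (D_u^\dagger D_u)^{-1} \phi \right]$$
$$H = \frac{1}{2} \sum_l \mathrm{tr}(P_l^2) + S_G(U) + \phi^\dagger (D_u^\dagger D_u)^{-1} \phi$$
$$Z = \int \prod_l dU_l\, dP_l\; d\phi^\dagger\, d\phi\; \exp(-H)$$
With $D \equiv D_u$,
$$\dot P_l^a = -D_l^a S(U) = -D_l^a S_G(U) - \phi^\dagger\, D_l^a (D^\dagger D)^{-1}\, \phi = -D_l^a S_G(U) + \phi^\dagger (D^\dagger D)^{-1}\, D_l^a (D^\dagger D)\, (D^\dagger D)^{-1} \phi$$
Note that $\exp[-\phi^\dagger (D_u^\dagger D_u)^{-1} \phi] = \exp[-\eta^\dagger \eta]$, with $\phi = D_u^\dagger \eta$.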
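The identity in the closing note, which lets the pseudofermion field be drawn as $\phi = D^\dagger \eta$ from Gaussian noise $\eta$, can be checked numerically. The sketch below uses a small random, well-conditioned complex matrix as a stand-in for $D_u$; the function name `pseudofermion_check` and the matrix size are hypothetical.

```python
import numpy as np

def pseudofermion_check(n=40, seed=2):
    """Return (phi^dagger (D^dagger D)^{-1} phi, eta^dagger eta) for phi = D^dagger eta."""
    rng = np.random.default_rng(seed)
    # Assumed toy operator: identity plus a small random complex perturbation.
    d = np.eye(n) + 0.02 * (rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    # Complex Gaussian noise with weight exp(-eta^dagger eta):
    # real and imaginary parts each have variance 1/2.
    eta = (rng.normal(size=n) + 1j * rng.normal(size=n)) / np.sqrt(2.0)
    phi = d.conj().T @ eta
    lhs = np.vdot(phi, np.linalg.solve(d.conj().T @ d, phi)).real
    rhs = np.vdot(eta, eta).real
    return lhs, rhs

lhs, rhs = pseudofermion_check()
print(lhs, rhs)  # the two values agree up to rounding
```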