Rue Léon Gontran Damas, Fann Résidence, Bp 15 532 Dakar- Fann , Sénégal
W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison
description
Transcript of W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison
![Page 1: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/1.jpg)
Implementation of Density Functional Theory based Electronic Structure Codes on Advanced Computing
Architectures
W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison
![Page 2: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/2.jpg)
Cray-X1E• 1024 Multi-streaming vector processor (MSP)
– Each MSP has 2 MB of cache and a peak computation rate of 12.8 GF– 4 single-streaming processors (SSPs) form a node with 16 Gbytes of
shared memory– Memory is physically distributed on individual modules– all memory is directly addressable to and accessible by any MSP in the
system through the use of load and store instructions
![Page 3: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/3.jpg)
QuickTime™ and aTIFF (Uncompressed) decompressor
are needed to see this picture.
Cray XT3
• 5294 nodes• 4-processors per node connected through
hypertransport 2.4-GHz AMD Opteron processor and 2 GB of memory
• SDRAM memory controller and function of Northbridge is pulled onto the Opteron die. Memory latency reduced to <60 ns
• Interface off the chip is an open standard (HyperTransport)
Six Network LinksEach >3 GB/s x 2
(7.6 GB/sec Peak for each link)
6.5 GB/secSustained
2.17 GB/secSustained
![Page 4: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/4.jpg)
Applications• Porphyrin Functionalized
Nanotube • 1532 atoms• STO-3G basis set = 6380 basis
functions• Do the covalently attached
porphyrins undergo facile absorption of visible light and transferelectrons to the nanotube
• What type of efficiencies does one obtain
• Time for LDA energy + gradient using 300 processors = less than 1 hour
![Page 5: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/5.jpg)
LSDA &Multiple Scattering Theory (MST)
1
2{( )1 } 1 ( ')2
Im ( ) Tr
Im (
( , '; )
( , '; )
( , '
( )
r )) ;( ) T1
eff G
G
Vm
dn f
d f G
π
ε δ
ε ε μ
μ σ
ε
ε εε
ε
π
+ ∇ − = −
= −
= −
∫
∫
r
m
r r
r r
r r
r
r
r) )) )
)
)
h
%
No
Yes
Mix
in & out
Initial guess
nin(r) , min(r)
Calculate
Veff[n,m]in
Solve Schrodinger
Equation
Recalculate
nout(r) , mout(r)
nin(r) = nout(r) ?
min(r) =mout(r) ?
Calculate
Total Energy
Multiple Scattering Theory (MST)J. Korringa, Physica 13, 392, (1947)W. Kohn, N. Rostoker, PR, 94, 1111,(1954)
MST Green function methodsB. Gyorffy, and M. J. Stott, “Band Structure
Spectroscopy of Metals and Alloys”, Ed. D.J. Fabian and L. M. Watson (Academic 1972)
S.J. Faulkner and G.M. Stocks, PR B 21, 3222, (1980)
![Page 6: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/6.jpg)
Algorithm Design for future generation architectures
• More accurate• Spectral or pseudo-spectral accuracy
• Wider range of applicability
• Sparse representation• Memory requirements grow linearly
• Each processor can treat thousands of atoms• Make use of large number of processors
• Message-Passing• Each atom/node local message-passing is independent of the size of the system
• Time consuming step of model• Sparse linear solver
•Direct or preconditioned iterative approach
![Page 7: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/7.jpg)
Multiple Scattering Theory
nR
n′R
nr
nr
r
′r
n
n′
decay slowly with increasing distancecontain free-electron singularities
Generalization of t-matrix. Converts incoming wave at site n into outgoing wave at site m in the presence of all the other sites
€
M(ε) =
t0−1 −G01(ε) −G02 (ε) −G03 (ε)
−G10 (ε) t1−1 −G12 (ε) −G1m(ε)
−G20 (ε) −G21(ε) t2−1 −G2m(ε)
−G30 (ε) −G31(ε) −G32 (ε) tm−1
⎡
⎣
⎢ ⎢ ⎢ ⎢
⎤
⎦
⎥ ⎥ ⎥ ⎥
Multiple scattering theory• Green function
• Scattering path matrix
€
τnm(ε) =tn(ε)δnm+ tnn≠m
∑ (ε)G( r Rn −
r Rm)τ
nm(ε)
τnm(ε) =(Mnm(ε))−1
ML ′ Lnm(ε) =tL ′ L
−1δL ′ L δnm−GL ′ L ( r Rn −
r Rm)
€
Gnm( r r, r′ r ;ε) = ZL
n
L ′ L
∑ ( r rn;ε)τL ′ L
nm(ε)Z ′ Lm(
r′ rm;ε)−ZL
n( r r<;ε)J ′ L
m( r′ r>;ε)δL ′ L
€
Gnm(ε) =−14π
eiε 1/ 2
r Rn−
r Rm
r Rn −
r Rm
![Page 8: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/8.jpg)
Complex Energy Plane
The scattering properties at complex energy can be used to develop highly efficient real-space and k-space methods
Real ε
Im ε
εf
Semi-conductors and insulators could work well since they have no states at εf
Real ε
Im ε
εf
Scattering is local since there are no states near the bottom of the energy contourScattering is local since a large Im ε is equivalent to rising temperature which smears out the states
Near εf scattering is non-local (metal)
εf is the highest occupied electronic state in energy
![Page 9: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/9.jpg)
Multiple Scattering Theory
€
τ−1L ′ L (ε) =t
L , ′ L ,0
−1 (ε) - GL ′ L(R
ij,ε)
Depends on constituent V(r)Independent of lattice
Depends on ε and Underlying lattice structure
Representation is ideal for disorder systems since t-matrixis site diagonal !!Coherent Potential Approximation (CPA)Non-local CPA (based Dynamical Cluster Approximation toDynamical mean-field theory
![Page 10: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/10.jpg)
t-matrix• Solve for t-matrix inside
Voroni polyhedron– Using Calergo method
€
R l (r) =jl (κr)−κ s∫ 2V(s)R l (s) nl (κs) jl (κr)−jl (κs)nl (κr)[ ]ds
R l (r) =cl (r) jl (κr)−sl nl (κr)dcl (r)
dr=−κr2V(r)nl (κr) cl (r) jl (κr)−sl nl (κr)[ ]
dsl (r)dr
=−κr2V(r) jl (κr) cl (r) jl (κr)−sl nl (κr)[ ]
Equations solved for both regular and irregular solutionsSince each atom and is independent can solve in shared memory. Similar for the structure constants.
€
l
![Page 11: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/11.jpg)
Parallel Implementation
• Green’s function
• Scattering path matrix: real space
t : scattering from single site
G: structure constant matrix
M=[t-1()-G(Rnm,]
τ=M-1
• Once M is fixed increasing N does not
affect the local calculation of M-1
€
Gnm( r r, r′ r ;ε) = ZL
n
L ′ L
∑ ( r rn;ε)τL ′ L
nm(ε)Z ′ Lm(
r′ rm;ε)−ZL
n( r r<;ε)J ′ L
m( r′ r>;ε)δL ′ L
![Page 12: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/12.jpg)
Tight-Binding MST Representation
Tight Binding Multiple Scattering Theory
Embed a constant repulsive potential
Shifts the energy zero allowing for calculations at negative energy
Rapidly decaying interactions
Free electron singularities are not a problem
Sparse representation
€
G ,nms(ε)∝e−(Vs−ε)1/ 2 |
r R n−
r R m|
€
Gnm(ε)∝e−ε1/ 2
r R n−
r R m
r Rn −
r Rm
€
G =G0 +G0tG0 +G0tG0tG0 =G0 (I −tG0 )−1
G−1 =G0
−1 −tG r =G0 +G0t
rG0 +G0trG0t
rG0 =G0 (I −trG0 )−1
(G r )−1 =G0
−1 −tr → G0
−1 =(G r )−1 +tr
G−1 =(G r )−1 +tr −t → (G r )−1 −Δt
}
2 Ryd.
0Vr
Constant inside a sphere
€
r R ≤
r Rmt
€
r R >
r Rmt
![Page 13: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/13.jpg)
Screened Structure Constants
€
Gs (ε) =[I −tsGfree(ε)]−1
€
G( r k,ε) = Gm,s (ε)ei
r k•
r Rm
m
∑
• Screened Structure Constants Gs
on the left unscreened on the right– Screened structure constants
rapidly go to zero, whereas the free space structure constants have hardly changed
• Linear solve using m atom cluster that is less than the n atom system
• Easy to perform Fourier transform–K-space method
![Page 14: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/14.jpg)
Screened MST Methods
• Formulation produces a sparse matrix representation– 2-D case has tridiagonal structure with a few distant elements due to
periodicity– 3-D case has scattered elements
• Mainly due to mapping 3-D structure to a matrix (2-D)• A few elements due to periodic boundary conditions
• Require block diagonals of the inverse of τε matrix– Block diagonals represent the site τε matrix and are needed to
calculate the Green’s function for each atomic site• Sparse direct and preconditioned iterative methods are used to calculate
τii(ε)– SuperLU– Transpose free Quasi-Minimal Residual Method (TFQMR)
![Page 15: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/15.jpg)
Screened KKR Accuracy
fcc Cubcc Cubcc Mohcp Co
![Page 16: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/16.jpg)
Timing and Scaling of Scr-K KKR-CPA
![Page 17: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/17.jpg)
Conclusion
• Initial benchmarking of the Screened KKR method– SuperLU N1.8 for finding the inverse of the upper left block of τ– TFQMR with block Jacobi preconditioner N1.06 for finding the inverse of
the upper left block of τ• Extremely high sparsity (97%-99% zeros increases with increasing
system size)• Large number of atoms on a single processor• Real-space/Scr-KKR hybrid may provide the most efficient parallel
approach for new generation architectres• Single code contains
– LSMS, KKR-CPA, Scr-LSMS and Scr-KKR-CPA
![Page 18: W.A. Shelton, E. Aprá, G.I. Fann and R.J. Harrison](https://reader030.fdocuments.net/reader030/viewer/2022013012/568153db550346895dc1d3b4/html5/thumbnails/18.jpg)
Local Poisson EquationDiscontinuous Galerkin approach
€
σh ⋅τK∫ dx=- uh∇⋅τ
K∫ dx+ ˆ uKK
∫ ⋅nKτ ,ds ∀τ ∈ ( )K∑
σh ⋅∇νK∫ dx=- fνK∫ dx+ ˆ σ KK∫ ⋅nKν ,ds ∀ν ∈ P(K )
€
−Δu=f inΩ
€
∇u=σ∇⋅σ =f
€
τ =tensor basis in d-dimension
ν =basis
uh =is ,the representation of u on an element
same forσh
√ uh =