www.DLR.de • Chart 1 > MC-CARP-CG > J. Thies et al. • > PMAA'14
A Hybrid Parallel Iterative Solver forInde�nite Systems in Interior EigenvalueComputations
Jonas Thies1 Lukas Krämer2 and Andreas Pieper3
PMAA, Lugano, July 4, 2014
1German Aerospace Center (DLR)Simulation and Software [email protected]
2University of WuppertalApplied Computer [email protected]
3University of GreifswaldTheoretical [email protected]
www.DLR.de • Chart 2 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Outline
Graphene simulation and the FEAST method
The CGMN algorithm
Parallelization
Experiments
www.DLR.de • Chart 3 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Graphene simulation and the FEASTmethod
www.DLR.de • Chart 4 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Graphene
a1
a
a2
Physical space: carbon atoms in2D hexagonal mesh
Fourier space (`reciprocal mesh')
Tight-binding Hamiltonian
H = −t∑〈ij〉
(c†i cj + c†j ci )
www.DLR.de • Chart 5 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Graphene (2)
−3 −2 −1 0 1 2 3−2
−1 0
1 2
−3
−2
−1
0
1
2
3
E /
t
kx / a
ky / a
E /
t
KK’
K
−0.1 0
0.1−0.1
0
0.1−0.2
0
0.2
E /
t
(kx − Kx) / a(ky − Ky) / a
E /
t Analytical solution for in�nite Graphene sheet
Dirac cones: graphene between conducter and semi-conducter
www.DLR.de • Chart 6 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Graphene modelling
disorder
long range stencil
bilayer
gate-de�ned quantum dots
spin-orbit coupling
...
Long range Hamiltonian:
H =∑
i
Vic†i ci−t
∑〈ij〉
(c†i cj + c†j ci )−t
′∑〈〈ij〉〉
(c†i cj + c†j ci )−t
′′∑
〈〈〈ij〉〉〉
(c†i cj + c†j ci )
www.DLR.de • Chart 7 > MC-CARP-CG > J. Thies et al. • > PMAA'14
The FEAST algorithm at a glance
Input: Iλ :=[λ, λ
], an estimate m̃ of the number of eigenvalues in Iλ.
Output m̂ ≤ m̃ eigenpairs with eigenvalue in Iλ.Perform:
1 Choose Y ∈ Cn×m̃ of full rank and computeU :=
1
2πi
∫C(zB− A)−1Bdz Y,
2 Form AU := U?AU, BU := U
?BU,
3 Solve size-m̃ eigenproblem AUW̃ = BUW̃�̃,
4 Compute (�̃, X̃ := U · W̃),5 If no convergence: go to Step 1 with Y := X̃.
www.DLR.de • Chart 8 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Linear systems for FEAST/graphene
Tough:
very large (N = 108 − 1014) complex symmetric and completely inde�nite
random numbers on and around the diagonal
spectrum essentially continuous
shifts get very close to the spectrum
But also nice in some ways:
2D mesh, very sparse (∼ 10 entries/row) multiple RHS/shift (block methods, recycling, ...)
We need O(100) Eigenpairs =⇒ very coputationally heavy...
www.DLR.de • Chart 9 > MC-CARP-CG > J. Thies et al. • > PMAA'14
The CGMN algorithm
www.DLR.de • Chart 10 > MC-CARP-CG > J. Thies et al. • > PMAA'14
An ancient row projection method
Björck and Elfving, 1979
CG on the `minimum norm' problem, AAT x = b
preconditioned by SSOR
e�cient row-wise formulation
extremely robust: A may be singular, non-square etc.
row scaling aleviates issue of `squared condition number'
www.DLR.de • Chart 11 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Kernel operation: KACZ sweep
Interpretations:
Kaczmarz algorithm
SOR(ω) on the normal equations AAT x = b
successive projections onto the hyperplanesde�ned by the rows of AIn CRS (rptr,val,col):
1: compute nrms=||ai,:||222: for (i=0; i
www.DLR.de • Chart 12 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Parallelization
www.DLR.de • Chart 13 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Multi-Coloring (MC)
requires �distance 2� coloring
software: ColPackhttp://cscapes.cs.purdue.edu/coloringpage/software.htm
http://cscapes.cs.purdue.edu/coloringpage/software.htm
www.DLR.de • Chart 14 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Component-Averaged Row Projection (CARP)
Gordon & Gordon, 2005
Kaczmarz locally
write to halo
exchange and average
Equivalent to Kaczmarz on a superspace of Rn
www.DLR.de • Chart 15 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Hybrid method: MC_CARP-CG
global MC would require...
an extremely scalable coloring method very well-balanced colors many global sync-points (> 20 colors in our examples)
global CARP would require...
huge number of MPI procs increasing amount of `interior halo elements' non-trivial implemention on GPU and Xeon Phi increasing number of iterations
Idea: node-local MC with MPI-based CARP between the nodes
www.DLR.de • Chart 16 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Experiments
www.DLR.de • Chart 17 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Experimental setup
Machine: Intel Xeon �Ivy Bridge�
10 cores/socket, 2 sockets/node
In�niBand between nodes
Here's what we do:
pick some shifts that may occur in FEAST
handle 8 rhs at once (for good performance)
conv tol 10−12
solve linear systems using CGMN variants
www.DLR.de • Chart 18 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Sequential CGMN for various shifts
0
0.2
0.4
0.6
0.8
-1 -0.5 0 0.5 1
imag
part
real part
10−12
10−10
10−8
10−6
10−4
10−2
100
50 100 150 200 250 300 350 400
||r|| 2/||r
0|| 2
iteration
www.DLR.de • Chart 19 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Coloring vs. CARP: single socket (10242 dof)
0
100
200
300
400
500
1 2 3 4 5 6 7 8
iterations
chosen shift
MCLEXCA 2CA 4CA 8CA 10
0
200
400
600
800
1000
1200
1400
1 2 4 8 10
runtimeforallshifts
cores
CARP
Multi-Coloring
www.DLR.de • Chart 20 > MC-CARP-CG > J. Thies et al. • > PMAA'14
Weak scaling of Hybrid vs. CARP (40962 dof/node)
0
1000
2000
3000
4000
5000
6000
20 80 160 320
timefor�rstshift
cores
MC-CARP
CARP
www.DLR.de • Chart 21 > MC-CARP-CG > J. Thies et al. • > PMAA'14
The (almost) �nal slide
Graphene gives nice and challenging test cases for Lin. Alg.
FEAST requires fast linear solvers for indef. systems
row projection methods work very well here
hybrid is a natural choice here - and works
Future work:
integration in FEAST loop
stencil-based implementation
GPU and Xeon PhiAcknoledgement: DFG SPPEXA project ESSEX
(Equipping Sparse Solvers for the EXa-scale)
www.DLR.de • Chart 22 > MC-CARP-CG > J. Thies et al. • > PMAA'14
References
Pieper et. al.: E�ects of disorder and contacts on transport throughgraphene nanoribbonsPhys. Rev. B 88, 195409, (2013).
Polizzi: Density-Matrix-Based Algorithms for Solving EingenvalueProblems. Phys. Rev. B. 79 - 115112 (2009)
Björck & Elfving: Accelerated projection methods for computingpseudoinverse solutions of systems of linear equations.BIT 19, pages 145�163, 1979.
Gordon & Gordon: Component-averaged row projections: A robust,block-parallel scheme for sparse linear systems.SISC 27 (3), 2005, pages 1092�1117.
Top Related