A robust multilevel approximate inverse preconditioner for...
PhD SCHOOL – CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES
XXX CYCLE
A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices
ANDREA FRANCESCHINI, VICTOR MAGRI, MASSIMILIANO FERRONATO AND CARLO JANNA
February 16th, 2017
DICEA – DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING
Algebraic preconditioner
Our main goal is solving real-world problems arising from the discretization of PDEs in various fields of application.
Algebraic preconditioners are robust tools that can be used knowing the coefficient matrix only, regardless of the specific problem addressed:
Incomplete factorizations, algebraic multigrid, sparse approximate inverses (FSAI)
Introduction
Factorized Sparse Approximate Inverse (FSAI): an almost perfectly parallel factored preconditioner [Kolotilina and Yeremin, 1993]:
$$A^{-1} \simeq F^T F$$
with $F$ a lower triangular matrix such that
$$\min_F \| I - F L \|_F$$
over the set of matrices with a prescribed lower triangular sparsity pattern $S_L$, where $L$ is the exact Cholesky factor of $A$. Note that $L$ is not actually required for computing $F$!
Computed via the solution of n independent small dense systems and applied via matrix-vector products
Nice features: (1) ideally perfect parallel construction and application of the preconditioner; (2) preservation of the positive definiteness of the native matrix
FSAI preconditioner for SPD linear systems
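The row-by-row construction just described can be sketched in dense NumPy form. This is a toy illustration under our own naming (`fsai`, dense arrays), not the parallel FSAIPACK implementation: each row of $F$ comes from one small dense SPD solve over the pattern indices, scaled so that $\operatorname{diag}(FAF^T) = I$.

```python
import numpy as np

def fsai(A, pattern):
    """Factorized sparse approximate inverse: lower triangular F with
    A^-1 ~= F^T F, minimizing ||I - F L||_F over the prescribed pattern
    (the exact Cholesky factor L is never formed).
    pattern[i] lists the allowed column indices j <= i of row i."""
    n = A.shape[0]
    F = np.zeros((n, n))
    for i in range(n):                          # rows are fully independent
        J = sorted(set(pattern[i]) | {i})       # diagonal always included
        # one small dense SPD solve per row: A[J,J] y = e_last
        y = np.linalg.solve(A[np.ix_(J, J)], np.eye(len(J))[:, -1])
        F[i, J] = y / np.sqrt(y[-1])            # scale so diag(F A F^T) = 1
    return F
```

With the full lower triangular pattern, $F$ coincides with the inverse of the exact Cholesky factor, so $F A F^T = I$; sparser patterns give the approximate inverse used as a preconditioner.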
One of the main difficulties stems from the selection of SL as an a priori sparsity pattern for F
Using small powers of A is a popular choice, but for difficult problems high powers may be needed and the preconditioner construction can become quite heavy
A more efficient option relies on selecting the pattern dynamically through an adaptive procedure which picks the "best" available positions for the non-zero coefficients
The procedure is based on the minimization of the Kaporin conditioning number $\kappa$ of an SPD matrix, defined as:
$$\kappa(A) = \frac{\operatorname{tr}(A)/n}{\det(A)^{1/n}}$$
It can be shown that this procedure, though expensive, has some optimality properties and gives satisfactory results for SPD systems.
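As a quick illustration (our own dense sketch, not part of the adaptive algorithm), the Kaporin number is the ratio of the arithmetic to the geometric mean of the eigenvalues, so $\kappa(A) \geq 1$ with equality only for multiples of the identity:

```python
import numpy as np

def kaporin(A):
    """Kaporin conditioning number: (tr(A)/n) / det(A)^(1/n).
    Equals arithmetic mean / geometric mean of the eigenvalues of A,
    so it is >= 1, with equality iff A is a multiple of the identity."""
    w = np.linalg.eigvalsh(A)              # A assumed SPD
    return w.mean() / np.exp(np.log(w).mean())
```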
The drawback of static FSAI is the a priori choice of the sparsity pattern; moreover, its setup cost increases with the third power of the preconditioner density.
Adaptive FSAI avoids the explicit choice of the pattern, but its cost increases with the fourth power of the preconditioner density.
FSAI drawbacks
Basic concepts
To improve the quality of the FSAI preconditioner, we borrow the scheme of the incomplete factorization…
To reduce the cost of the dynamic selection of the pattern (the set where we look for the best coefficients) in the adaptive FSAI algorithm, we developed two new approaches:
- Block tridiagonal FSAI
- Domain Decomposition FSAI: a particular case of the former, with n = 2. Blocks are defined as internal unknowns (1st block) and interface unknowns (2nd block) after a Domain Decomposition reordering.
Multilevel approaches for FSAI preconditioning
Block tridiagonal FSAI

The matrix has a block tridiagonal structure:
$$A = \begin{bmatrix} A_1 & B_1^T & & \\ B_1 & A_2 & B_2^T & \\ & \ddots & \ddots & \ddots \\ & & B_{n-1} & A_n \end{bmatrix}$$

Factorization in the LDL$^T$ form (Cholesky-like):
$$A = \begin{bmatrix} I & & & \\ B_1 S_1^{-1} & I & & \\ & \ddots & \ddots & \\ & & B_{n-1} S_{n-1}^{-1} & I \end{bmatrix}
\begin{bmatrix} S_1 & & \\ & \ddots & \\ & & S_n \end{bmatrix}
\begin{bmatrix} I & S_1^{-1} B_1^T & & \\ & I & \ddots & \\ & & \ddots & S_{n-1}^{-1} B_{n-1}^T \\ & & & I \end{bmatrix}$$

KEY ROLE: computation of the Schur complements
$$S_1 = A_1, \qquad S_i = A_i - B_{i-1} S_{i-1}^{-1} B_{i-1}^T, \quad i = 2, \ldots, n$$

The inverse of each $S_{i-1}$ is factorized with the adaptive FSAI, $S_{i-1}^{-1} \approx F_{i-1}^T F_{i-1}$, implemented in FSAIPACK [Janna et al., 2015].
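The Schur complement recursion can be sketched as follows. This is a dense toy version: `np.linalg.inv` stands in for the adaptive FSAI approximation $F_{i-1}^T F_{i-1}$, and the function name is ours.

```python
import numpy as np

def schur_chain(A_blocks, B_blocks):
    """Block tridiagonal Schur recursion:
    S_1 = A_1, S_i = A_i - B_{i-1} S_{i-1}^{-1} B_{i-1}^T.
    Here np.linalg.inv is a dense stand-in for the FSAI factors F^T F."""
    S = [A_blocks[0]]
    for Ai, Bi in zip(A_blocks[1:], B_blocks):
        Sinv = np.linalg.inv(S[-1])            # FSAI approximation in practice
        S.append(Ai - Bi @ Sinv @ Bi.T)
    return S
```

A quick sanity check uses the classical identity $\det(A) = \prod_i \det(S_i)$ for the exact recursion.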
Domain Decomposition FSAI
Following the previous idea, we split the matrix into a 2x2 block structure, where the two blocks have different sizes, using a domain decomposition technique. Note that this is a particular case of the Block Tridiagonal approach with 2 blocks.
In this case, we need to compute just one Schur complement.

$$A = \begin{bmatrix} A_1 & B_1^T \\ B_1 & A_2 \end{bmatrix} = \begin{bmatrix} A_1^{(1)} & & & B_1^{(1)T} \\ & \ddots & & \vdots \\ & & A_1^{(n)} & B_1^{(n)T} \\ B_1^{(1)} & \cdots & B_1^{(n)} & A_2 \end{bmatrix}$$

with the internal unknowns in the block diagonal (1,1) block and the interface unknowns in the (2,2) block.
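A minimal dense sketch of the single interface Schur complement, assuming the internal block is block diagonal so each subdomain contributes independently (embarrassingly parallel); the function name and data layout are illustrative:

```python
import numpy as np

def dd_schur(A1_blocks, B_blocks, A2):
    """Interface Schur complement
    S = A2 - sum_k B^(k) (A1^(k))^-1 (B^(k))^T.
    A1 is block diagonal, so every subdomain solve is independent.
    B_blocks[k] has shape (n_interface, n_k)."""
    S = A2.copy()
    for A1k, Bk in zip(A1_blocks, B_blocks):
        S -= Bk @ np.linalg.solve(A1k, Bk.T)   # one local solve per subdomain
    return S
```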
[Diagram: algorithm to compute and apply the preconditioner — parallel phases alternating with a sequential framework; the approximate Schur complement S can be indefinite, causing a BREAKDOWN]
Results
Solver: CG. Exit tolerance: $10^{-8}$. CPU: 1 core.
[Tables: iteration counts and CPU times]
Basic concepts
The previous implementation of the multilevel scheme is very prone to breakdowns: the Schur complement approximation is the difference of two SPD matrices and can be indefinite. This happens even in relatively well-conditioned problems.
The reason probably resides in the better approximation of the lower eigenvalues usually offered by ILU over FSAI.
These issues motivate the development of a multilevel preconditioner which is more closely connected to the FSAI methodology.
Robust Multilevel approaches for FSAI preconditioning
[Figure: comparison between the spectra of $A$, $LL^T$ and $G^{-1}G^{-T}$ for the bcsstk16 matrix from the University of Florida sparse matrix collection]
Schur complement computation

The matrix is subdivided into 4 blocks:
$$A = \begin{bmatrix} K & B^T \\ B & C \end{bmatrix}$$

The adaptive FSAI matrix $G$ of the (1,1) block $K$, with $G^T G \simeq K^{-1}$, is computed, so the first phase of preconditioning is done:
$$\begin{bmatrix} G & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} K & B^T \\ B & C \end{bmatrix} \begin{bmatrix} G^T & 0 \\ 0 & I \end{bmatrix} = \begin{bmatrix} GKG^T & GB^T \\ BG^T & C \end{bmatrix}$$

The adaptive block FSAI matrix $F \simeq -(GB^T)^T (GKG^T)^{-1}$, which decouples the diagonal blocks, is computed. With this, the second phase of preconditioning is also done and the Schur complement is defined:
$$\begin{bmatrix} I & 0 \\ F & I \end{bmatrix} \begin{bmatrix} GKG^T & GB^T \\ BG^T & C \end{bmatrix} \begin{bmatrix} I & F^T \\ 0 & I \end{bmatrix} = \begin{bmatrix} GKG^T & GKG^TF^T + GB^T \\ FGKG^T + BG^T & \tilde S \end{bmatrix}$$
with
$$\tilde S = C + FGB^T + BG^TF^T + FGKG^TF^T$$

The approximate Schur complement $\tilde S$ becomes the new matrix $A$ at the next level (multilevel recursion).
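The approximate Schur complement can be sketched in dense form (a toy illustration with our own function name). With the exact factors — $G$ the exact inverse Cholesky factor of $K$ and $F$ the exact decoupling block — $\tilde S$ reduces to the true Schur complement $C - BK^{-1}B^T$.

```python
import numpy as np

def mflr_schur(K, B, C, G, F):
    """Approximate Schur complement of the MFLR scheme:
    S~ = C + F G B^T + B G^T F^T + F (G K G^T) F^T."""
    GB = G @ B.T
    GKG = G @ K @ G.T
    return C + F @ GB + GB.T @ F.T + F @ GKG @ F.T
```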
[Diagram: algorithm to compute the MFLR preconditioner (Multilevel FSAI with Low-Rank corrections) — parallel phases within a sequential framework; here S is SPD, so NO BREAKDOWN occurs]
Main Schur complement property

The approximate Schur complement is an SPD matrix; indeed, with some algebra:
$$\tilde S = C + FGB^T + BG^TF^T + FGKG^TF^T = C - BK^{-1}B^T + W^T K W$$
where $W = K^{-1}B^T + G^T F^T$ and $C - BK^{-1}B^T$ is the real Schur complement (SPD). Since $W^T K W$ is symmetric positive semidefinite, $\tilde S$ is SPD as well.

[Figure: spectrum of the matrix $\tilde S - S$ for the bcsstk38 matrix from the University of Florida sparse matrix collection]
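The identity above is purely algebraic and holds for any $F$ and $G$, which can be checked numerically; the matrices below are arbitrary placeholders (with $K$ SPD), not data from the paper.

```python
import numpy as np

# Numerical check of S~ = C - B K^-1 B^T + W^T K W, W = K^-1 B^T + G^T F^T.
# K, B, C, G, F are random placeholder matrices; K is made SPD.
rng = np.random.default_rng(1)
n1, n2 = 4, 3
M = rng.standard_normal((n1, n1))
K = M @ M.T + n1 * np.eye(n1)                 # SPD (1,1) block
B = rng.standard_normal((n2, n1))
Cs = rng.standard_normal((n2, n2))
C = Cs @ Cs.T                                 # symmetric (2,2) block
G = rng.standard_normal((n1, n1))             # arbitrary "FSAI" factor
F = rng.standard_normal((n2, n1))             # arbitrary decoupling block

lhs = C + F @ G @ B.T + B @ G.T @ F.T + F @ G @ K @ G.T @ F.T
W = np.linalg.solve(K, B.T) + G.T @ F.T
rhs = C - B @ np.linalg.solve(K, B.T) + W.T @ K @ W
# W^T K W is PSD, so S~ stays SPD whenever C - B K^-1 B^T is SPD
assert np.allclose(lhs, rhs)
```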
Low-Rank corrections

To improve the preconditioner quality, two different low-rank corrections are developed:
- Descending Low-Rank: local improvement of the adaptive FSAI of the (1,1) block
- Ascending Low-Rank: global improvement of the adaptive FSAI of the Schur complement [Xi et al., 2016]

Both ideas are based on the following observations:
- An approximation $G$ for the matrix $A$, such that $H = I - GAG^T \to 0$, is available
- We want to improve $A^{-1}$ as $A^{-1} \simeq G^T G + G^T X G$
- The matrices $X$ and $H$ are linked by the simple relation $X = H \left[ I - H \right]^{-1}$
- To reduce ill-conditioning, the eigenvalues of $A$ we want to correct are those close to 0, i.e. the eigenvalues of $H$ close to 1; for these values, the eigenvalues of $X$ tend to infinity
- Hence, just a low-rank representation of the matrix $H$ is needed; moreover, the computation of the eigenvalues of the matrix $X$ (the correction) is straightforward
Low-Rank corrections

The matrix $H$ is symmetric, so we can write its eigen-decomposition (where $\Sigma$ is diagonal):
$$H = U \Sigma U^T$$

The matrix $X$ is linked to $H$, so we have:
$$X = U \Sigma (I - \Sigma)^{-1} U^T$$

For each eigenvalue $\sigma_H$ of $H$, the corresponding eigenvalue of $X$ is:
$$\sigma_X = \frac{\sigma_H}{1 - \sigma_H}$$

[Figure: eigenvalues of the correction on the inverse (matrix $X$) with respect to the eigenvalues of the direct correction (matrix $H$)]

A small number of eigenvalues gives rise to a large correction on the approximate inverse. Moreover, it can be shown that the Low-Rank corrections do not propagate along the levels.
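The rank-$k$ correction can be sketched as follows (a dense toy version with our own function name): keep the $k$ largest eigenpairs of $H = I - GAG^T$ and map each retained eigenvalue through $\sigma_H \mapsto \sigma_H/(1-\sigma_H)$.

```python
import numpy as np

def low_rank_corrected_inverse(A, G, k):
    """Rank-k corrected approximate inverse:
    A^-1 ~= G^T G + G^T X_k G, with X_k built from the k largest
    eigenpairs of H = I - G A G^T (eigenvalues of H near 1 correspond
    to eigenvalues of A near 0 and yield the largest corrections)."""
    H = np.eye(A.shape[0]) - G @ A @ G.T
    w, U = np.linalg.eigh(H)                 # ascending eigenvalues
    w, U = w[-k:], U[:, -k:]                 # keep the k largest
    X = (U * (w / (1.0 - w))) @ U.T          # U diag(sigma/(1-sigma)) U^T
    return G.T @ G + G.T @ X @ G
```

With $k = n$ and $G$ invertible, the correction is exact and the result equals $A^{-1}$; small $k$ recovers most of the improvement at a fraction of the cost.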
Low-Rank target

The target of the Ascending Low-Rank is to make the available factorized inverse more similar to the inverse of one of two matrices:
- the approximate Schur complement: $\tilde S = C + FGB^T + BG^TF^T + FGKG^TF^T$ (1)
- the real Schur complement: $S = C - BK^{-1}B^T$ (2)

An eigenvalue bound is available with either the approximate Schur complement (1) or the real Schur complement (2) as target; it can be shown that bound (1) is narrower than or equal to bound (2).
[Diagram: algorithm to compute the MFLR preconditioner (with Low-Rank Updates), showing the Descending and Ascending Low-Rank correction steps]
[Diagram: algorithm to apply the MFLR preconditioner (with Low-Rank Updates), showing the Ascending and Descending Low-Rank correction steps]
Results: test case
We extensively tested the Cube matrix, which has 190,581 rows and 7,531,389 non-zeroes.

Results for the Cube matrix: all parameters are fixed, except the number of levels.
Increasing the number of levels, the number of iterations needed to reach convergence always decreases, but the iteration time increases.

With Low-Rank corrections we can reduce both the number of iterations and the iteration time. The most promising technique is the Ascending Low-Rank.

Results for the Cube matrix: all parameters are fixed, except the Descending Low-Rank size.
Results for the Cube matrix: all parameters are fixed, except the Ascending Low-Rank size.
Results: test case
Results for the Cube matrix: Descending and Ascending Low-Rank combined.
Combining the Descending and Ascending Low-Rank corrections yields the best results.
Results: real-world problems
We tested some matrices from the University of Florida Sparse Matrix Collection.
Test matrices (from UF Sparse Matrix Collection)
Computational performance of the adaptive FSAI and MFLR preconditioners for the test matrices (different configurations)
Results: strong scalability
We tested strong scalability with a regular 300 x 300 x 300 Laplacian.
Strong scalability test for a regular 300³ Laplacian.
We used the MARCONI cluster at the CINECA Center for High Performance Computing, Bologna. It has 1,512 nodes; every node has 128 GB of RAM and is equipped with 2 Intel Xeon E5-2697v4 Broadwell processors at 2.3 GHz, with 36 cores per node.
Conclusions
The use of FSAI in the framework of a multilevel preconditioner may raise some difficulties, such as indefinite Schur complements. With the MFLR preconditioner, the Schur complements are guaranteed to be positive definite.
The multilevel FSAI preconditioner is further enhanced by introducing Low-Rank corrections at both a local and a global level, namely the Descending and Ascending Low-Rank corrections, respectively.
Some theoretical properties and eigenvalue bounds can be derived for the case with just two levels.
Further research
To explore various degrees of sparsity in matrix products to reduce the cost of the preconditioner set-up (matrix sparsification, matrix compression, etc.).
Thank you for your attention