ParILUT – A New Parallel Threshold ILU
(hanzt/talks/ParILUT.pdf)

Hartwig Anzt, Steinbuch Centre for Computing (SCC)
KIT – The Research University in the Helmholtz Association, www.kit.edu
Colloquium talk at the Faculty of Informatics, 05/07/2018
Motivation

We are looking for a factorization-based preconditioner such that $A \approx L \cdot U$ is a good approximation with moderate nonzero count (e.g. $\mathrm{nnz}(L + U) = \mathrm{nnz}(A)$).

• Where should these nonzero elements be located?
• How can we compute the preconditioner in a highly parallel fashion?

Sparsity pattern of $A$: $S(A) = \{(i, j) \in \mathbb{N}^2 : A_{ij} \neq 0\}$

Exact LU Factorization

• Decompose the system matrix into the product $A = L \cdot U$.
• Based on Gaussian elimination.
• Triangular solves to solve a system $Ax = b$: $Ly = b \Rightarrow y$, then $Ux = y \Rightarrow x$.
• De facto standard for solving dense problems.
• What about sparse? Often significant fill-in…

Incomplete LU Factorization (ILU)

• Focused on restricting fill-in to a specific sparsity pattern $S$.
• For ILU(0), $S$ is the sparsity pattern of $A$.
  • Works well for many problems.
  • Is this the best we can get for this nonzero count?
• Fill-in in threshold ILU (ILUT) is based on the significance of elements (e.g. magnitude).
  • Often better preconditioners than level-based ILU.
  • Difficult to parallelize.
An incomplete factorization on a sparsity pattern $S$ consists of
• $L \in \mathbb{R}^{n \times n}$, lower (unit-)triangular, sparse,
• $U \in \mathbb{R}^{n \times n}$, upper triangular, sparse,
with $L_{ij} = U_{ij} = 0 \;\; \forall (i, j) \notin S$, and the residual $R = A - L \cdot U$ satisfies $R_{ij} = 0 \;\; \forall (i, j) \in S$.
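The ILU(0) property above ($R_{ij} = 0$ on $S$, but not outside) can be checked on a small example. The following is a minimal sketch, assuming a dense list-of-lists storage and a hypothetical 4x4 test matrix that does not appear in the talk; it uses the classical sequential IKJ elimination, not the parallel algorithm discussed later.

```python
def ilu0(A):
    """Toy ILU(0): Gaussian elimination (IKJ order) that only updates
    entries inside the sparsity pattern S = S(A)."""
    n = len(A)
    S = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0}
    W = [row[:] for row in A]            # working copy holding L and U
    for i in range(1, n):
        for k in range(i):
            if (i, k) not in S:
                continue
            W[i][k] /= W[k][k]           # multiplier l_ik
            for j in range(k + 1, n):
                if (i, j) in S:          # fill-in outside S is dropped
                    W[i][j] -= W[i][k] * W[k][j]
    L = [[W[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[W[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return L, U, S


def residual(A, L, U):
    """Dense residual R = A - L * U."""
    n = len(A)
    return [[A[i][j] - sum(L[i][k] * U[k][j] for k in range(n))
             for j in range(n)] for i in range(n)]


# Hypothetical test matrix (assumption, for illustration only).
A = [[4.0, -1.0, 0.0, -1.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [-1.0, 0.0, -1.0, 4.0]]
n = len(A)
L, U, S = ilu0(A)
R = residual(A, L, U)
max_on_S = max(abs(R[i][j]) for (i, j) in S)
max_off_S = max(abs(R[i][j]) for i in range(n) for j in range(n) if (i, j) not in S)
print("max |R_ij| on S:     ", max_on_S)
print("max |R_ij| outside S:", max_off_S)
```

The residual vanishes (up to rounding) on $S$, while the dropped fill-in locations (1, 3) and (3, 1) carry a nonzero residual.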
Rethink the overall strategy!

• Use a parallel iterative process to generate the factors.
• The preconditioner should have a moderate number of nonzero elements, but we don't care too much about intermediate data.

1. Select a set of nonzero locations.
2. Compute values in those locations such that $A \approx L \cdot U$ is a "good" approximation.
3. Maybe change some locations in favor of locations that result in a better preconditioner.
4. Repeat until the preconditioner quality stagnates.
Considerations

[Figure: sparsity patterns of $A$, $L$, and $U$, and of the ILU residual $R = A - L \times U$.]
• This is an optimization problem with $\mathrm{nnz}(A - L \cdot U)$ equations and $\mathrm{nnz}(L + U)$ variables.
• We may want to compute the values in $L, U$ such that $R = A - L \cdot U = 0|_S$, the approximation being exact in the locations included in the sparsity pattern $S$, but not outside! This leaves $\mathrm{nnz}(L + U)$ equations for $\mathrm{nnz}(L + U)$ variables.
• This is the underlying idea of Edmond Chow's parallel ILU algorithm [1]:

$$L \cdot U = A|_S \;\Rightarrow\; F(l_{ij}, u_{ij}) = \begin{cases} \dfrac{1}{u_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik} u_{kj} \right), & i > j \\[1ex] a_{ij} - \sum_{k=1}^{i-1} l_{ik} u_{kj}, & i \le j \end{cases}$$

• Converges in the asymptotic sense towards incomplete factors $L, U$ such that $R = A - L \cdot U = 0|_S$.
• We may not need high accuracy here, because we may change the pattern again…
• One single fixed-point sweep.

[1] Chow and Patel. "Fine-grained Parallel Incomplete LU Factorization". In: SIAM J. on Sci. Comp. (2015).
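The fixed-point iteration $F(l_{ij}, u_{ij})$ can be sketched in a few lines. This is a minimal sequential illustration, assuming dense list-of-lists storage, a hypothetical 4x4 test matrix, and an initial guess taken from the entries of $A$; none of these choices come from the talk. A parallel implementation would update the elements concurrently, using whatever values are currently available.

```python
def fixed_point_sweep(A, L, U, S):
    """One in-place sweep of F(l_ij, u_ij) over all locations (i, j) in S."""
    for (i, j) in sorted(S):                       # row-major order
        s = A[i][j] - sum(L[i][k] * U[k][j] for k in range(min(i, j)))
        if i > j:
            L[i][j] = s / U[j][j]                  # lower part: divide by u_jj
        else:
            U[i][j] = s                            # upper part incl. diagonal


# Hypothetical test matrix (assumption, for illustration only).
A = [[4.0, -1.0, 0.0, -1.0],
     [-1.0, 4.0, -1.0, 0.0],
     [0.0, -1.0, 4.0, -1.0],
     [-1.0, 0.0, -1.0, 4.0]]
n = len(A)
S = {(i, j) for i in range(n) for j in range(n) if A[i][j] != 0}

# Initial guess: U is the upper triangle of A, L the column-scaled strictly
# lower triangle of A with a unit diagonal.
U = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
L = [[A[i][j] / A[j][j] if j < i else float(i == j) for j in range(n)]
     for i in range(n)]

history = []
for sweep in range(3):
    fixed_point_sweep(A, L, U, S)
    res = max(abs(A[i][j] - sum(L[i][k] * U[k][j] for k in range(n)))
              for (i, j) in S)
    history.append(res)
    print(sweep, res)
```

Because this sketch updates the elements in place and in row-major order, every value it reads is already final, so a single sweep reproduces the ILU(0) values. With concurrent or asynchronous updates several sweeps are typically needed, which is why the accuracy of a single sweep matters in the discussion above.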
• Comparing sparsity patterns is extremely difficult.
• Maybe use the ILU residual as convergence check.
• The sparsity pattern of $A$ might be a good initial start for the nonzero locations.
• Then, the approximation will be exact for all locations $S_0 = S(L_0 + U_0)$ and nonzero in locations $S_1 = (S(A) \cup S(L_0 \cdot U_0)) \setminus S(L_0 + U_0)$ [2].
• Adding all these locations (level-fill!) might be a good idea, but adding these will again generate new nonzero residuals in $S_2 = (S(A) \cup S(L_1 \cdot U_1)) \setminus S(L_1 + U_1)$.
• At some point we should remove some locations again, e.g. the smallest elements, and start over looking at the locations of $R = A - L_k \cdot U_k$…

[Figure: sparsity patterns of $A$, $L$, $U$, and the residual as fill-in locations are added.]

[2] Saad. "Iterative Methods for Sparse Linear Systems, 2nd Edition". (2003).
ParILUT

Interleaving fixed-point sweeps approximating values with pattern-changing symbolic routines. Parallelism inside the building blocks.
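One such cycle of sweeps and pattern changes might look as follows. This is a toy sequential sketch, not the implementation from the talk: the dictionary storage, the residual-based initial guess for new entries, the exact sort-based removal, the per-factor nonzero limits, and the 4x4 test matrix are all assumptions made for illustration.

```python
import math


def lu_entry(L, U, i, j):
    """Entry (i, j) of L * U, with an implicit unit diagonal in L."""
    s = sum(L.get((i, k), 0.0) * U.get((k, j), 0.0) for k in range(min(i, j + 1)))
    return s + (U.get((i, j), 0.0) if i <= j else 0.0)


def sweep(A, L, U):
    """One in-place fixed-point sweep over the current pattern (row-major)."""
    for (i, j) in sorted(set(L) | set(U)):
        s = A.get((i, j), 0.0) - sum(
            L.get((i, k), 0.0) * U.get((k, j), 0.0) for k in range(min(i, j)))
        if i > j:
            L[i, j] = s / U[j, j]
        else:
            U[i, j] = s


def candidates(A, L, U):
    """Symbolic candidate set: (S(A) | S(L*U)) minus S(L+U)."""
    rows_of_U = {}
    for (k, j) in U:
        rows_of_U.setdefault(k, set()).add(j)
    S_LU = set(U) | {(i, j) for (i, k) in L for j in rows_of_U.get(k, ())}
    return (set(A) | S_LU) - (set(L) | set(U))


def drop_smallest(F, limit, keep_diagonal=False):
    """Remove the smallest-magnitude entries until at most `limit` remain."""
    removable = sorted((ij for ij in F if not (keep_diagonal and ij[0] == ij[1])),
                       key=lambda ij: abs(F[ij]))
    for ij in removable[:max(0, len(F) - limit)]:
        del F[ij]


def residual_norm(A, L, U, n):
    """Frobenius norm of the full ILU residual A - L * U."""
    return math.sqrt(sum((A.get((i, j), 0.0) - lu_entry(L, U, i, j)) ** 2
                         for i in range(n) for j in range(n)))


def parilut_step(A, L, U, limit_L, limit_U):
    new = {ij: A.get(ij, 0.0) - lu_entry(L, U, *ij) for ij in candidates(A, L, U)}
    for (i, j), r in new.items():        # add candidates, residual-based guess
        if i > j:
            L[i, j] = r / U[j, j]
        else:
            U[i, j] = r
    sweep(A, L, U)                       # approximate values on enlarged pattern
    drop_smallest(L, limit_L)            # remove insignificant locations
    drop_smallest(U, limit_U, keep_diagonal=True)
    sweep(A, L, U)                       # approximate values on reduced pattern


# Hypothetical 4x4 test matrix (assumption), stored as {(i, j): value}.
dense = [[4.0, -1.0, 0.0, -1.0],
         [-1.0, 4.0, -0.01, 0.0],
         [0.0, -0.01, 4.0, -1.0],
         [-1.0, 0.0, -1.0, 4.0]]
n = len(dense)
A = {(i, j): v for i, row in enumerate(dense) for j, v in enumerate(row) if v != 0.0}
L = {ij: v / A[ij[1], ij[1]] for ij, v in A.items() if ij[0] > ij[1]}
U = {ij: v for ij, v in A.items() if ij[0] <= ij[1]}
sweep(A, L, U)                           # start from the ILU(0) values
res_before = residual_norm(A, L, U, n)
parilut_step(A, L, U, limit_L=len(L), limit_U=len(U))
res_after = residual_norm(A, L, U, n)
print(res_before, res_after)
```

In this example the step trades the two tiny entries at (2, 1) and (1, 2) for the fill-in locations (3, 1) and (1, 3), keeping the nonzero count fixed while reducing the residual norm.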
Parallelism inside the blocks: Fixed-point sweeps

Fixed-point sweeps approximate the values in the ILU factors and the residual [1]:

$$F(l_{ij}, u_{ij}) = \begin{cases} \dfrac{1}{u_{jj}} \left( a_{ij} - \sum_{k=1}^{j-1} l_{ik} u_{kj} \right), & i > j \\[1ex] a_{ij} - \sum_{k=1}^{i-1} l_{ik} u_{kj}, & i \le j \end{cases}$$

• The bilinear fixed-point iteration can be parallelized by elements.
• Inherently parallel operation.
• Elements can be updated asynchronously.
• We can expect 100% parallel efficiency if the number of cores < number of elements.
• The residual norm is a global reduction.

[1] Chow and Patel. "Fine-grained Parallel Incomplete LU Factorization". In: SIAM J. on Sci. Comp. (2015).
Parallelism inside the blocks: Candidate search

Identify locations that are symbolically nonzero:

$$S^* = (S(A) \cup S(L \cdot U)) \setminus S(L + U)$$

Here $S(L \cdot U)$ requires a sparse matrix product, while the union with $S(A)$ and the pattern $S(L + U)$ are sparse matrix sums.

• Combination of sparse matrix product and sparse matrix sums.
• Building blocks available in Sparse BLAS.
• Blocks can be combined into one kernel for higher (memory) efficiency.
• Kernel can be parallelized by rows.
• Cost heavily dependent on sparsity pattern.
• Kernel performance bound by memory bandwidth.
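The candidate search is purely symbolic, so it can be illustrated with sets of index pairs. The sketch below is an assumption-laden toy: it takes the patterns of $L$ and $U$ to be the lower and upper parts of $S(A)$ for a hypothetical 4x4 pattern, and it is not the fused kernel described above.

```python
def product_pattern(S_L, S_U):
    """Symbolic pattern of L * U: (i, j) is nonzero if some k exists with
    (i, k) in S(L) and (k, j) in S(U)."""
    cols_in_row = {}
    for (k, j) in S_U:
        cols_in_row.setdefault(k, set()).add(j)
    return {(i, j) for (i, k) in S_L for j in cols_in_row.get(k, ())}


def candidate_search(S_A, S_L, S_U):
    """S* = (S(A) | S(L*U)) minus S(L+U)."""
    return (S_A | product_pattern(S_L, S_U)) - (S_L | S_U)


# Hypothetical pattern (assumption): 4x4 matrix with a cyclic tridiagonal pattern.
S_A = {(0, 0), (0, 1), (0, 3),
       (1, 0), (1, 1), (1, 2),
       (2, 1), (2, 2), (2, 3),
       (3, 0), (3, 2), (3, 3)}
S_L = {(i, j) for (i, j) in S_A if i >= j}   # lower part incl. unit diagonal
S_U = {(i, j) for (i, j) in S_A if i <= j}   # upper part incl. diagonal

cand = candidate_search(S_A, S_L, S_U)
print(sorted(cand))
```

Each row of the result depends only on the corresponding row of $S(L)$, which is why such a kernel can be parallelized by rows.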
Parallelism inside the blocks: Selecting thresholds

A threshold separating the smallest elements is needed for removing insignificant locations and keeping sparsity.

• Standard approach: sort/selection algorithms
  • High computational cost
  • Memory-intensive
  • Hard to parallelize
• Thresholds do not need to be exact:
  • Inaccurate thresholds result in a few more or fewer elements.
  • We can use sampling to get reasonable approximations.
  • Multiple sampling-based selection runs allow generating thresholds of reasonable quality in parallel.
• Is this appropriate for many-core architectures with 5K threads executing simultaneously?
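A sampling-based threshold estimate can be sketched as follows. This is a minimal single-run illustration under assumed parameters (the function name, sample size, and test data are hypothetical); it sorts only a small random sample instead of the full set.

```python
import random


def sampled_threshold(values, target_count, sample_size=1000, seed=0):
    """Estimate the magnitude below which about `target_count` of the
    elements fall, using only a random sample of the data."""
    rng = random.Random(seed)
    sample = sorted(abs(v) for v in rng.sample(values, sample_size))
    # Position in the sample that corresponds to the requested fraction.
    rank = round(target_count / len(values) * (sample_size - 1))
    return sample[rank]


# Hypothetical data (assumption): 20,000 values, remove roughly the 4,000 smallest.
data_rng = random.Random(1)
values = [data_rng.uniform(-1.0, 1.0) for _ in range(20000)]
threshold = sampled_threshold(values, target_count=4000)
count = sum(abs(v) < threshold for v in values)
print(threshold, count)
```

The resulting element count deviates from the target by a small fraction of the set size, which is acceptable here because an inexact threshold only changes the number of removed elements slightly.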
Threshold selection on parallel architectures

Selection algorithms are traditionally based on re-arranging elements in memory.

• SelectionSort ($O(n^2)$ comparisons, $O(n)$ element swaps)
• QuickSelect (average $O(n)$ comparisons, $O(n)$ element swaps)
• Floyd-Rivest algorithm ($n + \min(k, n - k) + O(\sqrt{n})$)
• IntroSelect (worst case $O(n)$ comparisons, $O(n)$ element swaps)

• Compute power (#FLOPs) grows much faster than memory bandwidth.
• Data-rearranging selection algorithms become inefficient.

"Operations are free, memory access is what counts." John D. McCalpin (TACC)

[Figure: 64-bit read vs. 1 FLOP.]
Threshold selection on parallel architectures: StreamSelect

Rethink the overall strategy!

• Primary goal: reduce the memory traffic.
• Account for high core counts, each having a set of registers.
• Assume a "nice" distribution of values (~uniform).
• Accept some inaccuracy in the generated threshold (approximation).

1. Find the largest and smallest elements to get the data range.
2. Generate a fine grid of thresholds, distribute them to the cores.
3. Stream all data one single time.
4. Each core handles a set of thresholds and counts how many elements are larger/smaller.
5. Select the threshold with the element count closest to the target value.

NVIDIA V100 "Volta": 7.8 TFLOP/s DP, 16 GB RAM @ 900 GB/s

On a V100 GPU: 80 multiprocessors, each with 32 FP64 cores / 64 FP32 cores [oversubscribe with threads to hide latency].

Run 5120 threads (bind to cores), each handling a few thresholds: 32 thresholds per thread give a mesh granularity of 6.1035e-06.

[Figure: runtime vs. thresholds per thread (0 to 35), comparing Linear and StreamSelect.]

• Set size m, subset size n*.
• For uniform distribution: quality ~ mesh granularity.
• Runtime increases linearly with set size.
• Runtime independent of subset size.

[Figure: runtime vs. set size m ($10^3$ to $10^8$).]
[Figure: deviation (n - n*) / m vs. subset size n* ($10^3$ to $10^8$).]
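The five StreamSelect steps can be sketched sequentially as follows. This is a toy illustration under assumptions: the function name, the grid size of 256, and the uniformly distributed test data are hypothetical, and the inner loop stands in for the work that the GPU threads would do concurrently, each on its own few thresholds.

```python
import random


def stream_select(values, target_count, num_thresholds=256):
    """Approximate the threshold below which about `target_count`
    magnitudes fall, with a single pass over the data."""
    mags = [abs(v) for v in values]
    lo, hi = min(mags), max(mags)                            # 1. data range
    step = (hi - lo) / (num_thresholds - 1)
    grid = [lo + t * step for t in range(num_thresholds)]    # 2. threshold grid
    counts = [0] * num_thresholds
    for m in mags:                                           # 3. stream data once
        for t, threshold in enumerate(grid):                 # 4. count per threshold
            if m <= threshold:
                counts[t] += 1
    best = min(range(num_thresholds),                        # 5. closest count wins
               key=lambda t: abs(counts[t] - target_count))
    return grid[best], counts[best]


# Hypothetical data (assumption): 5,000 roughly uniform values.
rng = random.Random(0)
values = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
threshold, count = stream_select(values, target_count=1000)
print(threshold, count)
```

For roughly uniform data the deviation from the target count is bounded by the number of elements falling between two neighboring grid points, which matches the statement that the quality follows the mesh granularity.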
Is this a future-oriented algorithm?

Bulk-synchronous algorithm!

Dependencies:
• Strong dependency: we cannot start before the preceding block has finished.
• Weak dependency: if we start before, we get +/- a few nonzeros.
• We can already start with the next iteration.

Excellent candidate for hybrid hardware? Asynchronous execution? GPU?
ParILUT – A New Parallel Threshold ILU

Next steps:
• Hybrid ParILUT version utilizing GPU and CPU, overlapping communication & computation.
• Asynchronous version relaxing dependencies.
• Use a different sparsity-pattern generator:
  • Randomized?
  • Machine learning techniques?
• Increasing fill-in towards "full" factorization.

• ParILUT routines available in MAGMA-sparse – they will be in Ginkgo!

This research is in cooperation with Edmond Chow (GaTech) and Jack Dongarra (University of Tennessee).

Helmholtz Impuls- und Vernetzungsfonds, VH-NG-1241

Slides available:
Scalability

Intel Xeon Phi 7250 "Knights Landing": 68 cores, 16 GB MCDRAM @ 490 GB/s

thermal2 matrix from SuiteSparse, RCM ordering, 8 el/row.

[Figure: speedup vs. number of threads (0 to 70) for the building blocks CSC→CSR, Candidates, Residuals, ILU-norm, CSR→CSC, Add, Sweep1, Select2Rm, Remove, Sweep2.]
[Figure: runtime fraction vs. number of threads (10 to 60) for CSC→CSR, Candidates, Residuals, ILU-norm, Select, Add, Sweeps, Remove.]

• Building blocks scale with 15% - 100% parallel efficiency.
• Transposition and sort are the bottlenecks.
• Overall speedup ~35x when using 68 KNL cores.

topopt120 matrix from topology optimization, 67 el/row.

[Figure: speedup vs. number of threads and runtime fraction vs. number of threads, same building blocks as above.]

• Building blocks scale with 15% - 100% parallel efficiency.
• Dominated by candidate search.
• Overall speedup ~52x when using 68 KNL cores.
Performance

Intel Xeon Phi 7250 "Knights Landing": 68 cores, 16 GB MCDRAM @ 490 GB/s

Matrix        Origin                            Rows   Nonzeros  Ratio   SuperLU  ParILUT  Speedup  ParICT  Speedup
ani7          2D Anisotropic Diffusion       203,841  1,407,811   6.91   10.48 s   0.45 s    23.34  0.30 s    35.16
apache2       SuiteSparse Matrix Collect.    715,176  4,817,870   6.74   62.27 s   1.24 s    50.22  0.65 s    95.37
cage11        SuiteSparse Matrix Collect.     39,082    559,722  14.32   60.89 s   0.54 s   112.56      --       --
jacobianMat9  Fun3D Fluid Flow Problem        90,708  5,047,042  55.64  153.84 s   7.26 s    21.19      --       --
thermal2      Thermal Problem (Suite Sp.)  1,228,045  8,580,313   6.99   91.83 s   1.23 s    74.66  0.68 s   134.25
tmt_sym       SuiteSparse Matrix Collect.    726,713  5,080,961   6.97   53.42 s   0.70 s    76.21  0.41 s   131.25
topopt120     Geometry Optimization          132,300  8,802,544  66.53   44.22 s  14.40 s     3.07  8.24 s     5.37
torso2        SuiteSparse Matrix Collect.    115,967  1,033,473   8.91   10.78 s   0.27 s    39.92      --       --
venkat01      SuiteSparse Matrix Collect.     62,424  1,717,792  27.52    8.53 s   0.74 s    11.54      --       --

Runtime of 5 ParILUT / ParICT steps and speedup over SuperLU ILUT*. Ratio is the number of nonzeros per row.

*We thank Sherry Li and Meiyue Shao for technical help in generating the performance numbers.
How about GPUs?

• Fine-grained parallelism
• High bandwidth for coalesced reads
• No deep cache hierarchy
• We need to oversubscribe cores for hiding latency

NVIDIA V100 "Volta": 7.8 TFLOP/s DP, 16 GB RAM @ 900 GB/s

[Figure: runtime [s] per building block (Trans, Cand, Res, Sort, Trans, Add, Sweep1, Thres, Remv, Sweep2), NVIDIA P100 GPU vs. Intel Haswell 20c, for the thermal2 matrix from SuiteSparse, RCM ordering, 8 el/row.]
[Figure: same comparison for the topopt120 matrix from topology optimization, 67 el/row.]
• Top-levelsolveriterationsasqualitymetric.• Fewsweepsgivea“better”preconditionerthanILU(0).• BetterthanILUT?
ParILUT quality
0 2 4 6 8 10
Number of ParICT steps (2 sweeps per step)
0
10
20
30
40
50
60
70
80
CG
Ite
ratio
ns
IC(0)ICTParICT
Anisotropicfluidflowproblemn:741,nz:4,951
Steinbuch Centre for Computing52 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU
• Top-levelsolveriterationsasqualitymetric.• Fewsweepsgivea“better”preconditionerthanILU(0).• BetterthanILUT?
ParILUT quality
0 2 4 6 8 10
Number of ParICT steps (2 sweeps per step)
0
10
20
30
40
50
60
70
80
CG
Ite
ratio
ns
IC(0)ICTParICT
0 500 1000 1500
ILU(0) Pattern discrepancy ILUT
0
1
2
3
4
5
6
7
8
9
10
Nu
mb
er
of
Pa
rIC
T s
tep
s (2
sw
ee
ps
pe
r st
ep
)
Anisotropicfluidflowproblemn:741,nz:4,951
Steinbuch Centre for Computing53 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU
Anisotropicfluidflowproblemn:741,nz:4,951
• Top-levelsolveriterationsasqualitymetric.• Fewsweepsgivea“better”preconditionerthanILU(0).• BetterthanILUT?
ParILUT quality
• Patternstagnatesafterfewsweeps.• Pattern“morelike”ILUTthanILU(0).
Test matrices
Convergence: GMRES iterations
Convergence: CG iterations
Preconditioning
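We iteratively solve a linear system of the form $Ax = b$, where $A \in \mathbb{R}^{n \times n}$ is nonsingular and $b, x \in \mathbb{R}^n$.
The convergence rate typically depends on the conditioning of the linear system, which (for symmetric positive definite $A$) is the ratio between the largest and smallest eigenvalue:
$$\mathrm{cond}_2(A) = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{1/\lambda_{\min}}{1/\lambda_{\max}} = \mathrm{cond}_2(A^{-1})$$
With $M \approx A^{-1}$ we can transform the linear system into a system with a lower condition number:
$$MAx = Mb \quad \text{(left preconditioned)}$$
$$AMy = b, \quad x = My \quad \text{(right preconditioned)}$$
If we now apply the iterative solver to the preconditioned system $MAx = Mb$, we usually get faster convergence.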
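As a rough illustration of how a preconditioner can lower the condition number, the following NumPy sketch compares the eigenvalue ratio of a small symmetric positive definite matrix before and after Jacobi (diagonal) preconditioning. The test matrix, the helper name `eig_ratio`, and the choice of a Jacobi preconditioner are assumptions made for this example only; they do not come from the talk.

```python
import numpy as np

def eig_ratio(S):
    """Ratio of largest to smallest eigenvalue of a symmetric positive definite matrix."""
    lam = np.linalg.eigvalsh(S)  # eigenvalues in ascending order
    return lam[-1] / lam[0]

n = 20
# Hypothetical test problem: 1D Laplacian stencil with a badly scaled diagonal.
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
s = np.sqrt(np.logspace(0, 4, n))
A = s[:, None] * T * s[None, :]

# Jacobi preconditioner M = diag(A)^{-1}. The matrix M A is similar to the
# symmetric matrix D^{-1/2} A D^{-1/2}, so both have the same eigenvalues.
d_inv_sqrt = 1.0 / np.sqrt(np.diag(A))
A_pre = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

cond_A = eig_ratio(A)
cond_MA = eig_ratio(A_pre)
print(f"eigenvalue ratio of A:  {cond_A:.3e}")
print(f"eigenvalue ratio of MA: {cond_MA:.3e}")
```

In this constructed case the diagonal scaling is removed entirely, so the preconditioned ratio equals that of the unscaled stencil `T`. Real problems rarely behave this cleanly.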
Assume $M = A^{-1}$; then $x = MAx = Mb$. The solution is readily available, but computing $M = A^{-1}$ is expensive…
The preconditioned system $MA$ is rarely formed explicitly; instead, $M$ is applied implicitly:
$$z_{k+1} = M r_{k+1}$$
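A minimal sketch of how the implicit application $z_{k+1} = M r_{k+1}$ enters an iterative solver, assuming preconditioned CG for a symmetric positive definite system. The function name `pcg`, the callback `apply_M`, the test matrix, and the Jacobi preconditioner are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def pcg(A, b, apply_M, tol=1e-10, max_iter=1000):
    """Preconditioned CG for SPD A; apply_M(r) returns z = M r, M A is never formed."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = apply_M(r)
    p = z.copy()
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        z = apply_M(r)  # implicit preconditioner application: z_{k+1} = M r_{k+1}
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Hypothetical SPD test matrix: widely spread diagonal, weak off-diagonal coupling.
n = 200
d = np.linspace(1.0, 1000.0, n)
A = np.diag(d) - 0.3 * (np.eye(n, k=1) + np.eye(n, k=-1))
b = np.ones(n)

x_plain, it_plain = pcg(A, b, apply_M=lambda r: r)    # M = I (no preconditioning)
x_jac, it_jac = pcg(A, b, apply_M=lambda r: r / d)    # M = diag(A)^{-1}
print("iterations without preconditioner:", it_plain)
print("iterations with Jacobi preconditioner:", it_jac)
```

Passing the preconditioner as a callback means the solver only ever needs the action of $M$ on a vector, which is exactly what a factorization-based preconditioner provides.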
Instead of forming the preconditioner $M \approx A^{-1}$ explicitly and applying it as $z_{k+1} = M r_{k+1}$, incomplete factorization preconditioners (ILU) try to find an approximate factorization $A \approx L \cdot U$.
In the application phase, the preconditioner $M \approx A^{-1}$ is only given implicitly, requiring two triangular solves:
$$z_{k+1} = M r_{k+1}$$
$$M^{-1} z_{k+1} = r_{k+1}$$
$$L \underbrace{U z_{k+1}}_{=:y} = r_{k+1}$$
$$\Rightarrow \quad L y = r_{k+1}, \quad U z_{k+1} = y$$
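The two triangular solves can be sketched in NumPy as follows. This is a dense, sequential ILU(0) written only for illustration (it is not the parallel ParILUT algorithm of the talk), and the names `ilu0`, `forward_subst`, `backward_subst`, and `apply_ilu` are hypothetical helpers introduced here.

```python
import numpy as np

def ilu0(A):
    """ILU(0) on a dense array: Gaussian elimination restricted to the nonzero pattern of A."""
    n = A.shape[0]
    pattern = A != 0
    W = A.astype(float).copy()
    for i in range(1, n):
        for k in range(i):
            if not pattern[i, k]:
                continue
            W[i, k] /= W[k, k]
            for j in range(k + 1, n):
                if pattern[i, j]:  # updates outside the pattern (fill-in) are dropped
                    W[i, j] -= W[i, k] * W[k, j]
    L = np.tril(W, -1) + np.eye(n)  # unit lower triangular factor
    U = np.triu(W)                  # upper triangular factor
    return L, U

def forward_subst(L, r):
    """Solve L y = r for a unit lower triangular L."""
    y = r.astype(float).copy()
    for i in range(len(y)):
        y[i] -= L[i, :i] @ y[:i]
    return y

def backward_subst(U, y):
    """Solve U z = y for an upper triangular U."""
    z = y.astype(float).copy()
    for i in reversed(range(len(z))):
        z[i] = (z[i] - U[i, i + 1:] @ z[i + 1:]) / U[i, i]
    return z

def apply_ilu(L, U, r):
    """Apply z = M r with M = (L U)^{-1}: first L y = r, then U z = y."""
    return backward_subst(U, forward_subst(L, r))

# Tridiagonal example: elimination creates no fill-in, so ILU(0) equals the exact LU.
n = 6
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L, U = ilu0(A)
r = np.arange(1.0, n + 1.0)
z = apply_ilu(L, U, r)
print("residual norm of A z - r:", np.linalg.norm(A @ z - r))
```

For matrices where elimination does create fill-in, the product $L \cdot U$ matches $A$ only on the sparsity pattern of $A$, and `apply_ilu` then gives an approximation to $A^{-1} r$ rather than the exact solution.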