ParILUT – A New Parallel Threshold ILUhanzt/talks/ParILUT.pdf · 2 05/07/2018 Hartwig Anzt:...

61
KIT – The Research University in the Helmholtz Association Hartwig Anzt Steinbuch Centre for Computing (SCC) www.kit.edu ParILUT – A New Parallel Threshold ILU 05/07/2018 Kolloquiumsvortrag in der Fakultät für Informatik

Transcript of ParILUT – A New Parallel Threshold ILUhanzt/talks/ParILUT.pdf · 2 05/07/2018 Hartwig Anzt:...

KIT – The Research University in the Helmholtz Association

Hartwig AnztSteinbuch Centre for Computing (SCC)

www.kit.edu

ParILUT – ANewParallel Threshold ILU05/07/2018Kolloquiumsvortrag inderFakultätfürInformatik

Steinbuch Centre for Computing2 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Motivation

Wearelookingforafactorization-basedpreconditionersuchthat .isagoodapproximationwithmoderatenonzerocount(e.g.).

• Whereshouldthesenonzeroelementsbelocated?• Howcanwecomputethepreconditioner inahighlyparallelfashion?

A ⇡ L · Unnz(L+ U) = nnz(A)

S(A) = {(i, j) 2 N2 : Aij 6= 0}

Steinbuch Centre for Computing3 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

ExactLUFactorization

• Decomposesystemmatrixintoproduct .• BasedonGaussianelimination.• Triangularsolvestosolveasystem:

• De-Factostandardforsolvingdenseproblems.• Whataboutsparse?Oftensignificantfill-in…

A = L · U

Ax = b

IncompleteLUFactorization(ILU)

• Focusedonrestrictingfill-intoaspecificsparsitypattern.

• ForILU(0),isthesparsitypatternof.• Workswellformanyproblems.• Isthisthebestwecangetfornonzerocount?

• Fill-ininthresholdILU(ILUT)basesonthesignificanceofelements(e.g.magnitude).

• Oftenbetterpreconditionersthanlevel-basedILU.

• Difficulttoparallelize.

S

S

A

S

A ⇡ L · Unnz(L+ U) = nnz(A)

Wearelookingforafactorization-basedpreconditionersuchthat .isagoodapproximationwithmoderatenonzerocount(e.g.).

• Whereshouldthesenonzeroelementsbelocated?• Howcanwecomputethepreconditioner inahighlyparallelfashion?

Motivation

Ly = b ) y ) Ux = y ) x

Steinbuch Centre for Computing4 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

ExactLUFactorization

• Decomposesystemmatrixintoproduct .• BasedonGaussianelimination.• Triangularsolvestosolveasystem:

• De-Factostandardforsolvingdenseproblems.• Whataboutsparse?Oftensignificantfill-in…

A = L · U

Ax = b

IncompleteLUFactorization(ILU)

• Focusedonrestrictingfill-intoaspecificsparsitypattern.

• ForILU(0),isthesparsitypatternof.• Workswellformanyproblems.• Isthisthebestwecangetfornonzerocount?

• Fill-ininthresholdILU(ILUT)basesonthesignificanceofelements(e.g.magnitude).

• Oftenbetterpreconditionersthanlevel-basedILU.

• Difficulttoparallelize.

S

S

A

S

A ⇡ L · Unnz(L+ U) = nnz(A)

Wearelookingforafactorization-basedpreconditionersuchthat .isagoodapproximationwithmoderatenonzerocount(e.g.).

• Whereshouldthesenonzeroelementsbelocated?• Howcanwecomputethepreconditioner inahighlyparallelfashion?

Motivation

Ly = b ) y ) Ux = y ) x

Steinbuch Centre for Computing5 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

ExactLUFactorization

• Decomposesystemmatrixintoproduct .• BasedonGaussianelimination.• Triangularsolvestosolveasystem:

• De-Factostandardforsolvingdenseproblems.• Whataboutsparse?Oftensignificantfill-in…

A = L · U

Ax = b

IncompleteLUFactorization(ILU)

• Focusedonrestrictingfill-intoaspecificsparsitypattern.

• ForILU(0),isthesparsitypatternof.• Workswellformanyproblems.• Isthisthebestwecangetfornonzerocount?

• Fill-ininthresholdILU(ILUT)basesonthesignificanceofelements(e.g.magnitude).

• Oftenbetterpreconditionersthanlevel-basedILU.

• Difficulttoparallelize.

S

S

A

S

A ⇡ L · Unnz(L+ U) = nnz(A)

Wearelookingforafactorization-basedpreconditionersuchthat .isagoodapproximationwithmoderatenonzerocount(e.g.).

• Whereshouldthesenonzeroelementsbelocated?• Howcanwecomputethepreconditioner inahighlyparallelfashion?

Motivation

Ly = b ) y ) Ux = y ) x

L 2 Rn⇥n

U 2 Rn⇥n

Lij = Uij = 0 8(i, j) /2 S

lower(unit-)triangular,sparse.

uppertriangular,sparse.

R = L� U, Rij = 0 8(i, j) 2 S.

.

Steinbuch Centre for Computing6 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

ExactLUFactorization

• Decomposesystemmatrixintoproduct .• BasedonGaussianelimination.• Triangularsolvestosolveasystem:

• De-Factostandardforsolvingdenseproblems.• Whataboutsparse?Oftensignificantfill-in…

A = L · U

Ax = b

IncompleteLUFactorization(ILU)

• Focusedonrestrictingfill-intoaspecificsparsitypattern.

• ForILU(0),isthesparsitypatternof.• Workswellformanyproblems.• Isthisthebestwecangetfornonzerocount?

• Fill-ininthresholdILU(ILUT)basesonthesignificanceofelements(e.g.magnitude).

• Oftenbetterpreconditionersthanlevel-basedILU.

• Difficulttoparallelize.

S

S

A

S

A ⇡ L · Unnz(L+ U) = nnz(A)

Wearelookingforafactorization-basedpreconditionersuchthat .isagoodapproximationwithmoderatenonzerocount(e.g.).

• Whereshouldthesenonzeroelementsbelocated?• Howcanwecomputethepreconditioner inahighlyparallelfashion?

Motivation

Ly = b ) y ) Ux = y ) x

Steinbuch Centre for Computing7 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

ExactLUFactorization

• Decomposesystemmatrixintoproduct .• BasedonGaussianelimination.• Triangularsolvestosolveasystem:

• De-Factostandardforsolvingdenseproblems.• Whataboutsparse?Oftensignificantfill-in…

A = L · U

Ax = b

IncompleteLUFactorization(ILU)

• Focusedonrestrictingfill-intoaspecificsparsitypattern.

• ForILU(0),isthesparsitypatternof.• Workswellformanyproblems.• Isthisthebestwecangetfornonzerocount?

• Fill-ininthresholdILU(ILUT)basesonthesignificanceofelements(e.g.magnitude).

• Oftenbetterpreconditionersthanlevel-basedILU.

• Difficulttoparallelize.

S

S

A

S

A ⇡ L · Unnz(L+ U) = nnz(A)

Wearelookingforafactorization-basedpreconditionersuchthat .isagoodapproximationwithmoderatenonzerocount(e.g.).

• Whereshouldthesenonzeroelementsbelocated?• Howcanwecomputethepreconditioner inahighlyparallelfashion?

Motivation

Ly = b ) y ) Ux = y ) x

Steinbuch Centre for Computing8 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Motivation

Wearelookingforafactorization-basedpreconditionersuchthat .isagoodapproximationwithmoderatenonzerocount(e.g.).

• Whereshouldthesenonzeroelementsbelocated?• Howcanwecomputethepreconditioner inahighlyparallelfashion?

Rethinktheoverallstrategy!

• Useaparalleliterativeprocesstogeneratefactors.

• Thepreconditionershouldhaveamoderatenumberofnonzeroelements,butwedon’tcaretoomuchabout intermediatedata.

1. Selectasetofnonzero locations.2. Computevaluesinthose locationssuchthat isa“good”approximation.

3. Maybechangesomelocationsinfavoroflocationsthatresult inabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

A ⇡ L · Unnz(L+ U) = nnz(A)

Steinbuch Centre for Computing9 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

• Thisisanoptimizationproblem…

1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

AR = L U⇥ILUresidual -

Steinbuch Centre for Computing10 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

• Thisisanoptimizationproblem…

1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA

Steinbuch Centre for Computing11 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

• Thisisanoptimizationproblem…

1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA

Steinbuch Centre for Computing12 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

• Thisisanoptimizationproblem…

1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ?

? ? ? ?? ? ? ? ? ?

1

CCCCCCA

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

Steinbuch Centre for Computing13 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

• Thisisanoptimizationproblem…

1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ?

? ? ? ?? ? ? ? ? ?

1

CCCCCCA

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ?

? ? ? ?? ? ? ? ? ?

1

CCCCCCA=

Residual:

Steinbuch Centre for Computing14 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

nnz(A� L · U)nnz(L+ U)

• Thisisanoptimizationproblemwithequationsandvariables.

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ?

? ? ? ?? ? ? ? ? ?

1

CCCCCCA=

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

SSparsitypattern

Steinbuch Centre for Computing15 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

• Thisisanoptimizationproblemwithequationsandvariables.

• Wemaywanttocomputethevaluesinsuchthat,theapproximationbeingexactinthe locationsincludedin,butnotoutside!

L,U

=

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

R = A� L · U = 0|S

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ?

? ? ? ?? ? ? ? ? ?

1

CCCCCCA

nnz(L+ U)equationsvariablesnnz(L+ U)

S

nnz(A� L · U)nnz(L+ U)

SSparsitypattern

Steinbuch Centre for Computing16 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

• Thisisanoptimizationproblemwithequationsandvariables.

• Wemaywanttocomputethevaluesinsuchthat,theapproximationbeingexactinthe locationsincludedin,butnotoutside!

• ThisistheunderlyingideaofEdmondChow’sparallelILUalgorithm1:

L,U

1ChowandPatel.“Fine-grainedParallelIncompleteLUFactorization”. In:SIAMJ.onSci.Comp.(2015).

R = A� L · U = 0|SS

nnz(A� L · U)nnz(L+ U)

L · U = A|S ) F (lij , uij) =

(1

ujj

⇣aij �

Pj�1k=1 likukj

⌘, i > j

aij �Pi�1

k=1 likukj , i j

Steinbuch Centre for Computing17 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

• Thisisanoptimizationproblemwithequationsandvariables.

• Wemaywanttocomputethevaluesinsuchthat,theapproximationbeingexactinthe locationsincludedin,butnotoutside!

• ThisistheunderlyingideaofEdmondChow’sparallelILUalgorithm1:

• Convergesintheasymptoticsensetowards incompletefactorssuchthat

L,U

1ChowandPatel.“Fine-grainedParallelIncompleteLUFactorization”. In:SIAMJ.onSci.Comp.(2015).

R = A� L · U = 0|SS

nnz(A� L · U)nnz(L+ U)

R = A� L · U = 0|SL,U

L · U = A|S ) F (lij , uij) =

(1

ujj

⇣aij �

Pj�1k=1 likukj

⌘, i > j

aij �Pi�1

k=1 likukj , i j

Steinbuch Centre for Computing18 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations1. Selectasetofnonzero locations.2. Computevalues inthoselocationssuchthat

isa“good” approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

• Thisisanoptimizationproblemwithequationsandvariables.

• Wemaywanttocomputethevaluesinsuchthat,theapproximationbeingexactinthe locationsincludedin,butnotoutside!

• ThisistheunderlyingideaofEdmondChow’sparallelILUalgorithm1:

• Wemaynotneedhighaccuracyhere,becausewemaychangethepatternagain…

• Onesinglefixed-pointsweep.

L,U R = A� L · U = 0|SS

nnz(A� L · U)nnz(L+ U)

1ChowandPatel.“Fine-grainedParallelIncompleteLUFactorization”. In:SIAMJ.onSci.Comp.(2015).

L · U = A|S ) F (lij , uij) =

(1

ujj

⇣aij �

Pj�1k=1 likukj

⌘, i > j

aij �Pi�1

k=1 likukj , i j

Steinbuch Centre for Computing19 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations1. Selectasetofnonzero locations.2. Computevaluesinthose locationssuchthat

isa“good”approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A ⇡ L · U

• Comparingsparsitypatternsextremelydifficult.• MaybeusetheILUresidualasconvergencecheck.

=

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ?

? ? ? ?? ? ? ? ? ?

1

CCCCCCA

Steinbuch Centre for Computing20 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

A ⇡ L · U

1. Selectasetofnonzero locations.2. Computevaluesinthose locationssuchthat

isa“good”approximation.3. Maybechangesomelocationsinfavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A• Thesparsitypatternofmightbeagoodinitialstartfornonzero locations.

Steinbuch Centre for Computing21 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

A ⇡ L · U

1. Selectasetofnonzero locations.2. Computevaluesinthose locationssuchthat

isa“good”approximation.3. Maybechangesomelocations infavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A

1Saad.“IterativeMethodsforSparseLinearSystems,2nd Edition”. (2003).

• Thesparsitypatternofmightbeagoodinitialstartfornonzero locations.• Then,theapproximationwillbeexactforalllocations

andnonzero inlocations1.S1 = (S(A) [ S(L0 · U0)) \ S(L0 + U0)S0 = S(L0 + U0)

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ?

? ?? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ?

??

? ??

1

CCCCCCA=

0

BBBBBB@

? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ?

? ? ? ?? ? ? ? ? ?

1

CCCCCCA

Steinbuch Centre for Computing22 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ? ? ?

? ? ?? ? ? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ? ?

? ? ? ?? ?

? ??

1

CCCCCCA=

0

BBBBBB@

? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?

1

CCCCCCA

A ⇡ L · U

1. Selectasetofnonzero locations.2. Computevaluesinthose locationssuchthat

isa“good”approximation.3. Maybechangesomelocations infavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

A

S1 = (S(A) [ S(L0 · U0)) \ S(L0 + U0)

• Thesparsitypatternofmightbeagoodinitialstartfornonzero locations.• Then,theapproximationwillbeexactforalllocations

andnonzero inlocations1.

• Addingalltheselocations(level-fill!)mightbegoodidea…

S0 = S(L0 + U0)

Steinbuch Centre for Computing23 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

A ⇡ L · U

A

S1 = (S(A) [ S(L0 · U0)) \ S(L0 + U0)

• Thesparsitypatternofmightbeagoodinitialstartfornonzero locations.• Then,theapproximationwillbeexactforalllocations

andnonzero inlocations1.

• Addingalltheselocations(level-fill!)mightbegoodidea, butaddingthesewillagaingeneratenewnonzeroresiduals

S0 = S(L0 + U0)

S2 = (S(A) [ S(L1 · U1)) \ S(L1 + U1)

0

BBBBBB@

? ? ? ? ?? ? ? ? ?? ? ?? ?

? ? ?? ? ? ?

1

CCCCCCA�

0

BBBBBB@

?? ?? ? ?? ? ? ?

? ? ?? ? ? ? ? ?

1

CCCCCCA⇥

0

BBBBBB@

? ? ? ? ?? ? ? ? ?

? ? ? ?? ?

? ??

1

CCCCCCA=

0

BBBBBB@

? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?? ? ? ? ? ?

1

CCCCCCA

1. Selectasetofnonzero locations.2. Computevaluesinthose locationssuchthat

isa“good”approximation.3. Maybechangesomelocations infavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

Steinbuch Centre for Computing24 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Considerations

A ⇡ L · U

1. Selectasetofnonzero locations.2. Computevaluesinthose locationssuchthat

isa“good”approximation.3. Maybechangesomelocations infavorof

locationsthatresultinabetterpreconditioner.4. Repeatuntilthepreconditionerqualitystagnates.

• Atsomepointweshouldremovesomelocationsagain,e.g.thesmallestelements,andstartoverlookingatlocations…R = A� Lk · Uk

Steinbuch Centre for Computing25 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Interleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.

ParILUT

Steinbuch Centre for Computing26 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

ParILUT :Parallelisminsidetheblocks

Interleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.Parallelisminsidethebuildingblocks.

Steinbuch Centre for Computing27 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Parallelisminsidetheblocks:Fixed-pointsweeps

Fixed-pointsweepsapproximatevaluesinILUfactorsandresidual1:

• Inherentlyparalleloperation.• Elementscanbeupdatedasynchronously.

• Wecanexpect100%parallelefficiencyifnumberofcores<numberofelements

• Residualnormisaglobalreduction.

1ChowandPatel.“Fine-grainedParallelIncompleteLUFactorization”. In:SIAMJ.onSci.Comp.(2015).

bilinearfixed-pointiterationcanbeparallelizedbyelements

F (lij , uij) =

(1

ujj

⇣aij �

Pj�1k=1 likukj

⌘, i > j

aij �Pi�1

k=1 likukj , i j

Steinbuch Centre for Computing28 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Parallelisminsidetheblocks:Candidatesearch

S⇤ = (S(A) [ S(L · U)) \ S(L+ U)

sparsematrixproduct

sparsematrixsum

• Combinationofsparsematrixproductandsparsematrixsums.

• BuildingblocksavailableinSparseBLAS.• Blockscanbecombinedintoonekernelfor

higher(memory)efficiency.• Kernelcanbeparallelizedbyrows.

• Costheavilydependentonsparsitypattern.

• Kernelperformanceboundbymemorybandwidth.

sparsematrixsum

Identifylocationsthataresymbolicallynonzero:

Steinbuch Centre for Computing29 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Parallelisminsidetheblocks:Selectingthresholds

Athresholdseparatingthesmallestelementsisneededforremovinginsignificantlocationsandkeepingsparsity.

• Standardapproach:sort/selectionalgorithms• Highcomputationalcost• Memory-intensive• Hardtoparallelize

• Thresholdsdonotneedtobeexact:• Inaccuratethresholdsresultinafewadditional/lesselements.• Wecanusesamplingtogetreasonableapproximations.• Multiplesampling-basedselectionrunsallow togenerate

thresholdsofreasonablequalityinparallel.

• Isthisappropriateformany-corearchitectureswith5Kthreadsexecutingsimultaneously?

Steinbuch Centre for Computing30 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Thresholdselectiononparallelarchitectures

• SelectionSort (comparisons,elementswaps)• QuickSelect (averagecomparisons,elementswaps)• Floyd-Rivest algorithm()• IntroSelect (worstcasecomparisons,elementswaps)

JohnD.McCaplin (TACC)

Selectionalgorithmstraditionallybasedonre-arrangingelementsinmemory.

O(n)O(n2)O(n) O(n)

O(n) O(n)n+min(k, n� k) +O(

pn)

• Computepower(#FLOPs)growsmuchfasterthanmemorybandwidth.

• Data-rearrangingselectionalgorithmsbecomeinefficient.

”Operations arefree,memoryaccessiswhatcounts.”

64bitread,1FLOP

Steinbuch Centre for Computing31 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Thresholdselectiononparallelarchitectures:StreamSelect

Rethinktheoverallstrategy!

• Primarygoal:reducethememorytraffic.• Accountforhighcorecounts,eachhavingasetofregisters.• Assume“nice”distributionofvalues(~uniform).• Acceptsomeinaccuracyinthegeneratedthreshold(approximation).

1. Findthelargestandsmallestelementstogetthedatarange.

2. Generateafinegridofthresholds,distribute themtothecores.3. Streamalldataonesingletime.4. Eachcorehandlesasetofthresholds andcountshowmany

elements arelarger/smaller.5. Selectthethresholdwiththeelementcountclosesttothe

targetvalue.

NVIDIAV100“Volta”7.8TFLOP/sDP16GBRAM@900GB/s

80Multiprocessors,eachwith64FP32cores[oversubscribewiththreadstohidelatency]

Steinbuch Centre for Computing32 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Thresholdselectiononparallelarchitectures:StreamSelect

Rethinktheoverallstrategy!

• Primarygoal:reducethememorytraffic.• Accountforhighcorecounts,eachhavingasetofregisters.• Assume“nice”distributionofvalues(~uniform).• Acceptsomeinaccuracyinthegeneratedthreshold(approximation).

1. Findthelargestandsmallestelementstogetthedatarange.

2. Generateafinegridofthresholds,distribute themtothecores.3. Streamalldataonesingletime.4. Eachcorehandlesasetofthresholds andcountshowmany

elements arelarger/smaller.5. Selectthethresholdwiththeelementcountclosesttothe

targetvalue.

NVIDIAV100“Volta”7.8TFLOP/sDP16GBRAM@900GB/s

80Multiprocessors,eachwith64FP32cores[oversubscribewiththreadstohidelatency]

Run5120threads(bindtocores)eachafewthresholds:

32 thresholds gives mesh granularity of 6.1035e-06

0 5 10 15 20 25 30 35

Thresholds per thread

0

10

20

30

40

Ru

ntim

e

LinearStreamSelect

Steinbuch Centre for Computing33 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

0 5 10 15 20 25 30 35

Thresholds per thread

0

10

20

30

40

Ru

ntim

e

LinearStreamSelect

Thresholdselectiononparallelarchitectures:StreamSelect

Rethinktheoverallstrategy!

• Primarygoal:reducethememorytraffic.• Accountforhighcorecounts,eachhavingasetofregisters.• Assume“nice”distributionofvalues(~uniform).• Acceptsomeinaccuracyinthegeneratedthreshold(approximation).

1. Findthelargestandsmallestelementstogetthedatarange.

2. Generateafinegridofthresholds,distribute themtothecores.3. Streamalldataonesingletime.4. Eachcorehandlesasetofthresholds andcountshowmany

elements arelarger/smaller.5. Selectthethresholdwiththeelementcountclosesttothe

targetvalue.

NVIDIAV100“Volta”7.8TFLOP/sDP16GBRAM@900GB/s

OnaV100GPU:80Multiprocessorseachwith32FP64cores

64FP32cores[oversubscribewiththreadstohidelatency]

Run5120threads(bindtocores)eachafewthresholds:

32 thresholds gives mesh granularity of 6.1035e-06

• Setsizem,subsetsizen*• Foruniformdistribution: quality~meshgranularity.

• Runtimeincreaseslinearwithsetsize.• Runtimeindependentofsubset size.

103

104

105

106

107

108

Set size m

10-4

10-2

100

102

Runtim

e

103 104 105 106 107 108

Subset size n*

10-10

10-5

100

Devi

atio

n

(n-n*) / m

Steinbuch Centre for Computing34 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Isthisafuture-orientedalgorithm?

Interleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.Parallelisminsidethebuildingblocks.

Steinbuch Centre for Computing35 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Isthisafuture-orientedalgorithm?

Interleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.Parallelisminsidethebuildingblocks.

Bulk-Synchronous Algorithm!

Steinbuch Centre for Computing36 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Isthisafuture-orientedalgorithm?

DependenciesInterleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.Parallelisminsidethebuildingblocks.

Bulk-Synchronous Algorithm!

Steinbuch Centre for Computing37 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Isthisafuture-orientedalgorithm?

DependenciesInterleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.Parallelisminsidethebuildingblocks.

Bulk-Synchronous Algorithm!

Steinbuch Centre for Computing38 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Isthisafuture-orientedalgorithm?

DependenciesInterleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.Parallelisminsidethebuildingblocks.

Bulk-Synchronous Algorithm!

Steinbuch Centre for Computing39 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Isthisafuture-orientedalgorithm?

DependenciesInterleavingfixed-pointsweepsapproximatingvalueswithpattern-changingsymbolicroutines.Parallelisminsidethebuildingblocks.

Bulk-Synchronous Algorithm!

Steinbuch Centre for Computing40 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Strongdependency –wecannotstartbeforefinished.

Isthisafuture-orientedalgorithm?

Steinbuch Centre for Computing41 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Strongdependency –wecannotstartbeforefinished.Weakdependency – ifwestartbefore:+/- fewnonzeros.

Isthisafuture-orientedalgorithm?

Steinbuch Centre for Computing42 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Strongdependency –wecannotstartbeforefinished.Weakdependency – ifwestartbefore:+/- fewnonzeros.Wecanalreadystartwiththenextiteration.

Isthisafuture-orientedalgorithm?

Steinbuch Centre for Computing43 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Excellentcandidateforhybridhardware?Asynchronous execution?

GPU?

Strongdependency –wecannotstartbeforefinished.Weakdependency – ifwestartbefore:+/- fewnonzeros.Wecanalreadystartwiththenextiteration.

Isthisafuture-orientedalgorithm?

Steinbuch Centre for Computing44 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

ParILUT – ANewParallelThreshold ILU

• HybridParILUT versionutilizingGPUandCPU,overlappingcommunication&computation.

• Asynchronous versionrelaxingdependencies.

• Useadifferentsparsity-patterngenerator:• Randomized?• Machinelearningtechniques?

• Increasingfill-intowards“full”factorization.

• ParILUT routinesavailableinMAGMA-sparse– theywillbeinGinkgo!

Helmholtz Impuls und VernetzungsfondVH-NG-1241

Nextsteps:

ThisresearchisincooperationwithEdmondChow(GaTech)andJackDongarra (UniversityofTennessee).

Slidesavailable:

Steinbuch Centre for Computing45 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Scalability

0 10 20 30 40 50 60 70

Number of Threads

0

10

20

30

40

50

60

70

Sp

ee

du

p

CSC CSRCandidatesResidualsILU-normCSR CSCAddSweep1Select2RmRemoveSweep2

10 20 30 40 50 60

Number of Threads

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Runtim

e fra

ctio

n

CSC CSRCandidatesResidualsILU-normSelectAddSweepsRemove

thermal2matrixfromSuiteSparse,RCMordering,8el/row.

IntelXeon Phi 7250“Knights Landing”[email protected],16GBMCDRAM@490GB/s

• Buildingblocksscalewith15%- 100%parallelefficiency.• Transpositionandsortarethebottlenecks.• Overallspeedup~35xwhenusing68KNLcores.

Steinbuch Centre for Computing46 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

IntelXeon Phi 7250“Knights Landing”[email protected],16GBMCDRAM@490GB/s

Scalability

10 20 30 40 50 60

Number of Threads

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Runtim

e fra

ctio

n

CSC CSRCandidatesResidualsILU-normSelectAddSweepsRemove

0 10 20 30 40 50 60 70

Number of Threads

0

10

20

30

40

50

60

70

Sp

ee

du

p

CSC CSRCandidatesResidualsILU-normCSR CSCAddSweep1Select2RmRemoveSweep2

topopt120matrixfromtopologyoptimization,67el/row.

• Buildingblocksscalewith15%- 100%parallelefficiency.• Dominatedbycandidatesearch.• Overallspeedup~52xwhenusing68KNLcores.

Steinbuch Centre for Computing47 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

IntelXeon Phi 7250“Knights Landing”[email protected],16GBMCDRAM@490GB/s

Performance

Matrix Origin Rows Nonzeros Ratio SuperLU ParILUT ParICT

ani7 2DAnisotropicDiffusion 203,841 1,407,811 6.91 10.48s 0.45s 23.34 0.30s 35.16

apache2 SuiteSparseMatrixCollect. 715,176 4,817,870 6.74 62.27s 1.24s 50.22 0.65s 95.37

cage11 SuiteSparseMatrixCollect. 39,082 559,722 14.32 60.89s 0.54s 112.56 --

jacobianMat9 Fun3DFluid FlowProblem 90,708 5,047,042 55.64 153.84s 7.26s 21.19 --

thermal2 ThermalProblem(Suite Sp.) 1,228,045 8,580,313 6.99 91.83s 1.23s 74.66 0.68s 134.25

tmt_sym SuiteSparseMatrixCollect. 726,713 5,080,961 6.97 53.42s 0.70s 76.21 0.41s 131.25

topopt120 Geometry Optimization 132,300 8,802,544 66.53 44.22s 14.40s 3.07 8.24s 5.37

torso2 SuiteSparseMatrixCollect. 115,967 1,033,473 8.91 10.78s 0.27s 39.92 --

venkat01 SuiteSparseMatrixCollect. 62,424 1,717,792 27.52 8.53s 0.74s 11.54 --

Runtimeof5ParILUT /ParICT stepsandspeedup overSuperLU ILUT*.

*WethankSherryLiandMeiyue Shaofortechnicalhelpingeneratingtheperformancenumbers.

Steinbuch Centre for Computing48 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

HowaboutGPUs?

• Fine-grainedparallelism• Highbandwidth forcoalescentreads• Nodeepcachehierarchy

• Weneedtooversubscribecoresforhidinglatency

NVIDIAV100“Volta”7.8TFLOP/sDP16GBRAM@900GB/s

Steinbuch Centre for Computing49 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

HowaboutGPUs?

• Fine-grainedparallelism• Highbandwidth forcoalescentreads• Nodeepcachehierarchy

• Weneedtooversubscribecoresforhidinglatency

Trans

Cand

ResSor

t

Trans

Add

Sweep1

Thres

Remv

Sweep2

-0.1

-0.05

0

0.05

0.1

NV

IDIA

P100 G

PU

Inte

l Hasw

ell

20c

thermal2matrixfromSuiteSparse,RCMordering,8el/row.

Runtime[s]

NVIDIAV100“Volta”7.8TFLOP/sDP16GBRAM@900GB/s

Steinbuch Centre for Computing50 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

HowaboutGPUs?

• Fine-grainedparallelism• Highbandwidth forcoalescentreads• Nodeepcachehierarchy

• Weneedtooversubscribecoresforhidinglatencytopopt120matrixfromtopologyoptimization,67el/row.

Trans

Can

dRes

Sort

Trans

Add

Swee

p1

Thres

Rem

v

Swee

p2-1

-0.5

0

0.5

1

NV

IDIA

P100 G

PU

Inte

l Hasw

ell

20c

Runtime[s]

NVIDIAV100“Volta”7.8TFLOP/sDP16GBRAM@900GB/s

Steinbuch Centre for Computing51 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

• Top-levelsolveriterationsasqualitymetric.• Fewsweepsgivea“better”preconditionerthanILU(0).• BetterthanILUT?

ParILUT quality

0 2 4 6 8 10

Number of ParICT steps (2 sweeps per step)

0

10

20

30

40

50

60

70

80

CG

Ite

ratio

ns

IC(0)ICTParICT

Anisotropicfluidflowproblemn:741,nz:4,951

Steinbuch Centre for Computing52 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

• Top-levelsolveriterationsasqualitymetric.• Fewsweepsgivea“better”preconditionerthanILU(0).• BetterthanILUT?

ParILUT quality

0 2 4 6 8 10

Number of ParICT steps (2 sweeps per step)

0

10

20

30

40

50

60

70

80

CG

Ite

ratio

ns

IC(0)ICTParICT

0 500 1000 1500

ILU(0) Pattern discrepancy ILUT

0

1

2

3

4

5

6

7

8

9

10

Nu

mb

er

of

Pa

rIC

T s

tep

s (2

sw

ee

ps

pe

r st

ep

)

Anisotropicfluidflowproblemn:741,nz:4,951

Steinbuch Centre for Computing53 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Anisotropicfluidflowproblemn:741,nz:4,951

• Top-levelsolveriterationsasqualitymetric.• Fewsweepsgivea“better”preconditionerthanILU(0).• BetterthanILUT?

ParILUT quality

• Patternstagnatesafterfewsweeps.• Pattern“morelike”ILUTthanILU(0).

0 2 4 6 8 10

Number of ParICT steps (2 sweeps per step)

0

10

20

30

40

50

60

70

80

CG

Ite

ratio

ns

IC(0)ICTParICT

0 500 1000 1500

ILU(0) Pattern discrepancy ILUT

0

1

2

3

4

5

6

7

8

9

10

Nu

mb

er

of

Pa

rIC

T s

tep

s (2

sw

ee

ps

pe

r st

ep

)

ParICT

Steinbuch Centre for Computing54 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Testmatrices

Steinbuch Centre for Computing55 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Convergence:GMRESiterations

Steinbuch Centre for Computing56 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Convergence:CGiterations

Steinbuch Centre for Computing57 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Preconditioning

Ax = b

Weiterativelysolve alinearsystemoftheformWherenonsingularand.

Theconvergenceratetypicallydependsontheconditioning ofthe linearsystem,whichistheratiobetweenthelargestandsmallesteigenvalue.

Withwecantransform thelinearsystemintoasystemwithalowerconditionnumber:

Ifwenowapplythe iterativesolvertothepreconditionedsystem, weusuallygetfasterconvergence.

A 2 Rn⇥nb, x 2 Rn

MAx = Mb (left preconditioned)

AMy = b, x = My (right preconditioned)

cond2(A) =

�max

�min

=

1�

min

1�

max

= cond2(A�1

)

MAx = Mb

M ⇡ A�1

Steinbuch Centre for Computing58 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Preconditioning

Assume ,then: .Solutionrightavailable,butcomputing

isexpensive…

M = A�1x = MAx = Mb

M = A�1

Ax = b

Weiterativelysolve alinearsystemoftheformWherenonsingularand.

Theconvergenceratetypicallydependsontheconditioning ofthe linearsystem,whichistheratiobetweenthelargestandsmallesteigenvalue.

Withwecantransform thelinearsystemintoasystemwithalowerconditionnumber:

Ifwenowapplythe iterativesolvertothepreconditionedsystem, weusuallygetfasterconvergence.

A 2 Rn⇥nb, x 2 Rn

MAx = Mb (left preconditioned)

AMy = b, x = My (right preconditioned)

cond2(A) =

�max

�min

=

1�

min

1�

max

= cond2(A�1

)

MAx = Mb

M ⇡ A�1

Steinbuch Centre for Computing59 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Preconditioning

Assume ,then: .Solutionrightavailable,butcomputing

isexpensive…

Thepreconditionedsystemisrarelyformedexplicitely,instedisappliedimplicitely:

M = A�1x = MAx = Mb

M = A�1

Ax = b

Weiterativelysolve alinearsystemoftheformWherenonsingularand.

Theconvergenceratetypicallydependsontheconditioning ofthe linearsystem,whichistheratiobetweenthelargestandsmallesteigenvalue.

Withwecantransform thelinearsystemintoasystemwithalowerconditionnumber:

Ifwenowapplythe iterativesolvertothepreconditionedsystem, weusuallygetfasterconvergence.

A 2 Rn⇥nb, x 2 Rn

MAx = Mb (left preconditioned)

AMy = b, x = My (right preconditioned)

cond2(A) =

�max

�min

=

1�

min

1�

max

= cond2(A�1

)

MAx = Mb

M ⇡ A�1

M zk+1 = Mrk+1

MA

Steinbuch Centre for Computing60 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Preconditioning

Assume ,then: .Solutionrightavailable,butcomputing

isexpensive…

Thepreconditionedsystemisrarelyformedexplicitely,instedisappliedimplicitely:

Insteadofformingthepreconditioner explicitlyandapplyingas,IncompleteFactorizationPreconditioner(ILU)trytofindanapproximatefactorization.

M = A�1x = MAx = Mb

M = A�1

zk+1 = Mrk+1

M ⇡ A�1

A ⇡ L · U

Ax = b

Weiterativelysolve alinearsystemoftheformWherenonsingularand.

Theconvergenceratetypicallydependsontheconditioning ofthe linearsystem,whichistheratiobetweenthelargestandsmallesteigenvalue.

Withwecantransform thelinearsystemintoasystemwithalowerconditionnumber:

Ifwenowapplythe iterativesolvertothepreconditionedsystem, weusuallygetfasterconvergence.

A 2 Rn⇥nb, x 2 Rn

MAx = Mb (left preconditioned)

AMy = b, x = My (right preconditioned)

cond2(A) =

�max

�min

=

1�

min

1�

max

= cond2(A�1

)

MAx = Mb

M ⇡ A�1

M zk+1 = Mrk+1

MA

Steinbuch Centre for Computing61 05/07/2018 Hartwig Anzt:ParILUT – Anew parallelthreshold ILU

Preconditioning

Assume ,then: .Solutionrightavailable,butcomputing

isexpensive…

Thepreconditionedsystemisrarelyformedexplicitely,instedisappliedimplicitely:

Insteadofformingthepreconditioner explicitlyandapplyingas,IncompleteFactorizationPreconditioner(ILU)trytofindanapproximatefactorization.

Intheapplicationphase,thepreconditioner isonlygivenimplicitly,requiringtwotriangularsolves:

M = A�1x = MAx = Mb

M = A�1

zk+1 = Mrk+1

M�1zk+1 = rk+1

LUzk+1| {z }=:y

= rk+1

)Ly = rk+1, Uzk+1 = y

zk+1 = Mrk+1

M ⇡ A�1

Ax = b

Weiterativelysolve alinearsystemoftheformWherenonsingularand.

Theconvergenceratetypicallydependsontheconditioning ofthe linearsystem,whichistheratiobetweenthelargestandsmallesteigenvalue.

Withwecantransform thelinearsystemintoasystemwithalowerconditionnumber:

Ifwenowapplythe iterativesolvertothepreconditionedsystem, weusuallygetfasterconvergence.

A 2 Rn⇥nb, x 2 Rn

MAx = Mb (left preconditioned)

AMy = b, x = My (right preconditioned)

cond2(A) =

�max

�min

=

1�

min

1�

max

= cond2(A�1

)

MAx = Mb

M ⇡ A�1

MAM zk+1 = Mrk+1

A ⇡ L · U