Making Static Pivoting Dependable

1
Making Static Pivoting Dependable E. Jason Riedy EECS Department, UC Berkeley [email protected] 1. Motivation For sparse LU factorization, dynamic pivoting tightly couples symbolic and numerical computation. Dynamic structural changes limit parallel scalability. Demmel and Li use static pivoting in distributed SuperLU for performance, but intentionally perturbing the input may lead silently to erroneous results. Are there experimentally stable static pivoting heuristics that lead to a de- pendable direct solver? The answer is currently a qualified yes. Current heuristics fail on a few systems, but all failures are detectable. 2. Pivoting and Perturbation Heuristics Partial pivoting is a heuristic to limit element growth. Another approach: perturb pivots to keep divisors large. Small perturbations should not overly affect stability. Pivoting Heuristics Partial Pivoting Given: ratio u>=1. i p = argmax i=j +1..N |a(i, j )|. If |a(i p ,j )| >u ·|a(j, j )|, swap rows i p and j . Static Pivoting Given: threshold T , perturbation P . If |a(j, j )| <T , add ±P to a(j, j ) (preserving sign). Perturbation Heuristics Norm-relative: Keep perturbations small relative to global norm. Used in SuperLU with parameter ρ =2 10 machine precision 2 -17 . Diagonal-relative: Sparsity-preserving orderings limit changes to the diagonal, so keep perturbations small relative to original diagonal. Norm-Relative T = P = ρ ·A 1 Diagonal-Relative Save original diagonal entries in d. At column j , T = P = ρ ·|d(j )| 3. Weighted Bipartite Matchings To minimize the number of perturbations during factorization, we pre-pivot to place large entries on the diagonal. Parallelizable auction algorithms can maximize product or sum of the diagonal; sums produce less reliable pivots. Quantizing diagonal to integer saves around 10% of execution time with no noticable effect on pivots. Approximating the objective improving time by 30% – 300%, also with little noticable effect. 4. Iterative Refinement Newton’s method applied to Ax = b. Can use extra precision in residual and temporary solution to obtain norm- wise and componentwise accuracy. See Yozo Hida’s poster for full details. Negligible Incremental Cost: Extra-precise residuals approximately 25nnz fp operations in software, but no additional memory accesses. If the factor- ing produces a fill of over 12×, then extra-precise residuals are no more expensive than each step’s solve. 5. Summary Out of 231 large matrices from UF Collection (N from 2500 to 213 360): Extra-precise refinement failed to stabilize five matrices for the best norm- relative perturbations and two for diagonal relative (Zhao2, av41092). All failed from pivot growth > 10 8 ; simple to check. These systems were the only error estimate failures. Note: Only worried about condition numbers less than 1 / machine precision, depending on non-zeros per row / col. Little variation Failures are rare and detectable. Careful scaling should reduce pivot growth in current failure cases. Partial Pivoting Initial berr = max |b-Ax| |b|+|A|·|x| Scaled Condition Number (log2) Relative Backward Error (log2) -3.0 -2.5 -2.0 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 err nrm = x-x true x true Scaled Normwise Condition Number (log2) Relative Normwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 err cmp = max |x-x true | |x true | Scaled Condition Number (log2) Relative Componentwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Refined Scaled Condition Number (log2) Relative Backward Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Scaled Normwise Condition Number (log2) Relative Normwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Scaled Condition Number (log2) Relative Componentwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Normwise Relative Error (log2) Normwise Relative Estimate (log2) -3.0 -2.5 -2.0 -1.5 -1.0 -60 -52 -44 -36 -28 -20 -12 -4 -60 -52 -44 -36 -28 -20 -12 -4 Componentwise Relative Error (log2) Componentwise Relative Estimate (log2) -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 -60 -52 -44 -36 -28 -20 -12 -4 -60 -52 -44 -36 -28 -20 -12 -4 Norm-Relative Perturbations, u =2 -17 Initial berr = max |b-Ax| |b|+|A|·|x| Scaled Condition Number (log2) Relative Backward Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 err nrm = x-x true x true Scaled Normwise Condition Number (log2) Relative Normwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 err cmp = max |x-x true | |x true | Scaled Condition Number (log2) Relative Componentwise Error (log2) -3.0 -2.5 -2.0 -1.5 -1.0 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Refined Scaled Condition Number (log2) Relative Backward Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Scaled Normwise Condition Number (log2) Relative Normwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Scaled Condition Number (log2) Relative Componentwise Error (log2) -3.0 -2.5 -2.0 -1.5 -1.0 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Normwise Relative Error (log2) Normwise Relative Estimate (log2) -3.0 -2.5 -2.0 -1.5 -1.0 -60 -52 -44 -36 -28 -20 -12 -4 -60 -52 -44 -36 -28 -20 -12 -4 Componentwise Relative Error (log2) Componentwise Relative Estimate (log2) -3.0 -2.5 -2.0 -1.5 -1.0 -0.5 -60 -52 -44 -36 -28 -20 -12 -4 -60 -52 -44 -36 -28 -20 -12 -4 Diagonal-Relative Perturbations, u =2 -12 Initial berr = max |b-Ax| |b|+|A|·|x| Scaled Condition Number (log2) Relative Backward Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 err nrm = x-x true x true Scaled Normwise Condition Number (log2) Relative Normwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 err cmp = max |x-x true | |x true | Scaled Condition Number (log2) Relative Componentwise Error (log2) -3.0 -2.5 -2.0 -1.5 -1.0 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Refined Scaled Condition Number (log2) Relative Backward Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Scaled Normwise Condition Number (log2) Relative Normwise Error (log2) -3.0 -2.5 -2.0 -1.5 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Scaled Condition Number (log2) Relative Componentwise Error (log2) -3.0 -2.5 -2.0 -1.5 -1.0 0 4 8 12 20 28 36 44 52 60 68 76 -60 -52 -44 -36 -28 -20 -12 -4 Normwise Relative Error (log2) Normwise Relative Estimate (log2) -3.0 -2.5 -2.0 -1.5 -60 -52 -44 -36 -28 -20 -12 -4 -60 -52 -44 -36 -28 -20 -12 -4 Componentwise Relative Error (log2) Componentwise Relative Estimate (log2) -3.0 -2.5 -2.0 -1.5 -1.0 -60 -52 -44 -36 -28 -20 -12 -4 -60 -52 -44 -36 -28 -20 -12 -4 Colorbars: log 10 of relative population. Bay Area Scientific Computing Day 2006, 4 March 2006, Livermore, CA

Transcript of Making Static Pivoting Dependable

Page 1: Making Static Pivoting Dependable

Making Static Pivoting DependableE. Jason Riedy

EECS Department, UC [email protected]

1. MotivationFor sparse LU factorization, dynamic pivoting tightly couples symbolic andnumerical computation. Dynamic structural changes limit parallel scalability.Demmel and Li use static pivoting in distributed SuperLU for performance,but intentionally perturbing the input may lead silently to erroneous results.

Are there experimentally stable static pivoting heuristics that lead to a de-pendable direct solver? The answer is currently a qualified yes. Currentheuristics fail on a few systems, but all failures are detectable.

2. Pivoting and Perturbation Heuristics

•Partial pivoting is a heuristic to limit element growth.•Another approach: perturb pivots to keep divisors large.•Small perturbations should not overly affect stability.

Pivoting HeuristicsPartial Pivoting

•Given: ratio u >= 1.• ip = argmaxi=j+1..N |a(i, j)|.• If |a(ip, j)| > u · |a(j, j)|,

swap rows ip and j.

Static Pivoting•Given: threshold T , perturbation P .• If |a(j, j)| < T ,

add ±P to a(j, j)(preserving sign).

Perturbation Heuristics•Norm-relative: Keep perturbations small relative to global norm. Used in

SuperLU with parameter ρ = 210√

machine precision ≈ 2−17.•Diagonal-relative: Sparsity-preserving orderings limit changes to the

diagonal, so keep perturbations small relative to original diagonal.

Norm-Relative•T = P = ρ · ‖A‖1

Diagonal-Relative•Save original diagonal entries in d.•At column j, T = P = ρ · |d(j)|

3. Weighted Bipartite Matchings•To minimize the number of perturbations during

factorization, we pre-pivot to place large entries onthe diagonal.

•Parallelizable auction algorithms can maximizeproduct or sum of the diagonal; sums produce lessreliable pivots.

•Quantizing diagonal to integer saves around 10%of execution time with no noticable effect on pivots.

•Approximating the objective improving time by 30%– 300%, also with little noticable effect.

4. Iterative Refinement

•Newton’s method applied to Ax = b.•Can use extra precision in residual and temporary solution to obtain norm-

wise and componentwise accuracy.•See Yozo Hida’s poster for full details.

Negligible Incremental Cost: Extra-precise residuals approximately 25nnzfp operations in software, but no additional memory accesses. If the factor-ing produces a fill of over 12×, then extra-precise residuals are no moreexpensive than each step’s solve.

5. SummaryOut of 231 large matrices from UF Collection (N from 2500 to 213 360):•Extra-precise refinement failed to stabilize five matrices for the best norm-

relative perturbations and two for diagonal relative (Zhao2, av41092).•All failed from pivot growth > 108; simple to check.•These systems were the only error estimate failures.•Note: Only worried about condition numbers less than ≈ 1/machine precision,

depending on non-zeros per row / col.• Little variationFailures are rare and detectable. Careful scaling should reduce pivotgrowth in current failure cases.

Partial Pivoting

Initi

al

berr = max |b−Ax||b|+|A|·|x|

Scaled Condition Number (log2)

Rel

ativ

e B

ackw

ard

Err

or (

log2

)

−3.0

−2.5

−2.0

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

errnrm = ‖x−xtrue‖∞‖xtrue‖∞

Scaled Normwise Condition Number (log2)

Rel

ativ

e N

orm

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

errcmp = max |x−xtrue||xtrue|

Scaled Condition Number (log2)

Rel

ativ

e C

ompo

nent

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Refi

ned

Scaled Condition Number (log2)

Rel

ativ

e B

ackw

ard

Err

or (

log2

)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Scaled Normwise Condition Number (log2)

Rel

ativ

e N

orm

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Scaled Condition Number (log2)

Rel

ativ

e C

ompo

nent

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Normwise Relative Error (log2)

Nor

mw

ise

Rel

ativ

e E

stim

ate

(log2

)

−3.0

−2.5

−2.0

−1.5

−1.0

−60 −52 −44 −36 −28 −20 −12 −4

−60

−52

−44

−36

−28

−20

−12

−4

Componentwise Relative Error (log2)

Com

pone

ntw

ise

Rel

ativ

e E

stim

ate

(log2

)

−3.0

−2.5

−2.0

−1.5

−1.0

−0.5

−60 −52 −44 −36 −28 −20 −12 −4

−60

−52

−44

−36

−28

−20

−12

−4

Norm-Relative Perturbations, u = 2−17

Initi

al

berr = max |b−Ax||b|+|A|·|x|

Scaled Condition Number (log2)

Rel

ativ

e B

ackw

ard

Err

or (

log2

)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

errnrm = ‖x−xtrue‖∞‖xtrue‖∞

Scaled Normwise Condition Number (log2)

Rel

ativ

e N

orm

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

errcmp = max |x−xtrue||xtrue|

Scaled Condition Number (log2)

Rel

ativ

e C

ompo

nent

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

−1.0

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Refi

ned

Scaled Condition Number (log2)

Rel

ativ

e B

ackw

ard

Err

or (

log2

)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Scaled Normwise Condition Number (log2)

Rel

ativ

e N

orm

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Scaled Condition Number (log2)

Rel

ativ

e C

ompo

nent

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

−1.0

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Normwise Relative Error (log2)

Nor

mw

ise

Rel

ativ

e E

stim

ate

(log2

)

−3.0

−2.5

−2.0

−1.5

−1.0

−60 −52 −44 −36 −28 −20 −12 −4

−60

−52

−44

−36

−28

−20

−12

−4

Componentwise Relative Error (log2)

Com

pone

ntw

ise

Rel

ativ

e E

stim

ate

(log2

)

−3.0

−2.5

−2.0

−1.5

−1.0

−0.5

−60 −52 −44 −36 −28 −20 −12 −4

−60

−52

−44

−36

−28

−20

−12

−4

Diagonal-Relative Perturbations, u = 2−12

Initi

al

berr = max |b−Ax||b|+|A|·|x|

Scaled Condition Number (log2)

Rel

ativ

e B

ackw

ard

Err

or (

log2

)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

errnrm = ‖x−xtrue‖∞‖xtrue‖∞

Scaled Normwise Condition Number (log2)

Rel

ativ

e N

orm

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

errcmp = max |x−xtrue||xtrue|

Scaled Condition Number (log2)

Rel

ativ

e C

ompo

nent

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

−1.0

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Refi

ned

Scaled Condition Number (log2)

Rel

ativ

e B

ackw

ard

Err

or (

log2

)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Scaled Normwise Condition Number (log2)

Rel

ativ

e N

orm

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Scaled Condition Number (log2)

Rel

ativ

e C

ompo

nent

wis

e E

rror

(lo

g2)

−3.0

−2.5

−2.0

−1.5

−1.0

0 4 8 12 20 28 36 44 52 60 68 76

−60

−52

−44

−36

−28

−20

−12

−4

Normwise Relative Error (log2)

Nor

mw

ise

Rel

ativ

e E

stim

ate

(log2

)

−3.0

−2.5

−2.0

−1.5

−60 −52 −44 −36 −28 −20 −12 −4

−60

−52

−44

−36

−28

−20

−12

−4

Componentwise Relative Error (log2)

Com

pone

ntw

ise

Rel

ativ

e E

stim

ate

(log2

)

−3.0

−2.5

−2.0

−1.5

−1.0

−60 −52 −44 −36 −28 −20 −12 −4

−60

−52

−44

−36

−28

−20

−12

−4

Colorbars: log10 of relative population.

Bay Area Scientific Computing Day 2006, 4 March 2006, Livermore, CA