Transcript of MS58: Approaches to Reducing Communication in Krylov Subspace Methods (…erinc/ppt/Carson_SIAMLA15.pdf)

Page 1:

MS58: Approaches to Reducing Communication in Krylov Subspace Methods

Organizers: Laura Grigori (INRIA) and Erin Carson (NYU)

Talks:

1. The s-Step Lanczos Method and its Behavior in Finite Precision (Erin Carson, James W. Demmel)

2. Enlarged Krylov Subspace Methods for Reducing Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf)

3. Preconditioning Communication-Avoiding Krylov Methods (Siva Rajamanickam, Ichitaro Yamazaki, Andrey Prokopenko, Erik G. Boman, Michael Heroux, Jack J. Dongarra)

4. Sparse Approximate Inverse Preconditioners for Communication-Avoiding Bicgstab Solvers (Maryam Mehri Dehnavi, Erin Carson, Nicholas Knight, James W. Demmel, David Fernandez)


Page 2:

The s-Step Lanczos Method and its Behavior in Finite Precision

Erin Carson, NYU

James Demmel, UC Berkeley

SIAM LA ‘15

October 30, 2015

Page 3:

Why Avoid “Communication”?

• Algorithms have two costs: computation and communication

• Communication: moving data between levels of the memory hierarchy (sequential) or between processors (parallel)

• On today’s computers, communication is expensive, computation is cheap, in terms of both time and energy!

[Figure: “Sequential” shows a single CPU with cache and DRAM; “Parallel” shows multiple CPU+DRAM nodes communicating with each other]

Pages 4-6: Future Exascale Systems

                                   Petascale Systems (2009)   Predicted Exascale Systems*   Factor Improvement
System Peak                        2 ⋅ 10^15 flops            10^18 flops                   ~1000
Node Memory Bandwidth              25 GB/s                    0.4-4 TB/s                    ~10-100
Total Node Interconnect Bandwidth  3.5 GB/s                   100-400 GB/s                  ~100
Memory Latency                     100 ns                     50 ns                         ~1
Interconnect Latency               1 μs                       0.5 μs                        ~1

• Gaps between communication/computation cost only growing larger in future systems

• Avoiding communication will be essential for applications at exascale!

*Sources: from P. Beckman (ANL), J. Shalf (LBL), and D. Unat (LBL)

Page 7: Krylov Subspace Methods

• General class of iterative solvers: used for linear systems, eigenvalue problems, singular value problems, least squares, etc.

• Projection process onto the expanding Krylov subspace

𝒦_m(A, r_0) = span{r_0, A r_0, A²r_0, …, A^(m−1) r_0}

• In each iteration:
   • Add a dimension to the Krylov subspace 𝒦_m
   • Orthogonalize (with respect to some ℒ_m)

• Examples: Lanczos/Conjugate Gradient (CG), Arnoldi/Generalized Minimum Residual (GMRES), Biconjugate Gradient (BICG), BICGSTAB, GKL, LSQR, etc.

[Figure: projection diagram showing r_0, Aδ, and the new residual r_new]
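As a concrete illustration (not from the slides), a minimal NumPy sketch that builds an orthonormal basis for 𝒦_m(A, r_0) by repeated multiplication with A, mirroring the "add a dimension, then orthogonalize" loop:

```python
import numpy as np

def krylov_basis(A, r0, m):
    """Orthonormal basis whose columns span K_m(A, r0) =
    span{r0, A r0, ..., A^(m-1) r0}, built by repeated SpMV
    plus modified Gram-Schmidt orthogonalization."""
    V = np.zeros((len(r0), m))
    V[:, 0] = r0 / np.linalg.norm(r0)
    for j in range(1, m):
        w = A @ V[:, j - 1]          # "add a dimension": one SpMV
        for i in range(j):           # "orthogonalize": inner products
            w -= (V[:, i] @ w) * V[:, i]
        V[:, j] = w / np.linalg.norm(w)
    return V

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
V = krylov_basis(A, rng.standard_normal(6), 4)
print(np.allclose(V.T @ V, np.eye(4)))
```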

Pages 8-11: Krylov Solvers: Limited by Communication

In terms of communication:

• “Add a dimension to 𝒦_m” → Sparse Matrix-Vector Multiplication (SpMV)
   • Parallel: communicate vector entries with neighbors
   • Sequential: read A/vectors from slow memory

• “Orthogonalize (with respect to some ℒ_m)” → Inner products
   • Parallel: global reduction (All-Reduce)
   • Sequential: multiple reads/writes to slow memory

• Dependencies between these communication-bound kernels in each iteration (SpMV, then orthogonalize) limit performance!

Page 12: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Given: initial vector 𝑣1 with 𝑣1 2= 1

𝑢1 = 𝐴𝑣1

for 𝑖 = 1, 2, … , until convergence do

𝛼𝑖 = 𝑣𝑖𝑇𝑢𝑖

𝑤𝑖 = 𝑢𝑖 − 𝛼𝑖𝑣𝑖

𝛽𝑖+1 = 𝑤𝑖 2

𝑣𝑖+1 = 𝑤𝑖/𝛽𝑖+1

𝑢𝑖+1 = 𝐴𝑣𝑖+1 − 𝛽𝑖+1𝑣𝑖

end for

6

The Classical Lanczos Method

Page 13: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Given: initial vector 𝑣1 with 𝑣1 2= 1

𝑢1 = 𝐴𝑣1

for 𝑖 = 1, 2, … , until convergence do

𝛼𝑖 = 𝑣𝑖𝑇𝑢𝑖

𝑤𝑖 = 𝑢𝑖 − 𝛼𝑖𝑣𝑖

𝛽𝑖+1 = 𝑤𝑖 2

𝑣𝑖+1 = 𝑤𝑖/𝛽𝑖+1

𝑢𝑖+1 = 𝐴𝑣𝑖+1 − 𝛽𝑖+1𝑣𝑖

end for

6

SpMV

The Classical Lanczos Method

Page 14: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Given: initial vector 𝑣1 with 𝑣1 2= 1

𝑢1 = 𝐴𝑣1

for 𝑖 = 1, 2, … , until convergence do

𝛼𝑖 = 𝑣𝑖𝑇𝑢𝑖

𝑤𝑖 = 𝑢𝑖 − 𝛼𝑖𝑣𝑖

𝛽𝑖+1 = 𝑤𝑖 2

𝑣𝑖+1 = 𝑤𝑖/𝛽𝑖+1

𝑢𝑖+1 = 𝐴𝑣𝑖+1 − 𝛽𝑖+1𝑣𝑖

end for

The Classical Lanczos Method

6

SpMV

Inner products
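A minimal NumPy implementation of this recurrence (an illustrative sketch, not the speaker's code), together with a check of the result from Paige that the computed Ritz values stay within the extreme eigenvalues of A:

```python
import numpy as np

def lanczos(A, v1, m):
    """Classical Lanczos exactly as on the slide:
    alpha_i = v_i^T u_i, w_i = u_i - alpha_i v_i, beta_{i+1} = ||w_i||_2,
    v_{i+1} = w_i / beta_{i+1}, u_{i+1} = A v_{i+1} - beta_{i+1} v_i."""
    v = v1 / np.linalg.norm(v1)
    u = A @ v
    v_prev = np.zeros_like(v)
    alphas, betas = [], []
    for _ in range(m):
        alpha = v @ u
        w = u - alpha * v
        beta = np.linalg.norm(w)
        v_prev, v = v, w / beta
        u = A @ v - beta * v_prev
        alphas.append(alpha)
        betas.append(beta)
    return np.array(alphas), np.array(betas)

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
A = M + M.T                       # symmetric test matrix
a, b = lanczos(A, rng.standard_normal(50), 40)
T = np.diag(a) + np.diag(b[:-1], 1) + np.diag(b[:-1], -1)

# Ritz values (eigenvalues of the tridiagonal T) lie between the extreme
# eigenvalues of A, up to a small multiple of machine precision
lam, ritz = np.linalg.eigvalsh(A), np.linalg.eigvalsh(T)
print(ritz.min() >= lam.min() - 1e-8, ritz.max() <= lam.max() + 1e-8)
```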

Pages 15-17: Communication-Avoiding KSMs

• Idea: Compute blocks of s iterations at once
   • Communicate every s iterations instead of every iteration
   • Reduces communication cost by O(s)! (latency in parallel; latency and bandwidth in sequential)

• An idea rediscovered many times…
   • First related work: s-dimensional steepest descent - Khabaza (‘63), Forsythe (‘68), Marchuk and Kuznecov (‘68)
   • Flurry of work on s-step Krylov methods in ‘80s/early ‘90s: see, e.g., Van Rosendale, 1983; Chronopoulos and Gear, 1989
   • Goals: increasing parallelism, avoiding I/O, increasing “convergence rate”

• Resurgence of interest in recent years due to growing problem sizes and the growing relative cost of communication

Pages 18-19: Communication-Avoiding KSMs: CA-Lanczos

• Main idea: Unroll the iteration loop by a factor of s; split the iteration loop into an outer loop (k) and an inner loop (j)

• Key observation: starting at some iteration i ≡ sk + j,

v_{sk+j}, u_{sk+j} ∈ 𝒦_{s+1}(A, v_{sk+1}) + 𝒦_{s+1}(A, u_{sk+1})   for j ∈ {1, …, s+1}

Outer loop k: Communication step

• Expand the solution space s dimensions at once
   • Compute “basis matrix” 𝒴_k with columns spanning 𝒦_{s+1}(A, v_{sk+1}) + 𝒦_{s+1}(A, u_{sk+1})
   • Requires reading A/communicating vectors only once, using the “matrix powers kernel”

• Orthogonalize all at once
   • Compute/store the block of inner products between basis vectors in the Gram matrix 𝒢_k = 𝒴_k^T 𝒴_k
   • Communication cost of one global reduction
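Not from the slides: the reason the Gram matrix suffices for later inner products is the identity (𝒴v′)^T(𝒴u′) = v′^T 𝒢 u′, which a few lines of NumPy can confirm (a random matrix stands in for an actual basis 𝒴_k):

```python
import numpy as np

rng = np.random.default_rng(2)
n, s = 200, 4
Y = rng.standard_normal((n, 2 * (s + 1)))   # stand-in for a basis matrix Y_k
G = Y.T @ Y                                  # Gram matrix: one global reduction

vp = rng.standard_normal(2 * (s + 1))        # short coordinate vectors v', u'
up = rng.standard_normal(2 * (s + 1))

# Inner product of the full length-n vectors Yv' and Yu' ...
full = (Y @ vp) @ (Y @ up)
# ... equals a small O(s)-sized computation with the Gram matrix:
local = vp @ G @ up
print(np.isclose(full, local))
```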

Pages 20-24: Communication-Avoiding KSMs: CA-Lanczos

Inner loop j: Computation steps, no communication!

• Perform s iterations of updates
   • Using 𝒴_k and 𝒢_k, this requires no communication!
   • Represent n-vectors by their O(s) coordinates in 𝒴_k:

v_{sk+j} = 𝒴_k v′_{k,j},   u_{sk+j} = 𝒴_k u′_{k,j},   w_{sk+j} = 𝒴_k w′_{k,j}

• The length-n SpMV A v_{i+1} becomes the O(s)-sized product ℬ_k v′_{k,j+1}

• The length-n inner product v_i^T u_i becomes the O(s)-sized product v′^T_{k,i} 𝒢_k u′_{k,i}

Page 25: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Given: initial vector 𝑣1 with 𝑣1 2= 1

𝑢1 = 𝐴𝑣1

for 𝑘 = 0, 1, … , until convergence doCompute 𝒴𝑘 , compute 𝒢𝑘 = 𝒴𝑘

𝑇𝒴𝑘

Let 𝑣𝑘,1′ = 𝑒1, 𝑢𝑘,1

′ = 𝑒𝑠+2

for 𝑗 = 1, … , 𝑠 do

𝛼𝑠𝑘+𝑗 = 𝑣𝑘,𝑗′𝑇 𝒢𝑘𝑢𝑘,𝑗

𝑤𝑘,𝑗′ = 𝑢𝑘,𝑗

′ − 𝛼𝑠𝑘+𝑗𝑣𝑘,𝑗′

𝛽𝑠𝑘+𝑗+1 = 𝑤𝑘,𝑗′𝑇 𝒢𝑘𝑤𝑘,𝑗

′ 1/2

𝑣𝑘,𝑗+1′ = 𝑤𝑘,𝑗

′ / 𝛽𝑠𝑘+𝑗+1

𝑢𝑘,𝑗+1′ = ℬ𝑘𝑣𝑘,𝑗+1

′ − 𝛽𝑠𝑘+𝑗+1𝑣𝑘,𝑗′

end forCompute 𝑣𝑠𝑘+𝑠+1 = 𝒴𝑘𝑣𝑘,𝑠+1

′ , 𝑢𝑠𝑘+𝑠+1 = 𝒴𝑘𝑢𝑘,𝑠+1′

end for

10

The CA-Lanczos Method

Page 26: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

via CA Matrix Powers Kernel

Global reduction

to compute 𝒢𝑘

10

Given: initial vector 𝑣1 with 𝑣1 2= 1

𝑢1 = 𝐴𝑣1

for 𝑘 = 0, 1, … , until convergence doCompute 𝒴𝑘 , compute 𝒢𝑘 = 𝒴𝑘

𝑇𝒴𝑘

Let 𝑣𝑘,1′ = 𝑒1, 𝑢𝑘,1

′ = 𝑒𝑠+2

for 𝑗 = 1, … , 𝑠 do

𝛼𝑠𝑘+𝑗 = 𝑣𝑘,𝑗′𝑇 𝒢𝑘𝑢𝑘,𝑗

𝑤𝑘,𝑗′ = 𝑢𝑘,𝑗

′ − 𝛼𝑠𝑘+𝑗𝑣𝑘,𝑗′

𝛽𝑠𝑘+𝑗+1 = 𝑤𝑘,𝑗′𝑇 𝒢𝑘𝑤𝑘,𝑗

′ 1/2

𝑣𝑘,𝑗+1′ = 𝑤𝑘,𝑗

′ / 𝛽𝑠𝑘+𝑗+1

𝑢𝑘,𝑗+1′ = ℬ𝑘𝑣𝑘,𝑗+1

′ − 𝛽𝑠𝑘+𝑗+1𝑣𝑘,𝑗′

end forCompute 𝑣𝑠𝑘+𝑠+1 = 𝒴𝑘𝑣𝑘,𝑠+1

′ , 𝑢𝑠𝑘+𝑠+1 = 𝒴𝑘𝑢𝑘,𝑠+1′

end for

The CA-Lanczos Method

Page 27: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

via CA Matrix Powers Kernel

Global reduction

to compute 𝒢𝑘

10

Local computations: no communication!

Given: initial vector 𝑣1 with 𝑣1 2= 1

𝑢1 = 𝐴𝑣1

for 𝑘 = 0, 1, … , until convergence doCompute 𝒴𝑘 , compute 𝒢𝑘 = 𝒴𝑘

𝑇𝒴𝑘

Let 𝑣𝑘,1′ = 𝑒1, 𝑢𝑘,1

′ = 𝑒𝑠+2

for 𝑗 = 1, … , 𝑠 do

𝛼𝑠𝑘+𝑗 = 𝑣𝑘,𝑗′𝑇 𝒢𝑘𝑢𝑘,𝑗

𝑤𝑘,𝑗′ = 𝑢𝑘,𝑗

′ − 𝛼𝑠𝑘+𝑗𝑣𝑘,𝑗′

𝛽𝑠𝑘+𝑗+1 = 𝑤𝑘,𝑗′𝑇 𝒢𝑘𝑤𝑘,𝑗

′ 1/2

𝑣𝑘,𝑗+1′ = 𝑤𝑘,𝑗

′ / 𝛽𝑠𝑘+𝑗+1

𝑢𝑘,𝑗+1′ = ℬ𝑘𝑣𝑘,𝑗+1

′ − 𝛽𝑠𝑘+𝑗+1𝑣𝑘,𝑗′

end forCompute 𝑣𝑠𝑘+𝑠+1 = 𝒴𝑘𝑣𝑘,𝑠+1

′ , 𝑢𝑠𝑘+𝑠+1 = 𝒴𝑘𝑢𝑘,𝑠+1′

end for

The CA-Lanczos Method
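To make the outer/inner loop structure concrete, here is a hedged NumPy sketch of a single outer iteration (k = 0). The monomial basis and the names used are illustrative assumptions, not the slides' implementation; in exact arithmetic the recovered coefficients match classical Lanczos:

```python
import numpy as np

def ca_lanczos_block(A, v1, s):
    """One outer iteration (k = 0) of CA-Lanczos with a monomial basis
    (an illustrative basis choice). Returns alpha/beta for s inner steps."""
    v = v1 / np.linalg.norm(v1)
    u = A @ v

    # "Matrix powers kernel": Y = [v, Av, ..., A^s v | u, Au, ..., A^s u]
    cols = []
    for start in (v, u):
        x = start
        for _ in range(s + 1):
            cols.append(x)
            x = A @ x
    Y = np.column_stack(cols)
    G = Y.T @ Y                       # Gram matrix: the one global reduction

    # B shifts coordinates within each block, so A (Y c) = Y (B c) whenever
    # the top-degree coefficient of each block of c is zero
    S = np.eye(s + 1, k=-1)
    Z = np.zeros((s + 1, s + 1))
    B = np.block([[S, Z], [Z, S]])

    vp = np.zeros(2 * (s + 1)); vp[0] = 1.0       # v'_{k,1} = e_1
    up = np.zeros(2 * (s + 1)); up[s + 1] = 1.0   # u'_{k,1} = e_{s+2}
    alphas, betas = [], []
    for _ in range(s):                # inner loop: O(s)-sized updates only
        alpha = vp @ G @ up
        wp = up - alpha * vp
        beta = np.sqrt(wp @ G @ wp)   # ||w||_2 recovered from the Gram matrix
        vp, vp_prev = wp / beta, vp
        up = B @ vp - beta * vp_prev
        alphas.append(alpha)
        betas.append(beta)
    return np.array(alphas), np.array(betas)

# sanity check: matches s steps of classical Lanczos (up to roundoff)
rng = np.random.default_rng(3)
M = rng.standard_normal((30, 30))
A = M + M.T
v1 = rng.standard_normal(30)
a_ca, b_ca = ca_lanczos_block(A, v1, 3)

v = v1 / np.linalg.norm(v1); u = A @ v; v_prev = np.zeros(30)
a_cl, b_cl = [], []
for _ in range(3):
    alpha = v @ u; w = u - alpha * v; beta = np.linalg.norm(w)
    v_prev, v = v, w / beta
    u = A @ v - beta * v_prev
    a_cl.append(alpha); b_cl.append(beta)

print(np.allclose(a_ca, a_cl), np.allclose(b_ca, b_cl))
```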

Pages 28-30: Complexity Comparison

Example of parallel (per-processor) complexity for s iterations of classical Lanczos vs. CA-Lanczos for a 2D 9-point stencil (assuming each of p processors owns n/p rows of the matrix and s ≤ √(n/p)):

              Flops              Words Moved               Messages
              SpMV    Orth.      SpMV       Orth.          SpMV    Orth.
Classical CG  sn/p    sn/p       s√(n/p)    s log₂ p       s       s log₂ p
CA-CG         sn/p    s²n/p      s√(n/p)    s² log₂ p      1       log₂ p

All values in the table are meant in the Big-O sense (i.e., lower-order terms and constants not included).
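As a rough illustration of the table's message counts (Big-O models with constants dropped, so only the ratio is meaningful; the functions below are a hypothetical sketch, not measured costs):

```python
import math

# Per-processor message counts over s iterations, following the table:
# classical CG: s SpMV exchanges + s All-Reduce reductions,
# CA-CG: one matrix-powers exchange + one All-Reduce.
def messages_classical(s, p):
    return s + s * math.log2(p)

def messages_ca(s, p):
    return 1 + math.log2(p)

s, p = 10, 1024
print(messages_classical(s, p) / messages_ca(s, p))  # prints 10.0: an s-fold reduction
```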

Pages 31-33: From Theory to Practice

• Parameter s is limited by machine parameters and matrix sparsity structure

• We can auto-tune to find the best s based on these properties
   • That is, find the s that gives the fastest speed per iteration

• In practice, we don’t just care about speed per iteration, but also the number of iterations:

Runtime = (time/iteration) × (# iterations)

• We also need to consider how convergence rate and accuracy are affected by the choice of s!
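A toy model of this product (entirely made-up constants, for illustration only): per-iteration time falls roughly like latency/s as communication is amortized, while finite-precision effects inflate the iteration count with s, so the total runtime has an interior optimum:

```python
# Hypothetical cost model, not measured data: time/iteration decreases with s
# (latency amortized over the block), iteration count increases with s
# (convergence delay), so Runtime = (time/iteration) x (# iterations) dips
# at some intermediate s.
def runtime(s, flop_time=1.0, latency=20.0, base_iters=100, delay_per_s=10):
    time_per_iter = flop_time + latency / s
    iters = base_iters + delay_per_s * (s - 1)
    return time_per_iter * iters

best_s = min(range(1, 21), key=runtime)
print(best_s)   # an intermediate s minimizes total runtime
```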

Pages 34-41: From Theory to Practice

• CA-KSMs are mathematically equivalent to classical KSMs

• But they can behave much differently in finite precision!

• Roundoff error bounds generally grow with increasing s

• Two effects of roundoff error (recall Runtime = (time/iteration) × (# iterations)):

1. Decrease in accuracy → Tradeoff: increasing the blocking factor s past a certain point: accuracy limited

2. Delay of convergence → Tradeoff: increasing the blocking factor s past a certain point: no speedup expected

Pages 42-43: Paige’s Results for Classical Lanczos

• Using bounds on local rounding errors in Lanczos, Paige showed that:

1. The computed Ritz values always lie between the extreme eigenvalues of A to within a small multiple of machine precision.

2. At least one small interval containing an eigenvalue of A is found by the nth iteration.

3. The algorithm behaves numerically like Lanczos with full reorthogonalization until a very close eigenvalue approximation is found.

4. The loss of orthogonality among basis vectors follows a rigorous pattern and implies that some Ritz values have converged.

Do the same statements hold for CA-Lanczos?

Page 44: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Finite precision Lanczos process: (𝐴 is 𝑛 × 𝑛 with at most 𝑁 nonzeros per row)

𝐴 𝑉𝑚 = 𝑉𝑚 𝑇𝑚 + 𝛽𝑚+1 𝑣𝑚+1𝑒𝑚

𝑇 + 𝛿 𝑉𝑚

𝑉𝑚 = 𝑣1, … , 𝑣𝑚 , 𝛿 𝑉𝑚 = 𝛿 𝑣1, … , 𝛿 𝑣𝑚 , 𝑇𝑚 =

𝛼1 𝛽2

𝛽2 ⋱ ⋱

⋱ ⋱ 𝛽𝑚

𝛽𝑚 𝛼𝑚

Paige’s Lanczos Convergence Analysis

15

Page 45: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Finite precision Lanczos process: (𝐴 is 𝑛 × 𝑛 with at most 𝑁 nonzeros per row)

𝐴 𝑉𝑚 = 𝑉𝑚 𝑇𝑚 + 𝛽𝑚+1 𝑣𝑚+1𝑒𝑚

𝑇 + 𝛿 𝑉𝑚

𝑉𝑚 = 𝑣1, … , 𝑣𝑚 , 𝛿 𝑉𝑚 = 𝛿 𝑣1, … , 𝛿 𝑣𝑚 , 𝑇𝑚 =

𝛼1 𝛽2

𝛽2 ⋱ ⋱

⋱ ⋱ 𝛽𝑚

𝛽𝑚 𝛼𝑚

Paige’s Lanczos Convergence Analysis

for 𝑖 ∈ {1, … , 𝑚},𝛿 𝑣𝑖 2 ≤ 휀1𝜎

𝛽𝑖+1 𝑣𝑖𝑇 𝑣𝑖+1 ≤ 2휀0𝜎

𝑣𝑖+1𝑇 𝑣𝑖+1 − 1 ≤ 휀0 2

𝛽𝑖+12 + 𝛼𝑖

2 + 𝛽𝑖2 − 𝐴 𝑣𝑖 2

2 ≤ 4𝑖 3휀0 + 휀1 𝜎2

15

where 𝜎 ≡ 𝐴 2, and𝜃𝜎 ≡ 𝐴 2

Page 46: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Finite precision Lanczos process: (𝐴 is 𝑛 × 𝑛 with at most 𝑁 nonzeros per row)

𝐴 𝑉𝑚 = 𝑉𝑚 𝑇𝑚 + 𝛽𝑚+1 𝑣𝑚+1𝑒𝑚

𝑇 + 𝛿 𝑉𝑚

𝑉𝑚 = 𝑣1, … , 𝑣𝑚 , 𝛿 𝑉𝑚 = 𝛿 𝑣1, … , 𝛿 𝑣𝑚 , 𝑇𝑚 =

𝛼1 𝛽2

𝛽2 ⋱ ⋱

⋱ ⋱ 𝛽𝑚

𝛽𝑚 𝛼𝑚

Paige’s Lanczos Convergence Analysis

Classic Lanczos (Paige, 1976):

for 𝑖 ∈ {1, … , 𝑚},𝛿 𝑣𝑖 2 ≤ 휀1𝜎

𝛽𝑖+1 𝑣𝑖𝑇 𝑣𝑖+1 ≤ 2휀0𝜎

𝑣𝑖+1𝑇 𝑣𝑖+1 − 1 ≤ 휀0 2

𝛽𝑖+12 + 𝛼𝑖

2 + 𝛽𝑖2 − 𝐴 𝑣𝑖 2

2 ≤ 4𝑖 3휀0 + 휀1 𝜎2

15

where 𝜎 ≡ 𝐴 2, and𝜃𝜎 ≡ 𝐴 2

휀0 = 𝑂 휀𝑛

휀1 = 𝑂 휀𝑁𝜃

Page 47: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Finite precision Lanczos process: (𝐴 is 𝑛 × 𝑛 with at most 𝑁 nonzeros per row)

𝐴 𝑉𝑚 = 𝑉𝑚 𝑇𝑚 + 𝛽𝑚+1 𝑣𝑚+1𝑒𝑚

𝑇 + 𝛿 𝑉𝑚

𝑉𝑚 = 𝑣1, … , 𝑣𝑚 , 𝛿 𝑉𝑚 = 𝛿 𝑣1, … , 𝛿 𝑣𝑚 , 𝑇𝑚 =

𝛼1 𝛽2

𝛽2 ⋱ ⋱

⋱ ⋱ 𝛽𝑚

𝛽𝑚 𝛼𝑚

Paige’s Lanczos Convergence Analysis

Classic Lanczos (Paige, 1976):

for 𝑖 ∈ {1, … , 𝑚},𝛿 𝑣𝑖 2 ≤ 휀1𝜎

𝛽𝑖+1 𝑣𝑖𝑇 𝑣𝑖+1 ≤ 2휀0𝜎

𝑣𝑖+1𝑇 𝑣𝑖+1 − 1 ≤ 휀0 2

𝛽𝑖+12 + 𝛼𝑖

2 + 𝛽𝑖2 − 𝐴 𝑣𝑖 2

2 ≤ 4𝑖 3휀0 + 휀1 𝜎2

15

where 𝜎 ≡ 𝐴 2, and𝜃𝜎 ≡ 𝐴 2

CA-Lanczos:

휀0 = 𝑂 휀𝑛

휀1 = 𝑂 휀𝑁𝜃

휀0 = 𝑂 휀𝑛𝚪𝟐

휀1 = 𝑂 휀𝑁𝜃𝚪

Page 48: MS58: Approaches to Reducing Communication in Krylov ...erinc/ppt/Carson_SIAMLA15.pdf · Communication (Sophie M. Moufawad, Laura Grigori, Frederic Nataf) 3. Preconditioning Communication-Avoiding

Finite precision Lanczos process: (𝐴 is 𝑛 × 𝑛 with at most 𝑁 nonzeros per row)

𝐴 𝑉𝑚 = 𝑉𝑚 𝑇𝑚 + 𝛽𝑚+1 𝑣𝑚+1𝑒𝑚

𝑇 + 𝛿 𝑉𝑚

𝑉𝑚 = 𝑣1, … , 𝑣𝑚 , 𝛿 𝑉𝑚 = 𝛿 𝑣1, … , 𝛿 𝑣𝑚 , 𝑇𝑚 =

𝛼1 𝛽2

𝛽2 ⋱ ⋱

⋱ ⋱ 𝛽𝑚

𝛽𝑚 𝛼𝑚

Paige’s Lanczos Convergence Analysis

Classic Lanczos (Paige, 1976):

for 𝑖 ∈ {1, … , 𝑚},𝛿 𝑣𝑖 2 ≤ 휀1𝜎

𝛽𝑖+1 𝑣𝑖𝑇 𝑣𝑖+1 ≤ 2휀0𝜎

𝑣𝑖+1𝑇 𝑣𝑖+1 − 1 ≤ 휀0 2

𝛽𝑖+12 + 𝛼𝑖

2 + 𝛽𝑖2 − 𝐴 𝑣𝑖 2

2 ≤ 4𝑖 3휀0 + 휀1 𝜎2

15

where 𝜎 ≡ 𝐴 2, and𝜃𝜎 ≡ 𝐴 2

CA-Lanczos:

휀0 = 𝑂 휀𝑛

휀1 = 𝑂 휀𝑁𝜃

휀0 = 𝑂 휀𝑛𝚪𝟐

휀1 = 𝑂 휀𝑁𝜃𝚪

Γ ≤ maxℓ≤𝑘

𝒴ℓ+

2 ∙ 𝒴ℓ 2 ≤ 2𝑠+1 ∙ maxℓ≤𝑘

𝜅 𝒴ℓ

Page 49: The Amplification Term Γ

• Roundoff errors in the CA variant follow the same pattern as the classical variant, but amplified by a factor of Γ or Γ²

• Theoretically confirms empirical observations on the importance of basis conditioning (dating back to the late ‘80s)

• A loose bound for the amplification term:

Γ ≤ max_{ℓ≤k} ‖𝒴_ℓ^+‖_2 ∙ ‖ |𝒴_ℓ| ‖_2 ≤ (2s+1) ∙ max_{ℓ≤k} κ(𝒴_ℓ)

• What we really need: ‖ |𝒴| |y′| ‖_2 ≤ Γ ‖𝒴 y′‖_2 to hold for the computed basis 𝒴 and coordinate vector y′ in every bound.

• A tighter bound on Γ is possible; requires some light bookkeeping

• Example: for the bounds on β_{i+1} |v_i^T v_{i+1}| and |v_{i+1}^T v_{i+1} − 1|, we can use the definition

Γ_{k,j} ≡ max over x ∈ {w′_{k,j}, u′_{k,j}, v′_{k,j}, v′_{k,j−1}} of ‖ |𝒴_k| |x| ‖_2 / ‖𝒴_k x‖_2
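An illustrative NumPy experiment (not from the slides): for the poorly conditioned monomial basis, the loose bound ‖𝒴⁺‖₂ ∙ ‖|𝒴|‖₂ on Γ grows rapidly with s, matching the point that basis conditioning limits the usable blocking factor. The matrix and basis choice here are assumptions for demonstration:

```python
import numpy as np

# For the monomial basis Y = [v, Av, ..., A^s v], evaluate the loose
# amplification bound ||Y^+||_2 * || |Y| ||_2 as s grows.
rng = np.random.default_rng(4)
n = 200
A = np.diag(np.linspace(0.1, 100, n))   # diagonal SPD test matrix
v = rng.standard_normal(n)

gammas = []
for s in (2, 4, 8):
    cols = [v]
    for _ in range(s):
        cols.append(A @ cols[-1])       # matrix powers: column j is A^j v
    Y = np.column_stack(cols)
    bound = np.linalg.norm(np.linalg.pinv(Y), 2) * np.linalg.norm(np.abs(Y), 2)
    gammas.append(bound)
print(gammas)   # grows by orders of magnitude with s
```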

Pages 50-52: [Plots] Problem: 2D Poisson, n = 256, random starting vector. For s = 4, 8, 12, the computed values of |v_{i+1}^T v_{i+1} − 1| and β_{i+1} |v_i^T v_{i+1}| are plotted against their bounds, together with the amplification factor Γ².


Page 56:

Results for CA-Lanczos

18

• Back to our question: Do Paige's results (loss of orthogonality ⟺ eigenvalue convergence) hold for CA-Lanczos?

• The answer is YES! …but only if

    ε₀ ≡ 2ε(n + 11s + 15)Γ² ≤ 1/12,   i.e.,   Γ ≤ (24ε(n + 11s + 15))^{−1/2} = O((nε)^{−1/2})

• Otherwise orthogonality can be lost, e.g., due to computation with a (numerically) rank-deficient basis.

• Take-away: we can use this bound on Γ to design a better algorithm! Mixed precision, selective reorthogonalization, dynamic basis size, etc.
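One way the bound could feed back into algorithm design, as the take-away suggests, is sketched below (the selection loop and the names are illustrative, not the authors' algorithm): pick the largest basis length s whose ε₀ stays below 1/12.

```python
import numpy as np

def eps0(n, s, gamma, eps=np.finfo(np.float64).eps):
    # eps0 from the slide: 2*eps*(n + 11s + 15)*Gamma^2
    return 2.0 * eps * (n + 11 * s + 15) * gamma**2

def largest_admissible_s(n, gamma, s_max=32):
    # Largest basis length s keeping eps0 <= 1/12 (0 if even s = 1 fails)
    best = 0
    for s in range(1, s_max + 1):
        if eps0(n, s, gamma) <= 1.0 / 12.0:
            best = s
    return best

# A well-conditioned basis (Gamma near 1) admits large s; a badly
# amplifying basis forces s back toward classical Lanczos.
print(largest_admissible_s(n=256, gamma=1.0))  # -> 32
print(largest_admissible_s(n=256, gamma=1e7))  # -> 0
```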

Page 57:

Problem: Diagonal matrix with n = 100 and evenly spaced eigenvalues between λ_min = 0.1 and λ_max = 100; random starting vector; s = 2.

[Top plots: computed Γ² against the threshold (24ε(n + 11s + 15))^{−1}. Bottom plots: computed Ritz values, true eigenvalues, and bounds on the range of computed Ritz values]

Page 58:

Problem: Diagonal matrix with n = 100 and evenly spaced eigenvalues between λ_min = 0.1 and λ_max = 100; random starting vector; s = 4.

[Top plots: computed Γ² against the threshold (24ε(n + 11s + 15))^{−1}. Bottom plots: computed Ritz values, true eigenvalues, and bounds on the range of computed Ritz values]

Page 59:

Problem: Diagonal matrix with n = 100 and evenly spaced eigenvalues between λ_min = 0.1 and λ_max = 100; random starting vector; s = 12.

[Top plots: computed Γ² against the threshold (24ε(n + 11s + 15))^{−1}. Bottom plots: computed Ritz values, true eigenvalues, and bounds on the range of computed Ritz values]
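For reference, the diagonal test problem above is easy to reproduce; the sketch below (classical Lanczos without reorthogonalization; an illustrative setup, not the talk's code) computes the Ritz values at step m:

```python
import numpy as np

def lanczos_ritz(A, v0, m):
    # Classical Lanczos (no reorthogonalization); returns step-m Ritz values
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m + 1)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for i in range(m):
        w = A @ V[:, i] - beta[i] * V[:, i - 1]   # beta[0] term vanishes
        alpha[i] = V[:, i] @ w
        w -= alpha[i] * V[:, i]
        beta[i + 1] = np.linalg.norm(w)
        V[:, i + 1] = w / beta[i + 1]
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    return np.linalg.eigvalsh(T)

# Diagonal matrix with evenly spaced eigenvalues in [0.1, 100], as above
A = np.diag(np.linspace(0.1, 100.0, 100))
v0 = np.random.default_rng(0).standard_normal(100)
ritz = lanczos_ritz(A, v0, 40)
print(ritz.min() > 0.0 and ritz.max() < 100.0 + 1e-6)  # Ritz values stay in range
```

Without reorthogonalization, repeated ("ghost") copies of converged eigenvalues eventually appear among the Ritz values, which is exactly the finite-precision behavior the slides examine.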


Page 61:

Problem: Diagonal matrix with n = 100 and evenly spaced eigenvalues between λ_min = 0.1 and λ_max = 100; random starting vector.

[Plots: max_i |z_i^{(m)T} v_{m+1}|, a measure of loss of orthogonality, and min_i β_{m+1} |η_{m,i}^{(m)}|, a measure of Ritz value convergence]
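These two measures fall out of a Lanczos run directly; the hedged sketch below (illustrative names, and it builds classical Lanczos rather than CA-Lanczos) shows how the Ritz vectors z_i^{(m)} = V_m y_i and the last eigenvector components η_{m,i}^{(m)} come from the eigendecomposition of the tridiagonal T_m:

```python
import numpy as np

def lanczos(A, v0, m):
    # Classical Lanczos without reorthogonalization
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    alpha, beta = np.zeros(m), np.zeros(m + 1)
    V[:, 0] = v0 / np.linalg.norm(v0)
    for i in range(m):
        w = A @ V[:, i] - beta[i] * V[:, i - 1]
        alpha[i] = V[:, i] @ w
        w -= alpha[i] * V[:, i]
        beta[i + 1] = np.linalg.norm(w)
        V[:, i + 1] = w / beta[i + 1]
    return V, alpha, beta

def paige_measures(V, alpha, beta, m):
    # Eigendecomposition of T_m gives Ritz values theta_i and vectors y_i;
    # z_i = V_m y_i are the Ritz vectors, eta_{m,i} is y_i's last entry.
    T = np.diag(alpha) + np.diag(beta[1:m], 1) + np.diag(beta[1:m], -1)
    theta, Y = np.linalg.eigh(T)
    Z = V[:, :m] @ Y
    loss = np.max(np.abs(Z.T @ V[:, m]))          # max_i |z_i^T v_{m+1}|
    conv = np.min(beta[m] * np.abs(Y[m - 1, :]))  # min_i beta_{m+1}|eta_{m,i}|
    return loss, conv

A = np.diag(np.linspace(0.1, 100.0, 100))
V, alpha, beta = lanczos(A, np.random.default_rng(0).standard_normal(100), 60)
loss, conv = paige_measures(V, alpha, beta, 60)
print(loss, conv)  # orthogonality degrades exactly as Ritz pairs converge
```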



Page 70:

Extending the results of Greenbaum (1989):

21

Eigenvalue approximations generated at each step by a perturbed Lanczos recurrence for A are equal to those generated by exact Lanczos applied to a larger matrix whose eigenvalues lie within intervals about the eigenvalues of A.

[Diagram: intervals about each eigenvalue λ of A, of width O(εn³‖A‖) for classical Lanczos and O(εn³‖A‖Γ²) for CA-Lanczos]

Ongoing work…

Page 71:

Future Directions

22

• New Algorithms/Applications
  • Application of communication-avoiding ideas and solvers to new computational science domains
  • Design of new high-performance preconditioners

• Improving Usability
  • Automating parameter selection via "numerical auto-tuning"

• Finite-Precision Analysis
  • Bounds on stability and convergence for other Krylov methods (particularly in the nonsymmetric case)
  • Extension of "backwards-like" error analyses

Broad research agenda: design methods for large-scale problems that optimize performance subject to application-specific numerical constraints.

Page 72:

Thank you!

contact: [email protected]
http://www.cims.nyu.edu/~erinc/