Motivation
Gaussian Elimination
Parallel Implementation
Discussion
Parallel Implementations of Gaussian Elimination
Vasilije Perovic
Western Michigan University
January 27, 2012
Vasilije Perovic CS 6260: Gaussian Elimination in Parallel
Linear systems of equations
The general form of a linear system of equations is

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + \cdots + a_{mn}x_n &= b_m
\end{aligned}
\]

where the $a_{ij}$'s and $b_i$'s are known and we are solving for the $x_i$'s.
Ax = b
More compactly, we can rewrite the system of linear equations in the form

\[
Ax = b
\]

where

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \dots & a_{1n} \\
a_{21} & a_{22} & \dots & a_{2n} \\
\vdots & & \ddots & \vdots \\
a_{m1} & a_{m2} & \dots & a_{mn}
\end{pmatrix}, \quad
x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad
b = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix}
\]
Why study solutions of Ax = b
1. One of the most fundamental problems in the field of scientific computing.
2. Arises in many applications:
   - Chemical engineering
   - Interpolation
   - Structural analysis
   - Regression analysis
   - Numerical ODEs and PDEs
General Theory
Partial Pivoting
Sequential Algorithm
Methods for solving Ax = b
1. Direct methods – obtain the exact solution (in real arithmetic) in finitely many operations:
   - Gaussian elimination (LU factorization)
   - QR factorization
   - WZ factorization
2. Iterative methods – generate a sequence of approximations that converges in the limit to the solution:
   - Jacobi iteration
   - Gauss-Seidel iteration
   - SOR method (successive over-relaxation)
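As a small illustration of the contrast with direct methods, an iterative method can be run for a fixed number of sweeps and watched converge. A minimal Jacobi sketch (the 2x2 system is an arbitrary diagonally dominant example, not one from the slides):

```python
# Jacobi iteration: each sweep builds a new approximation from the previous
# one; for a diagonally dominant matrix the iterates converge to the solution.

def jacobi(A, b, iters=100):
    n = len(A)
    x = [0.0] * n
    for _ in range(iters):
        # every new component uses only the previous iterate (Jacobi, not Gauss-Seidel)
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]   # diagonally dominant, so Jacobi converges
b = [1.0, 2.0]
x = jacobi(A, b)
# exact solution of this system is x = (1/11, 7/11)
```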
Gaussian Elimination
When solving Ax = b we will assume throughout this presentation that A is non-singular and that A and b are known:

\[
\begin{aligned}
a_{11}x_1 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{n1}x_1 + \cdots + a_{nn}x_n &= b_n.
\end{aligned}
\]
Gaussian Elimination
Assuming that $a_{11} \neq 0$ we first subtract $a_{21}/a_{11}$ times the first equation from the second equation to eliminate the coefficient of $x_1$ in the second equation, and so on, until the coefficients of $x_1$ in the last $n-1$ rows have all been eliminated. This gives the modified system of equations

\[
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a^{(1)}_{22}x_2 + \cdots + a^{(1)}_{2n}x_n &= b^{(1)}_2 \\
&\;\;\vdots \\
a^{(1)}_{n2}x_2 + \cdots + a^{(1)}_{nn}x_n &= b^{(1)}_n,
\end{aligned}
\]

where

\[
a^{(1)}_{ij} = a_{ij} - \frac{a_{i1}}{a_{11}}\,a_{1j}, \qquad
b^{(1)}_i = b_i - \frac{a_{i1}}{a_{11}}\,b_1, \qquad
i, j = 2, \dots, n.
\]
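The update formulas can be checked numerically. A minimal sketch of the first elimination step, using the 3x3 example that appears later in the slides:

```python
# One step of Gaussian elimination: apply the updates
#   a1[i][j] = a[i][j] - (a[i][0]/a[0][0]) * a[0][j]
#   b1[i]    = b[i]    - (a[i][0]/a[0][0]) * b[0]
# to rows 2..n, which zeros out column 1 below the pivot.

a = [[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]]
b = [2.0, 3.0, 1.0]
n = len(a)

a1 = [row[:] for row in a]
b1 = b[:]
for i in range(1, n):
    m = a[i][0] / a[0][0]          # multiplier a_{i1}/a_{11}
    b1[i] = b[i] - m * b[0]
    for j in range(n):
        a1[i][j] = a[i][j] - m * a[0][j]
# rows 2 and 3 now start with a zero in column 1
```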
Gaussian Elimination (Forward Reduction)
Applying the same process to the last $n-1$ equations of the modified system eliminates the coefficients of $x_2$ in the last $n-2$ equations, and so on, until the entire system has been reduced to the (upper) triangular form

\[
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
 & a^{(1)}_{22} & \cdots & a^{(1)}_{2n} \\
 & & \ddots & \vdots \\
 & & & a^{(n-1)}_{nn}
\end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
=
\begin{pmatrix} b_1 \\ b^{(1)}_2 \\ \vdots \\ b^{(n-1)}_n \end{pmatrix}.
\]

• The superscripts indicate the number of times the elements had to be changed.
Gaussian Elimination – Example
• Perform the forward reduction on the system given below:

\[
\begin{pmatrix} 4 & -9 & 2 \\ 2 & -4 & 4 \\ -1 & 2 & 2 \end{pmatrix}
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
=
\begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}
\]

We start by writing down the augmented matrix for the given system:

\[
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 2 & -4 & 4 & 3 \\ -1 & 2 & 2 & 1 \end{array}\right)
\]
Gaussian Elimination – Example
We perform appropriate row operations to obtain:
\[
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 2 & -4 & 4 & 3 \\ -1 & 2 & 2 & 1 \end{array}\right)
\xrightarrow[R_3 - (\frac{-1}{4})R_1 \to R_3]{R_2 - (\frac{2}{4})R_1 \to R_2}
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 0 & 0.5 & 3 & 2 \\ 0 & -0.25 & 2.5 & 1.5 \end{array}\right)
\xrightarrow{R_3 - (\frac{-0.25}{0.5})R_2 \to R_3}
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 0 & 0.5 & 3 & 2 \\ 0 & 0 & 4 & 2.5 \end{array}\right)
\]
Gaussian Elimination – Example
Note that the row operations used to eliminate $x_1$ from the second and the third equations are equivalent to multiplying the augmented matrix on the left:

\[
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & 0 & 1 \end{pmatrix}}_{L_1}
\cdot
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 2 & -4 & 4 & 3 \\ -1 & 2 & 2 & 1 \end{array}\right)
=
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 0 & 0.5 & 3 & 2 \\ 0 & -0.25 & 2.5 & 1.5 \end{array}\right)
\]

Similarly, eliminating $x_2$ from the third equation corresponds to multiplying by $L_2$:

\[
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0.5 & 1 \end{pmatrix}}_{L_2}
\cdot
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & 0 & 1 \end{pmatrix}}_{L_1}
\cdot [A \,|\, b]
=
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 0 & 0.5 & 3 & 2 \\ 0 & 0 & 4 & 2.5 \end{array}\right)
\]
Gaussian Elimination – Example
\[
\underbrace{
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0.5 & 1 \end{pmatrix}}_{L_2}
\cdot
\underbrace{\begin{pmatrix} 1 & 0 & 0 \\ -0.5 & 1 & 0 \\ 0.25 & 0 & 1 \end{pmatrix}}_{L_1}
}_{\widetilde{L} \,=\, L_2 \cdot L_1}
\cdot [A \,|\, b]
=
\left(\begin{array}{ccc|c} 4 & -9 & 2 & 2 \\ 0 & 0.5 & 3 & 2 \\ 0 & 0 & 4 & 2.5 \end{array}\right)
\]

Thus, we can write

\[
(L_2 \cdot L_1) \cdot A = \widetilde{L} \cdot A = U.
\]
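That $\widetilde{L} = L_2 \cdot L_1$ really reproduces the triangular factor can be verified directly; a small sketch using the matrices above:

```python
# Multiply the augmented example matrix by L1 and then L2 and check that the
# product is the upper triangular U obtained by the row operations.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A  = [[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]]
L1 = [[1.0, 0.0, 0.0], [-0.5, 1.0, 0.0], [0.25, 0.0, 1.0]]
L2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.5, 1.0]]

U = matmul(L2, matmul(L1, A))
# U should equal [[4, -9, 2], [0, 0.5, 3], [0, 0, 4]]
```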
LU factorization
In general, we can think of row operations as multiplying by matrices on the left; thus the $i$-th elimination step is equivalent to multiplication on the left by

\[
L_i = \begin{pmatrix}
1 & & & & \\
& \ddots & & & \\
& & 1 & & \\
& & -l_{i+1,i} & \ddots & \\
& & \vdots & & \\
& & -l_{n,i} & & 1
\end{pmatrix}.
\]
LU factorization
Continuing in this fashion we obtain

\[
\widetilde{L}Ax = \widetilde{L}b, \qquad \widetilde{L} = L_{n-1} \cdots L_2 L_1, \qquad
A = \widetilde{L}^{-1}U = LU.
\]

Thus, the Gaussian elimination algorithm for solving Ax = b is mathematically equivalent to the three-step process:

1. Factor A = LU
2. Solve Ly = b (forward substitution)
3. Solve Ux = y (back substitution).

• Order of operations:

\[
\frac{n^3}{3} + n^2 - \frac{n}{3} \ \text{multiplications/divisions}, \qquad
\frac{n^3}{3} + \frac{n^2}{2} - \frac{5n}{6} \ \text{additions/subtractions}.
\]
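The three-step process and the multiplication/division count can be exercised together. A minimal sketch (Doolittle-style factorization without pivoting; assumes nonzero pivots), with an operation counter checked against $n^3/3 + n^2 - n/3$:

```python
# Solve Ax = b via: factor A = LU, forward-substitute Ly = b, back-substitute
# Ux = y, counting every multiplication and division along the way.

def lu_solve(A, b):
    n = len(A)
    U = [row[:] for row in A]
    L = [[0.0] * n for _ in range(n)]
    ops = 0
    # 1. factor A = LU (unit diagonal on L, no pivoting)
    for k in range(n):
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]; ops += 1
            U[i][k] = 0.0
            for j in range(k + 1, n):
                U[i][j] -= L[i][k] * U[k][j]; ops += 1
    # 2. forward substitution Ly = b (no divisions: unit diagonal)
    y = b[:]
    for i in range(n):
        for j in range(i):
            y[i] -= L[i][j] * y[j]; ops += 1
    # 3. back substitution Ux = y
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = y[i]
        for j in range(i + 1, n):
            x[i] -= U[i][j] * x[j]; ops += 1
        x[i] /= U[i][i]; ops += 1
    return x, ops

x, ops = lu_solve([[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]],
                  [2.0, 3.0, 1.0])
# for n = 3 the count n^3/3 + n^2 - n/3 evaluates to 17
```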
Need for partial pivoting
Apply LU factorization without pivoting to

\[
A = \begin{pmatrix} 0.0001 & 1 \\ 1 & 1 \end{pmatrix}
\]

in three-decimal-digit floating point arithmetic.
\[
A = \begin{pmatrix} 10^{-4} & 1 \\ 1 & 1 \end{pmatrix}
\]

Solution: L and U are easily obtainable and are given by

\[
L = \begin{pmatrix} 1 & 0 \\ \mathrm{fl}(1/10^{-4}) & 1 \end{pmatrix},
\qquad \mathrm{fl}(1/10^{-4}) \text{ rounds to } 10^4,
\]

\[
U = \begin{pmatrix} 10^{-4} & 1 \\ 0 & \mathrm{fl}(1 - 10^4 \cdot 1) \end{pmatrix},
\qquad \mathrm{fl}(1 - 10^4 \cdot 1) \text{ rounds to } -10^4,
\]

so

\[
LU = \begin{pmatrix} 1 & 0 \\ 10^4 & 1 \end{pmatrix}
\begin{pmatrix} 10^{-4} & 1 \\ 0 & -10^4 \end{pmatrix}
= \begin{pmatrix} 10^{-4} & 1 \\ 1 & 0 \end{pmatrix},
\quad \text{but} \quad
A = \begin{pmatrix} 10^{-4} & 1 \\ 1 & 1 \end{pmatrix}.
\]
\[
A = \begin{pmatrix} 10^{-4} & 1 \\ 1 & 1 \end{pmatrix}
\quad \text{but} \quad
LU = \begin{pmatrix} 10^{-4} & 1 \\ 1 & 0 \end{pmatrix}
\]

Remark 1: Note that the original $a_{22}$ has been entirely "lost" from the computation by subtracting $10^4$ from it. Thus, if we were to use this LU factorization to solve a system, there would be no way to guarantee an accurate answer. This is called numerical instability.

Remark 2: Suppose we attempted to solve the system $Ax = [1, 2]^T$ for $x$ using this LU decomposition. The correct answer is $x \approx [1, 1]^T$. Instead, if we were to stick with three-digit floating point arithmetic, we would get the answer $\hat{x} = [0, 1]^T$, which is completely erroneous.
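The failure can be reproduced by rounding every intermediate result to three significant digits, a rough model of the slide's three-decimal-digit arithmetic; the helper names below are illustrative:

```python
# Simulate three-significant-digit floating point by rounding after every
# operation. Without pivoting, the computed solution of Ax = [1, 2]^T
# collapses to [0, 1]; after swapping the rows first (partial pivoting),
# it becomes the accurate [1, 1].

def fl(v):
    return float('%.2e' % v)   # keep 3 significant digits

def solve_2x2(a11, a12, a21, a22, b1, b2):
    # one elimination step followed by back substitution, rounding throughout
    l21 = fl(a21 / a11)
    u22 = fl(a22 - fl(l21 * a12))
    y2 = fl(b2 - fl(l21 * b1))
    x2 = fl(y2 / u22)
    x1 = fl(fl(b1 - fl(a12 * x2)) / a11)
    return x1, x2

bad = solve_2x2(1e-4, 1.0, 1.0, 1.0, 1.0, 2.0)    # pivot is 10^-4
good = solve_2x2(1.0, 1.0, 1e-4, 1.0, 2.0, 1.0)   # rows swapped first
# bad  == (0.0, 1.0): the erroneous answer from the slide
# good == (1.0, 1.0): close to the true solution (~1.0001, ~0.9999)
```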
Sequential Algorithm
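A standard sequential version of the algorithm (forward reduction with partial pivoting on the augmented matrix, followed by back substitution) can be sketched as follows; it is a textbook formulation, not a transcription of the slide:

```python
# Sequential Gaussian elimination with partial pivoting on [A | b],
# followed by back substitution.

def gauss_solve(A, b):
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix
    for k in range(n - 1):
        # partial pivoting: bring the largest |entry| in column k to row k
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]                  # elimination factor
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # back substitution
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

x = gauss_solve([[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]],
                [2.0, 3.0, 1.0])
# x is approximately [0.75, 0.25, 0.625] for the slides' example system
```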
Row-Oriented Algorithm
Pipelined Implementation
Row-Oriented Algorithm
1. Determination of the local pivot element,
2. Determination of the global pivot element,
3. Exchange of the pivot row,
4. Distribution of the pivot row,
5. Computation of the elimination factors,
6. Computation of the matrix elements.
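Steps 1-4 can be sketched in isolation; here each "process" is just a set of row indices, and the broadcast of step 4 is modeled by returning the pivot row (an illustrative simulation, not message-passing code):

```python
# Steps 1-4 of the row-oriented algorithm, simulated on row-index blocks.

def pivot_phase(M, k, parts):
    # 1. local pivot: each "process" scans its own rows at or below row k
    local = []
    for rows in parts:
        cand = [i for i in rows if i >= k]
        if cand:
            local.append(max(cand, key=lambda i: abs(M[i][k])))
    # 2. global pivot: a max-reduction over the local candidates
    g = max(local, key=lambda i: abs(M[i][k]))
    # 3. exchange the pivot row with row k
    M[k], M[g] = M[g], M[k]
    # 4. distribute the pivot row (a broadcast in a real implementation);
    #    steps 5-6 (factors and updates) then proceed as in the sequential code
    return M[k]

M = [[1.0, 2.0], [4.0, 3.0], [-5.0, 1.0], [2.0, 2.0]]
parts = [{0, 1}, {2, 3}]       # block distribution of rows over 2 "processes"
row = pivot_phase(M, 0, parts)
# row is [-5.0, 1.0]: the entry of largest magnitude in column 1 was in row 3
```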
Remark 1: The computation of the solution vector x in the backward substitution is inherently serial, since the values of x_k depend on each other and are computed one after another. Thus, in step k the processor P_q owning row k computes the value of x_k and sends that value to all other processors by a single broadcast operation.

Remark 2: Note that the data distribution is not quite adequate. For example, suppose we are at the i-th step and that i > m · n/p, where m is a natural number. In that case, processors p_0, p_1, ..., p_{m-1} are idle, since all their work is done until the back substitution step at the very end. Hence there is an issue with load balancing.
A remedy for this load-balancing issue is a block-cyclic row distribution.
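The difference between the two distributions can be seen by counting how many processes still own unreduced rows at step k. A small model with illustrative parameters (n = 8 rows, p = 2 processes, block size 2):

```python
# Row-to-process maps for a pure block and a block-cyclic distribution, and
# the number of processes still active (owning some row >= k) at step k.

def block_owner(i, n, p):
    return i // (n // p)                # first n/p rows -> P0, next chunk -> P1, ...

def block_cyclic_owner(i, blk, p):
    return (i // blk) % p               # blocks of rows dealt out cyclically

def active(owner, n, k):
    return len({owner(i) for i in range(k, n)})

n, p, blk = 8, 2, 2
# by step k = 4 the block distribution leaves P0 idle ...
a_block = active(lambda i: block_owner(i, n, p), n, 4)
# ... while the block-cyclic distribution keeps both processes busy
a_cyclic = active(lambda i: block_cyclic_owner(i, blk, p), n, 4)
```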
Remark 3: We could also consider a column-oriented data distribution. In that case, at the k-th step the processor that owns the k-th column would have all the data needed to compute the new pivot. On the one hand, this would reduce the communication between processors that we had with the row-oriented data distribution. On the other hand, with column orientation the pivot determination is done entirely serially. Thus, in the case when n >> p (which is often the case), the row-oriented data distribution might be more advantageous, since the pivot determination is done in parallel.
Pipelined Implementation
1. Challenges and possible approaches
   - Send-ahead strategy
   - Challenges with pivoting
   - Column pivoting
2. Advantages
   - It allows asynchronous execution: "processes can reduce their portions of the augmented matrix as soon as the pivot rows are available"
   - It allows processes to overlap communication and computation times.
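The send-ahead idea can be sketched with threads and queues: the owner of the pivot row posts it to the other processes as soon as it is ready, and each process reduces its rows as pivot rows arrive. This is an illustrative simulation only (cyclic row distribution, no pivoting, broadcast instead of a ring):

```python
# Each "process" is a thread owning rows i with i % p == t (cyclic layout).
# The owner of row k sends the pivot row as soon as it is fully reduced, so
# receivers can start updating without waiting for a global synchronization.
import threading
from queue import Queue

def worker(t, rows, n, p, queues):
    inbox = {}
    for k in range(n):
        if k % p == t:
            pivot = rows[k]
            for s in range(p):             # "send ahead" to every other process
                if s != t:
                    queues[s].put((k, pivot[:]))
        else:
            while k not in inbox:          # receive (possibly out of order)
                kk, pr = queues[t].get()
                inbox[kk] = pr
            pivot = inbox.pop(k)
        for i in range(t, n, p):           # reduce the locally owned rows
            if i > k:
                f = rows[i][k] / pivot[k]
                for j in range(k, n + 1):
                    rows[i][j] -= f * pivot[j]

def pipelined_solve(A, b, p=2):
    n = len(A)
    rows = [row[:] + [bi] for row, bi in zip(A, b)]
    queues = [Queue() for _ in range(p)]
    ts = [threading.Thread(target=worker, args=(t, rows, n, p, queues))
          for t in range(p)]
    for t in ts: t.start()
    for t in ts: t.join()
    x = [0.0] * n                          # serial back substitution at the end
    for i in reversed(range(n)):
        x[i] = (rows[i][n] - sum(rows[i][j] * x[j]
                                 for j in range(i + 1, n))) / rows[i][i]
    return x

x = pipelined_solve([[4.0, -9.0, 2.0], [2.0, -4.0, 4.0], [-1.0, 2.0, 2.0]],
                    [2.0, 3.0, 1.0])
```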
Final Remarks
- Compute communication and computation times.
  - Largely depends on what kind of distributed system is used (usual configurations that have been studied are 1D and 2D meshes as well as hypercubes).
- Compute isoefficiency functions and determine scalability.
  - Row (column) oriented algorithms: stable but not scalable.
  - Pipelined implementation is perfectly scalable.
- Structured systems
  - Symmetric/Hermitian.
  - Banded and upper (lower) triangular.
  - Positive definite (CG method).
- Sparse systems and iterative methods (numerical PDEs)