Parallel Implementation of the Inversion of Polynomial Matrices Alina Solovyova-Vincent March 26,...

Post on 21-Dec-2015


Parallel Implementation of the Inversion of Polynomial Matrices

Alina Solovyova-Vincent

March 26, 2003

A thesis submitted in partial fulfillment of the requirements for the degree of Master of Science with a major in Computer Science.

Acknowledgments

I would like to thank Dr. Harris for his generous help and support.

I would like to thank my committee members, Dr. Kongmunvattana and Dr. Fadali, for their time and helpful comments.

Overview

Introduction
Existing algorithms
Busłowicz’s algorithm
Parallel algorithm
Results
Conclusions and future work

Definitions

A polynomial matrix is a matrix which has polynomials in all of its entries.

$H(s) = H_n s^n + H_{n-1} s^{n-1} + H_{n-2} s^{n-2} + \dots + H_0$,

where the $H_i$ are constant $r \times r$ matrices, $i = 0, \dots, n$.

Definitions

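Example: $H(s) = \begin{bmatrix} s+2 & s^3 + 3s^2 + s \\ s^3 & s^2 + 1 \end{bmatrix}$

$n = 3$: degree of the polynomial matrix

$r = 2$: size of the matrix $H$

$H_0 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}, \quad H_1 = \begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}, \quad \dots$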

Definitions

$H^{-1}(s)$: the inverse of the matrix $H(s)$

One way to calculate it:

$H^{-1}(s) = \operatorname{adj} H(s) / \det H(s)$

Definitions

A rational matrix can be expressed as a ratio of a numerator polynomial matrix and a denominator scalar polynomial.

Who Needs It???

Multivariable control systems
Analysis of power systems
Robust stability analysis
Design of linear decoupling controllers
… and many more areas.

Existing Algorithms

Leverrier’s algorithm (1840): resolvent matrix $[sI - H]^{-1}$

Two families of methods: exact algorithms and approximation methods

The Selection of the Algorithm

Before Busłowicz’s algorithm (1980):
large degree of polynomial operations;
lengthy calculations;
not very general.

After Busłowicz’s algorithm: some improvements, at the cost of increased computational complexity.

Busłowicz’s Algorithm

Benefits:
More general than methods proposed earlier
Only requires operations on constant matrices
Suitable for computer programming

Drawback: the irreducible form cannot be ensured in general (the numerator matrix and the denominator polynomial may share common factors).

Details of the Algorithm

Available upon request

Challenges Encountered (sequential)

Several inconsistencies in the original paper:

Challenges Encountered (parallel)

Dependent loops:

for (i = 2; i < r + 1; i++) {            /* dependent: iteration i needs R[i-1] */
    for (k = 0; k < n * i + 1; k++) {
        /* calculations requiring R[i-1][k] */
    }
}

$O(n^2 r^4)$

Challenges Encountered (parallel)

Loops of variable length

for (k = 0; k < n * i + 1; k++) {
    for (ll = 0; ll < min + 1; ll++) {   /* min varies with k */
        /* main calculations */
    }
}

Shared and Distributed Memory

Main difference: synchronization of the processes

Shared memory: barrier
Distributed memory: data exchange

for (i = 2; i < r + 1; i++) {
    /* calculations requiring R[i-1] */
    /* synchronization point */
}

Platforms

Distributed memory platforms:

SGI O2 NOW: MIPS R5000, 180 MHz
P IV NOW: 1.8 GHz
P III Cluster: 1 GHz
P IV Cluster: Xeon, 2.2 GHz

Platforms

Shared memory platforms:

SGI Power Challenge 10000: 8 MIPS R10000 processors

SGI Origin 2000: 16 MIPS R12000 processors, 300 MHz

Understanding the Results

n: degree of the polynomial matrix (at most 25)
r: size of the matrix (at most 25)
Sequential algorithm: $O(n^2 r^5)$
Times are averages of multiple runs
Runs were made on unloaded platforms

Sequential Run Times (n=25, r=25)

Platform               Time (s)
SGI O2 NOW              2645.30
P IV NOW                  22.94
P III Cluster             26.10
P IV Cluster              18.75
SGI Power Challenge      913.99
SGI Origin 2000          552.95

Results – Distributed Memory

Figure: Speedup

SGI O2 NOW: slowdown

P IV NOW: minimal speedup

Figure: Speedup (P III and P IV Clusters)

Results – Shared Memory

Excellent results!!!

Figure: Speedup (SGI Power Challenge)

Figure: Speedup (SGI Origin 2000)

Superlinear speedup!

Figure: Run times (SGI Power Challenge), 8 processors

Figure: Run times (SGI Origin 2000), n = 25

Figure: Run times (SGI Power Challenge), r = 20

Efficiency

Processors              2      4      6      8      16     24
P III Cluster           89.7%  76.5%  61.3%  58.5%  40.1%  25.0%
P IV Cluster            88.3%  68.2%  49.9%  46.9%  26.1%  15.5%
SGI Power Challenge     99.7%  98.2%  97.9%  95.8%  n/a    n/a
SGI Origin 2000         99.9%  98.7%  99.0%  98.2%  93.8%  n/a

Conclusions

We have performed an exhaustive search of all available algorithms.
We have implemented the sequential version of Busłowicz’s algorithm.
We have implemented two versions of the parallel algorithm.
We have tested the parallel algorithm on 6 different platforms.
We have obtained excellent speedup and efficiency in a shared memory environment.

Future Work

Study the behavior of the algorithm for larger problem sizes (distributed memory).

Re-evaluate message passing in distributed memory implementation.

Extend Busłowicz’s algorithm to the inversion of multivariable polynomial matrices $H(s_1, s_2, \ldots, s_k)$.

Questions