ANL--82-30 DE82 019798


Transcript of ANL--82-30 DE82 019798


Distribution Category: Mathematics and Computers (UC-32)

ANL-82-30

DE82 019798

ARGONNE NATIONAL LABORATORY
9700 South Cass Avenue
Argonne, Illinois 60439

LINPACK WORKING NOTE #15
LINPACK - A PACKAGE FOR SOLVING LINEAR SYSTEMS

by

J. J. Dongarra and G. W. Stewart*

Applied Mathematics Division

DISCLAIMER

May 1982

*University of Maryland, College Park, Maryland 20742


CONTENTS

ABSTRACT ................................................ iv
INTRODUCTION ............................................. 1
HISTORY .................................................. 2
LINPACK AND MATRIX DECOMPOSITIONS ........................ 3
NOMENCLATURE AND CONVENTIONS ............................. 6
THE SQUARE PART OF LINPACK ............................... 8
THE LEAST SQUARES PART OF LINPACK ....................... 13
NUMERICAL PROPERTIES .................................... 17
EFFICIENCY .............................................. 19
DESIGN AND IMPLEMENTATION ............................... 22
TESTING ................................................. 26
CONCLUSIONS ............................................. 29
ACKNOWLEDGMENTS ......................................... 31
REFERENCES .............................................. 31


ABSTRACT

This paper describes the design, development, and use of the software package called LINPACK, a collection of subroutines to solve various systems of simultaneous linear algebraic equations. The package has been designed to be machine-independent and fully portable and to run efficiently in many operating environments.


LINPACK WORKING NOTE #15
LINPACK - A PACKAGE FOR SOLVING LINEAR SYSTEMS

J. J. Dongarra*
Applied Mathematics Division
Argonne National Laboratory

Argonne, Illinois 60439

G. W. Stewart†
Department of Computer Science

University of Maryland
College Park, Maryland 20742

INTRODUCTION

LINPACK is a collection of Fortran subroutines that analyze and solve linear equations and linear least squares problems. The package solves linear systems whose matrices are general, banded, symmetric indefinite, symmetric positive definite, triangular, and tridiagonal. In addition, the package computes the QR and singular value decompositions of rectangular matrices and applies them to least squares problems.

The software for LINPACK can be obtained from either

National Energy Software Center (NESC)
Argonne National Laboratory
9700 South Cass Avenue
Argonne, Illinois 60439
Phone: 312-972-7250
Cost: Determined by NESC policy

or

IMSL
Sixth Floor, NBC Building
7500 Bellaire Boulevard
Houston, Texas 77038
Phone: 713-772-1927
Cost: $75.00 (Tape included)

Requestors in European Organization for Economic Cooperation andDevelopment countries may obtain the software by writing to

*Work supported in part by the Applied Mathematical Sciences Research Program (KC-04-02) of the Office of Energy Research of the U.S. Department of Energy under Contract W-31-109-Eng-38.
†Work supported in part by the Computer Science Section of the National Science Foundation under Contract MCS 760327.


NEA Data Bank
B.P. No. 9 (Bat. 45)
F-91191 Gif-sur-Yvette
France
Cost: Free.

The documentation for the codes is contained in the following book:

J.J. Dongarra, J.R. Bunch, C.B. Moler, G.W. Stewart,
LINPACK Users' Guide,
Society for Industrial and Applied Mathematics (1979)
Cost: $17.00.

HISTORY

In June 1974 Jim Pool, then director of the Research Section of the Applied Mathematics Division (AMD) at Argonne National Laboratory, initiated informal meetings to consider producing a package of high-quality software for the solution of linear systems and related problems. The participants included members of the AMD staff, visiting scientists, and various consultants and speakers at AMD colloquia. It was decided that such a package was needed and that a secure technological basis was available for its production.

In February 1975 the participants met at Argonne to lay the groundwork for the project and hammer out what was and was not to be included in the package. A proposal was submitted to the National Science Foundation (NSF) in August 1975. NSF agreed to fund such a project for three years beginning January 1976; the Department of Energy also provided support at Argonne.

The package was developed by four participants working from their respective institutions:

J.J. Dongarra   Argonne National Laboratory
J.R. Bunch      University of California at San Diego
C.B. Moler      University of New Mexico
G.W. Stewart    University of Maryland

Argonne served as a center for the project. The participants met there summers to coordinate their work. In addition, Argonne provided administrative and technical support; it was responsible for collecting and editing the programs as well as distributing them to test sites.

In the summer of 1976 the members of the project visited Argonne for a month. They brought with them a first draft of the software and documentation. It became clear that more uniformity was needed to create a coherent package. After much discussion and agonizing, formats and conventions for the package were established. By the end of that summer an early version of the codes emerged, along with rough documentation.

The fall of 1976 and winter of 1977 saw the further development of the package. The participants worked on their respective parts at their home institutions and met once during the winter to discuss progress.

During the summer of 1977 the participants developed a set of test programs to support the codes and documentation that were to become LINPACK. This set was sent to 26 test sites in the fall of 1977. The test sites included universities, government laboratories, and private industry. In addition to running the test programs on their local computers and reporting any problems that occurred, the sites also installed the package at their computer centers and announced it to their user communities. Thus, the package received real user testing on a wide variety of systems.

As a result of the testing, in mid 1978 some changes were incorporated into the package, and a second test version was sent to the test sites. The programs were retested and timing information returned for the LINPACK routines on various systems.

By the end of 1978 the codes were sent to NESC and IMSL for distribution, and the users' guide was completed and sent to SIAM for printing. At the beginning of 1979 the programs and the documentation were available to the public.

LINPACK AND MATRIX DECOMPOSITIONS

LINPACK is based on a decompositional approach to numerical linear algebra. The general idea is the following. Given a problem involving a matrix A, one factors or decomposes A into a product of simpler matrices from which the problem can easily be solved. This divides the computational problem into two parts: first the computation of an appropriate decomposition, then its use in solving the problem at hand. Since LINPACK is organized around matrix decompositions, it is appropriate to begin with a general discussion of the decompositional approach to numerical linear algebra.

Consider the problem of solving the linear system

Ax = b,    (1)

where A is a nonsingular matrix of order n. In older textbooks this problem is treated by writing (1) as a system of scalar equations and eliminating unknowns in such a way that the system becomes upper triangular (Gaussian elimination) or even diagonal (Gauss-Jordan elimination). This approach has the advantage that it is easy to understand and that it leads to pretty computational tableaus suitable for hand calculation. However, it has the drawback that the level of detail obscures the very broad applicability of the method.

In contrast, the computational approach begins with the observation that it is possible to factor A in the form

A = LU, (2)

where L is a lower triangular matrix with ones on its diagonal and U is upper triangular.* The solution to (1) can then be written in the form

x = A^-1 b = U^-1 L^-1 b = U^-1 c,

where c = L^-1 b. This suggests the following algorithm for solving (1):

1: Factor A in the form (2);

2: Solve the system Lc = b;    (3)

3: Solve the system Ux = c.

Since both L and U are triangular, steps 2 and 3 of the above algorithm are easily done.
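The three steps of (3) can be sketched in a few lines. The following is an illustrative pure-Python sketch, not the LINPACK code itself; the names lu_factor and lu_solve are invented for this example (LINPACK's SGEFA and SGESL do the equivalent work in Fortran, overwriting a single array).

```python
# Sketch of algorithm (3): LU factorization with partial pivoting,
# then the two triangular solves.  Pure Python on lists of lists.

def lu_factor(A):
    """Return a copy of A overwritten with L (unit lower, below the
    diagonal) and U, plus the row permutation chosen while pivoting."""
    n = len(A)
    LU = [row[:] for row in A]
    piv = list(range(n))
    for k in range(n):
        # Partial pivoting: bring the largest pivot candidate to row k.
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        if p != k:
            LU[k], LU[p] = LU[p], LU[k]
            piv[k], piv[p] = piv[p], piv[k]
        for i in range(k + 1, n):
            m = LU[i][k] / LU[k][k]        # multiplier, stored as L(i,k)
            LU[i][k] = m
            for j in range(k + 1, n):
                LU[i][j] -= m * LU[k][j]
    return LU, piv

def lu_solve(LU, piv, b):
    """Steps 2 and 3 of (3): solve Lc = Pb, then Ux = c."""
    n = len(LU)
    x = [b[p] for p in piv]                # apply the row interchanges
    for i in range(n):                     # forward substitution (Lc = Pb)
        x[i] -= sum(LU[i][j] * x[j] for j in range(i))
    for i in reversed(range(n)):           # back substitution (Ux = c)
        x[i] = (x[i] - sum(LU[i][j] * x[j] for j in range(i + 1, n))) / LU[i][i]
    return x

A = [[2.0, 1.0, 1.0],
     [4.0, 3.0, 3.0],
     [8.0, 7.0, 9.0]]
LU, piv = lu_factor(A)
x = lu_solve(LU, piv, [4.0, 10.0, 24.0])   # solves Ax = b; x is [1, 1, 1]
```

Once lu_factor has been called, lu_solve may be called repeatedly with new right-hand sides, which is exactly the economy discussed below.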

The approach to matrix computations through decompositions has turned out to be quite fruitful. Here are some of the advantages. First, the approach separates the computation into two stages: the computation of a decomposition and the use of the decomposition to solve the problem at hand. These stages are exemplified by the contrast between statement 1 and statements 2 and 3 of (3). In particular, it means that the decomposition can be used repeatedly to solve new problems. For example, if (1) must be solved for many right-hand sides b, it is necessary to factor A only once. This may represent an enormous savings, since the factorization of A is an O(n^3) process, whereas steps 2 and 3 of (3) require only O(n^2) operations.

* This is not strictly true. It may be necessary to permute the rows of A (a process called pivoting) in order to ensure the existence of the factorization (2). In finite precision arithmetic, pivoting must be done to ensure numerical stability.

Second, the approach suggests ways of avoiding the explicit computation of matrix inverses or generalized inverses. This is important because the first thing a computationally naive person thinks of when faced with a formula like x = A^-1 b is to invert and multiply; and such a procedure is always computationally expensive and numerically risky.

Third, a decomposition practically begs for new jobs to do. For example, from (2) and the fact that det(L) = 1, it follows that

det(A) = det(L) det(U) = det(U).

Since U is triangular, det(A) is just the product of the diagonal elements of U. As another example, consider the problem of solving the transposed system A^T x = b. Since x = A^-T b = (LU)^-T b = L^-T U^-T b, this system may be solved by replacing statements 2 and 3 in (3) with

2': Solve U^T c = b;

3': Solve L^T x = c.

Note that it is not at all trivial to see how row elimination as it is usually presented can be adapted to solve transposed systems.

Fourth, the decompositional approach introduces flexibility into matrix computations. There are many decompositions, and a knowledgeable person can select the one best suited to his application.

Fifth, if one is given a decomposition of a matrix A and a simple change is made in A (e.g., the alteration of a row or column), one frequently can compute the decomposition of the altered matrix from the original decomposition at far less cost than the ab initio computation of the decomposition. This general idea of updating a decomposition has been an important theme during the past decade of numerical linear algebra research.

Finally, the decompositional approach provides theoretical simplification and unification. This is true both inside and outside of numerical analysis. For example, the realization that the Crout, Doolittle, and square root methods all compute LU decompositions enables one to recognize that they are all variants of Gaussian elimination. Outside of numerical analysis, the spectral decomposition has long been used by statisticians as a canonical form for multivariate models.

LINPACK is organized around four matrix decompositions: the LU decomposition, the (pivoted) Cholesky decomposition, the QR decomposition, and the singular value decomposition. The term LU decomposition is used here in a very general sense to mean the factorization of a square matrix into a lower triangular part and an upper triangular part, perhaps with some pivoting. These decompositions will be treated at greater length later, when the actual LINPACK subroutines are discussed. But first a digression on nomenclature and organization is necessary.

NOMENCLATURE AND CONVENTIONS

The name of a LINPACK subroutine is divided into a prefix, an infix, and a suffix as follows:

TXXYY

The prefix T signifies the type of arithmetic and takes the following values:

S  single precision
D  double precision
C  complex

Where it is supported, a prefix of Z, signifying double precision complex arithmetic, is permitted.

The infix XX is used in two different ways, which reflects a fundamental division in the LINPACK subroutines. The first group of codes is concerned principally with solving the system Ax = b for a square matrix A, and incidentally with computing condition numbers, determinants, and inverses. Although all of these codes use variations of the LU decomposition, the user is rarely interested in the decomposition itself. However, the structural properties of the matrix, such as symmetry and bandedness, make a great deal of difference in arithmetic and storage costs. Accordingly, the infix XX in this part of LINPACK* is used to designate the structure of the matrix.

* Because this part deals exclusively with square matrices, it will be called the square part.


In the square part the infix can have the following values:

GE  General matrix - no assumptions about the structure
GB  General banded matrix
PO  Symmetric positive definite matrix
PP  Symmetric positive definite matrix in packed storage format
PB  Symmetric positive definite banded matrix
SI  Symmetric indefinite matrix
SP  Symmetric indefinite matrix in packed storage format
HI  Hermitian indefinite matrix (with prefix C or Z only)
HP  Hermitian indefinite matrix in packed storage format (with prefix C or Z only)
GT  General tridiagonal matrix
PT  Positive definite tridiagonal matrix

The second part of LINPACK is called the least squares part because one of its chief uses is to solve linear least squares problems. It is built around subroutines to compute the Cholesky decomposition, the QR decomposition, and the singular value decomposition. Here nothing special is assumed about the form of the matrix, except symmetry for the Cholesky decomposition; however, it is now the decomposition itself that the user is interested in. Accordingly, in the least squares part of LINPACK the infix is used to designate the decomposition. There are three options:

CH  Cholesky decomposition
QR  QR decomposition
SV  Singular value decomposition

The suffix YY specifies the task the subroutine is to perform. The possibilities are

FA*  Compute an LU factorization
CO*  Compute an LU factorization and estimate the condition number
DC   Compute a decomposition
SL   Apply the results of FA, CO, or DC to solve a problem
DI*  Compute the determinant or inverse
UD†  Update a Cholesky decomposition
DD†  Downdate a Cholesky decomposition
EX†  Update a Cholesky decomposition after a permutation (exchange)
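Because the naming scheme is entirely systematic, a routine name can be decoded mechanically. The sketch below is a hypothetical helper, not part of LINPACK; its tables simply restate the prefix, infix, and suffix values listed above.

```python
# Hypothetical decoder for the TXXYY naming convention described above.
PREFIX = {"S": "single precision", "D": "double precision",
          "C": "complex", "Z": "double precision complex"}
INFIX = {"GE": "general", "GB": "general banded",
         "PO": "positive definite", "PP": "positive definite packed",
         "PB": "positive definite banded",
         "SI": "symmetric indefinite", "SP": "symmetric indefinite packed",
         "HI": "Hermitian indefinite", "HP": "Hermitian indefinite packed",
         "GT": "general tridiagonal", "PT": "positive definite tridiagonal",
         "TR": "triangular",
         "CH": "Cholesky decomposition", "QR": "QR decomposition",
         "SV": "singular value decomposition"}
SUFFIX = {"FA": "factor", "CO": "factor and estimate condition number",
          "DC": "decompose", "SL": "solve", "DI": "determinant/inverse",
          "UD": "update", "DD": "downdate", "EX": "exchange"}

def decode(name):
    t, xx, yy = name[0], name[1:3], name[3:5]
    return PREFIX[t], INFIX[xx], SUFFIX[yy]

print(decode("SGEFA"))   # ('single precision', 'general', 'factor')
print(decode("DQRSL"))
```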

One small corner of LINPACK, devoted to triangular systems, is not decompositional, since a triangular matrix needs no reduction. Codes in this part are designated by an infix of TR, and the only two suffixes are SL and CO.

* Square part only.
† Least squares part only.

In addition to the uniform manner of treating subroutine names, there are a number of other uniformities of nomenclature in LINPACK. A square matrix is always designated by A (AP and ABD in the packed and banded cases), and the dimension is always N. Rectangular input matrices are denoted by X, and they are always N x P. The use of P as an integer is the sole deviation in the calling sequences from the FORTRAN implicit typing convention; it was done to keep LINPACK in conformity with standard statistical notation.

Since FORTRAN associates no dope vectors with arrays, it is necessary to pass the first dimension of an array to the subroutine. This number is always denoted by LDA or LDX, depending on the array name. A frequent, though unavoidable, source of errors in the use of LINPACK is the confusion of LDA (the leading dimension of the array A) with N (the order of the matrix A), which may be smaller than LDA.

Since many LINPACK subroutines perform more than one task, it is necessary to have a parameter to say which tasks are to be done. This parameter is always called JOB in LINPACK, although the method of encoding options varies from subroutine to subroutine. (The JOB parameter in SQRSL, which has a lot to do, is of Byzantine complexity, although it is very easy to use once the trick is known.)

The status of a computation on return from a LINPACK subroutine is always signaled by INFO. The name was deliberately chosen to have neutral connotations, since it is not necessarily an error flag. As with JOB, the exact meaning of INFO varies from subroutine to subroutine.

THE SQUARE PART OF LINPACK

The square part of LINPACK is well illustrated by the codes with the infix GE, i.e., those codes dealing with a general square matrix. The basic decomposition used by the GE codes is of the form

PA = LU,

where L is a unit lower triangular matrix and U is upper triangular. The matrix P is a permutation matrix that represents row interchanges made to ensure numerical stability. The algorithm used to determine the interchanges is called partial pivoting (cf. Forsythe and Moler [1967] or Stewart [1974]). The decomposition is computed by the subroutine SGEFA,* which uses Gaussian elimination and overwrites A with L and U. (This is typical of the LINPACK codes; the original matrix is always overwritten by its decomposition.) Information on the interchanges is returned in IPVT.

There are two GE subroutines to manipulate the decomposition computed by SGEFA. SGESL solves the system Ax = b or the system A^T x = b, as specified by the parameter JOB. The solution x overwrites the right-hand side b. The subroutine SGEDI computes the determinant and inverse of A. Because the value of a determinant can easily underflow or overflow (if A is 60x60, det(10A) = 10^60 det(A)), the determinant is coded into an array DET of length two in the form

det(A) = DET(1) * 10**DET(2).

As was indicated earlier, an explicit matrix inverse is seldom needed for most problems. Consequently, SGEDI should be used to compute the inverse only when there is no alternative. The inverse overwrites the LU decomposition, so that the array A cannot be used in subsequent calls to SGESL and SGEDI. It is technically possible to recover A by factoring A^-1 (SGEFA) and inverting it (SGEDI); but, owing to rounding errors, the matrix obtained in this way may differ from the original.
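The mantissa-exponent convention for DET can be illustrated with a short sketch. The function below is a hypothetical stand-in (SGEDI does this internally in Fortran); it accumulates the product of the diagonal of U while renormalizing the mantissa into [1, 10) so that the pair never overflows or underflows.

```python
# Sketch of the DET(1) * 10**DET(2) convention described above.
def det_from_diag(diag):
    """Product of the diagonal elements, kept as (mantissa, exponent)."""
    mant, expo = 1.0, 0
    for d in diag:
        mant *= d
        if mant == 0.0:
            return (0.0, 0)
        while abs(mant) >= 10.0:      # renormalize into [1, 10)
            mant /= 10.0
            expo += 1
        while abs(mant) < 1.0:
            mant *= 10.0
            expo -= 1
    return (mant, expo)               # det = mant * 10**expo

# A 60x60 matrix 10*I has det = 10**60; the pair stays representable.
print(det_from_diag([10.0] * 60))     # (1.0, 60)
```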

A perennial question that arises when linear systems are solved is how accurate the computed solution is. The answer to this question is usually cast in terms of the condition number κ(A), defined by

κ(A) = ||A|| ||A^-1||,    (4)

where ||·|| is a suitable matrix norm. For example, if the system Ax = b is solved by SGEFA and SGESL in t-digit decimal arithmetic and y is the computed solution, then

||y - x|| / ||x|| ≤ f(n) κ(A) 10^-t,    (5)

where f(n) is a slowly growing function of the order n of A [Wilkinson, 1963].

The LINPACK subroutine SGECO, in addition to factoring A, returns an estimate of κ(A). Because κ(A) can become arbitrarily large, SGECO actually returns the reciprocal of κ(A) in the parameter RCOND. The user can inspect this number to determine the accuracy of the purported solution. However, it must be borne in mind that the interpretation of the condition number depends on how the problem has been scaled, a point that will be discussed further in the section on Numerical Properties.

* For definiteness the prefix S will be used for all LINPACK codes, it being understood that D, C, and Z are also options.
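As a toy illustration of (4) and (5), the sketch below computes κ(A) for a 2x2 matrix in the infinity norm by forming the inverse explicitly, which is tolerable only at this size; the helper names are invented for the example, and SGECO estimates the condition number without ever forming A^-1.

```python
# Toy condition-number computation for a 2x2 matrix (infinity norm).
def inf_norm(M):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)

def kappa_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    Minv = [[d / det, -b / det], [-c / det, a / det]]
    return inf_norm(M) * inf_norm(Minv)

A = [[1.0, 1.0], [1.0, 1.0001]]
k = kappa_2x2(A)   # roughly 4e4: per (5), expect to lose about 4-5 digits
print(k)
```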

Most of the remaining square part of LINPACK can be regarded as adaptations of the GE routines to special forms of square matrices. One of the most frequently occurring forms is the positive definite matrix, where the matrix A is symmetric (A^T = A) and satisfies x^T A x > 0 whenever x ≠ 0. In this case the LU factorization can be written in the form

A = R^T R,    (6)

where R is upper triangular.* This Cholesky factorization is what the subroutine SPOFA computes. The advantage of SPOFA over SGEFA is that it requires half the storage and half the work.

Specifically, since A is symmetric, only its upper half need be stored. Likewise, the upper triangular factor R in (6) can overwrite the upper half of A. The lower part of the array containing A is not referenced by SPOFA and hence can be used to store other information. The PO routines include an SL subroutine to solve linear systems and a DI subroutine to compute the determinant or the inverse.

One conventional way of storing a symmetric matrix is to pack either its lower or upper part into a linear array. The LINPACK subroutine SPPFA computes the Cholesky factorization of a positive definite matrix with its upper part packed in the order indicated below:

1  2  4  7 11
   3  5  8 12
      6  9 13
        10 14
           15

The Cholesky factor R is returned in the same order. There are corresponding SPPCO, SPPSL, and SPPDI subroutines.
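The mapping from the upper triangle to the linear array follows directly from the column ordering shown above: element (i, j), with i ≤ j in 1-based indexing, lands at position j(j-1)/2 + i. A small sketch (a hypothetical helper, not a LINPACK routine):

```python
# Index into the packed upper-triangular storage pictured above.
def packed_index(i, j):
    """1-based position of element (i, j), i <= j, in the packed array."""
    assert 1 <= i <= j
    return j * (j - 1) // 2 + i

# Top row of the 5x5 example, then the last diagonal element:
print([packed_index(1, j) for j in range(1, 6)])   # [1, 2, 4, 7, 11]
print(packed_index(5, 5))                          # 15
```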

When a symmetric matrix is indefinite (i.e., there are vectors x and y such that y^T A y < 0 < x^T A x), it has no Cholesky factorization. However, there is a permutation matrix P such that P^T A P can be decomposed stably in the form

* This decomposition is often written A = LL^T, where L is lower triangular. The upper triangular form was chosen for its consistency with the QR decomposition.


P^T A P = U D U^T,    (7)

where U is triangular and D is block diagonal with only 1x1 or 2x2 blocks. The subroutines SSIFA and SSICO compute this factorization, and SSISL and SSIDI apply it, as usual, to solve systems or compute determinants and inverses. SSIDI also computes the inertia of A. There are corresponding packed-storage subroutines.

There are two exceptional aspects of the symmetric indefinite programs. First, the matrix U in (7) is upper triangular, giving the factorization the appearance

(upper triangular) x (block diagonal) x (upper triangular transposed)

as opposed to

(lower triangular) x (upper triangular)

for the classical LU factorization (2). This unusual factorization is required so that the algorithm can both work with the upper half of A (which keeps it consonant with the PO routines) and also remain column oriented for efficiency (more on this later).

The other aspect concerns the passage from real to complex arithmetic. A complex positive definite matrix is generally required to be Hermitian (i.e., equal to its conjugate transpose); hence there is no ambiguity in the properties of A for the CPOYY routines. In the indefinite case, a complex matrix may be either symmetric (A^T = A) or Hermitian. This point was resolved by letting the CSIYY routines handle complex symmetric matrices and devising a new infix HI for Hermitian matrices.

In many applications the nonzero elements of A are clustered around the diagonal of A. Such a matrix is called a band matrix. The distance ml to the left along a row from the diagonal to the farthest off-diagonal element is the lower band width; the distance mu to the right from the diagonal to the farthest element is the upper band width. The structure of an 8x8 matrix with lower band width one and upper band width two is illustrated below:

X X X 0 0 0 0 0
X X X X 0 0 0 0
0 X X X X 0 0 0
0 0 X X X X 0 0
0 0 0 X X X X 0
0 0 0 0 X X X X
0 0 0 0 0 X X X
0 0 0 0 0 0 X X

By storing only the nonzero diagonals of a band matrix A, the matrix may be placed in an array of dimensions n x (ml + mu + 1), where n is the order of A. Thus, if ml and mu are fixed, the storage requirements grow only linearly with increasing n, and it is possible to represent very large systems in a modest amount of memory. The amount of work required to solve band systems also grows linearly with n.

LINPACK provides routines SGBFA, SGBCO, SGBSL, and SGBDI to manipulate band matrices. Because pivoting is required for numerical stability, the user must arrange the nonzero elements in an n x (2ml + mu + 1) array. Several possible schemes for storing band matrices were considered for LINPACK. The final choice was dictated by a decision to have the GB routines reflect the loop structure and arithmetic properties of the GE routines. For matrix elements within the band structure, the two sets of routines perform the same arithmetic operations in the same order. The computed solution to a band system obtained by SGBSL and the solution to the same system stored as a full matrix and computed by SGESL should agree to the last bit.

Some users of LINPACK have found that its approach to storing band matrices is complicated and unnatural. However, a simple program, listed in the LINPACK Users' Guide, will automatically place the elements where they belong. The subroutine SGBDI computes only the determinant, since the inverse of a band matrix is not itself a band matrix.
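The placement rule can be sketched as follows. This mapping is stated here as an assumption for illustration (the authoritative layout is documented with SGBFA in the Users' Guide): in-band element A(i, j) goes to row i - j + ml + mu + 1 of column j of a (2ml + mu + 1) x n array, with the leading ml rows reserved for fill created by pivoting.

```python
# Sketch of a band-storage mapping in the style of the GB routines
# (assumed layout; see the Users' Guide for the exact convention).
def band_position(i, j, ml, mu):
    """1-based (row, col) in the packed array for an in-band A(i, j)."""
    assert -mu <= i - j <= ml, "element lies outside the band"
    return (i - j + ml + mu + 1, j)

# The 8x8 example above has ml = 1 and mu = 2.
print(band_position(1, 1, 1, 2))   # (4, 1): diagonal in row ml + mu + 1
print(band_position(2, 1, 1, 2))   # (5, 1): subdiagonal in row 5
```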


There are corresponding routines with infix PB for banded positive definite matrices. Since these are symmetric and require no pivoting, the storage requirement is reduced to n x (mu + 1), and the work is correspondingly reduced. For the important case of tridiagonal matrices (ml = mu = 1), two special subroutines, SGTSL and SPTSL, are provided. The latter subroutine employs the "Millay" algorithm, which simultaneously factors the matrix from each end to save looping overhead.*

THE LEAST SQUARES PART OF LINPACK

Although the least squares part of LINPACK has many and varied applications, it will unify the exposition if we concentrate on the linear least squares problem. Here we are given an n x p matrix X and an n-vector y and wish to determine a p-vector b such that

||y - Xb|| = min,    (8)

where ||·|| denotes the usual Euclidean norm. It can be shown that any solution of (8) must satisfy the normal equations

Ab = c, (9)

where

A = X^T X,   c = X^T y.    (10)

At any solution the vector Xb is the projection of y onto the column space of X, and the residual vector r = y - Xb is the projection of y onto the orthogonal complement of the column space of X.

If the columns of X are independent, the normal equations (9) are positive definite. Consequently, the solution b can be obtained by first calling SPOFA to compute the Cholesky factorization of A and then calling SPOSL to solve (9). In fact, any least squares problem involving an initial set of columns of X can be solved in this manner. To see this, partition X in the form X = (X1 X2), where X1 is n x k, and consider the problem

||y - X1 b|| = min.    (11)

Then the normal equations assume the form

A11 b = c1,

where A11 = X1^T X1. Moreover, if the Cholesky factor R of A is partitioned in the form

* "My candle burns at both ends; / It will not last the night; / But ah, my foes, and oh, my friends - / It gives a lovely light." Edna St. Vincent Millay, 1892-1950.


R = [ R11  R12 ]
    [  0   R22 ],    (12)

where R11 is k x k, then

A11 = R11^T R11.

Thus, R11 is the Cholesky factor of A11. From the point of view of LINPACK, this means that once the Cholesky factor of A has been obtained from SPOFA, the reduced problem (11) can be solved by calling SPOSL with k replacing p as the order of the matrix.
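The normal-equations route just described can be sketched end to end. The following is an illustrative pure-Python version with invented function names; LINPACK's SPOFA and SPOSL do the corresponding work in Fortran, column oriented and in place.

```python
# Sketch of solving (8) via the normal equations (9)-(10) with a
# Cholesky factorization A = R^T R, in the manner of SPOFA/SPOSL.

def cholesky(A):
    """Upper-triangular R with A = R^T R (A symmetric positive definite)."""
    p = len(A)
    R = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for i in range(j + 1):
            s = A[i][j] - sum(R[k][i] * R[k][j] for k in range(i))
            R[i][j] = s ** 0.5 if i == j else s / R[i][i]
    return R

def lstsq_normal(X, y):
    n, p = len(X), len(X[0])
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]                                   # A = X^T X
    c = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]  # c = X^T y
    R = cholesky(A)
    z = [0.0] * p
    for i in range(p):                 # forward solve R^T z = c
        z[i] = (c[i] - sum(R[k][i] * z[k] for k in range(i))) / R[i][i]
    b = [0.0] * p
    for i in reversed(range(p)):       # back solve R b = z
        b[i] = (z[i] - sum(R[i][k] * b[k] for k in range(i + 1, p))) / R[i][i]
    return b

# Fit y = b0 + b1*t through points lying exactly on y = 1 + 2t.
X = [[1.0, t] for t in (0.0, 1.0, 2.0, 3.0)]
y = [1.0 + 2.0 * t for t in (0.0, 1.0, 2.0, 3.0)]
b = lstsq_normal(X, y)
print(b)   # close to [1.0, 2.0]
```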

It frequently happens that the columns of the least squares matrix are linearly dependent, or nearly so. In this case it is necessary to stabilize the least squares solution in some way. The subroutine SCHDC provides one way by effectively moving the dependent columns to the end of X. Specifically, when SCHDC is applied to A, it produces a permutation matrix P and a Cholesky factorization

P^T A P = R^T R    (13)

that satisfies

rkk ≥ rjj    (j = k, k+1, ..., p).

Thus in the partition (12), if the leading element of R22 is small, all its elements are small, and the last columns of XP are nearly dependent on the first ones. These may then be discarded, and a least squares solution involving the initial columns of XP may be obtained by calling SPOSL as described above. Incidentally, the ratio (r11/rpp)^2 from the pivoted Cholesky factorization is a reliable estimator of the condition number of A, and its reciprocal may be used in place of the number RCOND produced by SPOCO [Stewart, 1980].

In some applications, it is necessary to add or delete rows from a least squares fit. For the addition of a row x^T, this amounts to computing the Cholesky factorization of

A~ = A + x x^T,

where A is defined in (10). The subroutine SCHUD (UD = update) provides a way of computing the Cholesky factor R~ of A~ from that of A. This procedure is cheaper [O(p^2)] than computing A~ from A and factoring it with SPOFA [O(p^3)]. SCHUD may also be used to solve least squares problems for which n is too large to allow X to fit into high-speed memory. The trick is to bring in X one row at a time and use repeated calls to SCHUD to incorporate the rows into R.
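A rank-one update of a Cholesky factor can be sketched with Givens rotations, which is the spirit of SCHUD (this is an illustrative pure-Python version with an invented name, not the LINPACK code): conceptually the new row x^T is appended below R and rotated away, one diagonal at a time, in O(p^2) work.

```python
# Sketch of updating R (with A = R^T R) to the factor of A + x*x^T.
import math

def chol_update(R, x):
    p = len(R)
    R = [row[:] for row in R]
    x = x[:]
    for k in range(p):
        r = math.hypot(R[k][k], x[k])    # rotation that annihilates x[k]
        c, s = R[k][k] / r, x[k] / r
        for j in range(k, p):
            rkj = R[k][j]
            R[k][j] = c * rkj + s * x[j]
            x[j] = -s * rkj + c * x[j]
    return R

R = [[2.0, 3.0], [0.0, 5.0 ** 0.5]]      # Cholesky factor of [[4,6],[6,14]]
R2 = chol_update(R, [1.0, 1.0])           # factor of [[5,7],[7,15]]
```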


The deletion of a row amounts to computing the Cholesky decomposition of A~ = A - x x^T from that of A, a process that is sometimes called downdating. The subroutine SCHDD accomplishes this; however, it is important to realize that while updating is a very stable numerical procedure, downdating is not, and the uncritical use of SCHDD can result in anomalous output.

In data analysis and model building it is often necessary to compare the least squares approximations corresponding to different subsets of columns of X. We have seen that this can be done if the subset in question can be moved to the beginning of X, that is, if the Cholesky decomposition of P^T A P can be computed from that of A, where P is a permutation matrix such that XP has the selected columns at the beginning. The subroutine SCHEX (EX = exchange) does this for a class of permutations from which all others can be built up.

It is a commonplace in numerical linear algebra that whenever possible one should avoid using the normal equations to solve linear least squares problems. One way of accomplishing this is by the row-wise formation of R described above. Another way is to use the LINPACK subroutines SQRDC and SQRSL to compute and manipulate the QR decomposition of X. This decomposition has the form

Q^T X = [ R ]
        [ 0 ],

where Q is an orthogonal matrix and R is upper triangular. From the orthogonality of Q it follows that

R^T R = X^T Q Q^T X = X^T X,

which implies that the R factor in the QR decomposition of X is just the Cholesky factor of X^T X. It can further be shown that if Q is partitioned in the form

         p     n-p
Q = (  Q_X    Q_⊥  ),

then the least squares solution b satisfies

Rb = z,

where z = Q_X^T y. Moreover,

Xb = Q_X z,   r = y - Xb = Q_⊥ s,

where s = Q_⊥^T y. Thus the least squares approximation to y and its residual vector can be obtained from the QR decomposition.


LINPACK -A Package for Solving Linear Systems

Since Q is an n x n matrix, it will be impossible to store it explicitly, even when the n x p matrix X can be stored. To circumvent this difficulty, SQRDC computes Q in the form

    Q = H_1 H_2 ... H_p,

where each H_j is a Householder transformation requiring only n - j + 1 words of storage for its representation. Thus the entire QR decomposition, consisting of R and the factored form of Q, can be stored in X and an auxiliary array of length p.

SQRSL manipulates the QR decomposition computed by SQRDC. Under the control of the JOB parameter, SQRSL can return the vectors Q^T y, Qy, b, Xb, and r. Moreover, it can return the analogous quantities corresponding to the first k columns of X, in the same way as can be done with the Cholesky decomposition; cf. (10) and the following discussion.

SQRDC also has a pivoting option, which results in the computation of the QR decomposition of the permuted matrix XP. In the absence of rounding errors, the permutation matrix P is the same as the one produced by the pivoting Cholesky decomposition of X^T X; cf. (13). Thus the pivoting option in SQRDC can be used to estimate condition numbers and detect near-degeneracies in rank. When n < p, the pivoting option can be used to collect a well-conditioned n x n submatrix of X, provided one exists.

The final decomposition computed by LINPACK is the singular value decomposition. As above, let X be an n x p matrix where, for definiteness, n >= p. Then there are orthogonal matrices U and V such that

    U^T X V = Σ,

where Σ = diag(σ_1, ..., σ_p) with

    σ_1 >= σ_2 >= ... >= σ_p >= 0.

This decomposition is a supple theoretical tool that is widely used in analysis involving matrices. If Σ is known, it can be used to detect degeneracies and compute the condition number, which is σ_1/σ_p in the 2-norm. The decomposition is also required in many kinds of statistical computations - ridge regression, canonical correlations, and cross validation, to name just three.

The singular value decomposition is computed by SSVDC, which implements the only iterative algorithm in LINPACK. In addition to


Σ, SSVDC will return at the user's request any of V, U, or the first p columns of U.

NUMERICAL PROPERTIES

The previous four sections addressed the question "What does LINPACK do?" Here we shall consider the related question "How well does it do it?" In other words, how do LINPACK codes behave on actual computers? It will be convenient to divide this question into two parts and ask, first, how LINPACK codes perform in the presence of inexact arithmetic and, second, how efficient the LINPACK codes are. The first question will be treated in this section and the second in the next.

For the nonexpert there is perhaps no more bewildering subject than rounding-error analysis. Attitudes about rounding error generally range from exaggerated fears that it hopelessly contaminates everything to unjustified confidence that it is never really important. This is not the place to enter into a detailed discussion of rounding-error analysis, which has been exhaustively treated by Wilkinson [1963, 1965] and others. However, it is impossible to describe how rounding error affects LINPACK codes without presenting some technical background.

The natural question to ask about a computed solution to a problem is how accurate it is. In numerical linear algebra, however, the question is best asked in two stages. The first stage is to ask if the solution is stable. The term stable is used here in a very specific sense, which can be illustrated by considering the problem of solving the system Ax = b. Suppose that this system is solved in t-digit decimal floating-point arithmetic to give a computed solution x̂. Then x̂ is said to be stable if there is a small matrix E such that

    (A + E)x̂ = b,                                    (14)

i.e., x̂ satisfies the slightly perturbed system (14). The matrix E is required to be small in the sense that if a = max(|a_ij|) and e = max(|e_ij|), then e/a = O(10^-t). Thus if eight digits are carried in the computation and A is scaled so that its largest element is one, then the largest element of E must not exceed about 10^-8.
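The definition can be made concrete by exhibiting such an E for a given computed solution. The classical Rigal-Gaches construction below, E = r x̂^T / (x̂^T x̂), is the smallest perturbation satisfying (14); it is a didactic illustration of the definition of stability, not something the LINPACK codes themselves compute.

```python
def smallest_backward_error(A, b, xhat):
    """Given a computed solution xhat, construct the smallest matrix E
    with (A + E) xhat = b, namely E = r xhat^T / (xhat^T xhat), where
    r = b - A xhat is the residual (Rigal-Gaches construction)."""
    n = len(A)
    r = [b[i] - sum(A[i][j] * xhat[j] for j in range(n)) for i in range(n)]
    nrm2 = sum(t * t for t in xhat)
    return [[r[i] * xhat[j] / nrm2 for j in range(n)] for i in range(n)]
```

A solution is then stable precisely when the elements of this E are small relative to those of A, in the sense described above.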

The notion of stability is useful because it frequently makes the problem of accuracy moot. Suppose, for example, that a user of LINPACK must solve a linear system A_T x_T = b, where A_T is the true


value of the matrix and x_T is the true solution. Suppose, further - as often happens in practice - that owing to measurement or computational errors the matrix actually given to the LINPACK code is

    A = A_T + F.                                     (15)

Then there are three "solutions" floating around: the true solution x_T; the exact solution x of Ax = b; and finally the computed solution x̂ of Ax = b, which satisfies (14). Now often E in (14) will be smaller than F in (15), and consequently x and x̂ will be nearer to each other than either is to x_T. In this case, the noise in the true matrix has already altered the solution more than the subsequent errors made by the LINPACK codes. If the user is unhappy with the answer, further computational refinements will not relieve the situation; the user must go back and get more accurate data, i.e., a better approximation to A_T.

All the decompositions computed by LINPACK are stable. For example, if R denotes the computed Cholesky factor of the positive definite matrix A, then R^T R is equal to A + E for some small matrix E. Again, the output of SQRDC is the numbers that would be obtained by performing exact computations on X + E, where E is small.

It is not to be expected that all things computed by LINPACK are done stably, although most of them are. The solutions of linear systems and linear least squares problems are stable, as are determinants. The updating routines are stable, with the exception of SCHDD. The most widespread unstable computation is that of the inverse.

Turning now from the question of stability to that of accuracy, we note that the inequality (5) already provides an answer in terms of the condition number K(A). Problems with large values of K(A) are said to be ill conditioned, and their solutions are very sensitive to perturbations in the matrix A. For the LINPACK routines, a rule of thumb is that if K(A) = 10^k one can expect to lose about k decimal digits of accuracy in the solution. It is important to remember that this rule accounts only for rounding errors made by LINPACK itself, and that it compares x and x̂ defined as above; cf. (14). The relation between x and the true value x_T can be assessed only if the user is willing to provide additional information about the error F in (15).

The condition estimate provided by the LINPACK routines with suffix CO is generally reliable, although it can be fooled by highly


contrived examples [O'Leary, 1980]. For large systems it is cheap to use, requiring only O(n^2) operations as opposed to O(n^3) for the factorization of A. The condition estimate obtained from the pivoted Cholesky decomposition is about as reliable [Stewart, 1980]. The singular value decomposition provides a completely reliable computation of K(A) in the spectral matrix norm.

We close this discussion of stability and condition estimates with a caveat. Bounds like (5) attempt to summarize the behavior of computed solutions to linear systems with a few numbers, and it is not surprising that something can be lost in this compression of the data. In particular, neither the condition number (4) nor the interpretation of the bound (5) is invariant under the scaling of the rows and the columns of A. Exactly what is the proper scaling is a complicated matter that at present is imperfectly understood. The LINPACK Users' Guide offers some advice, which must, however, be regarded as tentative.

The focus of the discussion up to this point has been on the effects of finite precision arithmetic on the LINPACK routines. However, something must also be said about how LINPACK deals with the finite range of the arithmetic, i.e., with underflow and overflow. The ideal in this regard would be to produce programs that succeed when both the problem and its answer are representable in the computer. Although the LINPACK programs do not attain this austere goal, they come near it by scaling strategic computations in such a way that overflows do not occur and underflows may be set to zero without affecting the results. Unfortunately this is not true everywhere, and some LINPACK programs can be made to fail by giving them data very near the underflow and overflow points. However, it is safe to say that LINPACK goes far in freeing the user from many of the difficulties associated with scaling.

EFFICIENCY

The efficiency of programs for manipulating matrices is not easy to discuss, since features that speed up a program on one system may slow it down on another. In this section we shall discuss in some detail the effects of two aspects of LINPACK on efficiency: the column orientation of the algorithms and the use of the Basic Linear Algebra Subprograms (BLAS).

There was a time when one had to go out of one's way to code a


matrix routine that would not run at nearly top efficiency on any system with an optimizing compiler. Owing to the proliferation of exotic computer architectures, this situation is no longer true. However, one of the new features of many modern computers - namely, hierarchical memory organization - can be exploited by some algorithmic ingenuity.

Typically, a hierarchical memory structure involves a sequence of computer memories, ranging from a small but very fast memory at the bottom to a capacious but slow memory at the top. Since a particular memory in the hierarchy (call it M) is not as big as the memory at the next higher level (M'), only part of the information in M' will be contained in M. If a reference is made to information that is in M, then it is retrieved as usual. However, if the information is not in M, then it must be retrieved from M', with a loss of time. In order to avoid repeated retrievals, information is transferred from M' to M in blocks, the supposition being that if a program references an item in a particular block, the next reference is likely to be in the same block. Programs having this property are said to have locality of reference.

LINPACK uses column-oriented algorithms to preserve locality of reference. By column orientation is meant that the LINPACK codes always reference arrays down columns, not across rows. This works because Fortran stores arrays in column order. Thus, as one proceeds down a column of an array, the memory references proceed sequentially. On the other hand, as one proceeds across a row, the memory references jump, the length of the jump being proportional to the length of a column. The effects of column orientation are quite dramatic; on systems with virtual or cache memories, the LINPACK codes will significantly outperform comparable codes that are not column oriented.

There are two comments to be made about column orientation. First, the textbook examples of matrix algorithms are seldom column oriented. For example, the classical recursive algorithm for solving the lower triangular system Lx = b is the following:


    for i := 1 to n loop
       x_i := b_i;
       for j := 1 to i-1 loop
          x_i := x_i - l_ij * x_j;
       end loop;
       x_i := x_i / l_ii;
    end loop;

On the other hand, the column-oriented algorithm is the following:

    x_j := b_j (j := 1,2,...,n);
    for j := 1 to n loop
       x_j := x_j / l_jj;
       for i := j+1 to n loop
          x_i := x_i - l_ij * x_j;
       end loop;
    end loop;

From this example it is seen that translating from a row-oriented algorithm to a column-oriented one is not a mechanical procedure. An even more extreme example, cited in the discussion of matrix decompositions, is the algorithm for symmetric indefinite matrices, where column orientation requires that the underlying decomposition be modified.
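For readers who want to run the two orderings, here is a direct Python transcription (ours, for illustration only; Python lists carry no Fortran-style column-major layout, so this shows that both orderings compute the same x, not the locality effect itself):

```python
def solve_lower_row(L, b):
    """Row-oriented forward substitution for Lx = b: the inner loop
    sweeps across row i of L."""
    n = len(b)
    x = [0.0] * n
    for i in range(n):
        x[i] = b[i]
        for j in range(i):
            x[i] -= L[i][j] * x[j]
        x[i] /= L[i][i]
    return x

def solve_lower_col(L, b):
    """Column-oriented forward substitution: the inner loop sweeps
    down column j of L, the access pattern LINPACK favors in Fortran."""
    n = len(b)
    x = list(b)
    for j in range(n):
        x[j] /= L[j][j]
        for i in range(j + 1, n):
            x[i] -= L[i][j] * x[j]
    return x
```

Both routines perform the same multiplications and subtractions; only the order in which the elements of L are visited differs.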

The second comment is that LINPACK codes should not be translated into languages such as PL/I or PASCAL, where arrays are stored by rows. Instead, row-oriented subroutines with the same calling sequences should be produced. Unfortunately, doing this for all of LINPACK is a formidable undertaking.

Another important factor affecting the efficiency of LINPACK is the use of the BLAS [Lawson et al., 1979]. This set of subprograms performs basic operations of linear algebra, such as computing an inner product or adding a multiple of one vector to another. In LINPACK the great majority of floating-point calculations are done within the BLAS. The reasons why the BLAS were used in LINPACK will be discussed in the next section. Here we are concerned with the effects of the BLAS on the efficiency of the programs.
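The two operations just mentioned correspond to the BLAS kernels SDOT and SAXPY. Stripped of the stride arguments the real BLAS take, they amount to no more than the following sketch (ours, not the reference implementation):

```python
def sdot(n, x, y):
    """Inner product x^T y over n elements (cf. the BLAS SDOT; the
    increment arguments of the real routine are omitted here)."""
    return sum(x[i] * y[i] for i in range(n))

def saxpy(n, a, x, y):
    """'a times x plus y': y := a*x + y in place (cf. the BLAS SAXPY)."""
    for i in range(n):
        y[i] += a * x[i]
```

In a column-oriented triangular solve, for example, the entire inner loop over a column of L is a single saxpy applied to the trailing part of x.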

The BLAS affect efficiency in three ways. First, the overhead entailed in calling the BLAS reduces the efficiency of the code. This reduction is negligible for large matrices, but it can be quite


significant for small matrices. The point at which it becomes unimportant varies from system to system; for square matrices it is typically between n = 25 and n = 100. If this should seem like an unacceptably large overhead, remember that on many modern systems the solution of a system of order twenty-five or less is itself a negligible calculation. Nonetheless, it cannot be denied that a person whose programs depend critically on solving small matrix problems in inner loops will be better off with BLAS-less versions of the LINPACK codes. Fortunately, the BLAS can be removed from the smaller, more frequently used programs in a short editing session.

Second, the BLAS improve the efficiency of programs when they are run on nonoptimizing compilers. This is because doubly subscripted array references in the inner loop of the algorithm are replaced by singly subscripted array references in the appropriate BLAS. The effect can be seen in matrices of quite small order, and for large orders the savings are quite large.

Finally, improved efficiency can be achieved by coding a set of BLAS to take advantage of the special features of the computers on which LINPACK is being run. For most computers, this simply means producing machine language versions. However, the code can also take advantage of more exotic architectural features, such as vector operations.

An important conclusion to be drawn from the foregoing discussion is that on today's computers efficiency is not portable. The modifications that make a program efficient on one system may make it inefficient on another, and would-be designers of portable software must be prepared to draw flak no matter what they do.

DESIGN AND IMPLEMENTATION

One of the aims of LINPACK was to provide easy-to-use software for solving linear equations and least squares problems. Although such software existed before LINPACK, the best codes were often difficult to obtain and not readily transportable across machines. The algorithms in LINPACK are built around four standard decompositions of a matrix and are not new. The contribution of LINPACK has been in the directions of uniformity, portability, and efficiency.

The software in LINPACK owes its form to a set of decisions made early in the course of the project. Some of the decisions were

Page 26: ANL--82-30 DE82 019798

J. J. Dongarra and G. W. Stewart

determined by the nature of the package, but others were arbitrary in the sense that other ways of proceeding would have worked equally well.

Two of the major decisions were already discussed in the section on efficiency: namely, the use of column-oriented algorithms and the use of the BLAS. Against the former, one can argue that, in some algorithms, it prevents the user from obtaining greater accuracy by accumulating inner products in double precision. We felt that the sacrifice of this feature, which is not portable, was a small price to pay for the superior performance of LINPACK on systems with hierarchical memories. The decision to use the BLAS was more problematical. It was made in the absence of complete information, and the timings collected subsequently can be used to argue pro or con, depending on one's application. If considerations of efficiency are dropped, then the BLAS are a clear plus, since they reduce the amount of code while they improve clarity.

Two other major decisions are rather controversial. The first concerns the absence of driver programs to coordinate the LINPACK subroutines. Most problems will require the use of two LINPACK subroutines, one to process the coefficient matrix and one to process a particular right-hand side. This modularity results in significant savings in computer time when there is a sequence of problems involving the same matrix but different right-hand sides. Such a situation is so common and the savings so important that no provision has been made for solving a single system with just one subroutine. Actually, this should cause few problems for the user, since there is nothing in LINPACK as complicated as the EISPACK [Smith et al., 1976; Garbow et al., 1977] codes, where many routines are needed to solve a given problem. Another reason for not providing driver programs is to keep the size of the package manageable. Given the way the package is structured, an extra driver for each matrix structure and decomposition would have significantly increased the number of routines.

The second controversial decision concerns error checking. No checks are made on quantities such as the order of the matrix or the leading dimension of the array. The reason is that, except in the simplest programs, the number of things to check is very large, and the checking would add significantly to the length of the code. Moreover, an elaborate data structure would have to be devised to report errors, so elaborate as to be beyond most casual users. Although the LINPACK programs will inform the user if a computed


decomposition is exactly singular, no attempt is made to check for near singularities, much less to recover from them. The reason is that what constitutes a near singularity depends on how the problem has been scaled, a process that is imperfectly understood at this time. However, the user may monitor the condition number and do something if it is too large.

The naming conventions and data structures for the package have already been discussed. By and large, the decisions were easy, either because they were necessary or because it did not make much difference as long as something was decided.

Each of the LINPACK routines has a standard prologue:

    Subroutine statement
    Declaration of subroutine parameters
    Brief description of the routine
    List of the input arguments, their types and dimensions, and role in the algorithm
    List of the output arguments, their types and dimensions
    Date and author of the routine
    External user routines needed, such as the BLAS
    Fortran functions used
    Declaration of variables for internal subroutine usage

The subroutine arguments are arranged in a specific order. The first three parameters are usually the matrix, A; the leading dimension of the matrix, LDA; and the order of the matrix, N. As has been pointed out, LDA and N should not be confused. Information about additional matrices, if any, is entered in the same way. The last two parameters are the JOB parameter, which tells the subroutine what to do, and the INFO parameter, which tells the user what the subroutine has done.

The parameters for a subroutine are declared in the order INTEGER, LOGICAL, REAL, DOUBLE PRECISION, COMPLEX, and COMPLEX*16. They are arranged in order of appearance within each classification.

After a description of the subroutine's purpose there follows a description of the parameters passed to and from each subroutine. This section is divided into two parts: information the routine needs for execution, and information generated and returned from the routine. These two parts are headed ON ENTRY and ON RETURN.


The parameters listed are typed; dimensioning information is supplied, as well as a brief description of the function of the parameters. This way of specifying the input and output works well when the calling sequence is short. However, when the parameter list is long and complicated, as in the singular value routine, the convention is not so clean.

After the parameter information, the author's name appears with a date. This date shows when the routine was last updated and is very important in maintaining the package.

Next is a list of routines used by the documented routine. This list has the following order: LINPACK routines used, BLAS used, and Fortran functions used. Finally, the declarations for the internal variables are given.

Four versions of LINPACK exist, corresponding to the data types single precision, double precision, complex, and double precision complex. Because the algorithms are essentially the same for all data types, we decided to write code for the complex version only. This complex master version was then processed by an automatic programming-editing system called TAMPR [Boyle, 1980; Boyle and Dritz, 1974] to produce the other three types of programs. This automatic processing of the complex version reduced the coding and debugging time while contributing to the integrity of the package as a whole.

It was decided that the LINPACK programs should follow the canons of the structured programming school; that is, they should use only simple control structures, such as if-then-else. Since these structures do not exist in Fortran, it was necessary to simulate them. This process was greatly aided by the TAMPR system, which has the ability to recognize structure and generate appropriately indented code.*

The programs in LINPACK conform to the ANSI Fortran 66 Standard [ANSI, 1966]. We adopted Fortran because it is the language most widely used in scientific computations. Actually, the codes are restricted to a subset of the Standard that excludes COMMON, EXTERNAL, EQUIVALENCE, and any input or output statements. While the use of these excluded forms does not necessarily lead to unreadable, nonportable programs, it was felt that they should be avoided if possible [Smith, 1976].

" At one point the participants considered publishing the coding conventions bywhich the structures were implemented, but the appearance of Fortran 77 madeit pointless to do so.

Page 29: ANL--82-30 DE82 019798

26 LINPACK -A Packeg for Solving Linear Systems

The documentation of LINPACK is designed to serve both the casual user and the person who must know the technical details of the programs. The LINPACK Users' Guide treats, by chapter, the various types of matrix problems solved by LINPACK. In addition, appendices give related information such as the BLAS, timing data, and program listings.

Each chapter is self-contained. A chapter usually consists of seven sections:

    Overview
    Usage
    Examples
    Algorithmic Details
    Programming Details
    Performance
    Notes and References

The first three sections of each chapter contain the user-oriented material. Basic information on how to use the routines and some examples of common usage are presented. The technical material is contained in the remaining sections. These give detailed descriptions of the algorithms and their implementations, as well as operation counts and a discussion of the effects of rounding error. The final section gives historical information and references for further reading.

TESTING

Only too often software is produced with little testing and evaluation. In the development of LINPACK, considerable time and effort were spent in designing and implementing a test package. In some cases, the test programs were harder to design than the programs they tested. The chief goal was to ensure the numerical stability and the portability of the programs.

We have already observed that no program can be expected to solve ill-conditioned problems accurately. Yet ill-conditioned problems must be included in any test package, since these problems often cause an algorithm to fail catastrophically. This creates the problem of how to judge the solution of an ill-conditioned problem. The answer adopted for the LINPACK tests was to demand that the solution be stable - that is, that it be the exact solution of a slightly perturbed problem. For example, in testing the program for


computing the QR factorization of a matrix X, it was required of the computed Q and R that QR reproduce X to within a modest multiple of the rounding unit. Similarly, the computed solution x̂ of the system Ax = b was required to satisfy

    ||b - Ax̂|| <= a ||A|| ||x̂||,

where a is near the rounding unit. This criterion is equivalent to testing for stability.
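The acceptance test above can be stated in a few lines of Python. The criterion itself is the one in the text; the particular norm choices below (row-sum norm for A, max norm for vectors) are our assumption, since the transcript does not say which norms the test programs used.

```python
def passes_residual_test(A, b, xhat, a):
    """Stability acceptance test  ||b - A xhat|| <= a ||A|| ||xhat||,
    with a taken as a modest multiple of the rounding unit."""
    n = len(A)
    r = [b[i] - sum(A[i][j] * xhat[j] for j in range(n)) for i in range(n)]
    norm_r = max(abs(t) for t in r)
    norm_A = max(sum(abs(t) for t in row) for row in A)   # infinity norm
    norm_x = max(abs(t) for t in xhat)
    return norm_r <= a * norm_A * norm_x
```

A stable solution passes this test regardless of how ill conditioned A is, which is precisely why the criterion is usable on the ill-conditioned test problems.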

In problems such as matrix inversion, which are not done stably, the LINPACK programs were required at least to produce accurate solutions to well-conditioned problems.

The programs were also tested at the "edges" of the machine; that is, the programs were given problems with entries near the underflow point or near the overflow point of the machine. Our purpose was to test how close LINPACK came to being free of overflow and underflow problems. An interesting fact that emerged from these tests is that a 64-bit floating-point word with an 8-bit binary exponent is a handicap in serious computations.

An important, though technically unachievable, LINPACK goal was to produce a completely portable package. Completely portable means that no changes need to be made to the software to run on any system. The testing program was critical in approximating this goal. As the results came back, it was found that this or that system had unexpected, perverse features. As each of these problems was circumvented, LINPACK came nearer and nearer to complete portability.

The test programs that were used are distributed with the package. They will help spot major flaws in the installation of the LINPACK routines, although they are not designed to test the codes exhaustively. It must be admitted that the quality of the test programs is not as high as that of LINPACK itself, although every effort was made to ensure their portability.

Timing information was collected on a wide range of computer-compiler combinations. In the multiprogramming environment of modern computers, it is often very difficult to measure the execution time of a program reliably. Significant variations can occur, depending on the load of the machine, the amount of I/O interference, and the resolution of the timing program. The timing


data were gathered by a number of people in quite different environments. Our experience with gathering timing data indicates that we can expect a variation of 10 to 15 per cent if the timings are repeated.

Execution times also vary widely from computer to computer. It was found that this variability can be reduced by dividing the raw time for a process by the raw time for another process that involves the same amount of work. For example, the time required to execute SGESL might be compared to the time required to compute Ax. By the use of this scaling technique the authors of LINPACK were able to extract meaningful results from the mass of raw timing data collected during the testing. These are reported in the LINPACK Users' Guide.

LINPACK was tested by various people on many different compilers and operating systems. The authors of LINPACK performed the initial testing on their own computers to ensure that the programs were working correctly. Then, the package was sent out to the test sites for further testing. To say that LINPACK would have been impossible without the help of the test sites is not an exaggeration. They twice nursed our test programs through their systems, made the routines available to their user communities, commented on the documentation, and reported results. Listed below are the machines used in the testing:

    Amdahl 470/V6             Honeywell 6030
    Burroughs 6700            IBM 360/91
    CDC Cyber 175             IBM 370/158
    CDC 6600                  IBM 370/165
    CDC 7600                  IBM 370/168
    CRAY-1                    IBM 370/195
    DEC KL-20                 Itel AS/5
    DEC KA-10                 Univac 1110
    Data General Eclipse C330

As it turned out, few errors were revealed by the testing; on the contrary, most failures were due to errors in compilers and operating systems. The test package is being used by CRAY as one of their Fortran compiler tests. While the package does not exercise all of the Fortran language, it does provide a good check of the numerical environment.

As a result of the extensive testing and our efforts in writing the codes, LINPACK comes close to being a fully portable package.


There are no machine-dependent parameters or constants. The routines run, without modification, on all Fortran-based systems we know of.

CONCLUSIONS

In this final section we should like to offer some subjective conclusions on the LINPACK project and its implementation. The reader should keep in mind that these conclusions are the opinions of the two authors of this paper and do not necessarily reflect those of the other LINPACK participants.

LINPACK was not funded as a software development project; rather the National Science Foundation regarded it as research into methods for producing mathematical software. Although many useful ideas emerged from the project, it is safe to say the authors were more interested in the development of the package than in software research. The reason for the curious rationale is that the National Science Foundation is constrained to help "research," which excludes software development. We feel that this constraint is detrimental to scientific endeavors in all fields and should somehow be removed. At the very least, it could be recognized that software development, by its very nature, involves a great deal of unstructured research, and this is sufficient justification for supporting such projects.

The fact that the LINPACK authors were scattered across the country did not impede the project. This is important because such an arrangement is a way of getting senior people from universities and other institutions involved in software development. In these days of computer networks, communications are not a problem. But the arrangement does require that the participants set aside two or three weeks a year to meet at a fixed location. There is no substitute for face-to-face contact.

The major difficulty with the way the project was organized was that there was no senior member with final authority to decide hard cases. Instead, each participant was responsible for a specific part of the package, and common matters - such as nomenclature, programming conventions, and documentation - were decided by consensus. As might be expected, the process frequently degenerated into bickering, usually about matters that did not seem very important a few months later. Agreements were always reached, and we


do not feel that LINPACK suffered from the compromises. But we would advise anyone embarking on a project of this sort to set up a court of last resort, especially if more than three or four people are involved.

It goes without saying that LINPACK could not have succeeded without the support of the Applied Mathematics Division at Argonne National Laboratory. Not only did they provide tangible support in the form of offices, computer time, and secretarial assistance, but they also handled the many administrative details associated with the project. Most important of all, the members of the division treated the project participants with warm hospitality.

Only recently have people become aware of how greatly the development of mathematical software is aided by appropriate computer tools. We were fortunate in having the TAMPR system available to generate code from master complex programs and format it according to its structure. We also used the PFORT verifier to check our programs for portability. Although, strictly speaking, it is not a software tool, we found the WATFIV system with its extensive error checking useful in debugging our programs. We regret that we did not have one of the mathematical typesetting systems that are now appearing; if we had, we would undoubtedly have prepared the LINPACK Users' Guide on it.

At the beginning of the project, we decided to get as much advice from others as possible. To let people know what we were doing, we distributed informal reports under the title "LINPACK Working Notes." We also made early versions of the programs available to those who requested them. Although we had to spend a great deal of time justifying specific decisions to people who would have done things otherwise, the valuable suggestions we got more than compensated for the trouble.

We learned not to expect that tests, however extensive, would uncover all program bugs. By the time a well-written piece of mathematical software has been run on two or three systems, most of the obvious errors have been detected, and the remaining errors are quite subtle. For example, the shift of origin in the singular value routine is calculated incorrectly. This was not discovered during testing because it had no dramatic effect on the convergence of the algorithm and no effect at all on the stability of the result. By no means do we intend to imply that extensive testing is pointless; we have already noted that the tests helped improve the portability of LINPACK. But we found that there is considerable truth to the inverse of Murphy's law: if something must go wrong, it won't.
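The point that a wrongly computed shift can escape testing is worth a small illustration. The sketch below is not LINPACK's singular value code; it uses a plain symmetric QR eigenvalue iteration as a stand-in, comparing the usual Rayleigh-quotient shift against a deliberately "wrong" (zero) shift. Both variants deliver eigenvalues of full accuracy; only the iteration count betrays the bad shift, which is exactly why ordinary accuracy tests would not flag it.

```python
# Hedged sketch (not LINPACK code): a wrong shift degrades convergence
# speed, not the accuracy or stability of the computed result.
import numpy as np

def qr_eigenvalues(S, use_shift, tol=1e-12, max_steps=2000):
    """Shifted QR iteration with bottom-up deflation on a symmetric S.
    Returns (sorted eigenvalues, total number of QR steps taken)."""
    A = S.copy()
    steps = 0
    for k in range(A.shape[0] - 1, 0, -1):      # deflate from the bottom
        while abs(A[k, k - 1]) > tol and steps < max_steps:
            mu = A[k, k] if use_shift else 0.0  # zero = the "wrong" shift
            I = np.eye(k + 1)
            Q, R = np.linalg.qr(A[:k + 1, :k + 1] - mu * I)
            A[:k + 1, :k + 1] = R @ Q + mu * I  # similarity transform
            steps += 1
    return np.sort(np.diag(A)), steps

# Well-separated eigenvalues, hidden by a random orthogonal similarity.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
S = Q @ np.diag([1.0, 2.0, 4.0, 8.0]) @ Q.T

exact = np.sort(np.linalg.eigvalsh(S))
good, n_good = qr_eigenvalues(S, use_shift=True)
bad, n_bad = qr_eigenvalues(S, use_shift=False)
# Both `good` and `bad` match `exact` to near machine precision;
# the zero-shift run simply takes many more QR steps.
```

The helper name, the test matrix, and the choice of a zero shift are all illustrative assumptions; the singular value routine in the package works on bidiagonal matrices and computes its shift differently.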

LINPACK is by no means perfect, and each of the participants has his particular regrets. The condition estimator would have been more useful if it had been a 2-norm and null vector estimator. The package should contain programs for packed triangular matrices and for updating the QR decomposition. We could have spent more time polishing the test drivers.

But perfection is an elusive thing. We are convinced that LINPACK is a good package and do not regret the time we spent producing it. We are reminded of Darwin's advice to the would-be world traveler:

But I have too deeply enjoyed the voyage, not to recommend any naturalist ... to start. He may feel assured, he will meet with no difficulties or dangers, excepting in rare cases, nearly so bad as he beforehand anticipates. In a moral point of view, the effect ought to be, to teach him good-humoured patience, freedom from selfishness, the habit of acting for himself, and of making the best of every occurrence. ... Travelling ought also to teach him distrust; but at the same time he will discover, how many truly kind-hearted people there are, with whom he never before had, or ever again will have any further communication, who yet are ready to offer him the most disinterested assistance.

ACKNOWLEDGMENTS

This report will appear in the book entitled Sources and Development of Mathematical Software, edited by Wayne Cowell and to be published by Prentice-Hall in late 1982.

REFERENCES

ANSI [1966]. FORTRAN. ANS X3.9-1966, American National Standards Institute, New York.

Boyle, J. [1980]. "Software Adaptability and Program Transformation." Software Engineering. Eds. W. Freeman and P. M. Lewis. Academic Press, New York, pp. 73-90.


Boyle, J., and K. W. Dritz [1974]. "An Automated Programming System to Aid the Development of Quality Mathematical Software." IFIP Proceedings, North-Holland, Amsterdam, pp. 542-546.

Dongarra, J. J., J. R. Bunch, C. B. Moler, and G. W. Stewart [1979]. LINPACK Users' Guide. SIAM Publications, Philadelphia.

Forsythe, G., and C. Moler [1967]. Computer Solution of Linear Algebraic Systems. Prentice-Hall, Englewood Cliffs, New Jersey.

Garbow, B. S., et al. [1977]. Matrix Eigensystem Routines - EISPACK Guide Extension. Lecture Notes in Computer Science, Vol. 51. Springer-Verlag, Berlin.

Lawson, C., R. Hanson, D. Kincaid, and F. Krogh [1979]. "Basic Linear Algebra Subprograms for Fortran Usage." ACM Trans. on Math. Soft., 5: 308-323.

O'Leary, D. [1980]. "Estimating Matrix Condition Numbers." SIAM J. on Scientific and Statistical Computing, 2: 205-209.

Smith, B. T. [1976]. "Fortran Poisoning and Antidotes." In Portability of Numerical Software. Lecture Notes in Computer Science, Vol. 57. Ed. W. Cowell. Springer-Verlag, Berlin.

Smith, B. T., et al. [1976]. Matrix Eigensystem Routines - EISPACK Guide. Lecture Notes in Computer Science, Vol. 6. 2nd ed. Springer-Verlag, Berlin.

Stewart, G. W. [1974]. Introduction to Matrix Computations.Academic Press, New York.

Stewart, G. W. [1980]. "The Efficient Generation of Random Orthogonal Matrices with an Application to Condition Estimators." SIAM J. Numer. Anal., 17: 403-409.

Wilkinson, J. H. [1963]. Rounding Errors in Algebraic Processes. Prentice-Hall, Englewood Cliffs, New Jersey.

Wilkinson, J. H. [1965]. The Algebraic Eigenvalue Problem. OxfordUniversity Press, London.