Functional Program Testing

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-6, NO. 2, MARCH 1980


WILLIAM E. HOWDEN

Abstract-An approach to functional testing is described in which the design of a program is viewed as an integrated collection of functions. The selection of test data depends on the functions used in the design and on the value spaces over which the functions are defined. The basic ideas in the method were developed during the study of a collection of scientific programs containing errors. The method was the most reliable testing technique for discovering the errors. It was found to be significantly more reliable than structural testing. The two techniques are compared and their relative advantages and limitations are discussed.

Index Terms-Effectiveness, experiments, Fortran, functional, reliability, scientific, structural, testing.

Manuscript received December 6, 1978; revised August 4, 1979. This work was supported by the National Bureau of Standards and was carried out under the supervision of Dr. D. Fife.

The author is with the Department of Mathematics, University of Victoria, Victoria, B.C., Canada.

I. INTRODUCTION

In the "black box" approach to program testing, the internal structure of a program is ignored during test data selection. Tests are constructed from the functional properties of the program that are specified in the program's requirements. The disadvantage of the black box approach is that it ignores important functional properties of the program which are part of its design or implementation and which are not described in the requirements.

Structural testing is an approach to testing in which the internal control structure of a program is used to guide the selection of test data. It is an attempt to take the internal functional properties of a program into account during test data generation and to avoid the limitations of black box functional testing.

0098-5589/80/0300-0162$00.75 © 1980 IEEE


Different structural testing methods have been proposed. Branch testing requires that tests be constructed in such a way that every branch in a program is traversed at least once [1]-[8]. More ambitious methods require the testing of all "logical paths" [9]-[11]. The concept of a logical path is often left undefined. If it is taken to mean any possible flow of control through a program, then the technique is impractical. Programs containing loops may have an infinite number of control paths. Even when control paths which traverse the same loop different numbers of times are not counted as different logical paths, a simple program may still have a prohibitively large number of different logical paths. One approach to this problem is to group control paths into sets and to test one path from each set [9]. Another is to require that programs be constructed as a hierarchy of simple abstract procedures. The procedures should be small enough so that each control path (up to multiple iterations of loops) through a procedure can be independently tested [12].

This paper describes an approach to program testing that was developed during a project in which the error discovery effectiveness of different validation methods, including both black box functional testing and structural testing, was analyzed for a collection of scientific programs containing errors. The programs were taken from edition five of the IMSL package of statistical and numerical analysis programs [13]. The IMSL routines were selected because the IMSL package is well documented and systematically maintained. The errors that were analyzed were those that occurred in edition five and were corrected in edition six, the current version of the package. The errors can be considered to be of some subtlety to have survived to edition five status.

The first part of the project involved a comprehensive and integrated survey of known validation methods and of previous effectiveness studies [14]-[16]. During this part of the project, it became obvious that much has been written about structural testing, but little about black box testing, the method it was supposed to replace, and over which it was supposed to be an improvement. A study of black box functional testing was carried out, and it was discovered that the method could be refined in such a way that it could be used to test the important functional properties of the design of a program, as well as the functional properties that are part of the program's requirements.

In the second part of the project, it was determined for each of the errors in the programs which of the validation methods included in the survey would be effective in revealing the error. In a study of 83 errors, testing was the most effective method for 42 errors and static analysis for 41 errors. The most effective testing method was found to be the refined approach to black box functional testing which was developed during the survey phase of the project. The following sections of the paper describe this method, called functional testing, and illustrate its use with several examples. Its effectiveness is compared with that of structural testing.

II. FUNCTIONAL TESTING

A. Functions

In mathematics, a function is defined to be a set of ordered pairs (x_i, y_i). The first element x_i of the ordered pair is an input value and the second element y_i is the corresponding output value. In functional testing, a program is considered to be a function and is thought of in terms of input values and corresponding output values.

Programs usually have one or more input and one or more output variables. Each variable is defined over a set of possible values called the domain of the variable. In the simplest case, domains contain numbers. They can also contain data structures or other programs. Functional testing requires that the selection of test data be made on the basis of the important properties of the elements in the domains of a program's input and output variables.

B. Numeric Variables

The single important property of an element in the domain of a numeric variable is its numeric value. Functional testing requires that the set of allowable values for each input and output variable for a program be formally defined. The domain of a numeric variable will usually consist either of a finite set of discrete points or one or more subsegments of the integers or reals. If the variable is an input variable, and its domain is a small set of discrete values, then tests should be carried out which involve each of the values. If the domain of the variable consists of one or more intervals of numbers, then tests should be carried out which involve the endpoints of the intervals and at least one value which is interior to each interval.

In addition to the selection of data which test a program over the domains of its input variables, data must be selected which result in the generation of output over the domains of the program's output variables. If a program has an output variable whose domain consists of a small set of discrete values, then tests should be carried out which result in the generation of each of the output values. If an output variable has a domain which consists of one or more intervals of numbers, then tests should be carried out which result in the generation of output values which lie at the endpoints of the intervals and at interior points in each interval.
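The selection rule for input domains can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the interior point is taken to be the midpoint (any interior value would satisfy the rule), and the interval is assumed to be a closed interval of reals.

```python
def interval_test_values(low, high):
    """Return both endpoints and one interior point of the interval [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    if low == high:
        return [low]  # degenerate interval: a single point
    interior = low + (high - low) / 2  # the midpoint; any interior value would do
    return [low, interior, high]


def discrete_test_values(domain):
    """For a small discrete domain, every distinct value becomes a test value."""
    return sorted(set(domain))


if __name__ == "__main__":
    print(interval_test_values(0.0, 10.0))
    print(discrete_test_values([3, 1, 2, 3]))
```

The same two helpers could be applied to output domains, but there the tester must work backward to inputs that produce the chosen output values, which this sketch does not attempt.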

If an input or output variable can take on values of different types (e.g., numeric or blank character), then tests should be constructed which include each of the different types. Programs should also be tested for illegal values. Input data should be selected which lie outside the domains of input variables to check that the program guards against such values. Attempts should be made to construct test data that result in the generation of output that lies outside the domains of the output variables.

Experience with testing scientific programs indicates that a complete set of functional tests should include certain test values which have special mathematical properties. These values may or may not lie at the endpoints of variable domain intervals or correspond to one of the discrete values in a small finite domain. The most important values appear to be zero, one, and real numbers which are small in absolute value.

Special relationships between variables are also important. If several variables play functionally similar roles (e.g., the two input variables used to hold the degrees of freedom in a chi-square distribution program), then separate tests should be constructed which involve both identical and distinct values for the variables. A similar rule should be applied to variables that are used for both input and output. A test should be constructed in which a different value is generated on output from that used for input. Another test should be constructed in which the same value is generated as output as that used for input.

In some programs, the domain of one variable is dependent on the value of another variable. Variations of the above rules are usually sufficient for the selection of functional test data. Suppose, for example, that x and y are input variables and the allowable range of values for x consists of the reals in the interval [0, y]. Suppose that y is constrained to have values in the interval [0, ∞). The program should be tested for x = y = 0, x = 0 < y, 0 < x = y, and 0 < x < y. It should also be tested for illegal values of the form y = 0 < x and 0 < y < x.
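The dependent-domain example can be made concrete with a small classifier that labels an (x, y) pair with the case it exercises. This is a sketch only: the function name, the label strings, and the sample values are assumptions chosen for illustration, and any values in the same relation would serve equally well.

```python
def classify(x, y):
    """Label an (x, y) pair under the constraints 0 <= y and 0 <= x <= y."""
    if x < 0 or y < 0:
        return "illegal: negative value"
    if x > y:
        return "illegal: x exceeds y"
    if x == 0 and y == 0:
        return "x = y = 0"
    if x == 0:
        return "x = 0 < y"
    if x == y:
        return "0 < x = y"
    return "0 < x < y"


# One representative pair for each legal case named in the text.
legal_cases = [(0.0, 0.0), (0.0, 2.0), (2.0, 2.0), (1.0, 2.0)]
# One pair for each illegal form: y = 0 < x and 0 < y < x.
illegal_cases = [(1.0, 0.0), (3.0, 2.0)]

if __name__ == "__main__":
    for x, y in legal_cases + illegal_cases:
        print((x, y), "->", classify(x, y))
```

A test set that covers every label returned by the classifier covers the four legal relationships and both illegal forms.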

C. Arrays and Vectors

Scientific programs usually have at least one array or vector variable. The elements in the domain of an array or vector have more complicated properties than the elements in the domain of a numeric variable. In addition to the values of the elements of the arrays in the domain, the arrays also have dimensions.

Functional testing requires that the set of allowable values for the dimensions of the arrays in the domain of an array variable be completely specified. This is in addition to the specifications of the sets of allowable values for the individual elements of arrays. The allowable set of integers for an array dimension is usually given as an interval of integers. Suppose that the endpoints and an interval interior value are selected for each dimension interval. These selections can be combined to form 3^k different sets of dimensions for a k-dimensional array. Tests should be constructed for each of these possible sets of dimensions.

The values of the elements of an array form a set that can be treated either as a single entity or as a collection of values of individual variables. In special cases, an array can be thought of as having a single value (e.g., all elements of an array have the same value or the array forms a unit matrix). If the elements of an array are constrained to have values which lie in the same interval, then it is possible to consider the domains of the elements of an array as a single domain for the whole array and to construct tests in which the array takes on both endpoint values and values which are interior to the domain interval.

Arrays can be thought of as taking on special values other than interval endpoint and interior values. There are two kinds of special array values. The first is defined in terms of special values of individual array elements. An array has the special value zero if all of its elements have the value zero. The unit array involves a special pattern of values of array elements. The second kind of special value is defined in terms of special relationships between array elements. Some special relationships are problem independent. It has proved to be important, for example, to carry out tests in which the elements of input arrays are distinct. It is also important to carry out tests in which the elements of input arrays are the same. In some examples, the special relationships depend on a program's input specifications. A program may require, for example, that the elements of a particular input vector form a monotonically increasing sequence. Test vectors should be constructed which both satisfy and fail to satisfy the requirement.

In certain examples, it has proved to be necessary to carry out tests in which some of the elements of an array take on special or extremal values, but not all elements of the array. Tests should be constructed of this type, in which the elements of an array are treated as individual variables.

Functional testing requires the construction of tests in which each of the possible special sets of values for an array's dimensions is combined with each of the special sets of values for its elements.
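The count of 3^k dimension sets follows from taking the Cartesian product of three choices per dimension. The sketch below enumerates them with `itertools.product`; the function name and the sample intervals are hypothetical, and each interval is assumed to contain at least three integers so that the chosen interior value differs from both endpoints.

```python
from itertools import product


def dimension_test_sets(dim_intervals):
    """Combine low, interior, and high choices for every array dimension.

    dim_intervals is a list of (low, high) integer pairs, one per dimension.
    """
    choices = []
    for low, high in dim_intervals:
        interior = (low + high) // 2  # one interior value; others would do as well
        choices.append((low, interior, high))
    return list(product(*choices))


if __name__ == "__main__":
    # A two-dimensional array with 1..10 rows and 1..20 columns.
    for dims in dimension_test_sets([(1, 10), (1, 20)]):
        print(dims)
```

For a two-dimensional array this yields 3^2 = 9 dimension sets, each of which would then be paired with the special element values described above.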

D. Array Substructures

In many applications, particular columns or rows or other substructures of an array have a functional identity of their own. An n by n + 1 array, for example, may be used to represent a set of n linear equations in n unknowns. Each row is used to represent a single equation and is a functionally identifiable substructure of the array. It is necessary in functional testing to consider the value of an array taken as a whole, the values of individual array elements, and also the values of array substructures such as columns and rows.

An array can be used to implement a hierarchy of substructures. The top of the hierarchy consists of the array itself. The highest level substructures are formed at the next lowest level. Each of these substructures is divided into lower level substructures, and so on, until substructures consisting of individual array elements are reached.

The application of functional testing to programs having array substructures can be best illustrated by first considering the simple case in which there is a single intermediate level of substructures. The substructures are assumed to be homogeneous in the sense that all of the substructures have the same dimensions and different substructures do not play different functional roles. The first step in constructing functional test data is to consider the array as a single indivisible entity. Each set of special values for the array's dimensions is combined with the special values for the array taken as a whole. The next step is to consider the array as a collection of substructures. Each substructure is considered to be a single indivisible element of the array. For each set of special values of the array's dimensions, array values are selected so that at least one (but possibly not all) of the substructures takes on each of the possible special values for that substructure. The next step is to consider the substructures independently, as separate arrays or vectors. Dimensions should be selected for the array so that the dimensions of the substructures take on all possible special values. For each of these substructure dimensions, array values should be chosen so that some (but possibly not all) of the elements of some substructure take on all possible special values.

The following example illustrates the use of functional testing for a program that involves arrays which contain two levels of functionally identifiable substructures. The program in this example computes eigenvectors and has an input array A. Pairs of columns in A are used to hold the real and imaginary parts of a complex input vector. One possible special input value for testing the program is that in which A is zero (i.e., all vectors are zero). Another is that in which at least one, but not all, input vectors are zero, and a third is that in which at least one, but not all, of the complex components of a vector is zero. Arrays, vectors, and complex components can also be considered which have elements whose values lie at the endpoints and in the interior of allowable intervals of values. Each of these special values should be combined in some test with each of the allowable special values for the dimensions of the arrays and substructures.
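The three zero-valued special cases in the eigenvector example can be told apart mechanically. The sketch below assumes a layout that the text does not spell out in detail: the array is a list of rows, and columns 2j and 2j + 1 hold the real and imaginary parts of vector j. The function names and label strings are hypothetical.

```python
def complex_vectors(a):
    """Split array a (a list of rows) into vectors of (re, im) pairs.

    Assumed layout: columns 2j and 2j + 1 hold the parts of vector j.
    """
    n_cols = len(a[0])
    return [[(row[j], row[j + 1]) for row in a] for j in range(0, n_cols, 2)]


def zero_special_value(a):
    """Report which zero-valued special case, if any, the array exhibits."""
    vectors = complex_vectors(a)
    zero_vector = [all(c == (0, 0) for c in v) for v in vectors]
    if all(zero_vector):
        return "array zero"
    if any(zero_vector):
        return "some vectors zero"
    if any(c == (0, 0) for v in vectors for c in v):
        return "some components zero"
    return "no zero substructure"


if __name__ == "__main__":
    print(zero_special_value([[0, 0, 0, 0], [0, 0, 0, 0]]))
    print(zero_special_value([[0, 0, 1, 2], [0, 0, 3, 4]]))
    print(zero_special_value([[0, 0, 1, 2], [5, 6, 3, 4]]))
```

A helper of this kind can be used to confirm that a proposed test set contains at least one array for each level of the substructure hierarchy.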

E. Subroutine Arguments

Numerical analysis and statistical programs often have input variables whose values are the names of subroutines. Functional testing requires that each of the "functionally different" types of subroutines be included in at least one test. Suppose, for example, that it is necessary to test a program P which can be used for computing a zero of a function. Suppose that P has an input variable which is used to transmit the name of a subroutine which can be used to compute values for the function. Assume that P is an iterative approximation procedure for which convergence is not guaranteed, and that P has another input variable whose value is an upper bound on the number of approximation steps that are allowed before the program terminates with a failure message. Tests should be constructed for at least two kinds of input functions: those for which P converges in the required number of steps and those for which it does not.
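The two kinds of input functions can be illustrated with a stand-in for P. The sketch below uses the secant method, which is an assumption made for illustration and is not claimed to be the procedure the text has in mind; the name `find_zero`, its parameters, and the tolerance are likewise hypothetical.

```python
def find_zero(f, x0, x1, max_steps, tol=1e-10):
    """Secant iteration standing in for P.

    Returns (estimate, True) on convergence, or (last_estimate, False)
    if max_steps is exhausted or no further step can be taken.
    """
    for _ in range(max_steps):
        f0, f1 = f(x0), f(x1)
        if abs(f1) <= tol:
            return x1, True
        if f1 == f0:
            break  # flat secant line: the next iterate is undefined
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
    return x1, False


if __name__ == "__main__":
    # Functionally different inputs: one with a real zero, one without.
    print(find_zero(lambda x: x * x - 2.0, 1.0, 2.0, max_steps=50))
    print(find_zero(lambda x: x * x + 1.0, 1.0, 2.0, max_steps=50)[1])
```

The step bound interacts with the function argument: a function that converges with a generous bound can be made to fail by lowering the bound, which gives a third useful test.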

F. Combinations of Test Values

If a program has n input (or output) variables, each of which can take on k special test values, then there are k^n possible combinations of test values. For a program with six input variables, each of which has three special values, this can amount to more than 2000 combinations. In order to avoid this combinatorial explosion, functional testing does not require the construction of tests involving all possible special value combinations.

Situations occur in which an error will not be discovered unless certain combinations of special values for input or output variables are included in at least one test. If the number of input or output variables is small enough, then all possible combinations of special values should be used in tests in order to force the discovery of errors like this. If the number of variables is too large, then one method that can be used is to identify small "functionally related" subsets of variables and to require that tests be constructed which include all combinations of special values of the variables in each subset. The concept of a functionally related variable subset cannot be formally defined, but such subsets are often easily recognized. An example was discussed above in which two of the input variables to a statistics program were used to hold values for the degrees of freedom for a chi-square distribution. These two variables are clearly functionally related. In another example, a zero finding program had an input variable F for the function for which a zero was required, and another input variable M which was used to provide an upper bound on the number of approximations allowed in the search for the zero of F. The two input variables F and M are functionally related. Variables which appear in the same assignment statement or branch predicate are often functionally related.

In the studies that were carried out on the IMSL routines, one of the methods that was evaluated combined functional testing with branch testing. It was discovered that this method often forces the consideration of combinations of input test values that are required to reveal an error. Recall that branch testing requires that each branch in a program be tested at least once on some test. The combined method requires that every possible branch be tested at least once for each special value (e.g., domain endpoint) of each input or output variable. Suppose, for example, that x is an input variable for a program and that c is a functionally important special value for x. Suppose that b is a branch in the program and that if x is given the value of c, there is some combination of values for the other input variables such that b will be traversed when the program is executed. Then in the combined approach, a test must be constructed for which x has the value c and which results in the traversal of b. The effect of this combined method is to force the testing of the program with an input value of c for x for each of the different computations which are carried out in different parts of the program.
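The saving obtained from functionally related subsets can be seen by counting. The sketch below compares the full product of special values with the per-subset products. The variable names, their special values, and the grouping into subsets are all hypothetical; identifying the subsets remains a judgment that the code does not automate.

```python
from itertools import product


def all_combinations(special_values):
    """Every combination of special values across all variables (k^n tests)."""
    names = list(special_values)
    return [dict(zip(names, combo))
            for combo in product(*(special_values[n] for n in names))]


def subset_combinations(special_values, related_subsets):
    """All combinations within each functionally related subset only."""
    tests = []
    for subset in related_subsets:
        for combo in product(*(special_values[n] for n in subset)):
            tests.append(dict(zip(subset, combo)))  # partial assignment
    return tests


# Hypothetical program with four inputs, three special values each.
special_values = {
    "df1": [1, 2, 100],
    "df2": [1, 2, 100],
    "lower": [0.0, 0.5, 1.0],
    "upper": [0.0, 0.5, 1.0],
}
related_subsets = [("df1", "df2"), ("lower", "upper")]

if __name__ == "__main__":
    print(len(all_combinations(special_values)))
    print(len(subset_combinations(special_values, related_subsets)))
```

With these assumed values the full product has 3^4 = 81 members while the two subsets require 9 + 9 = 18 partial assignments. Each partial assignment still has to be completed with some values for the remaining variables before it can be run as a test.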

III. FUNCTIONAL TESTING AND PROGRAM DESIGN

A. Design Functions

The preceding section described what is basically a black box approach to functional testing. With the exception of the examination of relevant substructures of arrays and vectors, the interior functional structure of programs was ignored during test data generation. The only function that was considered during test data generation was the function computed by the program. The design and implementation of a program involves the integration and concatenation of a set of functions. In the approach to functional testing which is proposed in this paper, it is required that a complete set of functional tests be constructed for each of the functions which are part of a program's design. Several different types of "design functions" are described below.

In some cases, there are functions which are part of a program's design which correspond directly to sections of the program and which can be easily identified. The simplest example is that in which several functions are joined together to form a single program that has several related but independent functional capabilities. This is common practice in the design of scientific subroutines. A statistical program, for example, may have an integer input variable which determines whether the program carries out a lower one-tailed, an upper one-tailed, or a two-tailed hypothesis test. This type of functional structure is modeled by the diagram in Fig. 1. The solid lines describe "transfer of control" to "subfunctions." The dotted lines denote return of control. These transfers of control may not actually exist in the program, in which case they are part of the conceptual design rather than the implementation of the program. The oval is a decision-making mechanism.

Fig. 1. Parallel functional capabilities.

Programs are often divided into a sequence of functionally meaningful sections of code, each of which computes some program "subfunction." A numerical analysis routine, for example, whose input data consist of an array of numbers may contain an initial section of code which normalizes the values in the array. A complete set of functional tests should be constructed for the section of code that implements the normalizing function. Fig. 2 describes the structure of a program which consists of a sequence of functional sections of code. The control transfer arrows are drawn horizontally to indicate that the different sections of code are all part of the same level of abstraction in the program design. Vertical arrows are used to denote decompositions in which a function at one level of abstraction is divided into functions at lower levels.

Fig. 2. Sequential decomposition into subfunctions.

The occurrence of the functions described in Figs. 1 and 2 is easy to recognize. The functions correspond directly to relatively independent pieces of code. They are computational functions in the sense that they compute values that are either part of the program's output or are used by other functions to compute output values. Functions also occur in programs in more subtle ways. Control functions are used in scientific programs to choose between alternative computations and to terminate recursive or iterative processes. Fig. 3 describes the role of a control function which is used to terminate an approximation loop in an IMSL numerical analysis routine called ZBRENT. ZBRENT is a zero finding program which is known to converge. ZBRENT is supposed to halt when it has found an approximation which is correct to NSIG significant digits. It computes a sequence of values for two variables B and C. The initial values of B and C are part of the input to ZBRENT and they must straddle a zero. The control function in ZBRENT should be tested over sequences of values for B and C to determine if it correctly halts the approximation process. The input to the control function consists of values for B and C, a value for NSIG, and a value for a fourth variable EP. The output consists of a sequence of logical values which are all FALSE except for the last, which is TRUE. Each of the FALSE values should correspond to values for B and C which are not close enough to define an approximation with the required degree of accuracy.

Fig. 3. Control function.

The testing of design functions is important for several reasons. Testing at the program level is not refined enough to force the testing of all of the important computational substructures in a program. Many programs involve variables and data structures which are not functionally meaningful when the program is viewed as a whole. Their functional importance is associated with particular design functions. Relevant test data involving these variables require the identification of the design functions.

Design testing is useful for discovering errors which might be overlooked by black box functional testing. A program with an incorrect control function may operate "correctly," but inefficiently. Suppose, for example, that an iteration termination function fails to halt an approximation procedure as soon as it should and allows many additional approximations to be computed. The program may still give correct answers; it will just operate more slowly than expected. Independent testing of the iteration control function over appropriately chosen sequences of values will reveal the error.
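Independent testing of a termination function can be sketched as follows. The termination rule used here is a hypothetical stand-in, a relative difference test on B and C, and is not the rule used in ZBRENT; the function names and the sample sequence are also assumptions. The point of the sketch is the shape of the check: every output before the last must be FALSE and the last must be TRUE.

```python
def should_halt(b, c, nsig):
    """Hypothetical rule: halt when B and C agree to about nsig digits."""
    return abs(b - c) <= 10.0 ** (-nsig) * max(abs(b), abs(c))


def late_halt(b, c, nsig):
    """A defective variant that demands three more digits than requested."""
    return abs(b - c) <= 10.0 ** (-(nsig + 3)) * max(abs(b), abs(c))


def control_outputs(rule, pairs, nsig):
    """Apply a termination rule to a sequence of (B, C) pairs."""
    return [rule(b, c, nsig) for b, c in pairs]


def halts_exactly_at_end(outputs):
    """True if all outputs are False except the last, which is True."""
    return bool(outputs) and outputs[-1] is True and not any(outputs[:-1])


# A sequence that straddles a zero near 1.0 and first meets the
# three-digit requirement at its last pair.
pairs = [(1.5, 0.5), (1.05, 0.95), (1.005, 0.995), (1.00005, 0.99995)]

if __name__ == "__main__":
    print(control_outputs(should_halt, pairs, nsig=3))
    print(halts_exactly_at_end(control_outputs(late_halt, pairs, nsig=3)))
```

The defective variant would still let the enclosing program return a correct answer after further iterations, so only a test of the control function itself exposes the late halt.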

B. Intermediate Values and Test Harnesses

Functional testing of design functions requires that the domains of all input and output variables for each design function be fully documented. Sections of code which correspond to particular functions are normally preceded by comments in a program. The comments should be expanded to include the variable domain information. Functions which do not correspond directly to sections of code may have to have their variable domains documented separately.

In order to test the design functions in a program, it is necessary to monitor the intermediate values of program variables that are used as input and output variables by the functions. Test harnesses like those provided by automatic test drivers [17] can be used both to monitor intermediate values and to artificially assign intermediate values to variables. It may be difficult to select values for program input variables that result in the computation of the correct intermediate variable values which are required to test particular design functions. Test harnesses can be used to insert the required values into the function's input variables when the code associated with that function is executed.
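A minimal harness of the kind described might look like the following. Everything here is an assumption made for illustration: the two design functions, the `Harness` class, and its `intercept` method are hypothetical and do not correspond to the automatic test drivers cited in the text. The harness records the intermediate value passed between the two functions and can replace it with a value chosen by the tester.

```python
def normalize(values):
    """Design function 1: scale values so the largest magnitude is 1."""
    peak = max(abs(v) for v in values)
    if peak == 0:
        return list(values)
    return [v / peak for v in values]


def mean(values):
    """Design function 2: consumes the normalized intermediate array."""
    return sum(values) / len(values)


class Harness:
    """Records intermediate values and optionally overrides them."""

    def __init__(self, overrides=None):
        self.overrides = overrides or {}
        self.seen = {}

    def intercept(self, name, value):
        self.seen[name] = value                   # monitor the computed value
        return self.overrides.get(name, value)    # inject a value if requested


def program(values, harness=None):
    normalized = normalize(values)
    if harness is not None:
        normalized = harness.intercept("normalized", normalized)
    return mean(normalized)


if __name__ == "__main__":
    monitor = Harness()
    print(program([2.0, 4.0], monitor), monitor.seen)
    inject = Harness({"normalized": [0.0, 0.0]})
    print(program([2.0, 4.0], inject))
```

Injection is useful here because no nonzero program input makes `normalize` produce an all-zero array, yet the all-zero array is a special value that the second design function should be tested on.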

C. Testing Functions in Context

Suppose that a function f is part of the design of a program P and that one of the input variables x of P is an input variable of the design function f. Suppose that f is associated with a particular piece of code in P that is only executed when the predicate x < 2 is true, and that the domain for x, as an input variable for P, is the interval (-∞, ∞). If f is tested independently of its context in P, then the extremal or endpoint values that will be chosen for x will be x = k and x = -k where k is large. If f is tested "in context," then the predicate x < 2 will influence the choice of functional test values for x. The extremal or endpoint values defined by this predicate are x = -k and x = 2. The predicate also defines an "illegal" value of 2 + ε for x where ε is small.

Method
  1. Functional testing
     black-box requirements functions
     general computational design functions
     detailed computational design functions
     control design functions
     total
  2. Structural testing
     branch testing
     path testing
     total

Fig. 4. Effectiveness of test data generation methods.

Some classes of errors will not be discovered unless the context of a function is taken into account during the generation of test data for the function. If a design function is associated with a particular piece of code in a program, then its functional context is defined by the symbolically evaluated systems of branch predicates that occur along the program paths which lead up to that piece of code. The predicates describe the subset of the input domain over which that function is used by the program [18]. If there is more than one path (there may be an infinite number), then it may not be possible to take the complete functional context into account during the generation of test data for a function. There are many situations, however, in which a single branch predicate in a program is used to choose between alternative design functions and it is sufficient to take the single branch condition into account.
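The effect of a single guarding predicate on the choice of test values can be written down directly. The sketch below is an illustration only: the function name and the default values chosen for the large magnitude k and the small offset ε are assumptions, and the guard is taken to be a single upper bound of the form x < bound, as in the example above.

```python
def extremal_values(k, upper_bound=None, eps=1e-6):
    """Extremal and illegal test values for x over (-inf, inf).

    With no guard, the extremal values are -k and k for some large k.
    With a guard x < upper_bound, they become -k and upper_bound, and
    upper_bound + eps is an "illegal" value for the guarded function.
    """
    if upper_bound is None:
        return {"extremal": [-k, k], "illegal": []}
    return {"extremal": [-k, upper_bound], "illegal": [upper_bound + eps]}


if __name__ == "__main__":
    print(extremal_values(1e6))                   # f tested out of context
    print(extremal_values(1e6, upper_bound=2.0))  # f tested under x < 2
```

A context defined by several predicates along a path would need the conjunction of their bounds, which this one-predicate sketch does not handle.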

If the context for a function is defined by a complicated predicate involving several function input variables, then the extremal values of a subset of the variables may constrain the choice of extremal values for the remaining variables. The control function in ZBRENT which is described in Fig. 3 is defined in terms of two subfunctions f1 and f2. Both f1 and f2 are used to determine when the approximation procedure in ZBRENT should halt. The predicate T > 2 * EP * ABS(B) is used to choose between f1 and f2. If the predicate is true, then f1 is selected; otherwise, f2 is selected. EP is constrained to lie in the interval [10^-6, 10^-1], T in the interval [10^-24, 10^-1], and B in (-∞, ∞). Suppose that the extremal values "EP small" (i.e., 10^-6) and "T large" (i.e., 10^-1) are selected for EP and T. Then the largest value of B that can be selected which allows the predicate to be satisfied is limited by the condition ABS(B) < T / (2 * EP). The values EP = 10^-6 and T = 10^-1, together with a value of B close to this limit, should be used in at least one test of f1.

In some examples, it may be necessary to choose between conflicting functional testing requirements. The control function f1 in ZBRENT has an input variable C, in addition to the input variables EP, T, and B. Recall that the termination of ZBRENT depends on the values of B and C. In order to carry out a complete set of functional tests of f1, it is necessary to generate a sequence of values for B and C which can be used to verify that f1 forces termination correctly. Suppose that extremal values are selected for EP, T, and B. Then it may not be possible to adequately test f1 over an appropriate sequence of values for B and C and force C to take on extremal values at the same time.
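The way in which extremal values of EP and T constrain B can be checked directly from the selection predicate T > 2 * EP * ABS(B). The following Python sketch is not code from ZBRENT; the function names are hypothetical, and the value 10^-1 used for "T large" is an assumption about the upper endpoint of the interval for T.

```python
def selects_f1(t, ep, b):
    """Selection predicate from the text: f1 is chosen when T > 2 * EP * ABS(B)."""
    return t > 2 * ep * abs(b)


def abs_b_limit_for_f1(t, ep):
    """Limit on ABS(B) below which the predicate holds for fixed T and EP."""
    return t / (2 * ep)


EP_SMALL = 1e-6  # lower endpoint of the interval for EP
T_LARGE = 1e-1   # assumed upper endpoint of the interval for T

limit = abs_b_limit_for_f1(T_LARGE, EP_SMALL)
print(limit)                                       # roughly 5 * 10**4
print(selects_f1(T_LARGE, EP_SMALL, 0.9 * limit))  # inside the context of f1
print(selects_f1(T_LARGE, EP_SMALL, 1.1 * limit))  # outside it, so f2 is used
```

A test of f1 with "B large" therefore has to take B just under the limit, while a value just over it exercises f2 instead.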

IV. FUNCTIONAL VERSUS STRUCTURAL TESTING

Fig. 4 describes the relative effectiveness of functional and structural testing for the IMSL errors. The numbers without parentheses indicate the number of errors for which the corresponding technique was the best or most appropriate method. Many errors could be detected by more than one validation method. The numbers in parentheses indicate the total number of errors for which a technique was effective, including those errors for which some other technique was better or more appropriate.

The results in Fig. 4 distinguish between black box requirements functions, general computational design functions, detailed computational design functions, and control design functions. Black box functional testing is associated with the testing of the requirements functions for a program. General computational design functions are associated with the major computational components of a program. Detailed computational design functions are similar to general computational design functions, except that they are usually "smaller." Detailed computational design functions would not normally be described in the design documents for a program, and they are often constructed during the implementation rather than the design phase of program development.

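[Fig. 4. Relative effectiveness of functional and structural testing for the IMSL errors. Error counts, in the order extracted: 14 (20); 7 (9); 7 (9); 3 (5); 31 (38); 10 (13); (3); 10 (16).]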



Functional testing was significantly more effective in discovering errors in the IMSL programs than structural testing. There were many examples of "partially correct" program paths. This means that the paths compute correct values for some, but not all, input data. Structural testing does not distinguish between different values that cause the same path to be traversed, and may result in the selection of input data over which a partially correct path computes correct output. Functional testing forces the testing of programs over functionally important extremal and special values. In many cases, these are the types of values that are needed to force the manifestation of an error associated with a partially correct path.

Programs often involve design functions which correspond to collections of program paths whose testing would not be forced by any practical problem-independent approach to structural testing. Design functions exist that can only be tested if program paths are executed which involve two, three, and four iterations of particular loops or combinations of loops. An approach to structural testing which required the execution of all paths which iterate loops four or fewer times would require the testing of an enormous number of paths. The selection of the important subsets of this large number of paths cannot be effected without a consideration of the design functions in a program.

The following two examples describe typical IMSL errors for which functional testing, but not structural testing, is reliable. The first example involves a general computational design function, and the second a detailed computational design function.

Example 1: BEPATS carries out a number of statistical analyses of sample data from two random variables. Each analysis corresponds to a separate program "subfunction." The error occurs in the variance test subfunction. The error involves an interchange of degrees of freedom in the computation of a p-value. The error manifests itself in an incorrect output value from the subfunction, except in those cases where the degrees of freedom are equal. The error also fails to show up in those cases where input values for the subfunction are selected which, due to roundoff, cause the p-value to be zero or one.

The input to the subfunction includes the two sample sizes N1 and N2 and the estimates S1^2 and S2^2 of the sample variances. These variables have a close functional relationship. Functional tests should be constructed which involve combinations of extremal values for these variables. One combination of extremal values is that in which S1^2 is large, S2^2 is small, N1 is small, and N2 is large. Examination of F-statistic tables reveals that an appropriate set of values having these properties would be one in which S1^2/S2^2 = 10, N1 = 1, and N2 = 120. The error is revealed if these values are used to test the subfunction.

The input values that are needed to reveal the error in BEPATS are not associated with particular program branches or logical paths. Structural testing may or may not result in the discovery of the existence of the error. The error is guaranteed to manifest itself if tests are carried out in which functionally important test values are assigned to the input values for the design subfunction associated with the error. This is also true in the following example.
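The behavior described in Example 1 can be reproduced with a small Python sketch. It is not the BEPATS code: the function names are hypothetical, the F-distribution tail is computed by a simple midpoint-rule integration chosen for brevity, and treating the values 1 and 120 as the two degrees of freedom is an assumption made for the illustration.

```python
import math


def f_upper_tail(f, d1, d2, steps=20000):
    """Upper-tail probability P(F > f) for an F(d1, d2) variate.

    Uses P(F <= f) = I_x(a, b) with x = d1*f / (d1*f + d2), a = d1/2 and
    b = d2/2. The incomplete beta integral is evaluated by the midpoint
    rule after substituting t = u**(1/a), which removes the singularity
    of t**(a-1) at t = 0.
    """
    a, b = d1 / 2.0, d2 / 2.0
    x = d1 * f / (d1 * f + d2)
    upper = x ** a
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        total += (1.0 - u ** (1.0 / a)) ** (b - 1.0)
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    cdf = total * h / (a * math.exp(log_beta))
    return 1.0 - cdf


def variance_test_p(ratio, df1, df2):
    """Intended computation: degrees of freedom in the right order."""
    return f_upper_tail(ratio, df1, df2)


def variance_test_p_interchanged(ratio, df1, df2):
    """Stand-in for the error: the degrees of freedom are interchanged."""
    return f_upper_tail(ratio, df2, df1)


# Equal degrees of freedom: the error cannot show up.
print(variance_test_p(3.0, 5, 5) == variance_test_p_interchanged(3.0, 5, 5))

# Extremal combination from the text: ratio 10, small and large values 1 and 120.
print(variance_test_p(10.0, 1, 120))
print(variance_test_p_interchanged(10.0, 1, 120))
```

Both versions follow the same path through the code for every input, which is why branch or path coverage alone gives no reason to prefer the second test case over the first.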

Example 2: ZX3LP is a linear programming routine. The program receives its input data in an array A. The code accesses A as though it has IR rows of N columns each. In both the calling programs and in ZX3LP, A is declared as an IA by N matrix where IA > IR.

One of the first things ZX3LP does is to "pack" A so that the data are stored in conformance with the use of A as an IR row matrix. This is necessary since the arrays are stored in column major format. One of the last things the program does is to "unpack" A back to its declared IA-row format.

The unpacking process starts at the last (in terms of column major ordering) element of the packed array and reads backwards, unpacking one element at a time. The unpacking process uses a small detailed design function to compute the row location (A treated as an IA-row matrix) for the last element of A (A treated as a packed IR-row matrix). The row location can be computed by finding the remainder when IA is divided into N * IR. The answer will be in the range 0 to IA-1. If it is zero, then the row index IA should be used. The function incorrectly uses zero.

The error in this function is best revealed by considering the range of output values which should be generated by the function. The range is [1, IA]. If tests are constructed which should generate the extremal output values 1 and IA, then the presence of the error will be revealed. Zero is generated instead of IA.

Although functional testing was more effective than structural testing for many of the IMSL errors, there were a number of errors for which structural testing was not only reliable, but was the most appropriate testing technique. There were three classes of situations in which branch testing was both effective and appropriate. The first is that in which a program error causes a program to generate incorrect output whenever a particular branch is executed. The second occurs when a program error causes a section of code to become inaccessible so that it is never executed. The third type of situation occurs when errors cause unexpected branches to be traversed. The first kind of error is detected by examining program output values, and the second and third by looking at branch usage information.

The first situation was fairly common in the IMSL programs and was usually associated with error exit branches. Error exit branches branch to sections of code that set error flags and call error message subroutines. In some cases, the message was incorrect, and in others, an error flag was set incorrectly. There were five errors of this type, indicating that error processing had not been adequately tested. The following example describes an error situation of the second type.

Example 3: VSORTM is a sorting routine that uses a quicksort type of algorithm for sorting vectors of numbers. It contains a small extra section of code for doing an insertion sort on vectors and subvectors whose length is less than some threshold value. The error is due to an incorrect predicate which is used to decide when to call on the insertion sort. It is called when the vector length is less than or equal to 1 rather than when it is less than or equal to 11. The result of this is that there is code in the insertion sort which can never be executed. Branch testing will reveal the presence of this error.
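The situation in Example 3 can be imitated with a small hybrid sort in Python. This is a sketch under stated assumptions, not the VSORTM code: the function name, the `hits` counter that plays the role of a branch usage monitor, and the 20-element reversed test vector are all hypothetical.

```python
def hybrid_sort(values, threshold, hits):
    """Quicksort-style sort that hands short vectors to an insertion sort.

    hits["shift"] counts executions of the insertion sort's shifting
    loop body, standing in for branch usage information.
    """
    if len(values) <= threshold:
        out = list(values)
        for i in range(1, len(out)):
            item = out[i]
            j = i - 1
            while j >= 0 and out[j] > item:
                hits["shift"] += 1      # code of interest inside the insertion sort
                out[j + 1] = out[j]
                j -= 1
            out[j + 1] = item
        return out
    pivot = values[len(values) // 2]
    smaller = [v for v in values if v < pivot]
    equal = [v for v in values if v == pivot]
    larger = [v for v in values if v > pivot]
    return (hybrid_sort(smaller, threshold, hits) + equal
            + hybrid_sort(larger, threshold, hits))


data = list(range(20, 0, -1))
for threshold in (11, 1):
    hits = {"shift": 0}
    result = hybrid_sort(data, threshold, hits)
    print(threshold, result == sorted(data), hits["shift"])
```

With the threshold 1 the output is still correctly sorted, so inspecting output values does not expose the error, but the shift counter stays at zero for any input. This is the kind of evidence that branch usage information provides.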

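Returning to Example 2, the detailed design function that computes the row location of the last packed element can be sketched as follows. The function names and the concrete values of N, IR, and IA are hypothetical; only the remainder rule and the handling of a zero remainder come from the description of ZX3LP.

```python
def last_row_incorrect(n, ir, ia):
    """Row location as described for the erroneous code: a zero remainder is used as is."""
    return (n * ir) % ia


def last_row(n, ir, ia):
    """Intended row location in [1, IA]: a zero remainder means row IA."""
    remainder = (n * ir) % ia
    return remainder if remainder != 0 else ia


# Tests chosen so that the expected outputs are the extremal values 1 and IA.
# Each case is (N, IR, IA, expected row).
cases = [
    (2, 3, 5, 1),  # 6 mod 5 = 1, lower end of the output range
    (5, 3, 5, 5),  # 15 mod 5 = 0, so the row index should be IA = 5
]
for n, ir, ia, expected in cases:
    print(expected, last_row(n, ir, ia), last_row_incorrect(n, ir, ia))
```

Only the test whose expected output is IA separates the two versions, which illustrates why the test data are selected from the range of output values rather than from the branches of the code.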


V. SUMMARY

There are three key concepts in the approach to functional testing which is outlined in this paper: the identification of functionally important classes of input and output data, functional decomposition of data structures into design substructures, and functional decomposition of programs into design functions. The identification of functionally important classes of input and output data requires that the allowable domains of values for all variables be formally specified. Design substructures are undeclared subsets of formally declared data structures which have a conceptually meaningful functional identity. The design functions in a program consist of the set of identifiable functions which were used to design and implement the program. Some of the design functions in a program may correspond to a specific piece of code. Others, such as control functions, may correspond to collections of paths or to pieces of code scattered throughout the program.

In order to construct a set of functional tests for a program, it is necessary to have a very deep understanding of how the program works. This understanding is necessary in order to isolate the design functions in the program. The same level of understanding is not necessary for structural testing. All that is required in structural testing is a mechanism for checking the validity of output values and a tool for monitoring branch and path coverage during program execution. The results of the IMSL experiments indicate that the extra effort and understanding that are required to carry out functional testing will result in increased testing effectiveness.

Functional and structural testing should be viewed as complementary rather than competing techniques. There were a significant number of IMSL errors for which the combined use of the methods was the most effective error discovery method. The combined method requires that each possible branch in the code associated with a function be tested for each functional test value of each function input variable.

Continuing research will include further study of these programs as well as a study of the errors occurring in nonnumeric programs. The study of the nonnumeric programs will be carried out to determine whether the methods that are useful for nonnumeric programs differ from those that are useful for scientific programs.

ACKNOWLEDGMENT

The author would like to thank IMSL, Inc. for its cooperation in supplying listings for editions five and six of their subroutine library and for providing a complete set of maintenance and recipients letters for both editions. He would particularly like to thank L. L. Williams of IMSL for his kind assistance.

REFERENCES

[1] L. G. Stucki, "Automatic generation of self-metric software," in Proc. 1973 IEEE Symp. Comput. Software Rel., 1973, pp. 94-100.

[2] J. C. Huang, "An approach to program testing," Comput. Surveys, vol. 7, pp. 113-128, 1975.

[3] E. F. Miller and R. A. Melton, "Automated generation of test case data sets," in Proc. 1975 IEEE Int. Conf. Rel. Software, 1975, pp. 51-58.

[4] R. E. Fairley, "An experimental program testing facility," IEEE Trans. Software Eng., vol. SE-1, pp. 350-358, 1975.

[5] K. W. Krause, R. W. Smith, and M. A. Goodwin, "Optimal software test planning through automated network analysis," in Proc. IEEE Symp. Comput. Software Rel., 1973, pp. 18-22.

[6] S. R. Brown et al., "Automated software quality assurance," in Program Test Methods, W. C. Hetzel, Ed. Englewood Cliffs, NJ: Prentice-Hall, 1973, pp. 181-204.

[7] C. V. Ramamoorthy and S. F. Ho, "FORTRAN automatic code evaluation systems," Electron. Res. Lab., Univ. California, Berkeley, Rep. M-466, Aug. 1974.

[8] M. P. Page and J. P. Benson, "The use of software probes in testing FORTRAN programs," Computer, vol. 7, pp. 18-25, 1974.

[9] W. E. Howden, "Methodology for the generation of program test data," IEEE Trans. Comput., vol. C-24, pp. 554-560, 1975.

[10] -, "Reliability of the path analysis testing strategy," IEEE Trans. Software Eng., vol. SE-2, pp. 208-214, 1976.

[11] R. W. Wolverton, "The cost of developing large scale software," in E. Horowitz, Ed., Practical Strategies for Developing Large Software Systems. Reading, MA: Addison-Wesley, 1975, pp. 73-100.

[12] E. Dijkstra, "Structured programming," in Software Engineering Techniques, J. N. Buxton and B. Randell, Eds. Brussels, Belgium: NATO Science Committee, 1969.

[13] IMSL Library Reference Manual, Int. Math. and Statist. Libraries, Inc., Houston, TX, 1978.

[14] W. E. Howden, "A survey of dynamic analysis methods," in E. Miller and W. E. Howden, Software Testing and Validation Techniques, IEEE, 1978.

[15] -, "A survey of static analysis methods," in E. Miller and W. E. Howden, Software Testing and Validation Techniques, IEEE, 1978.

[16] -, "Empirical studies of software validation," in E. Miller and W. E. Howden, Software Testing and Validation Techniques, IEEE, 1978.

[17] D. J. Panzl, "Automatic software test drivers," Computer, vol. 11, pp. 44-50, 1978.

[18] W. E. Howden, "Symbolic testing and the DISSECT symbolic evaluation system," IEEE Trans. Software Eng., vol. SE-3, pp. 266-278, 1977.

[19] J. B. Goodenough and S. L. Gerhart, "Toward a theory of test data selection," IEEE Trans. Software Eng., vol. SE-1, pp. 156-173, 1975.

William E. Howden received the Ph.D. degree in computer science from the University of California at Irvine in 1973.

He is currently an Associate Professor of Computer Science at the University of California at San Diego, on leave at the Department of Mathematics, University of Victoria, B.C., Canada. He has previously worked for Atomic Energy of Canada and McDonnell Douglas Astronautics and has carried out extensive research projects in both the practice and theory of program testing. He is the coauthor of the IEEE tutorial book Software Testing and Validation Techniques and has conducted professional seminars on software validation in the United States, Japan, Canada, and Great Britain. He has published papers on a wide variety of topics in program testing, and was recently invited to participate in the University of Texas at Dallas' distinguished lecturers series. He has been a consultant to branches of the U.S. and Canadian Governments and to large industrial organizations.
