Daniel Liu & Yigal Darsa - Presentation Early Estimation of Software Quality Using In-Process...

Daniel Liu & Yigal Darsa - PresentationDaniel Liu & Yigal Darsa - Presentation

Early Estimation of Early Estimation of Software Quality Using In-Software Quality Using In-

ProcessProcessTesting Metrics: A Testing Metrics: A

Controlled Case StudyControlled Case Study

Presenters:Presenters:

Yigal DarsaYigal Darsa

Daniel LiuDaniel Liu


ABSTRACTABSTRACT

o field quality of a product tends to field quality of a product tends to become available too late in the become available too late in the software development processsoftware development process

o A controlled case study conducted at A controlled case study conducted at North Carolina State UniversityNorth Carolina State University


ABSTRACT (cont’d)ABSTRACT (cont’d)

o Use of a suite of in-process metrics that Use of a suite of in-process metrics that leverages the software testing effort to leverages the software testing effort to provideprovide

an estimation of potential software field an estimation of potential software field quality in early software development quality in early software development phases,phases,

the identification of low quality software the identification of low quality software programsprograms


INTRODUCTIONINTRODUCTION

o True field quality cannot be measured True field quality cannot be measured before a product has been completed before a product has been completed and delivered to an internal or external and delivered to an internal or external customer. customer.

o Field quality is calculated using the Field quality is calculated using the number of failures found by these number of failures found by these customers.customers.


INTRO (cont’d)INTRO (cont’d)

• Because this information is available Because this information is available late in the process, corrective actions late in the process, corrective actions tend to be expensive.tend to be expensive.

• Software developers can benefit from an Software developers can benefit from an early warning regarding the quality of early warning regarding the quality of their product.their product.



• Early warning can be built from a Early warning can be built from a collection of internal metricscollection of internal metrics

• An internal metric, such as the An internal metric, such as the cyclomatic complexity (will be explained cyclomatic complexity (will be explained later), is a measure derived from the later), is a measure derived from the product itselfproduct itself



• An external measure is a measure of a product An external measure is a measure of a product derived from the external assessment of the derived from the external assessment of the behavior of the systembehavior of the system

i.e.: the number of defects found in test is an external i.e.: the number of defects found in test is an external measure.measure.

• Structural object-orientated (O-O) Structural object-orientated (O-O) measurements are being used to evaluate and measurements are being used to evaluate and predict the quality of softwarepredict the quality of software

i.e.: Chidamber-Kemerer and MOOD O-O metric suitesi.e.: Chidamber-Kemerer and MOOD O-O metric suites



The CK metric suite consists of six The CK metric suite consists of six metrics metrics

weighted methods per class (WMC), weighted methods per class (WMC), coupling between objects (CBO), coupling between objects (CBO), depth of inheritance tree (DIT), depth of inheritance tree (DIT), number of children (NOC), number of children (NOC), response for a class (RFC), response for a class (RFC), lack of cohesion among methods (LCOM).lack of cohesion among methods (LCOM).



o These metrics can be a useful early These metrics can be a useful early internal indicator of externally-visible internal indicator of externally-visible product quality in terms of fault-product quality in terms of fault-pronenessproneness



Section 2 outlines the STREW metric suite.Section 2 outlines the STREW metric suite.

Sections 3 discusses a controlled Sections 3 discusses a controlled

experiment, in which STREW metric suite experiment, in which STREW metric suite had been studied. had been studied.

Section 4 presents the experimental results. Section 4 presents the experimental results.

Finally, Section 5 presents the conclusions Finally, Section 5 presents the conclusions and future work.and future work.


Introduction Introduction STREW STREW

The metric used is called the Software Testing and Reliability Early Warning metric, also know as STREW.

STREW is a set of internal, in-process software metrics that are used to make an early estimation of post release field quality.

GOAL: early prediction of field quality.


Strew Metric SuiteStrew Metric Suite

Reasoning behind this metric suites:

Different from the traditional reliability estimation models, STREW puts a greater emphasis on internal software metrics, especially those involving the testing effort.



The use of the STREW metrics is based on the existence of an extensive collection of unit test cases being created as development proceeds.

During the initial stages of creating any project, such a unit test suite might not be available. In that case, historical data from a comparable project may be used.


Strew Metric SuiteStrew Metric Suite The STREW-J metric suite consists of nine

metric ratios.

The metrics are intended to cross-check each other and to triangulate upon an estimate of post-release field quality.

Each metric makes an individual contribution towards estimation of the post-release field quality but work best when used together.


Strew Metric SuiteStrew Metric Suite9 STREW metrics table. Categorized into three groups: test quantification

metrics, complexity and O-O metrics, and a size adjustment metric.



Test quantification metrics (SM1 to SM4):

Specifically intended to crosscheck each other to account for coding/testing styles.

i.e. One developer might write fewer test cases, each with multiple asserts checking various conditions. Another developer might test the same conditions by writing many more test cases, each with only one assert.



Test quantification metrics (SM1 to SM4):

Intended to provide useful guidance to each of these developers without prescribing the style of writing the test cases.



The complexity and O-O metrics (SM5 to SM8):

It examines the relative ratio of test to source code for control flow complexity and for a subset of the CK metrics.

These relative ratios for a product in development can be compared with the historical values for similar projects to indicate the relative complexity of the testing effort with respect to the source code.


Strew Metric SuiteStrew Metric Suite The complexity and O-O metrics (SM5 to SM8)

IN DETAILS:

The cyclomatic complexity metric: This software systems studies can be defined as the

number of linearly independent paths in a program.

It have shown that code complexity correlates strongly with program size measured by lines of code and is an indication of the extent to which control flow is used. The use of conditional statements increases the amount of testing required because there are more logic and data flow paths to be verified.


Strew Metric SuiteStrew Metric Suite The complexity and O-O metrics (SM5 to SM8)

IN DETAILS:

The larger the inter-object coupling, the higher the sensitivity to change. Therefore, maintenance of the code is more difficult. As a result, the higher the inter-object class coupling, the more rigorous the testing should be.

The number of methods and the complexity of methods involved is a predictor of how much time and effort is required to develop and maintain the class.

The larger the number of methods in a class, the greater is the potential impact on its children, since the children will inherit all the methods defined in the class.



The final metric is a relative size adjustment factor:

Defect density has been shown to increase with class size.

The difference of lines of code size is accounted for, because projects that uses the STREW metric prediction relative size adjustment factor will not all have the same LOC size.



Removal of Metrics:Removal of Metrics:

Some metrics were removed based on the lack of ability to contribute towards the estimation of post-release field quality.

Statement coverage Branch coverage Number of requirements/Source lines of code Number of childrentest/Number of childrensource

Lack of cohesion among methodstest/Lack of cohesion among methodssource


CONTROLLED EXPERIMENT CONTROLLED EXPERIMENT

o Research DesignResearch Design

o Case Study LimitationsCase Study Limitations


Research DesignResearch Design

To evaluate the predictive ability of STREW-To evaluate the predictive ability of STREW-J, a case study was carried out in a J, a case study was carried out in a junior/senior-level software engineering junior/senior-level software engineering course at NCSU in the fall 2003 semestercourse at NCSU in the fall 2003 semester


Research Design (cont’d)Research Design (cont’d) Students developed an open source EclipseStudents developed an open source Eclipse

• Eclipse is an open source integrated development Eclipse is an open source integrated development environmentenvironment

Plug-in in Java.Plug-in in Java.

Plug ins tested via:Plug ins tested via:• Unit test via JUnitUnit test via JUnit• Acceptance test via FIT toolAcceptance test via FIT tool

Groups of four or five junior and/or senior Groups of four or five junior and/or senior students submitted 22 projectsstudents submitted 22 projects


Research Design (cont’d)Research Design (cont’d)

SLOC: Source Lines of CodeTLOC: Test Lines of Code


Research Design (cont’d)Research Design (cont’d)

Evaluated by 45 black box test cases:Evaluated by 45 black box test cases:

• Exception checkingException checking

• Error handlingError handling

• Boundary test casesBoundary test cases

• Operational correctness of the plug inOperational correctness of the plug in


Case Study LimitationsCase Study Limitations

Students’ experiences vary. Not everyone is Students’ experiences vary. Not everyone is advancedadvanced

Experience is done academically and in ideal Experience is done academically and in ideal conditions; might not reflect industrial SW conditions; might not reflect industrial SW DevelopmentDevelopment

Eclipse plug ins are relatively small Eclipse plug ins are relatively small applications for industrial applicationsapplications for industrial applications


Limitations Limitations Experimental Results Experimental Results

Black box test failures/KLOC:

Approximated as the problems that would have been found by the customer had the project been released.


Experimental ResultsExperimental Results

Using the black box test failures/KLOC test quality obtained by running the 45 test cases, a multiple linear regression analysis was performed.

Difficulty with multiple linear regression analysis: There is multi-collinearity among the

metrics. This can lead to inflated variance in the prediction of post-release field quality.



To eliminate multi-collinearity, PCA was used:

PCA removed multi-collinearity and are orthogonal to each other. This means that changes in one component do not influence either of the other components, unlike the individual metrics.



Ability of STREW metrics: used to identify programs of low quality.

All programs having a Black box test failures/KLOC lower than the calculation from the equation below are of high quality and the remaining is of low quality.

Equation (lower bound) = µBlack box test failures – [(zα/2*Standard deviation of black box test failures/KLOC) / n ]


Experimental ResultsExperimental ResultsTable: Table: Overall classification of program quality

The estimate of the percentage correct classification is 90.9% (i.e. overall 20 of the 22 programs were correctly identified as high or

low quality programs, using STREW).


Results Results Conclusion Conclusion

From the experiment: 20 of the 22 programs were correctly identified as high

or low quality programs.

Feedback on potential field quality of software is very useful to developers because it helps identify weaknesses and faults in the software that require fixing.


CONCLUSIONCONCLUSION

• in most production environments, field in most production environments, field quality is measured too late to quality is measured too late to affordably guide significant corrective affordably guide significant corrective actionsactions

• in-process testing metric suite for in-process testing metric suite for providing an early warning regarding providing an early warning regarding post-release field quality measured by post-release field quality measured by black box test failures/KLOC, and for black box test failures/KLOC, and for identifying low quality programsidentifying low quality programs


CONCLUSION (cont’d)CONCLUSION (cont’d)

• STREW metric suite is a practical STREW metric suite is a practical approach to measuring software post-approach to measuring software post-release field qualityrelease field quality

• STREW-based logistic regression STREW-based logistic regression analysis is a feasible technique for analysis is a feasible technique for detecting low quality programs.detecting low quality programs.


ANY QUESTIONS!?ANY QUESTIONS!?

Why? What is it that you didn’t Why? What is it that you didn’t understand?understand?

Are you sure you read the article?Are you sure you read the article?

If you read and didn’t understand what If you read and didn’t understand what makes you think that we understood?makes you think that we understood?

Go easy on us!! Go easy on us!!

Daniel Liu & Yigal Darsa - Presentation Early Estimation of Software Quality Using In-Process...

Documents

Transcript of Daniel Liu & Yigal Darsa - Presentation Early Estimation of Software Quality Using In-Process...