The Economics of Fault Prediction
Submitted in partial fulfillment of the requirements for the degree of
Master of Technology
by
Deepak Banthia
(1010102)
under the guidance of
Dr. Atul Gupta
Computer Science & Engineering
INDIAN INSTITUTE OF INFORMATION TECHNOLOGY,
DESIGN AND MANUFACTURING JABALPUR, INDIA
2012
-
Approval Sheet
This thesis entitled The Economics of Fault Prediction submitted by
Deepak Banthia (1010102) is approved for partial fulfillment of the
requirements for the degree of Master of Technology in Computer Science and
Engineering.
Examining Committee
................................................
................................................
................................................
Guide
................................................
................................................
................................................
Chairman
................................................
Date .......................... ................................................
Place ......................... ................................................
-
Certificate
This is to certify that the work contained in the thesis entitled, The Economics
of Fault Prediction, submitted by Deepak Banthia (Roll No. 1010102) in
partial fulfillment of the requirements for the degree of Master of Technology in
Computer Science and Engineering, has been carried out under my supervision and
that this work has not been submitted elsewhere.
(Atul Gupta) ............ , 2012
Associate Professor
Computer Science & Engineering Discipline
Indian Institute of Information Technology, Design and Manufacturing Jabalpur
Jabalpur, India.
-
Acknowledgments
This thesis would not have been possible without the sincere help and
contributions of several people. I would like to use this opportunity to
express my sincere gratitude to them.
Firstly, I would like to thank God, with whose blessings I could turn my idea
into reality. I express my deep sense of gratitude towards my mentor and
thesis supervisor, Dr. Atul Gupta, for his valuable guidance, moral support and
constant encouragement throughout the thesis. His approach towards software
engineering will always be a valuable learning experience for me. No words can
express my feelings towards him for taking such a keen interest in my academics
and personal welfare. His dedication, professionalism and hard work have been
and shall be a source of inspiration throughout my life.
The contributions of a mother to the success of her child can be neither
measured nor directly repaid. To such a mother, who is but a manifestation of
the divine virtues of the Earth, this report is one petite offering. Thank you,
parents, for all the liberty, prosperity, confidence and discipline showered on
me. This thesis would not have been completed without the motivation and
blessings of my parents. My fiancée (Nisha) brought a light inside me and
always filled me with the enthusiasm and vigour to do my work with complete
effort and dedication. Thanks to her for accompanying me all the way and for
her unflinching help and support in all my endeavours. I would like to thank my
uncles, Mr. Hem Kumar Banthia and Mr. Khagendra Kumar Banthia, for their
encouragement throughout my studies. Along with them, I also received energy
and motivation from my sisters for my career. I would also like to give my
sincere thanks to Mr. Amaltas Khan, Mr. Arpit Gupta, Mr. Ravindra Singh,
Mr. Santosh Singh Rathore and Mr. Saurabh Tiwari for their support and for
always being there, no matter what.
-
I thank the CSE fraternity at IIITDM Jabalpur, and my special thanks go to my
batch mates.
Jabalpur Deepak Banthia
..........., 2012
-
Abstract
Fault-prediction techniques aim to predict fault-prone software modules in
order to streamline the efforts to be applied in the later phases of software
development. Normally, the effectiveness of a fault-prediction technique is
demonstrated by training it over a part of some known fault data and measuring
its performance against the other part of the fault data. There have been
many efforts comparing the performance of various fault-prediction techniques
on different project datasets. However, invariably most of these studies
have also recorded a high misclassification rate (normally 15 to 35%), besides
not-so-high accuracy figures (normally 70 to 85%). This raises serious concerns
about the viability of these techniques. In this thesis, we first present a brief
summary of the results of some of the earlier studies undertaken in fault
prediction and argue about their usefulness. As a follow-up, we then investigate
two important and related research questions regarding the viability of fault
prediction. First, for a given project, are the fault prediction results useful? In
case of an affirmative answer, we then look at how to choose a fault-prediction
technique for an overall improved performance in terms of cost-effectiveness.
Here, we propose an adaptive cost evaluation framework that incorporates cost
drivers for various fault removal phases, and performs a cost-benefit analysis
for the misclassification of faults. We then used this framework to investigate
the usefulness of various fault-prediction techniques in two different settings.
The first part of the investigation consisted of a performance evaluation of five
major fault-prediction techniques on nineteen public datasets. Here, we found
fault prediction useful for projects with a percentage of faulty modules less
than a certain threshold, and there was no single technique that could provide
the best results in all cases, i.e., for all nineteen project datasets. In the other
part of the investigation, and as a practical use of the proposed framework,
we have demonstrated that the fault information of the previous versions
of the software can be effectively used to predict fault proneness in the
current version of the software. Here, we found fault prediction useful when
the difference between inter-version fault rates was below a certain threshold.
Also, the usability of fault prediction was found to reduce with an increase in
the inter-version fault rate.
-
Contents
Approval I
Certificate II
Acknowledgments III
Abstract V
List of Figures IX
List of Tables X
List of Symbols XII
Abbreviations XIII
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Thesis Organization . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Related Work 6
2.1 Fault Prediction Models . . . . . . . . . . . . . . . . . . . . . 6
2.2 Public Datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3 Evaluation Measures . . . . . . . . . . . . . . . . . . . . . . . 10
2.3.1 Numerical measures . . . . . . . . . . . . . . . . . . . . 10
2.3.2 Graphical evaluation measures . . . . . . . . . . . . . . 12
2.4 Fault Prediction Studies . . . . . . . . . . . . . . . . . . . . . 13
2.5 Estimating Cost of Fault Prediction . . . . . . . . . . . . . . . 16
-
2.6 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3 Fault Prediction Results: How Useful Are They? 20
3.1 Issues in Fault Prediction . . . . . . . . . . . . . . . . . . . . 20
3.2 A Proposed Model for Evaluating Fault Prediction Efficiency . 21
3.2.1 General arguments . . . . . . . . . . . . . . . . . . . . 23
3.2.2 Evaluation model . . . . . . . . . . . . . . . . . . . . . 23
3.3 Revisiting Fault Prediction Results . . . . . . . . . . . . . . . 24
3.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
4 A Cost Evaluation Framework 29
4.1 The Evaluation Framework . . . . . . . . . . . . . . . . . . . . 30
4.2 Experimental Study . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.1 Experimental setup . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Experiment execution . . . . . . . . . . . . . . . . . . 34
4.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.4 Experiment findings . . . . . . . . . . . . . . . . . . . . 43
4.2.5 Threats to validity . . . . . . . . . . . . . . . . . . . . 45
4.2.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
5 An Application of Cost Evaluation Framework for Multiple
Releases 50
5.1 The Procedure . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.2 Experimental Study . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2.1 Experimental setup . . . . . . . . . . . . . . . . . . . . 53
5.2.2 Experiment execution . . . . . . . . . . . . . . . . . . 54
5.2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.2.4 Threats to validity . . . . . . . . . . . . . . . . . . . . 59
5.3 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
6 Conclusions and Future Work 61
References 63
Publications 70
Index 71
-
List of Figures
1.1 Thesis structure . . . . . . . . . . . . . . . . . . . . . . . . . . 3
3.1 Cost statistics for faulty modules . . . . . . . . . . . . . . . . . 22
3.2 Cost statistics for non-faulty modules . . . . . . . . . . . . . . . 22
4.1 Decision chart representation to evaluate the estimated Ecost . 36
4.2 Value of NEcost for category 1 when u = 0.25 and s = 0.5 . 38
4.3 Value of NEcost for category 2 when u = 0.25 and s = 0.5 . 41
4.4 Value of NEcost for category 3 when u = 0.25 and s = 0.5 . 43
4.5 Cost characteristics of used fault-prediction techniques when u
= 0.5 and s = 0.65 . . . . . . . . . . . . . . . . . . . . . . . . 44
4.6 Cost characteristics of used fault-prediction techniques when u
= 0.25 and s = 0.5 . . . . . . . . . . . . . . . . . . . . . . . . 45
4.7 Cost characteristics of used fault-prediction techniques when u
= 0.15 and s = 0.25 . . . . . . . . . . . . . . . . . . . . . . . 46
5.1 Decision chart representation to evaluate the estimated Ecost . 52
5.2 Value of Ecost for Jedit versions when u = 0.25 and s = 0.5 59
-
List of Tables
2.1 Datasets used in the study . . . . . . . . . . . . . . . . . . . . 9
2.2 Confusion matrix . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.3 Fault Prediction Studies . . . . . . . . . . . . . . . . . . . . . 13
3.1 NASA datasets . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Experiment results for dataset CM1 . . . . . . . . . . . . . . . 26
3.3 Experiment results for dataset kc1 . . . . . . . . . . . . . . . . 26
3.4 Experiment results for dataset kc2 . . . . . . . . . . . . . . . . 27
3.5 Experiment results for dataset pc1 . . . . . . . . . . . . . . . 27
4.1 Removal costs of test techniques (in staff-hours per defect) [52] 30
4.2 Fault identification efficiencies of different test phases [26] . . . 31
4.3 Used projects from NASA [1] and PROMISE data repository [2] 34
4.4 Categorization of projects based on the fraction of faulty modules 34
4.5 Result of experiment for PC1 (1109) . . . . . . . . . . . . . . 37
4.6 Result of experiment for AR1 (121) . . . . . . . . . . . . . . . 37
4.7 Result of experiment for NW1 (403) . . . . . . . . . . . . . . . 37
4.8 Result of experiment for KC3 (458) . . . . . . . . . . . . . . . 38
4.9 Result of experiment for CM1 (498) . . . . . . . . . . . . . . . 38
4.10 Result of experiment for PC3 (1563) . . . . . . . . . . . . . . 39
4.11 Result of experiment for ARC (234) . . . . . . . . . . . . . . . 39
4.12 Result of experiment for PC4 (1458) . . . . . . . . . . . . . . 39
4.13 Result of experiment for KC1 (2109) . . . . . . . . . . . . . . 40
4.14 Result of experiment for AR4 (107) . . . . . . . . . . . . . . . 40
4.15 Result of experiment for JM1 (10885) . . . . . . . . . . . . . . 40
4.16 Result of experiment for KC2 (522) . . . . . . . . . . . . . . . 41
4.17 Result of experiment for Camel 1.6 (858) . . . . . . . . . . . . 41
4.18 Result of experiment for Ant 1.6 (351) . . . . . . . . . . . . . 42
-
4.19 Result of experiment for Ant 1.7 (493) . . . . . . . . . . . . . 42
4.20 Result of experiment for MC2 (161) . . . . . . . . . . . . . . . 42
4.21 Result of experiment for J-edit 3.2 (272) . . . . . . . . . . . . 42
4.22 Result of experiment for Lucene 2.0 (195) . . . . . . . . . . . 43
4.23 Result of experiment for J-edit 4.0 (274) . . . . . . . . . . . . 43
5.1 Used projects from PROMISE data repository [2] . . . . . . . 53
5.2 Prediction results for Ant 1.6 . . . . . . . . . . . . . . . . . . 55
5.3 Prediction results for Ant 1.7 when fault prediction model trained
using Ant 1.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 Results of experiment to calculate the Ecost for Ant 1.7 using
information of Ant 1.6 . . . . . . . . . . . . . . . . . . . . . . 56
5.5 Prediction results for Jedit4.0 (3-fold cross-validation) . . . . . 57
5.6 Results of experiment to calculate the Ecost for Jedit4.1 using
information of Jedit4.0 . . . . . . . . . . . . . . . . . . . . . . 57
5.7 Prediction results for Jedit4.0 and Jedit4.1 (3-fold cross-validation) 57
5.8 Results of experiment to calculate the Ecost for Jedit4.2 using
information of Jedit4.0 and 4.1 . . . . . . . . . . . . . . . . . 58
5.9 Prediction results for Jedit4.0, Jedit4.1 and Jedit4.2 (3-fold
cross-validation) . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.10 Results of experiment to calculate the Ecost for Jedit4.3 using
information of Jedit4.0, 4.1 and 4.2 . . . . . . . . . . . . . . . 58
A1 Details of used metrics . . . . . . . . . . . . . . . . . . . . . . 71
A2 Metrics used in datasets . . . . . . . . . . . . . . . . . . . . . 72
-
List of Symbols
Cf Normalized fault removal cost in field
Ci Initial setup cost of used fault prediction approach
Cs Normalized fault removal cost in system testing
Cu Normalized fault removal cost in unit testing
Mp Percentage of modules unit tested
s Fault identification efficiency of system testing
u Fault identification efficiency of unit testing
-
Abbreviations
Acc Accuracy
AUC Area Under the Curve
Ecost Estimated Fault Removal Cost of the software when we
use fault prediction
EFN Estimated number of False Negatives
EFP Estimated number of False Positives
ETP Estimated number of True Positives
FN False Negative
FNR False Negative Rate
FP False Positive
FPR False Positive Rate
NEcost Normalized Estimated fault removal cost of the software
when we use fault prediction
NPV Negative Predictive Value
PD Probability of Detection
PF Probability of False Alarm
PPV Positive Predictive Value
PR Precision
Tcost Estimated fault removal cost of the software without the
use of fault prediction
TN True Negative
TP True Positive
-
Chapter 1
Introduction
Software fault prediction has become an important area of research in the arena
of the Software Development Life Cycle. It has the potential to aid in ensuring
the desired software quality as well as in achieving an economical development
process. The potential of fault prediction is backed by its ability to identify
fault-prone software modules before the actual testing process begins. This
helps in obtaining the desired software quality in optimum time, with optimized
cost and effort.
Most major development organizations spend a lot of time and effort
on research in the field of quality assurance activities. But the practical
usage of fault prediction remains equivocal. This indicates a need for
further research in this field that would emphasize how fault prediction is
applicable in the quality assurance process.
1.1 Motivation
The software quality assurance process focuses on the quick identification and
removal of faults from the artifacts that are generated and subsequently used in
the development of software. Fault prediction can help in this by identifying
the fault-prone modules in the early stages of the development life cycle, which
can then lead to a more streamlined application of effort. The fault-proneness
information not only points to the need for increased quality monitoring during
development but also provides important advice for undertaking suitable
verification and validation activities, which eventually improve the
effectiveness and efficiency of the fault-finding process.
Fault prediction is a process to predict fault-prone software modules
without executing them. Conventionally, fault prediction is done by applying
machine-learning techniques over project datasets. The effectiveness of
a fault-prediction technique is demonstrated by training it over a part of
some known fault data and measuring its performance against the other part
of the fault data. Recently, several software project data repositories became
publicly available, such as the NASA Metrics Data Program [1] and the PROMISE
Data Repository [2]. The availability of these public datasets has encouraged
more investigations and their replications. A wide range of fault-prediction
techniques has been applied to demonstrate their effectiveness on
these datasets [19][8][28][49][38].
However, there are certain crucial issues which need to be resolved
before the results of such prediction can be incorporated in practice. An
important concern is the lack of suitable performance evaluation
measures that would assess the economics of fault prediction if adopted in the
software development process [6]. Another concern is the typical prediction
accuracy of a fault-prediction technique, which is found to be considerably
low, ranging from 70 to 85 percent [32][19][20], compared to the high accuracy
results obtained in other fields like image recognition, spam filters, etc. Yet
another concern can be attributed to the unequal distribution of fault data,
which may lead to biased learning. We know from experience that fault
distributions typically follow the Pareto principle, and hence the accuracy
figures obtained from fault prediction can be grossly misleading, as a
fault-prediction technique can produce high accuracy results by mostly
classifying non-faulty modules as non-faulty.
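A small numeric sketch (with made-up module counts) makes this concern concrete: a trivial predictor that labels every module as non-faulty attains a high accuracy figure while detecting no faults at all.

```python
# Made-up illustration of the accuracy concern above: on a dataset with
# 10% faulty modules, a predictor that labels EVERY module non-faulty
# still reports 90% accuracy while catching zero faults.
faulty, non_faulty = 50, 450          # assumed module counts

tn, fn, tp, fp = non_faulty, faulty, 0, 0   # "always non-faulty" predictor
accuracy = (tn + tp) / (tn + tp + fn + fp)
recall = tp / (tp + fn)                      # fraction of faults detected

print(f"accuracy = {accuracy:.0%}, recall = {recall:.0%}")
# prints: accuracy = 90%, recall = 0%
```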
The key functionality of fault prediction is to identify the highest possible
number of faults with the least possible resources. However, the concerns
mentioned above pose serious threats to the use of fault prediction results for
streamlining quality assurance activities undertaken during software
development. We need to investigate further what these results mean and
whether they can be used economically in the software development process.
-
1.2 Objectives
The main objective of this thesis work is to propose a cost evaluation
framework that helps to put the results of a fault-prediction technique in
proper perspective. If the results of fault prediction are to be used in the
development process, the framework can provide an estimate of the savings in
the efforts applied in subsequent phases of software development. Specifically,
we aim to answer, for a given project dataset, whether fault prediction
would help, and if yes, how to choose a fault-prediction technique that
would yield the optimal results.
With this dissertation, we will investigate:
Q1: For a given project, would fault prediction economically help in
software development?
Q2: If yes, then how do we select a fault-prediction technique for overall
optimum performance?
1.3 Thesis Organization
The overall structure of this thesis can be illustrated as shown in Figure 1.1.
The content can broadly be divided into three major sections, namely Back-
ground Research, Research Contribution and Research Prospects.
Figure 1.1: Thesis structure
Chapter 2 summarizes the concepts relevant to the study. In particular,
fault prediction models, details of the public datasets used in our experimental
study, model evaluation techniques, and a literature review of previous related
studies are given in this chapter.
In Chapter 3, we present an insight into the economics of fault prediction. In
particular, we first revisit the results of some of the previous fault prediction
studies on the basis of the economics of faults. Then, we refine the criteria
based on fault misclassification, and again measure the performance of the
above-said fault-prediction techniques on the basis of cost effectiveness. We
used four NASA MDP datasets to perform our study. Here, our results suggested
that simple techniques like IBK perform better over most of the datasets.
In Chapter 4, we propose a cost evaluation framework that can help
to answer both of the questions using limited fault data. Essentially, the
framework can provide an estimate of the savings in effort achieved by
using the results of fault prediction in subsequent phases of software
development. To construct the cost evaluation framework, we accounted for the
typical fault removal costs of different testing phases [52], along with their
fault identification efficiencies [26]. The first question can be answered by
comparing the fault removal cost in both cases, i.e., with and without the use
of fault prediction.
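The comparison step can be sketched as a simple decision rule. Treating the normalized cost NEcost as the ratio Ecost/Tcost is our simplifying assumption for illustration, not the framework's full formulation, and the cost values below are invented; the actual cost model appears in Chapter 4.

```python
def worth_using_fault_prediction(ecost: float, tcost: float) -> bool:
    """Sketch of the framework's decision step: adopt fault prediction
    only when the estimated fault removal cost with prediction (Ecost)
    is below the cost without it (Tcost). NEcost < 1 means a net saving.
    (Interpreting NEcost as Ecost/Tcost is an assumption here.)"""
    necost = ecost / tcost
    return necost < 1.0

print(worth_using_fault_prediction(820.0, 1000.0))   # True: prediction pays off
print(worth_using_fault_prediction(1200.0, 1000.0))  # False: testing everything is cheaper
```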
Here, we investigated the usefulness of fault-prediction techniques based on the
proposed framework using limited fault data. The investigation consisted
of a performance evaluation of five major fault-prediction techniques on
nineteen public datasets. We used five well-known fault-prediction techniques,
namely Random Forest, J48 (C4.5 decision tree), Neural Network, K-means
Clustering and IBK (K-nearest neighbors). These datasets provide a wide
range of percentages of faulty modules (varying from 7 to 49 percent). We
categorized these datasets into three categories based on their fault
information. We used the WEKA machine learning tool to perform all listed
experiments. The results of this study suggested that fault prediction can be
useful for projects with a percentage of faulty modules less than a certain
threshold (in our case, it varied from 21% to 42% over the specified range of
testing-phase efficiencies). Also, there was no single technique that could
provide the best results in all cases.
In Chapter 5, we show the application of the proposed cost framework over
multiple subsequent releases of software. We evaluated the fault removal
cost of the current version of the software using the fault information
available from its previous versions. This estimated fault removal cost then
helps to decide whether fault prediction is useful for the current version. To
answer both research questions, we investigated the usefulness of
fault-prediction techniques based on the framework on successive versions of
two different software systems, namely Ant and Jedit. Here, we found fault
prediction useful when the difference between inter-version fault rates was
below a certain threshold (in our case, 2%). Also, the usability of fault
prediction was found to reduce as the inter-version fault rate increased.
Here, the difference between inter-version fault rates denotes the difference
between the percentages of faulty modules present in successive versions.
Finally, we conclude the contributions of our research in Chapter 6. The future
prospects of our research are also discussed in the same chapter.
1.4 Summary
Fault-prediction techniques are used to identify faults in software code
without executing it. They thus have the potential to help the validation and
verification process by accurately identifying faults, and may also help in an
economical software development process. But most organizations still do
not use fault-prediction techniques, even though their potential has been
validated in a couple of studies. This indicates a need for further research in
this field emphasizing how fault prediction can improve the quality assurance
process. In this chapter, we highlighted the issues in the fault prediction
arena and summarized our work, which tries to put fault prediction results in
the correct perspective, i.e., cost effectiveness.
-
Chapter 2
Related Work
In this chapter, we summarize the concepts relevant to the study. In
particular, fault prediction models, details of the public datasets used in the
research study, model evaluation techniques and a literature review of previous
related studies are given.
2.1 Fault Prediction Models
Fault prediction allows testers to deploy their resources more effectively
and efficiently, which would potentially result in higher quality products and
lower costs. Fault prediction is typically performed by applying various
machine learning algorithms to known properties learned from project fault
datasets. The typical way of predicting faults in software modules is to use
software metrics and fault data (collected from previous releases or similar
projects) to construct a fault-prediction model, which is then used to predict
the fault proneness of new modules. For example, a module under the scanner of
a fault-prediction technique is flagged as faulty if it has feature (metric)
values similar to those of a faulty module that was used to train the
technique.
Many techniques have been proposed to estimate the fault-proneness of a
software module, including clustering, Decision Trees, Neural Networks,
Dempster-Shafer Belief Networks, Random Forest and Quad Tree based K-Means
[19][20][8][27][28][9][49].
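As a minimal sketch of this metrics-based workflow (scikit-learn and invented metric values stand in for the actual tools and datasets used in this thesis), a classifier is trained on module metrics with known fault labels and then applied to new modules:

```python
from sklearn.ensemble import RandomForestClassifier

# [LOC, cyclomatic complexity] per module, with fault labels from a
# previous release (values are made up for illustration)
train_metrics = [[120, 4], [800, 25], [60, 2], [950, 30], [200, 7], [700, 22]]
train_faulty  = [0, 1, 0, 1, 0, 1]   # 1 = module turned out faulty

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(train_metrics, train_faulty)

# Modules of the new release: those with metric values similar to the
# faulty training modules get flagged as fault-prone
new_modules = [[850, 28], [90, 3]]
predictions = model.predict(new_modules)
print(predictions)
```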
Different Approaches for Fault Prediction Models
-
A project manager needs to make sure a project meets its timetable and budget
plan without loss of quality. To help project managers make such decisions,
fault prediction models play an important role in allocating software
quality assurance resources. Existing research in software fault-proneness
models focuses on predicting faults from two perspectives:
The number of faults or fault density: This approach predicts the number of
faults (or the fault density) in a module or a component. These
models typically use data from historical versions (or pre-release parts)
and predict the faults in the new version (or the newly developed parts).
For example, the fault data from historical releases can be used to predict
faults in updated releases [46][33][50][23].
Classification: Classification predicts which modules (components) contain
faults and which do not. The goal of this kind of prediction is to
distinguish fault-free subsystems from faulty subsystems. This allows
project managers to focus resources on fixing the faulty subsystems.
There are two methods to classify fault-prone modules versus fault-free
modules: supervised learning and unsupervised learning. They are used in
different situations. When a new system without any previous release is built,
unsupervised learning needs to be adopted in order to predict fault-prone
subsystems among the newly developed subsystems (modules, components, or
classes). After some subsystems are tested and put into operation, these
pre-release subsystems can be used as training data to build software fault
prediction models for new subsystems; this is when supervised learning can be
used. The difference between supervised and unsupervised learning lies in the
status of the training data's class labels: if they are unknown, the learning
is unsupervised; otherwise, it is supervised.
Supervised Learning. Learning is called supervised because the method
operates under supervision, being provided with the actual outcome for each of
the training examples. Supervised learning requires known fault measurement
data (i.e., the number of faults, fault density, or fault-prone or not) for the
training data. Usually, fault measurement data from previous versions [46],
pre-release data [44], or similar projects [29] can act as training data to
predict new projects (subsystems).
Most research reported in fault prediction uses supervised learning, including
the experiments in this dissertation. The result of supervised learning is
easier to judge than that of unsupervised learning, which probably helps to
explain why there are abundant reports on supervised learning in the literature
and few on unsupervised learning. As in most research conducted in fault
prediction, a dataset with all classes known is divided into training data and
testing data: the classes of the training data are provided to a machine
learning algorithm, while the testing data acts as the validation set and is
used to judge the trained models. The success rate on test data gives an
objective measure of how well the machine learning algorithm performs.
Repeating this process multiple times with randomized training and testing
splits is standard data mining practice, called cross-validation. As in other
data mining research, randomization, cross-validation and bootstrapping are the
standard statistical procedures for fault prediction in software engineering.
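The train/test splitting described above can be sketched with scikit-learn's cross-validation utilities (the data here is synthetic and stands in for a real project dataset):

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic [LOC, complexity] metrics with known fault labels
metrics = [[100, 3], [900, 28], [70, 2], [820, 26], [150, 5], [760, 24],
           [90, 2], [880, 27], [130, 4], [700, 21], [60, 1], [940, 30]]
faulty  = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]

# 3-fold cross-validation: train on two folds, test on the held-out fold,
# and rotate; the mean test accuracy judges the learner objectively
scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                         metrics, faulty, cv=3)
print(len(scores), scores.mean())
```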
Unsupervised Learning. Sometimes we may have no fault data, or only very
few modules with previous fault data. For example, if a new project is being
developed or previous fault data was not collected, supervised learning
approaches do not work because we have no labeled training data.
Therefore, unsupervised learning approaches such as clustering methods may
be applied. However, research on this approach is seldom reported. As far
as the author is aware, Zhong et al. [55][56] are the first group to
investigate this in fault prediction. They used Neural-Gas and K-means
clustering to partition software modules into several groups, with the help of
human experts to label each group as fault-prone or not. Their results indicate
promising potential for this unsupervised learning method.
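A sketch of this clustering idea (K-means on invented metric values; the expert-labeling step is only indicated in a comment):

```python
from sklearn.cluster import KMeans

# Module metrics [LOC, complexity]: three small/simple and three
# large/complex modules (values invented for illustration)
metrics = [[80, 2], [120, 4], [60, 1], [900, 30], [850, 27], [760, 22]]

# Partition the unlabeled modules into two groups by metric similarity
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(metrics)
print(km.labels_)

# A human expert would now inspect each cluster and label the
# "large, complex" group as fault-prone and the other as not.
```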
2.2 Public Datasets
Several software project data repositories have become publicly available,
such as the NASA Metrics Data Program [1] and the PROMISE Data Repository
[2]. NASA MDP is a software project metrics repository provided by NASA
and is available to users through their website. NASA MDP stores and
organizes software metrics data and associated fault data at the module
level. Currently, there are thirteen project datasets available. All NASA MDP
datasets are also available in the PROMISE public repository. There are
ninety-four defect datasets available in PROMISE. These datasets can therefore
be used to validate the performance of various fault-prediction techniques. In
the experiments of this thesis work, we used twenty-three public datasets from
the NASA and PROMISE data repositories.
Table 2.1: Datasets used in the study
Project        Faulty (%)    Number of Modules    Language    Source
Jedit 4.3 2.23 492 Java PROMISE
pc1 6.94 1109 C NASA MDP
ar1 7.44 121 C PROMISE
nw1 7.69 403 C NASA MDP
kc3 9.34 458 Java NASA MDP
cm1 9.84 498 C NASA MDP
pc3 10.24 1563 C NASA MDP
Arc 11.54 234 C++ PROMISE
pc4 12.21 1458 C NASA MDP
kc1 15.46 2109 C++ NASA MDP
Jedit 4.2 13.07 367 Java PROMISE
ar4 18.69 107 C PROMISE
jm1 19.35 10885 C NASA MDP
kc2 20.5 522 C++ NASA MDP
camel1.6 21.91 858 Java PROMISE
ant1.6 26.21 351 Java PROMISE
Jedit4.0 24.5 306 Java PROMISE
Jedit4.1 25.32 312 Java PROMISE
ant1.7 27.79 493 Java PROMISE
mc2 32.3 161 C++ NASA MDP
jedit 3.2 33.09 272 Java PROMISE
lucene2.0 46.67 195 Java PROMISE
jedit 4.0 m 48.9 274 Java PROMISE
The details of these datasets are tabulated in Table 2.1. These datasets
correspond to different programming languages and have different software
metrics, varying in number from eight to forty. The description of the used
datasets along with their metrics is given in the Appendix (Appendix Table A1
and Appendix Table A2).
Table 2.2: Confusion matrix
                         Defect Present: No       Defect Present: Yes
Defect Predicted: No     TN = True Negative       FN = False Negative
Defect Predicted: Yes    FP = False Positive      TP = True Positive
2.3 Evaluation Measures
In this section, we summarize the evaluation measures used by researchers to
assess the performance of a fault-prediction technique. These measures can be
broadly classified into two major categories: numerical measures and graphical
measures.
2.3.1 Numerical measures
All numerical measures can be derived from the confusion matrix. A confusion
matrix records the actual and predicted classifications made by a
fault-prediction technique. Table 2.2 shows the confusion matrix for a
two-class classification.
Accuracy:
The prediction accuracy of a fault-prediction technique is measured as

    Accuracy = (TN + TP) / (TN + TP + FN + FP)        (2.1)

False positive rate (FPR):
It is measured as the ratio of non-faulty modules incorrectly predicted as
faulty to the total number of non-faulty modules. False alarm rate and type-I
error are equivalent terms for FPR.

    FPR = FP / (TN + FP)        (2.2)
False negative rate (FNR):
It is measured as the ratio of faulty modules incorrectly predicted as
non-faulty to the total number of faulty modules. Type-II error is an
equivalent term for FNR.

    FNR = FN / (TP + FN)        (2.3)

Precision:
It is measured as the ratio of modules correctly predicted as faulty to the
total number of modules predicted as faulty.

    Precision = TP / (TP + FP)        (2.4)

Recall:
It is measured as the ratio of modules correctly predicted as faulty to the
total number of faulty modules. Probability of detection (PD) is another term
for recall.

    Recall = TP / (TP + FN)        (2.5)
F-measure:
It is measured as the harmonic mean of precision and recall [36].

    F-measure = (2 × Precision × Recall) / (Precision + Recall)        (2.6)

G-mean:
It is the geometric mean. The G-mean indices are defined in expressions (2.7)
and (2.8). G-mean1 is the square root of the product of the probability of
detection (PD) and precision; G-mean2 is the square root of the product of PD
and specificity [35].

    G-mean1 = sqrt(PD × Precision)        (2.7)

    G-mean2 = sqrt(PD × Specificity)        (2.8)

J-coefficient (J-coeff):
It describes the performance of a prediction technique more effectively [51].
    J-coeff = PD − PF        (2.9)

When J-coeff is 0, the probability of detecting a faulty module is equal to
the false alarm rate. When J-coeff is greater than 0, PD is greater than PF.
J-coeff = 1 represents perfect classification, while J-coeff = −1 is the worst
case, in which all modules are predicted incorrectly.
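All of these numerical measures are simple functions of the four
confusion-matrix counts. A minimal sketch in Python (the function and variable
names are ours, not taken from any particular tool):

```python
import math

def prediction_measures(tp, tn, fp, fn):
    """Compute the numerical measures (2.1)-(2.9) from confusion-matrix counts."""
    accuracy  = (tn + tp) / (tn + tp + fn + fp)             # (2.1)
    fpr       = fp / (tn + fp)                              # (2.2) false alarm / PF
    fnr       = fn / (tp + fn)                              # (2.3)
    precision = tp / (tp + fp) if (tp + fp) else 0.0        # (2.4)
    recall    = tp / (tp + fn)                              # (2.5) PD
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)          # (2.6)
    specificity = tn / (tn + fp)                            # = 1 - FPR
    g_mean1   = math.sqrt(recall * precision)               # (2.7)
    g_mean2   = math.sqrt(recall * specificity)             # (2.8)
    j_coeff   = recall - fpr                                # (2.9) PD - PF
    return {"accuracy": accuracy, "fpr": fpr, "fnr": fnr,
            "precision": precision, "recall": recall, "f_measure": f_measure,
            "g_mean1": g_mean1, "g_mean2": g_mean2, "j_coeff": j_coeff}

# Example: the IBK row of Table 3.2 (CM1): TP=15, TN=423, FP=26, FN=34
m = prediction_measures(15, 423, 26, 34)
print(round(m["fnr"], 2), round(m["fpr"], 2))  # 0.69 0.06
```

Applied to the IBK counts for CM1, the function reproduces the FNR, FPR and
accuracy values reported later in Table 3.2.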
2.3.2 Graphical evaluation measures
Graphical measures depict the relationship between two or more numerical
measures. Like the numerical measures, all graphical measures can also be
derived from the confusion matrix.
ROC curve [54]:
An ROC curve provides a visualization of the tradeoff between the ability to
correctly predict fault-prone modules (PD) and the rate of incorrectly
predicted fault-free modules (PF). The area under the ROC curve (denoted AUC)
is a numeric performance measure used to compare fault-prediction techniques.
In an ROC curve, the best performance corresponds to high PD and low PF.
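The construction of an ROC curve amounts to sweeping a decision threshold over
the classifier's scores and recording (PF, PD) at each step. The pure-Python
sketch below illustrates this; the module scores and labels are hypothetical:

```python
def roc_points(y_true, y_score):
    """(PF, PD) points traced by sweeping a decision threshold over the scores."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= t)
        fp = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s >= t)
        pts.append((fp / neg, tp / pos))  # x = PF, y = PD
    return pts

def auc_trap(pts):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical fault scores for ten modules (1 = actually faulty)
y_true  = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.2, 0.15, 0.65, 0.8, 0.3, 0.6, 0.9, 0.05, 0.7]
print("AUC =", auc_trap(roc_points(y_true, y_score)))  # ~0.958
```

In practice the curve and AUC would be produced by a library (e.g. WEKA or
scikit-learn) from cross-validated predictions; this sketch only shows the
mechanics.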
PR curve [14]:
A PR curve provides a visualization of the tradeoff between precision and
recall. In a PR curve, the x-axis represents recall and the y-axis precision
(recall is another term for PD). The best performance corresponds to high PD
and high precision.
Cost curve [15]:
A cost curve provides a visualization of the cost of misclassification. It
describes the performance of a fault-prediction technique in terms of its
misclassification cost. The y-axis represents the normalized expected
misclassification cost, which captures the range between the maximum and the
minimum cost of misclassifying modules. The x-axis represents the probability
cost function.
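To illustrate the idea behind the cost curve, the normalized expected cost of
a single classifier at one operating condition can be computed as below,
following Drummond and Holte's formulation; the class proportion and cost
values used here are hypothetical:

```python
def norm_expected_cost(fnr, fpr, p_pos, c_fn, c_fp):
    """Normalized expected misclassification cost (one point of a cost-curve
    line): x-axis is the probability cost function PC(+), and the y value is
    FNR * PC(+) + FPR * (1 - PC(+))."""
    pc_pos = (p_pos * c_fn) / (p_pos * c_fn + (1 - p_pos) * c_fp)
    return fnr * pc_pos + fpr * (1 - pc_pos)

# Hypothetical condition: 10% faulty modules, a missed fault (FN) costing
# five times a false alarm (FP). Compare two CM1 rows from Table 3.2:
print(norm_expected_cost(0.69, 0.06, 0.10, 5.0, 1.0))  # IBK
print(norm_expected_cost(0.69, 0.11, 0.10, 5.0, 1.0))  # Naive Bayes
```

With equal FNR, the technique with the lower FPR yields the lower normalized
expected cost, which is exactly the comparison the cost curve makes visual.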
2.4 Fault Prediction Studies
In this section, we present a brief summary of some fault prediction studies
that are relevant to our work. In particular, we summarize studies on
fault-prediction techniques, useful review articles, and research papers
relevant to the cost effectiveness of fault prediction. The summarized studies
are shown in Table 2.3.
These studies show that a great deal of research has been done in the field of
fault prediction, but more specific studies are needed on the effect of fault
prediction on software quality and its economics. In this thesis, we address
one of the major and complex problems in software fault prediction studies:
how to compare the performance of different fault-prediction techniques
effectively? As a solution, we propose a cost evaluation framework that
compares performance on the basis of the resultant fault removal cost.
Table 2.3: Fault Prediction Studies

1. Victor R. Basili, Lionel C. Briand, and Walcelio L. Melo (1996) [5]
   Techniques: logistic regression (univariate and multivariate)
   Evaluation metrics: regression coefficient, p-value
   Datasets: private (8 datasets)
   Conclusions: (1) The C&K metrics were useful for predicting class
   fault-proneness during the early phases of the development life-cycle.
   (2) On their dataset, the C&K metrics were better predictors than
   traditional code metrics.

2. S. S. Gokhale and M. R. Lyu (1997) [17]
   Techniques: regression tree, density modeling
   Evaluation metrics: accuracy, Type I and Type II error
   Datasets: private (Medical Imaging System)
   Conclusions: (1) The regression-tree-based technique had higher prediction
   accuracy than the density-based technique. (2) It also had a lower
   misclassification rate.

3. T. Khoshgoftaar and N. Seliya (2002) [32]
   Techniques: CART-LS, CART-LAD and S-PLUS
   Evaluation metrics: average absolute error (AAE) and average relative
   error (ARE)
   Datasets: private (from a large telecommunication system)
   Conclusions: (1) CART-LAD performed better than the other two techniques.
   (2) S-PLUS trees had poor predictive accuracy.

4. Lan Guo, Bojan Cukic and Harshinder Singh (2003) [19]
   Techniques: Dempster-Shafer (D-S) belief network, logistic regression and
   discriminant analysis
   Evaluation metrics: specificity, sensitivity, overall prediction accuracy,
   probability of false alarm, effort
   Datasets: KC2
   Conclusions: (1) The accuracy of D-S belief networks was higher than that
   of logistic regression and discriminant analysis.

5. Lan Guo, Yan Ma, Bojan Cukic, and Harshinder Singh (2004) [20]
   Techniques: logistic regression, discriminant analysis, decision tree,
   rule set, boosting, Logistic, kernel density, Naive Bayes, J48, IBK, IB1,
   Voted Perceptron, Hyper Pipes, ROCKY
   Evaluation metrics: accuracy, probability of detection
   Datasets: CM1, JM1, KC1, KC2 and PC1
   Conclusions: (1) Random Forest generally achieves higher overall
   prediction accuracy and defect detection rate than the others.
   (2) Compared different machine learning models.

6. T. Menzies, J. DiStefano, A. Orrego, R. Chapman (2004) [42]
   Techniques: Naive Bayes and J48
   Evaluation metrics: accuracy, precision, probability of detection and
   probability of false alarm
   Datasets: CM1, JM1, KC1, KC2 and PC1
   Conclusions: (1) Naive Bayes performed better than the J48 algorithm.
   (2) Accuracy is not a useful parameter for evaluation. (3) They suggested
   using fault prediction in addition to inspection for better quality
   assurance.

7. A. Koru and Hongfang Liu (2005) [34]
   Techniques: J48 and KStar
   Evaluation metrics: F-measure, precision and recall
   Datasets: CM1, JM1, KC1, KC2 and PC1
   Conclusions: (1) It is better to perform defect prediction on data that
   belong to large modules. (2) Defect prediction using class-level metrics
   gives better performance than method-level metrics.

8. Venkata U. B. Challagulla, Farokh B. Bastani, I-Ling Yen (2005) [13]
   Techniques: linear regression, Pace regression, support vector regression,
   neural network for continuous goal field, support vector logistic
   regression, neural network for discrete goal field, logistic regression,
   Naive Bayes, instance-based learning, J48 tree, and 1-Rule
   Evaluation metrics: mean absolute error
   Datasets: CM1, JM1, KC1 and PC1
   Conclusions: (1) Evaluated the performance of different prediction models.
   (2) The combination of 1R and instance-based learning gives better
   prediction accuracy. (3) Size and complexity metrics are not sufficient
   for efficient fault prediction.

9. Tibor Gyimothy, Rudolf Ferenc, and Istvan Siket (2005) [21]
   Techniques: logistic regression (univariate and multivariate), decision
   tree and neural network
   Evaluation metrics: precision, correctness and completeness
   Datasets: Mozilla 1.0 to Mozilla 1.6
   Conclusions: (1) Presented a toolset to calculate OO metrics from C++
   software. (2) Showed how fault-proneness changed over seven versions of
   Mozilla.

10. U. B. Challagulla, B. Bastani, I. Yen (2006) [12]
    Techniques: memory-based reasoning (MBR)
    Evaluation metrics: accuracy, probability of detection (PD) and
    probability of false alarm (PF)
    Datasets: CM1, JM1, KC1 and PC1
    Conclusions: (1) If accuracy is the only criterion, simple MBR with
    Euclidean distance performs better than the other techniques used.
    (2) Proposed a framework to derive the optimal configuration giving the
    best performance for a given defect dataset.

11. Yan Ma, Lan Guo and Bojan Cukic (2006) [39]
    Techniques: logistic regression, discriminant analysis, decision tree,
    rule set, boosting, kernel density, Naive Bayes, J48, IBK, IB1, Voted
    Perceptron, VF1, Hyper Pipes, ROCKY, Random Forest, modified Random
    Forest
    Evaluation metrics: probability of detection, accuracy, precision,
    G-mean1, G-mean2, F-measure
    Datasets: CM1, JM1, KC1, KC2 and PC1
    Conclusions: (1) Proposed a novel methodology based on variants of the
    random forest algorithm, which is more robust than random forest.
    (2) Compared different machine learning models.

12. T. Menzies, J. Greenwald and A. Frank (2007) [43]
    Techniques: Naive Bayes, J48 and log-filtering
    Evaluation metrics: probability of detection (PD) and probability of
    false alarm (PF)
    Datasets: CM1, KC3, KC4, MW1, PC1, PC2, PC3 and PC4
    Conclusions: (1) Data mining of static code attributes to learn defect
    predictors is useful. (2) The predictors were useful for prioritizing a
    resource-bound exploration of code that has to be inspected.

13. S. Kanmani, Rhymend Uthariaraj, Sankaranarayanan, P. Thambidurai
    (2007) [27]
    Techniques: back-propagation neural network, probabilistic neural
    network, discriminant analysis and logistic regression
    Evaluation metrics: Type I, Type II and overall misclassification rate
    Datasets: PC1, PC2, PC3, PC4, PC5 and PC6
    Conclusions: (1) Probabilistic neural networks outperform
    back-propagation neural networks in predicting the fault-proneness of
    object-oriented software.

14. Zhan Li, Marek Reformat (2007) [37]
    Techniques: support vector machine, C4.5, multilayer perceptron and
    Naive Bayes classifier
    Evaluation metrics: sensitivity, specificity and accuracy
    Datasets: JM1 and KC1
    Conclusions: (1) The proposed methodology, SimBoost, performed better
    than conventional techniques. (2) The authors proposed fuzzy labels for
    classification purposes.

15. Naeem Seliya, Taghi M. Khoshgoftaar (2007) [47]
    Techniques: Expectation Maximization, C4.5
    Evaluation metrics: Type I, Type II and overall error rate
    Datasets: KC1, KC2, KC3 and JM1
    Conclusions: (1) EM-based semi-supervised classification improves the
    performance of software quality models.

16. Yue Jiang, Bojan Cukic and Yan Ma (2008) [25]
    Techniques: Naive Bayes, Logistic, IB1, J48, bagging
    Evaluation metrics: all available evaluation techniques; in addition,
    introduced the cost curve
    Datasets: CM1, JM1, KC1, KC2, KC4, MC2, PC1 and PC5
    Conclusions: (1) The best prediction model cannot be selected without
    considering software cost characteristics.

17. Olivier Vandecruys, David Martens, Bart Baesens, Christophe Mues,
    Manu De Backer and Raf Haesen (2008) [50]
    Techniques: AntMiner+, C4.5, logistic regression and support vector
    machine
    Evaluation metrics: accuracy, specificity and sensitivity
    Datasets: KC1, PC1 and PC4
    Conclusions: (1) The authors argued that the intuitiveness and
    comprehensibility of the AntMiner+ model were superior to those of the
    compared models.

18. B. Turhan and A. Bener (2009) [48]
    Techniques: Naive Bayes
    Evaluation metrics: probability of detection (PD) and probability of
    false alarm (PF)
    Datasets: CM1, KC3, KC4, MW1, PC1, PC2, PC3 and PC4
    Conclusions: (1) The independence assumption of Naive Bayes was not
    harmful for defect prediction on datasets with PCA preprocessing.
    (2) Assigning weights to static code attributes can significantly
    increase prediction performance.

19. Huihua Lu, Bojan Cukic, Mark Culp (2011) [38]
    Techniques: Random Forest, FTF
    Evaluation metrics: probability of detection and area under the receiver
    operating characteristic curve (AUC)
    Datasets: JM1, KC1, PC1, PC3 and PC4
    Conclusions: (1) The semi-supervised technique outperforms the
    corresponding supervised technique.

20. P. S. Bishnu and V. Bhattacherjee (2011) [9]
    Techniques: K-means, Catal et al. two-stage approach (CT), single-stage
    approach (CS), Naive Bayes and linear discriminant analysis
    Evaluation metrics: false positive rate, false negative rate and error
    Datasets: AR3, AR4, AR5, SYD1 and SYD2
    Conclusions: (1) The overall error rate of the QDK algorithm was
    comparable to that of the other compared techniques.
2.5 Estimating Cost of Fault Prediction
Software fault prediction attracts significant attention as it can offer
guidance to software verification and validation activities. Over the past few
years, many organizations have made their datasets, containing software
metrics and the corresponding fault information, publicly available. The
availability of these datasets encourages researchers to validate the
performance of various machine learning techniques in predicting the fault
proneness of software modules. Many research studies have also evaluated the
performance of these fault-prediction techniques, but they seem to ignore the
impact of fault misclassification on the economics of software development.
Certifying a considerable number of faulty modules as non-faulty raises
serious concerns, as it may increase development cost through the higher cost
of removing those same faults in later phases. Hence, a more viable evaluation
approach is to favor techniques that tend to reduce the fault removal cost.
Many studies have used different criteria to evaluate the performance of the
fault-prediction techniques under investigation. Some of the criteria used are
accuracy, precision, recall, and mean absolute error, but these criteria do
not consider the cost parameters of software development. A few studies have
since presented cost measures to evaluate the cost effectiveness of fault
prediction. In this section, we summarize the studies that measure the cost
effectiveness of fault prediction and relate them to our work.
Jiang et al. [25] used various metrics to measure the performance of
fault-prediction techniques. They then introduced the cost curve, a measure of
the cost effectiveness of a classification technique, to evaluate the
performance of a fault-prediction technique. They concluded that cost
characteristics must be considered when selecting the best prediction
technique.
Jiang et al. [24] addressed a more general problem, observing that the cost
implications of false positives and false negatives are different. They
analyzed the benefits of fault-prediction techniques that incorporate
misclassification cost in the development of the prediction model, performing
11 experiments with different costs for false positives and false negatives on
13 datasets. They concluded that cost-sensitive modeling does not improve the
overall performance of fault-prediction techniques. Nevertheless, explicit
information about misclassification cost makes it easier for software managers
to select the most appropriate technique.
Mende et al. [41] pointed out that traditional prediction techniques typically
ignore the effort needed to fix the faults, i.e., they do not distinguish
between a predicted fault in a small module and a predicted fault in a large
module. They therefore introduced a performance measure (popt) that takes the
size of the modules into account when measuring the performance of a
fault-prediction technique. They performed their study on thirteen NASA
datasets and concluded that their results indicate the need for further
research to improve existing prediction models, not only through more
sophisticated classification algorithms, but also by searching for better
performance measures.
Mende et al. [40] proposed two strategies, AD (effort-aware binary prediction)
and DD (effort-aware prediction based on defect density), to include the
notion of effort awareness in fault-prediction techniques. The first strategy,
AD, is applicable to any probabilistic classifier, while DD is applicable only
to regression algorithms. They evaluated these strategies on fifteen publicly
available datasets and concluded that both strategies improve the cost
effectiveness of fault-prediction techniques significantly, in both a
statistical and a practical sense.
Arisholm et al. [3] presented a study performed in an industrial setting,
building fault prediction models to efficiently predict faults in a Java
system with multiple versions. They also proposed a cost-effectiveness measure
(CE), a variation of lift charts in which the x-axis represents the ratio of
lines of code instead of modules. They concluded that the popular
confusion-matrix criteria are not clearly related to cost-effectiveness.
Catal et al. [11] presented a literature review of fault-prediction studies
from 1990 to 2009, reviewing the results of previous studies as well as
discussing current trends. Bell et al. [6] presented a challenge paper
discussing important issues regarding the impact of fault-prediction studies
on testing and other efforts. They concluded that, until then, no study in the
literature had investigated the impact of fault prediction on the software
development process. They also highlighted that a method for assessing the
effectiveness of fault-prediction studies, if adopted in software projects,
would be helpful for the software community.
Jiang et al. [25] used the cost curve to show the cost effectiveness of
fault-prediction studies, but they assume the same misclassification cost for
every module, which may be unreasonable in practice. Mende et al. [41]
introduced a new performance measure, popt, that accounts for module size when
evaluating the performance of a fault-prediction technique; in our framework,
by contrast, the fault removal cost of a particular phase is the same for all
modules. Jiang et al. [24] experimented with the cost impact of fault
misclassifications over eleven different values (chosen arbitrarily) for the
costs of false positives and false negatives; these values were taken to be
the same for all phases of software development, which is not a practical
assumption. In this thesis, we propose a new cost evaluation framework that
overcomes this limitation by using organization-wide cost information and
computing the estimated fault removal costs based on the phase in which faults
are identified. Wagner et al. [52] summarized the fault removal cost for
different testing stages, and Jones et al. [26] summarized the fault
identification efficiency of different testing phases. We used these
parameters to compute the estimated fault removal cost for a specific
fault-prediction technique, which eventually helped us decide its
applicability in a more precise way.
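To make the idea concrete, the sketch below shows one way such an estimate
could be computed. The cost constants and the single system-testing efficiency
used here are placeholder assumptions for illustration only, not the actual
figures reported by Wagner [52] or Jones [26]:

```python
def estimated_fault_removal_cost(tp, fp, fn,
                                 c_unit, c_system, c_field, eff_system):
    """Illustrative estimate of the fault removal cost implied by a
    prediction outcome. Modules flagged as faulty (TP + FP) incur
    unit-testing cost; faults missed by the predictor (FN) are assumed to be
    caught during system testing with efficiency eff_system, with the
    remainder escaping to the field. All constants are hypothetical."""
    unit_cost   = c_unit * (tp + fp)                    # verify flagged modules
    system_cost = eff_system * fn * c_system            # FN faults caught later
    field_cost  = (1 - eff_system) * fn * c_field       # FN faults that escape
    return unit_cost + system_cost + field_cost

# Placeholder costs: unit = 1, system = 5, field = 25 units; 50% of escaped
# faults caught in system testing. Compare two CM1 rows from Table 3.2:
print(estimated_fault_removal_cost(15, 26, 34, 1, 5, 25, 0.5))  # IBK
print(estimated_fault_removal_cost(0, 2, 49, 1, 5, 25, 0.5))    # SMO
```

Under these placeholder costs, IBK's extra false positives are far cheaper
than SMO's extra false negatives, which is precisely the tradeoff the
framework is meant to capture.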
2.6 Summary
In this chapter, we presented a brief summary of the concepts related to our
study. In particular, we described the conventional way of performing fault
prediction, the measures used to evaluate the performance of fault-prediction
techniques, and the available public dataset repositories. We also summarized
the studies related to this thesis work, framing a background for it.
Chapter 3
Fault Prediction Results: How Useful Are They?
In this chapter, we give an insight into the cost economics of fault
prediction. In particular, we revisit the results of some earlier fault
prediction studies to account for fault misclassification. We first
investigate how different authors measured the performance of their presented
fault-prediction techniques. We then refine the performance evaluation
criteria based on fault misclassification, and revisit the outcomes of the
above-mentioned fault-prediction techniques.
In our study, we used fifteen research papers based on public datasets, along
with their outcomes and measurement criteria (see Table 2.3). The remainder of
this chapter is organized as follows. Section 3.1 discusses the issues in
fault prediction. Section 3.2 presents a new model for evaluating the fault
prediction performance of a technique based on cost economics. Section 3.3
revisits fault prediction results using the presented evaluation model, and
Section 3.4 summarizes our findings.
3.1 Issues in Fault Prediction
Economical software development requires the identification and removal of
faults in the early stages of the development process. Fault-prediction
techniques are used to predict the fault-prone modules in a software system.
Predicting faults correctly may help in reducing the effort applied in the
later stages of testing.
But building an accurate prediction model is a challenging task, because the
dataset being used may have noisy content and may contain outliers [7]. It is
hard to find a suitable measure that can provide a reliable estimation of the
various characteristics of a software system [6]. This makes the study of
fault prediction much more involved, as we are dealing with many alternative
and imprecise measures of the same software characteristic.
It has been found that faulty modules typically represent only a small
fraction of the total number of modules in a software system. This observation
is critically important for putting the results obtained by a fault-prediction
technique in the correct perspective. With few faulty modules in the dataset,
a high prediction accuracy may result simply from classifying the majority of
non-faulty modules as non-faulty. However, our main concern is the
identification of faulty modules rather than non-faulty ones, so simply
considering accuracy can sometimes be misleading.
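A tiny numeric illustration of why accuracy alone can mislead (the figures are
hypothetical):

```python
# With 90% non-faulty modules, a trivial predictor that labels every module
# non-faulty already scores 90% accuracy while finding zero faults.
total, faulty = 500, 50
tn, fn = total - faulty, faulty     # predict all modules as non-faulty
accuracy = tn / total               # 450 / 500 = 0.90
recall = 0 / faulty                 # no faulty module detected
print(accuracy, recall)             # 0.9 0.0
```

A 90%-accurate predictor is thus useless here, since its recall is zero.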
Many efforts have been made to evaluate the performance of fault-prediction
techniques. However, they tend to ignore the impact of fault misclassification
on the economics of software development. For instance, a high number of false
positives requires unnecessary extra effort to scan modules that are not
faulty. On the other hand, a high number of false negatives leaves too many
faulty modules outside the scanner, so the technique does not help either.
This calls for choosing a technique that predicts fewer false negatives, even
if it tends to be less accurate and/or yields more false positives. Therefore,
we revisited the results of previous fault prediction studies on the basis of
fault misclassification.
3.2 A Proposed Model for Evaluating Fault Prediction Efficiency
Here, we present a performance evaluation model that assesses fault-prediction
techniques in the context of economics.
Figures 3.1 and 3.2 show the cost statistics for faulty and non-faulty
modules, respectively. If a faulty module is predicted as faulty, it requires
unit-level testing effort; but if it is predicted as non-faulty, it requires
extra effort in later development stages to remove the same fault (see Figure
3.1). Conversely, if a non-faulty module is incorrectly predicted as faulty,
it requires extra effort at the time of unit testing (see Figure 3.2). We used
both of these observations to compare the performance of fault-prediction
techniques in our evaluation model.
Figure 3.1: Cost statistics for faulty modules

Figure 3.2: Cost statistics for non-faulty modules
3.2.1 General arguments
Based on the above investigations and observations, we find a need for
prediction techniques that try to minimize false negatives, even at the cost
of increased false positives and some loss of accuracy. Accordingly, we
present a model to evaluate the performance of fault-prediction techniques.
The presented model prioritizes the performance of a fault-prediction
technique based on three criteria: false negative rate, false positive rate,
and prediction accuracy.
The general arguments for measuring the performance of a fault-prediction
technique are:
1. False negatives are critically important for the overall reduction of
testing and maintenance cost, and hence are to be minimized.
2. False positives are to be reduced, but can be tolerated if they help
reduce false negatives.
3. Similarly, prediction accuracy can also be compromised if it helps reduce
false negatives.
3.2.2 Evaluation model
We now quantify our arguments for finding the best technique, and discuss how
we identify a technique as the best one from the perspective of economical
software development. The model is defined as follows:
1. Choose as the best technique the one with the least FNR value, provided
the difference in FPR is within a threshold.
2. If two or more techniques have nearly the same FNR value, choose the one
with the least FPR value.
3. If two or more techniques have nearly equal FNR and FPR values, choose the
one with the maximum accuracy.
We define this three-step evaluation model to compare the performance of
fault-prediction techniques so that the selected technique requires minimum
fault removal effort.
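The three-step model can be sketched in code as follows. The tolerance and
FPR threshold values are illustrative choices of ours (the model itself leaves
"nearly the same" and the threshold unspecified):

```python
def select_best(results, tol=0.02, fpr_threshold=0.10):
    """Apply the three-step model to a dict {name: (fnr, fpr, accuracy)}.
    Step 1: prefer the least FNR, discarding candidates whose FPR exceeds
    the threshold (one possible reading of "within thresholds");
    Step 2: break near-ties in FNR by the least FPR;
    Step 3: break remaining ties by the maximum accuracy."""
    viable = {n: r for n, r in results.items() if r[1] <= fpr_threshold} or results
    best_fnr = min(r[0] for r in viable.values())
    tied = {n: r for n, r in viable.items() if r[0] - best_fnr <= tol}
    best_fpr = min(r[1] for r in tied.values())
    tied = {n: r for n, r in tied.items() if r[1] - best_fpr <= tol}
    return max(tied, key=lambda n: tied[n][2])

# CM1 rows from Table 3.2 as (FNR, FPR, accuracy):
cm1 = {"IBK": (0.69, 0.06, 87.95), "IB1": (0.69, 0.06, 87.95),
       "Naive Bayes": (0.69, 0.11, 83.53), "SMO": (1.0, 0.0, 89.76)}
print(select_best(cm1))
```

On the CM1 figures this reproduces the interpretation given in Section 3.3:
SMO loses despite its higher accuracy, and Naive Bayes loses on FPR, leaving
IBK/IB1 as equally good choices.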
Wagner et al. [52] presented the quality economics of defect-detection
techniques and the impact of uncovered faults on software cost as well as
quality. That study supports our presented model. Using this evaluation model
helps determine the impact of fault prediction on software cost due to
undetected faults.
3.3 Revisiting Fault Prediction Results
Various studies have been carried out in the field of software fault
prediction. In our analysis, we used the studies performed on public datasets.
Table 2.3 summarizes the studies of different authors, along with the
evaluation measures they used and the conclusions they drew. We observed that
authors used a variety of evaluation measures to compare the performance of
different fault-prediction techniques, which makes comparison even more
complicated. Moreover, the performance of a technique varies with the dataset
used. Therefore, we revisited the results of earlier fault-prediction studies
(Table 2.3) over four NASA MDP [1] datasets (Table 3.1), incorporating the
above-mentioned performance measures, i.e., false negative rate and false
positive rate. All reported experiments used technique implementations from
the WEKA data-mining tool [53]. All performance measurements were generated by
threefold cross-validation.
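The threefold cross-validation used here amounts to the following index
bookkeeping. WEKA performs this internally (with stratification preserving
class proportions per fold); the sketch below is only an unstratified
illustration of the splitting scheme:

```python
import random

def threefold_indices(n, seed=0):
    """Split n module indices into three folds: each fold serves once as the
    test set while the remaining two folds train the classifier."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::3] for i in range(3)]
    splits = []
    for k in range(3):
        test = folds[k]
        train = [i for j, f in enumerate(folds) if j != k for i in f]
        splits.append((train, test))
    return splits

splits = threefold_indices(498)  # e.g. the 498 modules of CM1
# Across the three splits, every module is tested exactly once.
```

The confusion-matrix counts reported in Tables 3.2 to 3.5 are accumulated over
the three test folds of such a scheme.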
Table 3.1: NASA datasets
Project # modules % with defects Language
CM1 496 9.80% C
KC1 2,109 15.50% C++
KC2 520 20.40% C++
PC1 1,109 6.90% C
A high FNR means that many faults remain undetected by the fault-prediction
technique, so it has a high impact on software quality as well as on testing
and maintenance cost. At the same time, a high FPR requires more unit-testing
effort.
Overall, this suggests that for the development of economical, high-quality
software, we should choose a technique that predicts fewer false negatives,
even if it tends to be less accurate and/or predicts more false positives. For
our analysis, we combined the results of various authors (listed in Table 2.3)
with the results of our presented model (FNR and FPR) in Tables 3.2 to 3.5. We
then interpreted the performance of these techniques according to our model.
We evaluated the performance of these techniques over four NASA datasets, viz.
CM1, KC1, KC2 and PC1, using the WEKA [53] data mining tool to run all the
experiments. The interpretation is as follows:
For dataset CM1 (Table 3.2), the techniques IBK, IB1 and Naive Bayes have
similar false negative rate (FNR) values, but Naive Bayes has a higher false
positive rate (FPR) than the other two. Since Step 2 of our model compares
FPR values, and IBK and IB1 have similar FPR values, both are equally good
compared to the other techniques.
For dataset KC1 (Table 3.3), the techniques IBK, IB1 and Classification via
Clustering have similar FNR values, but IBK has the least FPR value; hence it
outperforms all other techniques and can be considered the best for this
dataset.
For dataset KC2 (Table 3.4), the techniques Bayesian Logistic Regression and
Voted Perceptron have the least FNR values, but their FPR values are very
high, so they are not effective, because almost all modules are predicted as
faulty. Hence, we consider the Decision Stump technique the best.
For dataset PC1 (Table 3.5), the techniques IBK and IB1 have similar FNR
values, but IB1 has slightly more false positives, so IBK is considered the
best one for dataset PC1.
Generalizing over these four datasets, our results show that IBK is the best
among all the techniques considered.
Table 3.2: Experiment results for dataset CM1

Technique name                 Acc    TP  TN   FP  FN  FNR   FPR   Precision  Recall  F-measure
Neural Network                 87.55  3   433  16  46  0.94  0.04  0.16       0.06    0.09
Simple Logistic                89.76  1   446  3   48  0.98  0.01  0.25       0.02    0.04
SMO                            89.76  0   447  2   49  1     0     0          0       0
Voted Perceptron               89.96  0   448  1   49  1     0     0          0       0
IBK                            87.95  15  423  26  34  0.69  0.06  0.37       0.31    0.33
IB1                            87.95  15  423  26  34  0.69  0.06  0.37       0.31    0.33
Bagging                        89.96  0   448  1   49  1     0     0          0       0
Classification via Regression  89.56  3   443  6   46  0.94  0.01  0.33       0.06    0.1
Dagging                        89.96  0   448  1   49  1     0     0          0       0
Stacking                       90.16  0   449  0   49  1     0     0          0       0
Hyper Pipes                    89.56  0   446  3   49  1     0.01  0          0       0
Decision Table                 90.16  0   449  0   49  1     0     0          0       0
PART                           89.96  1   447  2   48  0.98  0     0.33       0.02    0.04
JRip (RIPPER)                  89.56  1   445  4   48  0.98  0.01  0.2        0.02    0.04
J48                            89.96  4   439  10  45  0.92  0.02  0.29       0.08    0.13
Random Forest                  89.76  6   441  8   43  0.88  0.02  0.43       0.12    0.19
Decision Stump                 90.16  0   449  0   49  1     0     0          0       0
BF Tree                        89.96  1   447  2   48  0.98  0     0.33       0.02    0.04
Naive Bayes                    83.53  15  401  48  34  0.69  0.11  0.24       0.31    0.27
Bayesian Logistic Regression   90.16  0   449  0   49  1     0     0          0       0
Logistic                       88.15  8   431  18  41  0.84  0.04  0.31       0.16    0.21
Classification via Clustering  84.14  13  406  43  36  0.73  0.1   0.23       0.27    0.25
Grading                        90.16  0   449  0   49  1     0     0          0       0
ZeroR                          90.16  0   449  0   49  1     0     0          0       0
Table 3.3: Experiment results for dataset kc1
Technique name Acc TP TN FP FN FNR FPR PrecisionRecall F-
measure
Neural Network 85.78 69 1740 43 257 0.79 0.02 0.62 0.21 0.32
Simple Logistic 85.63 66 1740 43 260 0.8 0.02 0.61 0.2 0.3
SMO 84.64 9 1776 7 317 0.97 0 0.56 0.03 0.05
Voted Perceptron 81.79 117 1608 175 209 0.64 0.1 0.4 0.36 0.38
IBK 84.45 134 1647 136 192 0.59 0.08 0.5 0.41 0.45
IB1 83.36 134 1624 159 192 0.59 0.09 0.46 0.41 0.43
Bagging 85.92 78 1734 49 248 0.76 0.03 0.61 0.24 0.34
Classification via Regression 85.4 63 1738 45 263 0.81 0.03 0.58 0.19 0.29
Dagging 84.83 12 1777 6 314 0.96 0 0.67 0.04 0.07
Stacking 84.54 0 1783 0 326 1 0 0 0 0
Hyper pipes 85.07 13 1781 2 313 0.96 0 0.87 0.04 0.08
Decision Table 84.73 43 1744 39 283 0.87 0.02 0.52 0.13 0.21
PART 85.02 50 1743 40 276 0.85 0.02 0.56 0.15 0.24
JRip (RIPPER) 84.68 84 1702 81 242 0.74 0.05 0.51 0.26 0.34
J48 85.21 96 1701 82 230 0.71 0.05 0.54 0.29 0.38
Random Forest 85.25 92 1706 77 234 0.72 0.04 0.54 0.28 0.37
Decision Stump 84.54 0 1783 0 326 1 0 0 0 0
BF tree 85.25 40 1758 25 286 0.88 0.01 0.62 0.12 0.2
Naive Bayes 82.46 120 1619 164 206 0.63 0.09 0.42 0.37 0.39
Bayesian Logistic Regression 84.73 13 1774 9 313 0.96 0.01 0.59 0.04 0.07
Logistic 85.3 70 1729 54 256 0.79 0.03 0.56 0.21 0.31
Classification via Clustering 81.79 129 1596 187 197 0.6 0.1 0.41 0.4 0.4
Grading 84.54 0 1783 0 326 1 0 0 0 0
ZeroR 84.54 0 1783 0 326 1 0 0 0 0
Table 3.4: Experiment results for dataset KC2
Technique name Acc TP TN FP FN FNR FPR Precision Recall F-measure
Neural Network 83.14 39 395 20 68 0.64 0.05 0.66 0.36 0.47
Simple Logistic 82.95 40 393 22 67 0.63 0.05 0.65 0.37 0.47
SMO 83.52 26 410 5 81 0.76 0.01 0.84 0.24 0.38
Voted Perceptron 24.52 106 22 393 1 0.01 0.95 0.21 0.99 0.35
IBK 79.12 50 363 52 57 0.53 0.13 0.49 0.47 0.48
IB1 76.25 51 347 68 56 0.52 0.16 0.43 0.48 0.45
Bagging 83.72 50 387 28 57 0.53 0.07 0.64 0.47 0.54
Classification via Regression 82.57 45 386 29 62 0.58 0.07 0.61 0.42 0.5
Dagging 81.8 17 410 5 90 0.84 0.01 0.77 0.16 0.26
Stacking 79.5 0 415 0 107 1 0 0 0 0
Hyper pipes 81.99 19 409 6 88 0.82 0.01 0.76 0.18 0.29
Decision Table 82.57 45 386 29 62 0.58 0.07 0.61 0.42 0.5
PART 80.84 32 390 25 75 0.7 0.06 0.56 0.3 0.39
JRip (RIPPER) 83.52 58 378 37 49 0.46 0.09 0.61 0.54 0.57
J48 81.42 46 379 36 61 0.57 0.09 0.56 0.43 0.49
Random Forest 81.8 48 379 36 59 0.55 0.09 0.57 0.45 0.5
Decision Stump 78.93 80 332 83 27 0.25 0.2 0.49 0.75 0.59
BF tree 82.57 50 381 34 57 0.53 0.08 0.6 0.47 0.52
Naive Bayes 83.52 45 391 24 62 0.58 0.06 0.65 0.42 0.51
Bayesian Logistic Regression 20.88 107 2 413 0 0 1 0.21 1 0.34
Logistic 82.38 47 383 32 60 0.56 0.08 0.59 0.44 0.51
Classification via Clustering 81.03 70 353 62 37 0.35 0.15 0.53 0.65 0.59
Grading 79.5 0 415 0 107 1 0 0 0 0
ZeroR 79.5 0 415 0 107 1 0 0 0 0
Table 3.5: Experiment results for dataset PC1
Technique name Acc TP TN FP FN FNR FPR Precision Recall F-measure
Neural Network 93.6 18 1020 12 59 0.77 0.01 0.6 0.23 0.34
Simple Logistic 92.79 5 1024 8 72 0.94 0.01 0.38 0.06 0.11
SMO 93.15 1 1032 0 76 0.99 0 1 0.01 0.03
Voted Perceptron 91.61 0 1016 16 77 1 0.02 0 0 0
IBK 92.43 34 991 41 43 0.56 0.04 0.45 0.44 0.45
IB1 92.25 34 989 43 43 0.56 0.04 0.44 0.44 0.44
Bagging 92.88 6 1024 8 71 0.92 0.01 0.43 0.08 0.13
Classification via Regression 92.79 3 1026 6 74 0.96 0.01 0.33 0.04 0.07
Dagging 93.06 1 1031 1 76 0.99 0 0.5 0.01 0.03
Stacking 93.06 0 1032 0 77 1 0 0 0 0
Hyper pipes 92.52 2 1024 8 75 0.97 0.01 0.2 0.03 0.05
Decision Table 92.7 5 1023 9 72 0.94 0.01 0.36 0.06 0.11
PART 92.43 1 1024 8 76 0.99 0.01 0.11 0.01 0.02
JRip (RIPPER) 92.88 7 1023 9 70 0.91 0.01 0.44 0.09 0.15
J48 92.7 11 1017 15 66 0.86 0.01 0.42 0.14 0.21
Random Forest 92.97 20 1011 21 57 0.74 0.02 0.49 0.26 0.34
Decision Stump 92.88 2 1028 4 75 0.97 0 0.33 0.03 0.05
BF tree 92.7 4 1024 8 73 0.95 0.01 0.33 0.05 0.09
Naive Bayes 89.36 24 967 65 53 0.69 0.06 0.27 0.31 0.29
Bayesian Logistic Regression 93.06 0 1032 0 77 1 0 0 0 0
Logistic 92.06 8 1013 19 69 0.9 0.02 0.3 0.1 0.15
Classification via Clustering 89.81 19 977 55 57 0.75 0.05 0.26 0.25 0.25
Grading 93.06 0 1032 0 77 1 0 0 0 0
ZeroR 93.06 0 1032 0 77 1 0 0 0 0
3.4 Summary
Software fault prediction attracts significant attention because it can guide software verification and validation activities. Over the past few years, many organizations have publicly released datasets describing module metrics and their fault content. The availability of these datasets encourages researchers to perform fault prediction studies using several machine learning techniques. In this chapter, we studied the outcomes of some of the earlier studies undertaken in this area. We found that they have used various criteria to evaluate the performance of a given technique. In most cases, these studies have used prediction accuracy to show how good a technique is. However, they seem to ignore the impact of the fault misclassification rate when judging the overall performance of the various fault-prediction techniques. Certifying a considerable number of faulty modules as non-faulty raises serious concerns, since faulty modules are small in number compared to non-faulty modules. A more viable evaluation criterion is to favor techniques that tend to reduce false negatives, even at the cost of more false positives and/or lower prediction accuracy.
We have re-analyzed the results of earlier studies and refined their outcomes based on our presented model. Our contribution in this chapter is to refine the way the best technique is selected. We also identify the need for an evaluation measure that provides specific information about how cost-economic fault-prediction techniques are and what their fundamental limitations are.
Chapter 4
A Cost Evaluation Framework
In the previous chapter, we investigated the impact of fault misclassification on software economics and quality. In this chapter, we quantify the fault removal cost in different stages of software development when fault prediction is used, and answer both research questions.
Specifically, we propose a cost evaluation framework that can help put the results of fault prediction in a proper usability context. Essentially, the framework provides an estimate of the effort saved by using the results of fault prediction in subsequent phases of software development. To construct the framework, we accounted for the realistic fault removal costs of different testing phases [52], along with their fault identification efficiencies [26]. We have used this framework to investigate two important and related research questions: for a given project dataset, would fault prediction help? And if yes, how should one choose a fault-prediction technique that yields optimal results? The first question can be answered by comparing the fault removal cost in both cases, i.e., with and without the use of fault prediction.
The remainder of this chapter is organized as follows. In Section 4.1, we present our proposed cost evaluation framework. Section 4.2 presents an experimental study that investigates the usefulness of fault-prediction techniques using our proposed framework. We discuss the implications of using our framework in Section 4.3 and summarize in Section 4.4.
4.1 The Evaluation Framework
In the previous chapter, we highlighted the need for a cost evaluation measure that compares the performance of fault-prediction techniques on the basis of their economics. Jones [30] states that 30-40 percent of the development cost is spent on quality assurance and fault removal. Since fault-prediction techniques predict fault-prone modules early in the development life cycle, they can help reduce the cost incurred on testing and maintenance.
Here, we construct a cost evaluation framework that accounts for the realistic cost required to remove a fault and computes the estimated fault removal cost for a specific fault-prediction technique. The constraints we accounted for in our framework include:
(1) Fault removal cost varies with the testing phase.
(2) It is not possible to identify 100% of faults in a specific testing phase.
(3) It is practically not feasible to perform unit testing on all modules.
We have used the normalized fault removal costs suggested by Wagner et al. [52] to formulate our cost evaluation framework; these costs may vary from one organization to another and also depend on various characteristics of the project. The normalized costs are summarized in Table 4.1. The fault identification efficiencies for different testing phases are taken from the study of Jones [26] and are summarized in Table 4.2. Wilde et al. [45] stated that more than fifty percent of modules are very small in size, so unit testing these modules is unfruitful. We have included this value (0.5) as the threshold for unit testing in our framework.
Table 4.1: Removal costs of test techniques (in staff-hours per defect) [52]
Type Lowest Mean Median Highest
Unit 1.5 3.46 2.5 6
System 2.82 8.37 6.2 20
Field 3.9 27.24 27 66.6
Table 4.2: Fault identification efficiencies of different test phases [26]
Type Lowest Median Highest
Unit 0.1 0.25 0.5
System 0.25 0.5 0.65
Figure 3.1 and Figure 3.2 show the cost statistics for faulty and non-faulty modules, respectively. Software modules predicted as faulty (true positives and false positives) by the fault-prediction technique require some verification and testing cost at the module level, i.e., a cost equal to the unit testing cost (Cu, specifically, in our study). Since 100% identification of faults in a specific testing phase is not possible, some of the correctly predicted faulty modules (true positives) remain undetected in unit testing. Faulty modules predicted as non-faulty (false negatives), along with the correctly predicted faulty modules that remain undetected in unit testing, are likely detected in later stages, requiring a fault removal cost equal to that of either system testing or field testing (Cs and Cf, respectively, in our case). The testing phases used in our framework, along with their respective fault removal costs and efficiencies, can vary from organization to organization. Equation 4.1 shows the proposed cost evaluation framework for estimating the overall fault removal cost. Equation 4.2 shows the minimum fault removal cost without the use of fault prediction. The normalized fault removal cost and its interpretation are shown in Equation 4.3.
Ecost = Ci + Cu · (FP + TP) + s · Cs · (FN + (1 − u) · TP) + (1 − s) · Cf · (FN + (1 − u) · TP)   (4.1)

Tcost = Mp · Cu · TM + s · Cs · (1 − u) · FM + (1 − s) · Cf · (1 − u) · FM   (4.2)

NEcost = Ecost / Tcost, where NEcost < 1 implies fault prediction is useful, and NEcost ≥ 1 implies unit testing should be used instead.   (4.3)
where:
Ecost - Estimated fault removal cost of the software when fault prediction is used.
Tcost - Estimated fault removal cost of the software without the use of fault prediction.
NEcost - Normalized estimated fault removal cost of the software when fault prediction is used.
Ci - Initial setup cost of the fault-prediction technique used.
Cu - Normalized fault removal cost in unit testing.
Cs - Normalized fault removal cost in system testing.
Cf - Normalized fault removal cost in field testing.
Mp - Percentage of modules unit tested.
FP - Number of false positives.
FN - Number of false negatives.
TP - Number of true positives.
TM - Total number of modules.
FM - Total number of faulty modules.
u - Fault identification efficiency of unit testing.
s - Fault identification efficiency of system testing.
Our cost evaluation framework considers a more practical scenario in which undetected faults are traced in all later testing phases and the corresponding fault removal cost is evaluated based on organization-specific statistics. This makes the proposed framework a more viable performance measure than the other measures.
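As an illustrative sketch only, not tooling from the study, Equations 4.1-4.3 can be coded directly. The default parameter values below are assumptions: median removal costs from Table 4.1, median test efficiencies from Table 4.2, Mp = 0.5 following Wilde [45], and Ci = 0; an organization would substitute its own figures.

```python
def estimated_costs(tp, fp, fn, tm, fm,
                    Ci=0.0, Cu=2.5, Cs=6.2, Cf=27.0,
                    Mp=0.5, u=0.25, s=0.5):
    """Equations 4.1-4.3: fault removal cost with / without fault prediction.

    Defaults are assumptions: median removal costs (Table 4.1), median
    efficiencies (Table 4.2), Mp = 0.5 per Wilde [45], and Ci = 0.
    """
    # Faults that slip past unit testing: all false negatives plus the
    # fraction of true positives that unit testing fails to catch.
    leaked = fn + (1 - u) * tp
    Ecost = Ci + Cu * (fp + tp) + s * Cs * leaked + (1 - s) * Cf * leaked
    # Baseline without prediction: unit-test the Mp fraction of all
    # modules; remaining faults surface in system or field testing.
    Tcost = (Mp * Cu * tm
             + s * Cs * (1 - u) * fm
             + (1 - s) * Cf * (1 - u) * fm)
    return Ecost, Tcost, Ecost / Tcost

# IBK on PC1 (Table 3.5): TP=34, FP=41, FN=43; 1109 modules, 77 faulty.
Ecost, Tcost, NEcost = estimated_costs(tp=34, fp=41, fn=43, tm=1109, fm=77)
print("fault prediction is useful" if NEcost < 1 else "use unit testing")
```

With these default values, IBK on PC1 yields NEcost ≈ 0.56, i.e., fault prediction is estimated to cost roughly half as much as blanket unit testing under the stated assumptions.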
In our experiment, we used the values of Cu, Cs, and Cf summarized in Table 4.1. u and s denote the fault identification efficiencies of unit testing and system testing, respectively; their values are taken from the survey report "Software Quality in 2010" by Capers Jones [26]. Mp denotes the fraction of modules unit tested; its value is taken from the study of Wilde [45]. We have generalized the framework so that it can be applied to any organization or software with its specific values of Cu, Cs, Cf, Mp, u, and s. Our aim is to provide a benchmark to approximate the overall fault removal cost. It is clear from our framework that if a technique has high false negatives and/or high false positives, it results in a higher fault removal cost. When this approximated cost exceeds the unit testing cost, we suggest testing all modules at the unit level instead of using fault prediction (Equation 4.3).
4.2 Experimental Study
In this section, we present an experimental study to investigate the usefulness of fault-prediction techniques using our cost evaluation framework. In this study, we applied five popular fault-prediction techniques [19][20][27][25][22] to 19 projects from the NASA MDP [1] and PROMISE [2] repositories. As these nineteen projects cover a significant range of percentages of faulty modules (varying from 7 to 49 percent), they are sufficient for our investigation. We used the WEKA machine learning tool to perform all listed experiments.
4.2.1 Experimental setup
We have used the NASA MDP [1] and PROMISE [2] datasets, listed in Table 4.3, to evaluate the impact of fault-prediction techniques on the fault removal cost using our proposed framework (Ecost). The metrics in these datasets describe projects that vary in size as well as in complexity. The datasets contain between eight and forty software metrics. We further classify these datasets on the basis of the percentage of faulty modules present, as shown in Table 4.4.
To illustrate the effectiveness of our framework, we have used five well-known fault-prediction techniques. Our goal is to demonstrate the cost evaluation framework and suggest when to use fault prediction, rather than to identify the "best" fault-prediction technique. For this reason, the choice of fault-prediction technique is orthogonal to the intended contribution. The fault-prediction techniques selected for our study are Random Forest, J48 (C4.5 decision tree), Neural Network, K-means Clustering, and IBK (K-nearest neighbours). These algorithms represent a broad range of machine learning techniques. All reported experiments used technique implementations from the well-known software package WEKA [53]. All performance measurements were generated by threefold cross-validation.
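As a rough illustration of this protocol (threefold cross-validation yielding an aggregate confusion matrix), the sketch below implements a 1-nearest-neighbour classifier, the idea underlying IB1/IBK, on synthetic data; the actual experiments used WEKA's implementations on the MDP and PROMISE datasets, so everything here is purely illustrative.

```python
import random

def one_nn_predict(train, test_points):
    """1-nearest-neighbour: predict the label of the closest training module."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(train, key=lambda t: sqdist(t[0], p))[1] for p in test_points]

def threefold_cv(data, seed=0):
    """Shuffle once, split into three folds, aggregate the confusion matrix."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::3] for i in range(3)]
    tp = tn = fp = fn = 0
    for i in range(3):
        test = folds[i]
        train = [m for j in range(3) if j != i for m in folds[j]]
        preds = one_nn_predict(train, [x for x, _ in test])
        for (_, actual), pred in zip(test, preds):
            if actual and pred:
                tp += 1
            elif actual and not pred:
                fn += 1
            elif pred:
                fp += 1
            else:
                tn += 1
    return tp, tn, fp, fn

# Synthetic "project": 90 modules with two metrics; faulty modules (~20%)
# have larger metric values, mimicking complexity-driven fault-proneness.
rng = random.Random(1)
data = []
for _ in range(90):
    faulty = rng.random() < 0.2
    base = 5.0 if faulty else 2.0
    data.append(((rng.gauss(base, 1.0), rng.gauss(base, 1.0)), int(faulty)))

tp, tn, fp, fn = threefold_cv(data)
print(tp, tn, fp, fn)  # raw counts of the kind reported in Tables 3.2-3.5
```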
Table 4.3: Projects used from the NASA MDP [1] and PROMISE [2] data repositories
Project Faulty (%) Number of Modules
pc1 6.94 1109
ar1 7.44 121
mw1 7.69 403
kc3 9.34 458
cm1 9.84 498
pc3 10.24 1563
Arc 11.54 234
pc4 12.21 1458
kc1 15.46 2109
ar4 18.69 107
jm1 19.35 10885
kc2 20.5 522
camel1.6 21.91 858
ant1.6 26.21 351
ant1.7 27.79 493
mc2 32.3 161
jedit 3.2 33.09 272
lucene2.0 46.67 195
jedit 4.0 m 48.9 274
Table 4.4: Categorization of projects based on the fraction of faulty modules
Category Faults (%