Object-Oriented Class Maintainability Prediction Using Internal
Quality Attributes
Jehad Al Dallal
Department of Information Science
Kuwait University
P.O. Box 5969, Safat 13060, Kuwait
Abstract
Context: Class maintainability is the likelihood that a class can be easily modified. Before
releasing an object-oriented software system, it is impossible to know with certainty
when, where, how, and how often a class will be modified. At that stage, this likelihood
can be estimated using the internal quality attributes of a class, which include cohesion,
coupling, and size. To reduce future class maintenance effort and cost, developers
are encouraged to carefully test and thoroughly document classes with low
maintainability before releasing the object-oriented system.
Objective: We empirically study the relationship between internal class quality attributes
(size, cohesion, and coupling) and an external quality attribute (class maintainability).
Using statistical techniques, we also construct models based on the selected internal
attributes to predict class maintainability.
Method: We consider classes of three open-source systems. For each class, we account
for two actual maintainability indicators, the number of revised lines of code and the
number of revisions in which the class was involved. Using 19 internal quality measures,
we empirically explore the impact of size, cohesion, and coupling on class
maintainability. We also empirically investigate the abilities of the measures, considered
both individually and combined, to estimate class maintainability. Statistically based
prediction models are constructed and validated.
Results: Our results demonstrate that classes with better qualities (i.e., higher cohesion
values and lower size and coupling values) have better maintainability (i.e., are more
likely to be easily modified) than classes with worse qualities. Most of the considered
measures are shown to be predictors of the considered maintainability indicators to some
degree. The abilities of the considered internal quality measures to predict class
maintainability are improved when the measures are combined using optimized
multivariate statistical models.
Conclusion: The prediction models can help software engineers locate classes with low
maintainability. These classes must be carefully tested and well documented.
Keywords: internal and external quality attributes, quality measures, class cohesion,
class coupling, class size, class maintainability, class revisions, object-oriented software.
1. Introduction
Maintenance is a key stage in the software life cycle, and it starts when the software
product is delivered. During the maintenance stage, the software product is modified to
correct faults, improve performance or other attributes, or adapt the product to a modified
environment (Mamone 1994). Software maintenance constitutes the largest share of the
total cost of producing software applications. Some studies estimated that maintenance
requires up to 80% of the total cost (Ahn et al., 2003). To reduce maintenance costs, the
earlier development stages must produce code that can be easily understood and
modified (Erdil et al., 2003). Maintainability is
defined as "the ease with which a software system or component can be modified" (IEEE
1990).
One reason for the shift in software development toward the use of object-oriented (OO)
technology is the belief that OO code has high quality and maintainability (Briand et al.
1997b). Because the central construct of OO development is the class, classes are
expected to be high-quality units that can be easily maintained. Maintainability aspects
are measurable after performing maintenance tasks. Once a class is revised, the time and
cost of this specific revision are measurable. Alternatively, measures correlated to
maintenance time and cost can be applied to estimate the maintenance time or cost. In
this paper, we consider two such existing maintenance measures: the number of revised
lines of code (LOC) (Li and Henry 1993) and the number of revisions in which the class
was involved during the maintenance history (Dagpinar and Jahnke 2003). We selected
these two measures for two main reasons. The first reason is that the number of revisions
and revised LOC indicate two different maintenance aspects that are of interest to
software engineers. The former measure quantifies the maintenance rate. Software
engineers and practitioners prefer classes with lower maintenance rates over those with
higher rates because code that undergoes more revisions becomes less organized, less
understandable, and more fault-prone (Erdil et al., 2003). The number of revised LOC is
found to correlate with both maintenance cost (Granja-Alvarez and Barranco-Garcia 1997)
and maintenance effort measured in units of time (Hayes et al., 2004), where both cost
and effort are key factors for software engineers. The second reason for selecting these
two maintenance measures is that they are measurable in software systems with reported
maintenance histories, which occurs for some systems available on-line. This allows us to
perform the required empirical study.
The two considered maintenance measures may be correlated to some degree, but they
different. A class can be involved in many revisions, but it may have relatively few
revised LOC. In contrast, a class can be involved in few revisions but have many revised
LOC.
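The two indicators can be made concrete with a small computation. The following is a minimal sketch (not from the paper) that derives both indicators from a per-class revision log; the log format and field names are hypothetical.

```python
from collections import defaultdict

def maintainability_indicators(revision_log):
    """revision_log: iterable of (class_name, lines_added, lines_deleted, lines_changed)."""
    revisions = defaultdict(int)    # number of revisions per class
    revised_loc = defaultdict(int)  # number of revised LOC per class
    for cls, added, deleted, changed in revision_log:
        revisions[cls] += 1
        # A changed line counts as a deletion plus an addition
        # (the Li and Henry 1993 convention described later in the paper).
        revised_loc[cls] += added + deleted + 2 * changed
    return revisions, revised_loc

# Hypothetical log: "Parser" is involved in two revisions, "Lexer" in one.
log = [("Parser", 10, 2, 3), ("Parser", 0, 5, 1), ("Lexer", 4, 0, 0)]
revs, loc = maintainability_indicators(log)
```

Note how the two indicators diverge: a class can appear in many revisions yet accumulate few revised LOC, and vice versa.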
Class maintainability, i.e., the likelihood that a class may be easily modified, is a key
class quality attribute, and classes should be designed to be maintainable. Based on the
above discussion, classes with many revisions or revised LOC are less maintainable (i.e.,
expected to be more difficult to modify) than those with few revisions or revised LOC.
The two considered maintenance measures are thus actually class maintainability
indicators. Maintainability, like other important class qualities (e.g., reusability and
reliability) belongs to a set of software attributes known as external software attributes.
These attributes are directly relevant to users and practitioners (Fenton and Pfleeger 1997,
Morasca 2009).
Classes with low maintainability must be carefully tested to reduce their fault-proneness
and well documented to improve their understandability when future maintenance tasks
are performed. In addition, software developers may refactor classes with low
maintainability to enhance their maintainability before releasing the system. The
refactoring of classes by their developers at earlier stages is preferable to having other
maintenance programmers refactor them later, because system developers are generally
more knowledgeable about the system than external maintenance programmers
(although the quality of the available documentation also affects the size of this
advantage). Therefore, the refactoring of classes by their developers potentially reduces the
maintenance time and cost. Software engineers are thus interested in identifying classes
with low maintainability when the system is complete and not yet released. As with other
external attributes, however, many factors may affect class maintainability in addition to
the factors that depend on knowledge of the class artifacts (e.g., source code). These
factors are typically unknown at the class development stage and cannot be measured
solely based on knowledge of the class or even the software system to which the class
belongs. For example, it is impossible to foresee the evolutions that the system will
undergo and the future modifications to the environment to which the system will be
adapted. Therefore, class maintainability cannot be measured before the actual
maintenance process is performed, but it can be estimated.
Class size, cohesion, and coupling are internal software attributes that can be measured
after the system is developed and before it is released, and they may be related to various
external software quality attributes, including maintainability, reusability, and reliability
(Lee and Chang 2000). If a relationship exists between these internal quality attributes
and maintainability, developers could rely on measures that quantify the internal quality
attributes of classes to estimate class maintainability.
Without empirical validation, or with only limited empirical support based on a few
classes (which raises questions about the generality of the obtained results), several researchers (e.g.,
Briand et al. 1993, Li and Henry 1993, Lee and Chang 2000, Sheldon et al. 2002,
Chaumun et al. 2002, Dagpinar and Jahnke 2003, Aggarwal et al. 2006, Zhou and Leung
2007, Li-jin et al. 2009, Elish and Elish 2009) suggested using existing or newly
proposed internal quality measures to predict maintainability. Some related empirical
studies (e.g., Li and Henry 1993, Dagpinar and Jahnke 2003, Zhou and Leung 2007, Li-
jin et al. 2009, Elish and Elish 2009) considered a few internal quality attributes or did
not investigate the impact of individual measures on maintainability; therefore, their
results cannot be used to decide whether the cohesion, coupling, and size quality
attributes each has a negative or positive impact on class maintainability. Instead of
examining actual revisions performed on the considered systems during their
maintenance history, some related empirical studies (e.g., Kabaili et al. 2001, Chaumun et
al. 2002) were based on experimentally revising the considered systems and exploring the
relationship between some internal quality measures and artifacts based on the
experimental revisions. The key limitation of such studies is the fact that they depend on
the experimental revisions, which might not be representative of the actual revisions.
Finally, some empirical studies (e.g., Briand et al. 1999b, Briand et al. 2001, Gyimothy et
al. 2005, Olague et al. 2007, and Marcus et al. 2008) investigated the abilities of some
quality measures to predict a specific aspect related to maintenance, namely, fault-
proneness, but they did not consider other maintenance types, including those performed
to enhance system performance or adapt the system to a modified environment.
In this paper, we extend these studies by empirically investigating relationships between
19 internal class attribute measures, including some of the most common size, cohesion,
and coupling measures in the literature, and the two maintainability indicators described above.
In the empirical study, we used classes selected from three open-source Java systems, and
we collected their actual maintenance data, which was available on-line. We discuss how
to apply logistic regression analysis (Hosmer and Lemeshow 2000), a widely applied
statistical technique in experiment-based research, to predict class maintainability. Using
this technique, we propose models to estimate maintainability. Some models are based on
individual measures, and others are based on combinations of measures. These models
can be applied to predict class maintainability in advance, i.e., after the system is
developed but before it is released. Classes determined to be potentially less maintainable
should be carefully tested, documented, and possibly refactored.
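As a rough illustration of the modeling approach described above, the following sketch fits a binary logistic regression model whose independent variables are hypothetical size, cohesion, and coupling values and whose dependent variable flags classes that turned out to require many revisions. Plain gradient descent keeps the sketch dependency-free; the paper relies on standard statistical tooling, and all data here is invented.

```python
import math

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit w, b for P(y=1 | x) = 1 / (1 + exp(-(w.x + b))) by gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # prediction error
            gw = [g + err * xj for g, xj in zip(gw, xi)]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical training data: [size, cohesion, coupling] per class,
# label 1 = low maintainability (the class needed many revisions).
X = [[500, 0.2, 9], [120, 0.8, 2], [700, 0.1, 12], [90, 0.9, 1]]
y = [1, 0, 1, 0]
Xs = [[s / 1000, c, k / 20] for s, c, k in X]  # crude feature scaling
w, b = fit_logistic(Xs, y)

risk = predict(w, b, [0.60, 0.15, 0.50])  # large, low-cohesion, highly coupled
safe = predict(w, b, [0.10, 0.90, 0.05])  # small, cohesive, loosely coupled
```

Classes whose predicted probability exceeds a chosen threshold would be flagged for extra testing, documentation, and possible refactoring.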
Our results show that most considered quality measures are statistically significant
predictors of the two considered class maintainability indicators. Specifically, the results
suggest that there is a negative relationship between the maintainability external quality
attribute and each of the size and coupling internal quality attributes. In other words, the
results indicate that classes with larger sizes and higher coupling values are less
maintainable (i.e., more difficult to modify) than those with smaller sizes and lower
coupling values. The results also suggest that there is a positive relationship between the
maintainability external quality attribute and the cohesion internal quality attribute;
namely, the classes with higher cohesion values are more maintainable (i.e., easier to
modify) than those with lower cohesion values. When we considered the quality
measures in combination, the statistically constructed maintainability prediction models
show that the different quality attributes are complementary in predicting class
maintainability. As expected, when considering the measures in combination, the
constructed models become more statistically stable and provide better maintainability
indicators than most models based on individual measures.
The major contributions of this paper include the following:
1. Explaining how to apply logistic regression analysis to predict class
maintainability.
2. Investigating the relationship between size, cohesion, and coupling quality
attributes and class maintainability.
3. Exploring the abilities of several size, cohesion, and coupling measures,
considered individually, to predict class maintainability.
4. Constructing models, based on combinations of measures, that have practical
ability to predict class maintainability.
This paper is organized as follows. Section 2 reviews the basic concepts regarding
internal and external software attributes, and Section 3 reviews related work. Section 4
provides an overview of the software systems and the descriptive statistics that
characterize them. Section 5 describes the statistical techniques used in the data analyses.
Sections 6 and 7 report and discuss the univariate and multivariate regression analyses
and their results. Section 8 discusses validity threats to the empirical study. Finally,
Section 9 concludes the paper and outlines possible future work.
2. Internal vs. external attributes
Researchers have divided class quality attributes into internal and external categories
(Morasca 2009). External quality attributes are those that indicate class quality based on
factors that cannot be measured using only knowledge of the software artifacts (Fenton
and Pfleeger 1997). In addition to the artifacts, the software engineer must consider their
environment and the interactions between the artifacts and environment. For example, the
maintainability of an OO class depends on many elements, such as the class itself, the
experience of the team in charge of system maintenance, system age, and the
environment to which the system must adapt. For example, an older system is
likely to require more maintenance effort than a younger system because the system size
is likely to increase with age and the modified code is likely to be less organized and
understandable. Therefore, knowledge of the class alone is insufficient to quantify its
maintainability: it is difficult to anticipate the future effort required to maintain the class,
and class maintainability cannot be measured unless the class is actually maintained.
Conversely, internal class quality attributes, e.g., size, cohesion, and coupling, can be
measured based only on class artifact knowledge. Quantifying internal attributes is much
easier than quantifying external attributes; for example, class size can be measured by
counting the number of LOC. However, software practitioners are not interested in
internal quality attributes unless they are used to indicate external quality attributes such
as maintainability and reusability (Morasca 2009). For example, class cohesion is worth
measuring only if it is believed or has been shown to be related to (1) an external attribute
of the same artifact (e.g., the maintainability of the class) or some other related artifacts,
such as the class test suite, or (2) a software process attribute (e.g., the cost required to
develop the class).
In this empirical study, we consider the measurement of one external attribute, namely,
maintainability, and three internal attributes, namely, size, cohesion, and coupling.
Because of their different natures, the considered attributes are quantified differently as
follows.
2.1. Internal attributes: size, cohesion, and coupling
Software size has typically been considered a key attribute of several software
products, including OO classes (Briand et al. 1999b, Briand et al. 2001). Software size is
empirically found to influence several qualities of interest, such as a software product’s
fault-proneness (e.g. Briand et al., 2001, Gyimothy et al., 2005, Aggarwal et al., 2007)
and reusability (e.g. Al Dallal and Morasca 2012).
Class cohesion is an intra-class property that refers to the extent to which class members
are related. The literature has proposed several class cohesion measures (Briand et al.
1998). These measures use different formulas applied during either the high- or low-level
design phases. High-level design (HLD) measures, such as those proposed by Briand et al.
(1999b), Bansiya et al. (1999), Counsell et al. (2006), and Al Dallal and Briand (2010),
require information available during the HLD phase, e.g., the types of attributes and
method parameters. Low-level design (LLD) measures, such as those proposed by
Chidamber and Kemerer (1991), Bieman and Kang (1995), Chen et al. (2002), Badri and
Badri (2004), Wang (2005), Bonja and Kidanmariam (2006), Fernández and Peña (2006),
Al Dallal (2012b), and Al Dallal and Briand (2012), require information available during
the LLD phase, e.g., the attributes referenced by the methods.
Coupling is an inter-class property that refers to the degree to which a class is related to
other classes. The literature has proposed several measures to determine class coupling
(Briand et al. 1999a), and these measures consider different class aspects such as the
types of attributes (Li and Henry 1993), the types of parameters (Briand et al. 1997), and
invoked methods (Chidamber and Kemerer 1991, Chidamber and Kemerer 1994, Li and
Henry 1993, Lee et al. 1995, Gui and Scott 2009).
The usual notion of measure as defined in measurement theory, i.e., a function that
associates a value with each entity (Krantz et al. 1971, Roberts 1979), can be used for the
above internal attributes (Morasca 2009). Moreover, the corresponding measures must
comply with the representation condition of measurement theory or weaker conditions,
such as those defined in axiomatic approaches (Weyuker 1988, Briand et al. 1996,
Morasca 2008). Section 3.1 presents further details about the specific internal software
attribute measures used in this study.
2.2 An external attribute: maintainability
Morasca (2009) discussed several reasons for the unsuitability of using the same notion
of measure as that defined in measurement theory to quantify external attributes such as
maintainability. The first reason is that the quantification of external attributes depends
on the entity under study and several additional factors. For example, it is not true that
software product maintainability is a function only of the software product itself. The
definition of a measure given in measurement theory (Krantz et al. 1971, Roberts 1979),
which states that a measure is a function that associates a value with an entity, is thus not
suitable for external attributes. The second reason why using measurement theory is
unsuitable for external attributes is that defining attributes with their measures causes a
logical problem. This problem occurs because attributes exist prior to when and
independent of how they are measured, and defining a measure logically follows the
defining of the attribute it purports to measure; otherwise, attribute classification
inconsistencies may occur. The third reason for the inappropriateness of using
measurement theory for external attributes is that measurement theory is applied to define
deterministic measures, which do not exist for external attributes, as many variables
affect them (the “environment”) in addition to the specific entity.
In this paper, we therefore follow the suggestion of Morasca (2009) to use probabilities
and probabilistic models to estimate our external attribute of interest, namely, OO class
maintainability. These estimation models use size, cohesion, and coupling measures as
independent variables and estimate the probability that an OO class will be frequently
revised or will require costly revisions (indicated by the number of revised LOC).
Building these probability estimation models requires the collection of data regarding
actual revisions performed on OO classes during the maintenance phase. We thus use the
number of revisions in which the class was involved and the number of revised (i.e.,
added, deleted, or changed) LOC during the maintenance phase as measures to indicate
class maintainability.
3. Related work
In this section, we provide an overview of several existing internal quality class measures
for OO systems and other related work on the measurement of software quality. We also
review several existing maintainability indicators and provide an overview of the existing
research that has theoretically or empirically discussed and studied the relationship
between internal class attributes and maintainability.
3.1. Internal class attributes
Researchers have proposed several measures to assess different internal class quality
attributes, such as size, cohesion, and coupling. The proposed size measures capture
different aspects of size. For example, Number of Methods (NOM) measures the amount of
functionality that a class provides, Number of Attributes (NOA) measures the amount of
data necessary for the class to function, and the lines of code (LOC) parameter measures
the size in terms of statements. Chidamber and Kemerer (1994) proposed the Weighted
Methods per Class (WMC) parameter as a complexity measure whose value is obtained
by summing the complexities of all methods defined in a class. Most authors who
previously used WMC assumed that each method has a complexity of one and thus
assumed that NOM and WMC are equivalent (Chaumun et al. 2002).
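As an illustration of these size measures (a sketch, not from the paper, which studies Java and Classic-Ada systems), the following counts NOM, NOA, and LOC for a small Python class using the standard-library ast module, and takes WMC = NOM under the common unit-complexity assumption:

```python
import ast

SRC = '''
class Stack:
    def __init__(self):
        self.items = []
    def push(self, x):
        self.items.append(x)
    def pop(self):
        return self.items.pop()
'''

tree = ast.parse(SRC)
cls = next(n for n in ast.walk(tree) if isinstance(n, ast.ClassDef))
methods = [n for n in cls.body if isinstance(n, ast.FunctionDef)]

# Attributes: names assigned via "self.<name> = ..." anywhere in the methods.
attrs = set()
for m in methods:
    for node in ast.walk(m):
        if isinstance(node, ast.Assign):
            for t in node.targets:
                if (isinstance(t, ast.Attribute)
                        and isinstance(t.value, ast.Name)
                        and t.value.id == "self"):
                    attrs.add(t.attr)

nom = len(methods)  # Number of Methods
noa = len(attrs)    # Number of Attributes
loc = len([line for line in SRC.strip().splitlines() if line.strip()])  # lines of code
wmc = nom           # WMC with each method assigned a complexity of one
```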
Cohesion refers to the extent to which components in a software module are related
(Bieman and Kang 1998). Several cohesion measures have been proposed for functions
in procedural programs (e.g., Bieman and Kang 1998, Meyers and Binkley 2007, Sarkar
et al. 2007, Al Dallal 2009) and classes in object-oriented programs (e.g., Chidamber and
Kemerer 1991, Li and Henry 1993, Chidamber and Kemerer 1994, Bieman and Kang
1995, Briand et al. 1998, Briand et al. 1999b, Bansiya et al. 1999, Chen et al. 2002, Yang
2002, Chae et al. 2004, Etzkorn et al. 2004, Badri and Badri 2004, Wang 2005,
Fernandez and Pena 2006, Counsell et al. 2006, Badri et al. 2008, Al Dallal and Briand
2010, Al Dallal and Briand 2012, Al Dallal 2012b). Based on a justified criterion, as
discussed in Section 4.3, we consider eight cohesion measures, including Coh, CAMC,
TCC, LCC, LSCC, SCOM, PCCC, and OLn, as defined in Table 1. The selected cohesion
measures are well studied, both theoretically and empirically (Briand et al. 1999b, Briand
et al. 2001, Al Dallal 2010, Al Dallal 2011a, Al Dallal 2011b, Al Dallal 2012a, Al Dallal
2012c, Al Dallal 2013).
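To make a few of these cohesion measures concrete, the following sketch (not from the paper) computes Coh, TCC, and LSCC for a hypothetical class given a map of each method to the set of attributes it references, following the definitions in Table 1:

```python
from itertools import combinations

# Hypothetical class: three methods, three attributes, and the attributes
# each method references (directly or transitively).
refs = {"m1": {"a", "b"}, "m2": {"b", "c"}, "m3": {"c"}}
attrs = {"a", "b", "c"}
k, l = len(refs), len(attrs)

# Coh: number of method-attribute references relative to the maximum k*l.
coh = sum(len(s) for s in refs.values()) / (k * l)

# TCC: relative number of method pairs sharing at least one attribute.
pairs = list(combinations(refs.values(), 2))
tcc = sum(1 for s, t in pairs if s & t) / len(pairs)

# LSCC: sum over attributes of x_i*(x_i - 1), normalized by l*k*(k - 1),
# where x_i is the number of methods referencing attribute i.
x = {a: sum(1 for s in refs.values() if a in s) for a in attrs}
lscc = sum(v * (v - 1) for v in x.values()) / (l * k * (k - 1))
```

Here m1 and m2 share attribute b, and m2 and m3 share attribute c, so two of the three method pairs are connected.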
Table 1: Definitions of the considered class cohesion measures (adapted from Al Dallal
2012a)
Coupling refers to the relatedness among system components. In OO systems, coupling
can be measured at the class or system level. At the class level, developers are concerned
with measuring the extent to which the class is coupled with other classes. At the system
level, developers are concerned with measuring the total coupling of a software system
(Gui and Scott 2009). Researchers have proposed several coupling measures to assess
class coupling (e.g., Briand et al. 1997, Chidamber and Kemerer 1991, Chidamber and
Class Cohesion Measure Definition/Formula
Coh (Briand et al. 1998) Coh = a/(kl), where k is the number of methods, l is the number of attributes, and
a is the number of method-attribute references, i.e., the number of pairs (m, j) such
that method m references attribute j.
Cohesion Among Methods in
a Class (CAMC) (Counsell et
al. 2006)
CAMC = a/(kl), where l is the number of distinct parameter types, k is the number
of methods, and a is the summation of the number of distinct parameter types of
each method in the class. Note that this formula is applied in the model that does
not include the self-parameter type used in all methods.
Tight Class Cohesion (TCC)
(Bieman and Kang 1995)
TCC = Relative number of directly connected pairs of methods, where two
methods are directly connected if they are both directly connected to the same
attribute. A method m is directly connected to an attribute when the attribute
appears within the method's body or within the body of a method invoked by
method m, either directly or transitively.
Loose Class Cohesion (LCC)
(Bieman and Kang 1995)
LCC = Relative number of directly or transitively connected pairs of methods,
where two methods are transitively connected if they are both directly or indirectly
connected to the same attribute. A method m, directly connected to an attribute j, is
indirectly connected to an attribute i when there is a method directly or transitively
connected to both attributes i and j.
Low-level Design Similarity-
based Class Cohesion
(LSCC) (Al Dallal and
Briand 2012)
LSCC(C) = 0 if k = 0 or l = 0; 1 if k = 1; otherwise
[Σ(i=1 to l) x_i(x_i - 1)] / [l·k·(k - 1)],
where l is the number of attributes, k is the number of methods, and x_i is the
number of methods that reference attribute i.
Sensitive Class Cohesion
Metric (SCOM) (Fernandez
and Pena 2006)
SCOM = Ratio of the sum of the similarities between all pairs of methods to the
total number of pairs of methods. The similarity between methods i and j is
defined as:
Similarity(i, j) = [|I_i ∩ I_j| / min(|I_i|, |I_j|)] · [|I_i ∪ I_j| / l],
where I_i is the set of attributes referenced by method i and l is the number of
attributes.
Path Connectivity Class
Cohesion (PCCC) (Al Dallal
2012b)
PCCC(C) = 0 if l = 0 and k > 1; 1 if (l > 0 and k = 0) or k = 1; otherwise
NSP(G_C) / NSP(FG_C),
where NSP(G) is the number of simple paths in graph G, G_C is the reference graph
of class C, FG_C is the corresponding fully connected graph, and a simple path is a
path in which each node occurs at most once.
OLn (Yang 2002) OLn = The average strength of the attributes, where the strength of an attribute is
the average strength of the methods that reference it. The strength of a method is
initially set to 1 and is recomputed, in each iteration, as the average strength of the
attributes that it references; n is the number of iterations used to compute OLn.
Kemerer 1994, Li and Henry 1993, Lee et al. 1995, Kabaili et al. 2001). Based on the
types of coupling considered by the measures, as in Section 4.3, we consider eight
coupling measures, including CBO, CBO_IUB, CBO_U, RFC, MPC, DAC1, DAC2, and
OCMEC, defined in Table 2. Briand et al. (1999a) studied the theoretical validation of
most of these coupling measures.
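To make a few of the coupling measures concrete, the following sketch (not from the paper) computes DAC1, DAC2, MPC, and CBO_U from a hypothetical summary of a class A: the types of its attributes and the method invocations made by its methods.

```python
# Hypothetical summary of class A.
attr_types = ["Logger", "Logger", "Cache", "int"]  # types of A's attributes
calls = [("Logger", "log"), ("Cache", "get"), ("Logger", "log")]  # invocations in A
builtin = {"int"}  # primitive types do not count as class coupling

# DAC1: attributes whose types are other classes.
dac1 = sum(1 for t in attr_types if t not in builtin)

# DAC2: distinct classes used as attribute types.
dac2 = len({t for t in attr_types if t not in builtin})

# MPC: total number of method invocations in A.
mpc = len(calls)

# CBO_U: distinct (non-inherited) classes used by A's methods.
cbo_u = len({c for c, _ in calls})
```

The same class thus scores differently under each measure: repeated uses of Logger inflate DAC1 and MPC but are counted once by DAC2 and CBO_U.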
Table 2: Definitions of the considered class coupling measures
3.2. Indicating object-oriented maintainability
Software maintenance is categorized into four types: corrective, adaptive, perfective, and
preventive (Erdil et al. 2003). Corrective maintenance addresses the correction of faults
when the system does not behave according to its specifications. Adaptive maintenance is
applied to a system to adapt it to new environments without affecting its functionality.
Perfective maintenance extends system functionality and improves the provided services.
Preventive maintenance performs activities such as code refactoring to enhance system
maintainability. Corrective maintenance is considered the traditional maintenance type,
whereas the other three types of maintenance are referred to as software evolution.
Several measures have been proposed to measure different maintenance aspects,
including the number of revised LOC (Li and Henry 1993), number of revisions
(Dagpinar and Jahnke 2003), and pieces of code potentially affected by the revised code
(Kabaili et al. 2001, Chaumun et al. 2002, Xia and Srikanth 2004). Some researchers
studied the correlation between some maintenance measures and maintenance effort.
Among several measures empirically considered, Hayes et al. (2004) found that the
number of revised LOC is strongly correlated to maintenance effort measured in units of
time. Granja-Alvarez and Barranco-Garcia (1997) showed that the cost of maintenance is
correlated with the number of revised LOC.
Several researchers have discussed or empirically investigated the correlations between
maintenance and internal quality measures. Without validation, Briand et al. (1993)
Class Coupling Measure Definition/Formula
Coupling Between Object
Classes (CBO) (Chidamber
and Kemerer 1994)
CBO = Number of classes, excluding the inherited classes, to which the class is
coupled. A class A is coupled to another class B if the methods of class A use
attributes or methods of class B, or vice versa. Thus, CBO = CBO_IUB + CBO_U.
CBO is Used by (CBO_IUB)
(Kabaili et al. 2001)
CBO_IUB of class A = Number of classes, excluding the inherited classes, that
use the attributes or methods of class A.
CBO Using (CBO_U)
(Kabaili et al. 2001)
CBO_U of class A = Number of classes, excluding the inherited classes, that are
used by the methods of class A.
Response for a class (RFC)
(Chidamber and Kemerer
1994)
RFC of class A = Number of methods in class A + Number of distinct methods of
the other classes directly invoked by the methods of class A.
Message Passing Coupling
(MPC) (Li and Henry 1993)
MPC of class A = Number of method invocations in class A.
Data Abstraction Coupling
(DAC1) (Li and Henry 1993)
DAC1 of class A = Number of attributes, in class A, whose types are of other
classes.
DAC2 (Li and Henry 1993) DAC2 of class A = Number of distinct classes used as types of the attributes of
class A.
OCMEC (Briand et al. 1997) OCMEC of class A = Number of distinct classes used as types of the parameters of
the methods in class A.
suggested a set of high-level design-based cohesion and coupling measures that can be
used to estimate OO system maintainability.
Li and Henry (1993) used two Classic-Ada systems with 39 and 70 classes to investigate
quality measures that predict maintainability. They used depth of inheritance (DIT),
number of children classes (NOC), MPC, lack-of-cohesion (LCOM), RFC, DAC, WMC,
NOM, number of semicolons, and NOA as quality measures, and the number of revised
LOC per class during its maintenance history was used as a maintenance measure.
Adding or deleting a line is counted as a single line change, and a change in line content
is counted as both a deletion and an addition. The results obtained by applying the linear
regression statistical technique confirmed that there is a strong correlation between the
quality measures, considered in combination, and class maintainability, indicated by the
number of revised LOC. The empirical study has two main limitations. First, it considers
only a few classes, which raises questions about the generality of the obtained results. The
second limitation is that their work did not investigate the abilities of the individual
measures to predict maintainability; therefore, the results cannot be used to determine
whether each of the cohesion, coupling, and size quality attributes has a negative or
positive impact on class maintainability.
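The revised-LOC counting convention Li and Henry used (an added or deleted line counts once; a change in line content counts as a deletion plus an addition) can be sketched with a plain line diff. This is an illustration, not their actual tooling.

```python
import difflib

def revised_loc(old_lines, new_lines):
    """Count revised LOC between two versions of a class."""
    count = 0
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "insert":
            count += j2 - j1                 # added lines count once
        elif op == "delete":
            count += i2 - i1                 # deleted lines count once
        elif op == "replace":
            count += (i2 - i1) + (j2 - j1)   # changed content: deletion + addition
    return count

old = ["int x;", "x = 1;", "print(x);"]
new = ["int x;", "x = 2;", "print(x);", "return x;"]
# One changed line (counts 2) plus one added line (counts 1).
```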
Without validation, Lee and Chang (2000) proposed an equation that uses existing
complexity measures to estimate the maintainability of OO software.
Using three C++ systems, Kabaili et al. (2001) investigated whether cohesion can predict
the changeability of an OO system (i.e., the ability of the system to absorb changes).
They identified the possible changes that can be performed on an OO system, performed
some of these changes on the considered systems, and analyzed the impact of these
changes on the systems. The impact of a change is defined as the number of classes that
the change affects. Finally, they studied the correlation between the impact of the change
and the LCC and LCOM values. They found a weak correlation between the cohesion
values and impact of change. They argued that the unexpected correlation result occurred
because LCC and LCOM are frequently misleading cohesion indicators. Chaumun et al.
(2002) performed a similar analysis to explore the correlation between the method
signature change and the WMC measure, and they concluded that the correlation is weak.
The studies by Kabaili et al. (2001) and Chaumun et al. (2002) have two main limitations.
First, they considered few internal quality measures, which raises questions about their
generality with regard to the correlation between each of the cohesion and size quality
attributes and changeability. Second, they did not account for the actual changes
performed on the considered systems during their maintenance history; therefore, the
performed experimental changes may not be representative of the actual changes.
Sheldon et al. (2002) extended the NOC and number of descendant classes (NOD)
measures to better estimate the maintainability of the class inheritance hierarchy.
However, they did not empirically validate the extended measures.
Dagpinar and Jahnke (2003) investigated the prediction of maintainability using quality
measures of OO systems. In their empirical study, two size measures (corresponding to
LOC and NOM), two inheritance measures (DIT and NOC), a cohesion measure (LCC),
and a set of coupling measures were applied to two Java systems with 27 and 180 classes.
They categorized the coupling measures as either import or export coupling, where
import coupling measures the extent to which the class of interest uses instances of other
classes, and export coupling measures the extent to which instances of the class of
interest are used by other classes. They collected logs reporting
three years of maintenance history for each considered system and considered the number
of class revisions during its maintenance history as the maintenance measure. They
applied univariate linear regression analysis to explore the abilities of the individual
measures to predict maintainability and multivariate linear regression analysis on
combinations of measures to construct a maintainability prediction model. Their results
indicated that size and import coupling measures are significant maintainability
predictors, while inheritance, cohesion, and export coupling measures are not. This study
has a generality limitation due to the relatively low number of selected classes and
measures. That is, the considered systems, with their low numbers of classes, do not represent
real projects. To obtain truly conclusive results, researchers must consider data from real
projects (Genero et al. 2005). In addition, it might be inaccurate to generalize the
conclusion regarding the relationship between maintainability and an internal quality
attribute using a single measure or a small number of measures. For example, to get conclusive results
regarding the relationship between cohesion and maintainability, researchers must
consider multiple measures that consider different cohesion aspects and follow different
cohesion measuring approaches.
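The import/export distinction described above can be sketched with a small routine. The dependency pairs and class names below are purely hypothetical; actual measures such as those of Dagpinar and Jahnke operate on parsed source code rather than ready-made pairs.

```python
from collections import defaultdict

def coupling_counts(uses):
    """Given (client, supplier) pairs meaning 'client uses an instance of
    supplier', count import and export coupling per class.
    Import coupling of C: number of distinct classes whose instances C uses.
    Export coupling of C: number of distinct classes that use instances of C."""
    imp = defaultdict(set)
    exp = defaultdict(set)
    for client, supplier in uses:
        if client != supplier:          # self-references do not couple a class to another
            imp[client].add(supplier)
            exp[supplier].add(client)
    return ({c: len(s) for c, s in imp.items()},
            {c: len(s) for c, s in exp.items()})

# Hypothetical dependency pairs for illustration only.
uses = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
imp, exp = coupling_counts(uses)
print(imp.get("A", 0), exp.get("A", 0))  # A imports from 2 classes and is used by 1
```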
Aggarwal et al. (2006), Zhou and Leung (2007), Li-jin et al. (2009), and Elish and Elish
(2009) applied different statistical techniques to construct maintainability prediction
models using the same data collected by Li and Henry (1993), and they reached the same
conclusions. Consequently, these studies share the same limitations indicated for the Li
and Henry study.
Rizvi and Khan (2010) performed an empirical study that investigates the relationship
between class diagram maintainability and class diagram understandability and
modifiability. The study used values of understandability, modifiability, maintainability,
and eleven size and structural complexity measures, previously collected through
controlled experiments on 28 class diagrams. They applied multivariate linear
regression to construct models to estimate class diagram understandability and
modifiability using the eleven measures and to estimate class diagram maintainability
using the understandability and modifiability attributes. Class diagram maintainability was
found to be positively and strongly correlated with both understandability and modifiability,
and a corresponding significant maintainability model was constructed. The study indirectly
explored the relationship between maintainability on one side and size and complexity
attributes on the other side, but it did not investigate the relationship between
maintainability and other quality attributes such as cohesion and coupling.
Several measures have been proposed to predict the maintainability of non-object-
oriented systems such as service-oriented (Perepletchikov et al. 2007), Web (Chae et al.
2007), and functional systems (Ahn et al. 2003). Some researchers (Briand et al. 1998,
Briand et al. 2001, Gyimothy et al. 2005, and Marcus et al. 2008) were also interested in
investigating the abilities of some size, cohesion, and coupling quality measures to
predict a specific aspect related to maintenance, namely, fault proneness. Benestad et al.
(2006) performed a survey on the assessment of OO class maintainability.
In this paper, we use the same maintainability indicators proposed by Li and Henry (1993)
and Dagpinar and Jahnke (2003), and we rely on the results of the existing empirical
studies (Granja-alvarez and Barranco-garcia 1997, Hayes et al. 2004) regarding the
relationship between the considered maintainability indicators and the maintenance cost
and effort. The number and sizes of the systems considered in this paper are larger than
those considered in similar studies (e.g., Li and Henry 1993, Dagpinar and Jahnke 2003,
Aggarwal et al. 2006, Zhou and Leung 2007, Li-jin et al. 2009, Elish and Elish 2009). In
addition, in this paper, we considered a larger number of internal quality measures than
those considered in similar studies (e.g., Li and Henry 1993, Kabaili et al. 2001,
Chaumun et al. 2002, Dagpinar and Jahnke 2003, Aggarwal et al. 2006, Zhou and Leung
2007, Li-jin et al. 2009, Elish and Elish 2009). In contrast to the studies performed by
Kabaili et al. (2001) and Chaumun et al. (2002), in this paper, we accounted for actual
changes performed on the considered systems during their maintenance history. Finally,
this paper shows the application of logistic regression, a statistical technique which was
not applied by any of the surveyed papers, to predict several maintainability aspects.
4. Descriptive statistics
The empirical study considered three systems to explore the prediction of internal quality
measures for class maintainability. In this section, we describe the considered systems
and the data collection process. We also provide descriptive statistics of the considered
internal and external quality measures.
4.1. The software systems
In this empirical study, we considered three open-source Java software systems from
different domains, including Art of Illusion version 2.4.1 (Illusion 2012), FreeMind
version 0.8.0 (FreeMind 2012), and JabRef version 1.8 (JabRef 2012). The first system,
Art of Illusion, is a 3D modeling, rendering, and animation studio system. The second
system, FreeMind, is a hierarchical editing system. The third system, JabRef, is a
graphical application for managing bibliographical databases. These systems were
selected from http://sourceforge.net. Regarding the selection criteria, these systems had to
(1) be implemented using Java, (2) be relatively large in terms of the number of classes,
(3) be from different domains, (4) have available source code and maintenance
repositories, and (5) be relatively old versions that were actively maintained over a
considerable period of time. The variety of sizes and domains of the systems allows
commenting on the generality of the obtained results.
4.2. Maintenance data collection
In this empirical study, we considered two actual class maintenance measures: the
number of revisions in which the class was involved and the number of LOC revised
during the considered maintenance history. As Li and Henry (1993) suggested, a line
addition or deletion is considered a single line modification, and a change in line content
is counted as a deletion and an addition. We considered only concrete classes and ignored
abstract classes and interfaces because they do not have defined values for most of the
considered measures.
We collected maintenance data for classes in the considered software systems from
publicly available revision repositories in which their maintenance histories are
maintained and managed. The developers of the considered systems used two different
on-line Version Control System (VCS) tracking systems to track source code changes.
The changes, called revisions, are due to either detected faults or required evolutions. In
this empirical study, we did not differentiate between the different maintenance types, as
we are concerned with all maintenance tasks.
The VCS system revisions for the Art of Illusion and JabRef systems are organized using
the revision time stamp, whereas the VCS system revisions for the FreeMind system are
organized using the system package hierarchy. For the former organization method, each
revision is associated with the revision date and a report that includes the revision
description and a list of classes involved in this revision. For each class, the system
reports the revised code for that revision and identifies the differences between the
previous and current class versions, including the added, changed, and deleted lines of
code. For the latter organization method, the tracking system provides the package tree
hierarchy in which classes represent leaf nodes and provides the maintenance history of
any selected class. The history reports all class revisions, and, for each revision, the
history reports revision identification, date, description, and revised pieces of code (i.e.,
added, changed, and deleted lines of code).
The selected Art of Illusion version was issued four and a half years ago, and it was the
most recent system among those systems considered in this empirical study. The second
and third systems were maintained for six and a half years and eight years, respectively.
For each system, we collected the maintenance data reported during the entire
maintenance period, starting from the issuing date and ending on the date on which the
data were collected. Because of the different system ages, we performed an empirical
analysis on each system alone and could thus provide a general comment about the
impact of system age on the analysis results.
For each considered software system, we created three corresponding empty files
(henceforth called the maintenance repository) to record, for each considered class, the
added, changed, and deleted lines of code during the selected maintenance history
period. For each software
system, we manually traced each revision reported in the VCS tracking system; copied
the added, changed, and deleted lines of code during that revision to the corresponding
files of the maintenance repository; and headed the pasted lines of code with a comment
that indicates the revision identifier (for reference purposes). While tracing the VCS
tracking systems, we accumulatively collected the added, changed, and deleted lines of
code for each considered class and built our own maintenance repository.
Tracing the time-stamp-based VCS requires browsing the reported modifications for each
class involved in each individual revision. The time required to collect maintenance data
for systems that use time-stamp-based VCS thus depends on the numbers of revisions,
classes involved in each revision, and revised LOC in each class revision. Conversely,
tracing the package-hierarchy-based VCS requires browsing the tree-organized package
hierarchy from the root package to each leaf node representing a class in the considered
system. After reaching the class link, one must trace the list of class revisions and obtain
the reported modifications for each revision. The time required to collect maintenance
data for systems that use package-hierarchy-based VCS thus depends on the package
hierarchy complexity (i.e., the lengths of the paths from the root to the leaves of the
hierarchical tree and the number of nodes in the tree), the total number of revisions
performed on each class, and the number of revised LOC in each class revision.
A research assistant with a B.Sc. in computer science and nine years of experience in
software development activities manually traced the VCS tracking systems of the three
considered software systems and collected the data in the maintenance repository. The
author of the current paper randomly selected 10% of the classes, checked the correctness
of the work performed by the research assistant, and found that the maintenance data
collection was performed properly for the selected classes, thus increasing the confidence
that the collected data match what is reported in the VCS tracking system.
We developed our own Java tool to parse the three maintenance repository files that
included the added, changed, and deleted lines of code in each considered class. The tool
counted and reported, in an Excel sheet, the number of revisions and the number of
added, changed, and deleted lines of code for each class. A single class revision can
include some added, changed, and deleted lines of code. The data associated with such a
class revision are thus distributed among the three files in the maintenance repository. As
a result, our maintenance repository includes the same maintenance data that are reported
in the VCS tracking system, but organizes the maintenance data differently in a way that
simplifies the required maintenance data collection process. For each class in the original
version of a considered system (i.e., the version identified in Section 4.1), our
maintenance repository includes every added, deleted, and changed line of code during
the history of the class, from the date at which the original version of the class was issued
until the date on which the empirical study was performed. In other words, for each class in an
original version of a considered system, our maintenance repository reports all detailed
changes in every subsequent version of the system up to the most recent one. To avoid
mistakenly counting a revision whose data are distributed among the three files as three
revisions, our tool compared the revision identifiers added as comments in the
maintenance repository files and counted such a revision as a single revision. We
followed the convention that adding or deleting a line is
counted as a single line change, and a change in line content is counted as both a deletion
and an addition (Zhou and Leung 2007, Elish and Elish 2009).
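As a sketch of the counting conventions just described, the following routine tallies revised LOC and deduplicates revisions that span the added, changed, and deleted files. The per-class log format here, lists of (revision identifier, number of lines) entries, is a hypothetical stand-in for the repository files in which revised lines are headed by a revision-identifier comment.

```python
def count_maintenance(added, changed, deleted):
    """Count revisions and revised LOC for one class.
    Convention (Li and Henry 1993): an added or deleted line is one line
    change; a changed line counts as a deletion plus an addition."""
    revised_loc = 0
    revision_ids = set()
    for rev_id, n in added:
        revised_loc += n          # each added line: one line change
        revision_ids.add(rev_id)
    for rev_id, n in deleted:
        revised_loc += n          # each deleted line: one line change
        revision_ids.add(rev_id)
    for rev_id, n in changed:
        revised_loc += 2 * n      # each changed line: a deletion plus an addition
        revision_ids.add(rev_id)
    # A revision that appears in several files is still a single revision.
    return len(revision_ids), revised_loc

n_rev, loc = count_maintenance(added=[("r1", 5)],
                               changed=[("r1", 3), ("r2", 2)],
                               deleted=[("r2", 1)])
print(n_rev, loc)  # 2 revisions; 5 + 1 + 2*(3+2) = 16 revised LOC
```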
For each considered system, Table 3 reports the number of concrete classes, number of
LOC, and number and percentage of revised classes. Table 4 lists descriptive statistics for
each actual maintenance measure. The mean numbers of revisions and revised LOC,
shown in Table 4, indicate that FreeMind, with a maintenance age of six and a half years,
was the most actively maintained system among the three systems, although JabRef was
maintained longer (i.e., maintenance age of eight years). Figure 1 presents the number of
classes that feature each value of the number of revisions maintenance measure, and
Figure 2 shows the number of classes that exhibit each percentage range of the number of
revised LOC maintenance measures. For example, Figure 1 shows that 327 of the Illusion
classes, 253 of the FreeMind classes, and 191 of the JabRef classes were not involved in
any revision. Figure 2 shows that 73 of the Illusion classes had a percentage p of the
maximum number of revised LOC among the Illusion classes (i.e., 259 LOC, as shown in
Table 4), where 0%<p≤10%. As 10% of the maximum number of revised LOC among
the Illusion classes is 25.9, 73 of the Illusion classes had a number n of revised LOC,
where 0<n≤25.9. Figure 2 does not show the number of classes that did not have any
revised LOC during the selected maintenance history; such a number can be obtained
from Figure 1. For example, the number of Illusion classes that did not have any revised
LOC is the same as the number of Illusion classes that were not involved in any revisions
(327 classes).
Table 3: The descriptions of the Java systems in the dataset

System   | No. of concrete classes | LOC  | No. of revised classes
Illusion | 430                     | 72 K | 103 (24%)
FreeMind | 363                     | 64 K | 110 (30%)
JabRef   | 306                     | 41 K | 115 (38%)

Table 4: Descriptive statistics for the actual maintenance measures

Metric             | System   | Min | Max  | 25% | Med | 75% | Mean  | Std. Dev.
No. of revisions   | Illusion | 0   | 9    | 0   | 0   | 0   | 0.44  | 1.09
No. of revisions   | FreeMind | 0   | 55   | 0   | 0   | 1   | 2.69  | 7.18
No. of revisions   | JabRef   | 0   | 31   | 0   | 0   | 2   | 1.30  | 3.22
No. of revised LOC | Illusion | 0   | 259  | 0   | 0   | 0   | 7.89  | 30.00
No. of revised LOC | FreeMind | 0   | 1852 | 0   | 0   | 3   | 55.06 | 184.74
No. of revised LOC | JabRef   | 0   | 918  | 0   | 0   | 6   | 16.01 | 64.53

Figure 1: Class distribution of the number of revisions measure
Figure 2: Class distribution of the percentage of revised LOC measure
Figure 1 shows that a relatively high percentage of classes were not involved in any
revision and that a low percentage of classes were involved in relatively many revisions.
Figure 2 shows that a low percentage of classes had a large percentage of revised LOC
during their selected maintenance history period. There are several practical implications
of these observations. For example, the results indicate that, instead of spending equal
documentation effort on all classes under development, software developers can
provide more detailed documentation for the relatively low percentage of classes that are
expected to be highly revised during the maintenance history period. Such detailed
documentation is expected to reduce the code understanding effort required during the
maintenance stage. To achieve this goal, software developers need models to predict
classes with low maintainability before performing actual maintenance. These models
must be constructed based on the artifacts available during the software development
stage. This paper investigates the abilities of selected software internal quality measures
to predict classes with low maintainability. These measures are the independent variables
used to construct the required models. The measures are summarized in Section 2, and
their descriptive statistics are provided below.
4.3. Independent variables
Researchers have proposed many measures to quantify the internal quality attributes that
are considered in this empirical study. For the size attribute, we considered the three
measures that similar studies most commonly consider: LOC, NOM, and NOA. Existing
cohesion measures consider different cohesion aspects and apply different approaches to
measure class cohesion. We identified four main approaches, including (1) measuring
cohesion based on counting the number of distinct attributes accessed using the methods
of the class of interest, (2) measuring cohesion based on counting the number of cohesive
method pairs, (3) measuring cohesion based on quantifying the similarity degree between
each pair of methods according to the number of commonly accessed attributes, and (4)
measuring cohesion based on the connectivity degree between the methods and attributes
of the class of interest. To more comprehensively address the cohesion measuring
approaches, we selected two existing cohesion measures for each identified measuring
approach. That is, we selected Coh and CAMC to address the first approach, TCC and
LCC to address the second approach, LSCC and SCOM to address the third approach,
and PCCC and OL2 to address the fourth approach. Most of the selected measures satisfy
the necessary cohesion measure properties (Al Dallal 2010, 2012b, Al Dallal and Briand
2012).
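As an illustration of the second measuring approach (counting cohesive method pairs), the following is a simplified TCC-style sketch. The actual TCC also counts indirect attribute use through method invocation, which is omitted here, and the value returned for single-method classes is an assumed convention rather than the exact rule of Al Dallal (2011a).

```python
from itertools import combinations

def tcc_like(method_attrs):
    """Simplified TCC-style cohesion: the fraction of method pairs that
    directly share at least one attribute. method_attrs maps each method
    of a class to the set of attributes it accesses. Result is in [0, 1]."""
    methods = list(method_attrs)
    pairs = list(combinations(methods, 2))
    if not pairs:
        # Single-method classes are otherwise undefined; returning 1.0 is an
        # assumed convention, not necessarily the modification of Al Dallal (2011a).
        return 1.0
    cohesive = sum(1 for m1, m2 in pairs if method_attrs[m1] & method_attrs[m2])
    return cohesive / len(pairs)

# Hypothetical class: three methods over attributes x and y.
print(tcc_like({"getX": {"x"}, "setX": {"x"}, "getY": {"y"}}))  # 1 of 3 pairs share an attribute
```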
We selected eight coupling measures that address different coupling aspects. We selected
CBO and its two extensions, CBO_IUB and CBO_U, because they consider the coupling
caused by accessing attributes and methods across different classes. RFC and MPC were
selected because they consider the coupling caused by method invocations. DAC1 and
DAC2 were selected because they account for the coupling caused by the attribute types
of the class of interest. Finally, we selected OCMEC because it considers the coupling
caused by the parameter types. CBO_IUB considers the coupling caused by the use of the
elements of the class of interest by other classes (i.e., import coupling), CBO_U
considers the coupling caused by the class of interest's use of elements of other classes
(i.e., export coupling), and CBO accounts for both import and export coupling. We
selected measures to cover different measuring approaches to interpret the results when
some measures are found to be significant maintainability predictors and others are not.
We developed our own Java tool (QMT 2013) to automate the size, cohesion, and
coupling measurement processes using the selected measures. For each class in the
considered systems (the versions that are identified in Section 4.1), the tool analyzed the
Java source code; extracted the required data; calculated the size, cohesion, and coupling
values using the 19 considered measures; and reported the results in an Excel
spreadsheet. Some selected measures had undefined values for some classes; for
example, TCC and LCC were originally undefined when the class of interest had a single
method. For all such cases, the tool set the measure value according to the
recommendations proposed by Al Dallal (2011a), which modified the measures such that
they were always applicable. This modification allowed us to apply the empirical study to
all considered classes, which made the results more general. We applied the considered
measures on the original versions of the classes because our goal was to investigate
whether the collected values of the measures are statistically related to the number of
revisions and number of revised lines of code in the future history of the classes. This
application allowed us to explore the prediction abilities of the measures.
We applied the boxplot statistical technique (Rousseeuw et al. 1999) to the collected
quality data to detect outliers. A few outliers were detected for the size and coupling
measures. However, we did not exclude any collected data because we found that
removing outliers did not lead to significant differences in the final analysis results.
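The boxplot rule used here for outlier detection can be sketched as follows; the sample values are hypothetical.

```python
import statistics

def boxplot_outliers(values, k=1.5):
    """Flag outliers by the standard boxplot rule: values outside
    [Q1 - k*IQR, Q3 + k*IQR], with k = 1.5 by convention."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lo or v > hi]

# A hypothetical size-measure sample with one extreme value.
print(boxplot_outliers([3, 5, 4, 6, 5, 4, 98]))  # the extreme value is flagged
```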
For Illusion classes, Table 5 lists descriptive statistics for the cohesion, coupling, and
size measures, including the minimum, 25% quartile, mean, median, 75% quartile,
maximum value, and standard deviation. The corresponding results for the FreeMind and
JabRef classes are reported in Appendix A (Tables A.1 and A.2).
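The reported statistics can be reproduced with a short routine such as the following. The input values are hypothetical, and since the paper does not state whether the sample or population standard deviation is used, the sample version is assumed here.

```python
import statistics

def describe(values):
    """Descriptive statistics as reported in the tables: minimum, 25%
    quartile, median, 75% quartile, maximum, mean, and standard deviation."""
    q1, med, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values),
        "25%": q1,
        "med": med,
        "75%": q3,
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample standard deviation (assumption)
    }

# Hypothetical measure values for illustration.
stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(stats["med"], stats["mean"])
```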
Table 5: Descriptive statistics of the 19 considered independent measures for Illusion
classes
5. Data analysis techniques used in the empirical study

We performed an empirical study to investigate the practical ability of the size, cohesion,
and coupling measures to predict class maintainability. Based on the problems addressed,
we applied univariate and multivariate logistic regression statistical techniques to analyze
the collected data and build the maintainability prediction models. Logistic regression
(Hosmer and Lemeshow 2000) is a standard and mature statistical method based on
maximum likelihood estimation. This method is widely applied to predict other OO class
external quality attributes, such as class fault-proneness (e.g., Briand et al. 1998, Briand
et al. 2001b, Gyimothy et al. 2005, Marcus et al. 2008, Al Dallal and Briand 2012) and
class reusability (Al Dallal and Morasca 2012). Although we could have used other
analysis methods, including those discussed by Briand and Wust (2002), Subramanyam
and Krishnan (2003), and Arisholm et al. (2010), they are outside the scope of this paper.
The logistic regression model is univariate if it features only one independent variable
and multivariate if it includes several independent variables. In this case study, we
explored the abilities of the 19 considered quality measures to predict several
maintenance-dependent variables. Univariate regression is applied to study the
maintenance prediction capability of each measure separately, whereas multivariate
regression is applied to study the combined maintenance prediction of several measures.
An overview of the applied statistical techniques and model factors is provided in this
section.
Quality attribute | Measure | Min | Max  | 25%   | Med    | 75%    | Mean   | Std. Dev.
Size              | NOM     | 1   | 91   | 4.00  | 8.00   | 15.00  | 11.74  | 11.73
Size              | NOA     | 0   | 114  | 2.00  | 6.00   | 12.00  | 8.53   | 9.97
Size              | LOC     | 6   | 2767 | 51.25 | 106.50 | 219.75 | 189.60 | 270.42
Cohesion          | Coh     | 0   | 1    | 0.21  | 0.38   | 0.74   | 0.48   | 0.32
Cohesion          | CAMC    | 0   | 1    | 0.00  | 0.00   | 0.27   | 0.24   | 0.41
Cohesion          | TCC     | 0   | 1    | 0.20  | 0.49   | 1.00   | 0.52   | 0.37
Cohesion          | LCC     | 0   | 1    | 0.27  | 0.70   | 1.00   | 0.61   | 0.39
Cohesion          | LSCC    | 0   | 1    | 0.05  | 0.15   | 0.56   | 0.33   | 0.37
Cohesion          | SCOM    | 0   | 1    | 0.12  | 0.32   | 0.88   | 0.45   | 0.37
Cohesion          | PCCC    | 0   | 1    | 0.00  | 0.00   | 1.00   | 0.34   | 0.46
Cohesion          | OL2     | 0   | 1    | 0.00  | 0.00   | 0.24   | 0.24   | 0.41
Coupling          | CBO     | 0   | 208  | 3.00  | 7.00   | 12.00  | 11.86  | 19.31
Coupling          | CBO_IUB | 0   | 207  | 0.00  | 1.00   | 3.00   | 6.05   | 18.60
Coupling          | CBO_U   | 0   | 26   | 2.00  | 4.00   | 8.00   | 5.82   | 5.18
Coupling          | RFC     | 0   | 413  | 8.00  | 21.00  | 46.75  | 32.90  | 38.81
Coupling          | MPC     | 0   | 1739 | 12.00 | 41.50  | 94.00  | 81.26  | 144.58
Coupling          | DAC1    | 0   | 55   | 1.00  | 2.00   | 5.00   | 4.20   | 6.21
Coupling          | DAC2    | 0   | 19   | 1.00  | 2.00   | 4.00   | 2.79   | 3.15
Coupling          | OCMEC   | 0   | 22   | 2.00  | 4.00   | 7.00   | 4.94   | 3.90
5.1. Dependent Variables
In logistic regression, explanatory or independent variables are used to explain and
predict dependent variables. A dependent variable can only take discrete values and is
binary when we predict the classes expected to be involved in revisions or exhibit
considerable revised LOC. In this empirical study, we considered three practical
problems: (1) predicting the classes expected to be involved in at least one revision, (2)
predicting the classes expected to be frequently revised, and (3) predicting the classes
expected to exhibit a considerable amount of revised LOC. During the software
development phase, developers spend considerable time and effort testing and
documenting the software. It should be more efficient for software engineers to spend
more testing and documenting time and effort on the classes expected to be frequently
revised than on those expected to be less frequently revised. Software engineers are thus
advised to concentrate less on testing and documenting the classes predicted not to
require revision; they are instead advised to focus more on testing and documenting the
classes predicted to be involved in many revisions and require considerable maintenance
costs. Because the average value is a typical representative statistical value, we
considered the class to be involved in a considerable number of revisions if the actual
number of revisions performed on the class during the considered maintenance history
period is greater than the average number of revisions among all classes involved in
revisions. Similarly, the class is considered to require considerable maintenance costs (in
terms of the number of revised LOC) if the actual number of revised LOC during the
considered maintenance history is greater than the average number of revised LOC
among all classes involved in revisions.
Based on the three considered practical problems, we considered the following three
dependent variables in the logistic regression analysis:
A revised class (RC) is a class that was involved in at least one revision during the
considered maintenance effort. The RC value was set to "1" when the class was
involved in one or more revisions; otherwise, the RC value was set to "0".
A frequently revised class (FRC) is a class that was involved in a number of
revisions that is greater than the average number of revisions among all revised
classes. The FRC value of such a class was set to "1"; otherwise, the FRC value
was set to "0".
A costly revised class (CRC) is a class whose number of revised LOC during the
maintenance history was greater than the average number of revised LOC among
all revised classes. The CRC value of such a class was set to "1"; otherwise, the
CRC value was set to "0". We defined the classes with relatively high numbers
of revised LOC as costly revised classes because these classes are expected to
require costly maintenance (Granja-alvarez and Barranco-garcia 1997).
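The discretization of the three dependent variables can be sketched as follows. The class names and counts are hypothetical; the FRC and CRC thresholds are the averages over revised classes only, as defined above.

```python
def discretize(revisions, revised_loc):
    """Derive the binary dependent variables (RC, FRC, CRC) per class.
    revisions / revised_loc: dicts mapping class name -> count."""
    revised = [c for c, r in revisions.items() if r > 0]
    avg_rev = sum(revisions[c] for c in revised) / len(revised)
    avg_loc = sum(revised_loc[c] for c in revised) / len(revised)
    labels = {}
    for c in revisions:
        rc = 1 if revisions[c] >= 1 else 0           # involved in at least one revision
        frc = 1 if revisions[c] > avg_rev else 0     # above-average number of revisions
        crc = 1 if revised_loc[c] > avg_loc else 0   # above-average revised LOC
        labels[c] = (rc, frc, crc)
    return labels

# Hypothetical classes: A untouched, B lightly revised, C heavily revised.
revs = {"A": 0, "B": 2, "C": 10}
locs = {"A": 0, "B": 5, "C": 200}
labels = discretize(revs, locs)
print(labels)
```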
Table 6 shows the distribution of classes that have different values for the three
dependent variables in the three considered systems. For example, the table shows that
the RC values of 327 (76%) Illusion classes were set to "0" (i.e., they were not involved
in any revision) and the RC values of the rest of the Illusion classes (i.e., 103 classes)
were set to "1" (i.e., they were involved in some revisions).
Table 6: Number and percentage of classes for each value of our discretized maintenance
variables
5.2. Classification performance
The logistic regression analysis results in a prediction model that uses the following
equation:
\[
\pi(X_1, X_2, \ldots, X_n) = \frac{1}{1 + e^{-(C_0 + C_1 X_1 + C_2 X_2 + \ldots + C_n X_n)}}
\]
In our context, π represents the probability that the class is expected to be revised, to be
frequently revised, or to require costly revisions when using RC, FRC, or CRC,
respectively, as the dependent variable. The Xi's are the quality measures, and the Ci
coefficients are estimated by maximizing a likelihood function (i.e., they are obtained
using logistic regression analysis) (Hosmer and Lemeshow 2000). In univariate regression
analysis, only one quality measure is used as the independent variable, and the prediction
equation becomes as follows:
\[
\pi(X) = \frac{1}{1 + e^{-(C_0 + C_1 X)}}
\]
For each model in our analyses, we report:
Intercept (denoted by c0 in the tables): intercept value estimated using logistic
regression analysis.
Coefficients (denoted by c1 in the tables): coefficient values of the independent
variables estimated using logistic regression analysis.
In practice, the π of each class of interest must be calculated. The software engineer must
pay more attention to classes with relatively high π values because these classes are
candidates for being revised, being frequently revised, or requiring costly maintenance. A
threshold t must be set for π to classify the classes accordingly and assess the
classification performance of a probability estimation model. All classes whose estimated
probability is less than or equal to t are classified as having an estimated value of the
discretized maintenance variable of Y = 0, and Y = 1 otherwise. This classification allows
us to obtain a 2×2 contingency table, as shown in Table 7. For example, cell [0,0]
contains the number of classes whose estimated and actual Y values are both 0. The sums
across the rows provide the number of Estimated Negatives and Positives,
and the sums across the columns produce the number of Actual Negatives and Positives.
System   | Value | RC          | FRC         | CRC
Illusion | 0     | 327 (76%)   | 388 (90.2%) | 406 (94.4%)
Illusion | 1     | 103 (24%)   | 42 (9.8%)   | 24 (5.6%)
FreeMind | 0     | 253 (69.7%) | 324 (89.3%) | 334 (92%)
FreeMind | 1     | 110 (30.3%) | 39 (10.7%)  | 29 (8%)
JabRef   | 0     | 191 (62.4%) | 278 (90.8%) | 279 (91.2%)
JabRef   | 1     | 115 (37.6%) | 28 (9.2%)   | 27 (8.8%)
Table 7: Contingency table
The determination of a threshold t for classification purposes is a subjective choice, and it
may dramatically change the classification results. Setting t to be "0" results in the
classification of all classes as having a discretized estimated value of 1; therefore, no
class is classified as Estimated Negative. Conversely, setting t to "1" causes all classes to
be classified as having a discretized estimated value of 0; therefore, no class is classified
as Estimated Positive. Table 6 shows that the distributions of the three dependent
maintenance variables are concentrated (Morasca 2004). This observation clearly
demonstrates the inadequacy of using a 50% classification threshold because this
threshold would be too far from the true proportion of actual positives. The proportion of
actual positives may be a better choice than the default classification threshold (0.5) for
assessing the actual classification strength of an estimation model because the former
threshold uses information available from the field instead of relying on the arbitrary
threshold of 0.5. In our analyses, we set t to be the proportion of actual positives in the
considered data set, i.e., a class is classified as having a discretized maintenance variable
value of 1 if its estimated probability is greater than this proportion. For example, when
considering the Illusion classes and RC as a dependent variable, according to the values
given in Table 6, we set t to 0.24.
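The threshold rule above can be sketched as follows. The function names are ours, not the paper's, and the counts are the Illusion/RC values from Table 6 (103 of 430 classes revised), used purely for illustration.

```python
def proportion_threshold(actual):
    """Threshold t = proportion of actual positives (Y = 1) in the data set."""
    return sum(actual) / len(actual)

def classify(probabilities, t):
    """A class is Estimated Positive (Y = 1) only when its estimated
    probability pi exceeds the threshold t; otherwise Y = 0."""
    return [1 if p > t else 0 for p in probabilities]

# Illustrative: for Illusion/RC, 103 of 430 classes were revised (Table 6).
actual = [1] * 103 + [0] * 327
t = proportion_threshold(actual)             # 103/430, about 0.2395
estimated = classify([0.10, 0.30, 0.24], t)  # -> [0, 1, 1]
```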
Based on this contingency table, we considered the following classification performance
indicators in our empirical study:
Precision (denoted in the results tables as P) = True Positives/Estimated Positives;
this indicator is not defined if there are no estimated positives;
Recall (denoted R) = True Positives/Actual Positives;
Inverse precision (denoted IP) = True Negatives/Estimated Negatives; this
indicator is not defined if there are no estimated negatives;
Inverse recall (denoted IR) = True Negatives/Actual Negatives.
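Given the cell counts of a Table 7-style contingency table, the four indicators follow directly from the definitions above. This is a minimal sketch with hypothetical counts; indicators whose denominator is zero are returned as None, matching the "not defined" cases.

```python
def classification_indicators(tn, fn, fp, tp):
    """Precision, recall, inverse precision, and inverse recall from the
    2x2 contingency table (tn/fn/fp/tp = true/false negatives/positives).
    Indicators with a zero denominator are undefined and returned as None."""
    est_pos, est_neg = tp + fp, tn + fn
    act_pos, act_neg = tp + fn, tn + fp
    return {
        "P":  tp / est_pos if est_pos else None,   # precision
        "R":  tp / act_pos if act_pos else None,   # recall
        "IP": tn / est_neg if est_neg else None,   # inverse precision
        "IR": tn / act_neg if act_neg else None,   # inverse recall
    }

# Hypothetical counts for illustration only.
ind = classification_indicators(tn=300, fn=20, fp=27, tp=83)
# P = 83/110, R = 83/103, IP = 300/320, IR = 300/327
```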
Precision and recall (Olson and Delen 2008) are used to indicate the performance of the
model in correctly predicting the discretized maintenance variable Y = 1, whereas inverse
precision and inverse recall (Powers 2007) are applied to demonstrate the performance of
the model in correctly predicting the discretized maintenance variable Y = 0. These four
classification performance indicators depend on the value of the probability threshold
selected for classification. To evaluate the performance of a prediction model regardless
of any particular threshold, we instead used the receiver operating characteristic (ROC)
curve (Hosmer and Lemeshow 2000). In this study, the ROC curve is a graphical plot of
the ratio of classes correctly classified with a maintenance variable of 1 versus the ratio
of classes incorrectly classified with a maintenance variable of 1 at different thresholds.
The area under the ROC curve (AUC) represents the ability of the model to correctly
rank classes based on the considered maintenance variable. A 100% ROC area represents
a perfect model that correctly classifies all classes, and larger ROC areas indicate that the
model is better at classifying classes. The AUC is often considered a better evaluation
criterion than standard precision and recall, as selecting a threshold is always somewhat
subjective. We applied the following general rules to assess the classification
performance according to the AUC value (Hosmer and Lemeshow 2000): AUC=0.5
means that the classification is not good, 0.5<AUC<0.6 means that the classification is
poor, 0.6≤AUC<0.7 means that the classification is fair, 0.7≤AUC<0.8 means that the
classification is acceptable, 0.8≤AUC<0.9 means that the classification is excellent, and
AUC≥0.9 means that the classification is outstanding. Thresholds based on the ROC
analysis for the selected measures are considered practical if they fall at least within the
acceptable range (Shatnawi et al., 2010). A measure might be found to be a statistically
significant maintainability predictor (p-value<0.05), but it could be determined to be an
impractical predictor according to the AUC.
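The AUC can be computed without tracing the full curve, using its equivalent rank-statistic formulation: the probability that a randomly chosen positive class receives a higher estimated probability than a randomly chosen negative one. This is our sketch, not the authors' implementation; the category labels follow the Hosmer and Lemeshow scale quoted above.

```python
def auc(probabilities, actual):
    """AUC as the Mann-Whitney statistic: the fraction of (positive,
    negative) pairs ranked correctly; ties count one half."""
    pos = [p for p, y in zip(probabilities, actual) if y == 1]
    neg = [p for p, y in zip(probabilities, actual) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_category(a):
    """Interpretation scale of Hosmer and Lemeshow (2000) used in the text."""
    if a >= 0.9: return "outstanding"
    if a >= 0.8: return "excellent"
    if a >= 0.7: return "acceptable"
    if a >= 0.6: return "fair"
    if a > 0.5:  return "poor"
    return "no discrimination"
```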
5.3. Goodness-of-fit
To explore the goodness-of-fit of the constructed univariate and multivariate regression
models, we used the following indicators:
R2 = (L0 - LL)/L0 (Hosmer and Lemeshow 2000), where LL is the log-likelihood
of the model that includes the independent variables and L0 is the log-likelihood
of a model with no independent variables. R2 represents the proportion of the
log-likelihood of the constant model that is explained by including the
independent variables. The value of R2 ranges between 0 and 1. For technical
reasons, high values of R2 are rare, even for accurate models.
Mean squared error (MSE), which is the average of the squared differences
between the probability values estimated by a logistic regression model and the
actual values of the discretized maintenance variable used.
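A minimal sketch of both indicators, assuming (as is standard for this likelihood-ratio R2) that the constant model predicts the proportion of positives for every class; the probability values used in any example are illustrative.

```python
import math

def log_likelihood(probs, actual):
    """Log-likelihood of binary outcomes under estimated probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(probs, actual))

def r_squared(probs, actual):
    """R2 = (L0 - LL)/L0, where LL is the log-likelihood of the fitted
    model and L0 that of the constant (no-covariate) model."""
    p0 = sum(actual) / len(actual)
    l0 = log_likelihood([p0] * len(actual), actual)
    ll = log_likelihood(probs, actual)
    return (l0 - ll) / l0

def mse(probs, actual):
    """Mean squared difference between the estimated probabilities and
    the actual 0/1 values of the discretized maintenance variable."""
    return sum((p - y) ** 2 for p, y in zip(probs, actual)) / len(actual)
```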
5.4. Model Validation
To more realistically assess the constructed models' predictive capacities, we used V-
cross-validation, a procedure in which a data set is partitioned into V sub-samples. The
regression model is then built and evaluated V times. Each time, a different sub-sample is
used to evaluate the classification performance, and the remaining sub-samples are
used as training data to build the regression model. We applied the V-cross-validation
technique to build each univariate and multivariate regression model considered in this
empirical study. When building each model, we applied 10-times 10-fold cross-validation
(i.e., ten repetitions of 10-fold cross-validation).
To provide evidence that the multivariate models have, in practice, reasonable
performance in predicting the considered maintenance factors, in addition to applying V-
cross-validation, we also validated the multivariate regression models by exploring their
performances when they are applied to classes other than those used to build the models.
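The validation procedure above can be sketched as follows. The `build_model` and `evaluate` arguments are placeholders of our own (not the paper's code) standing in for the logistic regression fit and the classification-performance computation.

```python
import random

def cross_validate(data, build_model, evaluate, folds=10, repeats=10, seed=0):
    """Repeated V-fold cross-validation: each repetition shuffles the
    classes and partitions them into `folds` sub-samples; each sub-sample
    is used once for evaluation while the remaining ones form the
    training set. Returns the average evaluation score."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        idx = list(range(len(data)))
        rng.shuffle(idx)
        for f in range(folds):
            test_idx = set(idx[f::folds])              # evaluation sub-sample
            train = [data[i] for i in idx if i not in test_idx]
            model = build_model(train)
            scores.append(evaluate(model, [data[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

With `repeats=10` and `folds=10` this matches the 10-times 10-fold setup used in the study.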
6. Univariate Regression Analysis Results
For each considered system, we constructed three families of univariate regression
models, one for each discretized maintenance variable: RC, FRC, and CRC. Each model
family contains a single univariate model for each independent variable that was found to
be statistically significant. This analysis aims to investigate the relation between the
maintainability external quality attribute, indicated by the three maintenance variables,
and the three considered internal quality attributes, i.e., size, cohesion, and coupling,
indicated by the 19 considered measures. The results of the univariate regression analysis
are presented in tables, and their characteristics are discussed in the following order:
Statistically significant independent variables: We present the independent
variables found to be statistically significant.
Direction of independent variable impact: We discuss whether the statistically
significant independent variables influence the estimated probability positively or
negatively, where independent variables with a positive (negative) coefficient
influence the estimated probability positively (negatively).
Goodness-of-fit: We discuss the obtained MSE and R2.
Classification performance: We present the precision, recall, inverse precision,
inverse recall, and AUC.
6.1. Univariate analysis of the RC
Table 8 reports the results of the univariate RC prediction models based on the Illusion
classes. Appendix B provides the results based on classes in the other two systems
(Tables B.1 and B.2). The reported results lead to the following observations.
Table 8: Univariate results for the RC using Illusion classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -1.743 0.046 0.173 0.051 < 0.0001 0.345 0.476 0.813 0.716 0.646
NOA -1.636 0.052 0.173 0.042 < 0.0001 0.319 0.447 0.801 0.700 0.591
LOC -1.660 0.002 0.171 0.061 < 0.0001 0.418 0.495 0.831 0.783 0.687
Coh -0.660 -1.106 0.179 0.018 0.004 0.300 0.718 0.842 0.471 0.626
CAMC -0.484 -1.901 0.179 0.026 0.001 0.294 0.660 0.824 0.502 0.610
TCC -0.587 -1.191 0.178 0.031 <0.001 0.290 0.583 0.807 0.550 0.620
LCC -0.782 -0.633 0.182 0.010 0.027 0.285 0.515 0.795 0.593 0.596
LSCC -0.933 -0.724 0.181 0.011 0.030 0.273 0.757 0.826 0.364 0.638
SCOM -0.652 -1.243 0.176 0.030 <0.001 0.292 0.709 0.833 0.459 0.643
PCCC -0.856 -1.084 0.177 0.034 <0.001 0.296 0.816 0.870 0.388 0.668
OL2 -0.968 -0.955 0.179 0.021 0.004 0.276 0.854 0.865 0.294 0.570
CBO -1.393 0.019 0.178 0.024 0.002 0.371 0.379 0.803 0.798 0.665
CBO_IUB -1.236 0.012 0.184 0.010 0.033 0.295 0.175 0.770 0.869 0.510
CBO_U -1.810 0.102 0.173 0.050 < 0.0001 0.356 0.563 0.831 0.679 0.670
RFC -1.891 0.020 0.166 0.086 < 0.0001 0.422 0.602 0.855 0.740 0.724
MPC -1.591 0.005 0.170 0.064 < 0.0001 0.430 0.476 0.829 0.801 0.713
DAC -1.535 0.082 0.172 0.045 < 0.0001 0.370 0.456 0.815 0.755 0.609
DAC2 -1.715 0.179 0.169 0.058 < 0.0001 0.393 0.447 0.818 0.783 0.606
OCMEC -1.777 0.116 0.176 0.037 < 0.0001 0.353 0.524 0.823 0.697 0.614
Statistically significant independent variables: Except for the TCC measure
applied to FreeMind classes, the univariate results show that each of the
considered size, cohesion, and coupling measures was found to be a statistically
significant RC predictor.
Direction of independent variable impact: Each of the considered size and
coupling measures appears to have a positive impact on RC, and a class with
higher values of these measures has a higher probability of being revised during
the maintenance phase. The considered cohesion measures seem to negatively
affect RC, which implies that a class with higher cohesion measure values has a
lower probability of being revised during the maintenance phase.
Goodness-of-fit: The MSE values are relatively small. Such an observation is
expected because of the concentration of the RC distribution (Table 6), which
implies that the probability estimates are also concentrated. The R2 values are
often relatively low, which implies that the models have low goodness-of-fit.
Classification performance: Due to the concentration of the distribution on RC =
0, most models exhibit low precision and recall values, i.e., the models have a low
percentage of True Positives among the observations that actually have or are
estimated to have RC = 1. Conversely, the percentage of True Negatives among the
Estimated Negatives and Actual Negatives is relatively high for most models.
This observation indicates that most models are good predictors for classes that
have low probabilities of being revised during the maintenance phase. Most AUC
values do not exceed the “fair” category, which indicates that most obtained
models are not practical classifiers for the revised and unrevised classes; a few
models, such as those based on MPC and RFC, were always found to be
acceptable classifiers.
6.2. Univariate analysis of the FRC
Table 9 presents the results of the univariate models based on Illusion classes using the
FRC as the discretized maintenance variable. Appendix B (Tables B.3 and B.4) provides
the corresponding results for the classes of the other two considered systems. The results
lead to the following observations.
Statistically significant independent variables: All considered size measures
appear to be statistically significant predictors of FRC. In addition, most
considered cohesion and coupling measures are always found to be statistically
significant FRC predictors. The Coh, LSCC, and CBO_U measures appear to be
statistically significant FRC predictors for classes in two of the three considered
systems. Finally, TCC and LCC are always found to be nonsignificant FRC
predictors.
Direction of independent variable impact: As in RC, all statistically significant
size and coupling measures positively affect FRC, and all statistically significant
cohesion measures negatively affect FRC.
Goodness-of-fit: The MSE values are smaller than those obtained for RC, and the
R2 values are mostly higher than those obtained for RC. This observation implies
that the FRC models exhibit better goodness-of-fits than RC models.
Classification performance: The precision values of the statistically significant
models are quite low, and they are always lower than those of the RC models. In
many cases, the models' recall values are lower than those of the RC models.
Conversely, the inverse precision values of the statistically significant models are
quite high, and they are always higher than those of the RC models. In many
cases, the inverse recall values are higher than those of the RC models. These
observations imply that the statistically significant FRC models can be trusted
more when they estimate that a class will not be frequently revised than when
they predict that a class will be frequently revised. Several models have AUC
values in the “excellent” category, and the remaining statistically significant
models have AUC values in either the "fair" or "acceptable" ranges. In many
cases, the obtained models are practical classifiers for classes that are frequently
revised and those that are not.
Table 9: Univariate results for the FRC using Illusion classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -3.010 0.053 0.082 0.083 < 0.0001 0.222 0.524 0.940 0.802 0.696
NOA -3.029 0.074 0.079 0.100 < 0.0001 0.218 0.571 0.944 0.778 0.657
LOC -2.982 0.003 0.079 0.129 < 0.0001 0.262 0.524 0.942 0.840 0.773
Coh -1.749 -1.091 0.087 0.014 0.057 0.124 0.690 0.934 0.472 0.636
CAMC -1.069 -3.669 0.085 0.061 <0.001 0.162 0.762 0.957 0.575 0.698
TCC -1.873 -0.730 0.088 0.010 0.103 0.101 0.476 0.905 0.539 0.580
LCC -2.203 -0.033 0.088 0.000 0.936 NA 0.000 0.902 1.000 0.554
LSCC -2.042 -0.601 0.088 0.006 0.216 0.109 0.714 0.923 0.369 0.633
SCOM -1.774 -1.156 0.087 0.021 0.023 0.131 0.738 0.943 0.469 0.638
PCCC -1.904 -1.377 0.086 0.038 0.005 0.127 0.857 0.959 0.361 0.734
OL2 -2.022 -1.201 0.087 0.023 0.030 0.116 0.881 0.955 0.276 0.628
CBO -2.584 0.024 0.085 0.054 <0.001 0.184 0.333 0.921 0.840 0.700
CBO_IUB -2.402 0.020 0.085 0.038 0.001 0.173 0.214 0.913 0.889 0.616
CBO_U -2.897 0.097 0.087 0.044 <0.001 0.176 0.571 0.939 0.711 0.696
RFC -3.235 0.023 0.081 0.141 < 0.0001 0.243 0.643 0.953 0.784 0.804
MPC -2.867 0.006 0.081 0.125 < 0.0001 0.280 0.548 0.945 0.848 0.811
DAC -2.904 0.117 0.079 0.120 < 0.0001 0.247 0.429 0.933 0.858 0.697
DAC2 -3.202 0.254 0.075 0.137 < 0.0001 0.197 0.548 0.939 0.758 0.693
OCMEC -3.263 0.171 0.083 0.084 < 0.0001 0.182 0.524 0.935 0.745 0.699
6.3. Univariate analysis of the CRC
Table 10 contains the results of the univariate models based on Illusion classes using the
CRC as the discretized maintenance variable. Appendix B (Tables B.5 and B.6) provides
the corresponding results based on classes in the other two systems. The reported results
lead to the following observations.
Table 10: Univariate results for the CRC using Illusion classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -3.556 0.047 0.051 0.074 0.000 0.115 0.458 0.961 0.791 0.744
NOA -3.838 0.083 0.045 0.147 < 0.0001 0.141 0.458 0.963 0.835 0.725
LOC -3.737 0.003 0.047 0.182 < 0.0001 0.221 0.625 0.975 0.869 0.885
Coh -1.773 -2.838 0.051 0.062 0.004 0.097 0.792 0.979 0.564 0.716
CAMC -1.745 -3.464 0.052 0.049 0.008 0.095 0.708 0.972 0.601 0.680
TCC -2.411 -0.898 0.053 0.013 0.126 0.053 0.375 0.942 0.603 0.586
LCC -2.726 -0.170 0.053 0.001 0.751 0.043 0.167 0.940 0.778 0.563
LSCC -2.305 -2.336 0.052 0.046 0.021 0.086 0.875 0.984 0.448 0.703
SCOM -1.971 -2.716 0.051 0.071 0.003 0.099 0.833 0.982 0.549 0.706
PCCC -2.400 -2.832 0.051 0.077 0.015 0.082 0.958 0.993 0.367 0.764
OL2 -2.467 -762.6 0.051 0.090 <0.001 0.078 1.000 1.000 0.303 0.639
CBO -2.956 0.009 0.053 0.007 0.200 0.091 0.167 0.948 0.901 0.686
CBO_IUB -2.866 0.005 0.053 0.002 0.533 0.091 0.125 0.947 0.926 0.611
CBO_U -3.458 0.090 0.052 0.035 0.008 0.110 0.625 0.969 0.702 0.686
RFC -3.956 0.023 0.049 0.169 < 0.0001 0.167 0.625 0.974 0.815 0.840
MPC -3.624 0.006 0.048 0.178 < 0.0001 0.246 0.583 0.973 0.894 0.877
DAC -3.612 0.117 0.047 0.152 < 0.0001 0.182 0.500 0.967 0.867 0.734
DAC2 -3.803 0.236 0.047 0.131 < 0.0001 0.133 0.458 0.963 0.823 0.697
OCMEC -4.147 0.201 0.050 0.117 < 0.0001 0.140 0.708 0.977 0.744 0.782
Statistically significant independent variables: As in FRC, the TCC and LCC
models always appear to be statistically nonsignificant CRC predictors. A size
measure, NOA, and a coupling measure, DAC, were found to be statistically
significant CRC predictors using only classes in the Illusion system. The cohesion
measures PCCC and OL2 and the coupling measures CBO, CBO_IUB, and CBO_U
were found to be statistically significant CRC predictors using classes of two of
the three considered systems. The remaining size, cohesion, and coupling
measures (10 measures) always appear to be statistically significant CRC
predictors.
Direction of independent variable impact: As with both the RC and FRC, all
statistically significant size and coupling measures positively affect the CRC, and
all statistically significant cohesion measures negatively affect the CRC.
Goodness-of-fit: With few exceptions, the MSE values are somewhat smaller than
those obtained for the RC and FRC. In many cases, the R2 values of the models
obtained for the CRC appear to be slightly higher than those obtained for the RC
and FRC. These observations indicate that most statistically significant models
obtained for the CRC have better goodness-of-fit values than those obtained for
the RC and FRC.
Classification performance: The precision values of the statistically significant
models are quite low for all considered measures, most recall and inverse recall
values are greater than 50%, and the inverse precision values of the statistically
significant models are always high (i.e., greater than 90%). These observations
imply that the statistically significant CRC models can be trusted more when they
estimate the classes that do not require costly revisions than when they predict the
classes that do. Most obtained models have AUC values in either the "acceptable"
or "excellent" categories, and the remaining statistically significant models have
AUC values in the "fair" range. This observation implies that most obtained
statistically significant CRC models are practical classifiers for classes that
require or do not require costly revisions.
6.4. Univariate analyses: discussion
The above results show that the considered independent variables can be used to
construct statistically significant univariate predictor models. The statistically significant
size and coupling measures were consistently found to positively affect all three
discretized maintenance variables, whereas the statistically significant cohesion measures
appeared to negatively affect the considered maintenance variables. These observations
indicate that classes with better quality (i.e., smaller sizes, lower coupling, and higher
cohesion) potentially have higher maintainability (i.e., are less frequently revised and
have fewer revised LOC) than those with worse quality.
Besides the importance of the univariate analyses in exploring the direction and strength
of the relationship between the considered internal quality attributes and maintainability
factors, we used the univariate analysis results to select the measures to be included in
the construction of the multivariate prediction models. As illustrated in the next section,
in the forward model construction approach, we used the AUC values obtained in the
univariate analysis to decide the order in which the measures are included in the
constructed models. Measures with higher AUC values were given higher priority for
inclusion in the constructed models.
7. Multivariate analysis
This multivariate analysis aims to construct practical and optimized models that include
multiple quality measures to predict the three considered dependent maintenance
variables, namely, RC, FRC, and CRC. The analysis explores how well the models of
measures used in combination can predict the required dependent maintenance variables.
The inclusion of all measures in the model can produce the best result for the AUC value.
However, this strategy results in a model with a relatively high MSE, which means that
the model is highly dependent on the data set. This result contradicts our goal of having a
general model that can perform well with any given data set. In addition, a model that
includes all considered measures is difficult to use in practice because it exhausts the
software engineer’s time by applying all measures. To solve these problems, a set of
measures must be selected to construct an optimized model. These measures must be
selected to reduce the multicollinearity in the model, i.e., the existence of highly
correlated measures (Hosmer and Lemeshow 2000). Therefore, the constructed model
does not necessarily include the measures found to be the best individual predictors;
depending on the correlations between them, the measures existing together may increase
the multicollinearity in the model. Two general approaches are followed to construct an
optimized model: forward and backward selection (Hosmer and Lemeshow 2000). In the
forward selection approach, the model construction process starts with no variables in the
model, and the variables are added individually if they are statistically significant in the
prediction model. In the backward selection approach, the model construction process
starts with a model that includes all independent variables, and the statistically
nonsignificant variables are sequentially removed from the model. Different researchers
have used different statistical criteria to select the variables to add or delete.
In this research, we tried both construction approaches. The first experiment was based
on the backward selection approach. We selected the p-value to be the criterion for
removing measures from the model. In each step, we applied multivariate regression
analysis and removed the measure that had the highest p-value from the model. The
process continued until each of the remaining measures had a p-value above α (0.05).
This backward-based experiment does not take the AUC values into account. The second
experiment was based on the forward selection approach. First, we ordered the measures
in a descending fashion according to the AUC reported in Section 6 (last column in
Tables 8, 9, and 10). On the basis of this order, in each step, a measure was added to the
model and the regression analysis was performed again using the measures that existed in
the model at that moment. If the added measure was found to have a p-value greater than
0.05, the measure would be removed from the model. In addition, if the added
measure caused a measure already in the model to become insignificant (p-value>0.05),
the measure would be deleted from the model. Because our forward-based model-
construction process considered the AUC values of the individual measures, we noticed
that the resulting models had AUC values that were better than those of the models
constructed using the backward approach. At the same time, the multicollinearities in the
forward models were relatively low due to the consideration of the p-values while
constructing the models. Therefore, because of space limitations, in this section we only
report, discuss, and compare the results of the forward-based models.
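The forward procedure described above can be sketched as follows. The `fit` argument is a placeholder of ours for the multivariate logistic regression, returning a p-value per measure in the candidate model; we read the deletion rule as dropping the newly added measure when it makes an existing one nonsignificant, which is our interpretation of the text. The mock p-values in the usage example are illustrative only.

```python
def forward_select(measures_by_auc, fit, alpha=0.05):
    """AUC-ordered forward selection: measures are tried in descending
    order of their univariate AUC. A measure is kept only if it is
    statistically significant (p <= alpha) and does not push a measure
    already in the model above alpha."""
    model = []
    for m in measures_by_auc:
        p_values = fit(model + [m])
        if p_values[m] > alpha:
            continue                 # the added measure is not significant
        if any(p_values[v] > alpha for v in model):
            continue                 # it breaks an already-included measure
        model = model + [m]
    return model

# Illustrative mock: fixed p-values per measure, ordered by univariate AUC.
_p = {"RFC": 0.001, "MPC": 0.2, "CBO": 0.03, "TCC": 0.004}
selected = forward_select(["RFC", "MPC", "CBO", "TCC"],
                          fit=lambda ms: {m: _p[m] for m in ms})
# MPC is rejected (p = 0.2 > 0.05); RFC, CBO, and TCC are retained.
```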
We tested the multicollinearity in the models by obtaining the variance inflation factor
(VIF) (O’Brien 2007), a widely used measure of the multicollinearity of a variable with
other variables in the model. We used the rule of thumb that a VIF value above four
indicates a multicollinearity problem. We used the Mahalanobis distance (Barnett and Lewis 1994)
to detect outliers in the models that include multiple independent variables. However, we
found that the removal of outliers did not lead to significant differences in the final
multivariate regression analysis results.
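The VIF check can be sketched as below: each variable is regressed (by ordinary least squares, with an intercept) on the remaining variables, and VIF = 1/(1 - R^2). This is our own small-scale sketch, not the tooling used in the study.

```python
def _solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for a small system."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col]:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def vif(columns):
    """Variance inflation factor of each variable (given as a list of
    equal-length columns): regress it on the others and return
    1 / (1 - R^2). Values above four signal multicollinearity."""
    out = []
    n = len(columns[0])
    for j, y in enumerate(columns):
        xs = [[1.0] * n] + [c for k, c in enumerate(columns) if k != j]
        xtx = [[sum(a * b for a, b in zip(r, c)) for c in xs] for r in xs]
        xty = [sum(a * b for a, b in zip(r, y)) for r in xs]
        beta = _solve(xtx, xty)
        fit = [sum(b * x[i] for b, x in zip(beta, xs)) for i in range(n)]
        ybar = sum(y) / n
        ss_res = sum((a - b) ** 2 for a, b in zip(y, fit))
        ss_tot = sum((a - ybar) ** 2 for a in y)
        out.append(ss_tot / ss_res)      # = 1 / (1 - R^2)
    return out
```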
We constructed three models, one for each considered variable, namely, RC, FRC, and
CRC. The models were constructed using classes in the Illusion system. This system had
the largest number of considered classes; therefore, its models should be more general
than those constructed using the other systems. However, we constructed the prediction
models using the classes in each of the other two systems and reported the results in
Appendix C. As discussed in Section 5.4, we applied two validation approaches. In the
first approach, we applied the V-cross-validation statistical technique. In the second
approach, we explored the classification performance of the three constructed models
when applied to classes in the other two systems. In this case, we applied the equation of
the multivariate model constructed using the Illusion classes to each class in the other two
systems. We used the threshold suggested for the Illusion system on the π values obtained
for the classes in the other two systems to determine whether each class was estimated to
have low maintainability. This process resulted in the 2×2 contingency table, as in Table
7, for the classes of each system. Therefore, we could identify the precision, recall,
inverse precision, and inverse recall values for each system. These values were used to
evaluate the classification performance of the model constructed using the Illusion
classes when applied to the classes of the other two systems. To simplify this evaluation,
we calculated the weighted average precision (WAP) and weighted average recall (WAR)
using the following formulas:
WAP = (precision * (FN + TP) + inverse precision * (TN + FP)) / (TN + FP + FN + TP), and
WAR = (recall * (FN + TP) + inverse recall * (TN + FP)) / (TN + FP + FN + TP).
The WAP and WAR values are still dependent on the selected threshold for the model,
but they are more representative of the confusion matrix values than the traditional
precision and recall values. Finally, we calculated the FMeasure, defined as the harmonic
mean of WAP and WAR, and used it as a single representative value to compare the
classification performance of the model when applied to the original data set (Illusion
classes) to that obtained when applying the model to the validation data sets.
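Under hypothetical cell counts, WAP, WAR, and the FMeasure follow directly from the formulas above; the weights are the numbers of actual positives (FN + TP) and actual negatives (TN + FP). Note that, algebraically, WAR reduces to overall accuracy, (TN + TP)/total.

```python
def weighted_performance(tn, fn, fp, tp):
    """WAP, WAR, and their harmonic mean (FMeasure), weighting precision/
    recall by actual positives and their inverse counterparts by actual
    negatives, as in the formulas above."""
    total = tn + fp + fn + tp
    pos, neg = fn + tp, tn + fp              # actual positives / negatives
    p, r = tp / (tp + fp), tp / pos          # precision, recall
    ip, ir = tn / (tn + fn), tn / neg        # inverse precision / recall
    wap = (p * pos + ip * neg) / total
    war = (r * pos + ir * neg) / total       # equals (tp + tn) / total
    return wap, war, 2 * wap * war / (wap + war)

# Hypothetical counts for illustration only.
wap, war, fmeasure = weighted_performance(tn=300, fn=20, fp=27, tp=83)
```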
7.1. Multivariate analysis of the RC
Tables 11 and 12 present the results for the statistically significant multivariate model
using RC as the discretized maintenance variable. The model included three measures.
The first row in Table 12 shows the classification performance of the model when applied
to the original data set used to construct the model. The second and third rows provide
the classification performance of the model when applied to the classes of the other two
systems.
Table 11: RC-based forward multivariate regression model.
Metric Coefficient p-value VIF
Intercept -1.484 < 0.0001 -
RFC 0.018 < 0.0001 1.069
CBO 0.014 0.022 1.047
TCC -1.079 0.003 1.026
Model: MSE = 0.164, R2 = 0.117, AUC = 0.747

Table 12: Multivariate analysis of the RC: classification performance and validation
System P R IP IR FMeasure
Illusion 0.412 0.680 0.873 0.694 0.725
FreeMind 0.511 0.209 0.726 0.913 0.680
JabRef 0.679 0.461 0.728 0.869 0.713

The results in Tables 11 and 12 lead to the following observations.
Statistically significant independent variables: The model has three statistically
significant measures. Two of them are coupling measures (RFC and CBO), and
one is a cohesion measure (TCC). However, the coefficient values show that the
contribution of the TCC measure to the model is much higher than those of the
coupling measures. No statistically significant independent variable is a size
measure. The model is free of multicollinearity problems (i.e., all VIF values
are less than four).
Direction of independent variable impact: The coupling measures still positively
affect RC, and the cohesion measure still negatively affects RC.
Goodness-of-fit: The MSE (0.164) is slightly smaller than the values obtained for
the univariate models, which were already small. The R2 value (0.117) is much
larger than those obtained in the univariate models, which implies a much better
goodness-of-fit.
Classification performance: The model still features a low precision value.
However, the classification performance values of the model are better than most
of those obtained using univariate models. The model has an AUC of 0.747
(acceptable), which is higher than any AUC values obtained using RC univariate
models.
Validation: The model has higher precision and inverse recall values and lower
recall and inverse precision values when applied to other data sets. However, it is
normal to have worse classification performance results when applying the model
to FreeMind and JabRef classes, as these classes are not within the data set used
to construct the model. As discussed in Section 5.2, these results depend on the
selected classification threshold. Overall, the FMeasure values show that the
model, when applied to other classes, has a classification performance similar to
that obtained when the model is applied to its original data set. This observation
provides confidence that the model behaves well when applied in practice to
predict the classes estimated to be involved in revisions.
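As an illustration of applying the Table 11 model to a new class, the estimated probability follows the standard logistic form pi = 1/(1 + e^(-z)) used by logistic regression (an assumption about notation; the coefficients and the t = 0.24 threshold are taken from Tables 11 and 6). The measure values in the example are hypothetical.

```python
import math

def rc_probability(rfc, cbo, tcc):
    """Estimated probability that a class will be revised, using the RC
    multivariate model of Table 11 (built on Illusion classes):
    logit(pi) = -1.484 + 0.018*RFC + 0.014*CBO - 1.079*TCC."""
    z = -1.484 + 0.018 * rfc + 0.014 * cbo - 1.079 * tcc
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical class: RFC = 40, CBO = 10, TCC = 0.2. A class is flagged
# (Y = 1) when pi exceeds the Illusion RC threshold t = 0.24 (Table 6).
pi = rc_probability(rfc=40, cbo=10, tcc=0.2)
flagged = pi > 0.24
```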
7.2. Multivariate analysis of the FRC
Tables 13 and 14 present the results of the statistically significant multivariate model
using FRC as the discretized maintenance variable. The model included four measures.
Table 13: FRC-based forward multivariate regression model.
Metric Coefficient p-value VIF
Intercept -4.690 < 0.0001 -
RFC 0.021 0.001 2.029
CBO 0.022 0.002 1.065
DAC 0.079 0.008 1.925
Coh 1.742 0.022 1.342
Model: MSE = 0.076, R2 = 0.209, AUC = 0.836

Table 14: Multivariate analysis of the FRC: classification performance and validation
System P R IP IR FMeasure
Illusion 0.304 0.667 0.944 0.785 0.855
FreeMind 0.278 0.128 0.901 0.960 0.852
JabRef 0.341 0.536 0.950 0.896 0.878

The results in Tables 13 and 14 lead to the following observations.
Statistically significant independent variables: The model has four statistically
significant measures. Three of them are coupling measures (RFC, CBO, and
DAC), and one is a cohesion measure (Coh), which had a much higher
contribution to the model than the coupling measures. As in the RC multivariate
model, no statistically significant independent variable in the FRC model is a size
measure, and the model is free of multicollinearity problems.
Direction of independent variable impact: As in the corresponding univariate
models, the coupling measures positively affect FRC in the multivariate model.
However, the cohesion measure features different impact directions than the
univariate models, which is likely due to the association between the Coh and
coupling measures in the multivariate model.
Goodness-of-fit: The MSE value (0.076) is smaller than most values obtained in
the univariate models. There is a sharp increase in the R2 value (0.209), which
implies that this model has a much better goodness-of-fit than the univariate
models.
Classification performance: The multivariate model has a better precision value
than any univariate model and better recall, inverse precision, and inverse recall
values than most univariate models. The model also has an AUC value of 0.836
(excellent), which is higher than any obtained using univariate FRC models.
Validation: The validation results show that the FRC prediction model
constructed using the Illusion class data set has lower precision, recall, and
inverse precision values, and it has a higher inverse recall value when applied to
FreeMind classes than to its original data set. However, the prediction model was
found to have better precision, inverse precision, and inverse recall values for
JabRef classes than for the original data set. The FMeasure values show that the
constructed model has a relatively high classification performance when applied
to data sets other than the data set from which it was constructed. This
observation suggests that the constructed model is potentially useful in practice.
7.3. Multivariate analysis of the CRC
Tables 15 and 16 report the results of the statistically significant multivariate model using
CRC as the discretized maintenance variable. The model included two measures.
Table 15: CRC-based forward multivariate regression model.
Metric Coefficient p-value VIF
Intercept -3.991 < 0.0001 -
LOC 0.002 0.001 1.619
DAC 0.072 0.014 1.619
Model: MSE = 0.047, R2 = 0.212, AUC = 0.871

Table 16: Multivariate analysis of the CRC: classification performance and validation
System P R IP IR FMeasure
Illusion 0.200 0.667 0.977 0.842 0.880
FreeMind 0.300 0.103 0.926 0.979 0.892
JabRef 0.276 0.296 0.931 0.925 0.871
The results in Tables 15 and 16 lead to the following observations.
Statistically significant independent variables: The model has two statistically
significant measures. One is a size measure, LOC, and the other is a coupling
measure, DAC, with the LOC measure having a higher contribution to the model
than the DAC measure. No statistically significant independent variable in the
CRC model is a cohesion measure. The model is free of multicollinearity
problems.
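The multicollinearity check is typically based on the variance inflation factor (VIF). For a model with exactly two predictors, such as LOC and DAC here, both predictors share the same VIF, 1/(1 - r²), where r is the Pearson correlation between them, which is consistent with the identical VIF values in Table 15. A minimal sketch with illustrative data (the function name and values are ours):

```python
def vif_two_predictors(x1, x2):
    """VIF for each predictor in a two-predictor regression model.

    With two predictors, regressing either one on the other yields
    R^2 = r^2 (the squared Pearson correlation), so both predictors
    share the same VIF = 1 / (1 - r^2); values near 1 indicate
    little collinearity.
    """
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    v1 = sum((a - m1) ** 2 for a in x1)
    v2 = sum((b - m2) ** 2 for b in x2)
    r_squared = cov * cov / (v1 * v2)
    return 1.0 / (1.0 - r_squared)

# Illustrative LOC-like and DAC-like values (not real system data)
vif = vif_two_predictors([120, 80, 300, 45, 210], [3, 1, 6, 0, 4])
```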
Direction of independent variable impact: As in the corresponding univariate
models, LOC and DAC positively affect the CRC in the multivariate model.
Goodness-of-fit: The MSE value (0.047) is smaller than most values obtained in
the univariate models. The R2 value (0.212) is much higher than those obtained in
the univariate models, which implies a much better goodness-of-fit.
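Given the coefficients in Table 15, the fitted CRC model is a logistic function of LOC and DAC. A sketch of evaluating it (coefficients taken from Table 15; the function name and inputs are ours):

```python
import math

def crc_probability(loc, dac):
    """Predicted probability that a class falls in the high-CRC category,
    using the Table 15 coefficients: intercept -3.991, LOC 0.002, DAC 0.072."""
    logit = -3.991 + 0.002 * loc + 0.072 * dac
    return 1.0 / (1.0 + math.exp(-logit))

# Both coefficients are positive, so larger or more coupled classes
# receive higher predicted probabilities.
p_small = crc_probability(loc=100, dac=2)
p_large = crc_probability(loc=1000, dac=10)
```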
Classification performance: The multivariate model has better precision, recall,
inverse precision, and inverse recall values than most univariate models. The
model also has an AUC value of 0.871 (excellent), which is higher than any value
obtained using the univariate CRC models, with the exception of the MPC model.
Validation: As expected, the precision, recall, inverse precision, and
inverse recall values are sometimes higher when the CRC model is applied to the
original data set than to other systems. However, the FMeasure values indicate
that the classification performance of the model, when applied to systems other
than the system used to construct the model, is potentially high and can be trusted.
7.4. Multivariate analyses: discussion
The multivariate regression analysis shows that it is possible to construct statistically
significant multivariate models with better goodness-of-fit values and classification
performances than those obtained using univariate analysis. The constructed multivariate
models are more general than the univariate models because the former models consider
different quality aspects related to the problem of interest, as well as because a single
measure cannot capture all such aspects. For the Illusion classes, the RC and FRC
prediction models each included cohesion and coupling measures, and the CRC model
included size and coupling measures. The RC prediction models constructed using
FreeMind and JabRef classes included size, cohesion, and coupling measures, and the
corresponding FRC and CRC prediction models included cohesion and coupling measures. This
observation shows that the considered quality attributes are complementary for predicting
the considered maintainability variables. Most excluded measures were found to be
statistically significant predictors when used alone to predict the maintenance variables in
the univariate analysis. However, these measures were excluded because of the
associations between them and the measures retained in the models. Although every model
includes a coupling measure, whereas size and cohesion measures are not always included,
the coupling measures contributed much less than the other measures retained in the models.
Practically, this observation implies that software developers must pay attention to three
quality attributes, namely, size, coupling, and cohesion, to improve different class
maintainability aspects when developing OO classes.
Typically, multivariate models are general in their applicability. This expectation is
confirmed by the validation results, which demonstrated that the constructed models had
reasonable classification performances for the classes of systems other than those used to
construct the models.
8. Threats to validity
The empirical study presented here is subject to several threats to internal and external
validity related to the systems to which the study was applied and to the selected
measures. These threats may restrict the generality and limit the interpretation of our results.
8.1. Internal validity
The collected maintenance data greatly depend on the considered ages of the systems.
The chances for a class to be revised are expected to increase with time because systems
typically evolve and more faults are detected over time. However, one of our criteria for
selecting systems is that the systems must have been actively revised over a reasonable
maintenance history. We selected systems of different ages and found that their results
led to the same general conclusions, which gave us confidence that the collected
maintenance data were reliable. The maintenance data, which were available online for
the three selected systems, were collected from the CVS systems under the assumption
that all revisions performed during the maintenance history were reported in the CVS
systems. Although this empirical study does not consider many existing size, cohesion,
and coupling measures, the selected ones cover many measurement approaches and
quality aspects. Although different lines of code can have different maintenance costs and
efforts, we considered the modified lines of code equally because it is difficult to measure
the exact maintenance cost and effort for each modified line of code. It is important to
note that, in this paper, we did not use the number of revised lines of code and number of
revisions to measure the maintenance cost and effort but to estimate them. We followed
the convention that an addition or a deletion of a line of code is considered a single line
change, and a change in line content is considered both a deletion and an addition.
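Under this convention, the revised-LOC count for a revision can be tallied as in the following sketch (the function and its inputs are ours, for illustration):

```python
def count_revised_loc(added, deleted, modified):
    """Revised LOC under the paper's counting convention:
    an added or deleted line is one change; a line whose content
    changed counts as both a deletion and an addition (two changes)."""
    return added + deleted + 2 * modified

# e.g., a revision adding 3 lines, deleting 2, and editing 4 in place
revised = count_revised_loc(added=3, deleted=2, modified=4)
```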
8.2. External validity
The first external threat to validity is that all considered systems are implemented in Java.
Other object-oriented programming languages, e.g., C++, have features that differ from
those in Java, such as allowing multiple inheritance and destructor declarations. No
considered measure accounts for inherited attributes and methods; therefore, the
inheritance issue is not expected to alter the results and conclusions drawn in this paper.
However, the inclusion of destructors can affect the class size, cohesion, and coupling
values, which can thus also affect the ability of measures to predict class maintainability.
However, the effect of including or excluding special methods (i.e., constructors,
destructors, and access methods) on the quality measurement is outside the scope of this
paper. In this paper, we instead investigated the impact of the considered existing
measures, as originally defined, on class maintainability.
The second external threat to validity is that all three considered systems are open-source
systems, which may not be representative of all industrial domains. However, the use of
open-source systems in empirical studies is a common practice in the research
community (Mockus et al. 2002, Lavazza et al. 2012). Although differences in design
quality and reliability between open-source and industrial systems have been investigated
(e.g., Samoladas et al. 2003, Samoladas et al. 2008, Spinellis et al. 2009), there is no clear
and general result on which we can rely. To compare maintainability of the selected
systems with those of other systems, including proprietary and open-source systems, one
can use the Software Improvement Group (SIG) system, which includes a repository of
hundreds of systems (Baggen et al. 2012). The SIG system compares a system of interest
with those in the repository in terms of certain aspects of maintainability and ranks the
system accordingly. However, applying the SIG system may require an interaction
with the developers of the system of interest, which might be infeasible for the selected
open-source systems.
The third external threat to validity is that the selected systems may not be representative
in terms of their class numbers and sizes. However, the selected systems are not artificial
examples. The number of considered systems and their sizes in terms of LOC and the
number of classes are also comparable with those considered in similar empirical studies
(Briand et al., 2002, Counsell et al., 2006, Marcus et al., 2008). In our empirical analyses,
we paid considerable attention to factors related to the significance of the collected data
and results, as discussed in Sections 5, 6, and 7. The small p-values obtained for most
considered measures indicate that, for the considered classes, there is sufficient
evidence to support the obtained conclusions.
The fourth external threat to validity is that we applied our own tool, rather than an
existing one, to analyze the Java classes and obtain the measure values. We did not find an
existing tool that automates all of the measures considered in this paper. Therefore, using
an existing tool requires reverse engineering the tool and extending it to consider the
additional measures, which might require more development time than developing a tool
from scratch. However, we applied other existing tools such as CKJM (CKJM 2013) and
compared the results of the common measures with ours. We found most of the
corresponding values identical, which gave us confidence about the results of our tool.
To generalize our results, different systems written in different programming languages,
selected from different domains, and including both real-life and large-scale systems
should be considered in similar large-scale evaluations.
9. Conclusions and future work
This paper investigates the relationships between three key class internal quality
attributes, i.e., size, coupling, and cohesion, and class maintainability, which is an
external quality attribute of interest to software practitioners. We empirically explored the
abilities of 19 selected size, coupling, and cohesion quality measures, considered both
individually and in combination, to predict two class maintainability aspects, namely, the
number of revisions performed on the class and the number of revised LOC during the
maintenance phase. The empirical study involved classes from three open-source Java
systems. We applied univariate logistic regression analysis to explore both the abilities of
the individual measures to predict class maintainability and the relationships between the
individual measures and considered maintainability aspects. We also used multivariate
logistic regression analysis to study the abilities of measure combinations to predict class
maintainability and construct the corresponding practical models.
The univariate regression analysis results showed that most considered measures are
significant predictors of both considered maintainability aspects, with prediction
abilities ranging from poor to excellent. The results generally
provided empirical evidence that, with regard to both considered class maintainability
aspects, the size and coupling quality attributes have positive impacts, and the cohesion
quality attribute has a negative impact.
From a practical perspective, the results indicate that developers can enhance the
maintainability (i.e., reduce the maintenance efforts and cost) of their developed classes
by reducing the classes' sizes and coupling and increasing their cohesion. When applying
the constructed multivariate models, developers must pay more attention to increasing
cohesion and decreasing size than decreasing coupling because, as discussed in Section 7,
size and cohesion have higher contributions to the constructed models in terms of their
considered maintainability aspects than coupling.
The multivariate regression analysis results showed that the combination of measures that
measure different quality attributes resulted in stable, optimized prediction models. The
constructed multivariate models were better than most univariate models in their abilities
to predict both class maintainability aspects. The reported results indicate that, in practice,
applying the constructed multivariate models is more appealing than applying the
univariate models.
In practice, the constructed prediction models can be automated and integrated into OO
programming editors to estimate class maintainability once the system is developed. That
is, the modules in our tool that obtain the values of the measures included in the
multivariate models can be integrated with a Java editor. For each class
in a newly developed system, the modified editor can obtain the values of the measures
and the corresponding probability that the class will require costly and frequent revisions
by applying the equations of the multivariate models. In this case, developers can revise
the code of the classes with low maintainability. Software engineers can also spend more
time testing classes with low maintainability to reduce the chances of detecting faults in
these classes during the maintenance phase. Finally, software developers are encouraged
to document classes with low maintainability well to reduce the time required during the
maintenance phase to understand the code and perform the required revisions.
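As a sketch of this workflow, an editor plug-in could score each newly developed class with the multivariate CRC model (coefficients from Table 15) and flag those whose predicted probability of costly revisions exceeds a chosen cutoff; the 0.5 threshold and class data below are illustrative, not values from the paper:

```python
import math

def flag_low_maintainability(classes, threshold=0.5):
    """Return (name, probability) pairs for classes predicted to need
    costly revisions, using the Table 15 CRC coefficients.
    `classes` is an iterable of (name, loc, dac) tuples; the 0.5
    default cutoff is a hypothetical choice, not from the paper."""
    flagged = []
    for name, loc, dac in classes:
        logit = -3.991 + 0.002 * loc + 0.072 * dac
        probability = 1.0 / (1.0 + math.exp(-logit))
        if probability >= threshold:
            flagged.append((name, probability))
    return flagged

# Hypothetical classes: one very large and highly coupled, one small
flagged = flag_low_maintainability([("GodClass", 2000, 10), ("Helper", 100, 2)])
```

Flagged classes would then be candidates for extra testing, documentation, or refactoring before release.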
This empirical study can be extended by considering other direct maintenance aspects
such as the actual maintenance time and cost. More and larger industrial systems can be
used in a similar empirical study to validate or invalidate the obtained results. In previous
empirical studies, we explored the impact of including or excluding special methods (e.g.,
constructors and access methods) (Al Dallal 2012a) and transitive relationships caused by
method invocations (Al Dallal 2013) in cohesion measurement on the abilities of
cohesion measures to predict faulty classes. In future work, we plan to perform a similar
empirical study to investigate the impact of the same factors on the abilities of the
cohesion measures to predict class maintainability.
Acknowledgments
The author would like to acknowledge the support of this work by Kuwait University
Research Grant WI03/11. In addition, the author would like to thank Anas Abdin for
assisting in collecting the required data.
References
K. Aggarwal, Y. Singh, A. Kaur, and R. Malhotra, Application of artificial neural
network for predicting maintainability using object-oriented metrics, Proceedings of
World Academy of Science, Engineering and Technology, 2006, Vol. 15, pp. 285-289.
K. Aggarwal, Y. Singh, A. Kaur, and R. Malhotra, Investigation effect of design metrics
on fault proneness in object-oriented systems, Journal of Object Technology, 2007, 6(10),
pp. 127-141.
Y. Ahn, J. Suh, S. Kim, and H. Kim, The software maintenance project effort estimation
model based on function points, Journal of Software Maintenance Evolution: Research
and Practice, 2003, Vol. 15, pp. 71-85.
J. Al Dallal, Software similarity-based functional cohesion metric, IET Software, 2009,
3(1), pp. 46-57.
J. Al Dallal, Mathematical validation of object-oriented class cohesion metrics,
International Journal of Computers, 4(2), 2010, pp. 45-52.
J. Al Dallal, Improving the applicability of object-oriented class cohesion metrics,
Information and Software Technology, 2011a, 53(9), pp. 914-928.
J. Al Dallal, Measuring the discriminative power of object-oriented class cohesion
metrics, IEEE Transactions on Software Engineering, 2011b, 37(6), pp. 788-804.
J. Al Dallal, Incorporating transitive relations in low-level design-based class
cohesion measurement, Software—Practice & Experience, 2013, 43(6), pp. 685-704.
J. Al Dallal, The impact of accounting for special methods in the measurement of object-
oriented class cohesion on refactoring and fault prediction activities, Journal of Systems
and Software, 2012a, 85(5), pp. 1042-1057.
J. Al Dallal, Fault prediction and the discriminative powers of connectivity-based object-
oriented class cohesion metrics, Information and Software Technology, 2012b, 54(4), pp.
396-416.
J. Al Dallal, Constructing models for predicting extract subclass refactoring opportunities
using object-oriented quality metrics, Information and Software Technology, 2012c,
54(10), pp. 1125-1141.
J. Al Dallal and L. Briand, An object-oriented high-level design-based class cohesion
metric, Information and Software Technology, 2010, 52(12), pp. 1346-1361.
J. Al Dallal and L. Briand, A Precise method-method interaction-based cohesion metric
for object-oriented classes, ACM Transactions on Software Engineering and
Methodology (TOSEM), 2012, 21(2), pp. 8:1-8:34.
J. Al Dallal and S. Morasca, Predicting object-oriented class reuse-proneness using
internal quality attributes, Empirical Software Engineering, in press, 2012.
E. Arisholm, L. Briand, and E. Johannessen. A systematic and comprehensive
investigation of methods to build and evaluate fault prediction models, Journal of
Systems and Software, 2010, 83(1), pp. 2-17.
L. Badri and M. Badri, A Proposal of a new class cohesion criterion: an empirical study,
Journal of Object Technology, 3(4), 2004, pp. 145-159.
L. Badri, M. Badri, and A. Gueye, Revisiting class cohesion: an empirical investigation
on several systems, Journal of Object Technology, 2008, 7(6), pp. 55-75.
R. Baggen, J. Correia, K. Schill, and J. Visser, Standardized code quality benchmarking
for improving software maintainability, Software Quality Journal, 2012, 20(2), pp. 287-
307.
J. Bansiya, L. Etzkorn, C. Davis, and W. Li, A class cohesion metric for object-oriented
designs, Journal of Object-Oriented Program, 11(8), 1999, pp. 47-52.
V. Barnett and T. Lewis, Outliers in Statistical Data, John Wiley and Sons, 3rd edition,
1994, 584 pp.
H. Benestad, B. Anda, and E. Arisholm, Assessing software product maintainability
based on class-level structured measures, 7th International Conference on Product-
Focused Software Process Improvement (PROFES), 2006, pp. 94-111.
J. Bieman and B. Kang, Cohesion and reuse in an object-oriented system, Proceedings of
the 1995 Symposium on Software reusability, Seattle, Washington, United States, 1995,
pp. 259-262.
J. Bieman and B. Kang, Measuring design-level cohesion, IEEE Transactions on
Software Engineering, 24(2), 1998, pp. 111-124.
C. Bonja and E. Kidanmariam, Metrics for class cohesion and similarity between
methods, Proceedings of the 44th Annual ACM Southeast Regional Conference,
Melbourne, Florida, 2006, pp. 91-95.
L. C. Briand, C. Bunse, J. W. Daly, and C. Differding, An experimental comparison of
the maintainability of object-oriented and structured design documents, Empirical
Software Engineering, 1997b, Vol 2, pp. 291-312.
L. C. Briand, J. Daly, and J. Wust, A unified framework for cohesion measurement in
object-oriented systems, Empirical Software Engineering - An International Journal,
3(1), 1998, pp. 65-117.
L. C. Briand, J. Daly, and J. Wust, A unified framework for coupling measurement in
object-oriented systems, IEEE Transactions on Software Engineering, 25(1), 1999a, pp.
91-121.
L. Briand , P. Devanbu, and W. Melo, An investigation into coupling measures for C++,
Proceedings of the 19th International Conference on Software Engineering, Boston,
Massachusetts, United States, 1997, p.412-421.
L. Briand, S. Morasca, V. Basili, Property-based software engineering measurement,
IEEE Transactions on Software Engineering, 1996, 22(1), pp. 68–86.
L. C. Briand , S. Morasca , and V. R. Basili, Defining and validating measures for object-
based high-level design, IEEE Transactions on Software Engineering, 25(5), 1999b, pp.
722-743.
L. C. Briand , S. Morasca , and V. R. Basili, Measuring and assessing maintainability at
the end of high level design, IEEE Conference on Software Maintenance, Montreal,
Canada, 1993, pp. 88-97.
L. Briand and J. Wust, Empirical studies of quality models in object-oriented systems,
Advances in Computers, Academic Press, 2002, pp. 97-166.
L. C. Briand, J. Wüst, and H. Lounis, Replicated Case Studies for Investigating Quality
Factors in Object-Oriented Designs, Empirical Software Engineering, 6(1), 2001, pp. 11-
58.
CKJM — Chidamber and Kemerer Java Metrics, http://www.spinellis.gr/sw/ckjm/,
accessed in January 2013.
H. S. Chae, T. Y. Kim, W. Jung, and J. Lee, Using metrics for estimating maintainability
of web applications: an empirical study, 6th IEEE/ACIS International Conference on
Computer and Information Science, 2007.
H. S. Chae, Y. R. Kwon, and D. Bae, Improving cohesion metrics for classes by
considering dependent instance variables, IEEE Transactions on Software Engineering,
30(11), 2004, pp. 826-832.
M. A. Chaumun, H. Kabaili, R. K. Keller, and F. Lustman, A change impact model for
changeability assessment in object-oriented software systems, Science of Computer
Programming, 2002, 45(2), pp. 155-174.
Z. Chen, Y. Zhou, and B. Xu, A novel approach to measuring class cohesion based on
dependence analysis, Proceedings of the International Conference on Software
Maintenance, 2002, pp. 377-384.
S.R. Chidamber and C.F. Kemerer, Towards a Metrics Suite for Object-Oriented Design,
Object-Oriented Programming Systems, Languages and Applications (OOPSLA), Special
Issue of SIGPLAN Notices, 26(10), 1991, pp. 197-211.
S.R. Chidamber and C.F. Kemerer, A Metrics suite for object Oriented Design, IEEE
Transactions on Software Engineering, 20(6), 1994, pp. 476-493.
S. Counsell, S. Swift, and J. Crampton, The interpretation and utility of three cohesion
metrics for object-oriented design, ACM Transactions on Software Engineering and
Methodology (TOSEM), 15(2), 2006, pp.123-149.
M. Dagpinar and J. H. Jahnke, Predicting maintainability with object-oriented metrics –
an empirical comparison, Proceedings of the 10th Working Conference on Reverse
Engineering, 2003.
M. Elish and K. Elish, Application of treenet in predicting object-oriented software
maintainability: a comparative study, 13th European Conference on Software Maintenance
and Reengineering (CSMR '09), 2009, pp. 69-78.
K. Erdil, E. Finn, K. Keating, J. Meattle, S. Park, and D. Yoon, Software maintenance as
part of the software life cycle, Comp180: Software Engineering Project, Department of
Computer Science, Tufts University, 2003.
L. Etzkorn, S. Gholston, J. Fortune, C. Stein, D. Utley, P. Farrington, and G. Cox, A
comparison of cohesion metrics for object-oriented systems, Information and Software
Technology, 46(10), 2004, pp. 677-687.
N. Fenton and S. Pfleeger, Software Metrics: A Rigorous & Practical Approach, ITP, 2nd
edition, 1997.
L. Fernández, and R. Peña, A sensitive metric of class cohesion, International Journal of
Information Theories and Applications, 13(1), 2006, pp. 82-91.
FreeMind, http://freemind.sourceforge.net/, accessed March 2012.
M. Genero, M. Piattini, and C. Calero, A survey of metrics for UML class diagrams,
Journal of Object Technology, 2005, 4(9), pp. 59-92.
J. Granja-Alvarez and M. J. Barranco-Garcia, A method for estimating maintenance cost
in a software project: a case study, Journal of Software Maintenance: Research and
Practice, 9(3), 1997, pp. 161-175.
G. Gui and P. D. Scott, Measuring software component reusability by coupling and
cohesion metrics, Journal of Computers, 4(9), 2009, pp. 797-805.
T. Gyimothy, R. Ferenc, and I. Siket, Empirical validation of object-oriented metrics on
open source software for fault prediction, IEEE Transactions on Software Engineering,
3(10), 2005, pp. 897-910.
J. Hayes, S. C. Patel, and L. Zhao, A metrics-based software maintenance effort model,
In Proceedings of the 8th European Conference on Software Maintenance and
Reengineering, Tampere, Finland, 2004, pp. 254-258.
D. Hosmer and S. Lemeshow, Applied Logistic Regression, John Wiley and Sons, 2000.
IEEE, IEEE standard glossary of software engineering terminology, IEEE Std 610.12-
1990, Institute of Electrical and Electronics Engineering, 1990.
Illusion, http://sourceforge.net/projects/aoi/, accessed March 2012.
JabRef, http://sourceforge.net/projects/jabref/, accessed March 2012.
H. Kabaili, R. Keller, and F. Lustman, Class cohesion as predictor of changeability: an
empirical study, L'Objet, Hermes Science Publications, 2001, 7(4), pp. 515-534.
D. Krantz, R. Luce, P. Suppes, A. Tversky, Foundations of Measurement, Vol. 1,
Academic Press, San Diego, 1971.
L. Lavazza, S. Morasca, D. Taibi, and D. Tosi, An empirical investigation of perceived
reliability of open source Java programs, Proceedings of the 27th Symposium On Applied
Computing, SAC '12, 2012.
Y. Lee and K. Chang, Reusability and maintainability metrics for object-oriented
software, Proceedings of the 38th annual on Southeast regional conference, USA, 2000.
Y. Lee, B. Liang, S. Wu, and F. Wang, Measuring the coupling and cohesion of an
object-oriented program based on information flow, In Proceedings of International
Conference on Software Quality, Maribor, Slovenia, 1995, pp. 81-90.
W. Li and S.M. Henry, Object-oriented metrics that predict maintainability, Journal of
Systems and Software, 1993, 23(2), pp. 111-122.
W. Li-jin, H. Xin-xin, N. Zheng-yuan, K. Wen-hua, Predicting object-oriented software
maintainability using projection pursuit regression, 1st International Conference on
Information Science and Engineering (ICISE), 2009, pp. 3827-3830.
S. Mamone, The IEEE standard for software maintenance, SIGSOFT SE Notes, 1994,
19(1), pp. 75-76.
A. Marcus, D. Poshyvanyk, and R. Ferenc, Using the conceptual cohesion of classes for
fault prediction in object-oriented systems, IEEE Transactions on Software Engineering,
34(2), 2008, pp. 287-300.
T. Meyers and D. Binkley, An empirical study of slice-based cohesion and coupling
metrics, ACM Transactions on Software Engineering Methodology, 17(1), 2007, pp. 2-
27.
A. Mockus, R. Fielding, and J. Herbsleb, Two case studies of open source software
development: Apache and Mozilla, ACM Trans. Softw. Eng. Methodol., 2002, 11(3), pp.
309-346.
S. Morasca, On the definition and use of aggregate indices for nominal, ordinal, and other
scales, 10th IEEE International Software Metrics Symposium (METRICS 2004), Chicago,
IL, USA, 2004, pp. 46-57.
S. Morasca, Refining the axiomatic definition of internal software attributes, Proceedings
of the 2nd ACM-IEEE International Symposium on Empirical Software Engineering and
Measurement, Kaiserslautern, Germany, 2008, pp. 188–197.
S. Morasca, A probability-based approach for measuring external attributes of software
artifacts, Proceedings of the 3rd International Symposium on Empirical Software
Engineering and Measurement, 2009, USA, pp. 44-55.
R. O'Brien, A caution regarding rules of thumb for variance inflation factors, Quality and
Quantity, Vol. 41, No. 5, 2007, pp. 673-690.
H. Olague, L. Etzkorn, S. Gholston, and S. Quattlebaum, Empirical validation of three
software metrics suites to predict fault-proneness of object-oriented classes developed
using highly iterative or agile software development processes, IEEE Transactions on
Software Engineering, 2007, 33(6), pp. 402-419.
D. Olson and D. Delen, Advanced Data Mining Techniques, Springer, 1st edition, 2008.
M. Perepletchikov, C. Ryan, and K. Frampton, Cohesion metrics for predicting
maintainability of service-oriented software, IEEE Seventh International Conference on
Quality Software, 2007.
D. Powers, Evaluation: from precision, recall and F-factor to ROC, School of Informatics
and Engineering, Flinders University, Technical report SIE-07-001, 2007.
QMT: Quality Measuring Tool, http://www.cfw.kuniv.edu/drjehad/research.htm,
accessed May 2013.
S. Rizvi and R. Khan, Maintainability estimation model for object-oriented software in
design phase (MEMOOD), Journal of Computing, 2010, 2(4), pp. 26-32.
F. Roberts, Measurement theory with applications to decisionmaking, utility, and the
social sciences, Encyclopedia of Mathematics and its Applications, Vol. 7, Addison-
Wesley, 1979.
P. Rousseeuw, I. Ruts, and J. Tukey, The bagplot: a bivariate boxplot, The American
Statistician, Vol. 53, No. 4, 1999, pp. 382–387.
R. Shatnawi, W. Li, J. Swain, and T. Newman, Finding software metrics threshold values
using ROC curves, Journal of Software Maintenance and Evolution: Research and
Practice, 2010, Vol. 22, No. 1, pp. 1-16.
I. Samoladas, S. Bibi, I. Stamelos, and G.L. Bleris, Exploring the quality of free/open-
source software: a case study on an ERP/CRM system, 9th Panhellenic Conference in
Informatics, Thessaloniki, Greece, 2003.
I. Samoladas, G. Gousios, D. Spinellis, and I. Stamelos, The SQO-OSS quality model:
measurement based open-source software evaluation, Open Source Development,
Communities and Quality, 275, 2008, pp. 237-248.
S. Sarkar, G. Rama, A. Kak, API-based information-theoretic metrics for measuring the
quality of software modularization, IEEE Transactions on Software Engineering, 2007,
33(1), pp. 14-32.
F. Sheldon, K. Jerath, and H. Chung, Metrics for maintainability of class inheritance
hierarchies, Journal of Software Maintenance and Evolution: research and Practice,
2002, Vol. 14, pp. 1-14.
D. Spinellis, G. Gousios, V. Karakoidas, P. Louridas, P. J. Adams, I. Samoladas, and I.
Stamelos, Evaluating the quality of open source software, Electronic Notes in Theoretical
Computer Science, 233, 2009, pp. 5-28.
R. Subramanyam and M. S. Krishnan, Empirical analysis of CK metrics for object-
oriented design complexity: implications for software defects, IEEE Transactions on
Software Engineering, 2003, 29(4), pp. 297-310.
J. Wang, Y. Zhou, L. Wen, Y. Chen, H. Lu, and B. Xu, DMC: a more precise cohesion
measure for classes. Information and Software Technology, 47(3), 2005, pp. 167-180.
E. Weyuker, Evaluating software complexity measures, IEEE Transactions on Software
Engineering, 1988, 14(9), pp. 1357–1365.
F. Xia and P. Srikanth, A change impact dependency measure for predicting the
maintainability of source code, IEEE Proceedings of the 28th Annual International
Computer Software and Applications Conference, 2004.
X. Yang, Research on Class Cohesion Measures, M.S. Thesis, Department of Computer
Science and Engineering, Southeast University, 2002.
Y. Zhou and H. Leung, Predicting object-oriented software maintainability using
multivariate adaptive regression splines, Journal of Systems and Software, 2007, 80(8),
pp. 1349-1361.
Appendix A: Descriptive statistics using FreeMind and JabRef classes
Table A.1: Descriptive statistics of the 19 considered independent measures for FreeMind
classes
Quality attribute  Metric    Min  Max  25%    Med    75%    Mean   Std. Dev.
Size               NOM       1    126  4.00   8.00   10.00  8.68   9.03
                   NOA       0    48   1.00   2.00   4.00   3.11   4.05
                   LOC       7    989  39.00  68.00  87.50  82.15  82.64
Cohesion           Coh       0    1    0.13   0.29   0.75   0.42   0.36
                   CAMC      0    1    0.37   0.46   0.54   0.47   0.19
                   TCC       0    1    0.00   0.17   0.98   0.39   0.41
                   LCC       0    1    0.00   0.30   1.00   0.44   0.43
                   LSCC      0    1    0.01   0.06   0.50   0.29   0.39
                   SCOM      0    1    0.02   0.10   0.83   0.33   0.41
                   PCCC      0    1    0.00   0.00   1.00   0.31   0.44
                   OL2       0    1    0.00   0.00   0.14   0.23   0.41
Coupling           CBO       0    90   2.00   4.00   5.00   4.79   7.88
                   CBO_IUB   0    58   1.00   2.00   2.00   2.55   5.69
                   CBO_U     0    58   1.00   2.00   3.00   2.25   3.57
                   RFC       0    185  7.00   11.00  24.00  17.77  18.56
                   MPC       0    536  10.00  17.00  37.00  32.10  48.57
                   DAC1      0    42   1.00   2.00   3.00   2.56   3.29
                   DAC2      0    19   1.00   2.00   3.00   2.05   1.79
                   OCMEC     0    13   2.00   2.00   4.00   3.01   1.89
Table A.2: Descriptive statistics of the 19 considered independent measures for JabRef
classes
Quality attribute  Metric    Min  Max  25%    Med    75%    Mean   Std. Dev.
Size               NOM       1    54   2.00   4.00   7.00   5.46   5.61
                   NOA       0    135  0.00   2.00   6.00   5.02   10.79
                   LOC       7    686  23.25  50.50  94.75  79.73  93.20
Cohesion           Coh       0    1    0.28   0.65   1.00   0.62   0.36
                   CAMC      0    1    0.00   0.00   1.00   0.41   0.48
                   TCC       0    1    0.00   0.49   1.00   0.50   0.44
                   LCC       0    1    0.00   0.62   1.00   0.54   0.45
                   LSCC      0    1    0.07   0.49   1.00   0.52   0.43
                   SCOM      0    1    0.19   0.60   1.00   0.59   0.40
                   PCCC      0    1    0.00   0.33   1.00   0.51   0.47
                   OL2       0    1    0.00   0.00   1.00   0.41   0.48
Coupling           CBO       0    298  1.00   4.00   7.00   7.91   22.17
                   CBO_IUB   0    287  0.00   1.00   2.00   4.45   20.28
                   CBO_U     0    49   1.00   2.00   5.00   3.45   4.62
                   RFC       0    117  5.00   10.00  24.00  17.05  18.70
                   MPC       0    583  6.00   16.00  48.00  40.42  62.56
                   DAC1      0    131  0.00   1.00   4.00   4.26   10.00
                   DAC2      0    22   0.00   1.00   3.00   2.41   3.33
                   OCMEC     0    15   1.00   2.00   3.00   2.42   1.88
Appendix B: Univariate regression analysis results for FreeMind and JabRef classes
Table B.1: Univariate results for RC using FreeMind classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -1.565 0.083 0.199 0.054 < 0.0001 0.374 0.582 0.760 0.577 0.634
NOA -1.341 0.160 0.199 0.046 < 0.0001 0.441 0.373 0.744 0.794 0.665
LOC -1.606 0.009 0.196 0.064 < 0.0001 0.466 0.436 0.762 0.783 0.654
Coh -0.464 -0.944 0.208 0.018 0.006 0.361 0.709 0.782 0.455 0.567
CAMC 0.491 -2.961 0.202 0.044 < 0.0001 0.376 0.618 0.769 0.553 0.646
TCC -1.001 0.415 0.212 0.005 0.129 0.373 0.509 0.746 0.628 0.579
LCC -1.168 0.725 0.209 0.017 0.007 0.384 0.555 0.760 0.613 0.602
LSCC -0.562 -1.063 0.206 0.025 0.002 0.345 0.782 0.789 0.356 0.550
SCOM -0.606 -0.738 0.209 0.014 0.015 0.346 0.755 0.780 0.379 0.537
PCCC -0.604 -0.828 0.207 0.020 0.004 0.343 0.764 0.780 0.364 0.569
OL2 -0.648 -0.930 0.207 0.021 0.004 0.337 0.836 0.800 0.285 0.586
CBO -1.158 0.068 0.203 0.031 0.004 0.394 0.391 0.736 0.739 0.610
CBO_IUB -1.006 0.066 0.205 0.022 0.007 0.604 0.291 0.748 0.917 0.585
CBO_U -1.211 0.170 0.206 0.026 0.005 0.373 0.373 0.727 0.727 0.594
RFC -1.837 0.054 0.183 0.116 < 0.0001 0.522 0.536 0.796 0.787 0.720
MPC -1.545 0.022 0.189 0.097 < 0.0001 0.523 0.518 0.791 0.794 0.716
DAC -1.356 0.200 0.200 0.046 <0.001 0.458 0.491 0.771 0.747 0.667
DAC2 -1.755 0.426 0.192 0.077 < 0.0001 0.472 0.464 0.769 0.775 0.672
OCMEC -2.097 0.397 0.191 0.090 < 0.0001 0.536 0.473 0.782 0.822 0.698
Table B.2: Univariate results for RC using JabRef classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -0.955 0.081 0.225 0.031 0.001 0.540 0.470 0.704 0.759 0.666
NOA -1.069 0.126 0.204 0.097 < 0.0001 0.678 0.530 0.750 0.848 0.743
LOC -1.223 0.009 0.207 0.089 < 0.0001 0.674 0.539 0.752 0.843 0.746
Coh 0.532 -1.753 0.217 0.065 < 0.0001 0.538 0.670 0.767 0.654 0.671
CAMC 0.286 -1.475 0.229 0.028 0.001 0.503 0.670 0.752 0.602 0.640
TCC -0.082 -0.885 0.228 0.026 0.001 0.450 0.591 0.697 0.565 0.600
LCC -0.142 -0.697 0.231 0.017 0.009 0.437 0.539 0.677 0.581 0.578
LSCC 0.373 -1.838 0.206 0.099 < 0.0001 0.533 0.704 0.779 0.628 0.702
SCOM 0.487 -1.778 0.210 0.085 < 0.0001 0.546 0.670 0.770 0.665 0.692
PCCC 0.311 -1.789 0.203 0.112 < 0.0001 0.538 0.730 0.793 0.623 0.692
OL2 0.160 -1.905 0.201 0.120 < 0.0001 0.532 0.800 0.827 0.576 0.681
CBO -1.494 0.178 0.184 0.158 < 0.0001 0.660 0.591 0.768 0.817 0.805
CBO_IUB -0.741 0.085 0.221 0.052 0.005 0.577 0.357 0.685 0.843 0.657
CBO_U -1.680 0.354 0.180 0.193 < 0.0001 0.648 0.609 0.773 0.801 0.800
RFC -1.485 0.057 0.194 0.142 < 0.0001 0.624 0.548 0.746 0.801 0.765
MPC -1.084 0.015 0.204 0.091 < 0.0001 0.677 0.565 0.762 0.838 0.766
DAC1 -1.050 0.146 0.203 0.098 < 0.0001 0.659 0.522 0.744 0.838 0.744
DAC2 -1.114 0.251 0.205 0.096 < 0.0001 0.606 0.522 0.734 0.796 0.731
OCMEC -1.826 0.528 0.198 0.130 < 0.0001 0.571 0.626 0.761 0.717 0.734
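The univariate models in Tables B.1 through B.6 are binary logistic regressions: for a measure value x, the reported coefficients c0 and c1 give the predicted probability p = 1 / (1 + e^-(c0 + c1·x)) that a class is change-prone. A minimal sketch applying the RFC row of Table B.1 (c0 = -1.837, c1 = 0.054); the function name and the example RFC value of 50 are ours, not from the paper:

```python
import math

def revision_probability(c0, c1, x):
    """Univariate logistic model: predicted probability that a class is
    change-prone, given logit(p) = c0 + c1 * x."""
    return 1.0 / (1.0 + math.exp(-(c0 + c1 * x)))

# Coefficients for RFC from Table B.1 (RC indicator, FreeMind classes).
C0_RFC, C1_RFC = -1.837, 0.054

# A hypothetical class with RFC = 50 (example value, not from the paper).
p = revision_probability(C0_RFC, C1_RFC, 50)
print(round(p, 3))  # about 0.703, above the usual 0.5 classification cutoff
```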
Table B.3: Univariate results for FRC using FreeMind classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -3.292 0.113 0.085 0.135 < 0.0001 0.277 0.462 0.930 0.855 0.747
NOA -2.486 0.101 0.095 0.038 0.006 0.215 0.513 0.930 0.775 0.739
LOC -3.514 0.014 0.081 0.180 < 0.0001 0.296 0.538 0.938 0.846 0.806
Coh -1.347 -2.497 0.094 0.070 0.001 0.143 0.692 0.931 0.500 0.651
CAMC -0.306 -4.364 0.092 0.073 < 0.0001 0.159 0.641 0.932 0.593 0.702
TCC -2.096 -0.054 0.096 0.000 0.896 NA 0.000 0.893 1.000 0.482
LCC -2.400 0.590 0.096 0.009 0.133 0.128 0.487 0.907 0.602 0.578
LSCC -1.607 -3.425 0.093 0.084 0.003 0.157 0.872 0.966 0.438 0.624
SCOM -1.639 -2.209 0.094 0.061 0.002 0.140 0.795 0.944 0.414 0.604
PCCC -1.627 -4.090 0.092 0.110 0.006 0.157 0.923 0.978 0.404 0.659
OL2 -1.826 -2.814 0.094 0.064 0.011 0.133 0.923 0.967 0.275 0.620
CBO -2.416 0.051 0.094 0.042 0.004 0.400 0.410 0.929 0.926 0.737
CBO_IUB -2.368 0.074 0.093 0.047 0.001 0.410 0.410 0.929 0.929 0.680
CBO_U -2.266 0.059 0.097 0.012 0.105 0.155 0.436 0.913 0.713 0.645
RFC -3.488 0.059 0.079 0.190 < 0.0001 0.261 0.615 0.945 0.790 0.802
MPC -3.099 0.023 0.082 0.175 < 0.0001 0.319 0.590 0.945 0.849 0.815
DAC1 -2.580 0.153 0.095 0.050 0.003 0.271 0.487 0.932 0.843 0.742
DAC2 -3.181 0.427 0.091 0.099 < 0.0001 0.222 0.615 0.941 0.741 0.748
OCMEC -3.603 0.413 0.084 0.113 < 0.0001 0.206 0.513 0.929 0.762 0.685
Table B.4: Univariate results for FRC using JabRef classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -2.964 0.099 0.081 0.066 0.001 0.198 0.607 0.950 0.752 0.715
NOA -2.606 0.047 0.082 0.056 0.007 0.186 0.464 0.936 0.795 0.739
LOC -2.954 0.006 0.079 0.082 < 0.0001 0.194 0.500 0.940 0.791 0.758
Coh -1.251 -2.030 0.080 0.066 0.001 0.154 0.750 0.959 0.586 0.706
CAMC -1.489 -1.608 0.082 0.026 0.030 0.131 0.714 0.948 0.522 0.666
TCC -1.943 -0.801 0.083 0.016 0.091 0.126 0.679 0.942 0.525 0.581
LCC -1.977 -0.653 0.083 0.012 0.145 0.118 0.607 0.932 0.543 0.578
LSCC -1.578 -1.824 0.080 0.067 0.001 0.147 0.786 0.962 0.540 0.700
SCOM -1.491 -1.663 0.081 0.057 0.002 0.154 0.750 0.959 0.586 0.688
PCCC -1.801 -1.234 0.082 0.040 0.010 0.128 0.714 0.947 0.511 0.652
OL2 -1.883 -1.395 0.081 0.045 0.009 0.129 0.786 0.956 0.468 0.627
CBO -3.648 0.139 0.062 0.321 < 0.0001 0.400 0.786 0.976 0.881 0.883
CBO_IUB -2.911 0.111 0.067 0.226 < 0.0001 0.433 0.464 0.946 0.939 0.830
CBO_U -3.550 0.251 0.068 0.207 < 0.0001 0.241 0.714 0.964 0.773 0.812
RFC -3.254 0.041 0.077 0.123 < 0.0001 0.207 0.607 0.951 0.766 0.760
MPC -2.742 0.008 0.081 0.063 0.001 0.268 0.679 0.962 0.813 0.753
DAC1 -2.575 0.050 0.082 0.052 0.014 0.210 0.464 0.939 0.824 0.732
DAC2 -2.852 0.170 0.081 0.070 0.000 0.206 0.464 0.938 0.820 0.728
OCMEC -3.791 0.484 0.073 0.138 < 0.0001 0.224 0.536 0.946 0.813 0.741
Table B.5: Univariate results for CRC using FreeMind classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -3.635 0.109 0.065 0.147 < 0.0001 0.231 0.517 0.953 0.850 0.780
NOA -2.637 0.054 0.074 0.012 0.087 0.151 0.483 0.944 0.763 0.750
LOC -3.918 0.014 0.062 0.203 < 0.0001 0.262 0.586 0.960 0.856 0.854
Coh -1.761 -2.166 0.072 0.052 0.005 0.119 0.793 0.964 0.488 0.640
CAMC -0.551 -4.639 0.072 0.079 < 0.001 0.127 0.690 0.956 0.590 0.723
TCC -2.405 -0.100 0.074 0.000 0.833 0.075 0.552 0.913 0.410 0.502
LCC -2.781 0.688 0.073 0.012 0.126 0.117 0.621 0.947 0.593 0.599
LSCC -2.023 -2.448 0.072 0.055 0.013 0.109 0.897 0.976 0.362 0.613
SCOM -2.010 -1.944 0.072 0.047 0.011 0.108 0.828 0.964 0.404 0.598
PCCC -1.886 -9.451 0.070 0.128 0.109 0.130 0.966 0.993 0.440 0.700
OL2 -2.131 -5.115 0.072 0.078 0.143 0.104 0.966 0.989 0.275 0.701
CBO -2.706 0.043 0.078 0.036 0.007 0.375 0.517 0.957 0.925 0.777
CBO_IUB -2.693 0.069 0.076 0.047 0.002 0.359 0.483 0.954 0.925 0.744
CBO_U -2.546 0.041 0.078 0.006 0.223 0.109 0.414 0.933 0.707 0.646
RFC -3.810 0.056 0.063 0.195 < 0.0001 0.238 0.690 0.968 0.808 0.834
MPC -3.426 0.022 0.064 0.183 < 0.0001 0.284 0.655 0.966 0.856 0.841
DAC1 -2.658 0.073 0.074 0.015 0.065 0.161 0.655 0.959 0.704 0.742
DAC2 -3.090 0.267 0.072 0.048 0.003 0.167 0.621 0.957 0.731 0.744
OCMEC -3.933 0.405 0.066 0.111 < 0.0001 0.155 0.517 0.947 0.754 0.706
Table B.6: Univariate results for CRC using JabRef classes
Measure c0 c1 MSE R2 p-value P R IP IR AUC
NOM -2.950 0.092 0.079 0.058 0.002 0.186 0.593 0.950 0.749 0.726
NOA -2.457 0.020 0.083 0.012 0.108 0.214 0.444 0.940 0.842 0.713
LOC -2.950 0.006 0.078 0.073 <0.001 0.227 0.556 0.950 0.817 0.758
Coh -1.394 -1.793 0.079 0.052 0.003 0.141 0.704 0.953 0.584 0.683
CAMC -1.311 -2.107 0.080 0.043 0.006 0.160 0.778 0.966 0.606 0.703
TCC -1.987 -0.790 0.080 0.015 0.101 0.114 0.630 0.936 0.527 0.577
LCC -2.030 -0.623 0.080 0.010 0.171 0.101 0.519 0.923 0.556 0.572
LSCC -1.613 -1.850 0.078 0.068 0.002 0.140 0.778 0.962 0.538 0.687
SCOM -1.585 -1.528 0.079 0.048 0.005 0.147 0.741 0.959 0.584 0.670
PCCC -1.670 -1.921 0.077 0.083 0.001 0.143 0.815 0.967 0.527 0.694
OL2 -1.835 -1.954 0.078 0.074 0.003 0.129 0.815 0.963 0.470 0.624
CBO -3.067 0.072 0.064 0.190 <0.001 0.383 0.667 0.965 0.896 0.885
CBO_IUB -2.669 0.050 0.070 0.126 0.001 0.429 0.333 0.937 0.957 0.779
CBO_U -3.686 0.265 0.066 0.227 < 0.0001 0.279 0.630 0.959 0.842 0.830
RFC -3.152 0.036 0.077 0.095 < 0.0001 0.203 0.593 0.952 0.774 0.745
MPC -2.735 0.007 0.080 0.053 0.002 0.250 0.593 0.955 0.828 0.750
DAC1 -2.436 0.019 0.083 0.010 0.141 0.189 0.370 0.933 0.846 0.721
DAC2 -2.878 0.166 0.077 0.067 <0.001 0.190 0.444 0.938 0.817 0.714
OCMEC -3.629 0.426 0.076 0.112 < 0.0001 0.224 0.556 0.950 0.814 0.763
Appendix C: Multivariate regression analysis results for FreeMind and JabRef classes
Table C.1: RC-based forward multivariate regression model for FreeMind classes.
Metric Coefficient p-value VIF
Intercept -1.237 0.001 -
RFC 0.039 0.000 1.876
OCMEC 0.326 0.003 1.978
OL2 -1.163 0.002 1.536
NOM -0.109 0.008 1.667
CBO_IUB 0.085 0.005 1.023
Model fit: MSE = 0.171, R2 = 0.245, AUC = 0.824
Table C.2: RC-based forward multivariate regression model for JabRef classes.
Metric Coefficient p-value VIF
Intercept -3.035 < 0.0001 -
RFC 0.064 < 0.0001 1.009
SCOM -2.063 0.003 1.009
Model fit: MSE = 0.074, R2 = 0.253, AUC = 0.807
Table C.3: FRC-based forward multivariate regression model for FreeMind classes.
Metric Coefficient p-value VIF
Intercept -4.419 < 0.0001 -
CBO 0.123 < 0.0001 1.024
RFC 0.034 0.001 1.024
Model fit: MSE = 0.058, R2 = 0.383, AUC = 0.913
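A fitted forward model is applied the same way as the univariate ones, except that the logit is the intercept plus a weighted sum of the selected measures. A minimal sketch using the Table C.3 coefficients (intercept -4.419, CBO 0.123, RFC 0.034); the function name and the example metric values are ours, not from the paper:

```python
import math

def model_probability(intercept, coefs, values):
    """Multivariate logistic model: logit(p) = intercept + sum(ci * xi)."""
    z = intercept + sum(c * x for c, x in zip(coefs, values))
    return 1.0 / (1.0 + math.exp(-z))

# Table C.3 (FRC indicator, FreeMind classes): intercept, then CBO and RFC.
INTERCEPT = -4.419
COEFS = [0.123, 0.034]  # CBO, RFC

# Hypothetical class with CBO = 10 and RFC = 40 (example values, not from the paper).
p = model_probability(INTERCEPT, COEFS, [10, 40])
print(round(p, 3))  # about 0.138
```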
Table C.4: FRC-based forward multivariate regression model for JabRef classes.
Metric Coefficient p-value VIF
Intercept -3.375 < 0.0001 -
RFC 0.058 < 0.0001 1.009
SCOM -2.251 0.017 1.009
Model fit: MSE = 0.059, R2 = 0.237, AUC = 0.848
Table C.5: CRC-based forward multivariate regression model for FreeMind classes.
Metric Coefficient p-value VIF
Intercept -2.392 < 0.0001 -
RFC 0.046 < 0.0001 1.125
DAC2 0.330 0.000 1.125
Model fit: MSE = 0.173, R2 = 0.151, AUC = 0.742
Table C.6: CRC-based forward multivariate regression model for JabRef classes.
Metric Coefficient p-value VIF
Intercept -3.448 < 0.0001 -
CBO 0.063 0.014 1.328
CBO_U 0.201 < 0.0001 1.400
OL2 -1.530 0.043 1.081
Model fit: MSE = 0.063, R2 = 0.310, AUC = 0.865
Jehad Al Dallal received his PhD in Computer Science from the University of Alberta, Canada, where he was granted the award for best PhD researcher. He is currently an Associate Professor and chairman of the Department of Information Science at Kuwait University. Dr. Al Dallal has completed several research projects in the areas of software testing, software metrics, and communication protocols, and has published more than 70 papers in refereed journals and conference proceedings. He has been involved in developing more than 20 software systems, and has served as a technical committee member for several international conferences and as an associate editor for several refereed journals.