2011 Second International Conference on Emerging Applications of Information Technology (EAIT). 978-0-7695-4329-1/11 $26.00 © 2011 IEEE. DOI 10.1109/EAIT.2011.76



Towards a Valid Metric for Class Cohesion at Design Level

Kuljit Kaur, Hardeep Singh
Dept. of Computer Science and Engineering, Guru Nanak Dev University, Amritsar, Punjab, India.

[email protected], [email protected]

Abstract- In the object-oriented paradigm, the cohesion of a class refers to the degree to which the members of the class are interrelated. Design-level class cohesion metrics are based on the assumption that if all the methods of a class have access to similar parameter types, then they all process closely related information. A class whose methods have a large number of parameter types in common is more cohesive than a class whose methods have fewer parameter types in common. In this paper, we review the design-level class cohesion metrics, with a special focus on metrics that use the similarity of the parameter types of the methods of a class as the basis of its cohesiveness. Keeping in mind the anomalies in the definitions of the existing metrics, a variant of the NHD metric is introduced, named NHD Modified (NHDM). The metric is analyzed against the mathematical properties of cohesion metrics proposed in the research literature. An automated metric collection tool is used to collect the metric data from an open source software program. Statistical analysis of the data shows that the NHDM metric takes the lowest average value in this group of four metrics. This may be because NHDM, unlike the other metrics, does not give false positives or false negatives.

Keywords- Object Oriented Design Metrics, Cohesion Metrics, Cohesion among Methods of a Class, Normalized Hamming Distance, Scaled NHD, NHD Modified.

I. INTRODUCTION

Cohesion is defined as the interrelatedness of the members of a module. In an object-oriented design, the cohesion of a class refers to the degree to which the members of the class are interrelated. Empirical studies indicate that class cohesion is related to the quality characteristics of a software system [14-19]. Several metrics have been proposed to measure the cohesion of a class at code level [4-13]. However, only a few metrics have been proposed for class cohesion at design level. The first such metric, Cohesion among Methods of a Class (CAMC), was proposed in 1999 by Bansiya et al. [1]. By the definition of this metric, a class is cohesive if all methods of the class use the same set of parameter types. It is a normalized metric in which methods with the same parameter types are assumed to process related kinds of information. Counsell et al. point out some anomalies in the definition of the metric [2]. They propose a new metric named Normalized Hamming Distance (NHD) and claim that it is free from these anomalies. A variant of the NHD metric called Scaled NHD (SNHD) is introduced in the same paper. According to its authors, it addresses the shortcomings of both CAMC and NHD. Another metric for measuring class cohesion at design level, SCC, was proposed more recently by Dallal [3]. Unlike the other design-level metrics (CAMC, NHD, and SNHD), this metric takes into account relatedness not only among class methods but also between class methods and class attributes.

Design-level class cohesion metrics use the limited amount of information available about a class at this level, i.e. only the class attributes and the method signatures. Method implementation is not completely defined at design level, so some assumptions are made. The CAMC, NHD, and SNHD class cohesion metrics are based on the assumption that the types of method parameters match the types of the attributes accessed by the method [1, 2], whereas the SCC metric assumes that the set of attribute types accessed by a method is the intersection of the set of this method's parameter types and the set of its class attribute types [3].

This paper investigates the design-level class cohesion metrics based on the first assumption. It analyses the definitions of the existing metrics and proposes an improved version. The new metric is named NHD Modified (NHDM), as it is largely based on the definition of the NHD metric. The paper is organized as follows: Section 2 explains the existing design-level class cohesion metrics. Section 3 introduces a modified version of the existing metrics. Section 4 presents the theoretical and statistical analysis of the data collected from an open source project. Section 5 concludes the paper.

II. EXISTING DESIGN LEVEL COHESION METRICS

This section describes the existing class cohesion metrics computable with the information available at design level. At the design level, the name of the class, its attributes (variables), and the signatures of its methods are available. A method signature includes the name of the method and its parameter list, which gives the names of the parameters and their types.

A. CAMC

This metric computes the relatedness among the methods of a class based on the parameter lists of the methods. The CAMC metric measures the extent of intersection of the individual method parameter type lists with the parameter type list of all methods in the class [1]. It uses a parameter-occurrence matrix (PO matrix) that has a row for each method and a column for each data type that appears at least once as the type of a parameter in at least one method of the class. The value in row i and column j of the matrix is 1 when the ith method has a parameter of the jth data type, and 0 otherwise. For a class with two methods M1(int, float, char) and M2(int, char), the PO matrix is given in Table I.

TABLE I: PO MATRIX FOR A CLASS

       int   float   char
Op1    1     1       1
Op2    1     0       1

In the original version of the metric, the PO matrix has one additional column of all 1s. This column corresponds to the type of the class itself, which is by default one of the parameters of every method (the 'self' parameter). In this discussion, the original version of the metric is referred to as CAMCs (Cohesion among Methods of a Class with the 'self' parameter), and the metric definition without the 'self' parameter is named CAMC [3]. The CAMC metric is defined as the ratio of the total number of 1s in the PO matrix to the total size of the matrix. For a class with k methods and l distinct parameter types:

$$CAMC = \frac{\sigma}{k\,l}, \quad \text{where } \sigma = \sum_{i=1}^{k} \sum_{j=1}^{l} p_{ij}$$

and p_ij is the value of the cell in row i and column j of the PO matrix.

Anomalies in the metric definition (see Table II):

1. CAMC gives false positives: the metric takes a non-zero value for a class with no parameter sharing among its methods.

2. CAMC cannot differentiate between two classes that have the same number of 1s but different patterns of 1s in their PO matrices.

3. Smaller classes take higher values of the cohesion metric than larger classes with the same properties.

TABLE II: ANOMALIES IN CAMC METRIC

S.No.   Example matrix            Metric value
1.      1 0                       0.5
        0 1

2.      (a)        (b)            (a) = 0.2, (b) = 0.2
        1 0 0 0    0 0 1 1
        1 1 1 1    1 1 0 0
        1 1 1 1    0 1 1 0
        0 0 1 0    1 0 0 1
        0 0 0 1    0 1 1 1

3.      (a)    (b)                (a) = 0.5, (b) = 0.3
        1 0    0 0 1
        0 1    0 1 0
               1 0 0

B. NHD

Counsell et al. [2] suggested an alternative to CAMC based on the definition of the Hamming distance. NHD measures the agreement between rows in the PO matrix. The NHD metric for a class with k methods and l unique parameter types (the union of the parameter types received by its methods) is defined as:

$$NHD = \frac{2}{l\,k\,(k-1)} \sum_{j=1}^{k-1} \sum_{i=j+1}^{k} a(i,j)$$

where a(i,j) is the number of columns of the PO matrix in which rows i and j hold the same value. Another easy way to compute NHD is to find the normalized sum of disagreements between methods over all parameter types and subtract it from 1:

$$NHD = 1 - \frac{2}{l\,k\,(k-1)} \sum_{j=1}^{l} c_j\,(k - c_j)$$

where c_j is the number of 1s in the jth column of the PO matrix. Similarly, NHDs can be defined for a PO matrix with one additional column of all 1s.

Anomalies in the definition (see Table III):

1. The NHD metric also gives false positives. It removes the first anomaly of CAMC for a class with k = l = 2, but it fails to give the correct answer if k = l = 3 and the operations do not agree on any of the parameter types. As shown in row 1 of Table III, NHD takes the value 0.3, whereas it should take the value 0 if none of the methods agree on any parameter type.

2. NHD does not give different answers for classes with different properties: the metric fails to distinguish a class with no parameter sharing among its methods from a class with a substantial amount of parameter sharing among its methods.

3. Class size influences the metric value. As the size of the class increases, the value of the NHD metric also increases (even if the PO matrix gets sparser).

C. SNHD

SNHD is the Scaled NHD metric, proposed to interpret the values of the NHD metric over a more varied range. The proponents of the NHD metric are of the opinion that the NHD metric can take values at two extremes, the minimum or the maximum, but they admit that it is not clear which of these extremes represents a cohesive class. Without giving a clear explanation, they state that classes at both extremes may be cohesive. They define these extreme values as NHDmin and NHDmax respectively [2]. The SNHD metric value indicates how close the NHD metric is to its maximum value relative to its minimum value. SNHD is defined piecewise, with the conditions for the three cases as given in [2]:

$$SNHD = \begin{cases} 0, & \ldots \\ 1, & \ldots \\ 2\,\dfrac{NHD - NHD_{min}}{NHD_{max} - NHD_{min}} - 1, & \ldots \end{cases}$$

The SNHD metric values lie in the range [-1, 1]. SNHD = -1 implies that NHD = NHDmin, and SNHD = 1 implies that NHD = NHDmax. Whether NHD is closer to its minimum or its maximum value depends on whether SNHD is close to -1 or +1 respectively. A class is considered non-cohesive if the SNHD metric for the class is 0. Similarly, SNHDs is defined by considering the 'self' parameter.

Anomalies:

1. SNHD is difficult to calculate and interpret.

2. False negatives: the SNHD metric gives a value of 0 for a class in which 6 out of 9 operations share one parameter (the only parameter passed). This class may be less cohesive, but it is not entirely non-cohesive.
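The PO matrix and the CAMC ratio of Section II-A can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the CohMetric tool used in this paper: the function names po_matrix, camc, and with_self are hypothetical, and each method is assumed to be given simply as a sequence of parameter type names.

```python
from fractions import Fraction


def po_matrix(signatures):
    """Build the parameter-occurrence (PO) matrix for a class.

    signatures: one sequence of parameter type names per method.
    Returns (types, rows); rows[i][j] is 1 when method i has a
    parameter of type types[j], and 0 otherwise.
    """
    # Column order: first appearance of each type across the methods.
    types = []
    for params in signatures:
        for t in params:
            if t not in types:
                types.append(t)
    rows = [[1 if t in params else 0 for t in types] for params in signatures]
    return types, rows


def camc(rows):
    """CAMC = (number of 1s in the PO matrix) / (k * l)."""
    k, l = len(rows), len(rows[0])
    return Fraction(sum(map(sum, rows)), k * l)


def with_self(rows):
    """Append the all-ones 'self' column used by the CAMCs variant."""
    return [row + [1] for row in rows]


# The class of Table I: M1(int, float, char) and M2(int, char).
types, rows = po_matrix([("int", "float", "char"), ("int", "char")])
print(types, rows)
print(camc(rows))              # 5 ones in a 2 x 3 matrix: 5/6
print(camc([[1, 0], [0, 1]]))  # no shared type, yet non-zero: 1/2
```

The last line reproduces the false positive in row 1 of Table II: two methods with no parameter type in common still score 0.5.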

TABLE III: ANOMALIES IN THE NHD METRIC

S.No.   Example matrix            Metric value
1.      1 0 0                     0.3
        0 1 0
        0 0 1

2.      (a)        (b)            (a) = 0.5, (b) = 0.5
        1 1 1 1    0 0 1 0
        1 0 0 0    1 0 0 0
        1 1 1 1    0 0 0 1
        0 1 1 1    0 1 0 0

3.      (a)      (b)              (a) = 0.4, (b) = 0.6
        1 0 1    1 0 1
        1 1 0    1 1 0
        0 1 1    0 1 1
        1 0 0    1 0 0
                 1 0 0
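The two ways of computing NHD described in Section II-B can be checked against rows 1 and 2 of Table III with a short Python sketch. The function names are hypothetical, and the pairwise form assumes that the agreement between two methods is the number of columns in which their rows hold the same value. Both functions need a class with at least two methods.

```python
from fractions import Fraction
from itertools import combinations


def nhd_pairwise(rows):
    """NHD as the normalized row-to-row agreement over all method pairs."""
    k, l = len(rows), len(rows[0])
    agreements = sum(
        sum(1 for x, y in zip(r1, r2) if x == y)
        for r1, r2 in combinations(rows, 2)
    )
    return Fraction(2 * agreements, l * k * (k - 1))


def nhd_columns(rows):
    """NHD as 1 minus the normalized count of disagreeing pairs per column."""
    k, l = len(rows), len(rows[0])
    ones = [sum(col) for col in zip(*rows)]  # c_j for each column j
    disagreements = sum(c * (k - c) for c in ones)
    return 1 - Fraction(2 * disagreements, l * k * (k - 1))


no_sharing_3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]                           # Table III, row 1
sharing_4 = [[1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1]]       # row 2 (a)
no_sharing_4 = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]    # row 2 (b)

for matrix in (no_sharing_3, sharing_4, no_sharing_4):
    print(nhd_pairwise(matrix), nhd_columns(matrix))
```

Exact fractions are used so that the two forms can be compared for equality. The 3 x 3 matrix yields 1/3 (reported as 0.3 in Table III), and both 4 x 4 matrices yield 1/2 even though only one of them shares parameter types.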

III. AN IMPROVED METRIC - NHDM

Keeping in view the anomalies of the cohesion metrics discussed above, this research proposes a variation of the NHD metric, named Normalized Hamming Distance Modified (NHDM). The NHD metric counts only those method pairs which do not agree in a column of the PO matrix, and ignores all other method pairs irrespective of whether they agree on a 0 or a 1. NHDM additionally counts the method pairs which agree on a 0 as a disagreement. NHDM for a class with k methods and l unique parameter types is defined as:

$$NHDM = 1 - \frac{2}{l\,k\,(k-1)} \sum_{j=1}^{l} \left[ c_j\,(k - c_j) + \frac{z_j\,(z_j - 1)}{2} \right] \qquad (4)$$

where c_j is the number of ones and z_j is the number of zeroes in the jth column of the PO matrix for the class. The first term counts the method pairs that disagree on parameter type j, and the second counts the method pairs that agree on a 0, each pair once. Similarly, NHDMs is defined by including the 'self' parameter in the PO matrix.

This metric removes the anomalies present in the definitions of the CAMC, NHD, and SNHD metrics. NHDM gives correct results: when no parameter types are shared, it gives the value zero for a PO matrix of any order (unlike NHD, which removes the anomaly of the CAMC metric but gives the correct answer only for a matrix of order 2 and fails for matrices of higher order). NHDM gives different results for classes with different properties: a class expected to be more cohesive takes a larger value of the NHDM metric. NHDM metric values are also independent of class size. Class size influences the values of both the CAMC and NHD metrics: CAMC gives larger values for smaller classes, and NHD gives larger values for larger classes. NHDM is independent of the size effect in the sense that if the class size increases and the class remains cohesive, the NHDM value stays comparatively high, whereas if an increase in size leads to a poorly cohesive class (a sparser PO matrix), the NHDM value is comparatively low.
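The behaviour claimed for NHDM above can be tried on the matrices of Table III with a small Python sketch. The function name nhdm is hypothetical, and the normalization is an assumption: each unordered method pair is counted once per column, so that the subtrahend equals 1 when no column contains more than a single 1.

```python
from fractions import Fraction


def nhdm(rows):
    """NHDM: 1 minus the normalized count of method pairs that either
    disagree on a column or agree on a 0 in that column."""
    k, l = len(rows), len(rows[0])
    total = 0
    for col in zip(*rows):
        c = sum(col)   # number of 1s in the column
        z = k - c      # number of 0s in the column
        # 2*c*(k-c) and z*(z-1) both count ordered pairs, hence the
        # denominator l*k*(k-1) rather than l*k*(k-1)/2.
        total += 2 * c * (k - c) + z * (z - 1)
    return 1 - Fraction(total, l * k * (k - 1))


no_sharing_3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
sharing_4 = [[1, 1, 1, 1], [1, 0, 0, 0], [1, 1, 1, 1], [0, 1, 1, 1]]
no_sharing_4 = [[0, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 1, 0, 0]]
small = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
larger_sparser = small + [[1, 0, 0]]

print(nhdm(no_sharing_3), nhdm(no_sharing_4))   # no sharing in either class
print(nhdm(sharing_4))                          # substantial sharing
print(nhdm(small), nhdm(larger_sparser))        # growth that adds sparsity
```

Under this normalization the two classes without shared parameter types score 0, the class with substantial sharing scores 1/2, and adding a sparse row to the 4 x 3 matrix lowers the value from 5/18 to 4/15.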

IV. THEORETICAL AND EMPIRICAL ANALYSIS

The cohesion metrics discussed above are collected from an open source software system available at www.sourceforge.net. The software is a charting library that has evolved through more than 40 versions since its inception in 2000. It is developed on the Java platform and consists of 884 classes. For automated collection of the metrics, a tool named CohMetric was developed in the C programming language.

A. Theoretical Analysis

The NHDM metric is evaluated theoretically against the four mathematical properties given by Briand et al. [20].

1. Non-Negativity and Normalization: The NHDM metric satisfies this property as it takes values in the range [0, 1].

2. Null Value and Maximum Value: NHDM satisfies this property as it takes the value 0 for a class with only non-cohesive interactions among its methods and 1 for a class with only cohesive interactions. Consider the definition of the NHDM metric in equation 4. Suppose that the method pairs disagree in all l columns, i.e. every column of the PO matrix has exactly one cell with the value 1 and all other cells of the column contain 0s. In this case the first part of the subtrahend in equation 4, the term in cj(k-cj), counts the disagreements between the cell with the value 1 and the cells with the value 0, and the second part, the term in zj(zj-1), counts the disagreements among the 0s only. Together they count every method pair in every column, so the subtrahend is 1 and the NHDM metric takes the value 0. On the other hand, if all cells of all columns contain 1s only, then the first as well as the second part of the subtrahend in equation 4 are both 0, and the NHDM metric takes the value 1.

3. Monotonicity: To show that a cohesive interaction is added to a class, a 0 value in a cell of the PO matrix has to be changed to 1. This increases the number of agreements and decreases the number of disagreements in method pairs, so the value of the subtrahend in equation 4 decreases and the metric value improves.

4. Cohesive Modules: NHDM satisfies this property. The parameter types of the methods of two unrelated classes belong to two different sets. The PO matrix of the class obtained by merging two such classes has more disagreements in every column, as none of the methods of the two classes share any parameter types. This increases the value of the subtrahend, and hence the metric value for the merged class decreases.
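The four properties above can be spot-checked numerically. The sketch below is an illustration on a few hand-picked matrices, not a proof; the helper names nhdm, merge, and flips are hypothetical, and nhdm assumes the normalization in which every unordered method pair is counted once per column.

```python
from fractions import Fraction


def nhdm(rows):
    """NHDM: 1 minus the share of (method pair, column) combinations
    that are not an agreement on a 1."""
    k, l = len(rows), len(rows[0])
    total = 0
    for col in zip(*rows):
        c = sum(col)
        z = k - c
        total += 2 * c * (k - c) + z * (z - 1)
    return 1 - Fraction(total, l * k * (k - 1))


def merge(rows_a, rows_b):
    """PO matrix of the class obtained by merging two classes whose
    methods share no parameter types (block-diagonal layout)."""
    pad_a = [0] * len(rows_b[0])
    pad_b = [0] * len(rows_a[0])
    return [r + pad_a for r in rows_a] + [pad_b + r for r in rows_b]


def flips(rows):
    """Yield every matrix obtained by changing a single 0 cell to 1."""
    for i, row in enumerate(rows):
        for j, cell in enumerate(row):
            if cell == 0:
                changed = [list(r) for r in rows]
                changed[i][j] = 1
                yield changed


sample = [[1, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]
class_a = [[1, 1], [1, 1]]
class_b = [[1, 0], [1, 1]]

# Properties 1 and 2: values stay in [0, 1]; null and maximum values occur.
print(nhdm([[1, 0, 0], [0, 1, 0], [0, 0, 1]]), nhdm(class_a))
# Property 3: no single 0 -> 1 change lowers the value.
print(all(nhdm(m) >= nhdm(sample) for m in flips(sample)))
# Property 4: the merged class is no more cohesive than the better part.
print(nhdm(merge(class_a, class_b)) <= max(nhdm(class_a), nhdm(class_b)))
```

For the two example classes the merged PO matrix has four rows and four columns, and its NHDM value drops to 1/8, below both 1 (class_a) and 1/2 (class_b).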

B. Descriptive Statistics

Table IV shows that the NHD metric takes the highest value as far as the metric averages are concerned. The SNHD metric takes the minimum value in this group. Its values lie mostly to the left of 0, which implies that for the majority of the classes NHD is closer to NHDmin than to NHDmax. But as suggested by the proponents of these two metrics, SNHD represents a good level of cohesion at both ends of the interval [-1, 1], and a zero value of SNHD indicates a non-cohesive class. This means that it is NHDM that indicates, on average, the lowest level of cohesion in this group of metrics. CAMC is also close to NHDM. But this difference can be due to the fact that all other metrics give false positives/negatives (as discussed in Section 3). Cohesion metrics which consider the 'self' parameter are expected to give higher values, as the class methods agree on at least one parameter type. All the metrics in this category have higher averages than their counterparts, as shown in Table IV.

TABLE IV: DESCRIPTIVE STATISTICS FOR COHESION METRICS

Metric   Average   Std Dev      Metric   Average   Std Dev
CAMC     0.21      0.18         CAMCs    0.48      0.21
NHD      0.66      0.21         NHDs     0.81      0.12
SNHD     -0.43     0.51         SNHDs    0.63      0.42
NHDM     0.05      0.16         NHDMs    0.38      0.22

V. CONCLUSIONS AND FUTURE WORK

In this paper, existing design-level class cohesion metrics such as CAMC, NHD, and SNHD have been investigated theoretically as well as empirically. In view of the anomalies present in the definitions of the existing metrics, a modified version of the existing NHD metric is proposed. It removes the anomalies of the existing metrics and also satisfies all the mathematical properties of a cohesion metric. Statistical analysis of the metrics data shows that the average values of NHDM are very small in comparison to CAMC, NHD, and SNHD. This may be attributed to the fact that the classes are non-cohesive in nature and that the other metrics suffer from the anomaly of false positives/negatives.

REFERENCES

[1] Bansiya, J., Etzkorn, L., Davis, C., and Li, W., "A class cohesion metric for object oriented designs", Journal of Object Oriented Programming, Vol. 11, No. 8, pp. 47-52, 1999.

[2] Counsell, S., Swift, S., and Crampton, J., "The Interpretation and Utility of Three Cohesion Metrics for Object-Oriented Design", ACM Trans. Software Eng. and Methodology, Vol. 15, No. 2, pp. 123-149, 2006.

[3] Dallal, J., "A Design-Based Cohesion Metric for Object-Oriented Classes", Proceedings of the International Conference on Computer and Information Science and Engineering (CISE 2007), pp. 301-306, 2007.

[4] Chidamber, S., Kemerer, C., "A Metrics Suite for Object Oriented Design", IEEE Transactions on Software Engineering, Vol. 20, pp. 476-493, 1994.

[5] Chae, H., Kwon, Y., Bae, D., "A Cohesion Measure for Object-Oriented Classes", Softw. Pract. Exper., Vol. 30, Issue 12, pp. 1405-1431, 2000.

[6] Chen, Z., Zhou, Y., Xu, B., "A Novel Approach to Measuring Class Cohesion based on Dependence Analysis", Proc. of the International Conference on Software Maintenance, pp. 377-384, 2002.

[7] Badri, L., Badri, M., "A Proposal of a New Class Cohesion Criterion: An Empirical Study", Journal of Object Technology, Vol. 3, Issue 4, 2004.

[8] Wang, J., Zhou, Y., Wen, L., Chen, Y., Lu, H., Xu, B., "DMC: A more Precise Cohesion Measure for Classes", Information and Software Technology, Vol. 47, No. 3, pp. 176-180, 2005.

[9] Bonja, C., Kidanmariam, E., "Metrics for class cohesion and similarity between Methods", Proceedings of the 44th Annual Southeast Regional Conference, pp. 91-95, New York, NY, USA, ACM Press, 2006.

[10] Cox, G., Etzkorn, L., Hughes, W., "Cohesion Metric for Object-Oriented Systems Based on Semantic Closeness from Disambiguity", Applied Artificial Intelligence, Vol. 20, Issue 5, pp. 419-436, 2006.

[11] Fernández, L., Peña, R., "A sensitive metric of class cohesion", International Journal of Information Theories and Applications, Vol. 13, No. 1, pp. 82-91, 2006.

[12] Makela, S., Leppanen, V., "Client Based Object Oriented Cohesion Metrics", 31st Annual International Computer Software and Applications Conference (COMPSAC 2007), Vol. 2, pp. 743-748.

[13] Marcus, A., Poshyvanyk, D., "The Conceptual Cohesion of Classes", Proceedings of the 21st IEEE International Conference on Software Maintenance (ICSM'05), pp. 133-142, 2005.

[14] Briand, L., Wust, J., Daly, J., and Porter, D., "Exploring the Relationships between design measures and software Quality in Object Oriented Systems", Journal of Systems and Software, Vol. 51, Issue 3, pp. 245-273, 2000.

[15] Gyimothy, T., Ferenc, R., and Siket, I., "Empirical Validation of Object-Oriented Metrics on Open Source Software for Fault Prediction", IEEE Transactions on Software Engineering, Vol. 31, Issue 10, pp. 897-910, 2005.

[16] Zhou, Z., Leung, H., "Empirical Analysis of Object-Oriented Design Metrics for Predicting High and Low Severity Faults", IEEE Trans. on Software Engineering, Vol. 32, No. 10, pp. 771-789, 2006.

[17] Marcus, A., Poshyvanyk, D., "Using the Conceptual Cohesion of Classes for Fault Prediction in Object-Oriented System", IEEE Transactions on Software Engineering, Vol. 34, No. 2, 2008.

[18] Lee, J., Jung, S., Kim, S., Jang, W., and Ham, D., "Component Identification Method with Coupling and Cohesion", Proceedings of the Eighth Asia-Pacific Software Eng. Conf., pp. 79-86, Dec. 2001.

[19] Gui, G., Scott, D., "Measuring Software Component Reusability by Coupling and Cohesion Metrics", Journal of Computers, Vol. 4, No. 9, pp. 797-805, Academy Publishers, 2009.

[20] Briand, L., Daly, J., Wust, J., "A Unified Framework for Cohesion Measurement in Object-Oriented Systems", Empirical Softw. Engg., Vol. 3, Issue 1, pp. 65-117, 1998.