[IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan...

6
IEEE-ICET 2006 2 nd International Conference on Emerging Technologies Peshawar, Pakistan, 13-14 November 2006 1-4244-0502-5/06/$20.00©2006 IEEE 664 A Neuro-Fuzzy Based Software Reusability Evaluation System with Optimized Rule Selection Parvinder Singh Sandhu 1 And Hardeep Singh 2 1 Assistant Professor (CSE), Guru Nanak Dev Engineering College, Ludhiana-141006 (Punjab) India Email: [email protected] 2 Professor & Head, Deptt. Of CSE, Guru Nanak Dev University, Amritsar (Punjab) India Abstract: There are metrics for identifying the quality of reusable components but the function that makes use of these metrics to find reusability of software components is still not clear. We critically analyzed the CK metrics, tried to remove the inconsistencies and devised Neuro-fuzzy framework that gets input in form of tuned WMC, DIT, NOC, CBO, LCOM values of a software component and output can be obtained in terms of reusability. This paper also shows how a small number of fuzzy rules can be selected for designing initial fuzzy rule-base for Neuro-fuzzy systems. It consists of two phases: Generation of candidate rules by ID3 decision tree algorithm and rule pruning by evaluation of rules with help of two rule evaluation criteria. The developed reusability evaluation system has produced high precision results. Hence, the developed system can be used for identification and extraction of OO based reusable components from legacy systems and evaluation of developed or developing reusable components. Keywords: Reusability, ID3, CK Metric, Neuro-fuzzy system. 1. INTRODUCTION The requirement today is to relate the reusability attributes with the metrics and to find how these metrics collectively determine the reusability of the software component. To achieve both the quality and productivity objectives it is always recommended to go for the software reuse that not only saves the time taken to develop the product from scratch but also delivers the almost error free code, as the code is already tested many times during its earlier reuse. A great deal of research over the past several years has been devoted to the development of methodologies to create reusable software components and component libraries, where there is an additional cost involved to create a reusable component from scratch. That additional cost could be avoided by identifying and extracting reusable components from the already developed large inventory of existing systems. But the issue of how to identify good reusable components from existing systems has remained relatively unexplored. Our approach, for identification and evaluation of reusable software, is based on software models and metrics. As the exact relationship between the attributes of the reusability is difficult to establish so a Neural Network approach could serve as an economical, automatic tool to generate reusability ranking of software[1][2] by formulating the relationship based on its training. When one designs with Neural Networks alone, the network is a black box that needs to be defined; this is a highly compute-intensive process. One must develop a good sense, after extensive experimentation and practice, of the complexity of the network and the learning algorithm to be used. Neural nets and fuzzy systems, although very different, have close relationship: they work with impression in a space that is not defined by crisp, deterministic boundaries [3]. With the objective of taking advantage of the features of the both [4], we proposed Neuro-Fuzzy approach to economically determining reusability of OO- based software components in existing systems as well as the reusable components that are in the design phase. Inputs to Neuro-fuzzy system, are provided in form of tuned WMC, DIT, NOC, CBO , LCOM values of the OO-based software component and output is be obtained in terms of reusability. This paper also shows how a small number of fuzzy rules can be selected for designing initial fuzzy rule-base for Neuro-fuzzy systems. Our approach consists of two phases: Generation of candidate rules by ID3 decision tree algorithm and rule pruning by rule evaluation with help of two rule evaluation criteria.

Transcript of [IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan...

Page 1: [IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan (2006.11.13-2006.11.14)] 2006 International Conference on Emerging Technologies - A Neuro-Fuzzy Based

IEEE-ICET 2006 2nd International Conference on Emerging Technologies Peshawar, Pakistan, 13-14 November 2006

1-4244-0502-5/06/$20.00©2006 IEEE 664

A Neuro-Fuzzy Based Software Reusability

Evaluation System with Optimized Rule Selection

Parvinder Singh Sandhu1 And Hardeep Singh2

1Assistant Professor (CSE), Guru Nanak Dev Engineering College, Ludhiana-141006 (Punjab) India Email: [email protected]

2Professor & Head, Deptt. Of CSE, Guru Nanak Dev University, Amritsar (Punjab) India

Abstract: There are metrics for identifying the quality

of reusable components but the function that makes

use of these metrics to find reusability of software

components is still not clear. We critically analyzed

the CK metrics, tried to remove the inconsistencies

and devised Neuro-fuzzy framework that gets input in

form of tuned WMC, DIT, NOC, CBO, LCOM values

of a software component and output can be obtained

in terms of reusability. This paper also shows how a

small number of fuzzy rules can be selected for

designing initial fuzzy rule-base for Neuro-fuzzy

systems. It consists of two phases: Generation of

candidate rules by ID3 decision tree algorithm and

rule pruning by evaluation of rules with help of two rule evaluation criteria. The developed reusability

evaluation system has produced high precision

results. Hence, the developed system can be used for identification and extraction of OO based reusable

components from legacy systems and evaluation of

developed or developing reusable components.

Keywords: Reusability, ID3, CK Metric, Neuro-fuzzy

system.

1. INTRODUCTION

The requirement today is to relate the reusability attributes with the metrics and to find how these metrics collectively determine the reusability of the software component. To achieve both the quality and productivity objectives it is always recommended to go for the software reuse that not only saves the time taken to develop the product from scratch but also delivers the almost error free code, as the code is already tested many times during its earlier reuse.

A great deal of research over the past several years has been devoted to the development of methodologies to create reusable software components and component libraries, where there is an additional cost involved to create a reusable component from scratch. That

additional cost could be avoided by identifying and extracting reusable components from the already developed large inventory of existing systems. But the issue of how to identify good reusable components from existing systems has remained relatively unexplored. Our approach, for identification and evaluation of reusable software, is based on software models and metrics. As the exact relationship between the attributes of the reusability is difficult to establish so a Neural Network approach could serve as an economical, automatic tool to generate reusability ranking of software[1][2] by formulating the relationship based on its training. When one designs with Neural Networks alone, the network is a black box that needs to be defined; this is a highly compute-intensive process. One must develop a good sense, after extensive experimentation and practice, of the complexity of the network and the learning algorithm to be used. Neural nets and fuzzy systems, although very different, have close relationship: they work with impression in a space that is not defined by crisp, deterministic boundaries [3]. With the objective of taking advantage of the features of the both [4], we proposed Neuro-Fuzzy approach to economically determining reusability of OO-based software components in existing systems as well as the reusable components that are in the design phase. Inputs to Neuro-fuzzy system, are provided in form of tuned WMC, DIT, NOC, CBO , LCOM values of the OO-based software component and output is be obtained in terms of reusability. This paper also shows how a small number of fuzzy rules can be selected for designing initial fuzzy rule-base for Neuro-fuzzy systems. Our approach consists of two phases: Generation of candidate rules by ID3 decision tree algorithm and rule pruning by rule evaluation with help of two rule evaluation criteria.

Page 2: [IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan (2006.11.13-2006.11.14)] 2006 International Conference on Emerging Technologies - A Neuro-Fuzzy Based

665

2. BACKGROUND

It was Selby [5] tried to identify a number of characteristics of those components, from existing systems, that are been reused at NASA laboratory and reported that the developers there has achieved a 32 percent reusability index. Dunn and Knight in 1991[6] also experimented and reported the usefulness of reusable code scavenging. Chen, Nishimoto and Ramamoorty briefly discuss the idea of subsystem extraction by using code information stored in a relational database [7]. They also describe a tool called the C Information Abstraction System to support this process. Esteva and Reynolds [8] describe the use of Inductive Learning techniques based on software metrics used to identify reusable modules. Caldiera and Basili [9] describe a tool, called Care, which helps to identify reusable components according to a set of “reusability attributes” based on software metrics. Mayobre [10] describes how these techniques can be extended and used to help in identifying data communication components a Hewlett-Packard.

Arnold [11][12] mentions a number of heuristics that can used for locating reusable components in the Ada source code. The heuristics are counting the number of references to a particular procedure, identifying the loosely coupled modules and identifying modules that carry high cohesion. Selby’s recent experimental study [13] has identified factors that characterize module reuse without revision were: low coupling, high cohesion, few input-output parameters, and many comments. The module implementation factors that characterize module reuse without revision were small size in source lines and many assignment statements (i.e. low Cyclometric complexity).

3. METHODOLOGY

Reusability evaluation System for OO-Based Software Components can be framed using following steps:

Selection and refinement of metrics targeting the quality of OO-based software system and perform parsing of the software system to generate the Meta information related to that Software.

Deciding the Membership functions and optimized selection of initial rule-base for the Fuzzy Inference System(FIS)

Neuro-Fuzzy system, which is makes use of FIS of previous phase, will get the meta information from the first stage and determines the reusability value of the software components.

3.1 Metric Selection and Refinement

As CK metric suit is able to target all the essential attributes of OO-based software as mentioned by Selby [13] in his latest findings, so we analyzed, refined and used following metrics of CK metric suit to explore different structural dimensions of a class.

Weighted methods per class (WMC)

We have used “tuned WMC” (TWMC) measure as input to the neuro-fuzzy inference engine by restricting the WMC[14][15] value in between 0 and 1 as shown in (1).

ecxa

caxf)(1

1),,( (1)

Where a=10 and c=0.5.

Depth of inheritance tree (DIT)

If we add all the ancestor classes coming in common path to the ancestor classes coming in alternative paths then that will be the true representation of the theoretical basis of the DIT metric [16]. We have used “lack of tuned degree of inheritance” (LTDIT) measure as input to the Neuro-fuzzy inference system, in order to restrict the input value between 0 and 1.

Number of Children (NOC)

The definition of NOC metric [14] gives the distorted view of the system as it counts only the immediate sub-classes instead of all the descendants of the class. So according to [16] the NOC value of a class, say class i, should reflect all the subclasses that share the properties of that class as shown in (2).

sesAllSubclas

i

iNOCNiNOC )()( (2)

Where N is the total number of immediate subclasses of class i. In order to restrict the input value between 0 and 1, we have used “lack of tuned Number of Children” (LTNOC) measure as input to the Neuro-fuzzy inference system.

Page 3: [IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan (2006.11.13-2006.11.14)] 2006 International Conference on Emerging Technologies - A Neuro-Fuzzy Based

666

Coupling Between Object Classes (CBO)

As Coupling between Object classes increases, reusability decreases and it becomes harder to modify and test the software system. So there is the need to set some maximum value of coupling level for its reusability & thief the value if CBO for a class is beyond that maximum value then the class is said to be non-reusable[14][15].

In order to restrict the input value between 0 and 1, we have used “lack of CBO” (LCBO) measure as input to the Neuro-fuzzy inference system.

Lack of Cohesion in Methods (LCOM)

The high value of LCOM indicates that the methods in the class are not really related to each other and vice versa means less reusability otherwise low value of LCOM depicts high internal strength of the class which results into high reusability. So there should be some maximum value of LCOM after that class becomes non-reusable [14][15].

We have used “tuned LCOM” (TLCOM) measure as input to the neuro-fuzzy inference engine by restricting the LCOM value in between 0 and 1 with help of sigmoidal function as shown in (3).

ecxa

caxf)(1

1),,( (3)

Where a=4 and c=1.5.

3.2 Optimized Initial Rule Selection

In this paper, we used fifteen antecedent fuzzy sets. For generating short fuzzy rules with a small number of antecedent conditions, we use “don’t care” as an additional antecedent fuzzy set. Thus an antecedent fuzzy set for each input is chosen from the 15 fuzzy sets and “don’t

care”. The total number of combinations of antecedent fuzzy sets is 16n for an n-dimensional pattern classification problem.

Optimized rule selection is performed in two steps: candidate rules are generated and rule pruning is done using rule evaluation. Evaluation of candidate rules is based on fuzzy versions of two rule evaluation criteria (i.e., confidence and support) for association rules,

which have been frequently used in the field of data mining [17]. In our rule pruning procedure, fuzzy rules are divided into several groups according to their consequent classes. Then fuzzy rules in each group are sorted in a descending order of the product of confidence

and support. Finally a pre-specified number of fuzzy rules are chosen from the top of the rule list for each group. The selected fuzzy rules are used as initial rule-base in Neuro-fuzzy System.

Generation of Candidate Rules

In this phase infer the decision tree from the training set, growing the tree until the training data is fit as well as possible and allowing over fitting to occur. We have used ID3 decision tree generation algorithm, as its time complexity is less as compared to others. ID3 algorithm uses information gain measure to select among the candidate attributes at each step while growing the tree. Thereafter, convert the learned tree into an equivalent set of rules by creating one rule for each path from the root node to a leaf node.

Let us consider an M-class pattern classification problem with m labeled patterns xp=(xp1,…,xpn),p=1,2,…,m in an n-dimensional continuous pattern space. For simplicity of explanation, we assume that the pattern space is the n-dimensional unit hypercube [1, 0]n . That is, we assume that all attribute values are real numbers in the unit interval [1, 0]. Using the ID3 decision tree generation algorithm following form of fuzzy rules is generated:

Rule Rq : If x1 is Aq1 and ... and xn is Aqn then Class Cq.

Where Rq is the q-th fuzzy rule, x = (x1,…, xn)is an n-dimensional pattern vector, Aqi is an antecedent fuzzy set and Cq is a consequent class (i.e., one of the M classes).The antecedent fuzzy set Aqi is one of the 15 fuzzy sets in Fig. or “don’t care”.

Rule Pruning

In the area of data mining, two criteria called confidence and support have often been used for evaluating association rules [17]. Our fuzzy rule in can be viewed as an association rule of the form Aq Cq . We use the two criteria for evaluating and selection of fuzzy rules. The definitions of these two criteria can be extended to the case of the fuzzy association rule Aq Cq

Page 4: [IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan (2006.11.13-2006.11.14)] 2006 International Conference on Emerging Technologies - A Neuro-Fuzzy Based

667

[18]. Similar extensions of the two criteria to fuzzy association rules were also proposed in [19]. Let D be the set of the given m training patterns xp=(xp1,…,xpn), p=1,2,…,m. The confidence [17] of Aq Cq is defined in (4).

|)(|

|)()D(|)(

A

CA

q

qq

D

Dc CA qq (4)

Where the denominator is the number of training patterns compatible with the antecedent part Aq , and the numerator is the number of training patterns compatible with both the antecedent part Aq and the consequent class Cq .

The confidence, c, indicates the c ( 100%) of training patterns compatible with Aq are also compatible with Cq. In the case of standard association rules, neither Aq nor Cq is fuzzy. Thus the calculations of |D(Aq) | and |D(Aq)D(Cq)| can be performed by simply counting compatible training patterns. But, in case of fuzzy association rules, each training pattern has

a different compatibility grade Aq xp) with the antecedent part Aq of the rules. Thus such a compatibility grade should be taken into account. Since the consequent class Cq is not fuzzy in the sugeno type fuzzy inference system, the confidence in (4) can be rewritten [18]as shown in (5).

m

ppAq

classppAq

qq

x

x

CAC

cq

1

)(

)(

)( (5)

The compatibility grade Aq xp )is usually defined by the product or minimum operator. We have use the product operator as

Aq xp )= Aq1 xp1 ) Aqn xpn )

where Aqi xpi ) is the membership function of the antecedent fuzzy set Aqi. On the other hand, the support [17] of Aq Cq is defined in (6).

||

|)()D(|)(

CA qq

D

Ds CA qq

(6)

The support s indicates s ( 100%) of all the training patterns are compatible with the association rule Aq Cq (i.e., compatible with both Aq and Cq). The support in can be rewritten [18]as shown in (7).

m

Cs

qclassp

pAq

qq

x

CA

)(

)((7)

The generated candidate fuzzy rules, from previous section, are divided into M groups according to their consequent classes. Fuzzy rules in each group are sorted in a descending order of the product of the confidence and the

support (i.e., c s ). For selecting N candidate

rules, the first M/ N rules are chosen from each of the M groups. In this manner, we can choose a pre-specified number of rules as initial rulebase for the neuro-fuzzy inference system.

3.3 Neuro-Fuzzy Based System

In the Sugeno type neuro-fuzzy inference system using a given input/output data set, we have constructed a fuzzy inference system (FIS) whose membership function parameters are tuned (adjusted) using stochastic gradient descent rule with momentum for the parameters associated with the input membership functions. The initial rule-base for the Neuro-fuzzy system can be obtained from the earlier phase of optimal rule selection. As more the initial membership functions resemble the optimal ones, the easier it will be for the model parameter training to converge. As a result, the training error decreases, at least locally, throughout the

learning process.

4. IMPLEMENTATION & RESULTS

We applied the developed system on the data set consists of 48 samples with 5 continuous attributes from six classes. We normalized each attribute value into a real number in the unit interval [0, 1]. Thus the data set was handled as a six-class pattern classification problem in the 5-dimensional unit hypercube [1, 0]5. The total number of possible combinations of antecedent fuzzy sets is 165.

In the fuzzy Inference system Linguistic variables are then assigned to the input parameters based on their values. The assignment of the linguistic variables depends on the range of the input measurement. TWMC, LTDIT, LTNOC, LCBO and TLCOM are assigned three linguistic variables LOW, MEDIUM and HIGH in the range of 0 to 1.

Page 5: [IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan (2006.11.13-2006.11.14)] 2006 International Conference on Emerging Technologies - A Neuro-Fuzzy Based

668

Reusability is assigned six linguistic variables PERFECT, HIGH, MEDIUM, LOW, VERY-LOW and NIL as constants in the range of 0-1.

Fig. 1. Neural Network incorporating the Fuzzy inference system.

A network-type structure similar to that of a neural network, as shown in the fig. 1, which maps inputs through input membership functions and associated parameters, and then through output membership functions and associated parameters to outputs, is used to interpret the input/output map. The parameters associated with the membership functions will change through the learning process.

The Training of the for 200 iterations and the training error reduces after each iteration as shown by the fig. 2 and stabilities at the error value of 0.0345, so at this point the network is said to be converged.

Fig. 2 Plot of Training error V/s Epochs.

5. CONCLUSION

During the testing phase, when the developed system is tested against the testing data and an average testing error 0.0324 is obtained. The plot between the actual output and the expected output is shown in fig. 3. The best result among the three criteria for rule evaluation (i.e., confidence, support, and their product) was obtained. As the actual output produced by the

“Reusability Evaluation System” is close to the expected output, so the system can be recommended for Automatic identification potential reusable object oriented software components from the legacy systems and evaluating the quality of developed or developing reusable components for better productivity and quality.

Fig. 3 Plot between the actual output and expected output

REFERENCES

[1] G. Boetticher and D. Eichmann, “A Neural Network Paradigm for Characterizing Reusable Software”, Proc. of the Australian Conference on Software Metrics, 18-19 November 1993.

[2] G. Boetticher, K. Srinivas, and D. Eichmann, “A Neural Net-based Approach to Software Metrics”, Proc. of the 5th International Conference on Software Engineering and Knowledge Engineering, San Francisco, CA, 14-18 June 1993, pp 271-274.

[3] S. V. Kartalopoulos, Understanding Neural Networks and Fuzzy Logic-Basic Concepts and Applications, IEEE Press, 1996, pp. 153-160.

[4] J-S. R. Jang and C.T. Sun, “Neuro-fuzzy Modeling and Control,” Proc. of IEEE, March 1995.

[5] R. W. Selby, Empirically Analyzing Software Reuse in a Production Environment, Software Reuse: Emerging Technology, W. Tracz, ed, IEEE Computer Society Press, 1988.

[6] M. F. Dunn and J. C. Knight, “Software reuse in Industrial setting : A Case Study,” Proc. of the 13th International Conference on Software Engineering, Baltimore, MA, 1993.

[7] Y. F. Chen, M. Y. Nishimoto and C. V. Ramamoorty, “The C Information Abstraction System,” IEEE Trans. on Software Engineering, Vol. 16, No. 3, March 1990.

[8] J. C. Esteva and R. G. Reynolds, “Identifying Reusable Components using Induction,” International Journal of Software Engineering and Knowledge Engineering, Vol. 1, No. 3 (1991) 271-292.

[9] G. Caldiera and V. R. Basili, “Identifying and Qualifying Reusable Software Components,” IEEE Computer, February 1991.

[10] G. Mayobre, “Using Code Reusability Analysis to Identify Reusable Components from Software Related to an Application Domain,” Proc. of the Fourth

Page 6: [IEEE 2006 International Conference on Emerging Technologies - Peshawar, Pakistan (2006.11.13-2006.11.14)] 2006 International Conference on Emerging Technologies - A Neuro-Fuzzy Based

669

Workshop on Software Reuse, Reston. VA, November, 1991.

[11] R.S. Arnold, Heuristics for Salvaging Reusable Parts From Ada Code, SPC Technical Report, ADA_REUSE_HEURISTICS-90011-N, March 1990.

[12] R.S. Arnold, Salvaging Reusable Parts From Ada Code: A Progress Report, SPC Technical Report, SALVAGE_ADA_PARTS_PR-90048-N, September 1990.

[13] Richard W. Selby, "Enabling Reuse-Based Software Development of Large-Scale Systems", IEEE Trans. on Software Engineering, Vol. 31, No. 6, June 2005 pp. 495-510.

[14] S.R. Chidamber and C.F. Kemerer, “A Metric Suite for Object Oriented Design”, IEEE Trans. on Software Engineering , Vol. 20, pp. 476-493,1994.

[15] S.R. Chidamber and C.F. Kemerer, “Towards a Metrics Suite for Object Oriented Design,” Proc. Conf.

Object Oriented Programming Systems, Languages, and Applications (OOPSLA’91), vol. 26, no. 11, pp. 197-211, 1991.

[16] Parvinder Singh and Hardeep Singh, “Critical Suggestive Evaluation of CK METRIC”, Proc. of 9th Pacific Asia Conference on Information Technology (PACIS-2005),Bangkok, Thailand, July 7 – 10, 2005.

[17] R. Agrawal et al., Fast discovery of association rules, in U. M. Fayyad et al. (eds.) Advances in Knowledge Discovery Data Mining, 307-328, AAAI Press, Menlo Park, 1996.

[18] H. Ishibuchi, T. Yamamoto, and T. Nakashima, Fuzzy data mining: Effect of fuzzy discretization, Proc. of 1st IEEE International Conference on Data Mining, 241-248, 2001.

[19] T. -P. Hong, C. -S. Kuo, and S. -C. Chi, Trade-off between computation time and number of rules for fuzzy mining from quantitative data, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 9, 587-604, 2001.