
Software Quality Ranking: Bringing Order to Software Modules in Testing

Fei Xing

Michael R. Lyu

Ping Guo


Outline

Background
Support Vector Machine
    Basic theory
    Ranking SVM
    Other types of SVM
Our proposed framework
Experiments
Conclusions


Background

Modern society is fast becoming dependent on software products and systems.

Achieving high reliability is one of the most important challenges facing the software industry.

Software quality models are urgently needed.


Background

Software quality model

A software quality model is a tool for focusing software enhancement efforts.

Such a model yields timely predictions on a module-by-module basis, enabling one to target high-risk modules.


Background

Software complexity metrics

A quantitative description of program attributes.

Closely related to the distribution of faults in program modules.

Playing a critical role in predicting the quality of the resulting software.


Background

Software quality prediction

Software quality prediction aims to evaluate software quality level periodically and to indicate software quality problems early.

Investigating the relationship between the number of faults in a program and its software complexity metrics.


Related work

Several different techniques have been proposed to develop predictive software metrics for the classification of software program modules into fault-prone and non-fault-prone categories:

Discriminant analysis
Factor analysis
Classification trees
Pattern recognition
EM algorithm
Feedforward neural networks
Random forests


The limitation of current models

Two categories cannot fully reflect the characteristics of software modules.

Resources (human, time, equipment, etc.) are limited, so some fault-prone modules should be tested with higher priority.

An ideal approach is to rank all the modules according to their fault-prone level.


Research Objectives

Search for a well-accepted mathematical model for software quality ranking.

Lay out an integrated solution for software quality prediction in real-world projects.

Perform experimental comparison for the assessment of the proposed model.


Support Vector Machine

Introduced by Vapnik in the late 1960s on the foundation of statistical learning theory

Traced back to the classical structural risk minimization (SRM) approach

Generalizes well even in high-dimensional spaces under small training sample conditions


Basic theory of SVM

The current state-of-the-art classifier

[Figure: decision plane, support vectors, and margin]


Basic theory of SVM

The Optimal Separating Hyperplane

Place a linear boundary between the two different classes, and orient the boundary in such a way that the margin is maximized.

The decision plane is

\[ g(x) = w \cdot x + b = 0 \]

The optimal hyperplane is required to satisfy the following constrained minimization:

\[ \min_{w,b} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i[(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \dots, n \]
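The hard-margin constraint can be checked numerically for a candidate hyperplane. The following is a minimal sketch, not the authors' implementation: the toy points, the candidate `w` and `b`, and all function names are hypothetical and chosen only for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def functional_margins(w, b, xs, ys):
    """Return y_i * (w . x_i + b) for every training sample."""
    return [y * (dot(w, x) + b) for x, y in zip(xs, ys)]


def is_feasible(w, b, xs, ys, tol=1e-9):
    """True when every sample satisfies y_i[(w . x_i) + b] - 1 >= 0."""
    return all(m >= 1 - tol for m in functional_margins(w, b, xs, ys))


def margin_width(w):
    """Distance between the two margin planes, 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical toy data: two points per class in two dimensions.
xs = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
ys = [1, 1, -1, -1]
w, b = (0.5, 0.5), -1.0  # candidate hyperplane, assumed for illustration

feasible = is_feasible(w, b, xs, ys)
width = margin_width(w)
```

Minimizing (1/2)||w||^2 subject to feasibility is the same as maximizing `margin_width(w)`, which is why the slide states the problem as a minimization.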


Basic theory of SVM

The Generalized Optimal Separating Hyperplane

For the linearly non-separable case, positive slack variables $\xi_i$ are introduced:

\[ \min_{w,b,\xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \]

\[ \text{s.t.} \quad y_i[(w \cdot x_i) + b] - 1 + \xi_i \ge 0, \qquad \xi_i \ge 0 \]

C is used to weight the penalizing variables $\xi_i$, and a larger C corresponds to assigning a higher penalty to errors.
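The role of the slack variables and of C can be illustrated by evaluating the soft-margin objective for a fixed hyperplane. This is a sketch under stated assumptions: the data, the hyperplane, and the function names are hypothetical, and no optimization is performed.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slacks(w, b, xs, ys):
    """Smallest xi_i >= 0 satisfying y_i[(w . x_i) + b] - 1 + xi_i >= 0."""
    return [max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in zip(xs, ys)]


def soft_margin_objective(w, b, xs, ys, C):
    """Value of (1/2)||w||^2 + C * sum_i xi_i for the given hyperplane."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, xs, ys))


# Hypothetical toy data: the third point sits on the decision plane,
# inside the margin, so it needs a slack of 1.
xs = [(2.0, 2.0), (0.0, 0.0), (1.0, 1.0)]
ys = [1, -1, -1]
w, b = (0.5, 0.5), -1.0

xi = slacks(w, b, xs, ys)
low_penalty = soft_margin_objective(w, b, xs, ys, C=1.0)
high_penalty = soft_margin_objective(w, b, xs, ys, C=10.0)
```

The same violation costs ten times more under `C=10.0`, which is the sense in which a larger C assigns a higher penalty to errors.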


Ranking SVM

Rank each sample to an appropriate position.

For the linear case, find a weight vector w which makes the maximum number of the following inequalities hold:

[equation missing]

Constrained optimization problem:

[equation missing]
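The pairwise inequalities and the optimization problem referred to on this slide can be sketched in their standard form. This is the usual linear Ranking SVM formulation from the learning-to-rank literature, assumed (not confirmed by the slide) to be the one intended. The pair set $P$ and the slack variables $\xi_{ij}$ are notation introduced here for illustration.

```latex
% P: all pairs (i, j) where module i should be ranked above module j.
% Desired inequalities (as many as possible should hold):
\[ w \cdot x_i > w \cdot x_j \qquad \forall (i, j) \in P \]

% Soft-margin relaxation, mirroring the generalized separating hyperplane:
\[ \min_{w,\xi} \ \frac{1}{2}\|w\|^2 + C \sum_{(i,j) \in P} \xi_{ij} \]
\[ \text{s.t.} \quad w \cdot x_i - w \cdot x_j \ge 1 - \xi_{ij}, \qquad \xi_{ij} \ge 0 \qquad \forall (i, j) \in P \]
```

In this form the problem is a classification SVM on the difference vectors $x_i - x_j$, so the same solver machinery applies.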


Other types of SVM

SVM with risk control
Transductive Support Vector Machines
Support Vector Regression


Our framework


Experiments

Data Description

Medical Imaging System (MIS) data set.

11 software complexity metrics were measured for each of the modules.

Change Reports (CRs) represent faults detected.


Metrics of MIS data

Total lines of code including comments (LOC)
Total code lines (CL)
Total character count (TChar)
Total comments (TComm)
Number of comment characters (MChar)
Number of code characters (DChar)
Halstead’s program length (N)
Halstead’s estimated program length (N̂)
Jensen’s estimator of program length (NF)
McCabe’s cyclomatic complexity (v(G))
Belady’s bandwidth metric (BW)
……


Experiments on Model Selection

The later the errors are found, the higher the risk will be

Risk increases as time goes by

e.g. $r(t) = bt^2$ or $r(t) = ae^{bt}$
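The two example risk functions can be written down directly. In this minimal sketch the default values of `a` and `b` and the function names are hypothetical; the slides do not state which constants were used.

```python
import math


def quadratic_risk(t, b=1.0):
    """r(t) = b * t^2: risk grows with the square of the detection time."""
    return b * t ** 2


def exponential_risk(t, a=1.0, b=1.0):
    """r(t) = a * e^(b*t): risk grows exponentially with the detection time."""
    return a * math.exp(b * t)


# Both functions increase with t (for positive a and b), so an error
# found at a later time step is charged a higher risk.
times = [1, 2, 3, 4]
quadratic = [quadratic_risk(t, b=2.0) for t in times]
exponential = [exponential_risk(t, a=1.0, b=0.5) for t in times]
```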



Measure of risk



Software Development Process Simulation, Case 1

The number of developed software modules increases by 40 at each time advancement.

10 percent of all the modules have fault data available.

The modules with fault data are used for training the model.

The 40 newly developed modules are used for testing.



Software Development Process Simulation, Case 2

The number of developed software modules increases by 40 at each time advancement.

The fault data of all the previous modules can be obtained.

The modules with fault data are used for training the model.

The 40 newly developed modules are used for testing.
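The two simulation cases differ only in how much fault data is available for training. They can be sketched as one split generator. This is an illustration under stated assumptions: the function name, the total of 200 modules, and the choice of a random 10 percent sample of earlier modules for Case 1 are hypothetical, since the slides do not say how the labeled modules were selected.

```python
import random


def simulate_splits(total_modules, step=40, labeled_fraction=1.0, seed=0):
    """Yield (train_ids, test_ids) for each time advancement.

    labeled_fraction=0.1 mimics Case 1 (10 percent of the modules have
    fault data); labeled_fraction=1.0 mimics Case 2 (all earlier modules
    have fault data).
    """
    rng = random.Random(seed)
    for start in range(step, total_modules - step + 1, step):
        previous = list(range(start))  # modules developed so far
        n_labeled = max(1, round(labeled_fraction * len(previous)))
        # Assumption: the labeled modules are a uniform random sample.
        train_ids = sorted(rng.sample(previous, n_labeled))
        test_ids = list(range(start, start + step))  # newly developed modules
        yield train_ids, test_ids


case1 = list(simulate_splits(200, labeled_fraction=0.1))
case2 = list(simulate_splits(200, labeled_fraction=1.0))
```

At every step the model is trained on `train_ids` and evaluated on the 40 modules in `test_ids`, so the training set grows over time in both cases, only much faster in Case 2.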


Comparison of ranking models

Applied models
    LOC: Lines of code
    PCA: Principal Component Analysis
    Regression tree
    SVR: Support Vector Regression
    Ranking SVM

Evaluation criteria
    Normalized Discounted Cumulative Gain (nDCG)
    Average Distance Measure (ADM)


Normalized Discounted Cumulative Gain (nDCG)

The Gain (G) of each software module is its fault-prone score
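The exact nDCG formula used in the experiments is not given on this slide. The sketch below uses the common log2(position + 1) discount, which is an assumption and may differ from the variant the authors applied. Using the number of faults per module as the gain is likewise assumed, and the function names are hypothetical.

```python
import math


def dcg(gains):
    """Discounted cumulative gain with a log2(position + 1) discount."""
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(gains, start=1))


def ndcg(true_gains, predicted_scores, k=None):
    """nDCG of the ordering induced by predicted_scores (higher = riskier)."""
    order = sorted(range(len(true_gains)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [true_gains[i] for i in order][:k]
    ideal = sorted(true_gains, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical fault counts for four modules and two predicted rankings.
faults = [3, 0, 1, 2]
perfect = ndcg(faults, [0.9, 0.1, 0.3, 0.5])   # ranks modules 0, 3, 2, 1
reversed_ = ndcg(faults, [0.1, 0.9, 0.5, 0.3])  # ranks modules 1, 2, 3, 0
```

A ranking that places the most fault-prone modules first scores 1, and the discount means mistakes near the top of the list cost more than mistakes near the bottom.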


Comparison on nDCG measure


Average Distance Measure (ADM)
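The slide gives no formula for ADM. A minimal sketch of the measure as it is usually defined in the retrieval-evaluation literature: one minus the mean absolute difference between predicted and actual scores, both in [0, 1]. It is an assumption that the experiments follow this definition, and the min-max scaling helper used to bring fault counts into [0, 1] is hypothetical.

```python
def adm(predicted, actual):
    """Average Distance Measure: 1 - mean |predicted_i - actual_i|."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("score lists must be non-empty and of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted, actual))
    return 1.0 - total / len(predicted)


def min_max_scale(values):
    """Map values linearly onto [0, 1] (hypothetical normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical fault counts and model scores for three modules.
actual = min_max_scale([0, 2, 4])
exact = adm([0.0, 0.5, 1.0], actual)
worst = adm([0.0, 1.0], [1.0, 0.0])
```

Unlike nDCG, ADM compares score values rather than positions, so it also penalizes a model that orders the modules correctly but misjudges how fault-prone they are.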


Comparison on ADM measure


Discussion

Features of this work

Introduce a ranking model instead of a classification model into software quality prediction.

Propose an integrated framework for software quality prediction on real-world projects.


Conclusions

Ranking SVM offers a promising technique in software module ranking.

The ranking model is more efficient than the classification model when enough fault data are available.

When fault data are limited, the classification model is better than the ranking model.

The end

Thanks

Q&A