Software Quality Ranking: Bringing Order to Software Modules in Testing
Fei Xing
Michael R. Lyu
Ping Guo
2
Outline
Background
Support Vector Machine
  Basic theory
  Ranking SVM
  Other types of SVM
Our proposed framework
Experiments
Conclusions
3
Background
Modern society is fast becoming dependent on software products and systems.
Achieving high reliability is one of the most important challenges facing the software industry.
Software quality models are urgently needed.
4
Background
Software quality model
A software quality model is a tool for focusing software enhancement efforts.
Such a model yields timely predictions on a module-by-module basis, enabling one to target high-risk modules.
5
Background
Software complexity metrics
A quantitative description of program attributes.
Closely related to the distribution of faults in program modules.
Play a critical role in predicting the quality of the resulting software.
6
Background
Software quality prediction
Software quality prediction aims to evaluate the software quality level periodically and to indicate software quality problems early.
It investigates the relationship between the number of faults in a program and its software complexity metrics.
7
Related work
Several techniques have been proposed for building predictive models that use software metrics to classify program modules into fault-prone and non-fault-prone categories:
Discriminant analysis
Factor analysis
Classification trees
Pattern recognition
EM algorithm
Feedforward neural networks
Random forests
8
The limitation of current models
Two categories cannot fully reflect the characteristics of software modules.
Testing resources (people, time, equipment, etc.) are limited, so some fault-prone modules should be tested with higher priority than others.
An ideal approach is to rank all the modules according to their fault-proneness.
9
Research Objectives
Search for a well-accepted mathematical model for software quality ranking.
Lay out an integrated solution for software quality prediction in real-world projects.
Perform an experimental comparison to assess the proposed model.
10
Support Vector Machine
Introduced by Vapnik in the late 1960s on the foundation of statistical learning theory
Rooted in the classical structural risk minimization (SRM) approach
Generalizes well even in high-dimensional spaces with small training samples
11
Basic theory of SVM
The current state-of-the-art classifier
[Figure: decision plane, support vectors, and margin]
12
Basic theory of SVM
The Optimal Separating Hyperplane
Place a linear boundary between the two different classes, and orient the boundary in such a way that the margin is maximized.
The separating hyperplane is
$g(x) = w \cdot x + b = 0$
The optimal hyperplane solves the following constrained minimization:
$\min_{w, b} \ \frac{1}{2} \|w\|^2$
$\text{s.t.} \ \ y_i [(w \cdot x_i) + b] - 1 \ge 0$
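The hard-margin constraint $y_i[(w \cdot x_i) + b] - 1 \ge 0$ and the objective $\frac{1}{2}\|w\|^2$ can be checked numerically on a toy problem. The sketch below is only an illustration: the four sample points and both candidate hyperplanes are made-up values, and no optimizer is run. It shows that two hyperplanes can both satisfy every constraint while the one with the smaller $\|w\|$ has the wider margin $2/\|w\|$.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def satisfies_constraints(w, b, samples):
    """True if y * (w . x + b) - 1 >= 0 holds for every (x, y) pair."""
    return all(y * (dot(w, x) + b) - 1 >= 0 for x, y in samples)


def objective(w):
    """Hard-margin objective 1/2 * ||w||^2."""
    return 0.5 * dot(w, w)


def margin_width(w):
    """Distance between the two margin planes, 2 / ||w||."""
    return 2.0 / math.sqrt(dot(w, w))


# Hypothetical toy data: (feature vector, class label in {+1, -1}).
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((1.0, 0.0), -1)]

# Two hypothetical candidates; both separate the data.
candidates = {"A": ((1.0, 1.0), -3.0), "B": ((2.0, 2.0), -6.0)}

for name, (w, b) in candidates.items():
    print(name, satisfies_constraints(w, b, samples), objective(w), margin_width(w))
```

Candidate A has the smaller objective value, so the minimization prefers it over candidate B.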
13
Basic theory of SVM
The Generalized Optimal Separating Hyperplane
For the linearly non-separable case, positive slack variables $\xi_i$ are introduced:
$\min_{w, b, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i$
$\text{s.t.} \ \ y_i [(w \cdot x_i) + b] - 1 + \xi_i \ge 0$
$\xi_i \ge 0$
$C$ is used to weight the penalizing variables $\xi_i$, and a larger $C$ corresponds to assigning a higher penalty to errors.
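To make the role of $C$ concrete, the sketch below evaluates the soft-margin objective $\frac{1}{2}\|w\|^2 + C \sum_i \xi_i$ for a fixed hyperplane, taking each slack as the smallest value that satisfies its constraint, $\xi_i = \max(0,\ 1 - y_i[(w \cdot x_i) + b])$. The data points, the hyperplane, and the two values of $C$ are hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def slack(w, b, x, y):
    """Smallest xi >= 0 with y * (w . x + b) - 1 + xi >= 0."""
    return max(0.0, 1.0 - y * (dot(w, x) + b))


def soft_margin_objective(w, b, samples, C):
    """1/2 * ||w||^2 + C * (sum of slacks) for a fixed (w, b)."""
    total_slack = sum(slack(w, b, x, y) for x, y in samples)
    return 0.5 * dot(w, w) + C * total_slack


# Hypothetical data: the last positive sample lies on the wrong side.
samples = [
    ((2.0, 2.0), 1),
    ((3.0, 3.0), 1),
    ((0.0, 0.0), -1),
    ((1.0, 0.0), -1),
    ((1.5, 1.0), 1),
]
w, b = (1.0, 1.0), -3.0  # hypothetical hyperplane, not an optimized one

for C in (1.0, 10.0):
    print(C, soft_margin_objective(w, b, samples, C))
```

Only the misplaced sample contributes slack, and its cost grows linearly with $C$, which is why a larger $C$ pushes the solution toward fewer training errors.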
14
Ranking SVM
Rank each sample to an appropriate position.
For the linear case, find a weight vector w that makes the maximum number of the following inequalities hold:
[Equation not preserved: pairwise ranking inequalities]
Constrained optimization problem:
[Equation not preserved]
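For reference, a commonly used linear Ranking SVM formulation (after Joachims) is sketched below. It is an assumption that the equations shown on the slide match it term for term. The set $P$ of ordered pairs and the slack symbols $\xi_{ij}$ are notation introduced here, with $(i, j) \in P$ meaning that module $x_i$ should be ranked above module $x_j$.

```latex
% Pairwise ranking inequalities: as many as possible should hold.
\forall (i, j) \in P: \quad w \cdot x_i > w \cdot x_j

% Soft-margin relaxation as a constrained optimization problem.
\min_{w, \xi} \ \frac{1}{2} \|w\|^2 + C \sum_{(i, j) \in P} \xi_{ij}

\text{s.t.} \ \ w \cdot x_i \ge w \cdot x_j + 1 - \xi_{ij}, \qquad
\xi_{ij} \ge 0 \quad \forall (i, j) \in P
```

Rewriting each constraint as $w \cdot (x_i - x_j) \ge 1 - \xi_{ij}$ shows that this is a classification SVM without a bias term, applied to the pairwise difference vectors.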
15
Other types of SVM
SVM with risk control
Transductive Support Vector Machines
Support Vector Regression
16
Our framework
17
Experiments
Data Description
Medical Imaging System (MIS) data set.
11 software complexity metrics were measured for each of the modules.
Change Reports (CRs) represent the faults detected.
18
Metrics of MIS data
Total lines of code including comments (LOC)
Total code lines (CL)
Total character count (TChar)
Total comments (TComm)
Number of comment characters (MChar)
Number of code characters (DChar)
Halstead’s program length (N)
Halstead’s estimated program length (N̂)
Jensen’s estimator of program length (NF)
McCabe’s cyclomatic complexity (v(G))
Belady’s bandwidth metric (BW)
...
19
Experiments on Model Selection
The later the errors are found, the higher the risk will be
Risk increases as time goes by
e.g. $r(t) = b t^2$ or $r(t) = a e^{bt}$
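The two example risk functions can be written down directly. In the sketch below the parameter values for a and b are hypothetical; the slide gives only the functional forms.

```python
import math


def quadratic_risk(t, b=1.0):
    """r(t) = b * t^2: risk grows with the square of the detection time."""
    return b * t ** 2


def exponential_risk(t, a=1.0, b=1.0):
    """r(t) = a * e^(b t): risk grows exponentially with the detection time."""
    return a * math.exp(b * t)


# Hypothetical parameters, chosen only to show that both curves increase.
for t in range(5):
    print(t, quadratic_risk(t, b=0.5), exponential_risk(t, a=0.5, b=0.3))
```

For positive a and b, both functions are increasing for t >= 0, which matches the statement that a fault found later carries a higher risk.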
20
Experiments on Model Selection
Measure of risk
21
Experiments on Model Selection
Software Development Process Simulation, Case 1
The number of developed software modules grows by 40 at each time step.
10 percent of all the modules have fault data available.
The modules with fault data are used to train the model.
The 40 newly developed modules are used for testing.
22
Experiments on Model Selection
23
Experiments on Model Selection
Software Development Process Simulation, Case 2
The number of developed software modules grows by 40 at each time step.
The fault data of all the previous modules can be obtained.
The modules with fault data are used to train the model.
The 40 newly developed modules are used for testing.
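The two simulation cases differ only in how much fault data is available for training. A minimal sketch of the train/test splits follows, under several assumptions that the slides do not state: modules are identified by their development order, the labeled subset in Case 1 is drawn uniformly at random from the modules developed so far, and the total of 200 modules is a made-up figure. The function name `simulate_splits` and its parameters are hypothetical.

```python
import random


def simulate_splits(n_modules, step=40, labeled_fraction=1.0, seed=0):
    """Yield (train_ids, test_ids) for each time step of the simulation.

    At each step, `step` new modules arrive and form the test set. The
    training set is a fraction of the previously developed modules:
    labeled_fraction=0.1 mimics Case 1, labeled_fraction=1.0 mimics Case 2.
    Random selection of the labeled subset is an assumption.
    """
    rng = random.Random(seed)
    for start in range(step, n_modules - step + 1, step):
        previous = list(range(start))
        n_labeled = max(1, round(labeled_fraction * len(previous)))
        train_ids = sorted(rng.sample(previous, n_labeled))
        test_ids = list(range(start, start + step))
        yield train_ids, test_ids


# Hypothetical project size of 200 modules.
case1 = list(simulate_splits(200, labeled_fraction=0.1))
case2 = list(simulate_splits(200, labeled_fraction=1.0))

for (train1, test), (train2, _) in zip(case1, case2):
    print(len(train1), len(train2), len(test))
```

In both cases the test set never overlaps the training set, because only modules developed before the current step can carry fault data.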
24
Experiments on Model Selection
25
Comparison of ranking models
Applied models
  LOC: Lines of code
  PCA: Principal Component Analysis
  Regression tree
  SVR: Support Vector Regression
  Ranking SVM
Evaluation criteria
  Normalized Discounted Cumulative Gain (nDCG)
  Average Distance Measure (ADM)
26
Normalized Discounted Cumulative Gain (nDCG)
The Gain (G) of each software module is its fault-prone score
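The slide's formula is not reproduced here, so the sketch below assumes the widely used form of nDCG: the gain at rank position p is discounted by log2(p + 1), and the sum is divided by the value obtained for the ideal ordering. The gains stand for per-module fault-prone scores (for example, fault counts), and the sample values are invented.

```python
import math


def dcg(gains, k=None):
    """Discounted cumulative gain of gains listed in ranked order."""
    top = gains if k is None else gains[:k]
    return sum(g / math.log2(pos + 1) for pos, g in enumerate(top, start=1))


def ndcg(true_gains, predicted_scores, k=None):
    """nDCG of the ranking induced by predicted_scores (higher ranks first)."""
    order = sorted(
        range(len(true_gains)), key=lambda i: predicted_scores[i], reverse=True
    )
    ranked_gains = [true_gains[i] for i in order]
    ideal_dcg = dcg(sorted(true_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical fault counts for four modules and two predicted score vectors.
faults = [3, 0, 5, 1]
good_scores = [0.7, 0.1, 0.9, 0.3]  # orders the modules exactly by fault count
poor_scores = [0.2, 0.9, 0.1, 0.8]  # puts the fault-free module first

print(ndcg(faults, good_scores), ndcg(faults, poor_scores))
```

A model that ranks the most fault-prone modules first scores close to 1, while misplacing them near the bottom lowers the score.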
27
Comparison on nDCG measure
28
Average Distance Measure (ADM)
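The ADM formula on this slide is not preserved. The sketch below assumes the usual definition from the information-retrieval literature, ADM = 1 - (1/n) * sum(|predicted_i - actual_i|), with both scores scaled to the range [0, 1]. How the fault-prone scores were scaled in the original experiments is not stated, so the sample values are hypothetical.

```python
def adm(predicted, actual):
    """Average Distance Measure for scores normalized to [0, 1].

    Returns 1.0 when every predicted score equals the actual score and
    decreases as the average absolute difference grows.
    """
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("score lists must be non-empty and of equal length")
    total_distance = sum(abs(p - a) for p, a in zip(predicted, actual))
    return 1.0 - total_distance / len(predicted)


# Hypothetical normalized fault-prone scores for three modules.
actual = [1.0, 0.0, 0.5]
predicted = [0.5, 0.5, 0.5]

print(adm(predicted, actual))
```

Unlike nDCG, this measure compares score values directly instead of rank positions, so it also penalizes a model whose ordering is right but whose scores are far from the actual ones.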
29
Comparison on ADM measure
30
Discussion
Features of this work
  Introduces a ranking model instead of a classification model into software quality prediction
  Proposes an integrated framework for software quality prediction on real-world projects
31
Conclusions
Ranking SVM offers a promising technique for software module ranking.
The ranking model is more efficient than the classification model when enough fault data are available.
When fault data are limited, the classification model is better than the ranking model.
The end
Thanks
Q&A