Multi-Objective Cross-Project Defect Prediction


Gerardo Canfora

Andrea De Lucia

Massimiliano Di Penta

Rocco Oliveto

Annibale Panichella

Sebastiano Panichella

Multi-Objective Cross-Project Defect Prediction

Bugs are everywhere…

Software Testing

Practical Constraints

Software Quality

Money, Time

Defect Prediction

Spend more resources on the components most likely to fail

Indicators of defects

Cached history information (Kim et al., ICSE 2007)

Change metrics (Moser et al., ICSE 2008)

A metrics suite for object-oriented design (Chidamber and Kemerer, TSE 1994)

Defect Prediction Methodology

Within project: a predicting model is trained on a training set drawn from the project itself and then applied to a test set from the same project, labeling each class as defect prone or not:

Class1  YES
Class2  YES
Class3  NO
…       YES
ClassN  …

Issue: the size of the training set.

Cross-project: the predicting model is trained on past projects (Project A, Project B, …) and applied to a new project.

Issue: the prediction accuracy can be lower.
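The two settings can be sketched in a few lines of Python; the function and variable names are illustrative, not from the original study.

```python
def within_project_split(classes, train_fraction=0.8):
    """Within-project setting: training and test sets come from the
    same project. `classes` is a list of per-class records."""
    cut = int(len(classes) * train_fraction)
    return classes[:cut], classes[cut:]

def cross_project_split(past_projects, new_project):
    """Cross-project setting: train on all classes from past projects
    (Project A, Project B, ...) and test on the new project."""
    train = [c for project in past_projects for c in project]
    return train, new_project
```

The within-project split illustrates the training-set-size issue directly: a small project leaves little data to train on.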

Cost Effectiveness

1) Cross-project prediction does not necessarily work worse than within-project prediction.

2) Better precision (accuracy) does not translate into lower inspection cost.

3) Traditional predicting model: logistic regression.

"Recalling the 'imprecision' of cross-project defect prediction", Rahman et al., FSE 2012

Cost Effectiveness: an example

Four classes with their sizes: Class A (100 LOC), Class B (10,000 LOC), Class C (100 LOC), Class D (100 LOC). Two of the four classes contain a bug.

Predicting model 1 flags Class A and Class B as defect prone; one of them is actually buggy:
Precision = 50%
Cost = 10,100 LOC

Predicting model 2 flags three of the 100-LOC classes; one of them is actually buggy:
Precision = 33%
Cost = 300 LOC

Precision does not mirror the inspection cost.

All the existing predicting models work on precision and not on cost.

We need COST-oriented models.
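The figures from the example can be reproduced with a short sketch. Which classes contain the bugs is an assumption chosen here so the numbers match the slides' precision and cost values.

```python
# Hypothetical class data: name -> (LOC, contains a bug).
# Bug locations (Class B and Class C) are assumed for illustration.
classes = {"A": (100, False), "B": (10_000, True),
           "C": (100, True), "D": (100, False)}

def precision_and_cost(flagged):
    """Precision = fraction of flagged classes that are actually buggy;
    inspection cost = total LOC of the flagged classes."""
    loc, buggy = zip(*(classes[c] for c in flagged))
    return sum(buggy) / len(flagged), sum(loc)

model1 = precision_and_cost(["A", "B"])       # precision 0.5, cost 10100 LOC
model2 = precision_and_cost(["A", "C", "D"])  # precision 0.33..., cost 300 LOC
```

Model 1 wins on precision, yet inspecting its predictions costs over 30 times more code review.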

Multi-objective Logistic Regression

Building the Predicting Model on the Training Set

Training Set:

        P1   P2   …
Class1  m11  m12  …
Class2  m21  m22  …
Class3  m31  m32  …
Class4  …    …    …
…       …    …    …

Logistic regression produces a prediction for each class, which is then compared against the actual values:

      Pred.  Actual
C1    1      1
C2    1      0
C3    0      1
C4    1      1
…     0      0

GOAL: minimizing the prediction error (PRECISION)
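A single-objective model is scored purely on how often its predictions disagree with the actual values; a minimal sketch with illustrative data:

```python
pred   = [1, 1, 0, 1, 0]   # logistic-regression predictions for C1..C5
actual = [1, 0, 1, 1, 0]   # actual defect-proneness of C1..C5

# Misclassifications (C2 and C3 here) are the only thing the
# single-objective model tries to minimize.
errors = sum(p != a for p, a in zip(pred, actual))
error_rate = errors / len(actual)
```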

Multi-objective Logistic Regression

The element-wise product of the predictions with the class sizes gives the inspection cost:

Pred = (1, 0, 1, 0), LOC = (100, 95, 110, 10)  →  Cost = (100, 0, 110, 0)
Inspection Cost = 210 LOC

The element-wise product of the predictions with the actual values gives the effectiveness:

Pred = (1, 0, 1, 0), Actual = (1, 1, 1, 0)  →  #Bug = (1, 0, 1, 0)
Effectiveness = 2 defects

The two objectives:

$$\min \; \text{InspectionCost} = \sum_i \text{Pred}_i \cdot \text{Cost}_i$$
$$\max \; \text{Effectiveness} = \sum_i \text{Pred}_i \cdot \text{Actual}_i$$

where each prediction is driven by the linear combination a + b m_i1 + c m_i2 + …
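The worked example reduces to two dot products; a sketch using the slide's numbers:

```python
pred   = [1, 0, 1, 0]        # predicted defect-prone classes
loc    = [100, 95, 110, 10]  # size of each class in LOC
actual = [1, 1, 1, 0]        # classes that actually contain defects

# min objective: LOC the developer must inspect for the flagged classes
inspection_cost = sum(p * l for p, l in zip(pred, loc))   # 210 LOC
# max objective: defects caught by inspecting the flagged classes
effectiveness = sum(p * a for p, a in zip(pred, actual))  # 2 defects
```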

Multi-objective Genetic Algorithm

Fitness function:

$$\min \; \text{InspectionCost} = \sum_i \text{Pred}_i \cdot \text{Cost}_i$$
$$\max \; \text{Effectiveness} = \sum_i \text{Pred}_i \cdot \text{Actual}_i$$

with

$$\text{Pred}_i = \frac{1}{1 + e^{-(a + b\,m_{i1} + c\,m_{i2} + \dots)}}$$

Chromosome: (a, b, c, …)

Multiple objectives are optimized using Pareto-efficient approaches.
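A GA individual can be evaluated as follows. This is a sketch of the fitness computation only (no evolutionary loop), with hypothetical function names:

```python
import math

def predictions(chromosome, metrics, threshold=0.5):
    """Logistic model Pred_i = 1 / (1 + e^-(a + b*m_i1 + c*m_i2 + ...)),
    binarized at `threshold`. chromosome = (a, b, c, ...),
    metrics[i] = (m_i1, m_i2, ...)."""
    a, *coeffs = chromosome
    preds = []
    for row in metrics:
        z = a + sum(c * m for c, m in zip(coeffs, row))
        preds.append(1 if 1.0 / (1.0 + math.exp(-z)) > threshold else 0)
    return preds

def fitness(chromosome, metrics, loc, actual):
    """Returns the two objectives: inspection cost (to minimize)
    and effectiveness (to maximize)."""
    pred = predictions(chromosome, metrics)
    cost = sum(p * l for p, l in zip(pred, loc))
    eff = sum(p * a for p, a in zip(pred, actual))
    return cost, eff
```

A Pareto-based GA (e.g., NSGA-II, commonly used for this kind of problem) would then evolve the chromosomes (a, b, c, …) against this two-valued fitness.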

Multi-objective Genetic Algorithm

Pareto Optimality: all solutions that are not dominated by any other solution form the Pareto optimal set.

Multiple optimal solutions (models) can be found.

[Pareto frontier plot: Cost vs. Effectiveness]

The frontier allows making a well-informed decision that balances the trade-offs between the two objectives.
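Pareto dominance for these two objectives (cost minimized, effectiveness maximized) can be sketched as:

```python
def dominates(s, t):
    """s, t are (cost, effectiveness) pairs. s dominates t if it is no
    worse on both objectives and strictly better on at least one."""
    return s[0] <= t[0] and s[1] >= t[1] and (s[0] < t[0] or s[1] > t[1])

def pareto_front(solutions):
    """All solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]
```

For example, among the (cost, effectiveness) points (100, 1), (300, 2), (10100, 2), and (50, 0), the front drops only (10100, 2), since (300, 2) catches the same defects at far lower inspection cost.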

Empirical Evaluation

Research Questions

RQ1: How does the multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?

Cross-project MO vs. cross-project SO vs. within-project SO

RQ2: How does the proposed approach perform, compared to the local prediction approach by Menzies et al.?

Cross-project MO vs. Local Prediction

Experiment outline

•  10 Java projects from the PROMISE dataset
   ✓ different sizes
   ✓ different application contexts

•  Cross-project defect prediction (RQ1):
   ✓ train the model on nine projects and test on the remaining one (10 times)

•  Within-project defect prediction (RQ1):
   ✓ 10-fold cross-validation

•  Local prediction (RQ2):
   ✓ k-means clustering algorithm
   ✓ Silhouette coefficient
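The cross-project protocol (train on nine projects, test on the tenth, ten times) is a leave-one-project-out loop; a sketch with hypothetical project names:

```python
def leave_one_project_out(projects):
    """Yield (training projects, held-out test project) pairs:
    one run per project, training on all the others."""
    for i, held_out in enumerate(projects):
        train = projects[:i] + projects[i + 1:]
        yield train, held_out

# 10 runs for 10 projects, each training on the other 9.
runs = list(leave_one_project_out([f"project_{i}" for i in range(10)]))
```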

Results


[Example results shown for Log4j and jEdit]

Cross-project MO vs. Cross-project SO

[Boxplots of KLOC inspected (0–300): Cross-project SO vs. Cross-project MO]

The proposed multi-objective model outperforms the single-objective one.

Cross-project MO vs. Within-project SO

[Boxplots of KLOC inspected (0–350): Within-project SO vs. Cross-project MO]

[Boxplots of Precision (0–100): Within-project SO vs. Cross-project MO]

Cross-project prediction is worse than within-project prediction in terms of PRECISION, but it is better than within-project predictors in terms of COST-EFFECTIVENESS.

Cross-project MO vs. Local Prediction

[Boxplots of KLOC inspected (0–300): Local Prediction vs. Cross-project MO]

The multi-objective predictor outperforms the local predictor.

Conclusions
