Multi-Objective Cross-Project Defect Prediction


Gerardo Canfora

Andrea De Lucia

Massimiliano Di Penta

Rocco Oliveto

Annibale Panichella

Sebastiano Panichella

Multi-Objective Cross-Project Defect Prediction

Bugs are everywhere…

Software Testing

Practical Constraints

Software Quality

Money, Time

Defect Prediction

Spend more resources on the components most likely to fail

Indicators of defects

Cached history information (Kim et al., ICSE 2007)

Change metrics (Moser et al., ICSE 2008)

A metrics suite for object-oriented design (Chidamber and Kemerer, TSE 1994)

Defect Prediction Methodology

Within project: a predicting model is trained on a training set drawn from the project itself and then applied to a test set from the same project, labeling each class as defect prone or not:

Class1  YES
Class2  YES
Class3  NO
…       YES
ClassN  …

Issue: the size of the training set.

Cross-project: the predicting model is trained on past projects (Project A, Project B, …) and applied to a new project.

Issue: the prediction accuracy can be lower.
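The two settings can be sketched in a few lines of Python; the function and variable names are illustrative, not from the original study.

```python
def within_project_split(classes, train_fraction=0.8):
    """Within-project setting: training and test sets come from the
    same project. `classes` is a list of per-class records."""
    cut = int(len(classes) * train_fraction)
    return classes[:cut], classes[cut:]

def cross_project_split(past_projects, new_project):
    """Cross-project setting: train on all classes from past projects
    (Project A, Project B, ...) and test on the new project."""
    train = [c for project in past_projects for c in project]
    return train, new_project
```

The within-project split illustrates the training-set-size issue directly: a small project leaves little data to train on.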

Cost Effectiveness

1) Cross-project prediction does not necessarily work worse than within-project prediction.

2) Better precision (accuracy) does not translate into lower inspection cost.

3) Traditional predicting model: logistic regression.

"Recalling the 'imprecision' of cross-project defect prediction", Rahman et al., FSE 2012

Cost Effectiveness: an example

Four classes with their sizes: Class A (100 LOC), Class B (10,000 LOC), Class C (100 LOC), Class D (100 LOC). Two of the four classes contain a bug.

Predicting model 1 flags Class A and Class B as defect prone; one of them is actually buggy:
Precision = 50%
Cost = 10,100 LOC

Predicting model 2 flags three of the 100-LOC classes; one of them is actually buggy:
Precision = 33%
Cost = 300 LOC

Precision does not mirror the inspection cost.

All the existing predicting models work on precision and not on cost.

We need COST-oriented models.
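The figures from the example can be reproduced with a short sketch. Which classes contain the bugs is an assumption chosen here so the numbers match the slides' precision and cost values.

```python
# Hypothetical class data: name -> (LOC, contains a bug).
# Bug locations (Class B and Class C) are assumed for illustration.
classes = {"A": (100, False), "B": (10_000, True),
           "C": (100, True), "D": (100, False)}

def precision_and_cost(flagged):
    """Precision = fraction of flagged classes that are actually buggy;
    inspection cost = total LOC of the flagged classes."""
    loc, buggy = zip(*(classes[c] for c in flagged))
    return sum(buggy) / len(flagged), sum(loc)

model1 = precision_and_cost(["A", "B"])       # precision 0.5, cost 10100 LOC
model2 = precision_and_cost(["A", "C", "D"])  # precision 0.33..., cost 300 LOC
```

Model 1 wins on precision, yet inspecting its predictions costs over 30 times more code review.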

Multi-objective Logistic Regression

Building the Predicting Model on the Training Set

Training Set:

        P1   P2   …
Class1  m11  m12  …
Class2  m21  m22  …
Class3  m31  m32  …
Class4  …    …    …
…       …    …    …

Logistic regression produces a prediction for each class, which is then compared against the actual values:

      Pred.  Actual
C1    1      1
C2    1      0
C3    0      1
C4    1      1
…     0      0

GOAL: minimizing the prediction error (PRECISION)
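A single-objective model is scored purely on how often its predictions disagree with the actual values; a minimal sketch with illustrative data:

```python
pred   = [1, 1, 0, 1, 0]   # logistic-regression predictions for C1..C5
actual = [1, 0, 1, 1, 0]   # actual defect-proneness of C1..C5

# Misclassifications (C2 and C3 here) are the only thing the
# single-objective model tries to minimize.
errors = sum(p != a for p, a in zip(pred, actual))
error_rate = errors / len(actual)
```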

Multi-objective Logistic Regression

The element-wise product of the predictions with the class sizes gives the inspection cost:

Pred = (1, 0, 1, 0), LOC = (100, 95, 110, 10)  →  Cost = (100, 0, 110, 0)
Inspection Cost = 210 LOC

The element-wise product of the predictions with the actual values gives the effectiveness:

Pred = (1, 0, 1, 0), Actual = (1, 1, 1, 0)  →  #Bug = (1, 0, 1, 0)
Effectiveness = 2 defects

The two objectives:

$$\min \; \text{InspectionCost} = \sum_i \text{Pred}_i \cdot \text{Cost}_i$$
$$\max \; \text{Effectiveness} = \sum_i \text{Pred}_i \cdot \text{Actual}_i$$

where each prediction is driven by the linear combination a + b m_i1 + c m_i2 + …
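The worked example reduces to two dot products; a sketch using the slide's numbers:

```python
pred   = [1, 0, 1, 0]        # predicted defect-prone classes
loc    = [100, 95, 110, 10]  # size of each class in LOC
actual = [1, 1, 1, 0]        # classes that actually contain defects

# min objective: LOC the developer must inspect for the flagged classes
inspection_cost = sum(p * l for p, l in zip(pred, loc))   # 210 LOC
# max objective: defects caught by inspecting the flagged classes
effectiveness = sum(p * a for p, a in zip(pred, actual))  # 2 defects
```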

Multi-objective Genetic Algorithm

Fitness function:

$$\min \; \text{InspectionCost} = \sum_i \text{Pred}_i \cdot \text{Cost}_i$$
$$\max \; \text{Effectiveness} = \sum_i \text{Pred}_i \cdot \text{Actual}_i$$

with

$$\text{Pred}_i = \frac{1}{1 + e^{-(a + b\,m_{i1} + c\,m_{i2} + \dots)}}$$

Chromosome: (a, b, c, …)

Multiple objectives are optimized using Pareto-efficient approaches.
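A GA individual can be evaluated as follows. This is a sketch of the fitness computation only (no evolutionary loop), with hypothetical function names:

```python
import math

def predictions(chromosome, metrics, threshold=0.5):
    """Logistic model Pred_i = 1 / (1 + e^-(a + b*m_i1 + c*m_i2 + ...)),
    binarized at `threshold`. chromosome = (a, b, c, ...),
    metrics[i] = (m_i1, m_i2, ...)."""
    a, *coeffs = chromosome
    preds = []
    for row in metrics:
        z = a + sum(c * m for c, m in zip(coeffs, row))
        preds.append(1 if 1.0 / (1.0 + math.exp(-z)) > threshold else 0)
    return preds

def fitness(chromosome, metrics, loc, actual):
    """Returns the two objectives: inspection cost (to minimize)
    and effectiveness (to maximize)."""
    pred = predictions(chromosome, metrics)
    cost = sum(p * l for p, l in zip(pred, loc))
    eff = sum(p * a for p, a in zip(pred, actual))
    return cost, eff
```

A Pareto-based GA (e.g., NSGA-II, commonly used for this kind of problem) would then evolve the chromosomes (a, b, c, …) against this two-valued fitness.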

Multi-objective Genetic Algorithm

Pareto Optimality: all solutions that are not dominated by any other solution form the Pareto optimal set.

Multiple optimal solutions (models) can be found.

[Pareto frontier plot: Cost vs. Effectiveness]

The frontier allows making a well-informed decision that balances the trade-offs between the two objectives.
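Pareto dominance for these two objectives (cost minimized, effectiveness maximized) can be sketched as:

```python
def dominates(s, t):
    """s, t are (cost, effectiveness) pairs. s dominates t if it is no
    worse on both objectives and strictly better on at least one."""
    return s[0] <= t[0] and s[1] >= t[1] and (s[0] < t[0] or s[1] > t[1])

def pareto_front(solutions):
    """All solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t != s)]
```

For example, among the (cost, effectiveness) points (100, 1), (300, 2), (10100, 2), and (50, 0), the front drops only (10100, 2), since (300, 2) catches the same defects at far lower inspection cost.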

Empirical Evaluation

Research Questions

RQ1: How does the multi-objective (MO) prediction perform, compared to single-objective (SO) prediction?

Cross-project MO vs. cross-project SO vs. within-project SO

RQ2: How does the proposed approach perform, compared to the local prediction approach by Menzies et al.?

Cross-project MO vs. Local Prediction

Experiment outline

•  10 Java projects from the PROMISE dataset
   ✓ different sizes
   ✓ different application contexts

•  Cross-project defect prediction (RQ1):
   ✓ train the model on nine projects and test on the remaining one (10 times)

•  Within-project defect prediction (RQ1):
   ✓ 10-fold cross-validation

•  Local prediction (RQ2):
   ✓ k-means clustering algorithm
   ✓ Silhouette coefficient
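The cross-project protocol (train on nine projects, test on the tenth, ten times) is a leave-one-project-out loop; a sketch with hypothetical project names:

```python
def leave_one_project_out(projects):
    """Yield (training projects, held-out test project) pairs:
    one run per project, training on all the others."""
    for i, held_out in enumerate(projects):
        train = projects[:i] + projects[i + 1:]
        yield train, held_out

# 10 runs for 10 projects, each training on the other 9.
runs = list(leave_one_project_out([f"project_{i}" for i in range(10)]))
```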

Results


[Example results shown for Log4j and jEdit]

Cross-project MO vs. Cross-project SO

[Boxplots of KLOC inspected (0–300): Cross-project SO vs. Cross-project MO]

The proposed multi-objective model outperforms the single-objective one.

Cross-project MO vs. Within-project SO

[Boxplots of KLOC inspected (0–350): Within-project SO vs. Cross-project MO]

[Boxplots of Precision (0–100): Within-project SO vs. Cross-project MO]

Cross-project prediction is worse than within-project prediction in terms of PRECISION, but it is better than within-project predictors in terms of COST-EFFECTIVENESS.

Cross-project MO vs. Local Prediction

[Boxplots of KLOC inspected (0–300): Local Prediction vs. Cross-project MO]

The multi-objective predictor outperforms the local predictor.

Conclusions
