Towards Building a Universal Defect Prediction Model

Feng Zhang, Audris Mockus, Iman Keivanloo, Ying Zou

description

To predict files with defects, a suitable prediction model must be built for a software project, using data from either the project itself (within-project) or other projects (cross-project). A universal defect prediction model built from the entire set of diverse projects would remove the need to build a model for each individual project. A universal model could also be interpreted as a basic relationship between software metrics and defects. However, variations in the distribution of predictors pose a formidable obstacle to building a universal model. Such variations exist among projects with different context factors (e.g., size and programming language). To overcome this challenge, we propose context-aware rank transformations for predictors. We cluster projects based on the similarity of the distribution of 26 predictors, and derive the rank transformations using quantiles of predictors for each cluster. We then fit the universal model on the transformed data of 1,398 open source projects hosted on SourceForge and GoogleCode. Adding context factors to the universal model improves its predictive power. The universal model obtains prediction performance comparable to that of within-project models and yields similar results when applied to five external projects (one Apache and four Eclipse projects). These results suggest that a universal defect prediction model may be an achievable goal.

Transcript of Towards Building a Universal Defect Prediction Model

Page 1: Towards Building a Universal Defect Prediction Model

Towards Building a Universal Defect Prediction Model

Feng Zhang

Audris Mockus

Iman Keivanloo

Ying Zou

Page 2: Towards Building a Universal Defect Prediction Model

ONE ring that rules the other rings of power.

Page 3: Towards Building a Universal Defect Prediction Model

A universal model that predicts defects for all the projects.

Page 4: Towards Building a Universal Defect Prediction Model

Most successful prediction models are within-project models

Page 5: Towards Building a Universal Defect Prediction Model

How about cross-project models?

Page 6: Towards Building a Universal Defect Prediction Model

Deriving a universal model with cross-project models?

Page 7: Towards Building a Universal Defect Prediction Model

Select the training set of projects like this?

Page 8: Towards Building a Universal Defect Prediction Model

Or select the training set of projects like this?

Page 9: Towards Building a Universal Defect Prediction Model

Is it still possible to build a universal model? If so, then how?

Page 10: Towards Building a Universal Defect Prediction Model

Which context factors should we consider?

Page 11: Towards Building a Universal Defect Prediction Model

Steps towards building a universal model

1. Partition

[Diagram: projects are partitioned by two context factors, programming language (C++, Java) and system size (small, large), giving the groups C++/S, C++/L, Java/S, and Java/L.]
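The partition step can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the project records are invented, the `partition` helper is hypothetical, and splitting small from large at the median size is only one possible cut-off (the slides do not state how the size classes are defined).

```python
from collections import defaultdict
from statistics import median

# Hypothetical project records: (name, language, total lines of code).
projects = [
    ("alpha", "C++", 1_200),
    ("beta", "C++", 85_000),
    ("gamma", "Java", 3_400),
    ("delta", "Java", 120_000),
    ("epsilon", "Java", 900),
]


def partition(projects):
    """Group project names by (language, size class).

    A project is "S" when its size is at or below the median size of all
    projects and "L" otherwise. The median cut-off is an assumption.
    """
    cutoff = median(size for _, _, size in projects)
    groups = defaultdict(list)
    for name, language, size in projects:
        size_class = "S" if size <= cutoff else "L"
        groups[(language, size_class)].append(name)
    return dict(groups)


groups = partition(projects)
for key, names in sorted(groups.items()):
    print(key, names)
```

Each resulting group holds projects that share the same context, which is what the later clustering and ranking steps operate on.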

Page 12: Towards Building a Universal Defect Prediction Model

Steps towards building a universal model

1. Partition
2. Cluster
3. Obtain Ranking Functions
4. Rank

[Diagram: the partitions C++/S, C++/L, Java/S, and Java/L are clustered; the Java/S and Java/L partitions appear merged into a single Java cluster. The ranking-function labels shown are R1(x), R1(x), and R3(x).]

Ranking uses quantiles of metric values:
(-∞, 10%] => level 1
(10%, 20%] => level 2
…
(90%, +∞) => level 10
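The quantile-based ranking can be sketched as follows. This is a minimal illustration, not the authors' implementation: `make_ranking_function` is a hypothetical helper, the cut points are taken from the standard library's `statistics.quantiles` with the "inclusive" method (an assumption), and the input is assumed to be the values of one metric across one cluster.

```python
from bisect import bisect_left
from statistics import quantiles


def make_ranking_function(values):
    """Return R(x), mapping a raw metric value to a level in 1..10.

    `values` are the observed values of one metric within one cluster.
    """
    # Nine cut points at the 10%, 20%, ..., 90% quantiles.
    cuts = quantiles(values, n=10, method="inclusive")

    def rank(x):
        # bisect_left counts the cut points strictly below x, so a value
        # equal to a cut point falls in the lower level: (lo, hi] intervals.
        return bisect_left(cuts, x) + 1

    return rank


# Hypothetical metric values for one cluster (e.g., lines of code per file).
cluster_values = list(range(1, 101))
rank = make_ranking_function(cluster_values)
print([rank(v) for v in (1, 10, 11, 50, 100)])
```

Because each cluster gets its own cut points, the same raw value can map to different levels in different clusters, which is what makes the transformation context-aware.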

Page 13: Towards Building a Universal Defect Prediction Model

Build a universal model

1. Partition
2. Cluster
3. Obtain Ranking Functions
4. Rank

[Diagram: the same partition, cluster, and ranking-function layout as on the previous slide.]

Build a universal defect prediction model using rank-transformed values.
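To make the final step concrete, here is a minimal sketch of fitting a classifier on rank-transformed values. The slides do not say which learner is used, so logistic regression trained by plain gradient descent is an assumption chosen for illustration; the toy data, the two-metric feature layout, and the helper names are all hypothetical.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def scale(levels):
    # Map rank levels 1..10 into (0, 1] so gradient descent is well behaved.
    return tuple(level / 10.0 for level in levels)


def predict_proba(weights, x):
    # weights[0] is the bias; the rest pair up with the features.
    z = weights[0] + sum(w * xj for w, xj in zip(weights[1:], x))
    return sigmoid(z)


def fit_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            err = predict_proba(weights, x) - y
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


# Hypothetical files described by two rank-transformed metrics (levels 1..10)
# and a label: 1 = defective, 0 = clean.
ranked = [(1, 2), (2, 1), (3, 3), (4, 2), (7, 8), (8, 9), (9, 7), (10, 10)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

weights = fit_logistic([scale(r) for r in ranked], labels)
print(predict_proba(weights, scale((9, 9))), predict_proba(weights, scale((1, 1))))
```

Since every project's metrics are first mapped to the same 1..10 scale, data from all clusters can be pooled into one training set, which is the point of ranking before fitting.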

Page 14: Towards Building a Universal Defect Prediction Model

Case study setup

[Bar charts: number of projects by version control system, by issue tracking system (using vs. not using), and by programming language. The bar labels 937 and 461 survive from one of the charts.]

Page 15: Towards Building a Universal Defect Prediction Model

Research Questions

Page 16: Towards Building a Universal Defect Prediction Model

RQ1. Is our rank transformation good?

[Bar chart: precision, recall, and AUC of models built with rank transformation vs. log transformation; y-axis from 0 to 0.7. Bar labels in extraction order: 0.48, 0.48, 0.57, 0.58, 0.62, 0.61.]
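For readers unfamiliar with the three measures used in the comparisons, the sketch below shows one common way to compute them. The labels and scores are invented, the 0.5 decision threshold is an assumption, and the AUC is computed as the fraction of (defective, clean) pairs that the scores order correctly, with ties counted as half.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (defective) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical ground truth (1 = defective) and predicted probabilities.
y_true = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.6, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall = precision_recall(y_true, y_pred)
print(precision, recall, auc(y_true, scores))
```

Precision and recall depend on the chosen threshold, while AUC does not, which is one reason all three are typically reported together.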

Page 17: Towards Building a Universal Defect Prediction Model

RQ2. How good is the universal model?

[Bar chart: precision, recall, and AUC of the universal model vs. the within-project model; y-axis from 0 to 0.7. Bar labels in extraction order: 0.45, 0.48, 0.58, 0.63, 0.64, 0.62.]

Page 18: Towards Building a Universal Defect Prediction Model

RQ3. Does the universal model work for external projects?

Predict

Page 19: Towards Building a Universal Defect Prediction Model

RQ3. Precision comparison

[Bar chart: precision of the universal model vs. the within-project model for Eclipse, Equinox, PDE, Mylyn, and Lucene; y-axis from 0 to 0.7. Bar labels in extraction order: 0.31, 0.47, 0.63, 0.66, 0.21, 0.13, 0.23, 0.28, 0.23, 0.28.]

Page 20: Towards Building a Universal Defect Prediction Model

RQ3. Recall comparison

[Bar chart: recall of the universal model vs. the within-project model for Eclipse, Equinox, PDE, Mylyn, and Lucene; y-axis from 0 to 0.9. Bar labels in extraction order: 0.57, 0.79, 0.54, 0.61, 0.61, 0.34, 0.47, 0.72, 0.42, 0.60.]

Page 21: Towards Building a Universal Defect Prediction Model

RQ3. AUC comparison

[Bar chart: AUC of the universal model vs. the within-project model for Eclipse, Equinox, PDE, Mylyn, and Lucene; y-axis from 0.6 to 0.8. Bar labels in extraction order: 0.76, 0.77, 0.78, 0.79, 0.69, 0.67, 0.70, 0.70, 0.68, 0.69.]

Page 22: Towards Building a Universal Defect Prediction Model

Summary