Page 1: Towards Building a Universal Defect Prediction Model

Towards Building a Universal Defect Prediction Model

Feng Zhang

Audris Mockus

Iman Keivanloo

Ying Zou

Page 2: Towards Building a Universal Defect Prediction Model

ONE ring that rules the other rings of power.

Page 3: Towards Building a Universal Defect Prediction Model

A universal model that predicts defects for all the projects.

Page 4: Towards Building a Universal Defect Prediction Model

Most successful prediction models are within-project models.

Page 5: Towards Building a Universal Defect Prediction Model

How about cross-project models?

Page 6: Towards Building a Universal Defect Prediction Model

Can a universal model be derived from cross-project models?

Page 7: Towards Building a Universal Defect Prediction Model

Select the training set of projects like this?

Page 8: Towards Building a Universal Defect Prediction Model

Or select the training set of projects like this?

Page 9: Towards Building a Universal Defect Prediction Model

Is it still possible to build a universal model? If so, then how?

Page 10: Towards Building a Universal Defect Prediction Model

Which context factors should be considered?

Page 11: Towards Building a Universal Defect Prediction Model

Steps towards building a universal model

1. Partition

Context factors used for partitioning:

Programming languages: C++, Java

System size: small size (S), large size (L)

Resulting groups: C++ S, C++ L, Java S, Java L
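The partition step can be pictured as grouping projects by their context factors. The following is a minimal sketch, not the authors' implementation: the project records, the `partition` helper, and the 100,000-line cut-off between "small" and "large" are all hypothetical, since the slides do not say how size classes are defined.

```python
from collections import defaultdict

# Hypothetical project records: (name, language, total lines of code).
projects = [
    ("alpha", "C++", 12_000),
    ("beta", "C++", 450_000),
    ("gamma", "Java", 8_000),
    ("delta", "Java", 300_000),
    ("epsilon", "Java", 5_000),
]

# Assumed cut-off between "small" (S) and "large" (L); the slides do not state one.
SIZE_THRESHOLD_LOC = 100_000


def partition(records, threshold=SIZE_THRESHOLD_LOC):
    """Group project names by (language, size class)."""
    groups = defaultdict(list)
    for name, language, loc in records:
        size_class = "S" if loc < threshold else "L"
        groups[(language, size_class)].append(name)
    return dict(groups)


groups = partition(projects)
for key in sorted(groups):
    print(key, groups[key])
```

With these made-up records, each of the four groups named on the slide (C++ S, C++ L, Java S, Java L) receives at least one project.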

Page 12: Towards Building a Universal Defect Prediction Model

Steps towards building a universal model

1. Partition (groups: C++ S, C++ L, Java S, Java L)

2. Cluster (clusters shown: C++ S, C++ L, Java)

3. Obtain Ranking Functions (shown as R1(x), R1(x), R3(x))

4. Rank

Using quantiles of metric values:

(-∞, 10%] => level 1
(10%, 20%] => level 2
…
(90%, +∞) => level 10
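The quantile-based ranking in step 4 can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the helper names `quantile_cutoffs` and `make_ranking_function` are hypothetical, the cut-offs use the nearest-rank quantile definition (the slides do not say which definition is used), and the input is assumed to be a non-empty list of metric values from one cluster.

```python
from bisect import bisect_left


def quantile_cutoffs(values, levels=10):
    """Return the 10%, 20%, ..., 90% cut-offs of `values` (nearest-rank method)."""
    ordered = sorted(values)
    n = len(ordered)
    cutoffs = []
    for k in range(1, levels):
        # Nearest rank: position ceil(k * n / levels), counted from 1.
        rank = -(-k * n // levels)
        cutoffs.append(ordered[max(rank, 1) - 1])
    return cutoffs


def make_ranking_function(cluster_values, levels=10):
    """Build a function mapping a raw metric value to a level 1..levels."""
    cutoffs = quantile_cutoffs(cluster_values, levels)

    def rank(x):
        # bisect_left counts the cut-offs strictly below x, so a value equal
        # to a cut-off stays in the lower level:
        # (-inf, q10] -> 1, (q10, q20] -> 2, ..., (q90, +inf) -> 10.
        return bisect_left(cutoffs, x) + 1

    return rank


# Hypothetical metric values observed in one cluster (e.g. lines of code).
cluster_metric_values = list(range(1, 21))
rank_metric = make_ranking_function(cluster_metric_values)
example_levels = [rank_metric(v) for v in (1, 2, 3, 18, 19, 1000)]
print(example_levels)  # [1, 1, 2, 9, 10, 10]
```

Because each cluster gets its own cut-offs, the same raw value can map to different levels in different clusters, which is what puts metrics from dissimilar projects on one 1 to 10 scale.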

Page 13: Towards Building a Universal Defect Prediction Model

Build a universal model

1. Partition (groups: C++ S, C++ L, Java S, Java L)

2. Cluster (clusters shown: C++ S, C++ L, Java)

3. Obtain Ranking Functions (shown as R1(x), R1(x), R3(x))

4. Rank

Build a universal defect prediction model using rank-transformed values.

Page 14: Towards Building a Universal Defect Prediction Model

Case study setup

[Bar charts: number of projects by version control system, by issue tracking system (using vs. not using), and by programming language. Data labels recoverable from the charts: 937 and 461.]

Page 15: Towards Building a Universal Defect Prediction Model

Research Questions

Page 16: Towards Building a Universal Defect Prediction Model

RQ1. Is our rank transformation good?

[Bar chart: precision, recall, and AUC for rank transformation vs. log transformation. Data labels as extracted: 0.48, 0.48, 0.57, 0.58, 0.62, 0.61. The assignment of values to bars is not recoverable from the transcript.]
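For readers unfamiliar with the three measures used in the research questions, the sketch below shows one common way to compute them for a binary defect predictor. It is not taken from the slides: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions for illustration. AUC is computed here as the probability that a randomly chosen defective entity receives a higher score than a randomly chosen clean one, with ties counted as one half.

```python
def precision_recall(actual, predicted):
    """Precision and recall for binary labels (1 = defective, 0 = clean)."""
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for a, s in zip(actual, scores) if a]
    neg = [s for a, s in zip(actual, scores) if not a]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defective and one clean entity")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and predicted defect probabilities.
actual = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

precision, recall = precision_recall(actual, predicted)
area = auc(actual, scores)
print(round(precision, 3), round(recall, 3), round(area, 3))
```

Precision and recall depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is one reason the slides report all three.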

Page 17: Towards Building a Universal Defect Prediction Model

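RQ2. How good is the universal model?

[Bar chart: precision, recall, and AUC for the universal model vs. within-project models. Data labels as extracted: 0.45, 0.48, 0.58, 0.63, 0.64, 0.62. The assignment of values to bars is not recoverable from the transcript.]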

Page 18: Towards Building a Universal Defect Prediction Model

RQ3. Does the universal model work for external projects?

[Diagram labelled "Predict".]

Page 19: Towards Building a Universal Defect Prediction Model

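RQ3. Precision comparison

[Bar chart: precision of the universal model vs. within-project models for Eclipse, Equinox, PDE, Mylyn, and Lucene. Data labels as extracted: 0.31, 0.47, 0.63, 0.66, 0.21, 0.13, 0.23, 0.28, 0.23, 0.28. The assignment of values to bars is not recoverable from the transcript.]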

Page 20: Towards Building a Universal Defect Prediction Model

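RQ3. Recall comparison

[Bar chart: recall of the universal model vs. within-project models for Eclipse, Equinox, PDE, Mylyn, and Lucene. Data labels as extracted: 0.57, 0.79, 0.54, 0.61, 0.61, 0.34, 0.47, 0.72, 0.42, 0.60. The assignment of values to bars is not recoverable from the transcript.]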

Page 21: Towards Building a Universal Defect Prediction Model

RQ3. AUC comparison

[Bar chart: AUC of the universal model vs. within-project models for Eclipse, Equinox, PDE, Mylyn, and Lucene. Data labels as extracted: 0.76, 0.77, 0.78, 0.79, 0.69, 0.67, 0.70, 0.70, 0.68, 0.69. The assignment of values to bars is not recoverable from the transcript.]

Page 22: Towards Building a Universal Defect Prediction Model

Summary