Fuzzy Set and Cache-based Approach for Bug Triaging

Fuzzy Set and Cache-based Approach for Bug Triaging Ahmed Y. Tamrawi Electrical and Computer Engineering Department Iowa State University 2011


Transcript of Fuzzy Set and Cache-based Approach for Bug Triaging

Page 1: Fuzzy Set and Cache-based Approach for Bug Triaging

Fuzzy Set and Cache-based Approach for Bug Triaging

Ahmed Y. Tamrawi

Electrical and Computer Engineering Department

Iowa State University

2011

Page 2: Fuzzy Set and Cache-based Approach for Bug Triaging

Software Bugs

{ Introduction }

Iowa State University Fuzzy Set and Cache-based Approach for Bug Triaging

Definition (Software Bug): A common term used to describe a flaw, mistake, or failure in a computer system that produces an incorrect or unexpected result, or causes it to behave in unintended ways.

• Bugs can occur in any software, ranging from operating systems and flight autopilot software to a simple arithmetic program.

• Software bugs cost approximately US$60 billion per year.

The term “Bug” (September 9, 1947)

Page 3: Fuzzy Set and Cache-based Approach for Bug Triaging

More Bugs

Page 4: Fuzzy Set and Cache-based Approach for Bug Triaging

Bug Repository

• Software users and developers report bugs so that software developers can fix them.

• Bugs are reported using bug reports, which are added to an issue tracking system or bug repository.

[Figure: bugs are reported through an interface to the bugs repository, where they are stored]

Page 5: Fuzzy Set and Cache-based Approach for Bug Triaging

Bug Triaging

Definition (Bug Triaging): Assigning a bug to the most appropriate/capable developer who will fix it.

• Manual bug triaging is a difficult, expensive, and lengthy process because the bug triager must manually read and analyze each newly reported bug and assign a fixer to it.

Page 6: Fuzzy Set and Cache-based Approach for Bug Triaging

Bug Triaging

[Figure: new bug reports enter the bugs repository; the bug triager performs bug assignment to the software developers]

Page 7: Fuzzy Set and Cache-based Approach for Bug Triaging

Bug Triaging

• Bug triager challenges:

– Knowledge about the system/project;

– Descriptiveness of the bug report;

– Rate of reporting bugs;

– Many developers, different projects, and various expertise!

• Why not automate the bug triaging process?

– Improve software quality;

– Reduce cost and time.

[Figure: Eclipse – Feb 2011]

Page 8: Fuzzy Set and Cache-based Approach for Bug Triaging

Example

{ Motivation }

Assigned to: James Moody
Summary: New Repository wizard follows implementation model, not user model.
Description: The new CVS Repository Connection wizard's layout is confusing. This is because it follows the implementation model of the order of fields in the full CVS location path rather than the user model...

Assigned to: James Moody
Summary: Opening repository resources doesn't honor type.
Description: Opening repository resource always open the default text editor and doesn't honor any mapping between resource types and editors. As a result it is not possible to view the contents of an image (*.gif file) in a sensible way....

Technical Aspect: Version Control Management (VCM)

Developer: James Moody

This aspect is concerned with various Concurrent Versions System (CVS) repository features and operations within the Eclipse project.

Page 9: Fuzzy Set and Cache-based Approach for Bug Triaging

Technical Aspects & Terms

• A software system has many technical aspects.

• Technical aspects are described via the technical terms extracted from software artifacts.

• A bug report describes issues related to technical aspects via its terms.

Page 10: Fuzzy Set and Cache-based Approach for Bug Triaging

Automatic Bug Triaging

Key Philosophy for Automatic Bug Triaging: Whoever has the most bug-fixing capability/expertise with respect to the technical aspect(s) reported in a given bug report should be the fixer(s).

Page 11: Fuzzy Set and Cache-based Approach for Bug Triaging

Problem Definition

{ Bugzie Model }

Problem (Automatic Bug Assignment): In a software system, given a bug report B and a set of developers D who have past fixing activity, find the developer(s) with the most fixing expertise with respect to the technical aspect(s) reported in B.

[Figure: a new bug report B enters the bugs repository and is to be assigned to one of the software developers]

Page 12: Fuzzy Set and Cache-based Approach for Bug Triaging

Bugzie Overview

• Bugzie considers the problem as a ranking problem.

– State-of-the-art approaches view the problem as a classification problem.

• For a bug report, Bugzie determines a ranked list of the developers most capable of handling the reported issue(s).

Page 13: Fuzzy Set and Cache-based Approach for Bug Triaging

Bugzie Overview

• Bugzie utilizes fuzzy set theory to rank the fixing expertise of developers toward the technical aspects.

• Bugzie models the association between a developer and technical aspects.

• If a developer has a higher fixing association with a technical aspect, that developer has higher expertise and rank for that aspect.

Page 14: Fuzzy Set and Cache-based Approach for Bug Triaging

Association of Fixer & Term

Definition (Capable Fixer toward a Term): For a technical term t, a fuzzy set $C_t$, with associated membership function $\mu_t$, represents the set of developers who have the bug-fixing expertise relevant to the technical aspect(s) described by t.

• A developer with a higher membership score $\mu_t$ is more capable than a developer with a lower score in the issues related to t.

[Figure: fuzzy set $C_t$ with membership $\mu_t$ on a scale from 0 to 1]

Page 15: Fuzzy Set and Cache-based Approach for Bug Triaging

Association of Fixer & Term

• The membership score of a developer d toward a term t is:

\[ \mu_t(d) = \frac{|D_d \cap D_t|}{|D_d \cup D_t|}, \qquad \mu_t(d) \in [0, 1] \]

• $D_d$: Bug reports d has fixed.

• $D_t$: Bug reports containing t.

[Figure: Venn diagrams of $D_d$ and $D_t$: $\mu_t(d) = 0$ when the two sets are disjoint, and $\mu_t(d) = 1$ when they coincide]
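The membership score above is a ratio of set sizes, so it can be sketched in a few lines. The following is a minimal illustration, not the author's implementation; the function name `membership` and the bug report IDs are hypothetical, and returning 0 when both sets are empty is an assumption.

```python
def membership(fixed_by_dev: set, reports_with_term: set) -> float:
    """mu_t(d) = |D_d intersect D_t| / |D_d union D_t|."""
    union = fixed_by_dev | reports_with_term
    if not union:
        # Assumption: no evidence at all is treated as no association.
        return 0.0
    return len(fixed_by_dev & reports_with_term) / len(union)


# Hypothetical bug report IDs, for illustration only.
d_d = {101, 102, 103}        # reports fixed by developer d
d_t = {102, 103, 104, 105}   # reports containing term t
score = membership(d_d, d_t)  # 2 shared reports out of 5 distinct ones
```

Here two of the five distinct reports are shared, so the score is 2/5. Disjoint sets give 0 and identical sets give 1, matching the bounds stated on the slide.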

Page 16: Fuzzy Set and Cache-based Approach for Bug Triaging

Association of Fixer & Bug Report

\[ C_B = \bigcup_{t \in B} C_t \]

\[ \mu_B(d) = 1 - \prod_{t \in B} \bigl(1 - \mu_t(d)\bigr) \]

[Figure: bug report B with terms t1, t2, ..., tn; fuzzy set $C_B$ with membership $\mu_B$ on a scale from 0 to 1]
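As a worked instance of the formula above, suppose a bug report B contains only two terms, with hypothetical scores $\mu_{t_1}(d) = 0.6$ and $\mu_{t_2}(d) = 0.2$ (values chosen purely for illustration):

```latex
\[
\begin{aligned}
\mu_B(d) &= 1 - \bigl(1 - \mu_{t_1}(d)\bigr)\bigl(1 - \mu_{t_2}(d)\bigr) \\
         &= 1 - (1 - 0.6)(1 - 0.2) \\
         &= 1 - 0.4 \times 0.8 = 0.68
\end{aligned}
\]
```

The combined score is at least as large as the largest single-term score (0.68 versus 0.6), so one strongly associated term is enough to give the developer a strong association with the whole report.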

Page 17: Fuzzy Set and Cache-based Approach for Bug Triaging

Association of Fixer & Bug Report

• In fuzzy set theory, union is a flexible combination.

• Strong membership in one or more of the sub-fuzzy sets implies strong membership in the combined fuzzy set.

• After calculating $\mu_B(d)$ for all developers, Bugzie recommends the top-scored ones as fixers for the bug report.
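The recommendation step described above can be sketched as a sort over the combined scores. This is a hedged illustration under stated assumptions: the function name `rank_developers`, the nested-dictionary layout of the precomputed scores, the alphabetical tie-breaking, and all sample values are hypothetical and do not come from the presentation.

```python
from math import prod


def rank_developers(mu, report_terms, top_n):
    """Rank developers by mu_B(d) = 1 - prod(1 - mu_t(d)) over the report's terms.

    mu maps developer -> {term: mu_t(d)}; terms a developer has never been
    associated with are treated as score 0 (an assumption).
    """
    scores = {
        dev: 1.0 - prod(1.0 - term_scores.get(t, 0.0) for t in report_terms)
        for dev, term_scores in mu.items()
    }
    # Highest score first; ties broken alphabetically (an assumption).
    return sorted(scores, key=lambda d: (-scores[d], d))[:top_n]


# Hypothetical precomputed term scores.
mu = {
    "alice": {"cvs": 0.6, "wizard": 0.2},
    "bob": {"editor": 0.5},
    "carol": {"cvs": 0.1},
}
top_two = rank_developers(mu, ["cvs", "wizard"], top_n=2)
```

With these values `alice` scores 0.68, `carol` about 0.1, and `bob` 0, so the top-2 list is `alice` followed by `carol`.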

Page 18: Fuzzy Set and Cache-based Approach for Bug Triaging

Bugzie Model

[Figure: Bugzie workflow: initial training on the bugs repository over all developers and all terms t; pre-processing of a bug report B into terms t1, t2, ..., tn drawn from the terms of the bugs repository; recommendation; a recommendation list in descending order of score; and updating with bug report B]

Page 19: Fuzzy Set and Cache-based Approach for Bug Triaging

Bugzie Caching

• Fixer candidates selection (Developers Caching).

• Significant terms selection (Terms Caching).

[Figure: initial training on the bugs repository, restricted from all developers to the developers cache F(x) and from all terms t to the terms cache T(k)]

Page 20: Fuzzy Set and Cache-based Approach for Bug Triaging

Data Collection

• Collected all fixed bug reports from 7 bug repositories.

• For each bug report, we extracted and merged the summary and description.

• For each system, we pre-processed these reports: stemming, stop words removal, etc.

System       History Range             #Bug Reports  #Fixers  #Terms
Eclipse      10-10-2001 to 10-28-2010  177,637       2,144    193,862
Firefox      04-07-1998 to 10-28-2010  188,139       3,014    177,028
Jazz         06-01-2005 to 06-01-2008  34,228        156      39,771
Gcc          08-03-1999 to 10-28-2010  19,430        293      63,013
Apache       05-10-2002 to 01-01-2011  43,162        1,695    110,231
FreeDesktop  01-09-2003 to 12-05-2010  17,084        374      61,773
NetBeans     01-01-2008 to 11-01-2010  23,522        380      42,797
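The pre-processing step listed on this slide (merging summary and description, removing stop words, stemming) might look roughly like the sketch below. Everything here is an assumption made for illustration: the tiny stop-word list and the crude suffix-stripping `crude_stem` are stand-ins, and the presentation does not say which stemmer or stop-word list was actually used.

```python
import re

# Deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "not", "this", "it"}


def crude_stem(word):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(summary, description):
    """Merge summary and description, then tokenize, filter, and stem."""
    text = f"{summary} {description}".lower()
    tokens = re.findall(r"[a-z]+", text)
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


terms = preprocess("Opening repository resources",
                   "The editor is not honoring types")
```

For this input the result is `["open", "repository", "resource", "editor", "honor", "type"]`. A production pipeline would more likely use an established stemmer and a full stop-word list.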

Page 21: Fuzzy Set and Cache-based Approach for Bug Triaging

Locality of Fixing Activity

[Figure: timeline of bug reports, 2006 to 2010]

Page 22: Fuzzy Set and Cache-based Approach for Bug Triaging

Locality of Fixing Activity

Hypothesis (Locality of Fixing Activity): The recent fixing developers are likely to fix bug reports in the near future.

• If d belongs to F(x), we count this as a hit.

[Figure: fixing timeline, 2006 to 2010. Bug report B is fixed by d. Of all developers who have been fixing before B, the most recent x% form the developers cache F(x).]

Page 23: Fuzzy Set and Cache-based Approach for Bug Triaging

Locality of Fixing Activity

[Figure: results for the locality of fixing activity; annotated ranges: 94% - 98% and 96% - 99%]

Page 24: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Fixer Candidates

• The locality of fixing activity suggests that the actual fixer for a given bug report is likely to be a developer with recent fixing activity.

• For each bug report, Bugzie chooses the top x% of developers sorted by their fixing time as the fixer candidates F(x).

[Figure: fixing timeline, 2006 to 2010. Bug report B is fixed by d. Of all developers who have been fixing before B, the most recent x% form the developers cache F(x).]
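The candidate selection above can be sketched as a sort by recency followed by a cut at x%. This is only an illustration under assumptions: the function name `developer_cache`, the use of each developer's latest fix time as the sort key, rounding the cache size up, and keeping at least one developer are all choices made here, not details given in the presentation.

```python
import math


def developer_cache(last_fix_time, x_percent):
    """Return F(x): the x% of developers with the most recent fixing activity.

    last_fix_time maps developer -> time of their latest fix before the
    bug report under consideration (any comparable value works).
    """
    ranked = sorted(last_fix_time, key=last_fix_time.get, reverse=True)
    if not ranked:
        return []
    # Assumption: round up and always keep at least one candidate.
    size = max(1, math.ceil(len(ranked) * x_percent / 100))
    return ranked[:size]


# Hypothetical latest-fix timestamps (larger means more recent).
activity = {"a": 5, "b": 9, "c": 1, "d": 7}
half = developer_cache(activity, 50)
```

With four developers and x = 50, the cache holds the two most recently active ones, `b` and `d`. Only these candidates would then be scored for the bug report, which is where the time savings come from.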

Page 25: Fuzzy Set and Cache-based Approach for Bug Triaging

Developers Caching

[Figure: Bugzie workflow with developers caching: initial training over all terms t and only the developers in the developers cache F(x); pre-processing of a bug report B into terms t1, t2, ..., tn drawn from the terms of the bugs repository; recommendation; a recommendation list in descending order of score; and updating of the developers cache and the bugs repository with bug report B]

Page 26: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Descriptive Terms

System       #Terms
Eclipse      193,862
Firefox      177,028
Jazz         39,771
Gcc          63,013
Apache       110,231
FreeDesktop  61,773
NetBeans     42,797

Recall: For a developer d and a term t, the higher their association score $\mu_t(d)$, the higher the significance of t in describing the technical aspects in which d has fixing expertise.

Page 27: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Descriptive Terms

[Figure: all terms, T(All Terms), sorted in descending order of association score; the top k terms form the terms cache T(k)]
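One way to read the term selection pictured above is: for each developer, keep the k terms with the highest association score, and let T(k) be the union of those selections. The sketch below follows that reading, which is an assumption; the function name `terms_cache`, the data layout, the tie-breaking rule, and the sample scores are hypothetical.

```python
def terms_cache(mu, k):
    """Return T(k): the union of each developer's k highest-scoring terms.

    mu maps developer -> {term: mu_t(d)}.
    """
    selected = set()
    for term_scores in mu.values():
        # Descending by score; ties broken alphabetically (an assumption).
        top = sorted(term_scores, key=lambda t: (-term_scores[t], t))[:k]
        selected.update(top)
    return selected


# Hypothetical association scores.
mu = {
    "alice": {"cvs": 0.6, "wizard": 0.2, "ui": 0.1},
    "bob": {"editor": 0.5, "cvs": 0.3},
}
cache_k1 = terms_cache(mu, 1)
cache_k2 = terms_cache(mu, 2)
```

With k = 1 the cache contains `cvs` and `editor`; with k = 2 it grows to `cvs`, `wizard`, and `editor`. Terms outside the cache would be ignored when scoring a new bug report.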

Page 28: Fuzzy Set and Cache-based Approach for Bug Triaging

Terms Caching

[Figure: Bugzie workflow with terms caching: initial training on the bugs repository over all developers and only the terms t in the terms cache T(k); pre-processing of a bug report B into terms t1, t2, ..., tn drawn from T(k); recommendation; a recommendation list in descending order of score; and updating with bug report B]

Page 29: Fuzzy Set and Cache-based Approach for Bug Triaging

Empirical Evaluation

{ Empirical Evaluation }

• We evaluated Bugzie on our collected datasets.

• Experiments:

– Selection of fixer candidates;

– Selection of terms;

– Selection of developers and terms;

– Comparison with state-of-the-art approaches.

System       History Range             #Bug Reports  #Fixers
Eclipse      10-10-2001 to 10-28-2010  177,637       2,144
Firefox      04-07-1998 to 10-28-2010  188,139       3,014
Jazz         06-01-2005 to 06-01-2008  34,228        156
Gcc          08-03-1999 to 10-28-2010  19,430        293
Apache       05-10-2002 to 01-01-2011  43,162        1,695
FreeDesktop  01-09-2003 to 12-05-2010  17,084        374
NetBeans     01-01-2008 to 11-01-2010  23,522        380

Page 30: Fuzzy Set and Cache-based Approach for Bug Triaging

Experiment Setup

[Figure: creation timeline of bug reports divided into frames 0 to 10, with the recommendation list for bug report B in descending order of score]

1. Bugzie uses frame 0 for initial training.

2. Using the training data, Bugzie recommends the top-n developers to fix bug report B.

3. Bugzie updates the training data with the tested bug report B and moves to the next bug report.

Bugzie repeats steps 2 and 3 until it has consumed all bug reports.
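The train, recommend, update loop above can be sketched as follows. To keep the example self-contained, the recommender is a deliberately simple stand-in that suggests the developers with the most fixes so far; it is not Bugzie's fuzzy-set model. The function name `evaluate`, the equal-sized frames, and the sample timeline are assumptions made for illustration.

```python
from collections import Counter


def evaluate(fixers, n_frames=11, top_n=1):
    """Return the prediction accuracy (%) of each tested frame.

    fixers: the actual fixer of every bug report, ordered by creation time.
    Assumes len(fixers) >= n_frames.
    """
    frame_size = len(fixers) // n_frames
    history = Counter(fixers[:frame_size])  # step 1: train on frame 0
    accuracies = []
    for frame in range(1, n_frames):
        start = frame * frame_size
        # The last frame absorbs any leftover reports.
        end = len(fixers) if frame == n_frames - 1 else start + frame_size
        hits = 0
        for actual in fixers[start:end]:
            # Step 2: recommend top-n developers (stand-in recommender).
            recommended = [dev for dev, _ in history.most_common(top_n)]
            hits += actual in recommended
            # Step 3: update the training data with the tested report.
            history[actual] += 1
        accuracies.append(100.0 * hits / (end - start))
    return accuracies


# Hypothetical timeline: one report per frame.
timeline = ["alice"] * 6 + ["bob"] * 5
per_frame = evaluate(timeline)
```

In this toy timeline the stand-in recommender scores 100% on frames 1 to 5 and 0% on frames 6 to 10, because it keeps suggesting the historically dominant fixer after activity has shifted. That kind of drift is what the locality hypothesis and the developers cache are meant to address.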

Page 31: Fuzzy Set and Cache-based Approach for Bug Triaging

Prediction Accuracy

• If the recommendation list for a bug report contains its actual fixer, we count this as a hit (i.e., a correct recommendation).

• For each frame under test, we calculated the Prediction Accuracy (PA):

\[ PA(\%) = \frac{\#\,\text{Hits}}{\#\,\text{Prediction Cases}} \times 100\% \]

• For example, if we have 100 bugs and the actual fixing developer is in our Top-2 list for 60 of them, then the Top-2 prediction accuracy is 60%.
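The slide's 100-bug example can be reproduced with a short sketch of the PA formula. The function name `prediction_accuracy` and the developer names are hypothetical.

```python
def prediction_accuracy(recommendation_lists, actual_fixers):
    """PA(%) = (#hits / #prediction cases) * 100."""
    hits = sum(
        actual in recommended
        for recommended, actual in zip(recommendation_lists, actual_fixers)
    )
    return 100.0 * hits / len(actual_fixers)


# 100 bugs, each with the same hypothetical Top-2 list; the actual fixer
# appears in that list for 60 of them.
top2_lists = [["alice", "bob"]] * 100
actual = ["bob"] * 60 + ["carol"] * 40
pa = prediction_accuracy(top2_lists, actual)
```

Sixty of the 100 lists contain the actual fixer, so the Top-2 prediction accuracy comes out at 60%, matching the example on the slide.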

Page 32: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Fixer Candidates

[Figure: Bugzie workflow with developers caching: initial training over all terms t and only the developers in the developers cache F(x); pre-processing of a bug report B into terms t1, t2, ..., tn; recommendation list in descending order of score; and updating of the developers cache and the bugs repository]

[Figure: fixing timeline, 2006 to 2010. Bug report B is fixed by d. Of all developers who have been fixing before B, the most recent x% form the developers cache F(x).]

Page 33: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Fixer Candidates

[Figure: Top-1 and Top-5 prediction accuracy charts]

Firefox: at x = 10%, PA = 72.4%; at x = 100%, PA = 70.7%.

Page 34: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Fixer Candidates

• Selecting a suitable portion of recent fixers does not reduce the accuracy much, and sometimes improves it, as in the cases of Firefox, Eclipse, etc.

• Selecting only a portion of the available developers as candidates also improves time efficiency.

Page 35: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Terms

[Figure: Bugzie workflow with terms caching: initial training on the bugs repository over all developers and only the terms t in the terms cache T(k); pre-processing of a bug report B into terms t1, t2, ..., tn drawn from T(k); recommendation list in descending order of score; and updating with bug report B]

Page 36: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Terms

[Figure: Top-1 and Top-5 prediction accuracy charts, each with its peak range marked]

Eclipse: at k = 16, PA = 80%; at k = All Terms, PA = 72%.

Page 37: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Terms

• Selection of terms can considerably improve the prediction accuracy.

• The results suggest that a small yet significant set of terms per developer is enough to describe that developer's bug-fixing expertise.

Page 38: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Developers & Terms

• This experiment studies the combined impact of developers selection (x) and terms selection (k).

[Figure: results for Eclipse and Firefox]

Page 39: Fuzzy Set and Cache-based Approach for Bug Triaging

Selection of Developers & Terms

[Figure: results chart, with the following legend]

Base: Base model with all developers and all terms

C.S.: Candidate Selection

T.S.: Terms Selection

Both: The best PA when applying both C.S. and T.S.

Page 40: Fuzzy Set and Cache-based Approach for Bug Triaging

Comparison

• We compared Bugzie's results with state-of-the-art approaches.

• We used Weka to re-implement those approaches.

Approach                                 Papers
Naïve Bayes (NB)                         Cubranic & Murphy [1]; Anvik et al. [2]; Bhattacharya & Neamtiu [3]
Bayesian Networks (BN)                   Bhattacharya & Neamtiu
Incremental Naïve Bayes (InB)            Bhattacharya & Neamtiu
Incremental Bayesian Networks (InBN)     Bhattacharya & Neamtiu
Support Vector Machine (SVM)             Anvik et al.
Vector Space Model (VSM)                 Matter et al. [4]
C4.5 (Decision Trees)                    Anvik et al.

Page 41: Fuzzy Set and Cache-based Approach for Bug Triaging

Comparison

• Some of the approaches (e.g., C4.5 Decision Trees) cannot scale up well to our dataset.

• We prepared a smaller dataset, consisting of 3-year histories of the full dataset:

System       History Range             #Bug Reports  #Fixers  #Terms
Eclipse      01-01-2008 to 10-28-2010  69,829        1,510    103,690
Firefox      01-01-2008 to 10-28-2010  77,236        1,682    85,951
Jazz         06-01-2005 to 06-01-2008  34,228        156      39,771
Gcc          01-01-2008 to 10-28-2010  6,865         161      20,279
Apache       01-01-2008 to 01-01-2011  28,682        1,354    80,757
FreeDesktop  01-01-2008 to 12-05-2010  10,624        161      37,596
NetBeans     01-01-2008 to 11-01-2010  23,522        380      42,797

Page 42: Fuzzy Set and Cache-based Approach for Bug Triaging

Comparison Results

[Table: comparison results; time units: (d) days, (h) hours, (m) minutes, (s) seconds]

Page 43: Fuzzy Set and Cache-based Approach for Bug Triaging

Conclusions

{ Conclusions }

• Bugzie achieves higher accuracy and efficiency than state-of-the-art approaches.

• Bugzie can accommodate the locality of fixing activity and software evolution with flexible caching of developers and terms.

Page 44: Fuzzy Set and Cache-based Approach for Bug Triaging

Thesis Contributions

• Bugzie, a scalable, fuzzy set and cache-based automatic bug triaging approach, which is significantly more efficient and accurate than existing state-of-the-art approaches.

• The finding of the locality of fixing activity.

• A comprehensive evaluation of the efficiency and correctness of Bugzie in comparison with state-of-the-art approaches.

• An observation/method to capture a small and significant set of terms describing developers’ bug-fixing expertise.

Page 45: Fuzzy Set and Cache-based Approach for Bug Triaging

Future Work

• Use different caching mechanisms for developers and terms.

• Explore the usage of other textual and non-textual contents of bug reports for bug triaging.

• Use other software artifacts to accurately measure the developer’s expertise.

Page 46: Fuzzy Set and Cache-based Approach for Bug Triaging

Thank You!