
Revisiting the Experimental Design Choices for Approaches for the Automated Retrieval of Duplicate Issue Reports

Ph.D. Candidate: Mohamed Sami Rakha

Supervisor: Prof. Ahmed E. Hassan

Ph.D. Thesis Defense

Software issues are common

There are many issue tracking systems

Issue tracking systems receive a large number of issue reports

New Issue Reports

Over 1.2 million issue reports

Different people may report the same issue: one report is kept as the master, while the other reports of the same issue are marked as duplicates.

The identification of duplicate issue reports is an annoying and time-wasting task: 20-30% of issue reports are duplicates!

Bettenburg et al. Duplicate bug reports considered harmful… really? [ICSM 2008]

Prior research leverages information retrieval: given a newly reported duplicate, an automated approach searches the previously reported issues and returns a ranked list of the most similar ones (candidate masters).

The developer then chooses the correct candidate (the master) for the newly reported duplicate… if there is any.

Prior research has proposed several automated approaches for duplicate retrieval

Prior Research

2007: Runeson et al.

2008: Wang et al.

2010: Sun et al.

2010: Sureka et al.

2011: Sun et al.

2012: Zhou et al.

2012: Nguyen et al.

2015: Aggarwal et al.

2016: Hindle et al.

2016: Jie et al.

Prior research focuses on improving the performance

Duplicate → Automated Approach → Performance (Recall Rate)

The goal of each newly proposed approach is to achieve higher performance than the prior ones


In my thesis, I revisit the experimental design choices from four perspectives

Example: approach A (recall rate = 0.6) outperforms approach B (recall rate = 0.5).
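For illustration, a minimal sketch of how such a recall rate (recall rate@k) is typically computed for duplicate retrieval; the data structures below are hypothetical.

```python
# Minimal sketch (hypothetical data structures): the recall rate@k is the
# fraction of duplicate reports whose master appears in the top-k candidates
# suggested by the automated approach.

def recall_rate_at_k(suggestions, true_masters, k=10):
    """suggestions: duplicate id -> ranked list of candidate master ids.
    true_masters: duplicate id -> id of the actual master report."""
    hits = sum(1 for dup, master in true_masters.items()
               if master in suggestions.get(dup, [])[:k])
    return hits / len(true_masters)

# If approach A retrieves the correct master for 6 of 10 duplicates and
# approach B for 5 of 10, their recall rates are 0.6 and 0.5, so A wins.
```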


Revisiting the experimental design choices can help us better understand the benefits and limitations of how prior approaches were evaluated.

Needed Effort

Data Filtration

Data Changes

Evaluation Process

Perspective #1: Needed Effort

Prior research treats all duplicate reports equally

Duplicate reports differ based on the effort needed for their manual identification


Real Example: Duplicate (B) needs much more effort to be identified. Duplicate (A) was identified within 15 minutes of being reported, while Duplicate (B) took 44 days.


A duplicate can be hard or easy based on the effort needed for identification.

EASY (50% of duplicates): identified within 1 day, without discussions, by one person. HARD (50%): the remaining duplicates.

Studied Systems:


A random forest model (“binary classification” of EASY vs. HARD duplicates) achieves AUCs of 0.68 – 0.80.

This model is used to explain the most important factors influencing the effort needed for identification.

Description similarity is one of the most important factors.
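A minimal sketch of such a classifier on hypothetical data (the actual factors, data, and model settings in the thesis may differ):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical data: one row per duplicate report, with factor columns and
# an EASY/HARD effort label. Column names are assumptions for illustration.
duplicates = pd.DataFrame({
    "description_similarity": [0.9, 0.2, 0.8, 0.1, 0.7, 0.3, 0.6, 0.4],
    "title_similarity":       [0.8, 0.1, 0.9, 0.2, 0.6, 0.4, 0.7, 0.3],
    "effort":                 ["EASY", "HARD", "EASY", "HARD",
                               "EASY", "HARD", "EASY", "HARD"],
})

X = duplicates[["description_similarity", "title_similarity"]]
y = (duplicates["effort"] == "HARD").astype(int)  # 1 = HARD, 0 = EASY

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# AUC of the binary classifier (the thesis reports AUCs of 0.68 - 0.80).
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# Feature importances indicate the factors that drive identification effort.
print(dict(zip(X.columns, model.feature_importances_)))
```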


Thesis Contribution #1: Show the importance of considering the needed effort for retrieving duplicate reports in the performance measurement of automated duplicate retrieval approaches.

[Published in EMSE 2015]

Perspective #2: Evaluation Process

Many prior studies evaluate their approaches on a short testing period (e.g., one year); issue reports outside the testing period are not considered in the evaluation.


When duplicate (B)'s match (its master) is outside the testing period, duplicate (B) is ignored.

I refer to this evaluation as the “Classical Evaluation”.


Instead, I propose the “Realistic Evaluation”: every duplicate reported during the testing period is considered, even when its master was reported before the testing period.
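A minimal sketch of the difference between the two evaluations (the field names are hypothetical):

```python
# Minimal sketch (hypothetical fields): which duplicates are evaluated
# under each evaluation for a given testing period [start, end].

def classical_duplicates(duplicates, reports_by_id, start, end):
    # Classical evaluation: a duplicate is considered only if its master
    # was also reported inside the testing period; otherwise it is ignored.
    return [d for d in duplicates
            if start <= d["created"] <= end
            and start <= reports_by_id[d["master_id"]]["created"] <= end]

def realistic_duplicates(duplicates, start, end):
    # Realistic evaluation: every duplicate reported during the testing
    # period is considered, even if its master predates the period.
    return [d for d in duplicates if start <= d["created"] <= end]
```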


I compare the two evaluations on testing data frequently used in prior research: Mozilla 2010, Eclipse 2008, and OpenOffice 2008 – 2010.

Problem #1: the number of considered duplicates.

Problem #2: performance overestimation.


Problem #1 (Mozilla 2010): the classical evaluation is missing 23.5% of the duplicates (10,043 vs. 7,551 duplicates).


Problem #2 (Mozilla 2010): the recall rate of the REP approach is overestimated by 39% (recall of 0.58 under the classical evaluation vs. 0.43 under the realistic evaluation).



Does the difference in performance between the two evaluations generalize to other testing periods within the studied ITSs?


I randomly select 100 one-year chunks as testing periods and study the statistical difference between the classical and realistic evaluations.

Chunk 1 (Dec 2004 – Dec 2005), Chunk 2 (Nov 2003 – Nov 2004), … , Chunk 100 (Sep 2007 – Sep 2008): the automated approach is evaluated on each chunk under both evaluations, yielding Results 1 through Results 100.

The classical evaluation significantly overestimates the performance of the automated approaches (p-value < 0.001).
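One way to run such a paired comparison over the 100 chunks is sketched below; the recall-rate values are placeholders, and the thesis may use a different statistical test.

```python
# Minimal sketch: paired comparison of recall rates over the randomly
# selected one-year testing periods (values below are placeholders).
from scipy.stats import wilcoxon

# One recall rate per chunk under each evaluation (hypothetical values;
# in the thesis these come from re-running the approach on each chunk).
classical_recall = [0.58, 0.61, 0.55, 0.60, 0.57]
realistic_recall = [0.43, 0.47, 0.41, 0.45, 0.42]

stat, p_value = wilcoxon(classical_recall, realistic_recall)
print(f"Wilcoxon signed-rank p-value: {p_value:.3g}")  # thesis reports p < 0.001
```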


Thesis Contribution #2: Propose a realistic evaluation for the automated retrieval of duplicate reports, and analyze prior findings using such a realistic evaluation.

[Published in TSE 2017]

Perspective #3: Data Filtration

Do we need to search all of these issue reports? Filtering out old issue reports should reduce the search time.

A possible solution is to filter out the old issue reports using a time window based on the report date (while keeping the realistic evaluation, in which no duplicates are ignored).

Example (1): a 2-year window. Example (2): a 4-month window.
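A minimal sketch of such a report-date time window (the field names are hypothetical):

```python
# Minimal sketch (hypothetical fields): keep only candidate reports created
# within the last `months` months before the newly reported issue.
from datetime import timedelta

def candidates_within_window(new_report, previous_reports, months):
    cutoff = new_report["created"] - timedelta(days=30 * months)
    return [r for r in previous_reports if r["created"] >= cutoff]

# Example (1) in the slides corresponds to months=24 (a 2-year window);
# Example (2) corresponds to months=4 (a 4-month window).
```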


Results of limiting the searched issue reports to the last n months: it is not possible to limit the issue reports that are searched without decreasing the performance of the automated approaches.


Example: issue (A) was resolved 1 month ago, while issue (B) was resolved 2 years ago. A newly reported duplicate is more likely to be a duplicate of (A).


I propose a filtration approach based on a time threshold from the resolution date for each resolution value and rank. The thresholds are optimized using a genetic algorithm (NSGA-II).
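A minimal sketch of that filtration, with hypothetical field names and threshold values; in the thesis the per-resolution thresholds come out of the NSGA-II optimization rather than being hand-picked.

```python
# Minimal sketch (hypothetical fields and thresholds): keep a resolved
# candidate report only if its resolution date is recent enough for its
# resolution value. In the thesis, the thresholds are optimized with NSGA-II.
from datetime import timedelta

thresholds_days = {"FIXED": 365, "DUPLICATE": 730, "WONTFIX": 90}  # hypothetical

def filter_by_resolution_age(new_report, previous_reports, thresholds_days):
    kept = []
    for r in previous_reports:
        max_age = thresholds_days.get(r.get("resolution"))
        if max_age is None:
            kept.append(r)  # unresolved or unknown resolution: keep (assumption)
        elif new_report["created"] - r["resolved"] <= timedelta(days=max_age):
            kept.append(r)
    return kept
```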


The proposed filtration approach improves the performance by up to 60%.


Thesis Contribution #3: Propose a genetic algorithm based approach for filtering out the old issue reports to improve the performance of duplicate retrieval approaches.

[Published in TSE 2017]

Perspective #4: Data Changes

In 2011, the just-in-time (JIT) duplicate retrieval feature was introduced.


After 2011, there are significantly fewer duplicates, but more identification effort is needed.


How does the JIT duplicate retrieval feature impact the performance evaluation of the automated approaches?


Two approaches are studied: (1) BM25F and (2) REP.
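As a rough illustration of how such approaches rank candidate reports by textual similarity, here is a minimal sketch using plain BM25 via the rank_bm25 package; this is a simplification and an assumption, not the thesis's BM25F or REP implementation (BM25F additionally weighs fields such as the summary and description).

```python
# Rough illustration only: rank candidate reports for a new report with
# plain BM25 (the thesis evaluates BM25F and REP, not this simplification).
from rank_bm25 import BM25Okapi

candidates = [
    "firefox crashes when opening a new tab",      # toy report texts
    "browser window freezes on startup",
    "crash after opening a new tab quickly",
]
bm25 = BM25Okapi([text.split() for text in candidates])

new_report = "firefox crash when a new tab is opened".split()
scores = bm25.get_scores(new_report)

# Top-ranked candidates would be shown to the developer as likely masters.
for text, score in sorted(zip(candidates, scores), key=lambda p: -p[1]):
    print(f"{score:.2f}  {text}")
```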

The performance of both approaches is significantly lower for after-JIT duplicates (Firefox).


The relative improvement in the performance of REP over BM25F is significantly larger after the activation of the JIT feature (Firefox).


Thesis Contribution #4: Highlight the impact of the “JIT” feature on the performance evaluation of duplicate retrieval approaches.

[Submitted to EMSE] – under review

Summary


In my thesis, I revisit the experimental design choices from four perspectives


Needed Effort

Data Filtration

Data Changes

Evaluation Process


Thesis Contribution #1: Show the importance of considering the needed effort for retrieving duplicate reports in the performance measurement of automated duplicate retrieval approaches.

[Published in EMSE 2015]


Thesis Contribution #2: Propose a realistic evaluation for the automated retrieval of duplicate reports, and analyze prior findings using such a realistic evaluation.

[Published in TSE 2017]


Thesis Contribution #3: Propose a genetic algorithm based approach for filtering out the old issue reports to improve the performance of duplicate retrieval approaches.

[Published in TSE 2017]


Thesis Contribution #4: Highlight the impact of the “JIT” feature on the performance evaluation of duplicate retrieval approaches.

[Submitted to EMSE] – under review
