WCRE11b.ppt

Requirements Traceability for Object Oriented Systems by Partitioning Source Code
WCRE 2011, Limerick, Ireland
Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

Transcript of WCRE11b.ppt

Page 1: WCRE11b.ppt

Requirements Traceability for Object Oriented Systems by Partitioning Source Code

WCRE 2011, Limerick, Ireland

Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

Page 2: WCRE11b.ppt

Requirements Traceability

Requirements traceability is defined as “the ability to describe and follow the life of a requirement, in both a forwards and backwards direction” [Gotel, 1994]

Page 3: WCRE11b.ppt

What’s Requirements Traceability Good For?

• Program comprehension

• Discover what code must change to handle a new requirement

• Aid in determining whether a specification is completely implemented

Page 4: WCRE11b.ppt

IR-based Approaches

• Vector Space Model (Antoniol et al. 2002)

• Latent Semantic Indexing (Marcus and Maletic, 2003)

• Jensen Shannon Divergence (Abadi et al. 2008)

• Latent Dirichlet Allocation (Asuncion, 2010)


Page 5: WCRE11b.ppt

Problem in IR-based Approaches

[Slide figure: a requirement and its candidate links to source code]

Page 6: WCRE11b.ppt

Goal

• Reduce manual effort required to verify false-positive links

• Increase F-measure

Page 7: WCRE11b.ppt

Coparvo - COde PARtitioning and VOting

1. Partitioning source code

2. Defining experts

3. Link recovery and expert voting

Page 8: WCRE11b.ppt

Partitioning Source Code

• Class Name

• Method Name

• Variable Name

• Comments
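A toy sketch of the partitioning step, assuming Java source text and simple regular expressions; the function name `partition_source` and the regexes are illustrative stand-ins, since a real implementation would use a proper parser:

```python
import re

def partition_source(java_src: str) -> dict[str, list[str]]:
    """Split Java source text into Coparvo's four partitions.
    Regex-based toy extractor; real tools would parse the code."""
    return {
        "class_names": re.findall(r"\bclass\s+(\w+)", java_src),
        "method_names": re.findall(r"\b(?:void|int|String)\s+(\w+)\s*\(", java_src),
        "variable_names": re.findall(r"\b(?:int|String|boolean)\s+(\w+)\s*[=;]", java_src),
        "comments": re.findall(r"//\s*(.*)", java_src),
    }

src = 'class MailStore { // stores pop3 mail\n  String host = "x";\n  void connect() {} }'
parts = partition_source(src)
```

Each of the four resulting term lists becomes one source of information (one "expert") in the following steps.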

Page 9: WCRE11b.ppt

Defining Experts

All class names (Class Name A, B, C, D) are merged into a single document of class names.

The same step is performed for method names, variable names, comments, and requirements.

Page 10: WCRE11b.ppt

Defining Experts (Cont.)

Merged Class Names Merged Requirements------------------------------------

Requirement 1

Requirement 1

Merged Method Names

20%

70%

10WCRE 2011

Requirement 1

……….

……

Requirement N

Merged Variable Names

Merged Comments

40%

60%

Page 11: WCRE11b.ppt

Defining Experts (Cont.)

Ranking the partitions by similarity to the requirements:

• Method Names: 70%
• Comments: 60%
• Variable Names: 40%
• Class Names: 20%

Extreme cases:

• 5% difference between two experts
• 95% difference between two experts

Page 12: WCRE11b.ppt

Link Recovery and Expert Voting

Each partition of Class A (its class name, method names, and comments) is compared against each requirement, e.g. “Email client must support pop3 integration …”; each partition votes on the candidate link.
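A minimal sketch of the expert-voting idea, assuming each partition expert produces a similarity score for a candidate link and the link is accepted when a quorum of experts agrees; the 0.5 threshold and two-vote quorum are illustrative assumptions, not the talk's exact settings:

```python
def vote(expert_scores: dict[str, float],
         threshold: float = 0.5, min_votes: int = 2) -> bool:
    """Accept a candidate requirement-to-class link if at least
    `min_votes` partition experts score it above `threshold`."""
    votes = sum(1 for s in expert_scores.values() if s >= threshold)
    return votes >= min_votes

# Scores for one candidate link, one per partition expert:
scores = {"class_name": 0.1, "method_names": 0.7,
          "variable_names": 0.6, "comments": 0.2}
```

Here `vote(scores)` accepts the link because two experts (method names, variable names) score above the threshold.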

Page 13: WCRE11b.ppt

Case Studies

• Goal: Investigate the effectiveness of Coparvo in improving the accuracy of VSM and reducing the effort required to manually discard false-positive links

• Quality focus: Ability to recover traceability links between requirements and source code

• Context: Recovering requirements traceability links of three open-source programs: Pooka, SIP, and iTrust

Page 14: WCRE11b.ppt

Research Questions

RQ1: How does Coparvo help to find valuable partitions of source code for recovering traceability links?

RQ2: How much does Coparvo reduce the effort required to manually verify recovered traceability links?

RQ3: How does the F-measure of the traceability links recovered by Coparvo compare with that of a traditional VSM-based approach?

Page 15: WCRE11b.ppt

Datasets

• SIP Communicator: Voice over IP and instant messenger

• Pooka: An email client

• iTrust: A medical application

                      Pooka     SIP Communicator   iTrust
  Version             2.0       1.0                10
  Number of Classes   298       1,771              526
  Number of Methods   20,868    31,502             3,404
  LOC                 244K      487K               19K

Page 16: WCRE11b.ppt

IR Quality Measures

F-measure is the harmonic mean of precision and recall:

F = (2 × Precision × Recall) / (Precision + Recall)
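The formula translates directly into code; a minimal sketch:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f_measure(0.2, 0.8)` gives 0.32: the harmonic mean penalizes an imbalance between precision and recall, unlike the arithmetic mean.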

Page 17: WCRE11b.ppt

Source Code Partitions

1. Class name

2. Method name

3. Variable name

4. Comments

Page 18: WCRE11b.ppt

Text Preprocessing

• Filter non-alphanumeric tokens (#43@$)

• Remove stop words (the, is, an, …)

• Stem words (attachment, attached -> attach)

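The three steps can be sketched as a small pipeline; the stop-word set and suffix-stripping rules below are illustrative stand-ins (a real pipeline would use a full stop-word list and a proper Porter stemmer, e.g. nltk.stem.PorterStemmer):

```python
import re

STOP_WORDS = {"the", "is", "an", "a", "of", "to"}  # illustrative subset

def crude_stem(word: str) -> str:
    # Toy suffix stripping for illustration only.
    for suffix in ("ments", "ment", "ed", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"[a-zA-Z]+", text.lower())  # filters #43@$ etc.
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The attachment is attached #43@$")` reduces both "attachment" and "attached" to the shared stem "attach", so they match during retrieval.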

Page 19: WCRE11b.ppt

Information Retrieval (IR) Methods

• Vector Space Model (VSM)

– Each document, d, is represented by a vector of ranks of the terms in the vocabulary:

  vd = [rd(w1), rd(w2), …, rd(w|V|)]

– The query is similarly represented by a vector

– The similarity between the query and a document is the cosine of the angle between their respective vectors
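A minimal sketch of the similarity computation, assuming raw term frequencies stand in for the term "ranks" (real VSM implementations typically use tf-idf weights):

```python
import math
from collections import Counter

def cosine_similarity(doc: str, query: str) -> float:
    """Cosine of the angle between term-frequency vectors."""
    d = Counter(doc.lower().split())
    q = Counter(query.lower().split())
    dot = sum(d[t] * q[t] for t in d)
    nd = math.sqrt(sum(v * v for v in d.values()))
    nq = math.sqrt(sum(v * v for v in q.values()))
    return dot / (nd * nq) if nd and nq else 0.0
```

Identical texts score 1.0 and texts with no shared terms score 0.0; candidate links are then ranked by this score.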

Page 20: WCRE11b.ppt

Defining Expert

[Bar chart (y-axis 0–60): scores of the four partitions — CN (class names), MN (method names), VN (variable names), Cmt (comments) — for Pooka, SIP, and iTrust]

Page 21: WCRE11b.ppt

Pooka Results


Page 22: WCRE11b.ppt

SIP Comm. Results


Page 23: WCRE11b.ppt

iTrust Results


Page 24: WCRE11b.ppt

Voting vs. Combination

• Can we use only different combinations of source code partitions to create requirements traceability links?

• How much does a combination of source code partitions improve the F-measure?

Page 25: WCRE11b.ppt

Pooka Results


Page 26: WCRE11b.ppt

SIP Comm. Results


Page 27: WCRE11b.ppt

iTrust Results


Page 28: WCRE11b.ppt

Statistical Tests

Non-parametric test: the Mann-Whitney test

  F-measure   Pooka    SIP Comm.   iTrust
  p-value     p<0.01   p<0.01      p<0.01
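For illustration, the U statistic underlying the Mann-Whitney test can be computed directly; this sketch only counts winning pairs (ties as half), while the p-values on the slide would come from a statistics library such as scipy.stats.mannwhitneyu:

```python
def mann_whitney_u(xs, ys):
    """U statistic for sample xs against ys: number of pairs
    where x beats y, counting ties as half a win."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)
```

Comparing two score samples this way makes no normality assumption, which is why a non-parametric test suits F-measure distributions.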

Page 29: WCRE11b.ppt

Effort Analysis

[Bar chart (y-axis 0–90,000): manual verification effort for VSM vs. Coparvo on Pooka, SIP Comm., and iTrust]

Page 30: WCRE11b.ppt

Effort Analysis (F-Measure)

[Bar chart (y-axis 0–14): F-measure for VSM vs. Coparvo on Pooka, SIP Comm., and iTrust]

Page 31: WCRE11b.ppt

RQ Answers

RQ1: Combinations or single source-code partitions also sometimes provide better results than Coparvo

RQ2: Using different sources of information reduces experts’ effort by up to 83%

RQ3: Partitioning source code and using the partitions as experts for voting yields better accuracy

Page 32: WCRE11b.ppt

Threats to Validity

• External validity:

• We analyzed only three systems

• Different source code size

• Construct validity:

• The two researchers built both oracles

• Oracles were validated by the other two experts

• iTrust oracle was developed by developer(s)

• Conclusion validity: Non-parametric test

• The tool is available online at www.factrace.net

Page 33: WCRE11b.ppt

Ongoing work

• More IR approaches

• Empirical study

• Threshold

Page 34: WCRE11b.ppt

Questions?
