WCRE11b.ppt

Requirements Traceability for Object Oriented Systems by Partitioning Source Code
WCRE 2011, Limerick, Ireland
Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

Transcript of WCRE11b.ppt

Page 1: WCRE11b.ppt

Requirements Traceability for Object Oriented Systems by Partitioning Source Code

WCRE 2011, Limerick, Ireland

Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol

Page 2: WCRE11b.ppt

Requirements Traceability

Requirements traceability is defined as “the ability to describe and follow the life of a requirement, in both a forwards and backwards direction” [Gotel, 1994]

Page 3: WCRE11b.ppt

What’s Requirements Traceability Good For?

• Program comprehension

• Discover what code must change to handle a new requirement

• Aid in determining whether a specification is completely implemented

Page 4: WCRE11b.ppt

IR-based Approaches

• Vector Space Model (Antoniol et al. 2002)

• Latent Semantic Indexing (Marcus and Maletic, 2003)

• Jensen Shannon Divergence (Abadi et al. 2008)

• Latent Dirichlet Allocation (Asuncion, 2010)


Page 5: WCRE11b.ppt

Problem in IR-based Approaches

[Slide figure: a requirement and its candidate links to source code]

Page 6: WCRE11b.ppt

Goal

• Reduce manual effort required to verify false-positive links

• Increase F-measure

Page 7: WCRE11b.ppt

Coparvo - COde PARtitioning and VOting

1. Partitioning source code

2. Defining experts

3. Link recovery and expert voting

Page 8: WCRE11b.ppt

Partitioning Source Code

• Class Name

• Method Name

• Variable Name

• Comments
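A toy sketch of the partitioning step, assuming Java source text and simple regular expressions; the function name `partition_source` and the regexes are illustrative stand-ins, since a real implementation would use a proper parser:

```python
import re

def partition_source(java_src: str) -> dict[str, list[str]]:
    """Split Java source text into Coparvo's four partitions.
    Regex-based toy extractor; real tools would parse the code."""
    return {
        "class_names": re.findall(r"\bclass\s+(\w+)", java_src),
        "method_names": re.findall(r"\b(?:void|int|String)\s+(\w+)\s*\(", java_src),
        "variable_names": re.findall(r"\b(?:int|String|boolean)\s+(\w+)\s*[=;]", java_src),
        "comments": re.findall(r"//\s*(.*)", java_src),
    }

src = 'class MailStore { // stores pop3 mail\n  String host = "x";\n  void connect() {} }'
parts = partition_source(src)
```

Each of the four resulting term lists becomes one source of information (one "expert") in the following steps.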

Page 9: WCRE11b.ppt

Defining Experts

All class names (Class Name A, B, C, D) are merged into a single document of class names.

The same step is performed for method names, variable names, comments, and requirements.

Page 10: WCRE11b.ppt

Defining Experts (Cont.)

Merged Class Names Merged Requirements------------------------------------

Requirement 1

Requirement 1

Merged Method Names

20%

70%

10WCRE 2011

Requirement 1

……….

……

Requirement N

Merged Variable Names

Merged Comments

40%

60%

Page 11: WCRE11b.ppt

Defining Experts (Cont.)

Ranking the partitions by similarity to the requirements:

• Method Names: 70%
• Comments: 60%
• Variable Names: 40%
• Class Names: 20%

Extreme cases:

• 5% difference between two experts
• 95% difference between two experts

Page 12: WCRE11b.ppt

Link Recovery and Expert Voting

Each partition of Class A (its class name, method names, and comments) is compared against each requirement, e.g. “Email client must support pop3 integration …”; each partition votes on the candidate link.
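A minimal sketch of the expert-voting idea, assuming each partition expert produces a similarity score for a candidate link and the link is accepted when a quorum of experts agrees; the 0.5 threshold and two-vote quorum are illustrative assumptions, not the talk's exact settings:

```python
def vote(expert_scores: dict[str, float],
         threshold: float = 0.5, min_votes: int = 2) -> bool:
    """Accept a candidate requirement-to-class link if at least
    `min_votes` partition experts score it above `threshold`."""
    votes = sum(1 for s in expert_scores.values() if s >= threshold)
    return votes >= min_votes

# Scores for one candidate link, one per partition expert:
scores = {"class_name": 0.1, "method_names": 0.7,
          "variable_names": 0.6, "comments": 0.2}
```

Here `vote(scores)` accepts the link because two experts (method names, variable names) score above the threshold.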

Page 13: WCRE11b.ppt

Case Studies

• Goal: Investigate the effectiveness of Coparvo in improving the accuracy of VSM and reducing the effort required to manually discard false-positive links

• Quality focus: Ability to recover traceability links between requirements and source code

• Context: Recovering requirements traceability links of three open-source programs: Pooka, SIP, and iTrust

Page 14: WCRE11b.ppt

Research Questions

RQ1: How does Coparvo help to find valuable partitions of source code for recovering traceability links?

RQ2: How much does Coparvo reduce the effort required to manually verify recovered traceability links?

RQ3: How does the F-measure of the traceability links recovered by Coparvo compare with that of a traditional VSM-based approach?

Page 15: WCRE11b.ppt

Datasets

• SIP Communicator: Voice over IP and instant messenger

• Pooka: An email client

• iTrust: A medical application

                      Pooka     SIP Communicator   iTrust
  Version             2.0       1.0                10
  Number of Classes   298       1,771              526
  Number of Methods   20,868    31,502             3,404
  LOC                 244K      487K               19K

Page 16: WCRE11b.ppt

IR Quality Measures

F-measure is the harmonic mean of precision and recall:

F = (2 × Precision × Recall) / (Precision + Recall)
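The formula translates directly into code; a minimal sketch:

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, `f_measure(0.2, 0.8)` gives 0.32: the harmonic mean penalizes an imbalance between precision and recall, unlike the arithmetic mean.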

Page 17: WCRE11b.ppt

Source Code Partitions

1. Class name

2. Method name

3. Variable name

4. Comments

Page 18: WCRE11b.ppt

Text Preprocessing

• Filter non-alphanumeric tokens (#43@$)

• Remove stop words (the, is, an, …)

• Stem words (attachment, attached -> attach)

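The three steps can be sketched as a small pipeline; the stop-word set and suffix-stripping rules below are illustrative stand-ins (a real pipeline would use a full stop-word list and a proper Porter stemmer, e.g. nltk.stem.PorterStemmer):

```python
import re

STOP_WORDS = {"the", "is", "an", "a", "of", "to"}  # illustrative subset

def crude_stem(word: str) -> str:
    # Toy suffix stripping for illustration only.
    for suffix in ("ments", "ment", "ed", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text: str) -> list[str]:
    tokens = re.findall(r"[a-zA-Z]+", text.lower())  # filters #43@$ etc.
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]
```

For example, `preprocess("The attachment is attached #43@$")` reduces both "attachment" and "attached" to the shared stem "attach", so they match during retrieval.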

Page 19: WCRE11b.ppt

Information Retrieval (IR) Methods

• Vector Space Model (VSM)

– Each document, d, is represented by a vector of ranks of the terms in the vocabulary:

  vd = [rd(w1), rd(w2), …, rd(w|V|)]

– The query is similarly represented by a vector

– The similarity between the query and a document is the cosine of the angle between their respective vectors
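A minimal sketch of the similarity computation, assuming raw term frequencies stand in for the term "ranks" (real VSM implementations typically use tf-idf weights):

```python
import math
from collections import Counter

def cosine_similarity(doc: str, query: str) -> float:
    """Cosine of the angle between term-frequency vectors."""
    d = Counter(doc.lower().split())
    q = Counter(query.lower().split())
    dot = sum(d[t] * q[t] for t in d)
    nd = math.sqrt(sum(v * v for v in d.values()))
    nq = math.sqrt(sum(v * v for v in q.values()))
    return dot / (nd * nq) if nd and nq else 0.0
```

Identical texts score 1.0 and texts with no shared terms score 0.0; candidate links are then ranked by this score.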

Page 20: WCRE11b.ppt

Defining Expert

[Bar chart (y-axis 0–60): scores of the four partitions — CN (class names), MN (method names), VN (variable names), Cmt (comments) — for Pooka, SIP, and iTrust]

Page 21: WCRE11b.ppt

Pooka Results


Page 22: WCRE11b.ppt

SIP Comm. Results


Page 23: WCRE11b.ppt

iTrust Results


Page 24: WCRE11b.ppt

Voting vs. Combination

• Can we use only different combinations of source code partitions to create requirements traceability links?

• How much does a combination of source code partitions improve the F-measure?

Page 25: WCRE11b.ppt

Pooka Results


Page 26: WCRE11b.ppt

SIP Comm. Results


Page 27: WCRE11b.ppt

iTrust Results


Page 28: WCRE11b.ppt

Statistical Tests

Non-parametric test: the Mann-Whitney test

  F-measure   Pooka    SIP Comm.   iTrust
  p-value     p<0.01   p<0.01      p<0.01
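For illustration, the U statistic underlying the Mann-Whitney test can be computed directly; this sketch only counts winning pairs (ties as half), while the p-values on the slide would come from a statistics library such as scipy.stats.mannwhitneyu:

```python
def mann_whitney_u(xs, ys):
    """U statistic for sample xs against ys: number of pairs
    where x beats y, counting ties as half a win."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x in xs for y in ys)
```

Comparing two score samples this way makes no normality assumption, which is why a non-parametric test suits F-measure distributions.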

Page 29: WCRE11b.ppt

Effort Analysis

[Bar chart (y-axis 0–90,000): manual verification effort for VSM vs. Coparvo on Pooka, SIP Comm., and iTrust]

Page 30: WCRE11b.ppt

Effort Analysis (F-Measure)

[Bar chart (y-axis 0–14): F-measure for VSM vs. Coparvo on Pooka, SIP Comm., and iTrust]

Page 31: WCRE11b.ppt

RQ Answers

RQ1: Combinations or single source-code partitions also sometimes provide better results than Coparvo

RQ2: Using different sources of information reduces experts’ effort by up to 83%

RQ3: Partitioning source code and using the partitions as experts for voting yields better accuracy

Page 32: WCRE11b.ppt

Threats to Validity

• External validity:

• We analyzed only three systems

• Different source code size

• Construct validity:

• The two researchers built both oracles

• Oracles were validated by the other two experts

• iTrust oracle was developed by developer(s)

• Conclusion validity: Non-parametric test

• The tool is available online at www.factrace.net

Page 33: WCRE11b.ppt

Ongoing work

• More IR approaches

• Empirical study

• Threshold

Page 34: WCRE11b.ppt

Questions?
