Post on 21-Jun-2015
Requirements Traceability for Object Oriented Systems by Partitioning Source Code
WCRE 2011, Limerick, Ireland
Nasir Ali, Yann-Gaël Guéhéneuc, and Giuliano Antoniol
Requirements Traceability
Requirements traceability is defined as "the ability to describe and follow the life of a requirement, in both a forwards and backwards direction" [Gotel, 1994]
What’s Requirements Traceability Good For?
• Program comprehension
• Discover what code must change to handle a new requirement
• Aid in determining whether a specification is completely implemented
IR-based Approaches
• Vector Space Model (Antoniol et al. 2002)
• Latent Semantic Indexing (Marcus and Maletic, 2003)
• Jensen Shannon Divergence (Abadi et al. 2008)
• Latent Dirichlet Allocation (Asuncion, 2010)
Problem in IR-based Approaches
[Figure: a requirement linked to source-code elements, including false-positive links]
Goal
• Reduce manual effort required to verify false-positive links
• Increase F-measure
Coparvo - COde PARtitioning and VOting
1. Partitioning source code
2. Defining experts
3. Link recovery and expert voting
Partitioning Source Code
• Class Name
• Method Name
• Variable Name
• Comments
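As a sketch of this first step, the four partitions can be extracted from a Java source file with simple pattern matching. The regexes below are illustrative assumptions for a minimal example, not the extractor used in the paper:

```python
import re

def partition_source(java_source):
    # Split one source file into Coparvo's four partitions:
    # class names, method names, variable names, and comments.
    comments = re.findall(r'//[^\n]*|/\*.*?\*/', java_source, re.DOTALL)
    class_names = re.findall(r'\bclass\s+(\w+)', java_source)
    method_names = re.findall(r'\b\w+\s+(\w+)\s*\(', java_source)
    # Simplified: only a few primitive/common types are recognized.
    variable_names = re.findall(
        r'\b(?:int|long|double|boolean|String)\s+(\w+)\s*[=;]', java_source)
    return {
        "class_names": class_names,
        "method_names": method_names,
        "variable_names": variable_names,
        "comments": comments,
    }

src = """
/* Handles POP3 mail retrieval. */
class Pop3Client {
    String serverHost;
    int serverPort = 110;
    void fetchMessages() { /* download new mail */ }
}
"""
parts = partition_source(src)
```

Each resulting partition is treated as a separate text corpus in the steps that follow.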
Defining Experts
[Figure: the class names of all classes (Class Name A, B, C, D) are merged into a single "Merged Class Names" document]
The same step is performed for method names, variable names, comments, and requirements
Defining Experts (Cont.)
[Figure: each merged partition (class names, method names, variable names, comments) is compared against the merged requirements (Requirement 1 … Requirement N); example similarity scores: 20%, 70%, 40%, 60%]
Defining Experts (Cont.)
[Figure: partitions ranked by similarity to the requirements: Method Name 70%, Comments 60%, Variable Names 40%, Class Names 20%]
Extreme Cases:
• 5% difference between two experts
• 95% difference between two experts
Link Recovery and Expert Voting
[Figure: Class A, the method names of Class A, and the comments of Class A each vote on a link to the requirement "Email client must support pop3 integration…"]
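A minimal sketch of the voting step: each partition acts as an expert that independently scores a candidate requirement-to-class link, and the link is kept only if enough experts agree. The partition names, similarity scores, and thresholds below are illustrative assumptions, not values from the paper:

```python
def vote_on_link(similarities, thresholds, min_votes=2):
    """Accept a candidate link if at least `min_votes` experts
    report a similarity at or above their own threshold.

    similarities / thresholds: dicts keyed by partition name.
    """
    votes = sum(
        1 for expert, sim in similarities.items()
        if sim >= thresholds.get(expert, 0.0)
    )
    return votes >= min_votes

# Hypothetical scores of three experts for one candidate link.
sims = {"class_names": 0.42, "method_names": 0.37, "comments": 0.05}
thresh = {"class_names": 0.30, "method_names": 0.30, "comments": 0.30}
accepted = vote_on_link(sims, thresh)  # two of three experts vote yes
```

The effect is that a link proposed by only one noisy partition (e.g. comments) can be outvoted, which is how false positives get filtered before manual verification.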
Case Studies
• Goal: Investigate the effectiveness of Coparvo in improving the accuracy of VSM and reducing the effort required to manually discard false-positive links
• Quality focus: Ability to recover traceability links between requirements and source code
• Context: Recovering requirements traceability links of three open-source programs: Pooka, SIP Communicator, and iTrust
Research Questions
RQ1: How does Coparvo help to find valuable partitions of source code that help in recovering traceability links?
RQ2: How much does Coparvo reduce the effort required to manually verify recovered traceability links?
RQ3: How does the F-measure of the traceability links recovered by Coparvo compare with that of a traditional VSM-based approach?
Datasets
• SIP Communicator: Voice over IP and instant messenger
• Pooka: An email client
• iTrust: Medical application

                     Pooka     SIP Communicator   iTrust
Version              2.0       1.0                10
Number of Classes    298       1,771              526
Number of Methods    20,868    31,502             3,404
LOC                  244K      487K               19K
IR Quality Measures

F-measure = (2 × Precision × Recall) / (Precision + Recall)
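The F-measure above is the harmonic mean of precision and recall; as a quick sketch:

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall (F1 score).
    # Returns 0.0 when both are zero to avoid division by zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Because it is a harmonic mean, a low value on either side drags the score down: for example, perfect precision with zero recall still yields an F-measure of 0.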
Source Code Partitions
1. Class name
2. Method name
3. Variable name
4. Comments
Text Preprocessing
• Filter (#43@$)
• Stop words (the, is, an, …)
• Stemmer (attachment, attached -> attach)
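The three preprocessing steps can be sketched as a small pipeline. The tiny stop list and naive suffix-stripping stemmer below are stand-ins for a real stop-word list and a Porter stemmer:

```python
import re

STOP_WORDS = {"the", "is", "an", "a", "of"}

def stem(word):
    # Naive suffix stripping; a real system would use a Porter stemmer.
    for suffix in ("ment", "ed", "ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = re.findall(r"[a-z]+", text.lower())      # filter #43@$ etc.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [stem(t) for t in tokens]

preprocess("The attachment is attached #43@$")
# both "attachment" and "attached" reduce to "attach"
```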
Information Retrieval (IR) Methods
• Vector Space Model (VSM)
  – Each document, d, is represented by a vector of ranks of the terms in the vocabulary: vd = [rd(w1), rd(w2), …, rd(w|V|)]
  – The query is similarly represented by a vector
  – The similarity between the query and document is the cosine of the angle between their respective vectors
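The cosine step can be sketched directly. For simplicity this example uses raw term counts as the term weights; a real system would typically use tf-idf, but the similarity computation is the same:

```python
import math
from collections import Counter

def cosine(vec_a, vec_b):
    # Cosine of the angle between two sparse term vectors (dicts).
    terms = set(vec_a) | set(vec_b)
    dot = sum(vec_a.get(t, 0) * vec_b.get(t, 0) for t in terms)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical document (source-code terms) and query (a requirement).
doc = Counter("pop3 client fetch mail pop3".split())
query = Counter("email client must support pop3".split())
score = cosine(doc, query)
```

Identical vectors score 1.0, vectors with no shared terms score 0.0, and candidate links are ranked by this score.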
Defining Expert
[Bar chart (0–60%): similarity of each partition — class names (CN), method names (MN), variable names (VN), and comments (Cmt) — with the requirements of Pooka, SIP, and iTrust]
Pooka Results
SIP Comm. Results
iTrust Results
Voting vs. Combination
• Can we use only different combinations of source code partitions to create requirements traceability links?
• How much does a combination of source code partitions improve the F-measure?
Pooka Results
SIP Comm. Results
iTrust Results
Statistical Tests
Non-parametric test – Mann-Whitney test

F-measure   Pooka     SIP Comm.   iTrust
P-value     p<0.01    p<0.01      p<0.01
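As a sketch of the test itself, the Mann-Whitney U statistic counts how often values from one sample exceed values from the other (counting ties as half). The F-measure samples below are made-up numbers for illustration, not the paper's data; a real analysis would use `scipy.stats.mannwhitneyu`, which also computes the p-value:

```python
def mann_whitney_u(sample_a, sample_b):
    # U = number of pairs (a, b) with a > b, plus 0.5 per tie.
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical per-link F-measure samples for the two approaches.
coparvo_f = [0.62, 0.58, 0.71, 0.66, 0.60]
vsm_f = [0.41, 0.39, 0.45, 0.44, 0.38]
u = mann_whitney_u(coparvo_f, vsm_f)
```

When every value in one sample exceeds every value in the other, U reaches its maximum (here 5 × 5 = 25), the strongest possible evidence of a difference between the two groups.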
Effort Analysis
[Bar chart (0–90,000): manual verification effort for VSM vs. Coparvo on Pooka, SIP Comm., and iTrust]
Effort Analysis (F-Measure)
[Bar chart (0–14): F-measure for VSM vs. Coparvo on Pooka, SIP Comm., and iTrust]
RQ Answers
RQ1: Combinations or single source-code partitions also sometimes provide better results than Coparvo
RQ2: Using different sources of information reduces experts' effort by up to 83%
RQ3: Partitioning source code and using the partitions as experts for voting yields better accuracy
Threats to Validity
• External validity:
• We analyzed only three systems
• Different source code size
• Construct validity:
• Two of the researchers built the oracles
• The oracles were validated by the other two experts
• The iTrust oracle was developed by its developer(s)
• Conclusion validity: Non-parametric test
• Tool is online at www.factrace.net
Ongoing work
• More IR approaches
• Empirical study
• Threshold
Questions?