Extracting Code Clones for Refactoring Using Combinations of Clone Metrics
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University
Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†, Katsuro Inoue†, and Tateki Sano*
†Osaka University, Japan
‡Nara Institute of Science and Technology, Japan
*NEC Corporation, Japan
Background: Clone Set
A set of code clones that are similar or identical to each other
Clone sets:
S1 = {Code Clone 1, Code Clone 3}
S2 = {Code Clone 2, Code Clone 4, Code Clone 5}
[Figure: Code Clones 1 to 5 grouped into clone sets S1 and S2; the clones within a set are similar or identical to each other]
Background: Refactoring Code Clone
Merge code clones into a single program unit
[Figure: Refactoring. Before: Code Clone 1, Code Clone 2, Code Clone 3. After: Code Clone' 1, Code Clone 2.]
Background: Language-dependent Code Clone
A language-dependent code clone exists unavoidably in source code because of features of the programming language used.
Example of a language-dependent code clone (consecutive setter invocations):
/* Code Clone A */
replacement.setTaskType(taskType);
replacement.setTaskName(taskName);
replacement.setLocation(location);
replacement.setOwningTarget(target);
replacement.setRuntime (wrapper);
wrapper.setProxy(replacement);
/* … */
/* Code Clone B */
def.setName(name);
def.setClassName(classname);
def.setClass(cl);
def.setAdapterClass(adapterClass);
def.setAdaptToClass(adaptToClass);
def.setClassLoader(al);
/* … */
Background: Clone Metrics [Higo2007]
Quantitative information on clone sets
  E.g., LEN(S), RNR(S), POP(S)
Purposes
  To check features of code clones in software
  To extract code clones for several purposes
    E.g., refactoring, defect-prone code clones
[Higo2007] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)
Clone Metrics: LEN(S)
The average length of token sequences of code clones in a clone set S
Clone set S:
A token sequence [c c*] is detected as a code clone from a token sequence <c c* c* a b>
Superscript * indicates that the token is in a repeated token sequence
LEN(S) = 2
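The LEN(S) definition can be sketched in a few lines of Java. This is only an illustration, not the implementation used in the study: the class name `LenMetric`, the method `len`, and the representation of a code clone as a list of token strings are assumptions made here, as is the choice of two clones of the form [c c*] for the example set.

```java
import java.util.List;

final class LenMetric {
    /** LEN(S): average token-sequence length over the code clones in clone set S. */
    static double len(List<List<String>> cloneSet) {
        if (cloneSet.isEmpty()) {
            throw new IllegalArgumentException("clone set must not be empty");
        }
        int totalTokens = 0;
        for (List<String> clone : cloneSet) {
            totalTokens += clone.size();
        }
        return (double) totalTokens / cloneSet.size();
    }

    public static void main(String[] args) {
        // Assumed example set: two clones, each the two-token sequence [c c*].
        List<List<String>> s = List.of(List.of("c", "c*"), List.of("c", "c*"));
        System.out.println(len(s)); // prints 2.0
    }
}
```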
Clone Metrics: RNR(S)
The ratio of non-repeated token sequences of code clones in a clone set S
Clone set S:
A token sequence [c c*] is detected as a code clone from a token sequence <c c* c* a b>
$RNR(S) = \frac{\text{length of non-repeated token sequence}}{\text{length of whole token sequence}} \times 100 = \frac{1}{2} \times 100 = 50$
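For a clone set with more than one code clone, the ratio can be written with sums over the clones. The notation below is an assumption introduced for illustration (the slides do not name these symbols): $\mathrm{LOS}_{\text{non-repeated}}(c)$ is the number of non-repeated tokens in clone $c$, and $\mathrm{LOS}_{\text{whole}}(c)$ is its total number of tokens.

```latex
\[
\mathrm{RNR}(S) =
  \frac{\sum_{c \in S} \mathrm{LOS}_{\text{non-repeated}}(c)}
       {\sum_{c \in S} \mathrm{LOS}_{\text{whole}}(c)} \times 100
\]
```

For the single clone [c c*], one of the two tokens is non-repeated, which gives $\frac{1}{2} \times 100 = 50$.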
Clone Metrics: POP(S)
The number of code clones in a clone set S
Clone set S contains six code clones (numbered 1 to 6)
POP(S) = 6
Single Clone Metric (1/2)
Clone sets whose RNR(S) is high
  They do not organize a single semantic unit
  Semantic unit: many instructions forming a single functionality
/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
else {
    // is the zip file in the cache
    ZipFile zipFile = (ZipFile) zipFiles.get(file);
    if (zipFile == null) {
        zipFile = new ZipFile(file);
        zipFiles.put(file, zipFile);
    }
    ZipEntry entry = zipFile.getEntry(resourceName);
    if (entry != null) {
Only a part of a semantic unit
Not Appropriate for Refactoring!
Single Clone Metric (2/2)
Clone sets whose POP(S) is high
  They include many language-dependent code clones
/* Code Clone in a clone set whose POP(S) is the highest in Ant 1.7.0 */
out.println("\">");
out.println("");
out.print("<!ELEMENT project (target | ");
out.print(TASKS);
out.print(" | ");
out.print(TYPES);
Not Appropriate for Refactoring!
Key Idea
According to our experience, it is not appropriate to extract refactorable code clones using just a single clone metric
We propose a method based on combined clone metrics
  To overcome the weakness of single-metric-based extraction
Combined Clone Metrics
Clone sets whose RNR(S) and POP(S) are high
  Each code clone organizes a single semantic unit
/* Code Clone in a clone set whose RNR(S), POP(S) are higher than others */
if (ifProperty != null && p.getProperty(ifProperty) == null) {
    return false;
} else if (unlessProperty != null && p.getProperty(unlessProperty) != null) {
    return false;
}
return true;
}
Appropriate for Refactoring!
Case Study (1/2)
Goal: validating our key idea
  Using combined clone metrics is a feasible method to extract code clones for refactoring
Target System
  Industrial Java software developed by NEC
  110 KLOC, 736 clone sets
Case Study (2/2)
Experimental Steps
1. Selected 62 clone sets from CCFinder's output using clone metrics.
2. Conducted a survey about these clone sets and got feedback from a developer.
[Figure: Source files → CCFinder → clone sets selected using clone metrics → Survey → Feedback]
Subject Code Clones (1/2)
Clone sets for which a single clone metric value is high
  Clone sets whose LEN(S) value is in the top 10
  Clone sets whose RNR(S) value is in the top 10
  Clone sets whose POP(S) value is in the top 10
Subject Code Clones (2/2)
Clone sets whose combined clone metric values are high
  15 clone sets whose LEN(S) and RNR(S) values rank in the top 15
  7 clone sets whose LEN(S) and POP(S) values rank in the top 15
  18 clone sets whose RNR(S) and POP(S) values rank in the top 15
  1 clone set whose LEN(S), RNR(S), and POP(S) values rank in the top 15
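One way to read "combined clone metrics" is as an intersection of per-metric rankings. The sketch below takes that reading as an assumption: it ranks clone sets by each metric, keeps the best k per metric, and intersects the results. The slides do not spell out the exact cut-off rule, so the rank-based cut-off, the class `CombinedMetricFilter`, the `CloneSet` holder, and the toy values are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.ToDoubleFunction;

final class CombinedMetricFilter {
    /** Hypothetical holder for one clone set's metric values. */
    static final class CloneSet {
        final String id;
        final double len;
        final double rnr;
        final int pop;

        CloneSet(String id, double len, double rnr, int pop) {
            this.id = id;
            this.len = len;
            this.rnr = rnr;
            this.pop = pop;
        }
    }

    /** Ids of the k clone sets with the highest value of the given metric. */
    static Set<String> topK(List<CloneSet> sets, ToDoubleFunction<CloneSet> metric, int k) {
        List<CloneSet> sorted = new ArrayList<>(sets);
        Comparator<CloneSet> byMetric = Comparator.comparingDouble(metric);
        sorted.sort(byMetric.reversed());
        Set<String> ids = new HashSet<>();
        for (CloneSet s : sorted.subList(0, Math.min(k, sorted.size()))) {
            ids.add(s.id);
        }
        return ids;
    }

    /** Ids of clone sets that rank in the top k for both metrics (assumed reading). */
    static Set<String> topKInBoth(List<CloneSet> sets, ToDoubleFunction<CloneSet> first,
                                  ToDoubleFunction<CloneSet> second, int k) {
        Set<String> result = new TreeSet<>(topK(sets, first, k));
        result.retainAll(topK(sets, second, k));
        return result;
    }

    public static void main(String[] args) {
        // Toy data, not taken from the case study.
        List<CloneSet> sets = List.of(
                new CloneSet("A", 50, 90, 2),
                new CloneSet("B", 40, 30, 9),
                new CloneSet("C", 10, 80, 8),
                new CloneSet("D", 5, 20, 3));
        System.out.println(topKInBoth(sets, s -> s.rnr, s -> s.pop, 2)); // prints [C]
    }
}
```

In the toy data, only C is among the two highest RNR(S) values and the two highest POP(S) values, so it is the only clone set that passes the combined filter.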
Results of Case Study (1/2)
#Selected Clone Sets: The number of selected clone sets
#Refactoring: The number of clone sets marked as "Perform refactoring" in the survey

Filtering                   #Selected Clone Sets   #Refactoring   Precision
Each single clone metric    30                     14             0.47
Combined clone metrics      41                     34             0.87
Results of Case Study (2/2)
Precision: "How many refactoring candidates were accepted by a developer?"
$\text{Precision} = \frac{\#\text{Refactoring}}{\#\text{Selected Clone Sets}}$

Filtering                   #Selected Clone Sets   #Refactoring   Precision
Each single clone metric    30                     14             0.47
Combined clone metrics      41                     34             0.87

Clone sets selected by combined clone metrics are accepted more often as refactoring candidates by a developer
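The precision definition is a simple ratio, shown here as a small Java sketch. The class `SurveyPrecision` and its method are hypothetical names chosen for illustration; the input values are the single-metric row of the table (30 selected clone sets, 14 marked "Perform refactoring").

```java
import java.util.Locale;

final class SurveyPrecision {
    /** Precision = #Refactoring / #Selected Clone Sets. */
    static double precision(int refactoring, int selected) {
        if (selected <= 0) {
            throw new IllegalArgumentException("selected must be positive");
        }
        return (double) refactoring / selected;
    }

    public static void main(String[] args) {
        // Single-metric row: 14 of 30 selected clone sets were accepted.
        System.out.println(String.format(Locale.ROOT, "%.2f", precision(14, 30))); // prints 0.47
    }
}
```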
Summary and Future Work
Summary
  Our industrial case study shows that our key idea is appropriate.
Future Work
  Investigate recall
  Conduct case studies of open source software
  Suggest a new metric
Thank You
Clone sets whose RNR(S) is higher than others
Each code clone in a clone set S consists of more non-repeated token sequences
/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */
else {
    // is the zip file in the cache
    ZipFile zipFile = (ZipFile) zipFiles.get(file);
    if (zipFile == null) {
        zipFile = new ZipFile(file);
        zipFiles.put(file, zipFile);
    }
    ZipEntry entry = zipFile.getEntry(resourceName);
    if (entry != null) {
/* … */
Clone sets whose RNR(S) is lower than others
They consist of more repeated token sequences
They are often language-dependent code clones
/* Code Clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */
String sosCmdDir = null;
/* … skip code … */
private String filename = null;
private boolean noCompress = false;
private boolean noCache = false;
private boolean recursive = false;
private boolean verbose = false;
/* … */
Consecutive variable declarations
Survey Format: About Clone set XXX
(1) Do you think that this clone set needs a practice?
    [] Yes
    [] No (→ Jump to the next clone set)
(2) If you marked "Yes" in your answer to (1), what practice is appropriate for this clone set?
    [] Refactoring
    [] Write comments about the code clones, but don't perform refactoring.
    [] Change nothing.
    [] Others. ( )
(3) Write the reason for your answer to (2).
    Reason:
Results and Precision of Each Group of Clone Sets in the Survey

Filtering                                                          #Selected Clone Sets   #Refactoring   Precision
Clone sets whose LEN(S) value is in the top 10                     10                     7              0.70
Clone sets whose RNR(S) value is in the top 10                     10                     4              0.40
Clone sets whose POP(S) value is in the top 10                     10                     3              0.30
Clone sets whose LEN(S) and RNR(S) values rank in the top 15       15                     13             0.87
Clone sets whose LEN(S) and POP(S) values rank in the top 15       7                      6              0.86
Clone sets whose RNR(S) and POP(S) values rank in the top 15       18                     14             0.78
Clone set whose LEN(S), RNR(S), and POP(S) values rank in the top 15   1                  1              1.00
Clone metric: RNR(S) (1/2)
Files:
F1: a b c a b
F2: c c* c* a b
F3: d a b e f
F4: c c* d e f
Superscript * indicates that the token is in a repeated token sequence
Clone set S1: {a b, a b, a b, a b}
RNR(S1) of clone set S1 is
$RNR(S1) = \frac{2 + 2 + 2 + 2}{2 + 2 + 2 + 2} \times 100 = 100$
Clone metric: RNR(S) (2/2)
Files:
F1: a b c a b
F2: c c* c* a b
F3: d a b e f
F4: c c* d e f
Superscript * indicates that the token is in a repeated token sequence
Clone set S2: {c c*, c* c*, c c*}
RNR(S2) of clone set S2 is
$RNR(S2) = \frac{1 + 0 + 1}{2 + 2 + 2} \times 100 = 33.3$
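The two worked RNR examples can be checked with a short Java sketch. It assumes a code clone is given as a list of token strings and that a trailing `*` marks a token belonging to a repeated token sequence; the class `RnrMetric` and its methods are hypothetical names, not part of any tool mentioned in the slides.

```java
import java.util.List;
import java.util.Locale;

final class RnrMetric {
    /** Assumed convention: a trailing '*' marks a token in a repeated token sequence. */
    static boolean isRepeated(String token) {
        return token.endsWith("*");
    }

    /** RNR(S): share of non-repeated tokens over all clones in S, as a percentage. */
    static double rnr(List<List<String>> cloneSet) {
        int nonRepeated = 0;
        int whole = 0;
        for (List<String> clone : cloneSet) {
            for (String token : clone) {
                whole++;
                if (!isRepeated(token)) {
                    nonRepeated++;
                }
            }
        }
        if (whole == 0) {
            throw new IllegalArgumentException("clone set contains no tokens");
        }
        return 100.0 * nonRepeated / whole;
    }

    public static void main(String[] args) {
        // S1: four occurrences of "a b", none of them repeated tokens.
        List<List<String>> s1 = List.of(
                List.of("a", "b"), List.of("a", "b"), List.of("a", "b"), List.of("a", "b"));
        // S2: {c c*, c* c*, c c*}
        List<List<String>> s2 = List.of(
                List.of("c", "c*"), List.of("c*", "c*"), List.of("c", "c*"));
        System.out.println(String.format(Locale.ROOT, "%.1f", rnr(s1))); // prints 100.0
        System.out.println(String.format(Locale.ROOT, "%.1f", rnr(s2))); // prints 33.3
    }
}
```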
Subject Code Clones
62 clone sets
Clone sets whose individual clone metric value is high
  S_LEN: clone sets whose LEN(S) value is in the top 10
  S_RNR: clone sets whose RNR(S) value is in the top 10
  S_POP: clone sets whose POP(S) value is in the top 10
Clone sets whose combined clone metric values are high
  S_LEN·RNR: 15 clone sets whose LEN(S) and RNR(S) values rank in the top 15
  S_LEN·POP: 7 clone sets whose LEN(S) and POP(S) values rank in the top 15
  S_RNR·POP: 18 clone sets whose RNR(S) and POP(S) values rank in the top 15
  S_LEN·RNR·POP: 1 clone set whose LEN(S), RNR(S), and POP(S) values rank in the top 15
The Number of Duplicate Clone Sets
|S_RNR ∩ S_POP ∩ S_RNR·POP| = 1
|S_RNR ∩ S_RNR·POP| = 2
|S_POP ∩ S_RNR·POP| = 2
|S_LEN·RNR ∩ S_LEN·POP ∩ S_RNR·POP ∩ S_LEN·RNR·POP| = 1
CS Seminar, 2010/12/01
Example of a clone set that is not selected…
It is too short to organize a semantic unit.
The RNR metric sometimes extracts unintended code clones
  E.g., language-dependent code clones
boolean isEqual(final DeweyDecimal other) {
    final int max = Math.max(other.components.length, components.length);
    for (int i = 0; i < max; i++) {
        final int component1 = (i < components.length) ? components[i] : 0;
        final int component2 = (i < other.components.length) ? other.components[i] : 0;
        if (