Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

29
Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University Extracting Code Clones for Refactoring Using Combinations of Clone Metrics 1 †Osaka University, Japan itute of Science and Technology , Japan *NEC Corporation, Japan jong Choi †, Norihiro Yoshida‡, Takashi Ishio†, Katsuro Inoue†, and Tateki Sano*

description

Extracting Code Clones for Refactoring Using Combinations of Clone Metrics. Eunjong Choi † , Norihiro Yoshida ‡ , Takashi Ishio † , Katsuro Inoue † , and Tateki Sano*. †Osaka University, Japan ‡ Nara Institute of Science and Technology , Japan *NEC Corporation, Japan. Background: Clone Set. - PowerPoint PPT Presentation

Transcript of Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

Page 1: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Extracting Code Clones for Refactoring Using Combinations of Clone Metrics

1

†Osaka University, Japan ‡Nara Institute of Science and Technology , Japan

*NEC Corporation, Japan

Eunjong Choi†, Norihiro Yoshida‡, Takashi Ishio†,Katsuro Inoue†, and Tateki Sano*

Page 2: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Background: Clone Set

A set of code clones that is similar or identical to each other

2

Clone Set: S1={Code Clone 1, Code Clone 3}

S2={Code Clone 2, Code Clone 4, Code Clone 5}

Code Clone 4Code Clone 4

Code Clone 5Code Clone 5Code Clone 3Code Clone 3

Code Clone 2Code Clone 2

Code Clone 1Code Clone 1 similar

identical

Page 3: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Background: Refactoring Code Clone

Merge code clones into a single program unit

3

Refactoring

Code Clone 3Code Clone 3

Code Clone 2Code Clone 2

Code Clone 1Code Clone 1

Code Clone 2Code Clone 2

Code Clone’ 1Code Clone’ 1

Page 4: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */ else { // is the zip file in the cachefile); if == null) { (file);; }

4

Background: Language-dependent Code Clone

It is unavoidable to exist in source code because of features of the used program language.

/* Code Clone A */ replacement.setTaskType(taskType);replacement.setTaskName(taskName);replacement.setLocation(location);replacement.setOwningTarget(target);replacement.setRuntime (wrapper);wrapper.setProxy(replacement);/* … */

/* Code Clone B */def.setName(name);def.setClassName(classname);def.setClass(cl); def.setAdapterClass(adapterClass); def.setAdaptToClass(adaptToClass); def.setClassLoader(al);/* … */

Example of the language-dependent code clone(Consecutive setter invocations)

Page 5: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Background: Clone Metrics [Higo2007]

Quantitative information on clone setsE.g., LEN(S), RNR(S), POP(S)

PurposesTo check features of code clones in software To extract code clones for several purposes

E.g., refactoring, defect-prone code clones

5

[Higo2007] Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue, "Method and Implementation for Investigating Code Clones in a Software System", Information and Software Technology, pp. 985-998 (2007-9)

Page 6: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clone Metrics: LEN(S)

The average length of token sequences of code clones in a clone set S

6

Clone set S

A token sequence [c c* ] is detected as a code clone from a token sequence <c c* c* a b>

Superscript * indicated that the token is in a repeated token sequence

LEN(S) = 2

Page 7: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clone Metrics: RNR(S)

The ratio of non-repeated token sequences of code clones in a clone set S

7

Clone set S

RNR(S) =     • 100 = 50 1

2

The length of non-repeatedThe length of non-repeated token sequencetoken sequence

The length of whole The length of whole token sequencetoken sequence

A token sequence [c c* ] is detected as a code clone from a token sequence <c c* c* a b>

Page 8: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clone Metrics: POP(S)

The number of code clones in a clone set S

8

Clone set S

POP(S) = 6

1

23

4 5

6

Page 9: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Single Clone Metric (1/2)

Clone sets whose RNR(S) is higherThey do not organize a single semantic unit

semantic unit : many instructions forming a single functionality

9

/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */ else { // is the zip file in the cache ZipFile zipFile = (ZipFile) zipFiles.get(file); if (zipFile == null) { zipFile = new ZipFile(file); zipFiles.put(file, zipFile); } ZipEntry entry = zipFile.getEntry(resourceName); if (entry != null) {x

a a part of semantic unitpart of semantic unit

Not Appropriate for Refactoring!

Page 10: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Single Clone Metric (2/2)

Clone sets whose POP(S) is higherThey Include many language-dependent code clones

10

/* Code Clone in a clone set whose POP(S) is the first highest in Ant 1.7.0 */ out.println("\">");

out.println("");

out.print("<!ELEMENT project (target | "); out.print(TASKS); out.print(" | "); out.print(TYPES);

Not Appropriate for Refactoring!

Page 11: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Key Idea

It is not appropriate to extract refactorable code clones using just a single clone metric According to our experiences

We propose a method based on combined clone metricsTo improve the weakness of single-metric-based

extraction

11

Page 12: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Combined Clone Metrics

Clone sets whose RNR(S), POPS(S) are higherEach code clone organizes a single semantic units

12

/* Code Clone in a clone set whose RNR(S), POP(S) are higher than others*/if (ifProperty != null && p.getProperty(ifProperty) == null) { return false; } else if (unlessProperty != null && p.getProperty(unlessProperty) != null) { return false; }

return true; }

Appropriate for Refactoring!

Page 13: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Case Study (1/2)

Goal: validating our key ideaUsing combined clone metrics is a feasible

method to extract code clone for refactoring

Target SystemIndustrial Java software developed by NEC110KLOC, 736 clone sets

13

Page 14: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Case Study (2/2)

Experimental Step1. Selected 62 clone sets from CCFinder's

output using clone metrics.

2. Conducted a survey about these clone sets and got feedback from a developer.

14

Source files

CCFinderClone sets using clone metrics

Survey

Feed back

Page 15: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Subject Code Clones (1/2)

Clone sets whose either clone metric value is highClone sets whose LEN(S) value is top 10 highClone sets whose RNR(S) value is top 10 highClone sets whose POP(S) value is top 10 high

15

Page 16: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Subject Code Clones (2/2)

Clone sets whose combined clone metrics values are high15 clone sets whose LEN(S) and RNR(S) values

are high rank in the top 157 clone sets whose LEN(S) and POP(S) values

are high rank in the top 1518 clone sets whose RNR(S) and POP(S) values

are high rank in the top 151 clone set whose LEN(S), RNR(S) and POP(S)

values are high rank in the top 15

16

Page 17: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Results of Case Study (1/2)

17

#Selected Clone Sets: The number of selected clones #Refactoring: The number of clone sets marked as

“Perform refactoring“ in survey

Filtering#Selected Clone Sets

#Refactoring Precision

Each Single Clone metric 30 14 0.47

Combined Clone metrics 41 34 0.87

Page 18: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Results of Case Study (2/2)

18

Precision : “How many refactoring candidates were accepted by a developer?“

Combined clone metrics is more accepted as refactoring candidates by a developer

#Refactoring

#Selected Clone SetsPrecision =

Filtering#Selected Clone Sets

#Refactoring Precision

Each Single Clone metric 30 14 0.47

Combined Clone metrics 41 34 0.87

Page 19: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Summary and Future Work

SummaryOur Industrial case study shows that our key idea

is appropriate.

Future WorkInvestigate about recallConduct case studies of open source softwareSuggest a new metric

19

Page 20: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University 20

Thank You

Page 21: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clone sets whose RNR(S) is higher than others

Each code clone in a clone set S consists of more non-repeated token sequences

21

/* Code Clone in a clone set whose RNR(S) is the second highest in Ant 1.7.0 */ else { // is the zip file in the cache ZipFile zipFile = (ZipFile) zipFiles.get(file); if (zipFile == null) { zipFile = new ZipFile(file); zipFiles.put(file, zipFile); } ZipEntry entry = zipFile.getEntry(resourceName); if (entry != null) {/* … */

Page 22: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clone sets whose RNR(S) is lower than others

Consists of more repeated token sequences Involve in language-dependent code clone

22

/* Code Clone in a clone set whose RNR(S) is the lowest in Ant 1.7.0 */ String sosCmdDir = null; …… skip code….  

   private String filename = null;

private boolean noCompress = false; private boolean noCache = false; private boolean recursive = false; private boolean verbose = false;/* … */

Consecutive variable declarations

Page 23: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Survey Format: About Clone set XXX

(1) Do you think that this clone set need a practice?    [] Yes [] No (→ Jump to next clone set)

 (2) If you marked “Yes” in your answer to (1), what practice is

appropriate for this clone set?   [] Refactoring

  [] Write comments about code clones, but don’t perform refactoring.

[] Change nothing.

[] Others. (           

(3) Write the reason why did you mark in your answer to (2)

  Reason :

23

Page 24: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Results, and Precision of each clone set in the survey

24

Filtering #Selected Clone Sets

#Refactoring Precision

Clone sets whose LEN(S) value is top 10 high 10 7 0.70Clone sets whose RNR(S) value is top 10 high 10 4 0.40Clone sets whose POP(S) value is top 10 high 10 3 0.30Clone sets whose LEN(S) and RNR(S) values are high rank in the top 15

15 13 0.87

Clone sets whose LEN(S) and POP(S) values are high rank in the top

7 6 0.86

RNR(S) and POP(S) values are high rank in the top 15

18 14 0.78

Clone sets whose 1 clone set whose LEN(S), RNR(S), and POP(S) values are high rank in the top 15

1 1 1.00

Page 25: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clone metric: RNR(S) (1/2)

File:F1: a b c a b,F2: c c* c* a b,F3: d a b, e fF4: c c* d e f

Superscript * indicated that the token is in a repeated token sequence

RNR(S1) of Clone Set S1 is

25

RNR(S1) =            • 100 = 100

2 + 2 + 2 + 22 + 2 + 2 + 2

Clone Set:S1: { , , , }ab ab ab ab

Page 26: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Clone metric: RNR(S) (2/2)

File:F1: a b c a b,F2: c c* c* a b,F3: d a b, e fF4: c c* d e f

Superscript * indicated that the token is in a repeated token sequence

RNR(S2) of Clone Set S2 is

26

Clone Set:S2: { , , }c c* c* c* c c*

RNR(S2) = • 100 = 33.31 + 0 + 12 + 2 + 2

Page 27: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Subject Code Clones

62 clone sets clone sets whose individual clone metric value is high

SLEN Clone sets whose LEN(S) value is top 10 high.SRNR Clone sets whose RNR(S) value is top 10 high.SPOP Clone sets whose POP(S) value is top 10 high.

clone sets whose combined clone metrics values are highSLEN∙RNR 15 clone sets whose LEN(S) and RNR(S) values are high

rank in the top 15.SLEN∙POP 7 clone sets whose LEN(S) and POP(S) values are high

rank in the top 15.SRNR∙POP 18 clone sets whose RNR(S) and POP(S) values are high

rank in the top 15.SLEN∙RNR∙POP 1 clone set whose LEN(S), RNR(S) and POP(S) values

are high rank in the top 15.

27

Page 28: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

| SRNR ∩ SPOP ∩ SRNR ∙ POP| = 1 | SRNR ∩ SRNR ∙ POP| = 2 | S POP ∩ SRNR ∙ POP| = 2 | SLEN ∙ RNR ∩ SLEN ∙ POP ∩ SRNR ∙ POP

∩ SLEN ∙ RNR ∙ POP| = 1

CS セミナー 2010/12/01

28

The Number of Duplicate Clone Set

Page 29: Extracting Code Clones  for Refactoring Using  Combinations of Clone Metrics

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Example of clone set that are not selected…

It is too short to organize a semantic unit. RNR metric sometimes extract unintentional code

clones E.g., Language-dependent code clones

29

boolean isEqual(final DeweyDecimal other) { final int max = Math.max(other.components.length, components.length);

for (int i = 0; i < max; i++) { final int component1 = (i < components.length) ? components[ i ] : 0; final int component2 = (i < other.components.length) ? other.components[ i ] : 0; if (