
Page 1

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Refactoring Support Based on Code Clone Analysis

Yoshiki Higo, Toshihiro Kamiya, Shinji Kusumoto, Katsuro Inoue

Graduate School of Information Science and Technology, Osaka University

PRESTO, Japan Science and Technology Agency

{y-higo,kamiya,kusumoto,inoue}@ist.osaka-u.ac.jp

Page 2

Background

What is a code clone? A code clone is a code fragment that has identical or similar fragments in the same or different files of a system. Code clones are introduced into source programs for various reasons, such as reusing code by copy-and-paste, and they make software maintenance more difficult.

[Figure: code fragments duplicated by copy-and-paste]

Page 3

Requirements for Code Clone Detection

Which code clones should be detected depends on the purpose of the analysis.

To understand the amount and distribution of code clones, it is desirable to detect all code clones.

To remove code clones (restructuring or refactoring), it is useful to detect the code clones that can be removed and whose removal improves software maintainability.

Page 4

Research Objective and Approach

We aim to extract code clones that can be easily refactored.

Approach

To detect code clones efficiently, we use the code clone detection tool CCFinder. Then, we extract the specific code clones that are easily refactored and provide applicable refactoring patterns for them. Finally, we develop a refactoring support tool and apply it to an open source program.

Page 5

Refactoring Process Support

Commonly used refactoring process:
Step 1: Determine where refactoring should be applied
Step 2: Determine which refactoring patterns can/should be applied
Step 3: Investigate the effectiveness of the refactoring patterns
Step 4: Modify source code
Step 5: Conduct regression tests

The proposed method supports Steps 1 and 2:
- High scalability: it does not rely on analyses with high time complexity.
- Fine-grained clone detection: it detects code clones at a finer granularity than the method unit.

Page 6

Outline of CCFinder

CCFinder compares source code directly at the token level and detects code clones. Its transformation rules include:
- Normalization of name spaces
- Replacement of user-defined names
- Removal of table initialization
- Consideration of module delimiters

CCFinder can analyze systems of millions of lines in a practical amount of time.
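The replacement of user-defined names can be illustrated with a small sketch. It is not CCFinder's implementation: it assumes a deliberately simplified tokenizer (identifiers, integer literals, double-quoted strings without escapes, a few two-character operators, and single-character punctuation) and an abbreviated keyword list, and it treats every other identifier, including library names such as `String`, as user-defined and replaces it with `$`. Name-space normalization, table-initialization removal, and module delimiters are not modeled, so a qualified name such as `org.apache.regexp.RE` would still differ from `RE`. `TokenNormalizer` is a hypothetical name (Java 9+ for `Set.of`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class TokenNormalizer {
    // Abbreviated keyword list (an assumption of this sketch); any other
    // identifier is treated as a user-defined name.
    static final Set<String> KEYWORDS = Set.of(
        "static", "void", "int", "new", "for", "if", "else", "while", "do",
        "return", "throws", "try", "catch", "class", "public", "private");

    static final Set<String> TWO_CHAR_OPS = Set.of(
        "++", "--", "+=", "-=", "==", "!=", "<=", ">=", "&&", "||");

    // Splits source text into tokens: identifiers/keywords, integer literals,
    // double-quoted strings (no escape handling), and operators/punctuation.
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isJavaIdentifierStart(c)) {
                int j = i + 1;
                while (j < src.length() && Character.isJavaIdentifierPart(src.charAt(j))) j++;
                tokens.add(src.substring(i, j));
                i = j;
            } else if (Character.isDigit(c)) {
                int j = i + 1;
                while (j < src.length() && Character.isDigit(src.charAt(j))) j++;
                tokens.add(src.substring(i, j));
                i = j;
            } else if (c == '"') {
                int j = src.indexOf('"', i + 1);
                if (j < 0) j = src.length() - 1; // unterminated string: take the rest
                tokens.add(src.substring(i, j + 1));
                i = j + 1;
            } else if (i + 1 < src.length() && TWO_CHAR_OPS.contains(src.substring(i, i + 2))) {
                tokens.add(src.substring(i, i + 2));
                i += 2;
            } else {
                tokens.add(String.valueOf(c));
                i++;
            }
        }
        return tokens;
    }

    // Replaces every non-keyword identifier with "$"; other tokens are kept.
    static List<String> normalize(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            boolean identifier = Character.isJavaIdentifierStart(t.charAt(0));
            out.add(identifier && !KEYWORDS.contains(t) ? "$" : t);
        }
        return out;
    }
}
```

For example, `if (pat.match(a[i]))` and `if (exp.match(a[i]))` both normalize to `if ( $ . $ ( $ [ $ ] ) )`, which is why renamed copies can still be matched.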

Page 7

CCFinder: Clone Detection Process

[Figure: source files → lexical analysis → token sequence → transformation → transformed token sequence → match detection → clones on transformed sequence → formatting → clone pairs. The figure traces the two sample methods below through each step; in the transformed token sequence, the table initialization of the array a is removed and user-defined names are replaced with $.]

```java
static void foo() throws RESyntaxException {
    String a[] = new String [] { "123,400", "abc", "orange 100" };
    org.apache.regexp.RE pat = new org.apache.regexp.RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (pat.match(a[i]))
            sum += Sample.parseNumber(pat.getParen(0));
    System.out.println("sum = " + sum);
}

static void goo(String [] a) throws RESyntaxException {
    RE exp = new RE("[0-9,]+");
    int sum = 0;
    for (int i = 0; i < a.length; ++i)
        if (exp.match(a[i]))
            sum += parseNumber(exp.getParen(0));
    System.out.println("sum = " + sum);
}
```
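The match detection step can be approximated, for illustration, by a brute-force search for pairs of offsets whose transformed token runs are identical for at least a minimum number of tokens. CCFinder itself uses a suffix-tree-based matching algorithm, which is far more efficient; this naive version only shows what a clone pair on the transformed token sequence is. The class name, the `ClonePair` record (Java 16+), and the `minLen` threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class NaiveMatcher {
    // A clone pair on the transformed token sequence: two start offsets and the shared length.
    record ClonePair(int start1, int start2, int length) {}

    // Reports left-maximal runs of identical tokens, at least minLen long,
    // that occur at two different offsets without the two copies overlapping.
    // Brute force: O(n^2) offset pairs, each extended token by token.
    static List<ClonePair> findClonePairs(List<String> tokens, int minLen) {
        List<ClonePair> pairs = new ArrayList<>();
        int n = tokens.size();
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                // Skip offsets where the run extends further to the left,
                // so the same run is not reported again from an inner offset.
                if (i > 0 && tokens.get(i - 1).equals(tokens.get(j - 1))) {
                    continue;
                }
                int len = 0;
                while (j + len < n && i + len < j
                        && tokens.get(i + len).equals(tokens.get(j + len))) {
                    len++;
                }
                if (len >= minLen) {
                    pairs.add(new ClonePair(i, j, len));
                }
            }
        }
        return pairs;
    }
}
```

For example, on the token list `a b c x a b c y` with `minLen = 3`, it reports a single pair starting at offsets 0 and 4 with length 3.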

Page 8

Definitions: Clone Pair and Clone Set

Clone pair: a pair of identical or similar fragments
Clone set: a set of identical or similar fragments

CCFinder detects code clones as clone pairs. After the detection process, the clone pairs are transformed into clone sets.

[Figure: code fragments C1, C2, C3, C4, and C5]

Clone pairs: (C1, C4), (C1, C5), (C2, C3), (C4, C5)
Clone sets: {C1, C4, C5}, {C2, C3}
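Transforming clone pairs into clone sets amounts to grouping fragments into connected components, where each clone pair is an edge between two fragments. A minimal sketch with a union-find structure, assuming fragments are identified by strings such as `C1` (a hypothetical representation; the slide does not say how Aries stores them):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class CloneSets {
    // Union-find forest over fragment ids: fragment -> parent fragment.
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of x's group, compressing the path on the way.
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p);
        }
        return p;
    }

    // Merges the groups that contain a and b.
    private void union(String a, String b) {
        parent.put(find(a), find(b));
    }

    // Each clone pair links two fragments; each connected component is a clone set.
    static List<Set<String>> fromPairs(List<List<String>> pairs) {
        CloneSets uf = new CloneSets();
        for (List<String> p : pairs) {
            uf.union(p.get(0), p.get(1));
        }
        Map<String, Set<String>> groups = new TreeMap<>();
        for (String x : new ArrayList<>(uf.parent.keySet())) {
            groups.computeIfAbsent(uf.find(x), k -> new TreeSet<>()).add(x);
        }
        return new ArrayList<>(groups.values());
    }
}
```

With the four clone pairs above, this yields the clone sets {C1, C4, C5} and {C2, C3}; the order of the returned sets is not significant.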

Page 9

Extraction of Code Clones That Are Easily Refactored

Structural code clones are regarded as the target of refactoring:
1. Detect clone pairs with CCFinder.
2. Transform the detected clone pairs into clone sets.
3. Extract the structural parts of the detected clone sets as structural code clones.

What is a structural code clone? Example for the Java language:

Declaration: class declaration, interface declaration
Method: method body, constructor, static initializer
Statement: do, for, if, switch, synchronized, try, while
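Step 3 can be sketched as follows. The sketch assumes, hypothetically, that a parser supplies the token ranges of the structural units listed above and that each fragment of a clone set is a half-open token range [b, e). It keeps the outermost units that lie entirely inside the fragment. The `Unit` record (Java 16+) and the class name are illustrative and not part of CCFinder or Aries.

```java
import java.util.ArrayList;
import java.util.List;

class StructuralExtractor {
    // A structural unit (e.g. "if", "for", "method") covering tokens [begin, end).
    record Unit(String kind, int begin, int end) {
        boolean within(int b, int e) {
            return b <= begin && end <= e;
        }
    }

    // Returns the outermost units that lie completely inside the clone range [b, e).
    static List<Unit> extract(List<Unit> units, int b, int e) {
        List<Unit> inside = new ArrayList<>();
        for (Unit u : units) {
            if (u.within(b, e)) inside.add(u);
        }
        List<Unit> outermost = new ArrayList<>();
        for (int i = 0; i < inside.size(); i++) {
            Unit u = inside.get(i);
            boolean nested = false;
            for (int j = 0; j < inside.size(); j++) {
                Unit v = inside.get(j);
                if (i != j && u.within(v.begin(), v.end())) {
                    nested = true; // u is contained in another kept unit
                    break;
                }
            }
            if (!nested) outermost.add(u);
        }
        return outermost;
    }
}
```

A fragment that starts and ends in the middle of statements is thus reduced to the complete statements it contains, which are natural candidates for refactoring.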

Page 10

Fragment 1 (lines 609-628):

```java
reset();
grammar = g;
// Lookup make-switch threshold in the grammar generic options
if (grammar.hasOption("codeGenMakeSwitchThreshold")) {
    try {
        makeSwitchThreshold = grammar.getIntegerOption("codeGenMakeSwitchThreshold");
        //System.out.println("setting codeGenMakeSwitchThreshold to " + makeSwitchThreshold);
    } catch (NumberFormatException e) {
        tool.error(
            "option 'codeGenMakeSwitchThreshold' must be an integer",
            grammar.getClassName(),
            grammar.getOption("codeGenMakeSwitchThreshold").getLine()
        );
    }
}

// Lookup bitset-test threshold in the grammar generic options
if (grammar.hasOption("codeGenBitsetTestThreshold")) {
    try {
        bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
```

Fragment 2 (lines 623-642):

```java
}

// Lookup bitset-test threshold in the grammar generic options
if (grammar.hasOption("codeGenBitsetTestThreshold")) {
    try {
        bitsetTestThreshold = grammar.getIntegerOption("codeGenBitsetTestThreshold");
        //System.out.println("setting codeGenBitsetTestThreshold to " + bitsetTestThreshold);
    } catch (NumberFormatException e) {
        tool.error(
            "option 'codeGenBitsetTestThreshold' must be an integer",
            grammar.getClassName(),
            grammar.getOption("codeGenBitsetTestThreshold").getLine()
        );
    }
}

// Lookup debug code-gen in the grammar generic options
if (grammar.hasOption("codeGenDebug")) {
    Token t = grammar.getOption("codeGenDebug");
    if (t.getText().equals("true")) {
```

[Figure legend: highlighted regions mark the code clones that CCFinder detects and the code clones that the proposed method detects]

Page 11

Fragment 3 (lines 1527-1539):

```java
if ( inputState.guessing==0 ) {
    t=a.getText();
}
{
    _loop84:
    do {
        if ((LA(1)==COMMA)) {
            match(COMMA);
            id();
            if ( inputState.guessing==0 ) {
                t+=","+b.getText();
            }
        }
```

Fragment 4 (lines 1007-1019):

```java
if ( inputState.guessing==0 ) {
    buf.append(a.getText());
}
{
    _loop144:
    do {
        if ((LA(1)==WILDCARD)) {
            match(WILDCARD);
            a=id();
            if ( inputState.guessing==0 ) {
                buf.append('.'); buf.append(a.getText());
            }
        }
```

[Figure legend: code clones that CCFinder detects]

Page 12

Provision of Applicable Refactoring Patterns

The following refactoring patterns [1][2] can be used to remove clone sets that include structural code clones:

Extract Class, Extract Method, Extract Superclass, Form Template Method, Move Method, Parameterize Method, Pull Up Constructor, Pull Up Method

For each clone set, the proposed method determines which refactoring patterns are applicable by using several metrics.

[1] M. Fowler, Refactoring: Improving the Design of Existing Code, Addison-Wesley, 1999.
[2] http://www.refactoring.com/, 2004.

Page 13

Metrics (1): Volume Metrics for Clone Set: LEN, POP, DFL

LEN(S): the average length of the token sequences of the code fragments in a clone set S
POP(S): the number of elements (code fragments) of a clone set S
DFL(S): an estimate of how many tokens would be removed from the source files if all code fragments in a clone set S were replaced by a new subroutine and its caller statements

[Figure: the fragments of S replaced by a new subroutine and caller statements]
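DFL can be estimated with simple arithmetic. The sketch below assumes that the POP(S) fragments (about LEN(S) × POP(S) tokens in total) are replaced by one new subroutine holding a single copy of about LEN(S) tokens plus POP(S) caller statements, and that every caller statement costs a fixed number of tokens. The constant `CALL_COST = 5` is an assumption of this sketch, not a value given on the slide; the class name is hypothetical.

```java
class VolumeMetrics {
    // Assumed cost, in tokens, of one caller statement that replaces a fragment.
    static final int CALL_COST = 5;

    // LEN(S): average token length of the fragments in clone set S.
    static double len(int[] fragmentLengths) {
        double sum = 0;
        for (int l : fragmentLengths) sum += l;
        return sum / fragmentLengths.length;
    }

    // POP(S): number of fragments in clone set S.
    static int pop(int[] fragmentLengths) {
        return fragmentLengths.length;
    }

    // DFL(S): tokens before (LEN * POP) minus tokens after
    // (one new subroutine of LEN tokens plus POP caller statements).
    static double dfl(int[] fragmentLengths) {
        double len = len(fragmentLengths);
        int pop = pop(fragmentLengths);
        return len * pop - (len + CALL_COST * pop);
    }
}
```

For two fragments of 100 and 120 tokens, LEN(S) = 110, POP(S) = 2, and DFL(S) = 220 - (110 + 10) = 100 under this assumption.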

Page 14

Metrics (2): Coupling Metrics for Clone Set: NRV, NSV

NRV(S): the average number of externally defined variables referenced in the fragments of a clone set S

NSV(S): the average number of externally defined variables assigned to in the fragments of a clone set S

Definition

Clone set S includes fragments f1, f2, ..., fn.

si is the number of externally defined variables that fragment fi references, and ti is the number of externally defined variables that fragment fi assigns to. Then NRV(S) = (s1 + s2 + ... + sn) / n and NSV(S) = (t1 + t2 + ... + tn) / n.

[Figure: fragment f1, with the reference to b and c and the assignment to a highlighted]

```
int a, b, c;
…
if (…) {
    …;
    … = b + c;   // reference
    a = …;       // assignment
    …;
}
…
```

Example: clone set S includes fragments f1 and f2. In fragment f1, the externally defined variables b and c are referenced and a is assigned to. Fragment f2 is the same as f1.

Then NRV(S) = (2 + 2) / 2 = 2 and NSV(S) = (1 + 1) / 2 = 1.
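The averaging in NRV and NSV can be sketched directly. The `Fragment` record (Java 16+) is a hypothetical summary holding the names of the externally defined variables that a fragment references and assigns to; collecting those names requires a semantic analysis that is not shown here.

```java
import java.util.List;
import java.util.Set;

class CouplingMetrics {
    // Hypothetical per-fragment summary: externally defined variables that the
    // fragment references and those it assigns to.
    record Fragment(Set<String> referenced, Set<String> assigned) {}

    // NRV(S) = (s1 + ... + sn) / n, with si = number of referenced variables of fi.
    static double nrv(List<Fragment> s) {
        return s.stream().mapToInt(f -> f.referenced().size()).average().orElse(0);
    }

    // NSV(S) = (t1 + ... + tn) / n, with ti = number of assigned variables of fi.
    static double nsv(List<Fragment> s) {
        return s.stream().mapToInt(f -> f.assigned().size()).average().orElse(0);
    }
}
```

With f1 and f2 both referencing {b, c} and assigning {a}, `nrv` returns 2.0 and `nsv` returns 1.0, matching the example above.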

Page 15

Metrics (3): Inheritance Metric for Clone Set: DCH

DCH(S): represents the positions of the fragments of a clone set S in the class hierarchy and the distance between them

Definition

Clone set S includes fragments f1, f2, ..., fn.

Fragment fi is located in class Ci.

Class Cp is the lowest class in the class hierarchy that is a common parent of C1, C2, ..., Cn.

If C1, C2, ..., Cn have no common parent class, DCH(S) is ∞. This metric is measured only within the class hierarchy of the target software.

Example 1: clone set S includes fragments f1 and f2. If all fragments of S are included in the same class, then DCH(S) = 0.

[Figure: fragments f1 and f2 both in class A]

Example 2: clone set S includes fragments f1 and f2. If all fragments of S are included in a class and its direct child classes, then DCH(S) = 1.

[Figure: class A with child classes B and C, containing fragments f1 and f2]

Example 3: clone set S includes fragments f1 and f2. If the classes that include f1 and f2 have no common parent class, then DCH(S) = ∞.

[Figure: fragment f1 in class A and fragment f2 in class B, with no common parent class]
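One way to compute DCH that is consistent with the three examples above is to find the lowest common parent Cp and report the largest number of inheritance steps from any Ci up to Cp, or ∞ when no common parent exists within the target software. The sketch assumes single inheritance, an acyclic hierarchy, and a hypothetical `parentOf` map from each class to its direct superclass (absent for root classes of the target software); it uses `Integer.MAX_VALUE` to stand for ∞. This reading of the distance is an assumption, not a formula given on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class InheritanceMetric {
    static final int INFINITY = Integer.MAX_VALUE;

    // Chain from a class up to its root: [c, parent(c), parent(parent(c)), ...].
    static List<String> ancestors(String c, Map<String, String> parentOf) {
        List<String> chain = new ArrayList<>();
        for (String x = c; x != null; x = parentOf.get(x)) chain.add(x);
        return chain;
    }

    // DCH(S) for the classes C1..Cn that contain the fragments of S.
    static int dch(List<String> classes, Map<String, String> parentOf) {
        // Walking up from C1, the first ancestor shared by every Ci is Cp.
        for (String cp : ancestors(classes.get(0), parentOf)) {
            int maxDistance = 0;
            boolean common = true;
            for (String c : classes) {
                int d = ancestors(c, parentOf).indexOf(cp); // steps from c up to cp
                if (d < 0) {
                    common = false;
                    break;
                }
                maxDistance = Math.max(maxDistance, d);
            }
            if (common) return maxDistance;
        }
        return INFINITY; // no common parent within the target software
    }
}
```

With B and C as direct subclasses of A and D an unrelated class, the class lists (A, A), (B, C), and (A, D) give 0, 1, and ∞, matching examples 1 to 3.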

Page 16

Aries: Refactoring Support Tool

Overview

Target: Java programs
Runtime environment: JDK 1.4 or above

Implementation

Analysis component: Java, 32,000 lines
- CCFinder is used as the code clone detection component.
- JavaCC is used to construct the syntactic and semantic analysis component.

GUI component: Java, 14,000 lines

Users can specify target clone sets through GUI operations.

Page 17

Case Study: Ant

Overview

Ant is a build tool similar to 'make'.

Input for Aries

Source files of Ant: 627
LOC: about 180,000

It took 30 seconds to extract structural code clones, and 151 clone sets were obtained.

Environment

OS: FreeBSD 4.9
CPU: Xeon 2.8 GHz x 2
Memory: 4 GB

Page 18

Case Study: Ant, Extract Method (Conditions)

To apply the 'Extract Method' pattern, we filtered the clone sets using the following conditions:

- The unit of the clone is a statement (do, for, if, …).
- DCH(S) = 0: all fragments of a clone set are included in the same class.
- NSV(S) < 2: each fragment of a clone set assigns a value to at most one externally defined variable.

32 clone sets satisfied these conditions.
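The three conditions translate directly into a filter over the computed metrics. A sketch, assuming a hypothetical `CloneSet` record (Java 16+) that carries the clone unit kind and the DCH and NSV values, with DCH's ∞ represented as `Integer.MAX_VALUE`; the statement kinds are the ones listed for structural code clones.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

class ExtractMethodFilter {
    // Statement kinds accepted as the clone unit (the structural statements listed for Java).
    static final Set<String> STATEMENTS =
        Set.of("do", "for", "if", "switch", "synchronized", "try", "while");

    // Hypothetical summary of a clone set: id, unit kind, DCH(S), and NSV(S).
    record CloneSet(String id, String unit, int dch, double nsv) {}

    static List<CloneSet> candidates(List<CloneSet> sets) {
        return sets.stream()
            .filter(s -> STATEMENTS.contains(s.unit())) // the unit of clone is a statement
            .filter(s -> s.dch() == 0)                   // all fragments in the same class
            .filter(s -> s.nsv() < 2)                    // NSV(S) < 2, as on the slide
            .collect(Collectors.toList());
    }
}
```

A clone set whose unit is a method body, whose fragments span several classes, or whose NSV is 2 or more is dropped by this filter.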

Page 19

Case Study: Ant, Extract Method (Result)

The 32 clone sets can be categorized as follows:

| Category | Number |
|---|---|
| No parameters, no return value | 3 |
| Some parameters added, no return value | 18 |
| Some parameters added, value returned | 7 |
| Others | 4 |

Example fragments from the clone sets:

```java
if (!isChecked()) {
    // make sure we don't have a circular reference here
    Stack stk = new Stack();
    stk.push(this);
    dieOnCircularReference(stk, getProject());
}

if (iSaveMenuItem == null) {
    try {
        iSaveMenuItem = new MenuItem();
        iSaveMenuItem.setLabel("Save BuildInfo To Repository");
    } catch (Throwable iExc) {
        handleException(iExc);
    }
}

if (name == null) {
    if (other.name != null) {
        return false;
    }
} else if (!name.equals(other.name)) {
    return false;
}

// javacopts
if (javacopts != null && !javacopts.equals("")) {
    genicTask.createArg().setValue("-javacopts");
    genicTask.createArg().setLine(javacopts);
}
```

[Figure annotations: "assignment" and "local variable" mark parts of the example fragments]
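As an illustration of the category "some parameters added, value returned", consider a hypothetical statement-level clone (not taken from Ant): a `for` statement that references the externally defined variables `words` and `total` and assigns to `total` (so NSV = 1). Extract Method turns the referenced variables into parameters and the single assigned variable into the return value. All names below are invented for the example.

```java
import java.util.List;

class ExtractMethodExample {
    // Extracted method: the cloned `for` statement, with the externally defined
    // variables it references passed as parameters and the one variable it
    // assigns (total) returned to the caller.
    static int addLengths(List<String> words, int total) {
        for (String w : words) {
            if (!w.isEmpty()) {
                total += w.length();
            }
        }
        return total;
    }

    // Former clone site 1: the duplicated `for` statement is now a single call.
    static int sizeWithHeader(List<String> words) {
        int total = 10; // hypothetical header size
        total = addLengths(words, total);
        return total;
    }

    // Former clone site 2.
    static int sizeWithoutHeader(List<String> words) {
        int total = 0;
        total = addLengths(words, total);
        return total;
    }

    public static void main(String[] args) {
        List<String> words = List.of("ab", "", "cde");
        System.out.println(sizeWithHeader(words));    // 15
        System.out.println(sizeWithoutHeader(words)); // 5
    }
}
```

If a fragment assigned to two or more externally defined variables, a single return value would no longer suffice, which is what the NSV(S) < 2 condition guards against.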

Page 20

Conclusion

We have:
- proposed a refactoring support method,
- implemented a refactoring support tool, Aries, and
- conducted a case study on Ant, an open source program, in which most of the filtered clone sets could be removed.

Page 21

Future Work

As future work, we are going to:
- evaluate whether or not each refactoring should be performed from the viewpoint of software quality (supporting Step 3), and
- find groups of clone sets that can be refactored at once, to conduct refactoring more effectively.

Commonly used refactoring process:
Step 1: Determine where refactoring should be applied
Step 2: Determine which refactoring patterns can/should be applied
Step 3: Investigate the effectiveness of the refactoring patterns
Step 4: Modify source code
Step 5: Conduct regression tests


Page 23

Code Clone Detection for Refactoring: Related Work

Detecting similar subgraphs of program dependency graphs as clones [1]:
- High accuracy: this approach takes data dependence and control dependence in the source code into account.
- High time complexity: it takes O(n^2) time to construct the program dependency graph.

Detecting similar methods and functions as clones using metrics [2]:
- Low accuracy: if the target method or function is small, its metric values show little difference from those of other methods.
- Detection unit restriction: only method-level and function-level clones can be detected.

[1] R. Komondoor and S. Horwitz, "Using slicing to identify duplication in source code", In Proc. of the 8th International Symposium on Static Analysis, Paris, France, July 16-18, 2001.
[2] M. Balazinska, E. Merlo, M. Dagenais, B. Lague, and K. Kontogiannis, "Advanced Clone-Analysis to Support Object-Oriented System Refactoring", WCRE 2000, pp. 98-107.