
Page 1: Towards Content-Driven Reputation for Collaborative Code Repositories

Andrew G. West and Insup Lee
August 28, 2012

Page 2: Big Concept

Apply reputation algorithms developed for wikis in collaborative code repositories:

1. Do the computed reputations accurately reflect user behavior? If so, how could such a system be useful in practice?

2. What do inaccuracies teach us about differences in the evolution of code vs. natural-language content? Adaptation?

Page 3: Motivations

Platform equivalence:
• Purely collaborative
• Increasingly distributed; collaboration between unknown/un-trusted parties

VehicleForge.mil [1]:
• Crowdsourcing a next-generation military vehicle
• Trust implications!

Page 4: CONTENT-DRIVEN REPUTATION

Page 5: Content-Driven Rep.

[Diagram: article version history V0 → V1; author A1; V1: "Mr. Franklin flew a kite" (Initialization)]

IDEA: Content that survives is good content. Good content is written/maintained by good authors.

V1: No reputation changes; no survival

Page 6: Content-Driven Rep.

[Diagram: article version history V0 → V1 → V2 → V3 → V4; authors A1, A2, A3, A4; V1: "Mr. Franklin flew a kite" (Initialization); V2: "Your mom flew a plane" (Damage)]

IDEA: When a subsequent editor allows content to survive, it has his/her implicit approval (and vice versa)

V2: Author A2 deletes most of A1’s content. Reputation of A1 is negatively impacted.

Page 7: Content-Driven Rep.

[Diagram: article version history V0 → V1 → V2 → V3 → V4; authors A1, A2, A3, A4; V1: "Mr. Franklin flew a kite" (Initialization); V2: "Your mom flew a plane" (Damage); V3: "Mr. Franklin flew a kite" (Content Restoration)]

IDEA: Survival is examined at depth

V3: Author A3 reverts A2’s content. Editor A1 gains reputation as his content is restored, A2 loses rep.

Page 8: Content-Driven Rep.

[Diagram: article version history V0 → V1 → V2 → V3 → V4; authors A1, A2, A3, A4; V1: "Mr. Franklin flew a kite" (Initialization); V2: "Your mom flew a plane" (Damage); V3: "Mr. Franklin flew a kite" (Content Restoration); V4: "Mr. Franklin flew a kite and …" (Content Persistence)]

IDEA: … and the process continues (depth=10)

V4: Authors A1 and A3 accrue reputation, while A2 continues to receive reputation decrements.
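To make the idea concrete, here is a toy sketch (not the WikiTrust algorithm itself) that replays the V0..V4 example above: each new version implicitly judges every earlier author by how much of that author's text it keeps. The difflib survival metric and the ±(survival − 0.5) scoring rule are illustrative assumptions.

import difflib

history = [                               # (author, version text), as on pages 5-8
    ("A1", "Mr. Franklin flew a kite"),                 # V1: initialization
    ("A2", "Your mom flew a plane"),                    # V2: damage
    ("A3", "Mr. Franklin flew a kite"),                 # V3: content restoration
    ("A4", "Mr. Franklin flew a kite and ..."),         # V4: content persistence
]

rep = {}                                  # author -> toy reputation score

def survival(old, new):
    """Fraction of the old version's tokens that survive into the new one."""
    sm = difflib.SequenceMatcher(None, old.split(), new.split())
    kept = sum(block.size for block in sm.get_matching_blocks())
    return kept / max(len(old.split()), 1)

for i, (judge, new_text) in enumerate(history[1:], start=1):
    for j in range(max(0, i - 10), i):                  # survival examined at depth 10
        prior_author, prior_text = history[j]
        if prior_author == judge:                       # no self-approval
            continue
        s = survival(prior_text, new_text)
        rep[prior_author] = rep.get(prior_author, 0.0) + (s - 0.5)

print(rep)   # A1 ends up positive, A2 negative, A3 positive, matching V2-V4 above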

Page 9: In Practice

Implemented as WikiTrust [2, 3]:
• Token survival + edit distance captures novel content as well as maintenance actions
• Size of ∆ is: (1) proportional to the degree of change, (2) weighted by the rep. of the editor (a sketch follows)
• Nice security properties:
  – Implicit feedback
  – Symmetric evaluation
  – No self-approval
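A hedged sketch of how a single update might be sized, following the bullets above; this mirrors the spirit of WikiTrust [2, 3] rather than its exact formula. The 20,000 ceiling comes from the results later in the deck; the names and the linear weighting are assumptions.

def reputation_delta(kept_fraction, edit_size, judge_rep, max_rep=20_000.0):
    """kept_fraction in [0,1]: share of the judged edit that survives (token
    survival + edit distance); edit_size: degree of change the judged edit made;
    judge_rep: reputation of the later editor doing the implicit judging."""
    direction = kept_fraction - 0.5      # survival rewards, removal punishes (symmetric)
    weight = judge_rep / max_rep         # judgments by reputable editors count for more
    return direction * edit_size * weight

def apply_delta(rep, author, delta, ceiling=20_000.0):
    """Clamp reputations to [0, ceiling], matching the observed [0, 20k] range."""
    rep[author] = min(ceiling, max(0.0, rep.get(author, 0.0) + delta))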


Page 10: WikiTrust Success

Live processing several language editions of Wikipedia; portable!

[Screenshot: vandalism example]

Implementation [4] works on any MediaWiki installation

Page 11: REPRESENTING A REPOSITORY ON A WIKI PLATFORM

Page 12: Repo. ↔ Wiki Model

[Diagram: SVN revision graph (revisions 1-9) spanning trunk/, branches/, and tags/, with a merge back into trunk/]

Just replay history in a sequential fashion:
• Repository ↔ wiki
• Check-in ↔ edit
• File ↔ article

Page 13: Repo. ↔ Wiki Model

Minor accommodations:
• Ignore tags
• Ignore branches (merge as a recommendation)
• Multi-file check-in


Page 14: Replay in Practice

1. [svnsync] produces a local copy (not a checkout)
2. [svn log] yields a metadata script (see table)
3. Pipe file versions into the wiki via its API (a sketch of this replay loop follows the table):
   1. Log in the user (create account if needed)
   2. Use the [svn cat path@id] syntax to yield content
   3. Make an edit to article "path". Log out.

ID  USR  COMMENT            MOD  PATH
1   U1   Initial check-in.  A    /trunk/core/header.c
                            A    /trunk/core/misc.c
2   U2   Compilation error  M    /trunk/core/header.c
3   U1   Don't need this    D    /trunk/core/misc.c
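A minimal sketch of the replay loop above, under some assumptions: a local svnsync mirror at REPO, one pre-provisioned wiki account per committer (PASSWORDS is a hypothetical lookup), and the third-party mwclient package for the MediaWiki API calls (scheme/path may need adjusting for a real install).

import subprocess
import xml.etree.ElementTree as ET
import mwclient

REPO = "file:///srv/mirror/mediawiki-svn"    # hypothetical svnsync mirror URL
PASSWORDS = {}                               # committer -> wiki password (pre-provisioned)
wiki = mwclient.Site("hincapie.cis.upenn.edu", path="/wiki_mediawiki/")

def revisions():
    """Yield (rev, author, comment, [(action, path), ...]) from `svn log --xml -v`."""
    log = subprocess.run(["svn", "log", "--xml", "-v", REPO],
                         capture_output=True, text=True, check=True).stdout
    for entry in ET.fromstring(log).iter("logentry"):
        paths = [(p.get("action"), p.text) for p in entry.iter("path")]
        yield (int(entry.get("revision")), entry.findtext("author"),
               entry.findtext("msg") or "", paths)

def cat(path, rev):
    """File content at a revision, via the `svn cat path@id` syntax from step 2."""
    return subprocess.run(["svn", "cat", f"{REPO}{path}@{rev}"],
                          capture_output=True, text=True, check=True).stdout

for rev, author, comment, paths in sorted(revisions()):    # replay oldest-first
    wiki.login(author, PASSWORDS[author])                   # one wiki account per committer
    for action, path in paths:
        text = "" if action == "D" else cat(path, rev)      # a deletion blanks the article
        wiki.pages[path].save(text, summary=comment)        # article title = repository path

Per the accommodations on page 13, tags/ and branches/ paths would be filtered out before this loop.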

Page 15: CASE STUDY INTRODUCTION

Page 16: Mediawiki SVN

• Case study repository: Mediawiki SVN [5]
• http://hincapie.cis.upenn.edu/wiki_mediawiki/

PROPERTY        ORIG     MOD
Authors         326      271
Check-ins       91,808   53,715
File versions   585,629  117,432
… in trunk/     420,613  117,432
Unique paths    138,741  7,521
… to PHP file   56,063   7,521

Further filtering (per late 2011):
• Only PHP files
• Core language
• No binary files
• Tokenization
• Toss out i18n files

Page 17: Mediawiki SVN (cont.)


Wiki database is givento WikiTrust implementation:

Revision #A by J had ∆+0.75 on reputation of X=12.05
Revision #B by K had ∆-42.00 on reputation of Y=0.5
Revision #B by K had ∆+16.75 on reputation of Z=1000.1
… … …

Recall: An edit can change up to 10 reputations!

Page 18: General Results (1)

[Plot: distribution of final user reputations]
• Reputations lie on [0, 20k]
• 0.0 is the initial rep.
• ≈15 users w/ max. rep.; not always those w/ most revs.

Page 19: General Results (2)

[Plot: distribution of update ∆s, by magnitude]
• Majority of updates are positive; evidence of a healthy community
• Most freq. update is a 1-10 pt. increment

Page 20: Example Reputations

Page 21: EVALUATING REPUTATION ACCURACY

Page 22: Evaluation Process

Find edits (Ex) where:
• The subsequent edit (Ex+1) resulted in a non-trivial rep. loss for the author
• Manually inspect the comment, Bugzilla, and diffs, and ask: "Would editor Ax+1 consider the previous change CONSTRUCTIVE, or UNCONSTRUCTIVE?" (candidate selection is sketched after the diagram below)
• Could be a subjective mess, but…

[Diagram: edit Ex is followed by edit Ex+1, which removes a non-trivial amount of content. Was this removal the result of ineptitude by the prior editor?]
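A small sketch of how the candidate set for this inspection might be pulled from the update log shown on page 17; the exact log-line format and the -10 threshold are assumptions, not WikiTrust's actual output.

import re

LINE = re.compile(r"Revision #(?P<rev>\S+) by (?P<judge>\S+) had ∆(?P<delta>[-+]?\d+\.\d+)"
                  r" on reputation of (?P<author>\w+)")

def candidates(log_lines, threshold=-10.0):
    """Yield (judged revision, judging editor, delta, judged author) for every
    update that cost some author a non-trivial amount of reputation."""
    for line in log_lines:
        m = LINE.search(line)
        if m and float(m.group("delta")) <= threshold:
            yield m.group("rev"), m.group("judge"), float(m.group("delta")), m.group("author")

# Example: the page-17 line "Revision #B by K had ∆-42.00 on reputation of Y=0.5"
# is selected; each hit is then inspected by hand (comment, Bugzilla, diff).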

Page 23: Classifying Rep. Loss (1)

A surprising number of obviously "bad" actions result in reverts. The editor calls out the previous edit and/or editor explicitly:

"Password in plaintext! … DOESN'T WORK … don't put it in trunk!"
"massive breakage with incomplete experimental changes"
"revert … spewing giant red HTML all over everything"
"failed, possibly other problems. NEEDS PARSER TESTS"
"ten billion extra callouts … clutter things up and trigger errors"
"… no apparent purpose … more complex and prone to breakage"

Page 24: Classifying Rep. Loss (2)

Some cases are more ambiguous. The editor erred, but it's not immediately clear there should be a significant penalty (NON-FATAL):

Code showing no immediate errors:
• But reverted (or branched) for testing

Issues unrelated to functional code:
• Whitespace, comment/string changes

Page 25: Evaluation Results

Per a conservative approach, anything not in the other two sets is CONSTRUCTIVE:

UNCONSTRUCTIVE  NON-FATAL  CONSTRUCTIVE
51%             19%        30%

63% accuracy if we discount the "non-fatal" cases (51 / (51 + 30) ≈ 63%)
70% accuracy if we interpret them as "unconstructive" ((51 + 19) / 100 = 70%)
Interpret how you wish; purposely a naïve application

Concentrate on false positives: can the algorithm be improved?

Page 26: IDENTIFYING & FIXING FALSE POSITIVES + EVALUATION

Page 27: False Positives (1)

SVN does not handle RENAME elegantly:

[Diagram: file.c is DELeted and file_renamed.c is ADDed in its place]

Consequences: Authors of [file.c] are punished; provenance is lost; the renamer gets all the credit.

Solutions: Detect via hash; simple wiki "move" (sketched below)
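A hedged sketch of the hash-based rename detection suggested above; cat() is the hypothetical content-fetch helper from the page-14 replay sketch, and issuing the actual wiki "move" is left to the replay layer.

import hashlib

def sha1(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest()

def detect_renames(rev, paths, cat):
    """paths: [(action, path), ...] for one check-in, as in the svn log table.
    Returns (old_path, new_path) pairs whose content hashes match, i.e. DEL+ADD
    pairs that should be replayed as a wiki "move" so provenance is preserved."""
    deleted = {sha1(cat(p, rev - 1)): p for action, p in paths if action == "D"}
    renames = []
    for action, path in paths:
        if action == "A":
            digest = sha1(cat(path, rev))
            if digest in deleted:
                renames.append((deleted[digest], path))
    return renames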

Page 28: False Positives (2.1)

INTER-DOCUMENT REORGANIZATION is problematic for WikiTrust

[Diagram: func_c(){…} is removed from file_1.c (--- ∆) and reappears in file_2.c (+++ ∆)]

Treat the entire code base as one giant doc (file1.c >> file2.c >> file3.c >> ...): a global diff!

Solution: Examine all diff ∆s; sub-string matching; replay history. Intra-doc reorg. is a non-issue! (a sketch follows)
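A rough sketch of the global-diff / sub-string matching idea: before charging an author for removed code, check whether the removed block simply reappears elsewhere in the code base. The 0.9 similarity cut-off and the 30-character floor are illustrative assumptions.

import difflib

def moved_not_deleted(removed_block, other_files, min_len=30, min_ratio=0.9):
    """removed_block: text deleted from one file in this check-in.
    other_files: iterable of (path, new content) for the rest of the code base.
    Returns the path the block appears to have moved to, or None if it was
    really deleted (and the usual reputation penalty should apply)."""
    if len(removed_block) < min_len:                      # ignore trivial fragments
        return None
    for path, content in other_files:
        if removed_block in content:                      # exact sub-string match
            return path
        sm = difflib.SequenceMatcher(None, removed_block, content)
        match = sm.find_longest_match(0, len(removed_block), 0, len(content))
        if match.size >= min_ratio * len(removed_block):  # near-complete fuzzy match
            return path
    return None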

Page 29: False Positives (2.2)


[Diagram: the content block being moved is checked against the destination doc.'s history (A1 – V1, A2 – V2, A3 – V3); the same block appears 3 edits ago]

Page 30: False Positives (2.3)


[Diagram: TRANSCLUSION! The new doc. includes the old doc.'s section text via a {{sect}} reference ("text {{sect}} text"), so section authorship (A1, A2, A3) carries over]

Page 31: False Positives (3)

REVERT CHAINS cause big penalties:

[Diagram: V0 → V1 (+++ BIG CODE CHANGES) → V2 ("Revert: Needs testing first") → V3 (testing done; the BIG CODE CHANGES are re-applied); the V0/V2 and V1/V3 pairs are identical or nearly so]

Consequences: At V2, A1 loses reputation (a NON-FATAL). At V3, A2 is wrongly punished.

Solution: Revert chains are rare; manual inspection?

Page 32: False Positives (4)

• Initially 30 false-positive cases
  – If the "solutions" above were implemented, this number would be just 10
  – Suggesting accuracies of 80-90%
• And those 10 cases?
  – Benign code evolution
  – Feature requests; method deprecation; no fault
• Results similar for [ruby] and [httpd]

Page 33: Better Evaluation

• The proof-of-concept (POC) evaluation is lacking in many ways
  – Not enough examples. Subjective.
  – Says nothing about true negatives
• Bug attribution is extremely difficult
  – Corpus: "X erred at rev. Y with severity {L,M,H}"
  – If it could be automated, the problem would be solved!
  – Work backwards from Bugzilla? Developers?
  – Reputation as a predictor of future loss events.
• Qualitative instead of quantitative measures

Page 34: Other Optimization

• Lots of free variables, weights, and ceilings, e.g., how code is canonicalized and tokenized (a sketch follows the example):

Canonical code:
  // this is a loop
  for(int i=0;i<10;i++) print("Some text");
    →
  for ( int i = 0 ; i < 10 ; i++ ){
    print( "" );
  }

Tokenization:
  for ( int i = 0; i < 10; i++ ){ print( "" );}
    →
  for ( int i = 0 ; i < 10 ; i++ ){
    print( "" );
  }
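One possible canonicalization pass for those free variables, as a hedged sketch: strip comments, blank out string literals, and reduce the result to a whitespace-insensitive token stream, so purely cosmetic edits cost nobody reputation. The regexes are deliberately simplistic (they ignore, e.g., comment markers inside strings).

import re

def canonicalize(source):
    source = re.sub(r"//[^\n]*", "", source)                 # drop line comments
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)    # drop block comments
    source = re.sub(r'"(?:\\.|[^"\\])*"', '""', source)      # empty out string literals
    tokens = re.findall(r'""|[A-Za-z_]\w*|\d+|\S', source)   # whitespace-insensitive tokens
    return " ".join(tokens)

print(canonicalize('// this is a loop\nfor(int i=0;i<10;i++) print("Some text");'))
# -> for ( int i = 0 ; i < 10 ; i + + ) print ( "" ) ;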

Page 35: USE-CASES & CONCLUSIONS

Page 36: Use-case: Small Projects

• Small/non-production projects
  – Conflict, not just tokens!
• Undergraduate research
  – Who did all the work?
• Academic paper repositories
  – Automatic author order!
• Collaboration or conflict?
  – Graph of reputation events (see the diagram and sketch below)

[Diagram: graph of reputation events among authors A, B, C, D; Faction #1 (A, B) and Faction #2 (C, D) exchange positive events within each faction and negative events across factions]
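A small sketch of the "graph of reputation events" idea: aggregate the signed deltas each editor's edits cause for every other editor's reputation, then look for groups that approve of themselves but consistently undo each other. The event tuples below are hypothetical.

from collections import defaultdict

def reputation_graph(events):
    """events: iterable of (judging editor, judged author, reputation delta).
    Returns net signed edge weights between editors."""
    edges = defaultdict(float)
    for judge, author, delta in events:
        edges[(judge, author)] += delta
    return edges

events = [("A", "B", +4.0), ("B", "A", +2.5),    # faction #1 preserves its own content
          ("C", "D", +3.0), ("D", "C", +1.0),    # faction #2 preserves its own content
          ("A", "C", -5.0), ("D", "B", -6.0)]    # the factions keep undoing each other
graph = reputation_graph(events)
conflict_pairs = [pair for pair, weight in graph.items() if weight < 0]
print(conflict_pairs)   # cross-faction pairs with net-negative judgments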

Page 37: Use-cases (2)

MEDIAWIKI
• Alert service/warnings (anti-vandal style)
• Expediting code review
• Permission granting/revocation

Page 38: Use-cases (2), cont.

VEHICLEFORGE.MIL
• Access control for users/commits
• Wrap content-persistent reputation with metadata features for a stronger classifier [6] (a sketch follows)
• Robustness considerations (i.e., reachability)
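A hedged sketch of wrapping the content-persistent reputation score with metadata features, in the spirit of [6]; the feature set, the tiny training sample, and the use of scikit-learn are all illustrative assumptions, not the actual VehicleForge design.

from sklearn.linear_model import LogisticRegression

# One row per check-in: [author reputation, lines changed, files touched,
# commit-comment length]; label 1 = later judged unconstructive, 0 = constructive.
X = [[12.05,   40,  2, 35],
     [ 0.50,  900, 14,  4],
     [1000.1,  12,  1, 60],
     [ 0.00,  300,  7,  0]]
y = [0, 1, 0, 1]

classifier = LogisticRegression().fit(X, y)
risk = classifier.predict_proba([[5.0, 250, 6, 10]])[0][1]   # P(unconstructive) for a new check-in
print(risk)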


Page 39: Conclusions

• Despite high(er) barriers to entry, bad things still happen in production repositories!
• Content persistence is a reasonably accurate way to identify these instances ex post facto
• False positives highlight what makes code unique:
  1. Non-functional aspects are non-trivial (whitespace, comments)
  2. Inter-document reorganization is common
  3. Quality assurance is more than surface-level
• Evaluation needs to be more rigorous
• A variety of use-cases if it becomes production-ready

Page 40: References

[1] Lohr, Steve. “Pentagon Pushes Crowdsourced Manufacturing”. New York Times “Bits Blog”. April 5, 2012.

[2] Adler, B.T. and L. de Alfaro. “A Content-Driven Reputation System for Wikipedia”. In WWW 2007: Proc. of the 16th Intl. World Wide Web Conference.

[3] Adler, B.T., et al. “Measuring Author Contributions to Wikipedia”. In WikiSym 2008: Proc. of the 3rd Intl. Symposium on Wikis and Open Collaboration.

[4] WikiTrust online. http://www.wikitrust.net/

[5] Mediawiki SVN. http://svn.wikimedia.org/viewvc/mediawiki/ (note: this is an archive of that resource; Git is the currently used repository software)

[6] Adler, B.T. et al. “Wikipedia Vandalism Detection: Combining Natural Language, Metadata, and Reputation Features”. In CICLing 2011: Proc. of the 12th Intl. Conference on Intelligent Text Processing and Computational Linguistics.

[Ø] Mediawiki Developer Hub. http://www.mediawiki.org/wiki/Developer_hub