Reducing Redundancies in Multi-Revision Code Analysis

Carol V. Alexandru, Sebastiano Panichella, Harald C. GallSoftware Evolution and Architecture Lab

University of Zurich, Switzerland{alexandru,panichella,gall}@ifi.uzh.ch

22.02.2017

SANER’17

Klagenfurt, Austria

The Problem Domain

• Static analysis (e.g. #Attr., McCabe, coupling...)

The Problem Domain

v0.7.0 v1.0.0 v1.3.0 v2.0.0 v3.0.0 v3.3.0 v3.5.0

The Problem Domain

v0.7.0 v1.0.0 v1.3.0 v2.0.0 v3.0.0 v3.3.0 v3.5.0

• Many revisions, fine-grained historical data

A Typical Analysis Process

select project

checkout

select project

select revision

checkout analysis tool

select project

select revision

applytool

storeanalysis

results

select project

select revision

applytool

storeanalysis

results

more revisions?

select project

select revision

applytool

storeanalysis

results

more revisions?

more projects?

Redundancies all over...

Redundancies in historical code analysis

Impact on Code Study Tools

Few files change

Only small parts of a file change

Across RevisionsImpact on Code

Study Tools

Few files change

Study Tools

Repeated analysis of "known" code

Few files change

Changes may not even affect results

Study Tools

Storing redundant results

Few files change

Across Languages

Each language has their own toolchain

Yet they share many metrics

Study Tools

Few files change

Across Languages

Study Tools

Re-implementing identical analyses

Generalizability is expensive

Few files change

Across Languages

Study Tools

Re-implementing identical analyses

Generalizability is expensive

Important!

Techniques implemented

in LISA

Your favouriteanalysis features

Pick what you like!

#1: Avoid Checkouts

Avoid checkouts

checkout

read write

Avoid checkouts

checkout

read write

analyze

Avoid checkouts

checkout

For every file: 2 read ops + 1 write opCheckout includes irrelevant filesNeed 1 CWD for every revision to be analyzed in parallel

read write

analyze

Avoid checkouts

clone analyze

Avoid checkouts

clone analyze

Only read relevant files in a single read opNo write opsNo overhead for parallization

Avoid checkouts

clone analyze

Analysis Tool

File Abstraction Layer

Avoid checkouts

clone analyze

Analysis Tool

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer)extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {override def getCharContent(): CharSequence = code

Avoid checkouts

clone analyze

Analysis Tool

E.g. for the JDK Compiler:

class JavaSourceFromCharrArray(name: String, val code: CharBuffer)extends SimpleJavaFileObject(URI.create("string:///" + name), Kind.SOURCE) {override def getCharContent(): CharSequence = code

#2: Use a multi-revision representationof your sources

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

rev. 1

rev. 2

rev. 3

rev. 4

rev. 1

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

rev. 2

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

rev. 3

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4

rev. range [1-4]

rev. range [1-2]

Merge ASTs

rev. 1

rev. 2

rev. 3

rev. 4AspectJ (~440k LOC):

1 commit: 2.2M nodesAll >7000 commits: 6.5M nodes

Merge ASTs

rev. 1

rev. 2

rev. 3

Merge ASTs

rev. 1

rev. 2

rev. 3

#3: Store AST nodes only ifthey're needed for analysis

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

System.out.println(i)

What's the complexity (1+#forks) and name for each method and class?

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

140 AST nodes(using ANTLR)

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

parseCompilationUnit

TypeDeclaration

Modifiers

public

Members

Method

Modifiers

public

Parameters ReturnType

PrimitiveType

Statements

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

parse filtered parse

TypeDeclaration

Method

DemoForStatement

IfStatement

ConditionalExpression

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

What's the complexity (1+#forks) and name for eachmethod and class?

TypeDeclaration

Method

DemoForStatement

IfStatement

public class Demo {

public void run() {

for (int i = 1; i< 100; i++) {

if (i % 3 == 0 || i % 5 == 0) {

What's the complexity (1+#forks) and name for eachmethod and class?

TypeDeclaration

Method

DemoForStatement

IfStatement

#4: Use non-duplicative data structuresto store your results

rev. 1

rev. 2

rev. 3

rev. 4

rev. 1

rev. 2

rev. 3

rev. 4

rev. 1

rev. 2

rev. 3

rev. 4

InnerClass

rev. 1

rev. 2

rev. 3

rev. 4 [1-1]

InnerClass

LISA also does:#5: Parallel Parsing#6: Asynchronous graph computation#7: Generic graph computationsapplying to ASTs from compatiblelanguages

To Summarize...

The LISA Analysis Process

select project

parallel parseinto mergedgraph

select project

Language Mappings(Grammar to Analysis)

determines which ASTnodes are loaded

ANTLRv4Grammar used by

Generates Parser

select project

storeanalysis

results

Analysis formulated as Graph Computation

Language Mappings(Grammar to Analysis) used by

Async. compute

determines whichdata is persisted

Generates Parser

runson graph

select project

storeanalysis

results

more projects?

Analysis formulated as Graph Computation

Language Mappings(Grammar to Analysis) used by

Async. compute

determines whichdata is persisted

Generates Parser

runson graph

How well does it work, then?

Marginal cost for +1 revision

41.670

4.6330.525 0.109 0.082 0.071 0.052 0.041 0.032 0.033

1 10 100 1000 2000 3000 4000 5000 6000 7000

Average Parsing+Computation time per Revision when analyzing n revisions of AspectJ (10 common metrics)

# of revisions

Overall Performance Stats

Language Java C# JavScript

#Projects 100 100 100

#Revisions 646'261 489'764 204'301

#Files (parsed!) 3'235'852 3'234'178 507'612

#Lines (parsed!) 1'370'998'072 961'974'773 194'758'719

Total Runtime (RT)¹ 18:43h 52:12h 29:09h

Median RT¹ 2:15min 4:54min 3:43min

Tot. Avg. RT per Rev.² 84ms 401ms 531ms

Med. Avg. RT per Rev.² 30ms 116ms 166ms

¹ Including cloning and persisting results² Excluding cloning and persisting results

What's the catch?(There are a few...)

The (not so) minor stuff

• Must implement analyses from scratch

• No help from a compiler

• Non-file-local analyses need some effort

The (not so) minor stuff

• Must implement analyses from scratch

• No help from a compiler

• Non-file-local analyses need some effort

• Moved files/methods etc. add overhead

• Uniquely identifying files/entities is hard

• (No impact on results, though)

Language matters

Javascript C# Java

E.g.: Javascript takes longer because:• Larger files, less modularization• Slower parser (automatic semicolon-insertion)

LISA is E X TREME

complexfeature-richheavyweight

simplegeneric

lightweight

Thank you for your attention

SANER ‘17, Klagenfurt, 22.02.2017

Read the paper: http://t.uzh.ch/Fj

Try the tool: http://t.uzh.ch/Fk

Get the slides: http://t.uzh.ch/Fm

Contact me: alexandru@ifi.uzh.ch

Reducing Redundancies in Multi-Revision Code Analysis

Presentations & Public Speaking

Transcript of Reducing Redundancies in Multi-Revision Code Analysis

satrust.comsatrust.com/wp-content/uploads/...2-revision-pack.docx · Web viewMeasuring development and reducing the gap (Tourism in Kenya) NIC case study (India) C: Challenge of

Guide to Redundancies and Reductions-in-FoRce in - DLA Piper

How Redundancies Increase Your Antifragility

PR BUD Fundswearing apparel for a financial contribution from the EGF following 1 161 redundancies in the economic sector classified under the NACE Revision 2 Division 14 (Manufacture

********************************************************** 02 53 16.pdf · 1.6.2 Redundancies 1.6.3 Displays and Data ... UNIFIED FACILITIES GUIDE SPECIFICATIONS ... ASME PTC 19.3

R041 Revision BOOKLET - King's Liverpool · Web viewR041 Revision BOOKLET R041 Revision BOOKLET R041 Revision BOOKLET Key Stage Four Physical Education: Year 11 Sport Science: Reducing

Port state inspections pocket checklist Revision 3 · 2018-08-05 · pocket checklist Revision 3 Reducing the risk of port state control detentions ... officers on ships classed by

Efficient Discovery of XML Data Redundancies

Exploring the Use of Multiple Modular Redundancies for ...

On-line Elimination of Local Redundancies in Evolving Fuzzy Systems

Census Enterprise Data Collection and Processing (CEDCaP)...2016/05/25 · but not limited to, reducing redundancies through shared services, emphasizing standards-based software,

Collective Redundancies: Information, Consultation and Protection By Giles Powell.

Today’s agenda: Cutting jargon, trimming sentences, and avoiding redundancies.

Prelims Revision Test Corrections 1 25 - · PDF filePrelims Revision Test Corrections 1 – 25 ... It has the objective of reducing maternal and infant mortality by promoting institutional

· Web viewDocument signed 30 August 1993 Privatisation complete Labour force level 145 Forced redundancies 0 Volunteer redundancies 0

Year 13 Summer Homework - The Hazeley Academy · A level Biology Summer homework . ... focus on Benedict's solution for reducing sugars and non-reducing sugars, ... the revision required

all... · Web viewRevise VM-31 throughout all sections to achieve the following: (1) minimize redundancies between the Overview Section and the Main Report by reducing Overview Section

Oppose LCC Course Redundancies

Chapter 13 Genetic Redundancies and Their Evolutionary ...

Global Optimization by Suppression of Partial Redundancies について