Code coverage for MSR Researchers [Work in Progress]

Code Coverage for MSR Researchers

Mauricio [email protected]

Monday, November 18, 13



What is Code Coverage?

• Describes how much of the production code is exercised by the test suite.

• Basically, it is the number of lines executed when running the test suite, divided by the total number of lines.
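As a minimal sketch of the definition above, line coverage is a simple ratio. The class and method names here are hypothetical and only illustrate the arithmetic; a real coverage tool gathers the executed-line count by instrumenting and running the code.

```java
class LineCoverageSketch {
    // Line coverage as a fraction in [0, 1]: executed lines / total lines.
    static double lineCoverage(int executedLines, int totalLines) {
        if (totalLines <= 0) {
            throw new IllegalArgumentException("totalLines must be positive");
        }
        return (double) executedLines / totalLines;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 8 of 10 lines were executed by the test suite.
        System.out.println(lineCoverage(8, 10)); // prints 0.8
    }
}
```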


Why Do We Need This?

• It is hard to calculate code coverage when studying a large number of repositories.

• Compiled code needed

• Test suite execution needed

• As we know, every project has its own way to compile and run.


Static Analysis

• Static analysis would solve this problem.

• But it is impossible to execute the code statically.

• We need heuristics!


Our idea

• A production method contains a certain level of complexity (which can be measured by McCabe’s number)

• public int a() { if (x) return 1; else return 2; }

• If a method contains two different paths, then it probably needs two different tests.
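To make the complexity idea concrete, here is a rough sketch that approximates McCabe's number by counting decision points in a method body and adding one. It works on plain text with a regular expression, which is an assumption made for brevity: it will miscount keywords inside strings or comments, and it is not necessarily how the tool itself computes the metric.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class McCabeSketch {
    // Decision points: branching keywords, short-circuit operators, ternary operator.
    private static final Pattern DECISION =
        Pattern.compile("\\b(if|for|while|case|catch)\\b|&&|\\|\\||\\?");

    // Approximate McCabe's number: number of decision points + 1.
    static int mccabe(String methodBody) {
        Matcher m = DECISION.matcher(methodBody);
        int decisions = 0;
        while (m.find()) {
            decisions++;
        }
        return decisions + 1;
    }

    public static void main(String[] args) {
        // The method from the slide has one "if", so two paths.
        System.out.println(mccabe("if (x) return 1; else return 2;")); // prints 2
    }
}
```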


Our formula

• Method-level: Qty of tests / McCabe’s number

• Class-Level: Sum(Qty of tests per method) / Sum(McCabe’s number per method)
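The two formulas above can be sketched as follows. The class and method names are hypothetical, and the numbers in `main` are made up for illustration. Note that nothing in the formula caps the result at 1, so a method with more tests than paths yields a value above 100%.

```java
class StaticCoverageSketch {
    // Method level: quantity of tests / McCabe's number.
    static double methodLevel(int tests, int mccabe) {
        if (mccabe <= 0) {
            throw new IllegalArgumentException("McCabe's number must be positive");
        }
        return (double) tests / mccabe;
    }

    // Class level: sum(tests per method) / sum(McCabe's number per method).
    static double classLevel(int[] testsPerMethod, int[] mccabePerMethod) {
        if (testsPerMethod.length != mccabePerMethod.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        int totalTests = 0;
        int totalMcCabe = 0;
        for (int i = 0; i < testsPerMethod.length; i++) {
            totalTests += testsPerMethod[i];
            totalMcCabe += mccabePerMethod[i];
        }
        if (totalMcCabe <= 0) {
            throw new IllegalArgumentException("total McCabe's number must be positive");
        }
        return (double) totalTests / totalMcCabe;
    }

    public static void main(String[] args) {
        // One test for a method with two paths.
        System.out.println(methodLevel(1, 2)); // prints 0.5
        // Two methods with two paths each, covered by 1 and 2 tests.
        System.out.println(classLevel(new int[] {1, 2}, new int[] {2, 2})); // prints 0.75
    }
}
```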


Identifying test methods

@Test
public void testaOMetodo2() {
    A a = new A();
    int resultado = a.fazAlgo();
    Assert.assertEquals(1, resultado);
}

@Test
public void testaOMetodo() {
    A a = new A();
    int resultado = a.getB().fazAlgo();
    Assert.assertEquals(1, resultado);
}

1st impl. tests fazAlgo()

2nd impl. tests getB(), fazAlgo()
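One way to see which production methods a test touches is to list the methods it invokes. The sketch below does this with a regular expression over the test body and skips calls whose names start with "assert". This is an illustrative simplification with hypothetical names; a real implementation would more likely walk the parsed syntax tree and resolve types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InvokedMethodsSketch {
    // Matches ".name(" and captures the method name.
    private static final Pattern CALL = Pattern.compile("\\.(\\w+)\\s*\\(");

    // Returns the invoked method names in order of appearance, ignoring assertions.
    static List<String> invokedMethods(String testBody) {
        List<String> names = new ArrayList<>();
        Matcher m = CALL.matcher(testBody);
        while (m.find()) {
            String name = m.group(1);
            if (!name.startsWith("assert")) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        String first = "A a = new A(); int resultado = a.fazAlgo(); Assert.assertEquals(1, resultado);";
        String second = "A a = new A(); int resultado = a.getB().fazAlgo(); Assert.assertEquals(1, resultado);";
        System.out.println(invokedMethods(first));  // prints [fazAlgo]
        System.out.println(invokedMethods(second)); // prints [getB, fazAlgo]
    }
}
```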


Comparing the solution

• I want to compare it to Emma (a tool that measures coverage through dynamic analysis)

• I don’t want to replace the tool (that would not make sense)

• I want to discover the average error

• If it is small, then we can use it.


Calculating the difference

• All charts are based on the difference: our calculated number minus Emma’s number.

• A “0” means that the two numbers were the same.

• A negative number indicates that our tool calculates a smaller code coverage than Emma.

• A positive number means the opposite.
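The sign convention can be sketched in a few lines. The names and sample values are hypothetical, not measurements from the study. The mean of the signed differences and the mean of their absolute values are shown as two possible readings of "average error"; the slides do not say which one is used.

```java
class CoverageDifferenceSketch {
    // Our number minus Emma's: negative means our tool reports less coverage.
    static double difference(double ours, double emma) {
        return ours - emma;
    }

    // Mean of the signed differences (positive and negative errors can cancel out).
    static double meanDifference(double[] ours, double[] emma) {
        double sum = 0.0;
        for (int i = 0; i < ours.length; i++) {
            sum += difference(ours[i], emma[i]);
        }
        return sum / ours.length;
    }

    // Mean of the absolute differences (errors never cancel out).
    static double meanAbsoluteDifference(double[] ours, double[] emma) {
        double sum = 0.0;
        for (int i = 0; i < ours.length; i++) {
            sum += Math.abs(difference(ours[i], emma[i]));
        }
        return sum / ours.length;
    }

    public static void main(String[] args) {
        // Made-up coverage values for two classes.
        double[] ours = {0.5, 1.0};
        double[] emma = {0.75, 0.5};
        System.out.println(difference(ours[0], emma[0]));       // prints -0.25
        System.out.println(meanDifference(ours, emma));         // prints 0.125
        System.out.println(meanAbsoluteDifference(ours, emma)); // prints 0.375
    }
}
```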


A few examples


Spearman Correlation


Metric Miner


Gnarus


Caelum Web


Discussion

• It looks like the tool's result can differ from dynamic analysis by 25% to 30%.

• Questions:

• How can I eliminate big mistakes?

• How can I determine if the tool is valid or not?


Advantages

• Really fast. It does not need to compile and run the tests.

• If a test fails, dynamic analysis may fail. Static analysis does not.


Disadvantages

• It is a heuristic.

• The implementation is very complicated.

• There might be bugs in the implementation.

• A few things are pretty hard to identify, mainly inheritance and polymorphism.

• AOP code.


Wanna help?

• github.com/mauricioaniche/gelato2

• github.com/metricminer-msr/codemetrics

• My goal: MSR MSR MSR!


Thanks!

• [email protected]



