Bug Localization with Machine Learning Techniques
Wujie Zheng ([email protected])
Background of Bug Localization
Software is far from bug-free. Debugging is a methodical process of finding and correcting the bugs in a program; doing it manually is laborious and expensive, so it is desirable to automate or semi-automate the process. Bug localization is the task of finding, through automatic analysis, a set or a ranking of source code locations that are likely buggy.
Program Slicing
Debugging with Program Slicing
Start from the inputs or the variables that exhibit the failure (which are not necessarily its root cause), and find all statements that may have caused the failure.
Program Dependence Graph (PDG): statements connected by control dependence and data dependence edges.
Static slice vs. dynamic slice:
- Static slice: all statements that may affect the value of a variable at a program point for any arbitrary execution of the program.
- Dynamic slice: all statements that actually affect the value of a variable at a program point for a particular execution of the program.
Forward slice vs. backward slice:
- Forward slice: the statements that are affected by the value of variable v at statement s.
- Backward slice: the statements that affect the value of variable v at statement s.
PDG

#include <stdio.h>

static int add(int a, int b) { return a + b; }

int main(void) {
    int sum = 0;
    int i = 1;
    while (i < 11) {
        sum = add(sum, i);
        i = add(i, 1);
    }
    printf("sum=%d\n", sum);
    printf("i=%d\n", i);
    return 0;
}
In the PDG of this program, control dependence edges are shown in blue and data dependence edges are shown in green; intraprocedural dependences are drawn with solid lines, and interprocedural dependences with dashed lines.
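As a worked example, derived by hand from the program above and meant only as an illustration: the backward slice of sum at printf("sum=%d\n", sum) contains int sum = 0, int i = 1, the loop condition while (i < 11), both calls sum = add(sum, i) and i = add(i, 1) (the latter because it drives the loop condition that control-governs the former), and the body of add; the statement printf("i=%d\n", i) is not in the slice. The forward slice of int i = 1 reaches the loop condition, both add calls, and, through data dependence, both printf statements.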
Experimental Methods
Delta Debugging
Generalize and simplify a failing test case to a minimal test case that still produces the failure.
Split the test case into two sub-tests, keep the sub-test that still fails, and repeat the process. When both sub-tests pass, divide and test again at a finer granularity.
Essentially a binary search algorithm; a minimal sketch follows.
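The sketch below implements only the binary-search core described above, under simplifying assumptions: test() is a hypothetical failure oracle, and the search simply stops when both halves pass, where the full ddmin algorithm would instead retry at a finer granularity.

#include <string.h>

/* Hypothetical failure oracle: returns 1 if running the program
 * on the given fragment still reproduces the failure. */
typedef int (*test_fn)(const char *input, int len);

/* Repeatedly split a failing input in two and keep a failing half.
 * Returns the length of the reduced failing input. */
int ddmin2(char *input, int len, test_fn test) {
    while (len >= 2) {
        int half = (len + 1) / 2;   /* halves overlap by one byte when len is odd */
        if (test(input, half)) {
            len = half;                         /* first half still fails */
        } else if (test(input + (len - half), half)) {
            memmove(input, input + (len - half), half);
            len = half;                         /* second half still fails */
        } else {
            break;    /* both halves pass; full ddmin would refine further */
        }
    }
    return len;
}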
Machine Learning Methods
Problem Setting
A set of failing and passing test cases is observed (the test cases are the samples).
The statements are instrumented and execution traces are collected (the statements are the features).
The goal is to find a set or a ranking of the statements that are likely buggy.
Machine Learning Methods
Direct correlation analysis: Tarantula (sketched below), LIBLIT05.
Build a classification model of the test cases (feature selection): logistic regression, decision trees.
Build a behavior model of the statements from passing test cases, and then check for violations in failing test cases: dynamic invariants, SOBER, PPDG.
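Of these, Tarantula is the easiest to make concrete. The sketch below computes its suspiciousness score, susp(s) = (failed(s)/totalfailed) / (failed(s)/totalfailed + passed(s)/totalpassed), over a made-up coverage matrix that encodes the problem setting above: tests as rows (samples), statements as columns (features).

#include <stdio.h>

#define T 6   /* test cases (samples) */
#define S 4   /* statements (features) */

int main(void) {
    /* covered[t][s] = 1 if test t executed statement s (toy data). */
    const int covered[T][S] = {
        {1,1,0,1}, {1,0,0,1}, {1,1,0,0},
        {1,1,1,1}, {1,0,1,1}, {1,0,0,1}
    };
    const int failed[T] = {0, 0, 0, 1, 1, 0};

    int total_pass = 0, total_fail = 0;
    for (int t = 0; t < T; t++) {
        if (failed[t]) total_fail++; else total_pass++;
    }
    for (int s = 0; s < S; s++) {
        int p = 0, f = 0;
        for (int t = 0; t < T; t++)
            if (covered[t][s]) { if (failed[t]) f++; else p++; }
        double fr = (double)f / total_fail;   /* failed(s)/totalfailed */
        double pr = (double)p / total_pass;   /* passed(s)/totalpassed */
        double susp = (fr + pr > 0) ? fr / (fr + pr) : 0.0;
        printf("stmt %d: suspiciousness %.2f\n", s, susp);
    }
    return 0;
}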
Logistic regression
The logistic function: p(fail | x) = 1 / (1 + e^(-(w·x + b))).
Maximize the likelihood of the training set.
The regularization term penalizes large coefficients.
The input features are normalized first. The larger the resulting coefficient of a statement is, the more suspicious the statement is.
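A minimal sketch of this approach, with the assumptions spelled out: coverage bits serve as already-normalized features, training is plain batch gradient ascent on an L2-regularized log-likelihood, and the coverage matrix is the same made-up toy data as before. Statements are then ranked by their learned weights.

#include <stdio.h>
#include <math.h>

#define T 6   /* test cases (samples) */
#define S 4   /* statements (features) */

int main(void) {
    const double x[T][S] = {
        {1,1,0,1}, {1,0,0,1}, {1,1,0,0},
        {1,1,1,1}, {1,0,1,1}, {1,0,0,1}
    };
    const int y[T] = {0, 0, 0, 1, 1, 0};   /* 1 = failing run */
    double w[S] = {0}, b = 0.0;
    const double lr = 0.1, lambda = 0.01;  /* step size, L2 strength */

    for (int iter = 0; iter < 2000; iter++) {
        double gw[S] = {0}, gb = 0.0;
        for (int t = 0; t < T; t++) {
            double z = b;
            for (int s = 0; s < S; s++) z += w[s] * x[t][s];
            double p = 1.0 / (1.0 + exp(-z));       /* logistic function */
            for (int s = 0; s < S; s++) gw[s] += (y[t] - p) * x[t][s];
            gb += y[t] - p;
        }
        for (int s = 0; s < S; s++)
            w[s] += lr * (gw[s] - lambda * w[s]);   /* regularized ascent step */
        b += lr * gb;
    }
    /* Larger weight => more suspicious statement. */
    for (int s = 0; s < S; s++)
        printf("stmt %d: weight %+.3f\n", s, w[s]);
    return 0;
}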
Dynamic invariants
Given a set of test cases, mine the implicit rules that hold over the program variables.
The templates (the simplest is sketched below):
- Invariants over any variable: x = a, ...
- Invariants over two numeric variables: y = ax + b, ...
- Invariants over three numeric variables: z = ax + by + c, ...
- ...
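The sketch below instantiates only the simplest template, x = a: it infers a constant-value invariant from a variable's observations in passing runs and reports when the failing run violates it. Daikon-style tools check many templates in this manner; the observations here are made up.

#include <stdio.h>

int main(void) {
    /* Observed values of one variable x in passing runs (toy data). */
    const int passing_obs[] = {10, 10, 10, 10};
    const int failing_obs = 42;   /* its value in the failing run */
    const int n = sizeof passing_obs / sizeof passing_obs[0];

    /* Try to infer the invariant x == a from the passing runs. */
    int a = passing_obs[0], holds = 1;
    for (int i = 1; i < n; i++)
        if (passing_obs[i] != a) { holds = 0; break; }

    if (holds && failing_obs != a)
        printf("invariant x == %d is violated in the failing run (x = %d)\n",
               a, failing_obs);
    return 0;
}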
SOBER
SOBER [Liu05] estimates the probability density function of the evaluation bias of a predicate P on passing runs and on failing runs, respectively. (The evaluation bias of P in a run is the fraction of P's evaluations in that run that come out true.)
The bug relevance score of P is then defined as the difference between the two distributions.
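A heavily simplified sketch of the idea: compute the per-run evaluation bias of P and score P by how far the mean bias on failing runs drifts from the mean on passing runs. The real SOBER score compares the two distributions with a more careful statistical measure; the bias values below are made up.

#include <stdio.h>
#include <math.h>

static double mean(const double *v, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) s += v[i];
    return s / n;
}

int main(void) {
    /* Evaluation bias of predicate P in each run (toy data). */
    const double pass_bias[] = {0.10, 0.12, 0.08, 0.11};
    const double fail_bias[] = {0.55, 0.60};
    const int np = sizeof pass_bias / sizeof pass_bias[0];
    const int nf = sizeof fail_bias / sizeof fail_bias[0];

    double mp = mean(pass_bias, np);
    double mf = mean(fail_bias, nf);
    printf("bug relevance score of P: %.2f\n", fabs(mf - mp));
    return 0;
}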
Probabilistic Program Dependence Graph (PPDG)
Conditional independence: the state of a statement is independent of the preceding statements, conditioned on the states of its immediate parents.
Dependence network: allows cycles, making it suitable for loops in programs.
Usage in debugging: the distributions of conditional probabilities are learned from the passing runs; the lower the conditional probability of a state is, the more suspicious the statement it belongs to.
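The sketch below works this out for a single dependence edge, under strong simplifications: statement states are abstracted to small integers, P(child state | parent state) is estimated by counting over passing runs, and a low probability for the state pair observed in the failing run marks the child statement as suspicious. All data is made up.

#include <stdio.h>

#define STATES 2

int main(void) {
    /* (parent state, child state) pairs observed in passing runs
     * for one dependence edge (toy data). */
    const int pass_pairs[][2] = {{0,0},{0,0},{0,0},{1,1},{1,1},{0,0}};
    const int n = sizeof pass_pairs / sizeof pass_pairs[0];

    double joint[STATES][STATES] = {{0.0}};
    double parent[STATES] = {0.0};
    for (int i = 0; i < n; i++) {
        joint[pass_pairs[i][0]][pass_pairs[i][1]] += 1.0;
        parent[pass_pairs[i][0]] += 1.0;
    }

    /* State pair observed at the same edge in the failing run. */
    const int fp = 0, fc = 1;
    double cond = parent[fp] > 0 ? joint[fp][fc] / parent[fp] : 0.0;
    printf("P(child=%d | parent=%d) = %.2f -> %s\n",
           fc, fp, cond, cond < 0.1 ? "suspicious" : "normal");
    return 0;
}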
Our Work
Belongs to the group "build a classification model of the test cases (feature selection)".
Association Rule Mining
The procedure (a sketch follows the problem list below):
- Mine all the strong (high-frequency and high-confidence) association rules X => fail.
- Select the rule X' => fail with the highest confidence (the best classification model of this kind).
- Output the statement set X' as the result.
Problems:
- It is hard to justify that such a rule-based model (a single conjunction) is suitable.
- The resulting rule may contain many statements; a further ranking scheme is still needed.
- High computational overhead.
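To make the procedure concrete, here is a sketch restricted to single-statement antecedents {s} => fail; the actual approach also mines conjunctions of statements, which is where much of the computational overhead comes from. The coverage matrix and the support/confidence thresholds are made up.

#include <stdio.h>

#define T 6
#define S 4

int main(void) {
    const int cov[T][S] = {
        {1,1,0,1}, {1,0,0,1}, {1,1,0,0},
        {1,1,1,1}, {1,0,1,1}, {1,0,0,1}
    };
    const int fail[T] = {0, 0, 0, 1, 1, 0};
    const double min_sup = 0.2, min_conf = 0.8;   /* made-up thresholds */

    for (int s = 0; s < S; s++) {
        int cover = 0, cover_fail = 0;
        for (int t = 0; t < T; t++)
            if (cov[t][s]) { cover++; cover_fail += fail[t]; }
        if (!cover) continue;
        double sup  = (double)cover_fail / T;       /* freq of {s} with fail */
        double conf = (double)cover_fail / cover;   /* P(fail | covers s)    */
        if (sup >= min_sup && conf >= min_conf)
            printf("rule {stmt %d} => fail  (sup %.2f, conf %.2f)\n",
                   s, sup, conf);
    }
    return 0;
}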
Our Work
Belongs to the group "build a classification model of the test cases (feature selection)".
Feature Subset Selection
The procedure:
- Iteratively evaluate a candidate subset of features, then modify the subset and evaluate whether the new subset improves on the old one.
- Greedy forward selection.
- Evaluating the subsets requires a scoring metric that grades a subset of features: the F-measure, i.e., the harmonic mean of precision and recall, 2 / (1/precision + 1/recall).
Problems:
- Too simple; just a traditional method.
A sketch of greedy forward selection follows.
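A minimal sketch of greedy forward selection, under one modeling assumption: a subset of statements "predicts fail" for exactly the tests that cover every statement in the subset, and the subset is scored by the F-measure of this prediction against the true labels. Statements are added one at a time while the score improves; the data is the same toy matrix.

#include <stdio.h>

#define T 6
#define S 4

static const int cov[T][S] = {
    {1,1,0,1}, {1,0,0,1}, {1,1,0,0},
    {1,1,1,1}, {1,0,1,1}, {1,0,0,1}
};
static const int fail[T] = {0, 0, 0, 1, 1, 0};

/* F-measure of "test t fails iff t covers every selected statement",
 * using the equivalent form F = 2TP / (2TP + FP + FN). */
static double fmeasure(const int *sel) {
    int tp = 0, fp = 0, fn = 0;
    for (int t = 0; t < T; t++) {
        int pred = 1;
        for (int s = 0; s < S; s++)
            if (sel[s] && !cov[t][s]) { pred = 0; break; }
        if (pred && fail[t]) tp++;
        else if (pred && !fail[t]) fp++;
        else if (!pred && fail[t]) fn++;
    }
    int denom = 2 * tp + fp + fn;
    return denom ? 2.0 * tp / denom : 0.0;
}

int main(void) {
    int sel[S] = {0};
    double best = fmeasure(sel);
    for (;;) {
        int arg = -1;
        for (int s = 0; s < S; s++) {     /* greedy: try each unused feature */
            if (sel[s]) continue;
            sel[s] = 1;
            double f = fmeasure(sel);
            sel[s] = 0;
            if (f > best) { best = f; arg = s; }
        }
        if (arg < 0) break;               /* no addition improves the score */
        sel[arg] = 1;
        printf("added stmt %d, F-measure %.2f\n", arg, best);
    }
    return 0;
}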
How to improve the work?
Adopt the association rule mining method to mine the behavior models; association rule mining is better suited to mining frequent patterns than to building classification models.
Difficulties:
- Usage patterns may dominate, while failure-related patterns may not be frequent.
- If we consider relatively frequent patterns for every statement, there may be too many results to mine and use effectively.
Taking the program dependence graph into account may help.
How to improve the work?
Consider the debugging problem of specific bugs such as data races. Data races are an important kind of bug, and they are difficult to debug. Sequential pattern mining?
Combine dynamic analysis with machine learning: existing dynamic analysis methods can provide valuable features, e.g., the potential data races in a certain execution (with many false positives).
Other topics in automated debugging
Failure classification: NLP, failure trace clustering, or NLP combined with failure trace clustering.
Failure replaying: record failures at user sites compactly and reproduce them at developer sites.
Debugging with replaying: traverse to any point of an execution without restarting the program; query the trace database.