Integrating Artificial Intelligence Software Testingin.bgu.ac.il/en/engn/ise/QT/Documents/Artificial...
Transcript of Integrating Artificial Intelligence Software Testingin.bgu.ac.il/en/engn/ise/QT/Documents/Artificial...
Integrating Artificial Intelligenceiin
Software Testing
Roni Stern and Meir Kalech, ISE department, BGU
Niv Gafni, Yair Ofir and Eliav Ben-Zaken, Software Eng., BGU
1
AbstractAbstract
Di i
Artificial Intelligence
DiagnosisPlanning
Software Engineering
TestingTesting
2
Testing is Important• Quite a few software development paradigms• All of them recognize the importance of testing• All of them recognize the importance of testing• There are several types of testing
i i bl k b (f i l) i– E.g., unit testing, black-box (functional) testing ….
Tester• Runs tests • Follows a test plan
Programmer• Write programs• Follows spec
4
• Follows a test plan• Goal: find bugs!
• Follows spec.• Goal: fix bugs!
Handling a BugHandling a Bug
1. Bugs are reported by the tester- “Screen A crashed on step 13 of test suite 4…”p
2. Prioritized bugs are assigned to programmers3. Bugs are diagnosed
- What caused the bug?g
4. Bugs are fixedH f ll- Hopefully…
5
Why is Debugging Hard?Why is Debugging Hard?
• The developer needs to reproduce the bug• Reproducing (correctly) a bug is non-trivialReproducing (correctly) a bug is non trivial
– Bug reports may be inaccurate or missingS f h h i l– Software may have stochastic elements• Bug occurs only once in a while
– Bugs may appear only in special cases
Let the tester provide more information!
6
Introducing AI!Introducing AI!es
ter Run a test suite
Fi d b mer Diagnose
Fi h bTe Find a bug
rogr
am
Fix the bug
Pr
Test
er Run a test suiteFind a bug
AI Compute Diagnoses m
mer Fix the bug
T Find a bugPlan next tests
Prog
ram
8
Test Diagnose and Plan (TDP)Test, Diagnose and Plan (TDP)r r
Test
er Run a test suiteFind a bug
AI Compute DiagnosesPl t t t am
mer Fix the bug
Plan next tests
Prog
ra
1. The tester runs a set of planned tests (test suite)2. The tester finds a bug3. AI generates possible diagnoses4. If there is only one candidate – pass to programmer5 Else AI plans new tests to prune candidates5. Else, AI plans new tests to prune candidates
9
DiagnosisDiagnosis
Find the reason of a problem given observed symptoms
Requirement: knowledge of the diagnosed systemGi b• Given by experts
• Learned by AI techniques
10
Model Based DiagnosisModel-Based Diagnosis
• Given: a model of how the system works
I f h t tInfer what went wrong
• Example:Example:
IF ok(battery) AND ok(ignition) THEN start(engine)
• What if the engine doesn’t start?
11
Where is MBD applied?Where is MBD applied?
i• Printers (Kuhn and de Kleer 2008)
• Vehicles(Pernestål et. al. 2006)
• Robotics (Steinbauer 2009)(Steinbauer 2009)
• Spacecraft - Livingstone (Williams and Nayak, 1996; Bernard et al., 1998)
….
12
Software is Difficult to ModelSoftware is Difficult to Model
• Need to code how the software should behave– Specs are complicated to modelp p
• Static code analysis ignores runtimeP l hi– Polymorphism
– Instrumentation
…
13
Zoltar [Ab t l 2011’]Zoltar [Abrue et. al. 2011’]
• Construct a model from the observations• Observations should include execution tracesObservations should include execution traces
– Functions visited during executionOb d b / b– Observed outcome: bug / no bug
• Weaker modelOk(function1) function1 outputs valid valueA bug entails that at least one comp was not OkA bug entails that at least one comp was not Ok
14
Execution MatrixExecution Matrix
• A key component in Zoltar is the execution matrix• It is built from the observed execution traces
– Observation 1 (BUG) : F1 F5 F6 – Observation 2 (BUG) : F2 F5( )– Observation 3 (OK) : F2 F6 F7 F8
F1 F2 F3 F4 F5 F6 F7 F8 BugF1 F2 F3 F4 F5 F6 F7 F8 Bug
Obs1 1 0 0 0 1 1 0 0 1
Obs2 0 1 0 0 1 0 0 0 1
Obs3 0 1 0 0 0 1 1 1 0Obs3 0 1 0 0 0 1 1 1 0
15
Diagnosis = Hitting Sets of ConflictsDiagnosis = Hitting Sets of Conflicts
In every BUG trace at least one function is faulty• Observations:Observations:
– Observation 1 (BUG) : F1 F5 F6 Ob i 2 (BUG) F2 F5– Observation 2 (BUG) : F2 F5
• Conflicts:– Ok(F1) AND Ok(F5) AND Ok(F6)
Ok(F2) AND Ok(F5) Hitting sets of Conflicts– Ok(F2) AND Ok(F5)
• Possible diagnoses: {F5} ,{F1,F2},{F6,F2}
Hitting sets of Conflicts
16
Software Diagnosis with ZoltarSoftware Diagnosis with Zoltar
P i itiBug Report Prioritize Diagnosesg
ZoltarSet of
Candidate Diagnoses
17
Plan More TestsPlan More Tests
Bug Report
AIEngineSet of
Possible Diagnoses
Suggest New Test to P C did
18
Prune Candidates
Plan More TestsPlan More Tests
Bug Report
AIEngine A single diagnosisA single diagnosis
Suggest New Test to P C did
19
Prune Candidates
KnowledgeKnowledge
• Most probable cause: Eat Pill• Most easy to check: StopMost easy to check: Stop
30
ObjectiveObjective
Plan a sequence of testsf d h b dto find the best diagnosis
with minimal tester actionswith minimal tester actions
31
1 Highest Probability (HP)1. Highest Probability (HP)
h b f f• Compute the prob. of every function Ci
= Sum of prob. of candidates containing Cip g i
• Plan a test to check the most probable function
0.2F1
0.4 0.2
0 4
0.4F5F2
F6
32
0.4
1 Highest Probability (HP)1. Highest Probability (HP)
h b f f• Compute the prob. of every function Ci
= Sum of prob. of candidates containing Cip g i
• Plan a test to check the most probable function
0.2F1
0.4 0.6
0 4F5
F2
F6
33
0.4
2 Lowest Cost (LC)2. Lowest Cost (LC)
d l f h ( )• Consider only functions Ci with 0<P(Ci)<1• Test the closest function from this set
F1
F5F2
F6
34
3 Entropy-based3. Entropy-based• A test α = a set of components
Ω set of candidates that assume α will pass– Ω+ = set of candidates that assume α will pass – Ω- = set of candidates that assume α will fail– Ω? = set of candidates that are indifferent to α?
• Prob. α will pass = prob. of candidates in Ω+
• Choose test with lowest Entropy(Ω+ ,Ω ,Ω?)py( + , - , ?)= highest information gain
0.2F1
0.4 0.2
0 4
0.4F5F2
F6
35
0.4
4 Planning Under Uncertainty4. Planning Under Uncertainty
• Outcome of test is unknownwe would like to minimize expected cost
• Formulate as Markov Decision Process (MDP)– States: set of performed tests and their outcomesStates: set of performed tests and their outcomes– Actions: Possible tests – Transition: Pr(Ω ) Pr(Ω ) and Pr(Ω?)Transition: Pr(Ω+), Pr(Ω-) and Pr(Ω?)– Reward (cost): Cost of performed tests
• Current solver uses myopic sampling• Current solver uses myopic sampling
36
Preliminary Experimental ResultsPreliminary Experimental Results
• Synthetic code• SetupSetup
– Generated random call graphs with 300 nodesG d f h d ll h– Generate code from the random call graphs
– Injected 10/20 random bugs – Generate 15 random initial tests
• Run until a diagnosis with prob >0 9 is found• Run until a diagnosis with prob.>0.9 is found– Results on 100 instances
37
Preliminary Experimental ResultsPreliminary Experimental Results
• HP and Entropy perform worse than (LC) and MDP• probability based (HP and Entropy) perform worse than the cost based approach. • MDP which considers cost and probability outperforms the others
38
• MDP which considers cost and probability outperforms the others• 20 bugs is more costly for all algorithms than 10 bugs
Preliminary Experimental Results NUnitsPreliminary Experimental Results - NUnits
• Real code• Well-known testing framework for C#Well known testing framework for C#• Setup
– Generated call graphs with 302 nodes from NUnits– Injected 10,15,20,25,30 random bugs j g– Generate 15 random initial tests
• Run until a diagnosis with prob >0 9 is found• Run until a diagnosis with prob.>0.9 is found– Results on 110 instances
39
Preliminary Experimental Results NUnitsPreliminary Experimental Results - NUnits
• Similar results• The cost increases with the probability bound
40
p y• MDP which considers cost and probability outperforms the others
ConclusionConclusion
• The test, diagnose and plan (TDP) paradigm:– AI diagnoses observed bug with MBD algorithmg g g– AI plans further tests to find correct diagnosis
E t t i AI• Empowers tester using AI– Bug diagnosis done by tester+AI
41