Iterative Context Bounding for Systematic Testing of Multithreaded Programs
Madan Musuvathi, Shaz Qadeer
Microsoft Research
Testing multithreaded programs is HARD
Specific thread interleavings expose subtle errors
• Testing often misses these errors
Even when found, errors are hard to debug
• No repeatable trace
• Source of the bug is far away from where it manifests
Current practice
Concurrency testing == Stress testing
Example: testing a concurrent queue
• Create 100 threads performing queue operations
• Run for days/weeks
• Pepper the code with sleep( random() )
Stress increases the likelihood of rare interleavings
• Makes any error found hard to debug
CHESS: Unit testing for concurrency
Example: testing a concurrent queue
• Create 1 reader thread and 1 writer thread
• Exhaustively try all thread interleavings
Run the test repeatedly on a specialized scheduler
• Explore a different thread interleaving each time
• Use model checking techniques to avoid redundancy
Check for assertions and deadlocks in every run
• The error-trace is repeatable
State space explosion
Thread 1: x = 1; y = 1;
Thread 2: x = 2; y = 2;
Init state: x = 0, y = 0
[Figure: tree of the (x, y) states reached over all interleavings of the two threads, from (0,0) through intermediate states such as (1,0), (2,0), (1,2) and (2,1) to the final states (1,1), (1,2), (2,1) and (2,2)]
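The two-thread example above (x = 1; y = 1 against x = 2; y = 2, starting from x = 0, y = 0) is small enough to enumerate by hand or by machine. The following is a minimal Python sketch, not part of CHESS; the names `interleavings` and `run` are illustrative assumptions. It lists every interleaving and collects the final (x, y) states.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        slots = set(slots)          # positions taken by thread a's steps
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(total)]

def run(schedule):
    """Execute a sequence of (variable, value) writes from the initial state."""
    state = {"x": 0, "y": 0}
    for var, value in schedule:
        state[var] = value
    return state["x"], state["y"]

THREAD_1 = [("x", 1), ("y", 1)]
THREAD_2 = [("x", 2), ("y", 2)]

schedules = list(interleavings(THREAD_1, THREAD_2))
final_states = {run(s) for s in schedules}

print(len(schedules))        # number of distinct interleavings
print(sorted(final_states))  # distinct final (x, y) states
```

Even with two steps per thread there are six interleavings; the count grows quickly as steps are added.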
State space explosion
[Figure: n threads side by side, each running k steps, for example x = 1; … y = 1; and x = 2; … y = 2;]
n threads, k steps each
Number of executions = $O(n^{nk})$
Exponential in both n and k
• Typically: n < 10, k > 100
Limits scalability to large programs (large k)
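To get a feel for these numbers, the sketch below compares the exact number of interleavings of n always-runnable threads with k steps each, which is the multinomial (nk)! / (k!)^n, against the n^(nk) bound quoted on the slide. The function names are illustrative assumptions, and the model ignores blocking.

```python
from math import factorial

def exact_interleavings(n, k):
    """Interleavings of n threads with k steps each, none of which ever blocks."""
    return factorial(n * k) // factorial(k) ** n

def upper_bound(n, k):
    """Each of the n*k steps is taken by one of n threads: at most n**(n*k)."""
    return n ** (n * k)

for n, k in [(2, 2), (2, 10), (3, 10)]:
    print(n, k, exact_interleavings(n, k), upper_bound(n, k))
```

For n = 2 and k = 2 the exact count is 6 and the bound is 16; both grow exponentially as k increases.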
Techniques
Iterative context bounding
• Strategy for searching large state spaces
State space optimization
• Reduces the size of the state space
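Iterative context bounding
Thread 1: x = 1; if (p != 0) { x = p->f; }
Thread 2: p = 0;
[Figure: an interleaving in which thread 1 runs x = 1; and the check if (p != 0) {, thread 2 then runs p = 0;, and thread 1 resumes at x = p->f; }. The two context switches are labeled "preemption" and "non-preemption".]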
Iterative context-bounding algorithm
The scheduler has a budget of c preemptions
• Nondeterministically choose the preemption points
Resort to non-preemptive scheduling after c preemptions
• Run each thread to the next yield point
Once all executions explored with c preemptions
• Try with c+1 preemptions
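The loop described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the CHESS implementation: threads are lists of step functions that never block, the pointer p is modeled as an integer that starts out non-null, and all names (`schedules`, `make_threads`, `find_bug`, `NullDereference`) are hypothetical. For simplicity each bound re-explores the schedules of the smaller bounds.

```python
class NullDereference(Exception):
    pass

def schedules(lengths, bound):
    """Yield complete schedules (lists of thread ids) with at most `bound` preemptions."""
    def extend(pcs, current, used, prefix):
        enabled = [t for t, pc in enumerate(pcs) if pc < lengths[t]]
        if not enabled:
            yield list(prefix)
            return
        for t in enabled:
            # Leaving a thread that could still run counts as a preemption;
            # switching after a thread has finished is free.
            cost = 1 if current in enabled and t != current else 0
            if used + cost > bound:
                continue
            pcs[t] += 1
            prefix.append(t)
            yield from extend(pcs, t, used + cost, prefix)
            prefix.pop()
            pcs[t] -= 1
    yield from extend([0] * len(lengths), None, 0, [])

def make_threads():
    """Fresh copy of the slide's program: thread 0 checks and dereferences p, thread 1 clears it."""
    shared = {"p": 1, "x": 0}        # assumption: p = 1 stands for a valid pointer
    local = {"checked": False}
    def t0_assign(): shared["x"] = 1
    def t0_check(): local["checked"] = shared["p"] != 0
    def t0_deref():
        if local["checked"]:
            if shared["p"] == 0:
                raise NullDereference("x = p->f with p == 0")
            shared["x"] = 42         # stand-in for the value of p->f
    def t1_clear(): shared["p"] = 0
    return [[t0_assign, t0_check, t0_deref], [t1_clear]]

def run(schedule):
    """Re-execute the program from its initial state under one schedule."""
    threads = make_threads()
    pcs = [0] * len(threads)
    for t in schedule:
        threads[t][pcs[t]]()
        pcs[t] += 1

def find_bug(max_bound):
    """Try bounds 0, 1, 2, ... and return (bound, schedule) for the first failure."""
    lengths = [len(t) for t in make_threads()]
    for bound in range(max_bound + 1):
        for schedule in schedules(lengths, bound):
            try:
                run(schedule)
            except NullDereference:
                return bound, schedule
    return None

print(find_bug(3))
```

In this model no schedule with zero preemptions fails, and the first failing schedule needs exactly one preemption, placed between the null check and the dereference.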
Iterative context-bounding has desirable properties
Property 0: Easy to implement
Property 1: Polynomial state space
• n threads, k steps each, c preemptions
• Number of executions $\le \binom{nk}{c} \cdot (n+c)! = O\left((n^2 k)^c \cdot n!\right)$
• Exponential in n and c, but not in k
[Figure: n threads of k steps each, cut at the c chosen preemption points into n+c atomic blocks, which are then reordered]
• Choose c preemption points
• Permute n+c atomic blocks
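The counting argument (choose c preemption points among the nk steps, then permute the resulting n+c atomic blocks) can be evaluated directly. The sketch below only computes the two bounds from the slides; the function names are illustrative assumptions.

```python
from math import comb, factorial

def context_bounded_limit(n, k, c):
    """Upper bound on executions with c preemptions: C(nk, c) * (n + c)!."""
    return comb(n * k, c) * factorial(n + c)

def unbounded_limit(n, k):
    """Upper bound on executions with no preemption budget: n**(n*k)."""
    return n ** (n * k)

n, c = 2, 2
for k in (100, 200, 400):
    print(k, context_bounded_limit(n, k, c))
print(unbounded_limit(2, 100) > context_bounded_limit(2, 100, 2))
```

With n = 2 and c = 2, doubling k roughly quadruples the bounded limit, which is the polynomial growth in k claimed by Property 1, while the unbounded limit for k = 100 is already 2^200.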
Property 2: Deep exploration possible with small bounds
A context-bounded execution has unbounded depth
• A thread may execute an unbounded number of steps within each context
Can reach a terminating state from an arbitrary state with zero preemptions
• Perform non-preemptive scheduling
• Leave the number of non-preemptions unbounded
Property 3: Coverage metric
If search terminates with c preemptions, any remaining error must require at least c+1 preemptions
Intuitive estimate for
• the complexity of the bugs remaining in the program
• the chance of their occurrence in practice
Property 4: Finds the ‘simplest’ error trace
Finds the smallest number of preemptions to the error
Number of preemptions is a better metric of error complexity than execution length
Property 5: Lots of bugs with small number of preemptions
Bugs reachable with preemption count (columns 0 to 3)
Program               KLOC   Max Num Threads   0   1   2   3   Total
Bluetooth              0.4   3                 0   1   0   0   1
Work-Stealing Queue    1.3   3                 0   1   2   0   3
Transaction Manager    7.0   2                 0   0   2   1   3
APE                   18.9   4                 2   1   1   -   4
Dryad Channels        16.0   5                 1   5   1   -   7
Most states are covered with small number of preemptions
[Figure: Coverage vs Time (Dryad)]
Techniques
Iterative context-bounding
• Strategy for searching large state spaces
State space optimization
Optimization for race-free programs
Insert context-switches only at synchronization points
Massive state-space reduction
• Num steps (k) = num synch. operations (not memory accesses)
Run data-race detection to check race-free assumption
• Goldilocks algorithm [PLDI ’07] implemented for x86
Theorem: When search terminates for context-bound c
• Either find an erroneous execution
• Or find a data-race
• Or the program has no errors reachable with c preemptions
Conclusion
Iterative context-bounding algorithm
• Effective search strategy for multi-threaded bugs
• Exposes many concurrency bugs
Implemented in the CHESS model checking tool
• Applying CHESS to Windows drivers, SQL, Cosmos, Singularity
Visit http://research.microsoft.com/projects/CHESS/
Extra Slides
Partial-order reduction
Many thread interleavings are equivalent
Accesses to separate memory locations by different threads can be reordered
Avoid exploring equivalent thread interleavings
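A small experiment illustrates why such interleavings are redundant. In the Python sketch below (a toy model with hypothetical names, not a partial-order reduction algorithm), two threads that write disjoint variables produce the same final state under every interleaving, while two threads that write the same variables do not.

```python
from itertools import combinations

def interleavings(a, b):
    """Yield every merge of a and b that keeps each thread's own order."""
    total = len(a) + len(b)
    for slots in combinations(range(total), len(a)):
        slots = set(slots)          # positions taken by thread a's steps
        ia, ib = iter(a), iter(b)
        yield [next(ia) if i in slots else next(ib) for i in range(total)]

def final_state(schedule):
    """Apply a sequence of (variable, value) writes and return the result."""
    state = {}
    for var, value in schedule:
        state[var] = value
    return tuple(sorted(state.items()))

def distinct_outcomes(t1, t2):
    """Number of different final states over all interleavings of t1 and t2."""
    return len({final_state(s) for s in interleavings(t1, t2)})

# Threads touching separate locations: all six interleavings are equivalent.
print(distinct_outcomes([("a", 1), ("b", 1)], [("c", 2), ("d", 2)]))
# Threads touching the same locations: the order of the writes matters.
print(distinct_outcomes([("x", 1), ("y", 1)], [("x", 2), ("y", 2)]))
```

Exploring one representative of each equivalence class is enough in the first case, which is the saving partial-order reduction aims for.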
Optimistic dynamic partial-order reduction
Algorithm [Bruening ‘99]:
• Assume the program is data-race free
• Context switch only at synchronization points
• Check for data-races in each execution
Theorem [Stoller ‘00]:
• If the algorithm terminates without reporting races
• Then the program has no assertion failures
Massive reduction: k = number of synchronization accesses (not memory accesses)
Combining with context-bounding
Algorithm:
• Assume the program is data-race free
• Context switch only at synchronization points
• Explore executions with c preemptions
• Check for data-races in each execution
Theorem:
• If the algorithm terminates without reporting races, then the program has no assertion failures reachable with c preemptions
• Requires that a thread can block only at synchronization points