CHESS : Systematic Testing of Concurrent Programs Madan Musuvathi Shaz Qadeer Microsoft Research.
CHESS Finding and Reproducing Heisenbugs Tom Ball, Sebastian Burckhardt Madan Musuvathi, Shaz Qadeer...
-
date post
20-Dec-2015 -
Category
Documents
-
view
217 -
download
0
Transcript of CHESS Finding and Reproducing Heisenbugs Tom Ball, Sebastian Burckhardt Madan Musuvathi, Shaz Qadeer...
CHESSFinding and Reproducing
Heisenbugs
Tom Ball, Sebastian BurckhardtMadan Musuvathi, Shaz Qadeer
Microsoft Research
Interns: Gerard Basler (ETH Zurich),Katie Coons (U. T. Austin),
Tayfun Elmas (Koc University),P. Arumuga Nainar (U. Wisc. Madison),
Iulian Neamtiu (U. Maryland, U.C. Riverside)
Concurrency is HARDRare thread interleavings can result in bugs
These bugs are hard to find, reproduce, and debugHeisenbugs: Observing the bug can “fix” it !
A huge productivity problemDevelopers and testers can spend weeks chasing a single
Heisenbug
Demo
Let’s find a simple concurrency bug
CHESS motivationToday:
concurrency testing == stress testing
Stress increases the interleaving variety, butNot predictable → HeisenbugsNot systematic → poor coverage of interleavings
Don’t stress, use CHESSBasic primitive: Drive a program along an interleaving
of choiceInterleaving can be decided by a program or a userDoing this today is surprisingly hard
Use model checking techniques to systematically enumerate thread interleavings
CHESS architecture
CHESSScheduler
MemoryModelbugs
Monitors
Coverage
Repro
TestingDataraces
Debugging Visualization
UnmanagedProgram
Windows
ManagedProgram
.NET CLR
• Record the interleaving executed• Drive the program along an interleaving
Talk outlineIntroduction
Preemption bounding [PLDI ‘07]Tackling state space explosion
Fair stateless model checking [PLDI ‘08]Handling cycles in states spaces
CHESS architecture details [OSDI ‘08]
Enumerating thread interleavings
x = 1;y = 1;
x = 2;y = 2;
2,1
1,0
0,0
1,1
2,2
2,22,1
2,0
2,12,2
Thread 1 Thread 2
1,2
2,0
2,2
1,1
1,1 1,2
1,0
1,2 1,1
y = 1;
x = 1;
y = 2;
x = 2;
Stateless model checking [Verisoft]Systematically enumerate all paths in a state-space graph
Don’t capture program states Capturing states is extremely hard for large programsState = globals, heap, stack, registers, kernel, filesystem,
other processes, other machines,…
Very effective on acyclic state spaces Termination is guaranteed
Potentially revisits program statesPartial-order reduction alleviates redundant exploration
x = 1; … … … … … y = k;
State space explosionThread 1 Thread n
x = 1; … … … … …y = k;
…
n threads
k steps each
Number of executions = O( nnk )
Exponential in both n and kTypically: n < 10 k > 100
Limits scalability to large programs
Goal: Scale CHESS to large programs (large k)
x = 1;if (p != 0) { x = p->f;}
Preemption bounding Prioritize executions with small number of preemptions
Preemption is a context switch forced by the scheduler Unexpected by the programmere.g. Time-slice expiration
Hypothesis: most concurrency bugs result from few preemptions
x = p->f;}
x = 1;if (p != 0) {
p = 0;
Thread 1 Thread 2
preemption
non-preemption
Polynomial state spaceTerminating program with fixed inputs and deterministic threads
n threads, k steps each, c preemptionsNumber of executions <= nkCc . (n+c)!
= O( (n2k)c. n! )
Exponential in n and c, but not in k
x = 1; … … … … …y = k;
x = 1; … … … … … y = k;
Thread 1 Thread 2
x = 1; … … … …
x = 1; … … …
…y = k;
… …
y = k;
• Choose c preemption points
• Permute n+c atomic blocks
Find lots of bugs with 2 preemptionsProgram Lines of code Bugs
Work Stealing Q 4K 4
CDS 6K 1
CCR 9K 3
ConcRT 16K 4
Dryad 18K 7
APE 19K 4
STM 20K 2
TPL 24K 9
PLINQ 24K 1
Singularity 175K 2
37 (total)
Acknowledgement: testers from PCP team
Good coverage metricWhen CHESS completes search with c preemptionsAny remaining bug requires c+1 or more preemptions
Two preemptions sufficient to reproduced all stress-test failures, reported so far
Talk outlineIntroduction
Preemption bounding [PLDI ‘07]Tackling state space explosion
Fair stateless model checking [PLDI ‘08]Handling cycles in states spaces
CHESS architecture details [OSDI ‘08]
Concurrent programs have cyclic state spaces
SpinlocksNon-blocking algorithmsImplementations of synchronization primitivesPeriodic timers…
L1: while( ! done) { L2: Sleep(); }
M1: done = 1;
Thread 1 Thread 2 ! done L2
! doneL1
done L2
doneL1
A demonic scheduler unrolls any cycle ad-infinitum
! done
done! done
done! done
done
while( ! done){ Sleep();}
done = 1;
Thread 1 Thread 2
! done
Depth bounding
! done
done! done
done! done
done! done
Prune executions beyond a bounded number of steps
Depth bound
Problem 1: Ineffective state coverage
! done
! done
! done
! done
Bound has to be large enough to reach the deepest bug Typically, greater than 100
synchronization operations
Every unrolling of a cycle redundantly explores reachable state space
Depth bound
Problem 2: Cannot find livelocksLivelocks : lack of progress in a program
temp = done;while( ! temp){ Sleep();}
done = 1;
Thread 1 Thread 2
Fair stateless model checking
Make stateless model checking effective on cyclic state spacesEffective state coverageDetect livelocks
Key idea
This test terminates only when the scheduler is fairFairness is assumed by programmers
All cycles in correct programs are unfair A fair cycle is a livelock
while( ! done){ Sleep();}
done = 1;
Thread 1 Thread 2
! done! done
donedone
Key idea
This test terminates only when the scheduler is fairFairness is assumed by programmers
CHESS should only explore fair schedules
while( ! done){ Sleep();}
done = 1;
Thread 1 Thread 2
! done! done
donedone
What notion of fairness?
Weak fairnessForall t :: GF ( enabled(t) scheduled(t) )A thread that remains enabled should eventually be
scheduled
A weakly-fair scheduler will eventually schedule Thread 2Example: round-robin, FIFO wait queues
while( ! done){ Sleep();}
done = 1;
Thread 1 Thread 2
Weak fairness does not suffice
Lock( l );While( ! done){ Unlock( l ); Sleep(); Lock( l );}Unlock( l );
Lock( l );done = 1;Unlock( l );
Thread 1 Thread 2
en = {T1, T2}
T1: Sleep()T2: Lock( l )
en = {T1, T2}
T1: Lock( l )T2: Lock( l )
en = { T1 }
T1: Unlock( l )T2: Lock( l )
en = {T1, T2}
T1: Sleep()T2: Lock( l )
Strong Fairness Forall t :: GF enabled(t) GF scheduled(t) A thread that is enabled infinitely often is scheduled infinitely often
Thread 2 is enabled and competes for the lock infinitely often Example: a round-robin scheduler with priorities [Apt & Olderog ‘83]
Lock( l );While( ! done){ Unlock( l ); Sleep(); Lock( l );}Unlock( l );
Lock( l );done = 1;Unlock( l );
Thread 1 Thread 2
Constructing a strongly fair schedulerA round-robin scheduler is not strongly fair
It is only weakly fair
Extend a round-robin scheduler with priorities [Apt & Olderog ‘83]
CHESS also needs to be demonicCannot generate all fair schedules
There are infinitely many, even for simple programs
It is sufficient to generate enough fair schedules to Explore all states (safety coverage)Explore at least one fair cycle, if any (livelock coverage)
Do it without capturing the program states
Fair stateless model checkingGiven a concurrent program Q and a safety property P
Q does not necessarily have an acyclic state space
Determine Q satisfies P and Q is fair-terminating (livelock-free)
Without capturing program states
(Good) Programs indicate lack of progress
Good Samaritan assumption:Forall threads t : GF scheduled(t) GF yield(t)A thread when scheduled infinitely often yields the processor infinitely
often
Examples of yield:Sleep(), ScheduleThread(), asm {rep nop;}Thread completion
while( ! done){ Sleep();}
done = 1;
Thread 1 Thread 2
Robustness of the Good Samaritan assumptionA violation of the Good Samaritan assumption is a
performance error
Programs are parsimonious in the use of yieldsA Sleep() almost always indicates a lack of progressImplies that the thread is stuck in a state-space cycle
while( ! done){ ;}
done = 1;
Thread 1 Thread 2
Fair demonic scheduler (outline)Maintain a priority-order (a partial-order) on threads
A < B means that A will not be scheduled in a state where B is enabled
Threads get a lower priority only when they yieldScheduler is fully demonic on yield-free paths
A thread loses its priority once it executesRemove all edges t < A when A executes
Four outcomes of the semi-algorithmTerminates without finding any errorsTerminates with a safety violationDiverges with an infinite execution
that violates the GS assumption (a performance error)that is strongly-fair (a livelock)
In practice: detect infinite executions by a very long execution
CoverageTheorem: The algorithm achieves full coverage
if every state is reachable by a yield-free path, and Exists a fair cycle iff exists a fair cycle with at most one
yield per thread
Results: Achieves more coverage faster
With fairness
Without fairness, with depth bound
20 30 40 50 60
States Explored 1726 871 1505 1726 1307 683
PercentageCoverage 100% 50% 87% 100% 76% 40%
Time(secs) 143 97 763 2531 >5000 >5000
Work stealing queue with one stealer
Livelocks in Singularity(both fixed)A thread needlessly burns its CPU quantum in a spin-
loop “it's a bug that we think we have seen in practice, but
that would have been very difficult to find through normal means” [Dean Tribble]
An infinite loop in the Promise implementationManifested as a non-reproducible problem in an existing
stress-testCHESS found the bug in a simple test harness with a
repeatable error-trace
Talk outlineIntroduction
Preemption bounding [PLDI ‘07]Tackling state space explosion
Fair stateless model checking [PLDI ‘08]Handling cycles in states spaces
CHESS architecture details [OSDI ‘08]
CHESS architecture recap
CHESSScheduler
MemoryModelbugs
Monitors
Coverage
Repro
TestingDataraces
Debugging Visualization
UnmanagedProgram
Windows
ManagedProgram
.NET CLR
• Record the interleaving executed• Drive the program along an interleaving
Capture the ‘happens-before’ graph Happens-before graph captures all communication between threads in
a concurrent execution
Abstracts time: For a given input, two executions that result in the same happens-before graph are behaviorally equivalent
x = 1
t = x;
wait(e)
setEvent(e)
Enforce a single-threaded executionHappens-before graph is a partial-order
can be converted to a totally-ordered single threaded execution
Big performance winData-accesses are automatically ordered by synchronization
eventsDon’t need to instrument data-accesses
Cannot explore non-sequentially consistent executions of a programResulting from relaxed memory model of the hardwareSober: A tool that detects the presence of such executions [CAV
‘08]
Directing the executionGiven a happens-before graphBlock the execution of a synchronization if it produces
an edge not in the graph
Need to understand the precise semantics of synchronization operations
ConclusionMessage to concurrency programmers
Think seriously about interleaving coverage
Message to system/PL researchersConcurrency APIs should have a clear specification of the
nondeterminism exposed
Don’t stress, use CHESSCHESS binary available for academic use
http://research.microsoft.com/CHESSCHESS will be shipped for commercial use, very soon
http://msdn.microsoft.com/devlabs
CHESS is extensibleUse CHESS scheduler for concurrency toolsPlug in new search algorithms
Questions
A stress test fails…
CHESS reproduces the bug in 2 mins