CHESS Finding and Reproducing Heisenbugs Tom Ball, Sebastian Burckhardt Madan Musuvathi, Shaz Qadeer...

CHESSFinding and Reproducing

Heisenbugs

Tom Ball, Sebastian BurckhardtMadan Musuvathi, Shaz Qadeer

Microsoft Research

Interns: Gerard Basler (ETH Zurich),Katie Coons (U. T. Austin),

Tayfun Elmas (Koc University),P. Arumuga Nainar (U. Wisc. Madison),

Iulian Neamtiu (U. Maryland, U.C. Riverside)

Concurrency is HARDRare thread interleavings can result in bugs

These bugs are hard to find, reproduce, and debugHeisenbugs: Observing the bug can “fix” it !

A huge productivity problemDevelopers and testers can spend weeks chasing a single

Heisenbug

Demo

Let’s find a simple concurrency bug

CHESS motivationToday:

concurrency testing == stress testing

Stress increases the interleaving variety, butNot predictable → HeisenbugsNot systematic → poor coverage of interleavings

Don’t stress, use CHESSBasic primitive: Drive a program along an interleaving

of choiceInterleaving can be decided by a program or a userDoing this today is surprisingly hard

Use model checking techniques to systematically enumerate thread interleavings

CHESS architecture

CHESSScheduler

MemoryModelbugs

Monitors

Coverage

Repro

TestingDataraces

Debugging Visualization

UnmanagedProgram

Windows

ManagedProgram

.NET CLR

• Record the interleaving executed• Drive the program along an interleaving

Talk outlineIntroduction

Preemption bounding [PLDI ‘07]Tackling state space explosion

Fair stateless model checking [PLDI ‘08]Handling cycles in states spaces

CHESS architecture details [OSDI ‘08]

Enumerating thread interleavings

x = 1;y = 1;

x = 2;y = 2;

2,1

1,0

0,0

1,1

2,2

2,22,1

2,0

2,12,2

Thread 1 Thread 2

1,2

2,0

2,2

1,1

1,1 1,2

1,0

1,2 1,1

y = 1;

x = 1;

y = 2;

x = 2;

Stateless model checking [Verisoft]Systematically enumerate all paths in a state-space graph

Don’t capture program states Capturing states is extremely hard for large programsState = globals, heap, stack, registers, kernel, filesystem,

other processes, other machines,…

Very effective on acyclic state spaces Termination is guaranteed

Potentially revisits program statesPartial-order reduction alleviates redundant exploration

x = 1; … … … … … y = k;

State space explosionThread 1 Thread n

x = 1; … … … … …y = k;

…

n threads

k steps each

Number of executions = O( nnk )

Exponential in both n and kTypically: n < 10 k > 100

Limits scalability to large programs

Goal: Scale CHESS to large programs (large k)

x = 1;if (p != 0) { x = p->f;}

Preemption bounding Prioritize executions with small number of preemptions

Preemption is a context switch forced by the scheduler Unexpected by the programmere.g. Time-slice expiration

Hypothesis: most concurrency bugs result from few preemptions

x = p->f;}

x = 1;if (p != 0) {

p = 0;

Thread 1 Thread 2

preemption

non-preemption

Polynomial state spaceTerminating program with fixed inputs and deterministic threads

n threads, k steps each, c preemptionsNumber of executions <= nkCc . (n+c)!

= O( (n2k)c. n! )

Exponential in n and c, but not in k

x = 1; … … … … …y = k;

x = 1; … … … … … y = k;

Thread 1 Thread 2

x = 1; … … … …

x = 1; … … …

…y = k;

… …

y = k;

• Choose c preemption points

• Permute n+c atomic blocks

Find lots of bugs with 2 preemptionsProgram Lines of code Bugs

Work Stealing Q 4K 4

CDS 6K 1

CCR 9K 3

ConcRT 16K 4

Dryad 18K 7

APE 19K 4

STM 20K 2

TPL 24K 9

PLINQ 24K 1

Singularity 175K 2

37 (total)

Acknowledgement: testers from PCP team

Good coverage metricWhen CHESS completes search with c preemptionsAny remaining bug requires c+1 or more preemptions

Two preemptions sufficient to reproduced all stress-test failures, reported so far

Concurrent programs have cyclic state spaces

SpinlocksNon-blocking algorithmsImplementations of synchronization primitivesPeriodic timers…

L1: while( ! done) { L2: Sleep(); }

M1: done = 1;

Thread 1 Thread 2 ! done L2

! doneL1

done L2

doneL1

A demonic scheduler unrolls any cycle ad-infinitum

! done

done! done

done! done

done

while( ! done){ Sleep();}

done = 1;

Thread 1 Thread 2

! done

Depth bounding

! done

done! done

done! done

done! done

Prune executions beyond a bounded number of steps

Depth bound

Problem 1: Ineffective state coverage

! done

! done

! done

! done

Bound has to be large enough to reach the deepest bug Typically, greater than 100

synchronization operations

Every unrolling of a cycle redundantly explores reachable state space

Depth bound

Problem 2: Cannot find livelocksLivelocks : lack of progress in a program

temp = done;while( ! temp){ Sleep();}

done = 1;

Thread 1 Thread 2

Fair stateless model checking

Make stateless model checking effective on cyclic state spacesEffective state coverageDetect livelocks

Key idea

This test terminates only when the scheduler is fairFairness is assumed by programmers

All cycles in correct programs are unfair A fair cycle is a livelock


done = 1;

Thread 1 Thread 2

! done! done

donedone

Key idea

This test terminates only when the scheduler is fairFairness is assumed by programmers

CHESS should only explore fair schedules


done = 1;

Thread 1 Thread 2

! done! done

donedone

What notion of fairness?

Weak fairnessForall t :: GF ( enabled(t) scheduled(t) )A thread that remains enabled should eventually be

scheduled

A weakly-fair scheduler will eventually schedule Thread 2Example: round-robin, FIFO wait queues


done = 1;

Thread 1 Thread 2

Weak fairness does not suffice

Lock( l );While( ! done){ Unlock( l ); Sleep(); Lock( l );}Unlock( l );

Lock( l );done = 1;Unlock( l );

Thread 1 Thread 2

en = {T1, T2}

T1: Sleep()T2: Lock( l )

en = {T1, T2}

T1: Lock( l )T2: Lock( l )

en = { T1 }

T1: Unlock( l )T2: Lock( l )

en = {T1, T2}

T1: Sleep()T2: Lock( l )

Strong Fairness Forall t :: GF enabled(t) GF scheduled(t) A thread that is enabled infinitely often is scheduled infinitely often

Thread 2 is enabled and competes for the lock infinitely often Example: a round-robin scheduler with priorities [Apt & Olderog ‘83]

Lock( l );While( ! done){ Unlock( l ); Sleep(); Lock( l );}Unlock( l );

Lock( l );done = 1;Unlock( l );

Thread 1 Thread 2

Constructing a strongly fair schedulerA round-robin scheduler is not strongly fair

It is only weakly fair

Extend a round-robin scheduler with priorities [Apt & Olderog ‘83]

CHESS also needs to be demonicCannot generate all fair schedules

There are infinitely many, even for simple programs

It is sufficient to generate enough fair schedules to Explore all states (safety coverage)Explore at least one fair cycle, if any (livelock coverage)

Do it without capturing the program states

Fair stateless model checkingGiven a concurrent program Q and a safety property P

Q does not necessarily have an acyclic state space

Determine Q satisfies P and Q is fair-terminating (livelock-free)

Without capturing program states

(Good) Programs indicate lack of progress

Good Samaritan assumption:Forall threads t : GF scheduled(t) GF yield(t)A thread when scheduled infinitely often yields the processor infinitely

often

Examples of yield:Sleep(), ScheduleThread(), asm {rep nop;}Thread completion


done = 1;

Thread 1 Thread 2

Robustness of the Good Samaritan assumptionA violation of the Good Samaritan assumption is a

performance error

Programs are parsimonious in the use of yieldsA Sleep() almost always indicates a lack of progressImplies that the thread is stuck in a state-space cycle

while( ! done){ ;}

done = 1;

Thread 1 Thread 2

Fair demonic scheduler (outline)Maintain a priority-order (a partial-order) on threads

A < B means that A will not be scheduled in a state where B is enabled

Threads get a lower priority only when they yieldScheduler is fully demonic on yield-free paths

A thread loses its priority once it executesRemove all edges t < A when A executes

Four outcomes of the semi-algorithmTerminates without finding any errorsTerminates with a safety violationDiverges with an infinite execution

that violates the GS assumption (a performance error)that is strongly-fair (a livelock)

In practice: detect infinite executions by a very long execution

CoverageTheorem: The algorithm achieves full coverage

if every state is reachable by a yield-free path, and Exists a fair cycle iff exists a fair cycle with at most one

yield per thread

Results: Achieves more coverage faster

With fairness

Without fairness, with depth bound

20 30 40 50 60

States Explored 1726 871 1505 1726 1307 683

PercentageCoverage 100% 50% 87% 100% 76% 40%

Time(secs) 143 97 763 2531 >5000 >5000

Work stealing queue with one stealer

Livelocks in Singularity(both fixed)A thread needlessly burns its CPU quantum in a spin-

loop “it's a bug that we think we have seen in practice, but

that would have been very difficult to find through normal means” [Dean Tribble]

An infinite loop in the Promise implementationManifested as a non-reproducible problem in an existing

stress-testCHESS found the bug in a simple test harness with a

repeatable error-trace

CHESS architecture recap

CHESSScheduler

MemoryModelbugs

Monitors

Coverage

Repro

TestingDataraces

Debugging Visualization

UnmanagedProgram

Windows

ManagedProgram

.NET CLR

• Record the interleaving executed• Drive the program along an interleaving

Capture the ‘happens-before’ graph Happens-before graph captures all communication between threads in

a concurrent execution

Abstracts time: For a given input, two executions that result in the same happens-before graph are behaviorally equivalent

x = 1

t = x;

wait(e)

setEvent(e)

Enforce a single-threaded executionHappens-before graph is a partial-order

can be converted to a totally-ordered single threaded execution

Big performance winData-accesses are automatically ordered by synchronization

eventsDon’t need to instrument data-accesses

Cannot explore non-sequentially consistent executions of a programResulting from relaxed memory model of the hardwareSober: A tool that detects the presence of such executions [CAV

‘08]

Directing the executionGiven a happens-before graphBlock the execution of a synchronization if it produces

an edge not in the graph

Need to understand the precise semantics of synchronization operations

ConclusionMessage to concurrency programmers

Think seriously about interleaving coverage

Message to system/PL researchersConcurrency APIs should have a clear specification of the

nondeterminism exposed

Don’t stress, use CHESSCHESS binary available for academic use

http://research.microsoft.com/CHESSCHESS will be shipped for commercial use, very soon

http://msdn.microsoft.com/devlabs

CHESS is extensibleUse CHESS scheduler for concurrency toolsPlug in new search algorithms

http://research.microsoft.com/CHESS



http://msdn.microsoft.com/devlabs

Questions

A stress test fails…

CHESS reproduces the bug in 2 mins

CHESS Finding and Reproducing Heisenbugs Tom Ball, Sebastian Burckhardt Madan Musuvathi, Shaz Qadeer...

Documents

Transcript of CHESS Finding and Reproducing Heisenbugs Tom Ball, Sebastian Burckhardt Madan Musuvathi, Shaz Qadeer...