Deterministic Execution of Nondeterministic Shared-Memory Programs
Deterministic Execution of Nondeterministic Shared-Memory
Programs
Dan Grossman
University of Washington
Dagstuhl Seminar on
Design and Validation of Concurrent Systems
August 2009
September 2009 Dan Grossman: Determinism 2
What if…
What if you could run the same multithreaded program on the same inputs twice and know you would get the same results?
• What exactly does that mean?
• Why might you want that?
• How can we do that (semi-efficiently)?

But first:
– Some background on me and “the talks I’m not giving”
– Key terminology and perspectives
• More important than technical details at this event
Biography / group names
Me:
• “Programming-languages person”
• Type systems, compilers for memory-safe C dialect, 2000-2004
• 30% → 80% focus on multithreading, 2005-
• Co-advising 3-4 students with computer architect Luis Ceze, 2007-

Two groups for “marketing purposes”:
• WASP, wasp.cs.washington.edu
• SAMPA, sampa.cs.washington.edu
The talk you won’t see

void transferFrom(int amt, Acct other) {
  atomic {
    other.withdraw(amt);
    this.deposit(amt);
  }
}
“Transactions are to shared-memory concurrency as garbage
collection is to memory management” [OOPSLA 07]
Semantic problems with nontransactional accesses: worse than locks!
– Fix with stronger guarantees and compiler opts [PLDI07]
– Or static type system, formal semantics, and proof [POPL08]
– Or more dynamic approach adapting to Haskell [submitted]
– …
Prototypes for OCaml, Java, Scheme, and Haskell
This talk…
Take an arbitrary C/C++ program with POSIX threads
– Locks, barriers, condition variables, data races, whatever
Compile it funny
Link it against a funny run-time system
Get deterministic behavior
– Well, as deterministic as a sequential C program
Joint work: Luis Ceze, Tom Bergan, Joe Devietti, Owen Anderson
Terminology
Essential perspectives, not just definitions
• Parallelism vs. concurrency
– Or different terms if you prefer
• Sequential semantics vs. determinism vs. nondeterminism
– What is an input?
• Level of abstraction
– Which one do you care about?
Concurrency
Working “definition”:
Software is concurrent if a primary intellectual challenge is responding to external events from multiple sources in a timely manner.
Examples: operating system, shared hashtable, version control
Key challenge is responsiveness – often leads to threads or asynchrony
Correctness usually requires synchronization (e.g., locks)
Parallelism
Working “definition”:
Software is parallel if a primary intellectual challenge is using extra computational resources to do more useful work per unit time.
Examples: scientific computing, most graphics, a lot of servers
Key challenge is Amdahl’s Law
– No sequential bottlenecks, no imbalanced load
When pure fork-join isn’t correct, need synchronization
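The bound the slide alludes to is Amdahl’s Law: with parallel fraction p of the work and n processors, speedup is at most 1 / ((1 − p) + p/n). A minimal sketch (the helper name is ours, not from the talk):

```c
#include <assert.h>

/* Amdahl's Law: with parallel fraction p and n processors,
 * speedup = 1 / ((1 - p) + p / n). */
double amdahl_speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / (double)n);
}
```

Even a 90%-parallel program gets under 5x on 8 processors, which is why sequential bottlenecks dominate the performance picture.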
The confusion
• First, this use of terms isn’t standard
• Many systems are both
– And it’s really a matter of degree
• Similar lower-level mechanisms, such as threads and locks
– And similar errors (race conditions, deadlocks, etc.)
• Our work determinizes these lower-level mechanisms, so we determinize concurrent and parallel applications
– But purely parallel ones probably benefit less
Terminology
Essential perspectives, not just definitions
• Parallelism vs. concurrency
– Or different terms if you prefer
• Sequential semantics vs. determinism vs. nondeterminism
– What is an input?
• Level of abstraction
– Which one do you care about?
Sequential semantics
• Some languages can have results defined purely sequentially, but are designed to have better parallel-performance guarantees (thanks to a cost model)
– Examples: DPJ, Cilk, NESL, …

• For correctness, reason sequentially
• For performance, reason in parallel
• Really designed for parallelism, not concurrency
• Not our work
Sequential isn’t always deterministic
[Surprisingly easy to forget this]
int f1() { printf("A"); printf("B"); return 0; }
int f2() { printf("C"); printf("D"); return 0; }
int g()  { return f1() + f2(); }
Must g() print ABCD?
• Java: yes
• C/C++: no, CDAB allowed, but not ACBD, ACDB, etc.
Another example

Dijkstra’s guarded-command conditionals
if x % 2 == 1 -> y := x - 1
[] x < 10 -> y := 7
[] x >= 10 -> y := 0
fi
We might still expect a particular language implementation (compiler) to be deterministic
– May choose any deterministic result consistent with the nondeterministic semantics
– Presumably doesn’t change choice across executions, but may across compiles (including “butterfly effects”)
– Our work does this
Why helpful?
So programmer gets a deterministic executable, but doesn’t know which one
– Key degree of freedom for automated performance

Still helpful for:
– Whole-program testing and debugging
– Automated replicas
– In general, repeatability and reducing possible executions
Define deterministic, part 1
Deterministic: “outputs depend only on inputs”
• That’s right, but means must clearly specify what is an input (and an output)
– Can define away anything you want
– Example: All syscall results are inputs, so seeding the pseudorandom number generator with time-of-day is “deterministic”

• We mean what you think we mean
– Inputs: command-line, I/O, syscalls
– Not inputs: cache state, hardware timing, thread scheduler
Terminology
Essential perspectives, not just definitions
• Parallelism vs. concurrency
– Or different terms if you prefer
• Sequential semantics vs. determinism vs. nondeterminism
– What is an input?
• Level of abstraction
– Which one do you care about?
Define deterministic, part 2
“Is it deterministic?” depends crucially on your abstraction level
– Another obvious easy-to-forget thing

Examples:
• File systems
• Memory-allocation (Java vs. C)
• Set implemented as a list
• Quantum mechanics

Our work:
• The “language level”: state of logical memory, program output
• Application may care only about a higher level (future work)
Okay… how?

Trade-off between complexity and performance

[Chart: performance plotted against complexity]

Performance:
– Overhead (single-thread slowdown)
– Scalability (minimize extra synchronization, waiting)
Starting serial
Determinization is easy!
– Run one thread at a time in round-robin order
– Context-switch after N basic blocks for deterministic N
• Cannot use a timer; use compiler and run-time
– Races in source program are irrelevant; locks still respected
Example with 3 threads running (time moves with arrows):

[Diagram: threads T1, T2, T3, each executing a quantum of memory operations (loads and stores to A, B, and C); one quantum per thread forms one round.]
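The counting scheme above can be sketched in C. This is an illustrative sketch, not the actual run-time: the names are ours, and a real implementation hands control to the next thread at a quantum boundary rather than merely counting.

```c
#include <assert.h>

#define QUANTUM_SIZE 1000  /* N basic blocks per quantum: fixed, hence deterministic */

static __thread int quantum_left = QUANTUM_SIZE;
static int context_switches = 0;

/* Stand-in for the runtime's deterministic round-robin handoff. */
static void end_quantum(void) {
    context_switches++;
    quantum_left = QUANTUM_SIZE;
}

/* The compiler inserts this at every basic block: unlike a hardware
 * timer interrupt, the switch point depends only on the instruction
 * stream, so it is the same on every run. */
static void quantum_tick(void) {
    if (--quantum_left == 0)
        end_quantum();
}
```

The key property is that the counter is driven by executed basic blocks, not wall-clock time, so the schedule cannot vary with cache state or hardware timing.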
Parallel quanta

• The quanta in a round can start to run in parallel provided they stop before any communication occurs (see how next)
– So each round has two stages, parallel then serial

[Diagram: threads T1–T3 run their quanta in parallel; the parallel stage ends with a global barrier, then the serial stage runs and ends as the next round starts.]
Is that legal?
– Can produce a different result than serial execution
– In fact, execution not necessarily equivalent to any serialization of quanta

But it doesn’t matter as long as we are deterministic! Just need:
• Parallel stages do no communication
• Parallel stages end at deterministic points
Performance
Keys to scalability:
1. Run almost everything in the parallel stage
2. Keep quanta balanced
– Assume (1), use rough instruction costs
Memory ownership
To avoid communication during parallel stage:
• Every memory location is “shared” or “owned by 1 thread T”
– Dynamic table checked and updated during execution
• Can read only memory that is shared or owned-by-you
• Can write only memory owned-by-you
• Locks: just like memory locations + blocking ends quantum
In our example, perhaps A is shared, B and C are owned by T2
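The access rules above can be sketched as a table lookup on every read and write. This is illustrative only: the names, table layout, and granularity are our assumptions, and the real run-time uses carefully engineered data structures.

```c
#include <assert.h>
#include <stdbool.h>

#define SHARED (-1)

/* Ownership table at some deterministic granularity: either SHARED
 * or the id of the owning thread. Initialized for the example:
 * location 0 (A) is shared; locations 1 and 2 (B, C) belong to T2. */
static int owner_of[4] = { SHARED, 2, 2, SHARED };

/* Parallel-stage checks: a read needs shared or owned-by-you; a write
 * needs owned-by-you. A failed check ends the quantum, and the thread
 * waits for the serial stage. */
static bool can_read(int tid, int loc)  { return owner_of[loc] == SHARED || owner_of[loc] == tid; }
static bool can_write(int tid, int loc) { return owner_of[loc] == tid; }
```

Because the table is immutable during the parallel stage, these checks never observe another thread's writes, which is exactly the no-communication property the scheme needs.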
Changing ownership
Policy: For each location (any deterministic granularity is correct),
• First owner is first thread to allocate the location
• On read in serial stage, if owned-by-other, set to shared
• On write in serial stage, set to owned-by-self

Correctness:
1. Ownership immutable in parallel stages (so no communication)
2. Serial-stage changes are deterministic

So many, many policies are correct
– We chose the obvious one for temporal locality + read-sharing
– Must have good locality for scalability!
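The read and write transitions above can be sketched as two serial-stage hooks (an illustrative sketch with our own names; the real system works at a configurable granularity):

```c
#include <assert.h>

#define NO_OWNER (-1)  /* the "shared" marker */

/* Ownership table (illustrative): location 0 shared, 1 and 2 owned by T2. */
static int loc_owner[4] = { NO_OWNER, 2, 2, NO_OWNER };

/* Serial-stage transitions, one deterministic policy among many:
 * reading another thread's location makes it shared... */
static void on_serial_read(int tid, int loc) {
    if (loc_owner[loc] != NO_OWNER && loc_owner[loc] != tid)
        loc_owner[loc] = NO_OWNER;  /* owned-by-other -> shared */
}

/* ...and writing takes ownership for the writer. */
static void on_serial_write(int tid, int loc) {
    loc_owner[loc] = tid;
}
```

Any policy is correct as long as the transitions happen only in the serial stage and depend only on the deterministic access sequence; this particular one favors temporal locality and read-sharing.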
Overhead
Significant overhead:
– All reads/writes consult ownership information
– All basic blocks subtract from a thread-local quantum counter

Reduce via:
– Lots of run-time engineering and data structures (not too much magic, but most important)
– Obvious compiler optimizations like escape analysis and hoisting counter-subtractions
– Specialized compiler optimizations like Subsequent Access Optimization: don’t recheck same ownership unless a quantum boundary might intervene
• Correctness of this is a subtle argument and slightly affects the ownership-change policy (deterministically!)
Brittle
Change any line of code, command-line argument, environment variable, etc. and you can get a different deterministic program
We are mostly robust to memory-safety errors, except:
– Bounds errors that corrupt ownership information
– Bounds errors that write to another thread’s allegedly-thread-local data
Results
Overhead: Varies a lot, but about 3x at 8 threads
Scalability: Varies a lot, but on average with the PARSEC suite (*):
– nondet 8 threads vs. nondet 2 threads = 2.4 (linear = 4)
– det 8 threads vs. det 2 threads = 2.0
– det 8 threads vs. nondet 2 threads = 0.91 (range 0.41 - 2.75)
“How do you want to spend Moore’s Dividend?”
* subset runnable: no MPI, no C++ exceptions, no 32-bit assumptions
Buffering
Actually, ownership is only one approach
Second approach relies on buffering and a commit stage
• Even higher overhead (to consult buffers)
• Even better scalability (block only for synchronization & commits)
And a third hybrid approach
Hopefully more details soon
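A buffering scheme of this general shape might look like the following sketch (our names and heavy simplifications, not the actual system): each thread writes to a private store buffer during the parallel stage, reads consult the buffer first, and a commit stage drains buffers in a fixed thread order so the final memory state is deterministic.

```c
#include <assert.h>

#define MEM_SIZE 8
#define BUF_SIZE 16

static int memory[MEM_SIZE];  /* shared memory */

/* Per-thread store buffer: parallel-stage writes land here instead
 * of in shared memory, so threads cannot communicate early. */
typedef struct { int loc[BUF_SIZE]; int val[BUF_SIZE]; int n; } store_buf;

static void buffered_store(store_buf *b, int loc, int val) {
    b->loc[b->n] = loc; b->val[b->n] = val; b->n++;
}

/* A load checks the thread's own buffer (latest entry wins) before
 * falling back to shared memory -- hence the extra overhead. */
static int buffered_load(store_buf *b, int loc) {
    for (int i = b->n - 1; i >= 0; i--)
        if (b->loc[i] == loc) return b->val[i];
    return memory[loc];
}

/* Commit stage: drain buffers in a fixed thread order, so conflicting
 * writes are resolved the same way on every run. */
static void commit(store_buf *bufs, int nthreads) {
    for (int t = 0; t < nthreads; t++) {
        for (int i = 0; i < bufs[t].n; i++)
            memory[bufs[t].loc[i]] = bufs[t].val[i];
        bufs[t].n = 0;
    }
}
```

Compared to the ownership scheme, threads here never block on each other's accesses during the parallel stage, which is the scalability win the slide claims.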
Conclusion
The fundamental assumption that nondeterministic shared-memory programs must be run nondeterministically is false
A fun problem to throw principled compiler and run-time optimizations at.
Could dramatically change how we test and debug parallel and concurrent programs
Most-related work:
– Kendo from MIT: done concurrently (in parallel?), requires knowing about data races statically, different approach
– Colleagues in ASPLOS09: hardware support for ownership
– Record & replay systems: we can replay without the record