Deterministic Execution of Nondeterministic Shared-Memory Programs


Page 1: Deterministic Execution of Nondeterministic Shared-Memory Programs

Deterministic Execution of Nondeterministic Shared-Memory Programs

Dan Grossman

University of Washington

Dagstuhl Seminar on Design and Validation of Concurrent Systems

August 2009

Page 2: Deterministic Execution of Nondeterministic Shared-Memory Programs


What if…

What if you could run the same multithreaded program on the same inputs twice and know you would get the same results?

• What exactly does that mean?
• Why might you want that?
• How can we do that (semi-efficiently)?

But first:
– Some background on me and “the talks I’m not giving”
– Key terminology and perspectives

• More important than technical details at this event

Page 3: Deterministic Execution of Nondeterministic Shared-Memory Programs


Biography / group names

Me:
• “Programming-languages person”
• Type systems, compilers for a memory-safe C dialect, 2000-2004
• 30% → 80% focus on multithreading, 2005-
• Co-advising 3-4 students with computer architect Luis Ceze, 2007-

Two groups for “marketing purposes”:
• WASP, wasp.cs.washington.edu
• SAMPA, sampa.cs.washington.edu

Page 4: Deterministic Execution of Nondeterministic Shared-Memory Programs


The talk you won’t see

void transferFrom(int amt, Acct other) {
  atomic {
    other.withdraw(amt);
    this.deposit(amt);
  }
}

“Transactions are to shared-memory concurrency as garbage collection is to memory management” [OOPSLA 07]

Semantic problems with nontransactional accesses: worse than locks!
– Fix with stronger guarantees and compiler opts [PLDI07]
– Or static type system, formal semantics, and proof [POPL08]
– Or more dynamic approach adapting to Haskell [submitted]
– …

Prototypes for OCaml, Java, Scheme, and Haskell

Page 5: Deterministic Execution of Nondeterministic Shared-Memory Programs


This talk…

Take an arbitrary C/C++ program with POSIX threads
– Locks, barriers, condition variables, data races, whatever

Compile it funny

Link it against a funny run-time system

Get deterministic behavior
– Well, as deterministic as a sequential C program

Joint work: Luis Ceze, Tom Bergan, Joe Devietti, Owen Anderson

Page 6: Deterministic Execution of Nondeterministic Shared-Memory Programs


Terminology

Essential perspectives, not just definitions

• Parallelism vs. concurrency
– Or different terms if you prefer

• Sequential semantics vs. determinism vs. nondeterminism
– What is an input?

• Level of abstraction
– Which one do you care about?

Page 7: Deterministic Execution of Nondeterministic Shared-Memory Programs


Concurrency

Working “definition”:

Software is concurrent if a primary intellectual challenge is responding to external events from multiple sources in a timely manner.

Examples: operating system, shared hashtable, version control

Key challenge is responsiveness – often leads to threads or asynchrony

Correctness usually requires synchronization (e.g., locks)

Page 8: Deterministic Execution of Nondeterministic Shared-Memory Programs


Parallelism

Working “definition”:

Software is parallel if a primary intellectual challenge is using extra computational resources to do more useful work per unit time.

Examples: scientific computing, most graphics, a lot of servers

Key challenge is Amdahl’s Law
– No sequential bottlenecks, no imbalanced load

When pure fork-join isn’t correct, need synchronization

Page 9: Deterministic Execution of Nondeterministic Shared-Memory Programs


The confusion

• First, this use of terms isn’t standard

• Many systems are both
– And it’s really a matter of degree

• Similar lower-level mechanisms, such as threads and locks
– And similar errors (race conditions, deadlocks, etc.)

• Our work determinizes these lower-level mechanisms, so we determinize concurrent and parallel applications
– But purely parallel ones probably benefit less

Page 10: Deterministic Execution of Nondeterministic Shared-Memory Programs


Terminology

Essential perspectives, not just definitions

• Parallelism vs. concurrency
– Or different terms if you prefer

• Sequential semantics vs. determinism vs. nondeterminism
– What is an input?

• Level of abstraction
– Which one do you care about?

Page 11: Deterministic Execution of Nondeterministic Shared-Memory Programs


Sequential semantics

• Some languages can have results defined purely sequentially, but are designed to have better parallel-performance guarantees (thanks to a cost model)
– Examples: DPJ, Cilk, NESL, …

• For correctness, reason sequentially
• For performance, reason in parallel

• Really designed for parallelism, not concurrency

• Not our work

Page 12: Deterministic Execution of Nondeterministic Shared-Memory Programs


Sequential isn’t always deterministic

[Surprisingly easy to forget this]

int f1() { print("A"); print("B"); return 0; }

int f2() { print("C"); print("D"); return 0; }

int g() { return f1() + f2(); }

Must g() print ABCD?
• Java: yes
• C/C++: no, CDAB allowed, but not ACBD, ACDB, etc.

Page 13: Deterministic Execution of Nondeterministic Shared-Memory Programs


Another example

Dijkstra’s guarded-command conditionals:

if x % 2 == 1 -> y := x - 1

[] x < 10 -> y := 7

[] x >= 10 -> y := 0

fi

We might still expect a particular language implementation (compiler) to be deterministic
– May choose any deterministic result consistent with the nondeterministic semantics
– Presumably doesn’t change choice across executions, but may across compiles (including “butterfly effects”)
– Our work does this
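
As an illustration (mine, not from the slides) of such a deterministic choice: a compiler may refine the guarded command above by testing guards in textual order and taking the first one that holds.

#include <stdio.h>

/* One deterministic refinement of the guarded-command conditional:
   test guards in textual order, take the first true one. For odd x
   below 10 two guards hold; this version always picks the first, but
   always picking the second would be an equally valid deterministic
   choice. */
static int choose_y(int x) {
    if (x % 2 == 1) return x - 1;
    if (x < 10)     return 7;
    return 0;       /* x >= 10 */
}

int main(void) {
    printf("%d %d %d\n", choose_y(3), choose_y(4), choose_y(12)); /* 2 7 0 */
    return 0;
}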

Page 14: Deterministic Execution of Nondeterministic Shared-Memory Programs


Why helpful?

So the programmer gets a deterministic executable, but doesn’t know which one
– Key degree of freedom for automated performance tuning

Still helpful for:
– Whole-program testing and debugging
– Automated replicas
– In general, repeatability and reducing possible executions

Page 15: Deterministic Execution of Nondeterministic Shared-Memory Programs


Define deterministic, part 1

Deterministic: “outputs depend only on inputs”

• That’s right, but that means we must clearly specify what is an input (and an output)
– Can define away anything you want
– Example: All syscall results are inputs, so seeding the pseudorandom number generator with time-of-day is “deterministic”

• We mean what you think we mean
– Inputs: command-line, I/O, syscalls
– Not inputs: cache state, hardware timing, thread scheduler
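
A small hypothetical illustration of how far “define away anything you want” goes: if every value crossing the syscall boundary is logged as an input, then even time-of-day seeding is “deterministic”, since rerunning with the same logged inputs reproduces the run. (The logging shim is illustrative, not part of the actual system.)

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical shim: the time-of-day result is nondeterministic at
   the OS level, but once recorded as a program input, the run is
   "deterministic" under the inputs-include-syscalls definition. */
static FILE *input_log;

static long logged_time(void) {
    long t = (long)time(NULL);       /* nondeterministic at the OS level */
    fprintf(input_log, "%ld\n", t);  /* ...but recorded as a program input */
    return t;
}

int main(void) {
    input_log = fopen("inputs.log", "w");
    if (!input_log) return 1;
    srand((unsigned)logged_time());  /* "deterministic" given the log */
    printf("%d\n", rand());
    fclose(input_log);
    return 0;
}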

Page 16: Deterministic Execution of Nondeterministic Shared-Memory Programs


Terminology

Essential perspectives, not just definitions

• Parallelism vs. concurrency
– Or different terms if you prefer

• Sequential semantics vs. determinism vs. nondeterminism
– What is an input?

• Level of abstraction
– Which one do you care about?

Page 17: Deterministic Execution of Nondeterministic Shared-Memory Programs


Define deterministic, part 2

“Is it deterministic?” depends crucially on your abstraction level
– Another obvious easy-to-forget thing

Examples:
• File systems
• Memory allocation (Java vs. C)
• Set implemented as a list
• Quantum mechanics

Our work:
• The “language level”: state of logical memory, program output
• Application may care only about a higher level (future work)

Page 18: Deterministic Execution of Nondeterministic Shared-Memory Programs


Okay… how?

Trade-off between complexity and performance:

[Diagram: trade-off curve with axes PERFORMANCE and COMPLEXITY]

Performance:
– Overhead (single-thread slowdown)
– Scalability (minimize extra synchronization, waiting)

Page 19: Deterministic Execution of Nondeterministic Shared-Memory Programs


Starting serial

Determinization is easy!
– Run one thread at a time in round-robin order
– Context-switch after N basic blocks for a deterministic N
• Cannot use a timer; use the compiler and run-time (see the sketch below)
– Races in the source program are irrelevant; locks are still respected

Example with 3 threads running (time moves with arrows):

[Diagram: threads T1, T2, T3 each execute a column of loads and stores; one quantum is one thread’s run of N blocks, and one round is one quantum from each thread in order]
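
A minimal sketch (names are mine, not the actual system’s) of that compiler/run-time split: every basic block subtracts a rough cost from a thread-local counter, so the quantum ends at the same program point on every run, which a hardware timer could never guarantee.

enum { QUANTUM_SIZE = 100000 };   /* N, in rough instruction-cost units */

static _Thread_local int quantum_left = QUANTUM_SIZE;

/* Stub for the determinizing run-time's scheduler hook; the real one
   blocks this thread until its next turn in the round-robin order. */
static void runtime_end_quantum(void) { }

/* The compiler inserts this call in every basic block. */
static void quantum_tick(int block_cost) {
    quantum_left -= block_cost;       /* rough costs keep quanta balanced */
    if (quantum_left <= 0) {
        quantum_left = QUANTUM_SIZE;
        runtime_end_quantum();        /* deterministic switch point */
    }
}

int main(void) {
    for (int i = 0; i < 250000; i++)
        quantum_tick(1);              /* e.g., one cost unit per block */
    return 0;
}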

Page 20: Deterministic Execution of Nondeterministic Shared-Memory Programs


Parallel quanta

• The quanta in a round can start to run in parallel, provided they stop before any communication occurs (see how next)
– So each round has two stages, parallel then serial

[Diagram: T1, T2, T3 run their quanta concurrently; the parallel stage ends with a global barrier, then the serial stage ends and the next round starts]
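
A minimal runnable sketch (thread bodies are placeholders, not the real run-time) of this round structure using POSIX barriers: each round is a parallel stage ended by a global barrier, followed by serial turns in fixed thread order, so every communication point is deterministic.

#include <pthread.h>
#include <stdio.h>

#define NTHREADS 3
#define NROUNDS  2

static pthread_barrier_t bar;

static void parallel_quantum(int tid) { (void)tid; /* no shared-state access here */ }
static void serial_turn(int tid)      { printf("serial turn: thread %d\n", tid); }

static void *worker(void *arg) {
    int tid = (int)(long)arg;
    for (int round = 0; round < NROUNDS; round++) {
        parallel_quantum(tid);              /* parallel stage */
        pthread_barrier_wait(&bar);         /* parallel stage ends with global barrier */
        for (int turn = 0; turn < NTHREADS; turn++) {
            if (turn == tid)
                serial_turn(tid);           /* one thread at a time, fixed order */
            pthread_barrier_wait(&bar);     /* wait for this turn to finish */
        }
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    pthread_barrier_init(&bar, NULL, NTHREADS);
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    pthread_barrier_destroy(&bar);
    return 0;
}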

Page 21: Deterministic Execution of Nondeterministic Shared-Memory Programs


Is that legal?

– Can produce a different result than serial execution
– In fact, the execution is not necessarily equivalent to any serialization of quanta

But it doesn’t matter as long as we are deterministic! Just need:
• Parallel stages do no communication
• Parallel stages end at deterministic points


Page 22: Deterministic Execution of Nondeterministic Shared-Memory Programs


Performance

Keys to scalability:

1. Run almost everything in the parallel stage

2. Keep quanta balanced
– Assume (1), use rough instruction costs


Page 23: Deterministic Execution of Nondeterministic Shared-Memory Programs


Memory ownership

To avoid communication during the parallel stage:
• Every memory location is “shared” or “owned by 1 thread T”
– A dynamic table is checked and updated during execution
• Can read only memory that is shared or owned-by-you
• Can write only memory owned-by-you
• Locks: just like memory locations + blocking ends the quantum

In our example, perhaps A is shared and B and C are owned by T2 (see the sketch below)

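
A hypothetical sketch of the inserted checks (table layout, granularity, and names are mine): in the parallel stage a load may proceed only if the location is shared or owned-by-self, a store only if owned-by-self; anything else must wait for the serial stage, ending the quantum.

#include <stdint.h>

enum { SHARED = -1 };
#define TABLE_SIZE (1u << 20)

/* Ownership table, one entry per 8-byte location (hash-indexed here
   for brevity; any deterministic granularity is correct). */
static int owner_of[TABLE_SIZE];

static _Thread_local int my_tid;   /* set by the run-time at thread start */

/* Stub: the real run-time blocks here until the serial stage. */
static void wait_for_serial_stage(void) { }

static unsigned slot(const void *addr) {
    return (unsigned)(((uintptr_t)addr >> 3) & (TABLE_SIZE - 1));
}

static void check_load(const void *addr) {
    int o = owner_of[slot(addr)];
    if (o != SHARED && o != my_tid)
        wait_for_serial_stage();   /* owned by another thread: defer */
}

static void check_store(const void *addr) {
    if (owner_of[slot(addr)] != my_tid)
        wait_for_serial_stage();   /* writes require exclusive ownership */
}

int main(void) {
    int x;
    owner_of[slot(&x)] = my_tid;   /* first allocator becomes the owner */
    check_store(&x); x = 42;       /* owned-by-self: stays in the parallel stage */
    check_load(&x);
    return x - 42;
}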

Page 24: Deterministic Execution of Nondeterministic Shared-Memory Programs


Changing ownership

Policy: For each location (any deterministic granularity is correct),
• First owner is the first thread to allocate the location
• On read in the serial stage, if owned-by-other, set to shared
• On write in the serial stage, set to owned-by-self

Correctness:
1. Ownership is immutable in parallel stages (so no communication)
2. Serial-stage changes are deterministic

So many, many policies are correct
– We chose the obvious one for temporal locality + read-sharing
– Must have good locality for scalability!
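
The transition policy itself is tiny; a sketch (operating on one entry of the hypothetical ownership table above), with the key point that these updates run only in the serial stage, in a fixed thread order, so the resulting table is itself deterministic.

enum { SHARED = -1 };

static void on_alloc(int *entry, int tid) { *entry = tid; }  /* first allocator owns */

static void on_serial_read(int *entry, int tid) {
    if (*entry != SHARED && *entry != tid)
        *entry = SHARED;            /* read by a non-owner: becomes shared */
}

static void on_serial_write(int *entry, int tid) {
    *entry = tid;                   /* any write: owned by the writer */
}

int main(void) {
    int entry;                      /* ownership word for one location */
    on_alloc(&entry, 1);            /* thread 1 allocates: owner = 1 */
    on_serial_read(&entry, 2);      /* thread 2 reads: becomes SHARED */
    on_serial_write(&entry, 2);     /* thread 2 writes: owner = 2 */
    return entry - 2;
}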

Page 25: Deterministic Execution of Nondeterministic Shared-Memory Programs


Overhead

Significant overhead:
– All reads/writes consult ownership information
– All basic blocks subtract from a thread-local quantum counter

Reduce via:
– Lots of run-time engineering and data structures (not too much magic, but most important)
– Obvious compiler optimizations like escape analysis and hoisting counter-subtractions
– Specialized compiler optimizations like the Subsequent Access Optimization: don’t recheck the same ownership unless a quantum boundary might intervene
• Correctness of this is a subtle argument and slightly affects the ownership-change policy (deterministically!)
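
A hypothetical before/after of the Subsequent Access Optimization on a read-modify-write (the check functions are stand-ins for the ownership checks sketched earlier): no quantum boundary can intervene inside the block, and the store check (owned-by-self) subsumes the load check, so one check covers both accesses.

static void check_load(const void *addr)  { (void)addr; }  /* stub; see ownership sketch */
static void check_store(const void *addr) { (void)addr; }

/* Without the optimization, every access consults the table: */
static void incr_unoptimized(int *p) {
    check_load(p);  int tmp = *p;
    check_store(p); *p = tmp + 1;
}

/* With it: ownership cannot change between the two accesses, so the
   single (stronger) store check suffices. */
static void incr_optimized(int *p) {
    check_store(p);
    *p = *p + 1;
}

int main(void) {
    int x = 0;
    incr_unoptimized(&x);
    incr_optimized(&x);
    return x - 2;
}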

Page 26: Deterministic Execution of Nondeterministic Shared-Memory Programs


Brittle

Change any line of code, command-line argument, environment variable, etc. and you can get a different deterministic program

We are mostly robust to memory-safety errors, except:
– Bounds errors that corrupt ownership information
– Bounds errors that write to another thread’s allegedly-thread-local data

Page 27: Deterministic Execution of Nondeterministic Shared-Memory Programs


Results

Overhead: Varies a lot, but about 3x at 8 threads

Scalability: Varies a lot, but on average with the PARSEC suite (*):

nondet 8 threads vs. nondet 2 threads = 2.4 (linear = 4)

det 8 threads vs. det 2 threads = 2.0

det 8 threads vs. nondet 2 threads = 0.91 (range 0.41 - 2.75)

“How do you want to spend Moore’s Dividend?”

* subset runnable: no MPI, no C++ exceptions, no 32-bit assumptions

Page 28: Deterministic Execution of Nondeterministic Shared-Memory Programs


Buffering

Actually, ownership is only one approach

The second approach relies on buffering and a commit stage (see the sketch below)
• Even higher overhead (to consult buffers)
• Even better scalability (block only for synchronization & commits)

And a third hybrid approach

Hopefully more details soon
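
A hypothetical sketch of the buffering idea (data structure and names are mine): stores are deferred to a thread-local buffer, loads consult that buffer before shared memory, and buffers commit one thread at a time in a fixed order, so commits are the only blocking points besides synchronization.

#include <stdint.h>

#define BUF_CAP 1024

typedef struct { intptr_t *addr; intptr_t val; } BufferedWrite;

static _Thread_local BufferedWrite buf[BUF_CAP];
static _Thread_local int buf_len;

static void buffered_store(intptr_t *addr, intptr_t val) {
    buf[buf_len].addr = addr;              /* deferred, not yet visible */
    buf[buf_len].val  = val;
    buf_len++;                             /* real code would handle overflow */
}

static intptr_t buffered_load(intptr_t *addr) {
    for (int i = buf_len - 1; i >= 0; i--) /* newest matching write wins */
        if (buf[i].addr == addr) return buf[i].val;
    return *addr;                          /* fall through to shared memory */
}

/* Runs with threads serialized in a fixed, deterministic order. */
static void commit_buffer(void) {
    for (int i = 0; i < buf_len; i++)
        *buf[i].addr = buf[i].val;
    buf_len = 0;
}

int main(void) {
    intptr_t x = 0;
    buffered_store(&x, 7);
    intptr_t seen = buffered_load(&x);     /* 7, served from the buffer */
    commit_buffer();                       /* now x == 7 in shared memory */
    return (int)(seen - x);                /* 0 */
}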

Page 29: Deterministic Execution of Nondeterministic Shared-Memory Programs


Conclusion

The fundamental assumption that nondeterministic shared-memory programs must be run nondeterministically is false

A fun problem to throw principled compiler and run-time optimizations at.

Could dramatically change how we test and debug parallel and concurrent programs

Most-related work:
– Kendo from MIT: done concurrently (in parallel?), requires knowing about data races statically, different approach
– Colleagues in ASPLOS09: hardware support for ownership
– Record & replay systems: we can replay without the record