Execution Replay for Multiprocessor Virtual Machines George W. Dunlap Dominic Lucchetti Michael A. Fetterman Peter M. Chen
Date posted: 19-Dec-2015

Page 1: Execution Replay for Multiprocessor Virtual Machines George W. Dunlap Dominic Lucchetti Michael A. Fetterman Peter M. Chen.

Execution Replay for Multiprocessor Virtual Machines

George W. Dunlap, Dominic Lucchetti, Michael A. Fetterman, Peter M. Chen

Page 2

Big ideas

• Detection and replay of memory races is possible on commodity hardware

• Overhead high for some workloads

• …but surprisingly low for other workloads

Page 3

Execution Replay

[Diagram labels: CPU, Memory, Disk, Network, Keyboard and mouse, Interrupts]

Page 4

Uses of Execution Replay

• Reconstructing state
  – Fault tolerance
• Reconstructing execution
  – Debugging
  – Realistic trace generation
• Both
  – Intrusion analysis

Page 5

Single-processor Replay

• Basic principles well understood
  – Log all non-deterministic inputs
  – Timing of asynchronous events
• Minimal overhead (Dunlap02)
  – 13% worst case
  – Log for months or years
• Available commercially
  – VMware: Record/Replay
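The logging principle on this slide can be illustrated with a toy sketch. It assumes a "guest" whose state depends only on the inputs it consumes; `run_guest`, `record`, and `replay` are hypothetical names, and the sketch does not model the timing of asynchronous events.

```python
import random

def run_guest(next_input, steps=5):
    """Deterministic toy guest: its state depends only on the inputs it consumes."""
    state = 0
    for _ in range(steps):
        state = (state * 31 + next_input()) % 1_000_003
    return state

def record(steps=5):
    """Run the guest on non-deterministic inputs and log each one."""
    log = []
    def next_input():
        value = random.randrange(256)  # stand-in for a non-deterministic input
        log.append(value)
        return value
    return run_guest(next_input, steps), log

def replay(log):
    """Re-run the guest, feeding it the logged inputs instead of live ones."""
    inputs = iter(log)
    return run_guest(lambda: next(inputs), steps=len(log))

final_state, log = record()
replayed_state = replay(log)  # matches final_state because every input was logged
```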

Page 6

Replay for Multiprocessors

• Memory races in multiprocessor VMs
• The Ordering Requirement
• The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
• DMA-capable devices and CREW
• Performance

Page 7

The Multiprocessor Challenge

• Interleaved reads and writes
  – Fine-grained non-determinism
  – Much more difficult
• Existing solutions
  – Hardware modification
  – Software instrumentation
• SMP-ReVirt
  – Hardware MMU to detect sharing

Page 8

Multiprocessor Replay

[Figure: processors P1 and P2 share memory; the labels n=3, n=5, and if (n<4) illustrate a race on n]
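A small sketch of why such a race makes execution non-deterministic. It assumes n starts at 3, that one processor writes n=5 while the other evaluates `n < 4`, and that each access is atomic; the figure residue does not say which processor does what, so these roles are illustrative only.

```python
def run(schedule):
    """Execute the two racing accesses in the given order and return the branch outcome."""
    memory = {"n": 3}           # assumed initial value
    branch_taken = None
    for step in schedule:
        if step == "write":     # one processor stores n = 5
            memory["n"] = 5
        elif step == "read":    # the other evaluates if (n < 4)
            branch_taken = memory["n"] < 4
    return branch_taken

read_first = run(["read", "write"])    # the read sees n == 3
write_first = run(["write", "read"])   # the read sees n == 5
```

The two interleavings take different branches, so a replay that does not reproduce the order of these two accesses can diverge from the recorded run.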

Page 9

Ordering Memory Accesses

• Preserving order will reproduce execution
  – a→b: “a happens-before b”
  – Ordering is transitive: a→b and b→c imply a→c
• Two instructions must be ordered if:
  – they both access the same memory, and
  – one of them is a write
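The ordering rule above can be written as a predicate. This is a minimal sketch: the `Access` record and `must_order` are hypothetical names, and the check for different processors is an added assumption (accesses on one processor are already ordered by program order).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Access:
    cpu: int        # processor issuing the access
    addr: int       # memory location touched
    is_write: bool

def must_order(a, b):
    """True if the two accesses conflict: same memory, and at least one is a write."""
    different_cpus = a.cpu != b.cpu   # assumption: same-CPU order comes from program order
    return different_cpus and a.addr == b.addr and (a.is_write or b.is_write)

write = Access(cpu=0, addr=0x10, is_write=True)
read = Access(cpu=1, addr=0x10, is_write=False)
other_read = Access(cpu=0, addr=0x10, is_write=False)
elsewhere = Access(cpu=1, addr=0x20, is_write=True)
```

Two reads of the same location never need ordering, which is what lets the CREW protocol allow concurrent readers.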

Page 10

Constraints: Enforcing order

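• To guarantee a→d, any one of these constraints suffices:
  – a→d
  – b→d
  – a→c
  – b→c
• Suppose we need b→c
  – b→c is necessary
  – a→d is redundant

[Figure: P1 executes a then b; P2 executes c then d; adding a→d on top of b→c is labeled "overconstrained"]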
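Redundancy can be checked by treating constraints as edges in a graph and asking whether the target is already reachable. The sketch below assumes program order a→b on P1 and c→d on P2; `happens_before` is a hypothetical helper, not part of SMP-ReVirt.

```python
def happens_before(edges, src, dst):
    """Return True if dst is reachable from src by following ordering edges."""
    stack, seen = [src], {src}
    while stack:
        node = stack.pop()
        for u, v in edges:
            if u == node and v not in seen:
                if v == dst:
                    return True
                seen.add(v)
                stack.append(v)
    return False

# Program order within each processor (assumed: P1 runs a, b; P2 runs c, d).
program_order = {("a", "b"), ("c", "d")}

# With the cross-processor constraint b→c logged, a→d follows by transitivity.
with_b_c = program_order | {("b", "c")}
a_d_is_redundant = happens_before(with_b_c, "a", "d")
```

Logging only the non-redundant constraints keeps the log smaller without weakening the replay guarantee.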

Page 11

CREW Protocol

• Each shared object in one of two states:
  – Concurrent-Read: all processors can read, none can write
  – Exclusive-Write: one processor (the owner) can read and write; others have no access

Page 12

CREW protocol, cont’d

• Enforced with hardware MMU
  – Read/write
  – Read-only
  – None
• Change CREW states on demand
  – Fault, fixup, re-execute
• CREW event
  – Increasing or reducing permission due to CREW state changes
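A toy model of the fault, fixup, re-execute cycle for a single page. The `CrewPage` class and the permission strings `"rw"`, `"r"`, and `""` (none) are hypothetical; a real implementation would change page-table entries in the hypervisor rather than update a Python list.

```python
class CrewPage:
    def __init__(self, num_cpus):
        # Start in concurrent-read: every processor may read, none may write.
        self.perms = ["r"] * num_cpus
        self.events = []  # CREW events as (cpu, old_permission, new_permission)

    def _set(self, cpu, new):
        old = self.perms[cpu]
        if old != new:
            self.events.append((cpu, old, new))
            self.perms[cpu] = new

    def _fixup(self, cpu, is_write):
        n = len(self.perms)
        if is_write:
            # Move to exclusive-write: reduce everyone else first, then raise the faulter.
            for other in range(n):
                if other != cpu:
                    self._set(other, "")
            self._set(cpu, "rw")
        else:
            # Move to concurrent-read: reduce the owner first, then raise the rest.
            for other in range(n):
                if self.perms[other] == "rw":
                    self._set(other, "r")
            for other in range(n):
                self._set(other, "r")

    def access(self, cpu, is_write):
        """Perform an access; return True if it faulted and needed a fixup."""
        needed = "w" if is_write else "r"
        faulted = needed not in self.perms[cpu]
        if faulted:
            self._fixup(cpu, is_write)
        assert needed in self.perms[cpu]  # the re-executed access now succeeds
        return faulted

page = CrewPage(num_cpus=2)
first = page.access(cpu=0, is_write=False)   # read in concurrent-read state
second = page.access(cpu=1, is_write=True)   # write faults; cpu 1 becomes the owner
```

Each entry in `page.events` corresponds to one privilege reduction or increase, which is what the slide calls a CREW event.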

Page 13

CREW Property

• If two instructions on different processors:
  – access the same page,
  – and one of them is a write,
  – there will be a CREW event on each processor between them.

Page 14

Generating Constraints

• State: Concurrent Read
  – All processors read-only
• d*: CREW fault
• New state: P2 Exclusive
• r: privilege reduction
  – Read to None
• i: privilege increase
  – Read to Read/write
• Log timing of r and i
• Constraint:
  – r → i

[Figure: timelines for P1 and P2 with the points a, d, d*, r, and i]
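The r → i constraint can be sketched as a pair of logged points. The representation of "timing" as a per-CPU instruction count is an assumption made for illustration, as are the names `Point`, `constraints_for_write_fault`, and `may_proceed`; CPU index 0 stands for P1 and index 1 for P2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    cpu: int
    count: int  # hypothetical per-CPU instruction count marking "when"

def constraints_for_write_fault(counts, faulting_cpu):
    """Page is in concurrent-read and faulting_cpu wants to write.

    Every other CPU gets a privilege reduction r at its current point; the
    faulting CPU gets a privilege increase i. Returns (r, i) pairs meaning r → i.
    """
    i = Point(faulting_cpu, counts[faulting_cpu])
    return [(Point(cpu, counts[cpu]), i)
            for cpu in range(len(counts)) if cpu != faulting_cpu]

def may_proceed(constraints, cpu, count, progress):
    """During replay, cpu may pass `count` only once each matching r has been reached."""
    here = Point(cpu, count)
    return all(progress[r.cpu] >= r.count for r, i in constraints if i == here)

counts = [120, 75]  # P1 has executed 120 instructions, P2 has executed 75
constraints = constraints_for_write_fault(counts, faulting_cpu=1)
early = may_proceed(constraints, cpu=1, count=75, progress=[100, 75])
ready = may_proceed(constraints, cpu=1, count=75, progress=[120, 75])
```

In this sketch P2 is held at i until P1 has replayed up to r, so any access a that P1 made before r is ordered before P2's write d.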

Page 15

Direct Memory Access

• Device accesses memory directly

• Logically another processor
  – Reads and writes need to be ordered
  – IOMMU: can’t fault/fixup/re-execute

• Observation: Transaction model

• Device: non-preemptible actor

Page 16

Prototype: SMP-ReVirt

• Modified Xen hypervisor

• Implement logging, CREW protocol

• Details in paper

Page 17

Evaluation questions

• What is the overhead?

• What affects performance?
  – In paper
• When might I want to use MP?
  – Log with 1, 2, or N CPUs?

Page 18

Evaluation Workloads

• SPLASH2 parallel application suite
  – FMM, LU, ocean, radix, water-spatial, radiosity

• Kernel-build

• Dbench

Page 19

Predicting results

• Key changes in sharing attributes
  – 4096-byte sharing granularity
  – “Miss” is very expensive
• SPLASH2
  – Good: high spatial locality / low false sharing
  – Bad: random access patterns / high false sharing
• The Linux kernel
  – Tuned to 16-byte cacheline
  – Involving the kernel may be expensive
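The change in sharing granularity can be made concrete with a little address arithmetic. The two addresses below are hypothetical; the 16-byte line size and the 4096-byte page size are the figures from the slide.

```python
PAGE_SIZE = 4096   # CREW sharing granularity (one page)
LINE_SIZE = 16     # cache-line size quoted on the slide

def same_unit(addr_a, addr_b, unit):
    """True if both addresses fall in the same aligned block of `unit` bytes."""
    return addr_a // unit == addr_b // unit

# Two hypothetical variables 64 bytes apart, each written by a different CPU.
a, b = 0x1000, 0x1040
shares_line = same_unit(a, b, LINE_SIZE)   # separate cache lines
shares_page = same_unit(a, b, PAGE_SIZE)   # but the same page
```

Data laid out to avoid false sharing at cache-line granularity can still be falsely shared at page granularity, where every conflicting access costs a page fault instead of a cache miss.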

Page 20

Single-processor Xen guests

[Bar chart: normalized runtime (axis 0 to 1.2) for FMM, LU, ocean, radix, water-spatial, kernel-build, radiosity, and dbench. Series: unmodified 1-cpu guest, logging 1-cpu guest. Bar labels as extracted: 1.00, 1.04, 1.01, 1.00, 1.03, 1.13, 1.00, 1.05.]

Page 21

Log Growth Rate

Workload        Log growth (GB/day)   Days to fill 300GB
FMM             0.234                 1280
LU              0.237                 1261
Ocean           0.232                 1295
Radix           0.292                 1025
Water-spatial   0.232                 1296
Kernel-build    0.564                 531
Radiosity       0.231                 1295
Dbench          0.557                 538
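As a sanity check, the last column appears to be the disk size divided by the growth rate. The sketch below recomputes it from the listed rates; the small gaps are presumably due to rounding of the rates shown on the slide, which is an assumption.

```python
DISK_GB = 300

# (log growth in GB/day, days to fill 300GB) as listed in the table above
table = {
    "FMM": (0.234, 1280),
    "LU": (0.237, 1261),
    "Ocean": (0.232, 1295),
    "Radix": (0.292, 1025),
    "Water-spatial": (0.232, 1296),
    "Kernel-build": (0.564, 531),
    "Radiosity": (0.231, 1295),
    "Dbench": (0.557, 538),
}

def days_to_fill(growth_gb_per_day, disk_gb=DISK_GB):
    """Days until a disk of disk_gb fills at a constant log growth rate."""
    return disk_gb / growth_gb_per_day

# Largest relative difference between the recomputed and the listed values.
worst_relative_gap = max(
    abs(days_to_fill(rate) - days) / days for rate, days in table.values()
)
```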

Page 22

2-processor Xen guests

[Bar chart: normalized runtime (axis 0 to 2.5) for FMM, LU, ocean, radix, water-spatial, and kernel-build. Series: unmodified 2-cpu guest, logging 2-cpu guest, logging 1-cpu guest. Bar labels as extracted: 1.51, 1.00, 1.08, 1.60, 1.48, 2.10, 1.90, 1.76, 1.96, 1.74, 1.83, 1.99.]

Page 23

2-processor, cont’d

[Bar chart: normalized runtime (axis 0 to 10) for radiosity and dbench. Series: unmodified 2-cpu guest, logging 2-cpu guest, logging 1-cpu guest. Bar labels as extracted: 8.70, 7.21, 1.85, 1.88.]

Page 24

Log Growth Rate

Workload        Log growth (GB/day)   Days to fill 300GB
FMM             34.5                  8.7
LU              3.2                   92.7
Ocean           4.3                   69.1
Radix           39.8                  7.5
Water-spatial   36.3                  8.25
Kernel-build    43.3                  6.9
Radiosity       88.4                  3.4
Dbench          77.0                  3.9

Page 25

4-processor Xen guests

[Bar chart: normalized runtime (axis 0 to 10) for FMM, LU, ocean, radix, water-spatial, and kernel-build. Series: unmodified domain, 4 cpus; CREW logging, 4 cpus; CREW logging, 2 cpus*; CREW logging, 1 cpu. Bar labels as extracted: 7.36, 1.12, 1.28, 4.20, 1.72, 9.03.]

Page 26

Recap

• Memory races in multiprocessor VMs
• The Ordering Requirement
• The CREW Protocol
  – Implementing with page protections
  – Relation to the Ordering Requirement
  – Generating constraints from CREW events
• DMA-capable devices and CREW
• Performance

Page 27

Big ideas

• Detection and replay of memory races is possible on commodity hardware

• Overhead high for some workloads

• …but surprisingly low for other workloads

Page 28

Questions