Talk Outline

26
Parallel and Distributed Systems Laboratory Paradise: A Toolkit for Building Reliable Concurrent Systems Trace Verification for Parallel Systems Vijay K. Garg Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712 email: [email protected]

description

Trace Verification for Parallel Systems Vijay K. Garg Department of Electrical and Computer Engineering The University of Texas at Austin Austin, TX 78712 email: [email protected]. Talk Outline. Motivation and Overview Instrumentation Clock : Tracking Dependency Property Checking - PowerPoint PPT Presentation

Transcript of Talk Outline

Page 1: Talk Outline

Parallel and Distributed Systems Laboratory

Paradise: A Toolkit for Building Reliable Concurrent Systems

Trace Verification for Parallel Systems

Vijay K. GargDepartment of Electrical and Computer Engineering

The University of Texas at AustinAustin, TX 78712

email: [email protected]

Page 2: Talk Outline

2

Talk Outline

Motivation and Overview

Instrumentation

– Clock : Tracking Dependency

Property Checking

– Sensor : Detecting Global Properties

– Slicer : Computation Slicing

Page 3: Talk Outline

4

Motivation: Reliable System

Concurrent systems are prone to errors.

– Concurrency, nondeterminism, process and channel failures

Techniques to ensure correctness

Modeling: Model Checking and Formal Verification

Bug Hunting: Simulation, Debugging and Verification

Fault-Tolerance

Page 4: Talk Outline

5

Paradise Environment

Program Monitor

Slicer

Predicate

Observe

Control

Page 5: Talk Outline

6

Talk Outline

Motivation and Overview

Instrumentation

– Clock : Tracking Dependency

Property Checking

– Sensor : Detecting Global Properties

– Slicer : Computation Slicing

Page 6: Talk Outline

7

Trace Model: Total Order vs Partial Order

Total order: interleaving of events in a trace

Partial order: Lamport’s happened-before model

f2e1

CS2 CS1

f1 e2

P1

P2

Partial Order Trace

CS2

CS1

e1 e2

f1 f2

e2e1

CS1CS2

f1 f2

Successful Trace

Specification:CS1 Λ CS2

¬CS2 ¬CS1

¬ CS1

¬CS2

¬CS1 ¬ CS2

Faulty Trace

Page 7: Talk Outline

8

Tracking Dependency

computation: a set of events ordered by “happened before” relation

Problem: Timestamp events to answer

– e happened before f ?

– e concurrent with f ?

Page 8: Talk Outline

9

Clocks in a Distributed System

Result: s happened before t i the vector at s is less than the vector at t.

Vector Clocks [Fidge 89, Mattern 89]

P1

(1,0,0) (2,1,0) (3,1,0)

P2

(0,1,0) (0,2,0)

P3

(0,0,1) (0,0,2) (2,1,3)

Page 9: Talk Outline

10

Dynamic Chain Clocks

Problem with vector clocks: scalability, dynamic process structure

Idea: Computing the “chains” in an online fashion [Aggarwal and Garg PODC 05] for relevant events

a

f

eb

d

c

h

g

a b c d

e f g h

A computation with 4 processes The relevant subcomputation

P1

P2

P3

P4

Page 10: Talk Outline

11

Experimental Results

Simulation of a computation with 1% relevant events

Measured

– number of components vs number of threads

– total time overhead vs number of threads

Page 11: Talk Outline

12

Talk Outline

Motivation and Overview

Instrumentation

– Clock : Tracking Dependency

Property Checking

– Sensor : Detecting Global Properties

– Slicer : Computation Slicing

Page 12: Talk Outline

13

Global Property Detection

Predicate: A global condition expressed using variables on processes

– e.g., more than one process is in critical section, there is no token in the system

Problem: find a global state that satisfies the given predicate

P1

P2

G1 G2

Critical section

Page 13: Talk Outline

14

The Main Difficulty in Partial Order

Algorithm for general predicate [Cooper and Marzullo 91]

Too many global states : A computation may contain as many as O(kn) global states

• k: maximum number of events on a process• n: number of processes

e1 e2

f1 f2

T┴

P1

P2 {e1, ┴} {f1, ┴}

{e1, f1, ┴}

{e2, e1, f1, ┴}

{e2, e1, f2, f1, ┴

{e1, f2, f1, ┴}

{e2, e1, ┴}

{┴}

Page 14: Talk Outline

15

Efficient Predicate Detection for Special Cases

stable predicate: [Chandy and Lamport 85] • once the predicate becomes true, it stays true e.g.,

deadlock

unstable predicate:• observer independent predicate [Charron-Bost et al 95]

occurs in one interleaving occurs in all interleavings e.g., any disjunction of local predicate

• linear predicate [Chase and Garg 95] e.g., conjunctive predicates such as there is no leader in

the system• relational predicate: x1 + x2 +…+ xn ≥ k [Chase and Garg

95] e.g., violation of k-mutual exclusion

Page 15: Talk Outline

16

Algorithms for Conjunctive Predicates

Centralized Algorithm [Garg and Waldecker 92] Each non-checker process maintains its local vector and sends to the checker process the chain clock whenever

– local predicate is true

– at most once in each message interval.

Time complexity: Checker requires at most O(n2m) comparisons.

– token based algorithm [Garg and Chase 95]

– completely distributed algorithm [Garg and Chase 95]

– keeping queues shorter [Chiou and Korfhage 95]

– avoiding control messages [Hurfin, Mizuno, Raynal, Singhal 96]

Page 16: Talk Outline

17

Other Special Classes of Predicates

Relational Predicates

– Let xi: number of token at Pi

– Σxi < k: loss of tokens

– Algorithms: max-flow techniques [Groselj 93, Chase and Garg 95, Wu and Chen 98]

– Dilworth's partition [Tomlinson and Garg 96]

Page 17: Talk Outline

18

Talk Outline

Motivation and Overview

Instrumentation

– Clock : Tracking Dependency

Property Checking

– Sensor : Detecting Global Properties

– Slicer : Computation Slicing

Page 18: Talk Outline

19

The Main Idea of Computation Slicing

Partial order trace

slice

state explosion

keep all red global statesslicing

Page 19: Talk Outline

20

How does Computation Slicing Help?

Partial order trace

slice

retain all global states satisfying b1

slicing for b1

check b1 Λ b2

check b2

satisfy b1

Page 20: Talk Outline

21

Example

Detect predicate (x*y + z < 5) Λ (x ≥1) Λ (z ≤3)

P1

P2

P3

x

y

z

a

1

b

2

c

-1

d

0

e

0

f

2

g

1

h

3

u

4

v

1

w

2

x

4

{a,e,f,u,v} {b}

{w} {g}

Computation

Slice with respect to (x ≥1) Λ (z ≤3)

Page 21: Talk Outline

22

Computation Slice

computation slice: a sub-computation such that: [Mittal and Garg 01]

1. it contains all global states of the computation satisfying the given predicate, and

2. it contains the least number of global states

Page 22: Talk Outline

23

POTA Architecture [Sen Dissertation 04]

Instrumentor

Specification

SlicerPredicate Detector

Trace

Slice

Predicate (Specification)

TranslatorExecuteProgram

ExecuteSPIN

Program

InstrumentedProgram

Promela

Trace Slice

yes/witnessno/counterexample

no/counterexample

yes

Analyzer

Page 23: Talk Outline

24

Results

Efficient polynomial-time algorithms for computing the slice for:

– linear predicates: [Garg and Mittal 01]• time-complexity: O(n2m)

– general predicate:• Theorem: Given a computation, if a predicate b can be

detected efficiently then the slice for b can also be computed efficiently. [Mittal,Sen and Garg 03]

– combining slices: Boolean operators

– temporal logic operators: EF, AG, EG

– approximate slice: For arbitrary boolean expression

• n: number of processes• m: number of events

Page 24: Talk Outline

25

Experiments: Dining Philosophers Trace Verification

POTA: Partial Order Trace Analyzer (based on slicing) [Sen and Garg 03]

SPIN: A widely used model checking tool [Holzmann 97]

– SPIN: 250 seconds for n = 6, runs out of memory for n > 6.

– POTA: can handle n= 200. Used 400 seconds.

Predicate: Two neighboring dining philosophers do not eat concurrently

Page 25: Talk Outline

26

Conclusions

Bug-hunting in concurrent systems

Total order vs. Partial Order

Abstraction like slicing to combat state space explosion problem

Page 26: Talk Outline

27

Questions

?