Real-Time Distributed Discrete-Event Execution with Fault Tolerance

15
IEEE Real-Time and Embedded Technology and Applications Symposium, April 2008 Real-Time Distributed Real-Time Distributed Discrete-Event Execution with Discrete-Event Execution with Fault Tolerance Fault Tolerance Thomas Huining Feng and Edward A. Lee Center for Hybrid and Embedded Software Systems EECS, UC Berkeley {tfeng, eal}@eecs.berkeley.edu

description

Real-Time Distributed Discrete-Event Execution with Fault Tolerance. Thomas Huining Feng and Edward A. Lee Center for Hybrid and Embedded Software Systems EECS, UC Berkeley {tfeng, eal}@eecs.berkeley.edu. Distributed Discrete-Event Execution Strategy. - PowerPoint PPT Presentation

Transcript of Real-Time Distributed Discrete-Event Execution with Fault Tolerance

Page 1: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

The 14th IEEE Real-Time and Embedded Technology and Applications Symposium, April 2008

Real-Time Distributed Discrete-Event Real-Time Distributed Discrete-Event Execution with Fault ToleranceExecution with Fault Tolerance

Thomas Huining Feng and Edward A. Lee

Center for Hybrid and Embedded Software Systems

EECS, UC Berkeley

{tfeng, eal}@eecs.berkeley.edu

Page 2: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

…, e2 = (null, t2)

Distributed Discrete-Event Execution Strategy

• Execution strategy decides whether/when it is safe to process an input event.

• Conventional: Compute can process top event e1 if e2 has a greater time stamp.

• Null message (null, t2)Cons: overhead, sensitive to faults, lack of real-time property

2

AA

BB

Sensor

Sensor

Source

Merge

…, e1 = (v1, t1)CC

Compute Actuator…

…, e2 = (v2, t2)

i1

i2

Page 3: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Overview of Our Approach

• Leverage time-synchronized platforms

• Eliminate null messages

• Potentially improves concurrency

• Decompose assertions of real-time properties

• Recover software components from faults

3

AA

BB

Sensor

Sensor

Source

Merge

CC

Compute Actuator

Page 4: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

• n cameras located around a football field, all connected to a central computer.

• Events at blue ports satisfy t ≤ τ (t – time stamp of any event; τ – real time)

• Events at red ports satisfy t ≥ τ

Reference Application: Distributed Cameras

4

Page 5: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Reference Application: Distributed Cameras

Problems to solve:– Make event-processing decisions locally

– Guarantee timely command delivery to the Devices

– Guarantee real-time update at the Display

– Tolerate images loss or corruption at Image Processor

5

Page 6: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Minimum Model-Time Delay δ

δ : P × P → R+ U {∞} returns the minimum model-time delay between any two ports.

(P – set of ports; R+ – set of non-negative reals.)

Example: δ(i5, o1) = min{δ5+δ1, δ5+δ4+δ2}, where δ1 , …, δ6 R+ are pre-defined.

6

i1

i2

i3

o1

o2

i5 o5δ5

δ6

i6

i4δ4

δ4'

o3

o4

δ1

δ2δ3

Page 7: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Intuition of Execution Strategy

When is it safe to process e = (v, t) at i1?

1.future events at i1, i2 and i3 have time stamps ≥ t (conventional), or

2.future events at i1 and i2 have time stamps ≥ t, or

3.future events at i1 have time stamps ≥ t, andfuture events at i2 depend on events at i4 with time stamps ≥ t – δ4, or

4.future events at i1 and i2 depend on events at i5 and i6 with time stamps ≥ t – min{δ5, δ6, δ5 + δ4, δ6 + δ4}.

e = (v, t)

7

i1

i2

i3

o1

o2

i5 o5δ5

δ6

i6

i4δ4

δ4'

o3

o4

δ1

δ2δ3

Page 8: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Relevant Dependency

i ~ i' iff they are input of the same actor and affect a common output. An equivalence class is a transitive closure of ~.

Construct a collapsed graph, and compute relevant dependency between equivalence classes.

d(ε', ε) = mini'ε', iε {δ(i', i)}8

i1

i2

i3

o1

o2

i5 o5δ5

δ6

i6

i4δ4

δ4'

o3

o4

δ1

δ2δ3

min{δ5 , δ

6}

ε3

ε1

ε2

min{δ5, δ6, δ5 + δ4, δ6 + δ4}ε4

δ4

δ4'

min{δ5 + δ4', δ6 + δ4'}

Page 9: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Dependency Cut

A dependency cut for ε is a minimal but complete set of equivalence classes that needs to be considered to process an event at ε.

Example: C1 and C2 are both dependency cuts for ε1.

9

min{δ5 , δ

6}

ε3

ε1

ε2

min{δ5, δ6, δ5 + δ4, δ6 + δ4}ε4

δ4

δ4'

min{δ5 + δ4', δ6 + δ4'}

C1

C2

Page 10: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Execution Strategy

Determine top event e = (v, t) at ε1 safe to process

• If we choose C1: future events at ε1 have time stamps ≥ t.

• If we choose C2: for any ε C2, future events at in ε1 depend on events at ε with time stamps ≥ t – d(ε, ε1).

• In general, we can freely choose any dependency cut.

10

min{δ5 , δ

6}

ε3

ε1

ε2

min{δ5, δ6, δ5 + δ4, δ6 + δ4}ε4

δ4

δ4'

min{δ5 + δ4', δ6 + δ4'}

C1

C2

Page 11: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Implementation of the Execution Strategy

• n + 1 platforms with synchronized clocks (IEEE 1588).

• Choose dependency cuts at platform boundary.

• A queue stores events local to the platform.

• At real time τ, future events have time stamps ≥ τ – dn.

11

network latency dn

…Queue

Page 12: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Tolerating Loss of Images

• Start the composition as soon as the starting packets are received.

• Create a checkpoint at the beginning (small constant overhead)

• Backtrack when fault is detected (linear in memory locations)

• In most cases, discard the checkpoint (garbage collection)

connection lost

Camera 1

Camera 2

12

start start

Page 13: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

A Program Transformation Approach

Before Transformation After Transformation

int s;void f(int i) { s = i;}

int s;void f(int i) { $ASSIGN$s(i);}

An assignment is transformed into a function call to record the old value:

private final int $ASSIGN$s(int newValue) { if ($CHECKPOINT != null && $CHECKPOINT.getTimestamp() > 0) { $RECORD$s.add(null, s, $CHECKPOINT.getTimestamp()); } return s = newValue;}

This incurs a constant overhead.

13

Page 14: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Observation:

The overhead for each basic operation isconstant.

A Program Transformation Approach

Before Transformation After Transformation

int s;void f(int i) { s = i;}

int s;void f(int i) { $ASSIGN$s(i);}

Image img;int partNum;void consume(Packet p1, Packet p2) { if (img == null) { img = new Image(); partNum = 0; } img.parts[partNum] = compose(p1, p2); partNum++;}

Image img;int partNum;void consume(Packet p1, Packet p2) { if (img == null) { $ASSIGN$img(new Image()); $ASSIGN$partNum(0); } img.$ASSIGN$parts(partNum, compose(p1, p2)); $ASSIGN$SPECIAL$partNum(11, -1);}

0: +=1: -=

...11: ++12: --

14

Value, not used for ++.

Page 15: Real-Time Distributed Discrete-Event Execution with Fault Tolerance

RTAS 2008 Feng & Lee, CHESS, EECS, UC Berkeley

Conclusion and Future Work

• Advantages– Eliminate null messages

– Decompose real-time schedulability analysis

– Advance the system even when some platforms fail

– Tolerate faults without sacrificing real-time properties

• Future Work– Examine different choices of dependency cuts

– Develop static WCET (worst-case execution time) analysis to guarantee real-time properties on each platform

– Build an implementation to support a variety of real applications

– Exploit parallelism with multi-core platforms

15