Comparison-Based System-Level Fault Diagnosis in Ad Hoc Networks

Stefano Chessa, Paolo Santi

Reliable Distributed Systems, 2001. Proceedings. 20th IEEE Symposium on, 2001

Ad-Hoc Networks

- Networks of mobile, untethered units communicating via radio transmitters/receivers.
- Also called multi-hop packet radio networks.
- The communication paradigm:
  - one-to-many
  - not shared

The system model

- n mobile hosts.
- Communication graph G(t)=(V(t), L(t)) at time t.
- L(t): set of logical links at time t.
- G(t) is undirected.
- (u,v) ∈ L(t) if u is adjacent to v at time t.

The system model (continued)

- N(u,t): neighbor set of u at time t.
- Each mobile has a unique identifier.
- There exists a link-level protocol providing the following services:
  - A MAC protocol is executed to solve contentions over logical links.
  - The protocol provides a 1-hop reliable broadcast primitive, called 1_rb(.), to the upper level.
  - The receiver of a message knows the identity of the sender.

The invalidation rule of the gMM model

u           v           w           Comparison outcome of v and w generated by u
Fault-free  Fault-free  Fault-free  0
Fault-free  Faulty      Fault-free  1
Fault-free  Fault-free  Faulty      1
Fault-free  Faulty      Faulty      1
Faulty      any         any         x
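
The invalidation rule can be written as a small function. This is only a sketch: the function name and the boolean encoding of fault states are illustrative choices, and the table's "x" is modeled here as an arbitrary bit, which is an assumption about how an unreliable comparator behaves.

```python
import random


def comparison_outcome(u_faulty, v_faulty, w_faulty, rng=random):
    """Outcome of comparing v and w as generated by comparator u.

    0 means the two results match, 1 means they differ.
    """
    if u_faulty:
        # Assumption: the "x" entry is modeled as an arbitrary, unreliable bit.
        return rng.choice((0, 1))
    # A fault-free comparator reports 0 only when both compared units are fault-free.
    return 0 if not (v_faulty or w_faulty) else 1
```

With a fault-free comparator, the four possible inputs reproduce the first four rows of the table; with a faulty comparator the caller cannot rely on the returned value.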

Hard faulted

- Crashed or battery depleted.
- Is unable to communicate with the rest of the system.
- Is unable to respond to the test request.

Soft faulted

- Subtle.
- Produces random and independent results.

Two models

- Two implementations of the model, under the hypotheses of fixed and time-varying topology.
- In both implementations we assume that faults are static.

Fixed topology comparison protocol

N(u,t) = N(u,t’) = N(u) for t < t’ ≤ t+Tout

- Test request generation
- Test request reception
- Test response reception
- Timeout reception

Test request generation

- At time t, unit u generates a test sequence number i, a test task Ti, and the expected result Ru,i, and sends the message m=(u,i,Ti) to N(u).
- Message m is called a test request.
- (u,i) is the header of the test request.
- Unit u sends a message to the timer.

Test request reception

- Any unit v in N(u), upon receiving m, generates the result Rv,i for Ti and invokes 1_rb(m’) at time t’, with t < t’ ≤ t+Tout.
- m’=(u,i,Rv,i) is called a test response.
- (u,i) is the header of the test response.
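
The two message formats can be sketched as follows. Everything named here is hypothetical: the `Message` class, the choice of "sum a tuple of integers" as the test task, and the 32-bit random value used for a soft-faulted unit are illustrative assumptions, not part of the protocol description. The response carries no sender field because the link-level protocol already tells the receiver who sent it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    tester: str      # u, first part of the header
    seq: int         # i, second part of the header
    payload: object  # Ti in a test request, Rv,i in a test response

    @property
    def header(self):
        return (self.tester, self.seq)


def run_task(task):
    """Hypothetical test task: add up a tuple of integers."""
    return sum(task)


def make_request(u, i, task):
    """Tester side: build m = (u, i, Ti) together with the expected result Ru,i."""
    return Message(u, i, task), run_task(task)


def make_response(request, soft_faulted=False, rng=random):
    """Neighbor side: build m' = (u, i, Rv,i) for a received test request."""
    if soft_faulted:
        # Assumption: a soft-faulted unit answers with a random, independent value.
        result = rng.getrandbits(32)
    else:
        result = run_task(request.payload)
    return Message(request.tester, request.seq, result)
```

A request and its responses share the header (u,i), which is what lets any listener group responses that belong to the same test.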

Test response reception

Any unit w in N(v), upon receiving m’=(u,i,Rv,i), does the following:

- If w = u
- If w ≠ u:
  - w ∈ N(u)
  - w ∉ N(u)

- w = u?
  - Yes: Rv,i = Ru,i?
    - Yes: v is fault-free.
    - No: v is faulty.
  - No: w ∈ N(u)?
    - Yes: Rv,i = Rw,i?
      - Yes: v is fault-free.
      - No: v is faulty.
    - No: exists z ∈ N(u) (a unit whose response with header (u,i) is already stored at w)?
      - No: store Rv,i.
      - Yes: Rz,i = Rv,i?
        - Yes: both v, z are fault-free.
        - No: z is fault-free?
          - Yes: v is faulty.
          - No: store Rv,i.
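
The handling of a test response at a fault-free unit w, under fixed topology, might be sketched as below. The class and attribute names are hypothetical. The cases w = u and w in N(u) are merged because both compare the received result against a value w can compute itself, and the treatment of several stored responses (match first, then look for a fault-free z) is one reading of the decision procedure, not a definitive implementation.

```python
FAULT_FREE, FAULTY = "fault-free", "faulty"


class Receiver:
    """State kept by a fault-free unit w (fixed topology)."""

    def __init__(self, name):
        self.name = name
        self.reference = {}  # header -> Ru,i (w is the tester) or Rw,i (w is in N(u))
        self.stored = {}     # header -> {z: Rz,i}, used when w is outside N(u)
        self.diagnosis = {}  # unit -> FAULT_FREE or FAULTY

    def on_test_response(self, v, header, result):
        """Process m' = (u, i, Rv,i) received from neighbor v."""
        if header in self.reference:
            # w = u or w in N(u): w knows the correct result for this test.
            ok = result == self.reference[header]
            self.diagnosis[v] = FAULT_FREE if ok else FAULTY
            return
        seen = self.stored.setdefault(header, {})
        matching = [z for z, rz in seen.items() if rz == result]
        if matching:
            # Two equal results: both senders are taken to be fault-free.
            self.diagnosis[v] = FAULT_FREE
            for z in matching:
                self.diagnosis[z] = FAULT_FREE
        elif any(self.diagnosis.get(z) == FAULT_FREE for z in seen):
            # v disagrees with a unit already known to be fault-free.
            self.diagnosis[v] = FAULTY
        else:
            # Nothing to compare against yet: keep the result for later.
            seen[v] = result
```

In this sketch a response that was stored earlier and never matched stays undiagnosed; re-examining stored responses once a fault-free unit is known would be a further design choice.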

Timeout reception

At time t+Tout the testing unit u receives the message from the timer and diagnoses as faulty all the units that did not reply to the test request.
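
The timeout step amounts to a set difference. A minimal sketch, assuming the tester tracks which neighbors replied (the function and parameter names are hypothetical):

```python
def diagnose_silent_units(neighbors, replied, diagnosis):
    """Return the diagnosis extended with every neighbor that never replied."""
    silent = set(neighbors) - set(replied)
    # Units diagnosed from their responses keep their state; silent ones become faulty.
    return {**diagnosis, **{v: "faulty" for v in silent}}
```

For example, with neighbors {a, b, c} and replies from a and b only, c is added to the diagnosis as faulty.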

Notations

- Nu(v) = N(v) ∩ N(u)
- N2(u) = {z ∈ V−N(u) : |Nu(z)| ≥ 2}
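
As an illustration, both sets can be computed from an adjacency map. The sketch below reads Nu(v) as the neighbors that v shares with u and N2(u) as the non-neighbors of u that share at least two neighbors with u; the function names and the example graph are hypothetical, and u itself is left out of N2(u) by assumption.

```python
def common_neighbors(adj, u, v):
    """Nu(v): the neighbors of v that are also neighbors of u."""
    return adj[v] & adj[u]


def two_hop_set(adj, u):
    """N2(u): units outside N(u) that share at least two neighbors with u."""
    return {
        z
        for z in adj
        # Assumption: the tester u itself is excluded.
        if z != u and z not in adj[u] and len(common_neighbors(adj, u, z)) >= 2
    }


# Small undirected example graph (adjacency sets).
adj = {
    "u": {"a", "b"},
    "a": {"u", "b", "z"},
    "b": {"u", "a", "z"},
    "z": {"a", "b", "y"},
    "y": {"z"},
}
```

Here z hears the responses of both a and b to a test issued by u, so it belongs to N2(u), while y shares no neighbor with u and does not.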

Theorem 1

Assume the network topology is fixed, and assume that the fault-free node u generates a test request at time t. Then at time t+Tout:

- Unit u has correctly diagnosed the state of all the units in N(u).
- Any fault-free unit v in N(u) has correctly diagnosed the state of the fault-free and soft-faulted units in Nu(v).
- Any fault-free unit z in N2(u) has correctly diagnosed the state of the fault-free and soft-faulted units in Nu(z) if at least two units in Nu(z) are fault-free.

Time-varying topology comparison protocol

N(u,t) ≠ N(u,t+Tout)

- Test request generation
- Test request reception
- Test response reception
- Timeout reception

Test request generation

- At time t, unit u generates a test sequence number i, a test task Ti, and the expected result Ru,i, and sends the message m=(u,i,Ti) to N(u,t).
- Message m is called a test request.
- (u,i) is the header of the test request.
- Unit u sends a message to the timer.

Test request reception

- Any unit v in N(u,t), upon receiving m, generates the result Rv,i for Ti and invokes 1_rb(m’) at time t’, with t < t’ ≤ t+Tout.
- m’=(u,i,Rv,i) is called a test response.
- (u,i) is the header of the test response.

Test response reception

Any unit w in N(v,t’), upon receiving m’=(u,i,Rv,i), does the following:

- If w = u
- If w ≠ u:
  - w ∈ N(u,t)
  - w ∉ N(u,t)

- w = u?
  - Yes: Rv,i = Ru,i?
    - Yes: v is fault-free.
    - No: v is faulty.
  - No: w ∈ N(u,t)?
    - Yes: Rv,i = Rw,i?
      - Yes: v is fault-free.
      - No: v is faulty.
    - No: exists z ∈ N(u,t) (a unit whose response with header (u,i) is already stored at w)?
      - No: store Rv,i.
      - Yes: Rz,i = Rv,i?
        - Yes: both v, z are fault-free.
        - No: z is fault-free?
          - Yes: v is faulty.
          - No: store Rv,i.

Timeout reception

At time t+Tout the testing unit u receives the message from the timer and diagnoses as faulty all the units that did not reply to the test request.

Theorem 2

Assume that the fault-free node u generates a test request at time t, and that the network topology can vary. Then, at time t+Tout, unit u has correctly diagnosed the state of all the units in N_S^r(u).

Notations

- N_S(u) = N(u,t) ∩ N(u,t+Tout)
- Let N_S^r(u) be the set of fault-free or soft-faulted units in N_S(u).

The diagnosis protocol

- A fault-free unit u generates a test request (u,i,Ti), sends it to N(u), and computes the expected result Ru,i. Then it sends a message to the timer.
- Unit u waits for the responses of units in N(u) and diagnoses their state according to the comparison protocol.
- When u has diagnosed the state of all units in N(u), it generates a dissemination message containing its local diagnosis.

The diagnosis protocol (continued)

Then, unit u waits for dissemination messages generated by other fault-free mobiles in order to complete its diagnosis. The diagnosis protocol for u terminates when the state of all the units in the system has been identified.

The diagnosis protocol (continued)

- The diagnosis session starts when a fault-free unit initiates its diagnosis protocol, and ends when the diagnosis protocol execution is terminated by every fault-free unit.
- A diagnostic message can be a test request, a test response, a timeout message or a dissemination message.

The diagnosis protocol (continued)

- Dissemination messages are messages generated by fault-free units to propagate the diagnosis of their neighbors throughout the network.
- Any fault-free unit, upon receiving a dissemination message from a neighbor v, does not propagate it until mobile v has been diagnosed as fault-free.
- Every fault-free unit either propagates or discards any dissemination message in time at most Tout.
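
The hold-then-forward rule for dissemination messages could look like the sketch below. The class, its method names and the numeric clock are hypothetical. Discarding a message whose sender is diagnosed faulty, or is still undiagnosed once Tout has elapsed since arrival, is an assumption about when "discard" applies.

```python
class Disseminator:
    """Dissemination buffer of one fault-free unit."""

    def __init__(self, t_out):
        self.t_out = t_out
        self.diagnosis = {}  # neighbor -> "fault-free" or "faulty"
        self.held = []       # (arrival_time, sender, message) awaiting a decision

    def receive(self, now, sender, message):
        """Buffer a dissemination message and return whatever can be forwarded now."""
        self.held.append((now, sender, message))
        return self.flush(now)

    def flush(self, now):
        """Forward messages from fault-free senders; discard or keep the rest."""
        forward, keep = [], []
        for arrival, sender, message in self.held:
            state = self.diagnosis.get(sender)
            if state == "fault-free":
                forward.append(message)
            elif state == "faulty" or now - arrival >= self.t_out:
                continue  # discarded (assumed rule, see the lead-in)
            else:
                keep.append((arrival, sender, message))
        self.held = keep
        return forward
```

A caller would invoke `flush` whenever a neighbor's diagnosis changes or a timer fires, so no message stays buffered longer than Tout.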

Lemma 1 (Dissemination correctness)

Let G=(V,L) be the graph representing the system at the time of diagnosis. If G is connected and the total number of faulty mobiles in the system is at most κ(G)-1, where κ(G) is the connectivity of G, then the dissemination message generated by a fault-free unit is correctly received by any other fault-free unit in the system in a finite time.
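
The role of the connectivity bound can be illustrated by checking which units a message reaches when only fault-free units relay it. The sketch below is a plain breadth-first search with hypothetical names; the 5-node ring, whose connectivity is 2, is an illustrative example.

```python
from collections import deque


def reachable_fault_free(adj, faulty, source):
    """Units reached from source when only fault-free units relay messages."""
    seen = {source}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in faulty and y not in seen:
                seen.add(y)
                queue.append(y)
    return seen


# 5-node ring 0-1-2-3-4-0; its connectivity is 2.
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

With one faulty unit (at most connectivity minus one) every other unit is still reached from unit 0; with units 1 and 3 both faulty, unit 2 is cut off, which is the situation the lemma's hypothesis rules out.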

Theorem 3

Let G=(V,L) be the graph representing the system at the time of diagnosis. If G is connected and the total number of faulty mobiles in the system is at most κ(G)-1, then every fault-free unit correctly diagnoses the state of all the mobiles in the system in finite time.

Theorem 4 (communication complexity)

Let G=(V,L) be the graph representing the system at the time of diagnosis. The communication complexity of the diagnosis protocol is O(n(n+1+dmax)), where n=|V| and dmax is the maximum of the node degrees.

Theorem 5 (time complexity)

Let G=(V,L) be the graph representing the system at the time of diagnosis. Let Tgen be an upper bound to the elapsed time between the reception of the first diagnostic message and the generation of the test request, and let Tf be an upper bound to the time needed to propagate a dissemination message. The time complexity of the diagnosis protocol is O(Δ(Tgen+Tf)+Tout), where Δ is the diameter of G.
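
To get a feel for the expressions in Theorems 4 and 5, they can be evaluated on a concrete graph. The sketch below assumes a connected graph given as adjacency sets and assumes that the diameter multiplies (Tgen+Tf) in the time bound; the function names and the example values are hypothetical. The results are the quantities inside the O(.) notation, not exact message counts or running times.

```python
from collections import deque


def eccentricity(adj, source):
    """Largest hop distance from source to any other unit (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        x = queue.popleft()
        for y in adj[x]:
            if y not in dist:
                dist[y] = dist[x] + 1
                queue.append(y)
    return max(dist.values())


def diameter(adj):
    """Diameter of a connected graph."""
    return max(eccentricity(adj, s) for s in adj)


def message_bound(adj):
    """n(n+1+dmax), the expression in Theorem 4."""
    n = len(adj)
    d_max = max(len(neighbors) for neighbors in adj.values())
    return n * (n + 1 + d_max)


def time_bound(adj, t_gen, t_f, t_out):
    """diameter * (Tgen + Tf) + Tout, the expression in Theorem 5 as read here."""
    return diameter(adj) * (t_gen + t_f) + t_out


# 5-node ring: n = 5, dmax = 2, diameter = 2.
ring = {i: {(i - 1) % 5, (i + 1) % 5} for i in range(5)}
```

On the ring, `message_bound(ring)` evaluates to 5(5+1+2) = 40 and `time_bound(ring, 1, 2, 10)` to 2(1+2)+10 = 16.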