Election Distributed Systems. Algorithms to Find Global States Why? To check a particular property...


Page 1:

Election

Distributed Systems

Page 2:

Algorithms to Find Global States

• Why? To check whether a particular property holds in a distributed system
  – (Distributed) garbage collection
  – (Distributed) deadlock detection, termination detection

• What?
  – Capture the instantaneous states of a collection of processes
  – And the messages in transit on the different communication channels

Page 3:

Detecting global properties

• Evaluation of predicates of the system’s state
  – stable predicates:
    • distributed garbage collection
    • deadlock detection
    • termination detection
  – non-stable (transient) predicates:
    • distributed debugging
  – safety properties:
    • “Nothing bad ever happens”
    • e.g.: mutual exclusion
  – liveness properties:
    • “Something good eventually happens”
    • e.g.: fair scheduling

Page 4:

“I can’t find a solution, I guess I’m just too dumb”

• Picture from Computers and Intractability, by Garey and Johnson

Page 5:

“I can’t find an algorithm, because no such algorithm is possible”

• Picture from Computers and Intractability, by Garey and Johnson

Page 6:

“I can’t find an algorithm, but neither can all these famous people.”

• Picture from Computers and Intractability, by Garey and Johnson

Page 7:

A network partition

[Figure: a network split into two partitions by a crashed router]

Page 8:

Garbage Collection

• An object is considered garbage if there are no longer any references to it anywhere in the distributed system

Page 9:

Detecting global properties

[Figure: three scenarios involving processes p1 and p2.
a. Garbage collection: an object reference, a message, and a garbage object
b. Deadlock: wait-for edges between p1 and p2
c. Termination: both processes passive, with an “activate” message]

Page 10:

Garbage Collection

• Process P1 references object O1
• Process P2 references object O2
• Message M references object O3
• O4 is a garbage object
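
The example above can be checked mechanically once a global state is available. The following is a minimal sketch, assuming the global state is given as plain dictionaries; the function name `find_garbage` and the dictionary layout are illustrative and not part of the slides. It shows why the channel state matters: if the message in transit is ignored, O3 is wrongly classified as garbage.

```python
def find_garbage(objects, process_refs, in_transit_refs):
    """Return the objects referenced by no process and by no message in transit."""
    reachable = set()
    for refs in process_refs.values():
        reachable |= set(refs)
    for refs in in_transit_refs.values():
        reachable |= set(refs)
    return set(objects) - reachable


# Global state from the slide: P1 -> O1, P2 -> O2, message M -> O3.
objects = {"O1", "O2", "O3", "O4"}
process_refs = {"P1": {"O1"}, "P2": {"O2"}}
in_transit_refs = {"M": {"O3"}}

garbage = find_garbage(objects, process_refs, in_transit_refs)
# Ignoring the channel state would also (wrongly) report O3 as garbage.
garbage_without_channels = find_garbage(objects, process_refs, {})
```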

Page 11:

Obvious First Solution…

• Synchronize clocks of all processes (which algorithm?)
• Ask all processes to record their states at a known time t
• Problems?
  – Time synchronization is possible only approximately
  – Does not record the state of messages in the channels

Page 12:

Snapshots (I)

• Chandy & Lamport, 1985

Can you capture (record) the states of all processes and communication channels at exactly 10:04:50 am?

Is it even necessary to take such an exact snapshot?
  – Any process may initiate a snapshot at any time
  – Assumes strong connectivity
    • At least one path between each process pair
  – Assumes unidirectional, FIFO channels
  – Assumes reliable delivery of messages
  – Records snapshot state locally at all processes
    • No direct method for collecting state at a designated “collector” process

Page 13:

Snapshots (II)

• Application of 2 rules at each process:
  – Marker sending rule:
    • After Pi has recorded its state, it sends a “marker” message over each of its outgoing channels (before sending any other message)
  – Marker receipt rule: (c := the incoming channel on which the marker arrived)
    • If Pi has not yet recorded its state:
      – Pi records its state & records the state of channel c as empty
      – Pi turns on recording of messages arriving over its other incoming channels
    • Else:
      – Pi records the state of channel c as the set of messages that it has received over c since it saved its state
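
The two rules can be traced on a small example. Below is a minimal, single-threaded sketch, assuming two processes that transfer money over FIFO channels modelled as `deque` objects; the `Process` class, the `deliver` helper, and the account balances are illustrative assumptions, not taken from the slides. The recorded process states plus the recorded channel states should add up to the total amount in the system.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, name, balance, peers):
        self.name = name
        self.balance = balance
        self.out = {p: deque() for p in peers}  # outgoing FIFO channels
        self.incoming = list(peers)             # names of incoming channels
        self.recorded_state = None
        self.channel_state = {}                 # sender -> recorded messages
        self.recording = set()                  # channels still being recorded

    def send(self, peer, amount):
        self.balance -= amount
        self.out[peer].append(amount)

    def record_state(self):
        # Marker sending rule: record, then send a marker on every outgoing channel.
        self.recorded_state = self.balance
        for channel in self.out.values():
            channel.append(MARKER)
        self.channel_state = {src: [] for src in self.incoming}
        self.recording = set(self.incoming)

    def receive(self, src, msg):
        if msg == MARKER:
            # Marker receipt rule: the first marker triggers recording.
            if self.recorded_state is None:
                self.record_state()
            # Channel src is now complete (empty if this was the first marker).
            self.recording.discard(src)
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_state[src].append(msg)


def deliver(procs, src, dst):
    """Deliver the oldest message on the channel src -> dst."""
    procs[dst].receive(src, procs[src].out[dst].popleft())


procs = {"P": Process("P", 100, ["Q"]), "Q": Process("Q", 50, ["P"])}
p, q = procs["P"], procs["Q"]

q.send("P", 10)           # in transit when the snapshot starts
p.record_state()          # P initiates the snapshot
p.send("Q", 20)           # sent after the marker, so not part of the snapshot
deliver(procs, "Q", "P")  # the 10 arrives and is recorded as channel state
deliver(procs, "P", "Q")  # marker reaches Q, which records its state
deliver(procs, "P", "Q")  # the 20 arrives after Q has recorded
deliver(procs, "Q", "P")  # marker reaches P, closing channel Q -> P

snapshot_total = p.recorded_state + q.recorded_state + sum(
    sum(msgs) for proc in procs.values() for msgs in proc.channel_state.values()
)
```

The recorded cut never existed at one instant of real time, yet it is consistent: no money is lost or counted twice.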

Page 14:

Snapshots (III)

• A process that has received a “marker” message records its state in finite time & relays the “marker” over its outgoing channels in finite time.
• Strongly connected network
• The “marker” traverses each channel exactly once.
• When a process has received a “marker” over all its incoming channels, its contribution to the snapshot protocol is complete.

In-transit messages are accounted for as belonging to the state of a channel between processes.

Page 15:

Global State (I)

a) Organization of a process and channels for a distributed snapshot

Page 16:

Global State (II)

b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel

Page 17:

Elections

• Choose a unique process to play a “role” or coordinator
  – We require that the elected process be chosen as the one with the largest ID
    • IDs must be unique & totally ordered
  – E.g.: ID := <1/load, i>
• Requirements:
  – safety:
    • A participant process has elected == P, where P is chosen as the non-crashed process with the max. ID, or elected is undefined
  – liveness:
    • All processes participate & eventually set ‘elected’, or crash
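
The pair <1/load, i> gives unique, totally ordered IDs because ties on 1/load are broken by the process index i. A minimal sketch, assuming strictly positive loads and using Python tuple comparison; the function name `election_id` and the load values are made up for illustration:

```python
def election_id(load, i):
    """ID := <1/load, i>; tuples compare element by element, so i breaks ties."""
    return (1.0 / load, i)


# Hypothetical loads per process index (must be > 0).
loads = {1: 0.5, 2: 0.25, 3: 0.25}
ids = {i: election_id(load, i) for i, load in loads.items()}

# The process with the largest ID is the one that should be elected.
elected = max(ids, key=ids.get)
```

Processes 2 and 3 are equally lightly loaded, so the higher index decides.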

Page 18:

Why Election?

• Example 1: Your bank maintains multiple servers, but for each customer, one of the servers is responsible, i.e., is the leader
• Example 2: In the sequencer-based algorithm for total ordering of multicasts,
  – What happens if the “special” sequencer process fails?
• Example 3: Coordinator-based mutual exclusion: need to elect (and keep) one coordinator

Page 19:

Bully algorithm

• The process P sends an election message to all processes with higher numbers (assumes that each process knows which processes have higher IDs)
• If no one responds, P wins
• If one of the higher-numbered processes answers, it takes over

Page 20:

The Bully algorithm

Uses timeouts to detect failures: T = 2T_trans + T_process

3 msg types:
  – coordinator: announcement to all processes with lower IDs
  – election: sent to processes with higher IDs
  – answer: answer to “election”
    • If not received within T, the sender of “election” sends “coordinator”.
    • Otherwise, the process waits for T’ to receive a “coordinator” msg. If no msg arrives, it begins a new election.

Not “safe” if the crashed processes are replaced with processes with the same ID!
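
The message flow can be sketched as a synchronous simulation. This is a simplified illustration, not the full protocol: timeouts are not modelled (a crashed process simply never answers), the initiator is assumed to be alive, and the function name `bully_election` and the log format are assumptions made for the example.

```python
def bully_election(initiator, all_ids, alive):
    """Simulate a bully election; return (coordinator, message log)."""
    log = []
    started = set()
    pending = [initiator]
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in started:
            continue  # p is already running its own election
        started.add(p)
        answered = []
        for q in sorted(i for i in all_ids if i > p):
            log.append(("election", p, q))
            if q in alive:
                log.append(("answer", q, p))
                answered.append(q)
            # A crashed q stays silent; p would notice after timeout T.
        if not answered:
            coordinator = p
            for q in sorted(i for i in all_ids if i < p):
                log.append(("coordinator", p, q))
        else:
            pending.extend(answered)  # every process that answered takes over
    return coordinator, log


# Six processes P0..P5; P5 (assumed to be the old coordinator) has crashed.
coordinator, log = bully_election(2, range(6), alive={0, 1, 2, 3, 4})
election_msgs = [m for m in log if m[0] == "election"]
```

With P2 as initiator, P3 and P4 answer and start their own elections; P4 hears nothing from P5 and announces itself.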

Page 21:

The Bully Algorithm (1)

• The bully election algorithm
• Process 4 holds an election
• Processes 5 and 6 respond, telling 4 to stop
• Now 5 and 6 each hold an election

Page 22:

Example: Bully Election

[Figure: six processes, P0 to P5, shown at each step; messages are labelled “Election”, “answer=OK”, and “coordinator”.]

1. P2 initiates election
2. P2 receives “replies” (OK)
3. P3 & P4 initiate election
4. P3 receives reply (OK)
5. P4 receives no reply
6. P4 announces itself (coordinator)

Page 23:

The bully algorithm

[Figure: four stages (Stage 1 to Stage 4) showing processes p1 to p4 and coordinator C exchanging “election”, “answer”, and “coordinator” messages; a timeout occurs in Stage 3, followed by “Eventually.....” before Stage 4.]

The election of coordinator p2, after the failure of p4 and then p3

Page 24:

Ring Election

• N processes are organized in a logical ring.
  – p_i has a communication channel to p_((i+1) mod N).
  – All messages are sent clockwise around the ring.
• Any process that discovers that the coordinator has failed initiates an “election” message that contains its own id:attr.
• When a process receives an election message, it compares the attr in the message with its own.
  – If the arrived attr is greater, the receiver forwards the message.
  – If the arrived attr is smaller and the receiver has not forwarded an election message earlier, it substitutes its own id:attr in the message and forwards it.
  – If the arrived id:attr is that of the receiver, then this process’s attr must be the greatest, and it becomes the new coordinator. This process then sends an “elected” message to its neighbor announcing the election result.
• When a process p_i receives an elected message, it
  – sets its variable elected_i := id of the message.
  – forwards the message if it is not the new coordinator.

Page 25:

Ring-based election

[Figure: a ring of processes with IDs 1, 3, 4, 9, 15, 17, 24, 28 and an election message carrying ID 24 in transit.]

Any process can begin an election, by marking itself as “participant” and then sending an “election” msg to its neighbor.

Upon receipt of an election msg:
    if (arrived ID > receiver’s ID) {
        Receiver is marked as a participant;
        Forward msg;
    } else if (arrived ID < receiver’s ID) {
        if (receiver is not a participant) {
            Receiver is marked as a participant;
            Substitute ID in msg & forward it;
        }
    } else {
        Receiver becomes coordinator;
        Send “elected” msg;
    }

Worst-case # msgs: (N − 1) + N + N = 3N − 1
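
The receipt rule and the message count can be checked with a small simulation. This is a sketch under the slide's no-failure assumption; the function name `ring_election`, the in-flight message queue, and the particular ring order are illustrative choices. The ring is arranged so that the initiator's anticlockwise neighbor holds the highest ID, which is the worst case.

```python
from collections import deque


def ring_election(ids, initiators):
    """Simulate the ring election; return (elected per process, messages sent)."""
    n = len(ids)
    participant = [False] * n
    elected = [None] * n
    in_flight = deque()  # entries: (destination index, kind, id)
    sent = 0
    for i in initiators:
        participant[i] = True
        in_flight.append(((i + 1) % n, "election", ids[i]))
        sent += 1
    while in_flight:
        pos, kind, value = in_flight.popleft()
        nxt, me = (pos + 1) % n, ids[pos]
        if kind == "election":
            if value > me:
                participant[pos] = True
                in_flight.append((nxt, "election", value))
                sent += 1
            elif value < me:
                if not participant[pos]:
                    participant[pos] = True
                    in_flight.append((nxt, "election", me))
                    sent += 1
                # A participant drops a message carrying a smaller ID.
            else:
                participant[pos] = False  # own ID came back: coordinator
                elected[pos] = me
                in_flight.append((nxt, "elected", me))
                sent += 1
        elif value != me:
            participant[pos] = False
            elected[pos] = value
            in_flight.append((nxt, "elected", value))
            sent += 1
        # The coordinator absorbs its own "elected" message.
    return elected, sent


ring = [3, 17, 24, 9, 1, 4, 15, 28]  # index 0 initiates; 28 is its anticlockwise neighbor
elected, sent = ring_election(ring, initiators=[0])
elected_two, _ = ring_election(ring, initiators=[0, 3])  # two concurrent elections
```

With N = 8 the single-initiator run uses 3N − 1 = 23 messages, and concurrent initiators still agree on 28.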

Page 26:

Election Example: Ring Algorithm

• Election algorithm using a ring.

Page 27:

Example: Ring Election

[Figure: six processes, P0 to P5, in a ring, shown at each step; message labels in the figure: “Election: 2”, “Election: 3”, “Election: 4”.]

1. P2 initiates election
2. P2 receives “election”, P4 dies
3. Election: 4 is forwarded forever?

May not work when a process failure occurs during the election! Consider the above example, where attr == highest id.

Page 28:

Example: Ring Election

[Figure: six processes, P0 to P5, in a ring, shown at each step. Message labels in the figure:
Election: 2 / 2,3 / 2,3,4 / 2,3,4,0,1
Coord(4): 2 / 2,3 / 2,3,0,1
Election: 2 / 2,3 / 2,3,0 / 2,3,0,1
Coord(3): 2 / 2,3 / 2,3,0 / 2,3,0,1]

1. P2 initiates election
2. P2 receives “election”, P4 dies
3. P2 selects 4 and announces the result
4. P2 receives “Coord”, but P4 is not included
5. P2 re-initiates election
6. P3 is finally elected
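
The six steps above can be replayed with a short sketch of this list-based variant, in which each live process appends its ID to the circulating message. The helper names `circulate` and `elect_with_retry`, and the idea of passing one "alive" set per round, are assumptions made to keep the example deterministic; the initiator is assumed to stay alive and failure detection is not modelled.

```python
def circulate(ring, start, alive):
    """Pass a message once around the ring; return the IDs of live processes visited."""
    n = len(ring)
    first = ring.index(start)
    return [ring[(first + k) % n] for k in range(n) if ring[(first + k) % n] in alive]


def elect_with_retry(ring, initiator, alive_per_round):
    """Run Election and Coord rounds until the chosen coordinator confirms.

    alive_per_round must supply one alive set for every round that is run.
    """
    rounds = iter(alive_per_round)
    history = []
    while True:
        candidates = circulate(ring, initiator, next(rounds))  # "Election" round
        coordinator = max(candidates)
        confirmed = circulate(ring, initiator, next(rounds))   # "Coord" round
        history.append((coordinator, candidates, confirmed))
        if coordinator in confirmed:
            return coordinator, history


ring = [0, 1, 2, 3, 4, 5]
# P5 is down throughout; P4 dies after the first Election round.
alive_per_round = [{0, 1, 2, 3, 4}, {0, 1, 2, 3}, {0, 1, 2, 3}, {0, 1, 2, 3}]
coordinator, history = elect_with_retry(ring, 2, alive_per_round)
```

The first attempt selects 4, the Coord message returns without 4 in its list, and the second attempt settles on 3.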

Page 29:

Mutual Exclusion

• Bank Database: Think of two simultaneous deposits of $10,000 into your bank account, each from one ATM.
  – Both ATMs read the initial amount of $1000 concurrently from the bank server
  – Both ATMs add $10,000 to this amount (locally at the ATM)
  – Both write the final amount to the server
  – What’s wrong?
• The ATMs need mutually exclusive access to your account entry at the server
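
The problem is a lost update: the second write overwrites the first. A minimal sketch that replays the interleaving deterministically, with plain variables standing in for the server and the two ATMs (the function names are illustrative):

```python
def interleaved_deposits(initial, amount):
    """Both ATMs read before either writes: one deposit is lost."""
    server = initial
    atm1 = server    # ATM 1 reads
    atm2 = server    # ATM 2 reads the same initial amount
    atm1 += amount   # each adds locally
    atm2 += amount
    server = atm1    # ATM 1 writes
    server = atm2    # ATM 2 overwrites ATM 1's result
    return server


def exclusive_deposits(initial, amount):
    """Each ATM completes read, add, write before the other starts."""
    server = initial
    for _ in range(2):
        local = server
        local += amount
        server = local
    return server


lost = interleaved_deposits(1000, 10000)
correct = exclusive_deposits(1000, 10000)
```

The account should end at $21,000, but the interleaved run leaves $11,000.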

Page 30:

Mutual Exclusion

When a process has to write or update shared data, it enters a critical region to achieve mutual exclusion (no other process uses the shared data at the same time).

Critical section problem: Mutual exclusion is required to prevent interference and ensure consistency when accessing the resources.

Page 31:

Mutual Exclusion

– Semaphores, mutexes, etc. in local operating systems
– Message-passing-based protocols in distributed systems:
  • enter() the critical section
  • AccessResource() in the critical section
  • exit() the critical section
– Distributed mutual exclusion requirements:
  • Safety – At most one process may execute in the CS at any time
  • Liveness – Every request for a CS is eventually granted
  • Ordering (desirable) – Requests are granted in FIFO order

Page 32:

Assumptions

• For all the algorithms studied, we make the following assumptions:
  – Each pair of processes is connected by reliable channels (such as TCP). Messages are eventually delivered to the recipient’s input buffer.
  – Processes will not fail.

Page 33:

Centralized Algorithm

• A central coordinator
  – Grants permission to enter the critical section (CS) & keeps a queue of requests from the other processes.
  – Ensures only one process at a time can access the data
• Operations (coordinator == server)
  – To enter the CS: send a request to the server & wait for the token.
  – On exiting the CS: send a message to the server to release the token.
• Features:
  – Safety, liveness and order are guaranteed
  – Synchronization delay: one round-trip time (release + grant)
  – The coordinator becomes a performance bottleneck and a single point of failure.

Page 34:

A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
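
Steps a) to c) map onto a coordinator that holds one grant and a FIFO queue. The sketch below is a single-process model, assuming message passing is replaced by direct method calls; the class name `Coordinator` and its method names are illustrative.

```python
from collections import deque


class Coordinator:
    def __init__(self):
        self.holder = None    # process currently in the critical region
        self.queue = deque()  # waiting requests, in arrival order

    def request(self, pid):
        """Return True if permission is granted now; otherwise queue and stay silent."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)
        return False

    def release(self, pid):
        """Release the region; return the next process granted, or None."""
        if pid != self.holder:
            raise ValueError("only the current holder may release")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


coordinator = Coordinator()
granted_1 = coordinator.request(1)       # a) granted immediately
granted_2 = coordinator.request(2)       # b) queued, no reply
next_holder = coordinator.release(1)     # c) coordinator now replies to 2
```

One entry costs three messages (request, grant, release), which matches the count usually quoted for this algorithm.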

Page 35:

A Distributed Algorithm

a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.

Page 36:

Distributed Algorithm

• The process sends a message with the critical region name, its process number, and the current time to all other processes (multicast)
• If the receiver is not in the critical region and does not want to enter it, it sends OK
• If the receiver is in the critical region, it does not reply and queues the request
• If the receiver also wants to enter the critical region, it compares the timestamp of the incoming message with that of its own request; the lowest one wins
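
The receiver's three cases can be written as one decision function. This is a sketch of the reply logic only (the scheme is commonly attributed to Ricart and Agrawala); the `Participant` class, the state names, and the timestamps 8 and 12 are assumptions for illustration, and the network is replaced by direct calls. Requests are compared as (timestamp, pid) pairs so that ties are broken by process number.

```python
RELEASED, WANTED, HELD = "released", "wanted", "held"


class Participant:
    def __init__(self, pid):
        self.pid = pid
        self.state = RELEASED
        self.my_request = None  # (timestamp, pid) of the pending request
        self.deferred = []      # processes whose OK is being withheld

    def want(self, timestamp):
        self.state = WANTED
        self.my_request = (timestamp, self.pid)
        return self.my_request

    def on_request(self, request):
        """Return True to send OK now, False to queue the request."""
        in_region = self.state == HELD
        i_win = self.state == WANTED and self.my_request < request
        if in_region or i_win:
            self.deferred.append(request[1])
            return False
        return True

    def exit(self):
        """Leave the critical region; return the processes that now get an OK."""
        self.state = RELEASED
        self.my_request = None
        granted, self.deferred = self.deferred, []
        return granted


p0, p1, p2 = Participant(0), Participant(1), Participant(2)
req0, req2 = p0.want(8), p2.want(12)  # hypothetical timestamps

oks_for_0 = [p1.on_request(req0), p2.on_request(req0)]  # both reply OK
oks_for_2 = [p0.on_request(req2), p1.on_request(req2)]  # p0 defers its OK
p0.state = HELD               # p0 has all N-1 replies and enters
released_to = p0.exit()       # p0 leaves and sends the deferred OK to 2
```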

Page 37:

A ring-based algorithm

[Figure: processes p1, p2, p3, p4, ..., pn arranged in a ring, with a token circulating.]

p_i has a channel to p_((i+1) mod N)

Wait for the token to pass, retain it to enter(), release it to the neighbor when done

Does not preserve the happened-before (→) relation

Continuously consumes bandwidth

Synchronization delay: 1 up to N messages

Page 38:

Mutual Exclusion Example: Token Ring Algorithm

a) An unordered group of processes on a network.
b) A logical ring constructed in software.
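
Token passing on the logical ring can be sketched in a few lines. The function name `token_ring`, the hop limit, and the example values are illustrative assumptions; processes are numbered 0 to n−1 and never fail. The result shows that entry order follows ring position, not the order in which requests were made.

```python
def token_ring(n, wants, start=0, max_hops=100):
    """Pass the token around a ring of n processes.

    Return (order of critical-section entries, number of token hops).
    """
    waiting = set(wants)
    order = []
    holder = start
    hops = 0
    while waiting and hops < max_hops:
        if holder in waiting:
            order.append(holder)   # enter(), use the resource, exit()
            waiting.remove(holder)
        holder = (holder + 1) % n  # release the token to the neighbor
        hops += 1
    return order, hops


# Processes 1 and 3 want the region; the token is currently at process 2.
order, hops = token_ring(5, wants=[1, 3], start=2)
```

Process 3 enters first because the token reaches it first, even if process 1 asked earlier.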

Page 39:

Timestamp Approach Features:

Safety, liveness, and ordering (causal) are guaranteed.

It takes 2(N-1) messages per entry operation (N-1 multicast requests + N-1 replies); Client delay: one round-trip time

Synchronization delay: one message transmission time.

Page 40:

Comparison

• A comparison of three mutual exclusion algorithms.

Algorithm     Messages per entry/exit   Delay before entry (in message times)   Problems
Centralized   3                         2                                       Coordinator crash
Distributed   2(n – 1)                  2(n – 1)                                Crash of any process
Token ring    1 to ∞                    0 to n – 1                              Lost token, process crash