Election
Distributed Systems
Algorithms to Find Global States
• Why? To check whether a particular property holds in a distributed system
  – (Distributed) garbage collection
  – (Distributed) deadlock detection, termination detection
• What?
  – Capture the instantaneous states of a collection of processes
  – And the messages in transit on the different communication channels
Detecting global properties
• Evaluation of predicates of the system’s state
  – stable predicates:
    • distributed garbage collection
    • deadlock detection
    • termination detection
  – non-stable (transient) predicates:
    • distributed debugging
  – safety properties:
    • “Nothing bad ever happens”
    • e.g. mutual exclusion
  – liveness properties:
    • “Something good eventually happens”
    • e.g. fair scheduling
“I can’t find a solution, I guess I’m just too dumb”
• Picture from Computers and Intractability, by Garey and Johnson
“I can’t find an algorithm, because no such algorithm is possible”
• Picture from Computers and Intractability, by Garey and Johnson
“I can’t find an algorithm, but neither can all these famous people.”
• Picture from Computers and Intractability, by Garey and Johnson
A network partition
[Figure: a crashed router partitions the network]
Garbage Collection
• An object is considered garbage if there are no longer any references to it anywhere in the distributed system
Detecting global properties
[Figure: two processes p1 and p2 in three scenarios]
a. Garbage collection (labels: message, garbage object, object reference)
b. Deadlock (labels: wait-for, wait-for)
c. Termination (labels: activate, passive, passive)
Garbage Collection
• Process P1 references object O1
• Process P2 references object O2
• Message M references object O3
• O4 is a garbage object (no process and no message references it)
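The example above can be checked mechanically once a global state is available. The following is a minimal sketch, assuming the global state is given as two dictionaries (references held by processes and references carried by in-transit messages); the function name `find_garbage` and the data layout are illustrative assumptions, not part of the slides.

```python
def find_garbage(all_objects, process_refs, in_transit_refs):
    """Return the objects referenced by no process and by no in-transit message."""
    referenced = set()
    for refs in process_refs.values():
        referenced |= set(refs)
    for refs in in_transit_refs.values():
        referenced |= set(refs)
    return set(all_objects) - referenced


# Global state from the slide: P1 -> O1, P2 -> O2, message M -> O3, O4 unreferenced.
objects = {"O1", "O2", "O3", "O4"}
process_refs = {"P1": {"O1"}, "P2": {"O2"}}
in_transit_refs = {"M": {"O3"}}

print(sorted(find_garbage(objects, process_refs, in_transit_refs)))  # ['O4']

# Ignoring the channel state would wrongly classify O3 as garbage as well.
print(sorted(find_garbage(objects, process_refs, {})))  # ['O3', 'O4']
```

The second call shows why the state of the channels has to be part of the global state: a reference that exists only inside a message in transit is invisible to every process.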
Obvious First Solution…
• Synchronize clocks of all processes (which algorithm?)
• Ask all processes to record their states at known time t
• Problems?
  – Time synchronization is possible only approximately
  – Does not record the state of messages in the channels
Snapshots (I)
• Chandy & Lamport, 1985
Can you capture (record) the states of all processes and communication channels at exactly 10:04:50 am?
Is it even necessary to take such an exact snapshot?
– Any process may initiate a snapshot at any time
– Assumes strong connectivity
  • At least one path between each process pair
– Assumes unidirectional, FIFO channels
– Assumes reliable delivery of messages
– Records snapshot state locally at all processes
  • No direct method for collecting state at a designated “collector” process
Snapshots (II)
• Application of 2 rules at each process:
  – Marker sending rule:
    • After Pi has recorded its state, it sends a “marker” message over each of its outgoing channels (before sending any other message)
  – Marker receipt rule: (c := the incoming channel on which the marker arrived)
    • If Pi has not yet recorded its state:
      – Pi records its state & records the state of channel c as the empty set
      – Pi turns on recording of messages arriving over its other incoming channels
    • Else:
      – Pi records the state of channel c as the set of messages that it has received over c since it saved its state
Snapshots (III)
• A process that has received a “marker” message records its state in finite time & relays the “marker” over its outgoing channels in finite time.
• Strongly connected network
• The “marker” traverses each channel exactly once.
• When a process has received a “marker” over all its incoming channels, its contribution to the snapshot protocol is complete.
In-transit messages are accounted for as belonging to the state of a channel between processes.
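The two marker rules can be exercised in a small single-threaded simulation. This is a sketch under stated assumptions, not the original authors' code: the classes `Process` and `Network`, the money-transfer workload, and the delivery schedule in `demo()` are all hypothetical. Each process state is an integer balance, so a consistent snapshot must account for the full total of 150.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    def __init__(self, balance):
        self.balance = balance
        self.recorded_state = None  # local state captured by the snapshot
        self.channel_state = {}     # incoming channel -> messages recorded as in transit
        self.recording = set()      # incoming channels still being recorded


class Network:
    def __init__(self, balances):
        self.procs = {name: Process(b) for name, b in balances.items()}
        # One unidirectional FIFO channel per ordered pair (src, dst).
        self.chan = {(s, d): deque() for s in balances for d in balances if s != d}

    def send(self, src, dst, amount):
        self.procs[src].balance -= amount
        self.chan[(src, dst)].append(amount)

    def record_and_send_markers(self, name):
        """Record the local state, then apply the marker sending rule."""
        p = self.procs[name]
        p.recorded_state = p.balance
        p.recording = {c for c in self.chan if c[1] == name}
        for c in self.chan:
            if c[0] == name:
                self.chan[c].append(MARKER)

    def deliver(self, src, dst):
        """Deliver the oldest message on channel (src, dst)."""
        c, p = (src, dst), self.procs[dst]
        msg = self.chan[c].popleft()
        if msg == MARKER:
            # Marker receipt rule.
            if p.recorded_state is None:
                self.record_and_send_markers(dst)
            p.channel_state.setdefault(c, [])  # stays empty on a first marker
            p.recording.discard(c)
        else:
            p.balance += msg
            if c in p.recording:
                p.channel_state.setdefault(c, []).append(msg)


def snapshot_total(net):
    """Sum of recorded process states plus recorded in-transit messages."""
    total = sum(p.recorded_state for p in net.procs.values())
    for p in net.procs.values():
        total += sum(sum(msgs) for msgs in p.channel_state.values())
    return total


def demo():
    net = Network({"P": 100, "Q": 50})
    net.send("P", "Q", 10)
    net.send("Q", "P", 5)
    net.record_and_send_markers("P")  # P initiates the snapshot
    net.send("Q", "P", 7)             # sent before Q has seen the marker
    net.deliver("P", "Q")             # Q receives 10
    net.deliver("P", "Q")             # Q receives the marker and records 48
    net.deliver("Q", "P")             # P records 5 as in transit
    net.deliver("Q", "P")             # P records 7 as in transit
    net.deliver("Q", "P")             # marker closes channel (Q, P)
    return net


net = demo()
print(net.procs["P"].recorded_state, net.procs["Q"].recorded_state)  # 90 48
print(net.procs["P"].channel_state[("Q", "P")])                      # [5, 7]
print(snapshot_total(net))                                           # 150
```

No process ever held exactly these values at the same instant, yet the recorded global state is consistent: the 12 units in transit are attributed to the channel from Q to P.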
Global State (I)
a) Organization of a process and channels for a distributed snapshot
Global State (II)
b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel
Elections
• Choose a unique process to play a particular “role”, such as coordinator
  – We require that the elected process be chosen as the one with the largest ID
    • IDs must be unique & totally ordered
    • E.g.: ID := <1/load, i>
• Requirements:
  – safety:
    • A participant process has elected == P, where P is chosen as the non-crashed process with max. ID, or elected is undefined
  – liveness:
    • All processes participate & eventually set ‘elected’, or crash
Why Election?
• Example 1: Your bank maintains multiple servers, but for each customer, one of the servers is responsible, i.e., is the leader
• Example 2: In the sequencer-based algorithm for total ordering of multicasts,
  – What happens if the “special” sequencer process fails?
• Example 3: Coordinator-based mutual exclusion: need to elect (and keep) one coordinator
Bully algorithm
• The process P sends an election message to all processes with higher numbers (assumes that each process knows which processes have higher IDs)
• If no one responds, P wins
• If one of them answers, it takes over
The Bully algorithm
Uses timeouts to detect failures: T = 2Ttrans + Tprocess
3 msg types:
– coordinator: announcement to all processes with lower IDs
– election: sent to processes with higher IDs
– answer: answer to “election”
  • If not received within T, the sender of “election” sends “coordinator”.
  • Otherwise, the process waits for T’ to receive a “coordinator” msg. If no msg arrives, it begins a new election.
Not “safe” if the crashed processes are replaced with processes with the same ID!
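The message flow of the bully algorithm can be sketched as a synchronous simulation in which a crashed process simply never answers, standing in for the timeout T. The function `bully_election`, its parameters, and the tuple-based message log are hypothetical names chosen for illustration; real implementations are asynchronous and timer-driven.

```python
def bully_election(initiator, all_ids, alive):
    """Simulate one bully election started by `initiator` (assumed to be alive).

    all_ids: every process ID in the system; alive: IDs that have not crashed.
    Returns (coordinator, log), where log lists (msg_type, sender, receiver).
    """
    log = []
    pending = [initiator]  # processes that still have to hold an election
    started = set()
    coordinator = None
    while pending:
        p = pending.pop(0)
        if p in started:
            continue
        started.add(p)
        answered = []
        for q in sorted(all_ids):
            if q > p:
                log.append(("election", p, q))
                if q in alive:  # a crashed process stays silent (timeout T expires)
                    log.append(("answer", q, p))
                    answered.append(q)
        if answered:
            # p steps back; every process that answered holds its own election.
            pending.extend(answered)
        else:
            # No answer within T: p wins and announces itself to all lower IDs.
            coordinator = p
            for q in sorted(all_ids):
                if q < p:
                    log.append(("coordinator", p, q))
    return coordinator, log


# Eight processes 0..7 where 7 (the old coordinator) has crashed; 4 notices.
winner, log = bully_election(4, range(8), set(range(7)))
print(winner)  # 6

# p1..p4 where p4 and then p3 have failed; p1 starts the election.
print(bully_election(1, {1, 2, 3, 4}, {1, 2})[0])  # 2
```

Only the live process with the highest ID receives no answer, so it is the only one that can announce itself as coordinator.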
The Bully Algorithm (1)
• The bully election algorithm
  – Process 4 holds an election
  – Process 5 and 6 respond, telling 4 to stop
  – Now 5 and 6 each hold an election
Example: Bully Election
[Figure: processes P0 to P5 exchanging “Election”, “OK” (answer) and “coordinator” messages]
1. P2 initiates election
2. P2 receives replies
3. P3 & P4 initiate election
4. P3 receives reply
5. P4 receives no reply
6. P4 announces itself
The bully algorithm
[Figure: processes p1 to p4 shown in four stages (Stage 1 to Stage 4, then “Eventually.....”) with election, answer, timeout and coordinator messages; C marks the coordinator]
The election of coordinator p2, after the failure of p4 and then p3
Ring Election
• N processes are organized in a logical ring.
  – pi has a communication channel to p(i+1) mod N.
  – All messages are sent clockwise around the ring.
• Any process that discovers that the coordinator has failed initiates an “election” message that contains its own id:attr.
• When a process receives an election message, it compares the attr in the message with its own.
  – If the arrived attr is greater, the receiver forwards the message.
  – If the arrived attr is smaller and the receiver has not forwarded an election message earlier, it substitutes its own id:attr in the message and forwards it.
  – If the arrived id:attr is that of the receiver, then this process’s attr must be the greatest, and it becomes the new coordinator. This process then sends an “elected” message to its neighbor announcing the election result.
• When a process pi receives an elected message, it
  – sets its variable elected_i := id of the message.
  – forwards the message if it is not the new coordinator.
Ring-based election
[Figure: a ring of processes labelled 24, 15, 9, 4, 3, 28, 17 and 1; the value 24 appears a second time in the figure]
Any process can begin an election, by marking itself as “participant” and then sending an “election” msg to its neighbor.
Upon receipt of an election msg:
  if (arrived ID > receiver’s ID) {
    Receiver is marked as a participant;
    Forward msg;
  } else if (arrived ID < receiver’s ID) {
    if (receiver is not a participant) {
      Receiver is marked as a participant;
      Substitute ID in msg & forward it;
    }
    // otherwise the msg is discarded
  } else {
    Receiver becomes coordinator;
    Send “elected” msg;
  }
# msgs (worst case): (N - 1) + N + N = 3N - 1
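The ring rules and the worst-case message count can be checked with a short simulation. This is a sketch that assumes a single initiator and no failures; the function name `ring_election` and the ring order used in the second example are assumptions made for illustration.

```python
def ring_election(ring, starter):
    """Simulate a ring election with one initiator and no failures.

    ring: unique process IDs in clockwise order; starter: index of the initiator.
    Returns (coordinator_id, number_of_messages_sent).
    """
    n = len(ring)
    participant = [False] * n
    elected = [None] * n
    participant[starter] = True
    kind, value = "election", ring[starter]
    pos = (starter + 1) % n  # the message is now at the starter's neighbor
    messages = 1
    while True:
        own = ring[pos]
        if kind == "election":
            if value > own:
                participant[pos] = True  # forward the message unchanged
            elif value < own:
                # With a single initiator the receiver is never a participant
                # here; under concurrent elections it would discard the message.
                participant[pos] = True
                value = own              # substitute own ID and forward
            else:
                participant[pos] = False
                elected[pos] = own
                kind = "elected"         # own ID came back: become coordinator
        else:
            if value == own:
                break                    # "elected" msg has returned to the coordinator
            participant[pos] = False
            elected[pos] = value
        pos = (pos + 1) % n
        messages += 1
    return value, messages


# Worst case: the initiator's anticlockwise neighbor holds the highest ID.
print(ring_election([1, 2, 3], starter=0))  # (3, 8), i.e. 3N - 1 with N = 3

# Eight IDs in an assumed ring order, with the election started by 17.
print(ring_election([3, 17, 24, 1, 28, 15, 9, 4], starter=1))  # (28, 19)
```

In the second run the election message needs 3 hops to reach 28, 8 more to circulate with ID 28, and the “elected” message takes another 8, which gives 19 messages.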
Election Example: Ring Algorithm
• Election algorithm using a ring.
Example: Ring Election
[Figure: ring of processes P0 to P5 with messages “Election: 2”, “Election: 3” and “Election: 4”]
1. P2 initiates election
2. P2 receives “election”, P4 dies
3. Election: 4 is forwarded for ever?
May not work when a process failure occurs during the election! Consider the above example, where attr == highest id.
Example: Ring Election
[Figure: ring of processes P0 to P5. First round: “Election: 2”, “Election: 2,3”, “Election: 2,3,4”, “Election: 2,3,4,0,1”, then “Coord(4): 2”, “Coord(4): 2,3”, “Coord(4): 2,3,0,1”. Second round: “Election: 2”, “Election: 2,3”, “Election: 2,3,0”, “Election: 2,3,0,1”, then “Coord(3): 2”, “Coord(3): 2,3”, “Coord(3): 2,3,0”, “Coord(3): 2,3,0,1”]
1. P2 initiates election
2. P2 receives “election”, P4 dies
3. P2 selects 4 and announces the result
4. P2 receives “Coord”, but P4 is not included
5. P2 re-initiates election
6. P3 is finally elected
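The six steps above can be replayed with a small simulation in which each message collects the IDs of the processes it visits. This is a sketch under assumptions: the helper names `circulate` and `ring_election_with_lists`, the `crash_between` parameter, and the choice to treat P5 as the already-crashed former coordinator are all illustrative and not taken from the slides.

```python
def circulate(ring, alive, starter):
    """Pass a message once around the ring from `starter`, skipping crashed
    processes; return the IDs of the processes that appended themselves."""
    n = len(ring)
    start = ring.index(starter)
    visited = []
    for k in range(n):
        pid = ring[(start + k) % n]
        if pid in alive:
            visited.append(pid)
    return visited


def ring_election_with_lists(ring, alive, starter, crash_between=None):
    """Run elections until the chosen coordinator also signs the Coord message.

    crash_between: optional ID that crashes after the first election pass.
    Returns (coordinator, number_of_rounds).
    """
    alive = set(alive)
    rounds = 0
    while True:
        rounds += 1
        candidates = circulate(ring, alive, starter)  # "Election: ..." pass
        chosen = max(candidates)
        if crash_between is not None:
            alive.discard(crash_between)              # failure during the election
            crash_between = None
        confirmed = circulate(ring, alive, starter)   # "Coord(chosen): ..." pass
        if chosen in confirmed:
            return chosen, rounds
        # The chosen coordinator is missing from the Coord list: re-initiate.


ring = [0, 1, 2, 3, 4, 5]
print(circulate(ring, {0, 1, 2, 3, 4}, starter=2))  # [2, 3, 4, 0, 1]
print(ring_election_with_lists(ring, {0, 1, 2, 3, 4}, starter=2, crash_between=4))  # (3, 2)
```

Because the initiator compares the chosen ID against the list carried by the Coord message, a coordinator that died mid-election is detected and a second round elects P3.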
Mutual Exclusion
• Bank Database: Think of two simultaneous deposits of $10,000 into your bank account, each from one ATM.
– Both ATMs read initial amount of $1000 concurrently from the bank server
– Both ATMs add $10,000 to this amount (locally at the ATM)
– Both write the final amount to the server
– What’s wrong?
• The ATMs need mutually exclusive access to your account entry at the server
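The lost update in the ATM scenario can be reproduced deterministically. The sketch below fixes the interleaving by hand instead of using real concurrency; the function names and the dictionary standing in for the bank server are assumptions made for illustration.

```python
def deposit_unsynchronized(server, amounts):
    """Every ATM reads the balance before any ATM writes its result back."""
    local_results = [server["balance"] + amount for amount in amounts]  # concurrent reads
    for result in local_results:
        server["balance"] = result  # the later write overwrites the earlier one
    return server["balance"]


def deposit_exclusive(server, amounts):
    """Each ATM performs its read-modify-write with exclusive access."""
    for amount in amounts:
        server["balance"] = server["balance"] + amount
    return server["balance"]


print(deposit_unsynchronized({"balance": 1000}, [10000, 10000]))  # 11000
print(deposit_exclusive({"balance": 1000}, [10000, 10000]))       # 21000
```

Without mutual exclusion one of the two $10,000 deposits is lost: the account ends at $11,000 instead of $21,000.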
When a process has to write or update shared data, it enters a critical region to achieve mutual exclusion (no other process uses the shared data at the same time).
Critical section problem: mutual exclusion is required to prevent interference and ensure consistency when accessing the resources.
Mutual Exclusion
– Semaphores, mutexes, etc. in local operating systems
– Message-passing-based protocols in distributed systems:
  • enter() the critical section
  • AccessResource() in the critical section
  • exit() the critical section
– Distributed mutual exclusion requirements:
  • Safety – At most one process may execute in the CS at any time
  • Liveness – Every request for the CS is eventually granted
  • Ordering (desirable) – Requests are granted in FIFO order
Mutual Exclusion
Assumptions
• For all the algorithms studied, we make the following assumptions:
  – Each pair of processes is connected by reliable channels (such as TCP). Messages are eventually delivered to the recipient’s input buffer.
  – Processes will not fail.
Centralized Algorithm
• A central coordinator
  – Grants permission to enter the critical section & keeps a queue of requests to enter from the other processes.
  – Ensures only one process at a time can access the data
• Operations (coordinator == server)
  – To enter the CS: send a request to the server & wait for the token.
  – On exiting the CS: send a message to the server to release the token.
• Features:
  – Safety, liveness and order are guaranteed
  – Synchronization delay: one round trip time (release + grant)
  – The coordinator becomes a performance bottleneck and a single point of failure.
A Centralized Algorithm
a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
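The coordinator's behaviour in steps a) to c) amounts to a holder variable plus a FIFO queue. The following is a minimal in-process sketch; the class `Coordinator` and its method names are hypothetical, and message passing is replaced by direct method calls.

```python
from collections import deque


class Coordinator:
    """Grants the critical region to one process at a time, in FIFO order."""

    def __init__(self):
        self.holder = None    # process currently in the critical region
        self.queue = deque()  # processes waiting for permission

    def request(self, pid):
        """Return True if permission is granted now, False if the request is queued."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)  # no reply yet: the requester stays blocked
        return False

    def release(self, pid):
        """Release the region; return the next process granted, or None."""
        if self.holder != pid:
            raise ValueError(f"process {pid} does not hold the critical region")
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


c = Coordinator()
print(c.request(1))  # True:  a) process 1 is granted permission
print(c.request(2))  # False: b) process 2 is queued, no reply is sent
print(c.release(1))  # 2:     c) the coordinator now replies to process 2
```

Safety follows from the single `holder` field and ordering from the FIFO queue, but every request depends on this one object, which is the bottleneck and single point of failure noted above.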
A Distributed Algorithm
a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
Distributed Algorithm
• The requesting process sends a message containing the name of the critical region, its process number, and the current time to all other processes (multicast)
• If the receiver is not in the critical region and does not want to enter it, it sends OK
• If the receiver is in the critical region, it does not reply and queues the request
• If the receiver also wants to enter the critical region, it compares the timestamps; the lowest one wins
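The receiver's three cases can be written as a single decision function. This is a sketch with assumed names: the states RELEASED, WANTED and HELD, the function `handle_request`, and the timestamps 8 and 12 are illustrative. Requests are (timestamp, process number) pairs, so ties on the timestamp are broken by process number.

```python
def handle_request(state, own_request, incoming_request):
    """Decide how a receiver answers an incoming request.

    state: "RELEASED" (not interested), "WANTED" (waiting to enter) or
           "HELD" (inside the critical region).
    own_request, incoming_request: (timestamp, pid) tuples; own_request is
           only used in the WANTED state.
    Returns "OK" to reply immediately or "DEFER" to queue the request.
    """
    if state == "HELD":
        return "DEFER"
    if state == "WANTED" and own_request < incoming_request:
        return "DEFER"  # the receiver's own request is older, so it wins
    return "OK"


# Processes 0 and 2 both want the region; process 1 is not interested.
req0, req2 = (8, 0), (12, 2)
print(handle_request("WANTED", req0, req2))    # DEFER: 0 makes 2 wait
print(handle_request("WANTED", req2, req0))    # OK:    2 lets 0 go first
print(handle_request("RELEASED", None, req0))  # OK:    1 does not care
```

Process 0 collects an OK from every other process and enters; when it leaves, it answers the deferred request, and process 2 can enter.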
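A ring-based algorithm
[Figure: processes p1, p2, p3, p4, ..., pn arranged in a ring, with a token passed around it]
• Process i has a channel to process (i + 1) mod N
• Wait for the token to pass, retain it to enter(), release it to the neighbor when done
• Does not preserve the happened-before ordering of requests
• Continuously consumes bandwidth
• Sync. delay: 1 up to N messages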
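The token-passing rule can be illustrated with a simulation that hands the token clockwise until every interested process has entered once. The function `token_ring` and its parameters are hypothetical names used for this sketch only.

```python
def token_ring(n, wants, start=0):
    """Circulate a token among processes 0..n-1, starting at `start`.

    wants: IDs of the processes that want to enter the critical section once.
    Returns (entry_order, hops), where hops counts token transmissions.
    """
    wants = set(wants)
    if not wants <= set(range(n)):
        raise ValueError("every requesting process must be on the ring")
    order = []
    holder = start
    hops = 0
    while wants:
        if holder in wants:
            order.append(holder)  # enter(), use the resource, exit()
            wants.discard(holder)
        if wants:
            holder = (holder + 1) % n  # release the token to the neighbor
            hops += 1
    return order, hops


# Five processes, token initially at 2; processes 1 and 3 want to enter.
print(token_ring(5, {1, 3}, start=2))  # ([3, 1], 4)
```

Process 3 enters first because the token reaches it first, whatever order the two requests were made in, which is why request ordering is not preserved.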
Mutual Exclusion Example: Token Ring Algorithm
a) An unordered group of processes on a network.
b) A logical ring constructed in software.
Timestamp Approach Features:
Safety, liveness, and ordering (causal) are guaranteed.
It takes 2(N-1) messages per entry operation (N-1 multicast requests + N-1 replies); Client delay: one round-trip time
Synchronization delay: one message transmission time.
Comparison
• A comparison of three mutual exclusion algorithms.
Algorithm   | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized | 3                       | 2                                     | Coordinator crash
Distributed | 2(n – 1)                | 2(n – 1)                              | Crash of any process
Token ring  | 1 to ∞                  | 0 to n – 1                            | Lost token, process crash