
Post on 17-Jan-2018





Election

Distributed Systems

Algorithms to Find Global States

• Why? To check whether a particular property holds in a distributed system
– (Distributed) garbage collection
– (Distributed) deadlock detection, termination detection

• What?
– Capture the instantaneous states of a collection of processes
– And the messages in transit on the different communication channels

Detecting global properties
• Evaluation of predicates on the system's state
– Stable predicates:
• distributed garbage collection
• deadlock detection
• termination detection
– Non-stable (transient) predicates:
• distributed debugging
– Safety properties:
• “Nothing bad ever happens”
• e.g., mutual exclusion
– Liveness properties:
• “Something good eventually happens”
• e.g., fair scheduling

“I can’t find a solution, I guess I’m just too dumb”

• Picture from Computers and Intractability, by Garey and Johnson


“I can’t find an algorithm, because no such algorithm is possible”

• Picture from Computers and Intractability, by Garey and Johnson


“I can’t find an algorithm, but neither can all these famous people.”

• Picture from Computers and Intractability, by Garey and Johnson


A network partition [figure: a crashed router]

Garbage Collection

• An object is considered garbage if there are no longer any references to it anywhere in the distributed system

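Detecting global properties [figure: (a) Garbage collection: processes p1 and p2, an object reference, a message and a garbage object; (b) Deadlock: p1 and p2 each wait for the other; (c) Termination: p1 and p2 are both passive, with an “activate” message between them]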

Garbage Collection

• Process P1 references object O1

• Process P2 references object O2

• Message M references object O3

• O4 is a garbage object: neither a process nor a message in transit references it
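The following minimal Python sketch shows why channel state matters for garbage detection. It is an illustration, not part of the original slides. It treats a recorded global state as a set of root references, namely those held by processes plus those carried by messages in transit, and reports every object that no root can reach. The function name `find_garbage` and the optional object-to-object reference map are hypothetical.

```python
from collections import deque


def find_garbage(objects, process_refs, message_refs, object_refs=None):
    """Return the objects that no process and no in-transit message can reach.

    objects      -- every object in the (hypothetical) recorded global state
    process_refs -- objects referenced directly by some process
    message_refs -- objects referenced by messages still in a channel
    object_refs  -- optional mapping: object -> objects it references
    """
    object_refs = object_refs or {}
    roots = set(process_refs) | set(message_refs)
    reachable = set()
    frontier = deque(roots)
    while frontier:                      # breadth-first walk from the roots
        obj = frontier.popleft()
        if obj in reachable:
            continue
        reachable.add(obj)
        frontier.extend(object_refs.get(obj, ()))
    return set(objects) - reachable


# The slide's example: P1 -> O1, P2 -> O2, message M -> O3.
print(find_garbage({"O1", "O2", "O3", "O4"}, {"O1", "O2"}, {"O3"}))  # {'O4'}
# Ignoring the channel state would wrongly report O3 as garbage too.
print(sorted(find_garbage({"O1", "O2", "O3", "O4"}, {"O1", "O2"}, set())))  # ['O3', 'O4']
```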

Obvious First Solution…

• Synchronize clocks of all processes (which algorithm?)

• Ask all processes to record their states at a known time t

• Problems?
– Time synchronization is possible only approximately
– Does not record the state of messages in the channels

Snapshots (I)
• Chandy & Lamport, 1985

Can you capture (record) the states of all processes and communication channels at exactly 10:04:50 am? Is it even necessary to take such an exact snapshot?

– Any process may initiate a snapshot at any time
– Assumes strong connectivity
• At least one path between each ordered pair of processes
– Assumes unidirectional, FIFO channels
– Assumes reliable delivery of messages
– Records snapshot state locally at all processes
• No direct method for collecting the state at a designated “collector” process

Snapshots (II)
• Application of 2 rules at each process:
– Marker sending rule:
• After Pi has recorded its state, it sends a “marker” message over each of its outgoing channels (before sending any other message)
– Marker receipt rule (c := the incoming channel on which the marker arrived):
• If Pi has not yet recorded its state:
– Pi records its state and records the state of channel c as empty
– Pi turns on recording of messages arriving over its other incoming channels
• Else:
– Pi records the state of channel c as the set of messages it has received over c since it saved its state
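The two rules can be exercised with a small, single-threaded Python simulation. This is a sketch under stated assumptions, not a distributed implementation:

- There are three processes, and each process's local state is a bank balance.
- There is one FIFO `deque` per ordered pair of processes. This forms a complete graph, which is trivially strongly connected.
- Messages are delivered in a fixed round-robin order.
- The class and function names are hypothetical.

Transfers only move money, so a consistent snapshot must add up to the initial 300.

```python
from collections import deque

MARKER = "MARKER"


class Process:
    """One process in a Chandy-Lamport snapshot; its state is a bank balance."""

    def __init__(self, pid, balance, peers):
        self.pid = pid
        self.balance = balance        # live application state
        self.peers = peers            # complete graph: every other process
        self.recorded_state = None    # local state saved for the snapshot
        self.channel_state = {}       # incoming channel (by sender) -> recorded messages
        self.recording = set()        # incoming channels still being recorded

    def start_snapshot(self, channels, marker_from=None):
        # Record local state, then (marker sending rule) put a marker on every
        # outgoing channel before any further application message.
        self.recorded_state = self.balance
        self.channel_state = {q: [] for q in self.peers}
        # The channel the first marker came in on stays empty; the other
        # incoming channels are recorded from now on.
        self.recording = set(self.peers) - {marker_from}
        for q in self.peers:
            channels[(self.pid, q)].append(MARKER)

    def on_message(self, src, msg, channels):
        if msg == MARKER:
            if self.recorded_state is None:
                self.start_snapshot(channels, marker_from=src)
            else:
                self.recording.discard(src)     # channel src -> self is complete
        else:
            self.balance += msg                 # application message: a transfer
            if src in self.recording:
                self.channel_state[src].append(msg)


def run_demo():
    ids = [0, 1, 2]
    procs = {i: Process(i, 100, [j for j in ids if j != i]) for i in ids}
    channels = {(i, j): deque() for i in ids for j in ids if i != j}  # FIFO

    def transfer(src, dst, amount):
        procs[src].balance -= amount
        channels[(src, dst)].append(amount)

    transfer(0, 1, 30)
    transfer(2, 0, 10)
    procs[0].start_snapshot(channels)           # P0 initiates the snapshot
    transfer(1, 2, 5)

    # Deliver one message per non-empty channel per round, in a fixed order.
    while any(channels.values()):
        for (src, dst), queue in channels.items():
            if queue:
                procs[dst].on_message(src, queue.popleft(), channels)

    states = {i: p.recorded_state for i, p in procs.items()}
    in_transit = sum(sum(m) for p in procs.values() for m in p.channel_state.values())
    return states, in_transit


states, in_transit = run_demo()
print(states, in_transit)                  # {0: 70, 1: 125, 2: 90} 15
print(sum(states.values()) + in_transit)   # 300
```

P0 records 70 before the 10 from P2 arrives, so that 10 appears in the recorded state of channel P2 → P0. Likewise, the 5 that P1 sent before recording appears in channel P1 → P2. Without the channel states, the snapshot would be short by 15.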

Snapshots (III)

• A process that has received a “marker” message, records its state in finite time & relays the “marker” over its outgoing channels in finite time.

• Strongly connected network:
– The “marker” traverses each channel exactly once.
– When a process has received a “marker” over all its incoming channels, its contribution to the snapshot protocol is complete.

In-transit messages are accounted for as belonging to the state of a channel between processes.

Global State (I)

a) Organization of a process and channels for a distributed snapshot

Global State (II)

b) Process Q receives a marker for the first time and records its local state
c) Q records all incoming messages
d) Q receives a marker for its incoming channel and finishes recording the state of the incoming channel

Elections
• Choose a unique process to play a “role”, such as coordinator
– We require that the elected process be the one with the largest ID
• IDs must be unique and totally ordered
– E.g., ID := <1/load, i>
• Requirements:
– Safety:
• A participant process Pi has electedi == P, where P is the non-crashed process with the largest ID at the end of the run, or electedi is undefined
– Liveness:
• All processes participate and eventually set electedi, or crash
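One way to read the example ID <1/load, i> is as a pair compared lexicographically. Python tuples already compare that way, so this short sketch uses them. The helper `election_id` and the load values are made up for illustration. The point is that the process with the lightest load ranks highest, and the unique process number i breaks ties, so the IDs stay totally ordered.

```python
def election_id(load, pid):
    """Hypothetical ID of the form <1/load, i> (load must be > 0).

    Tuples compare element by element, so a lighter load ranks higher and
    the unique process number pid breaks any tie on load.
    """
    return (1.0 / load, pid)


loads = {1: 0.5, 2: 0.25, 3: 0.25, 4: 2.0}     # made-up loads
ids = {pid: election_id(load, pid) for pid, load in loads.items()}
winner = max(ids, key=ids.get)
print(winner)   # 3: processes 2 and 3 tie on load, and 3 > 2
```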

Why Election?
• Example 1: Your bank maintains multiple servers, but for each customer one of the servers is responsible, i.e., is the leader
• Example 2: In the sequencer-based algorithm for total ordering of multicasts
– What happens if the “special” sequencer process fails?
• Example 3: Coordinator-based mutual exclusion: need to elect (and keep) one coordinator

Bully algorithm

• Process P sends an election message to all processes with higher IDs (this assumes that each process knows which processes have higher IDs)

• If no one responds, P wins and becomes the coordinator

• If one of them answers, it takes over the election and P’s job is done

The Bully algorithm
• Uses timeouts to detect failures: T = 2·Ttrans + Tprocess
• 3 message types:
– coordinator: announcement to all processes with lower IDs
– election: sent to processes with higher IDs
– answer: answer to “election”
• If no answer is received within T, the sender of “election” sends “coordinator”.
• Otherwise, the process waits for T’ to receive a “coordinator” message. If none arrives, it begins a new election.
• Not “safe” if crashed processes are replaced with processes with the same ID!
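The rules above can be traced with a sequential Python sketch. It rests on simplifying assumptions. The set of crashed processes is fixed for the whole run, and a crashed process simply never answers, which stands in for the timeout T expiring. Concurrent sub-elections are run one after another. The function `bully` and its `log` format are hypothetical.

```python
def bully(initiator, alive, all_ids, log=None):
    """Sequential sketch of the bully algorithm started by `initiator`.

    alive   -- ids of the processes that have not crashed
    all_ids -- every process id; each process knows which ids are higher
    log     -- optional list collecting (kind, sender, receiver) messages
    A crashed process never answers, which stands in for the timeout
    T = 2*Ttrans + Tprocess expiring. Returns the elected coordinator.
    """
    if log is None:
        log = []
    answers = []
    for p in all_ids:
        if p > initiator:
            log.append(("election", initiator, p))
            if p in alive:
                log.append(("answer", p, initiator))
                answers.append(p)
    if not answers:
        # No answer within T: announce ourselves to the lower-id processes
        # (only the live receivers are logged here).
        for p in all_ids:
            if p < initiator and p in alive:
                log.append(("coordinator", initiator, p))
        return initiator
    # Every process that answered takes over and holds its own election.
    # Run one after another here, they all reach the same highest live process.
    winners = {bully(p, alive, all_ids, log) for p in answers}
    assert len(winners) == 1
    return winners.pop()


alive = {0, 1, 2, 3, 4, 5, 6}       # process 7, the old coordinator, crashed
print(bully(4, alive, range(8)))    # 6: processes 5 and 6 answer 4, then 6 wins
```

Calling `bully(1, {1, 2}, [1, 2, 3, 4])` replays the p1 to p4 example, in which p4 and then p3 have failed. It returns 2, and the last logged message is p2's coordinator announcement to p1.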

The Bully Algorithm (1)

• The bully election algorithm
– Process 4 holds an election
– Processes 5 and 6 respond, telling 4 to stop
– Now 5 and 6 each hold an election

Example: Bully Election
[figure: processes P0–P5; message labels “Election”, “OK” and “Coordinator”; answer = OK]
1. P2 initiates an election
2. P2 receives replies
3. P3 and P4 initiate elections
4. P3 receives a reply
5. P4 receives no reply
6. P4 announces itself coordinator

The bully algorithm
[figure: stages 1–4 and “Eventually…” for processes p1–p4, showing election, answer and coordinator (C) messages and a timeout]
The election of coordinator p2, after the failure of p4 and then p3

• N processes are organized in a logical ring
– pi has a communication channel to p(i+1) mod N
– All messages are sent clockwise around the ring

• Any process that discovers that the coordinator has failed initiates an “election” message that contains its own id:attr.

• When a process receives an election message, it compares the attr in the message with its own:
– If the arrived attr is greater, the receiver forwards the message.
– If the arrived attr is smaller and the receiver has not forwarded an election message earlier, it substitutes its own id:attr in the message and forwards it.
– If the arrived attr is smaller and the receiver has already forwarded an election message, it discards the message.
– If the arrived id:attr is that of the receiver, then this process’s attr must be the greatest, and it becomes the new coordinator. This process then sends an “elected” message to its neighbor announcing the election result.

• When a process pi receives an elected message, it
– sets its variable electedi to the id in the message
– forwards the message if it is not the new coordinator

Ring Election

Ring-based election [figure: a logical ring of processes with identifiers 1, 3, 4, 9, 15, 17, 24 and 28]

Any process can begin an election by marking itself as “participant” and then sending an “election” message to its neighbor.

Upon receipt of an election msg:
  if (arrived ID > receiver’s ID) {
    Receiver is marked as a participant;
    Forward msg unchanged;
  } else if (arrived ID < receiver’s ID) {
    if (receiver is not a participant) {
      Receiver is marked as a participant;
      Substitute receiver’s ID in msg & forward it;
    }
    // else: receiver is already a participant, so the msg is discarded
  } else {
    // arrived ID == receiver’s ID: its own ID has gone all the way around
    Receiver becomes coordinator;
    Send “elected” msg;
  }

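# msgs (worst case): (N − 1) + N + N = 3N − 1. That is N − 1 election messages to reach the process with the largest ID when it is the starter’s counterclockwise neighbor, N more for that ID to travel around the ring, and N “elected” messages.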
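The receipt rule and the message count can be checked with a short Python simulation of this ring algorithm, which is commonly attributed to Chang and Roberts. The sketch makes three assumptions. All messages in transit are processed from one FIFO queue. The function name `ring_election` is hypothetical. The ring uses the identifiers from the figure in an arbitrary order, because the figure's exact layout was lost.

```python
from collections import deque


def ring_election(ring, starters):
    """Simulate the ring election on unique ids listed in clockwise order.

    ring[k] sends to ring[(k + 1) % N]; `starters` are the positions that
    begin an election. Returns (each process's elected value, messages sent).
    """
    n = len(ring)
    participant = [False] * n
    elected = [None] * n
    in_transit = deque()                 # (receiver position, kind, id carried)
    sent = 0
    for s in starters:
        participant[s] = True
        in_transit.append(((s + 1) % n, "election", ring[s]))
        sent += 1
    while in_transit:
        pos, kind, value = in_transit.popleft()
        me, nxt = ring[pos], (pos + 1) % n
        if kind == "election":
            if value > me:               # forward a larger id unchanged
                participant[pos] = True
                in_transit.append((nxt, "election", value))
                sent += 1
            elif value < me and not participant[pos]:
                participant[pos] = True  # substitute own id and forward
                in_transit.append((nxt, "election", me))
                sent += 1
            elif value == me:            # own id came back: coordinator
                participant[pos] = False
                elected[pos] = me
                in_transit.append((nxt, "elected", me))
                sent += 1
            # else: a smaller id reaching a participant is discarded
        else:                            # "elected" message
            participant[pos] = False
            elected[pos] = value
            if value != me:              # stop once it is back at the winner
                in_transit.append((nxt, "elected", value))
                sent += 1
    return elected, sent


ring = [3, 17, 4, 24, 9, 1, 15, 28]
print(ring_election(ring, [0]))  # every process elects 28; 23 messages = 3N - 1
```

Here the starter (position 0) is the clockwise successor of the process with ID 28, which is the worst case. Starting at 28 itself needs only 2N = 16 messages. With two concurrent starters (`[0, 4]`), all processes still agree on 28 because the smaller IDs are discarded along the way.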

Election Example: Ring Algorithm
• Election algorithm using a ring.

Example: Ring Election
[figure: processes P0–P5 in a ring; election messages labelled “Election: 2”, “Election: 3” and “Election: 4”]
1. P2 initiates an election
2. P2 receives “election”; P4 dies
3. Is “Election: 4” forwarded forever?

The algorithm may not work when a process fails during the election! Consider the example above, where attr == highest ID.

Example: Ring Election (the message accumulates the IDs of the live processes it visits)
[figure: processes P0–P5 in a ring. First round: Election: 2 → 2,3 → 2,3,4 → 2,3,4,0,1, then Coord(4): 2 → 2,3 → 2,3,0,1. Second round: Election: 2 → 2,3 → 2,3,0 → 2,3,0,1, then Coord(3): 2 → 2,3 → 2,3,0 → 2,3,0,1]
1. P2 initiates an election
2. P2 receives “election”; P4 dies
3. P2 selects 4 and announces the result
4. P2 receives “Coord”, but P4 is not included
5. P2 re-initiates the election
6. P3 is finally elected
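The scenario above can be replayed with a small Python sketch of this variant. In it, each live process appends its ID to the circulating message, and crashed successors are skipped. Several details are assumptions made for illustration:

- The helpers `circulate` and `elect` are hypothetical.
- P4's crash is modelled as two different sets of live processes, one for the Election round and one for the Coord round.
- P5 is treated as crashed, which is inferred from the fact that P5 never appears in the lists.

```python
def circulate(ring, alive, initiator):
    """Walk once around the ring from `initiator`, skipping crashed processes,
    and return the ids the live processes append, in visiting order."""
    n = len(ring)
    start = ring.index(initiator)
    order = [ring[(start + k) % n] for k in range(n)]
    return [pid for pid in order if pid in alive]


def elect(ring, initiator, alive_during_election, alive_during_announcement):
    """Ring election whose message accumulates ids (hypothetical helper).

    Returns (coordinator, ids on the Election message, ids on the Coord message).
    """
    election_ids = circulate(ring, alive_during_election, initiator)
    coordinator = max(election_ids)          # highest id on the list wins
    coord_ids = circulate(ring, alive_during_announcement, initiator)
    if coordinator not in coord_ids:
        # The Coord message came back without the winner: re-initiate.
        return elect(ring, initiator, alive_during_announcement,
                     alive_during_announcement)
    return coordinator, election_ids, coord_ids


ring = [0, 1, 2, 3, 4, 5]
print(circulate(ring, {0, 1, 2, 3, 4}, 2))             # [2, 3, 4, 0, 1]
print(elect(ring, 2, {0, 1, 2, 3, 4}, {0, 1, 2, 3}))   # (3, [2, 3, 0, 1], [2, 3, 0, 1])
```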

Mutual Exclusion

• Bank Database: Think of two simultaneous deposits of $10,000 into your bank account, each from one ATM.

– Both ATMs read initial amount of $1000 concurrently from the bank server

– Both ATMs add $10,000 to this amount (locally at the ATM)

– Both write the final amount to the server
– What’s wrong?

• The ATMs need mutually exclusive access to your account entry at the server
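The lost update can be made concrete with a deterministic Python sketch. It is an illustration only, and the function names are hypothetical. The first function replays the interleaving from the slide, in which both ATMs read before either writes. The second gives each ATM exclusive access for its read-add-write sequence.

```python
def deposit_interleaved(balance, amount_a, amount_b):
    """Both ATMs read before either writes (no mutual exclusion)."""
    read_a = balance                 # ATM A reads $1000
    read_b = balance                 # ATM B reads $1000 concurrently
    balance = read_a + amount_a      # A writes its locally computed total
    balance = read_b + amount_b      # B overwrites it with its own total
    return balance


def deposit_serialized(balance, amount_a, amount_b):
    """Each ATM reads, adds and writes inside its own critical section."""
    balance = balance + amount_a
    balance = balance + amount_b
    return balance


print(deposit_interleaved(1000, 10_000, 10_000))  # 11000: one deposit is lost
print(deposit_serialized(1000, 10_000, 10_000))   # 21000
```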

When a process has to write or update shared data, it enters a critical region to achieve mutual exclusion (no other process uses the shared data at the same time).

Critical section problem: mutual exclusion is required to prevent interference and ensure consistency when accessing the resources.

Mutual Exclusion

– Semaphores, mutexes, etc. in local operating systems
– Message-passing-based protocols in distributed systems:
• enter() the critical section
• AccessResource() in the critical section
• exit() the critical section
– Distributed mutual exclusion requirements:
• Safety: at most one process may execute in the CS at any time
• Liveness: every request for the CS is eventually granted
• Ordering (desirable): requests are granted in FIFO order

Mutual Exclusion

Assumptions

• For all the algorithms studied, we make the following assumptions:
– Each pair of processes is connected by reliable channels (such as TCP). Messages are eventually delivered to the recipients’ input buffers.
– Processes will not fail.

• A central coordinator
– Grants permission to enter the critical section (CS) and keeps a queue of requests from the other processes
– Ensures that only one process at a time can access the data
• Operations (coordinator == server)
– To enter the CS: send a request to the server and wait for the token
– On exiting the CS: send a message to the server to release the token
• Features:
– Safety, liveness and ordering are guaranteed
– Synchronization delay: one round-trip time (release + grant)
– The coordinator becomes a performance bottleneck and a single point of failure
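The coordinator's bookkeeping fits in a few lines of Python. In this minimal sketch, a `Coordinator` object handles request and release messages, holds a single token and queues waiting processes in arrival order. The class and its method names are hypothetical, and networking is left out.

```python
from collections import deque


class Coordinator:
    """Hypothetical central lock server: one token, FIFO queue of requests."""

    def __init__(self):
        self.holder = None        # process currently in the critical section
        self.queue = deque()      # processes waiting for the token, in arrival order

    def request(self, pid):
        """Handle a request message; return True if the token is granted now."""
        if self.holder is None:
            self.holder = pid
            return True
        self.queue.append(pid)    # no reply yet: the requester keeps waiting
        return False

    def release(self, pid):
        """Handle a release message; return the process granted next, if any."""
        assert pid == self.holder, "only the token holder may release"
        self.holder = self.queue.popleft() if self.queue else None
        return self.holder


c = Coordinator()
print(c.request(1))   # True: permission granted to process 1
print(c.request(2))   # False: the coordinator does not reply yet
print(c.release(1))   # 2: the coordinator now replies to process 2
```

Each entry and exit costs three messages here (request, grant, release), which matches the count for the centralized algorithm in the comparison table.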

Centralized Algorithm

A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.

A Distributed Algorithm

a) Two processes want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it sends an OK too, so 2 can now enter the critical region.

Distributed Algorithm

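• A process that wants to enter a critical region sends a message containing the name of the critical region, its process number and the current time to all other processes (multicast)

• If the receiver is not in the critical region and does not want to enter it, it sends back OK

• If the receiver is already in the critical region, it does not reply; it queues the request instead

• If the receiver also wants to enter the critical region, it compares the timestamp of the incoming message with that of its own request, and the lowest one wins: if the incoming timestamp is lower, it sends OK; otherwise it queues the request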
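Each receiver's decision can be sketched in Python as follows. This is a minimal sketch, not a full protocol. Message passing is left out, and `Node`, `on_request` and `exit_cs` are hypothetical names. Ties on equal timestamps are broken by process number, a common convention that the slide does not state. The timestamps 8 and 12 are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One process in a sketch of the timestamp-based (distributed) algorithm."""
    pid: int
    state: str = "RELEASED"                        # RELEASED, WANTED or HELD
    my_request: Optional[Tuple[int, int]] = None   # (timestamp, pid) of own request
    deferred: List[int] = field(default_factory=list)

    def on_request(self, timestamp: int, sender: int) -> bool:
        """Decide how to answer a request; True means reply OK right away."""
        incoming = (timestamp, sender)             # pid breaks timestamp ties
        if self.state == "HELD" or (
            self.state == "WANTED" and self.my_request < incoming
        ):
            self.deferred.append(sender)           # no reply now; answer on exit
            return False
        return True

    def exit_cs(self) -> List[int]:
        """Leave the critical section; return the requesters that now get OK."""
        self.state = "RELEASED"
        self.my_request = None
        replies, self.deferred = self.deferred, []
        return replies


p0 = Node(0, state="WANTED", my_request=(8, 0))
p1 = Node(1)                                       # not interested in the CS
p2 = Node(2, state="WANTED", my_request=(12, 2))
print(p1.on_request(8, 0), p1.on_request(12, 2))   # True True
print(p2.on_request(8, 0))    # True: P0's request is older, so P2 says OK
print(p0.on_request(12, 2))   # False: P0 wins and queues P2's request
p0.state = "HELD"             # P0 has collected N - 1 = 2 OKs and enters
print(p0.exit_cs())           # [2]: on exit, P0 sends the deferred OK to P2
```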

A ring-based algorithm

[figure: processes p1, p2, p3, p4, …, pn arranged in a ring, with a circulating token]

pi has a channel to p(i+1) mod N

• Wait for the token to arrive, retain it to enter(), and release it to the neighbor when done

• Does not preserve the happened-before (→) relation

• Continuously consumes bandwidth

• Synchronization delay: 1 up to N messages
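Token passing can be simulated in a few lines of Python. This sketch rests on two assumptions. The `wants` set is fixed before the token starts moving. The function name `token_ring` is hypothetical. The simulation shows two of the points above. First, processes enter in ring order, not request order. Second, the token keeps circulating, and costing messages, even when nobody wants the critical section.

```python
def token_ring(n, wants, rounds=1, start=0):
    """Pass the token around a ring of n processes for a number of full rounds.

    wants -- indices of the processes that want the critical section
    Returns (order in which processes entered the CS, token messages sent).
    """
    entries, messages = [], 0
    holder = start
    pending = set(wants)
    for _ in range(rounds * n):
        if holder in pending:
            entries.append(holder)      # retain the token: enter() ... exit()
            pending.discard(holder)
        holder = (holder + 1) % n       # release the token to the neighbor
        messages += 1
    return entries, messages


print(token_ring(5, {3, 1}))   # ([1, 3], 5): ring order, whoever asked first
print(token_ring(5, set()))    # ([], 5): the token circulates even when idle
```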

Mutual Exclusion Example: Token Ring Algorithm

a) An unordered group of processes on a network. b) A logical ring constructed in software.

Timestamp Approach Features:

• Safety, liveness, and ordering (causal) are guaranteed.

• It takes 2(N − 1) messages per entry operation (N − 1 multicast requests + N − 1 replies).

• Client delay: one round-trip time.

• Synchronization delay: one message transmission time.

Comparison

• A comparison of three mutual exclusion algorithms.

Algorithm   | Messages per entry/exit | Delay before entry (in message times) | Problems
Centralized | 3                       | 2                                     | Coordinator crash
Distributed | 2(n – 1)                | 2(n – 1)                              | Crash of any process
Token ring  | 1 to ∞                  | 0 to n – 1                            | Lost token, process crash