Distributed Commit Dr. Yingwu Zhu. Failures in a distributed system Consistency requires agreement...
-
Upload
gilbert-oliver -
Category
Documents
-
view
222 -
download
0
Transcript of Distributed Commit Dr. Yingwu Zhu. Failures in a distributed system Consistency requires agreement...
Distributed Commit
Dr. Yingwu Zhu
Failures in a distributed system
• Consistency requires agreement among multiple servers– Is transaction X committed?– Have all servers applied update X to a replica?
• Achieving agreement w/ failures is hard– Impossible to distinguish host vs. network failures
• This class: – all-or-nothingall-or-nothing atomicity in distributed systems
Distributed Commit Problem
• Some applications perform operations on multiple databases– Transfer funds between two bank accounts– Debiting one account and crediting another
• We would like a guarantee that either all the databases get updated, or none does
• Distributed Commit Problem (all-or-noneall-or-none semantics):– Operation is committed when all participants can perform
it– Once a commit decision is reached, this requirement holds
even if some participants fail and later recover
Transaction
Transaction behave as one operation• Atomicity: all-or-none, if transaction failed
then no changes apply to the database• Consistency: there is no violation of the
database integrity constraints• Isolation: partial results are hidden (due to
incomplete transactions)• Durability: the effects of transactions that
were committed are permanent
Example
Bank A Bank B
Transfer $1000From A:$3000To B:$2000
• Clients want all-or-nothing transactions– Transfer either happens or not at all
client
Strawman solution
Bank A Bank B
Transfer $1000From A:$3000To B:$2000
client
Transactioncoordinator
Strawman solution
• What can go wrong?– A does not have enough money– B’s account no longer exists– B has crashed– Coordinator crashes
client transactioncoordinator
bank A bank B
start
doneA=A-1000
B=B+1000
One-Phase Commit
• A coordinator tells all other processes (participants) whether or not to perform the operation in question
• Problem: – If one participant fails to perform the operation,
no way to tell the coordinator– Violate the all-or-noneall-or-none rule!
Two-Phase Commit (2PC) Overview
• Assumes a coordinator that initiates the commit/abort
• Each participant votes if it is ready to commit– Until the commit actually occurs, the update is
considered temporary (placed in temp area)temporary (placed in temp area)– The participant is permitted to discard a pending
update • Until all participants vote “ok”, a participant can abort
• Coordinator decides outcome and informs all participants
2PC: More Details• Operates in rounds• Coordinator assigns unique identifiers for each
protocol run. How? It’s time to use logical clocks: run identifier can be process ID and the value of logical clock
• Messages carry the identifier of protocol run they are part of
• Since lots of messages must be stored, a garbage collection must be performed, the challenge is to determine when it is safe to remove the information
Participant States• Initial state: pi is not aware
that protocol started, ends when pi received the ready_to_commit and it is ready to send its Ok
• Prepared to commit: pi sent its Ok, saves in temp area and waits for the final decision (Commit or Abort) from coordinator
• Commit or abort state: pi knows the final decision, it must execute it
2PC State Transition
(a) The finite state machine for the coordinator in 2PC.(b) The finite state machine for a participant.Timeout mechanism is used here for coordinator and participants. Coordinator
blocked in “WAIT”, participant blocked in “INIT”
2PC
• Ideal world: coordinator and participants never fail.
• How 2PC works?
2PC: Back to Reality
• Each participant can fail at any time
• Coordinator can fail at any time
• Question: how to make 2PC work?
Problem Solving Step-by-Step
• Step1: assume the coordinator never fails but the participant could fail at any time
• Step2: assume the coordinator could fail at any time
Step1: what can go wrong on participants!
• Initial state (INIT): if pi crashes before it received vote-request. It does not send it’s vote back, the coordinator will abort the protocol (not enough votes are received). – implemented by timeouts.
• Prepared to commit(READY): if pi crashes before it learns the outcome, resources remained blocked. It is critical that a crashed participant learns the outcome of pending operations when it comes back: How?
• COMMIT or ABORT state: pi crashes before executing, it must complete the commit or abort repeatedly in spite of being interrupted by failures: How?
Group Discussions
• How to modify the 2PC to address the participant failures?
What modifications are needed?• A process must remember
in what state it was before crashing resume pointresume point
• A process must find out the outcome (by contacting the coordinator) redeem redeem orderorder
• The coordinator must find out when a process indeed completed the decision, since it can crash before executing it garbage collection @ garbage collection @ coordinatorcoordinator
2PC: Overcome Participant Failures• Coordinator:
– Multicast: vote-request– Collect replies/votes
• All vote-commit => log ‘commit’ to ‘outcomes’ table and send commit• Else => log ‘abort’ send abort
– Collect acknowledgments– Garbage-collect protocol outcome information
• Participant:– vote-request => log its vote and send vote-(commit/abort)– commit => make changes permanent, send acknowledgment– abort => delete temp area– After failure:
• For each pending protocol: contact coordinator (or other participants) to learn outcome
Step 2: What if coordinator fails?
• If coordinator crashed during first phase (WAIT):– some participants will be ready to commit– others will not be able to (they voted on abort)– other processes may not know what the state is
• If coordinator crashed during its decision or before sending it out:– some processes will be in READY state– some others will know the outcome
Group Discussions
• How to overcome the coordinator failures?
Improvement• If coordinator fails, processes are blocked
waiting for it to recover• After the coordinator recovers, there are
pendingpending protocols that must be finished• Coordinator must remember its state
before crashing – Write INIT & WAIT state on permanent
storage– write GLOBAL_COMMIT or GLOBAL_ABORT
on permanent storage before sending commit or abort decision to other processes and push these operations through
• Participants may see duplicated messages (due to message re-transmission by coordinator)
2PC: Overcome Coordinator Failures (1)
Coordinator:• Multicast: vote-request• Collect replies/votes
– All vote-commit => log ‘commit’ to ‘outcomes’ table, wait until safe on persistent storage and send commit
– Else => log ‘abort’, send abort• Collect acknowledgments• Garbage collect protocol outcome information
After failure:• For each pending protocol in outcomes table
– Possibly re-transmit VOTE_REQUEST if in WAIT– Send outcome (commit or abort) – Wait for acknowledgments– Garbage collect outcome information
2PC: Overcome Coordinator Failures (2)
Participant: first time message received• Vote-request
– save to temp area and reply its vote –(commit / abort)• Global_commit
– make changes permanent• Global_abort
– delete temp area
Message is a duplicate (recovering coordinator)– Send acknowledgment
After failure:• For each pending protocol:
– contact coordinator to learn outcome
2PC: Coordinator
Outline of the steps taken by the coordinator in 2PC protocolOutline of the steps taken by the coordinator in 2PC protocol..
. . .
2PC: participant
• The steps taken by a participant process in 2PC.
2PC: decision query from other participants
State of Q Action by P
COMMIT Transition to COMMIT
ABORT Transition to ABORT
INIT Transition to ABORT
READY Contact another participant*
Problem with 2PC
• The crash of the coordinator may block participants to reach a final decision until it recovers– during the decision stage– All participants in READY
status, cannot cooperatively decide the final decision!
– Solution: 3PC
Another example of 2PC
2 PC• Phase 1 (voting phase)– (1) coordinator sends canCommit? to participants– (2) participant replies with vote (Yes or No); before voting
Yes prepares to commit by saving objects in temp area, and if No
aborts• Phase 2 (completion according to outcome of vote)– (3) coordinator collects votes (including own)
• if no failures and all Yes, sends doCommit to participants• otherwise, sends doAbort to participants
– (4) participants that voted Yes wait for doCommit or doAbort and act accordingly; confirm their action to coordinator by sending haveCommitted
Communication in 2 PC
3 PC Overview• Remember that 2 PC blocks when coordinator
crashes during the decision stage– Participants are blocked until the coordinator
recovers!• Guarantees that the protocol will not block when
only fail-stop failures occur– Avoid blocks in 2 PC– Model is not realistic, but still interesting to look at
• A process fails only by crashing, crashes are accurately detectable
• Requires a fourth round for garbage collection
3PC Key Idea
• Introduces an additional round of communication and delays to prepare-to-prepare-to-commitcommit state to ensure that the state of the system can always be deduced by a subset of a subset of alive participantsalive participants that can communicate with each other– before the commit, coordinator tells all
participants that everyone sent Oks (ready_commit)
3PC: Coordinator
Coordinator:• Multicast: vote-request• Collect votes/replies– All commit => log ‘precommit’ and send precommit– Else => log ‘abort’, send abort
• Collect acks from non-failed participants– All ‘ready-commit’ => log commit and send global-
commit• Collect acknowledgements• Garbage collect protocol outcome information
3PC: ParticipantParticipant: logs state on each message• Vote-request
– save to temp area and reply vote-(commit/abort)• precommit
– Enter precommit state, send ack (ready-commit)• commit
– make changes permanent• abort
– delete temp area
After failure:• Collect participant state information• All precommit or any committed
– Push forward the commit• Else
– Push back the abort
3PC State Transition
• (a) The finite state machine for the coordinator in 3PC. (b) The finite state machine for a participant.
Question: Now can participants be blocked in READY status?Question: Now can participants be blocked in READY status?
Check if 2PC’s problem has been solved?
• A participant can be blocked in– INIT: abort (no participant in PRECOMMIT, why?)– READY: if a majority of participants in READY, safe to
abort– PRECOMMIT: if all participants in PRECOMMIT, then
COMMIT, otherwise safe to abort• In summary, if a participant in READY, all crashed
participants can only recover to INIT, ABORT, or PRECOMMIT, which allows surviving processes can always come to a final decision
Summary Slides
2 PC• Is a blockingblocking protocol• Consists of a coordinator and participants.
1. Coordinator multicasts a VOTE_REQUEST message to all participants.2. When a participant receives a VOTE_REQUEST message, it replies
(unicast) with either VOTE_COMMIT or VOTE_ABORT.• A VOTE_COMMIT response is essentially a contractual guarantee that it
will be able to commit.3. Coordinator collects all votes. If all are VOTE_COMMIT, then it
multicasts a GLOBAL_COMMIT message. Otherwise, it will multicast a GLOBAL_ABORT message.
4. When a participant receives GLOBAL_COMMIT, it locally commits; if it receives GLOBAL_ABORT, it locally aborts.
2 PC FSMs
• Where does the waiting/blocking occur?– Coordinator-WAIT– Participant-INIT– Participant-READY
Coordinator Participant
Two-Phase Commit Recovery
• What happens in case of a crash? How do we detect a crash?– If timeout in Coordinator-WAIT, then abort.– If timout in Participant-INIT, then abort.– If timout in Participant-READY, then need to find out if globally
committed or aborted.• Just wait for Coordinator to recover.• Check with others.
Coordinator Participant
Wait StateWait States
Two-Phase Commit Recovery
• If in Participant-READY, and we wish to check with others:– If Q is in COMMIT, then commit. If Q is in ABORT, then
ABORT.– If Q in INIT, then can safely ABORT.– If all in READY, nothing can be done. 3 PC
Coordinator
Wait State
Participant
Wait States