Chandy-Lamport Snapshotting
COS 418: Distributed Systems
Precept 8
Themis Melissaris and Daniel Suo
[Content adapted from I. Gupta]
Distributed Snapshots: Determining Global States of a Distributed System
K. Mani Chandy and Leslie Lamport
ACM Transactions on Computer Systems, February 4, 1985
Global snapshots
Example of a global snapshot
But that was easy
• In our system of world leaders, we were able to capture their ‘state’ (i.e., likeness) easily
– Synchronized in space
– Synchronized in time
• How would we take a global snapshot if the leaders were all at home?
• What if Obama told Trudeau that he should really put on a shirt?
• This message is part of our system state!
Global snapshot is global state
• Each distributed application has a number of processes (leaders) running on a number of physical servers
• These processes communicate with each other via channels (text messaging)
• A snapshot captures the local states of each process (e.g., program variables) along with the state of each communication channel
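One way to picture this definition is as a plain data structure. The sketch below is illustrative only: the `GlobalSnapshot` class, its field names, and the sample values are assumptions, not something the slides define.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalSnapshot:
    # Process name -> copy of that process's local variables.
    process_states: dict = field(default_factory=dict)
    # (sender, receiver) -> messages in flight on that channel, oldest first.
    channel_states: dict = field(default_factory=dict)


# Hypothetical two-process system with one message still in flight from A to B.
snapshot = GlobalSnapshot(
    process_states={"A": {"count": 3}, "B": {"count": 7}},
    channel_states={("A", "B"): ["hello"], ("B", "A"): []},
)
```

Keeping channel contents next to process state is the point: a message that has been sent but not yet received lives in neither process, so it would otherwise be lost.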
Why do we need snapshots?
• Checkpointing: restart if the application fails
• Collecting garbage: remove objects that don’t have any references
• Detecting deadlocks: can examine the current application state
• Other debugging: a little easier to work with than printf…
We could just synchronize clocks
• Each process records its state at some agreed-upon time t
– But clocks skew
– And we wouldn’t record messages
• Do we need synchronization?
• What did Lamport realize about ordering events?
Example of global snapshots v2
• Two processes: P1 and P2
[Diagram: P1, P2]
Example of global snapshots v2
• Channel C12 from P1 to P2
• Channel C21 from P2 to P1
[Diagram: P1 and P2, connected by C12 and C21]
Example of global snapshots v2
• Process states for P1 and P2
P1: X: 0, Y: 0, Z: 0
P2: X: 1, Y: 2, Z: 3
Channels: C12, C21
Example of global snapshots v2
• Channel states (i.e., messages) for C12 and C21
• This is our initial global state
• Also a global snapshot
P1: X1: 0, Y1: 0, Z1: 0
P2: X2: 1, Y2: 2, Z2: 3
C12: [Empty]
C21: [Empty]
Example of global snapshots v2
• P1 tells P2 to change its state variable, X2, from 1 to 4
• This is another global snapshot
P1: X1: 0, Y1: 0, Z1: 0
P2: X2: 1, Y2: 2, Z2: 3
C12: [X2 → 4]
C21: [Empty]
Example of global snapshots v2
• P2 receives the message from P1
• Another global snapshot
P1: X1: 0, Y1: 0, Z1: 0
P2: X2: 1, Y2: 2, Z2: 3 (received: X2 → 4)
C12: [Empty]
C21: [Empty]
Example of global snapshots v2
• P2 changes its state variable, X2, from 1 to 4
• And another global snapshot
P1: X1: 0, Y1: 0, Z1: 0
P2: X2: 4, Y2: 2, Z2: 3
C12: [Empty]
C21: [Empty]
Summary
• The global state changes whenever an event happens
– Process sends message
– Process receives message
– Process takes a step
• Moving from state to state obeys causality
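The three event kinds can be sketched as functions that each map one global state to the next. This is an illustrative model rather than anything from the slides: the dictionary layout, the `inbox` used to separate "received" from "applied", and the function names are all assumptions. The run at the bottom replays the P1/P2 example.

```python
import copy


def send(state, src, dst, msg):
    """Send event: the message enters channel src -> dst."""
    new = copy.deepcopy(state)
    new["channels"][(src, dst)].append(msg)
    return new


def receive(state, src, dst):
    """Receive event: dst takes the oldest message off channel src -> dst."""
    new = copy.deepcopy(state)
    new["inbox"][dst].append(new["channels"][(src, dst)].pop(0))
    return new


def step(state, proc):
    """Internal step: proc applies a received (variable, value) update."""
    new = copy.deepcopy(state)
    var, value = new["inbox"][proc].pop(0)
    new["procs"][proc][var] = value
    return new


s0 = {
    "procs": {"P1": {"X": 0, "Y": 0, "Z": 0}, "P2": {"X": 1, "Y": 2, "Z": 3}},
    "channels": {("P1", "P2"): [], ("P2", "P1"): []},
    "inbox": {"P1": [], "P2": []},
}
s1 = send(s0, "P1", "P2", ("X", 4))  # P1 tells P2 to set X to 4
s2 = receive(s1, "P1", "P2")         # P2 receives the message
s3 = step(s2, "P2")                  # P2 applies the change
```

Each of `s0` through `s3` is a valid global state, and each is reached from the previous one by exactly one event.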
Chandy-Lamport algorithm
System model
• Problem: record a global snapshot (state for each process and channel)
• Model
– N processes in the system with no failures
– There are two FIFO unidirectional channels between every process pair (Pi → Pj and Pj → Pi)
– All messages arrive, intact, not duplicated
• Future work relaxes these assumptions
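As a rough illustration of this model, each ordered pair of processes can be given its own FIFO queue. The `make_channels` helper and the process names are hypothetical.

```python
from collections import deque
from itertools import permutations


def make_channels(process_ids):
    """One FIFO queue per ordered pair, so N processes get N * (N - 1) channels."""
    return {(src, dst): deque() for src, dst in permutations(process_ids, 2)}


channels = make_channels(["P1", "P2", "P3"])
channels[("P1", "P2")].append("first")
channels[("P1", "P2")].append("second")
oldest = channels[("P1", "P2")].popleft()  # FIFO: messages leave in send order
```

The FIFO property matters later: a marker sent after an application message on the same channel can never overtake it.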
System requirements
• Taking a snapshot shouldn’t interfere with normal application behavior
– Don’t stop sending messages
– Don’t stop the application!
• Each process can record its own state
• Collect state in a distributed manner
• Any process can initiate a snapshot
Initiating a snapshot
• Let’s say process Pi initiates the snapshot
• Pi records its own state and prepares a special marker message (distinct from application messages)
• Send the marker message to all other processes (using N-1 outbound channels)
• Start recording all incoming messages from channels Cji for j not equal to i
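The initiator's steps might look like the following. This is a minimal sketch under assumptions: channels are modeled as plain lists keyed by (sender, receiver), and `initiate_snapshot` and `MARKER` are hypothetical names.

```python
MARKER = "<marker>"


def initiate_snapshot(me, peers, local_state, channels):
    """Run the initiator's steps for process `me` and return its bookkeeping."""
    recorded_state = dict(local_state)        # record own state (a copy)
    for peer in peers:                        # marker on all N-1 outbound channels
        channels[(me, peer)].append(MARKER)
    recording = {peer: [] for peer in peers}  # start recording every inbound channel
    return recorded_state, recording


channels = {("P1", "P2"): [], ("P1", "P3"): [], ("P2", "P1"): [], ("P3", "P1"): []}
live_state = {"X": 0}
recorded, recording = initiate_snapshot("P1", ["P2", "P3"], live_state, channels)
live_state["X"] = 99  # the application keeps running; the recorded copy is unaffected
```

Copying the state rather than keeping a reference is what lets the application continue without disturbing the snapshot.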
Propagating a snapshot
• For all processes Pj (including the initiator), consider a message on channel Ckj
• If we see the marker message for the first time
– Pj records its own state and marks Ckj as empty
– Send the marker message to all other processes (using N-1 outbound channels)
– Start recording all incoming messages from channels Clj for l not equal to j or k
• Else (Pj has already recorded its state), stop recording Ckj: its state is the messages that arrived on Ckj since Pj began recording
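The marker-handling rule can be sketched as a single message handler. The `Process` class, its attribute names, and the `send(src, dst, msg)` callback are hypothetical; an initiator is assumed to have filled in `recorded_state` and `recording` itself before any marker arrives, so its markers take the second branch.

```python
MARKER = "<marker>"


class Process:
    def __init__(self, name, peers, state):
        self.name, self.peers, self.state = name, list(peers), dict(state)
        self.recorded_state = None  # stays None until this process joins the snapshot
        self.recording = {}         # sender -> messages recorded so far on that channel
        self.channel_state = {}     # sender -> final recorded state of that channel

    def on_message(self, sender, msg, send):
        """Handle one message that arrived from `sender`."""
        if msg != MARKER:
            # Application message: record it only while that channel is open.
            if sender in self.recording:
                self.recording[sender].append(msg)
            return
        if self.recorded_state is None:
            # First marker: record own state, mark this channel empty,
            # forward markers, and start recording the other inbound channels.
            self.recorded_state = dict(self.state)
            self.channel_state[sender] = []
            for peer in self.peers:
                send(self.name, peer, MARKER)
            self.recording = {p: [] for p in self.peers if p != sender}
        else:
            # Later marker: close this channel with whatever was recorded on it.
            self.channel_state[sender] = self.recording.pop(sender)


sent = []
p2 = Process("P2", ["P1", "P3"], {"X": 4})
p2.on_message("P1", MARKER, lambda *m: sent.append(m))  # first marker
p2.on_message("P3", "late", lambda *m: sent.append(m))  # recorded as in flight
p2.on_message("P3", MARKER, lambda *m: sent.append(m))  # closes channel from P3
```

After the three calls, `p2.channel_state` holds an empty list for the channel from P1 and `["late"]` for the channel from P3.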
Terminating a snapshot
• All processes have received a marker (and recorded their own state)
• All processes have received a marker on all the N-1 incoming channels (and recorded their states)
• Later, a central server can gather the partial state to build a global snapshot
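A possible shape for the termination check and the gathering step is sketched below. The function names, the layout of `partials`, and the sample values are assumptions made for illustration.

```python
def process_done(recorded_state, channel_state, peers):
    """True once a process has its own state and a marker from every peer."""
    return recorded_state is not None and set(channel_state) == set(peers)


def gather(partials):
    """Merge per-process partial snapshots into one global snapshot.

    `partials` maps a process name to (recorded_state, {sender: messages}).
    """
    snapshot = {"processes": {}, "channels": {}}
    for name, (state, inbound) in partials.items():
        snapshot["processes"][name] = state
        for sender, messages in inbound.items():
            snapshot["channels"][(sender, name)] = messages
    return snapshot


# Hypothetical partial results reported by two processes.
partials = {
    "A": ({"count": 3}, {"B": ["ping"]}),
    "B": ({"count": 7}, {"A": []}),
}
done = all(
    process_done(state, inbound, [p for p in partials if p != name])
    for name, (state, inbound) in partials.items()
)
snapshot = gather(partials)
```

Each process only reports the channels inbound to itself, so the collector re-keys them as (sender, receiver) pairs when merging.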
Example
• P1 initiates a snapshot
P1: X1: 0, Y1: 0, Z1: 0
P2: X2: 4, Y2: 2, Z2: 3
C12: [Empty]
C21: [Empty]
Example
• First, P1 records its state
P1: X1: 0, Y1: 0, Z1: 0
P2: X2: 4, Y2: 2, Z2: 3
C12: [Empty]
C21: [Empty]
Example
• Then, P1 sends a marker message to P2 and begins recording all messages on inbound channels
• Meanwhile, P2 sent a message to P1
P1: X1: 0, Y1: 0, Z1: 0
P2: X2: 4, Y2: 2, Z2: 3
C12: [<marker>]
C21: [M1]
Example
• P2 receives a marker message for the first time, so records its state
• P2 then sends a marker message to P1
P1: X1: 0, Y1: 0, Z1: 0 (received: M1)
P2: X2: 4, Y2: 2, Z2: 3 (received: <marker>)
C12: [Empty]
C21: [<marker>]
Example
• P1 has already sent a marker message, so it records all messages it received on inbound channels to the appropriate channel’s state
P1: X1: 0, Y1: 0, Z1: 0 (recorded for C21: M1)
P2: X2: 4, Y2: 2, Z2: 3
C12: [Empty]
C21: [Empty]
Example
• Both processes have recorded their state and the state of all incoming channels
• Our snapshotted state is highlighted in blue
P1: X1: 0, Y1: 0, Z1: 0 (recorded for C21: M1)
P2: X2: 4, Y2: 2, Z2: 3
C12: [Empty]
C21: [Empty]
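The two-process example can be replayed end to end with a small simulation. This is a sketch under assumptions, not the course's reference code: the `Node` class, its method names, and the hand-written delivery order are all hypothetical.

```python
from collections import deque

MARKER = "<marker>"


class Node:
    def __init__(self, name, state):
        self.name, self.state = name, dict(state)
        self.snap_state = None   # recorded local state
        self.recording = {}      # sender -> messages seen while recording
        self.snap_channels = {}  # sender -> recorded channel state

    def record(self, net, first_marker_from=None):
        """Record own state, send markers, start recording inbound channels."""
        self.snap_state = dict(self.state)
        for (src, dst), chan in net.items():
            if src == self.name:
                chan.append(MARKER)
            elif dst == self.name and src != first_marker_from:
                self.recording[src] = []
        if first_marker_from is not None:
            self.snap_channels[first_marker_from] = []

    def deliver(self, net, sender):
        """Receive the oldest message on the channel from `sender`."""
        msg = net[(sender, self.name)].popleft()
        if msg != MARKER:
            if sender in self.recording:
                self.recording[sender].append(msg)
        elif self.snap_state is None:
            self.record(net, first_marker_from=sender)
        else:
            self.snap_channels[sender] = self.recording.pop(sender)


p1 = Node("P1", {"X": 0, "Y": 0, "Z": 0})
p2 = Node("P2", {"X": 4, "Y": 2, "Z": 3})
net = {("P1", "P2"): deque(), ("P2", "P1"): deque()}

p1.record(net)                  # P1 initiates: records state, sends marker
net[("P2", "P1")].append("M1")  # meanwhile, P2 sends M1 to P1
p2.deliver(net, "P1")           # P2 sees its first marker
p1.deliver(net, "P2")           # P1 receives M1 while recording C21
p1.deliver(net, "P2")           # P1 receives P2's marker, closing C21
```

In this run the recorded snapshot is P1's state, P2's state, an empty C12, and C21 containing M1: the message was sent before P2 recorded its state but received after P1 recorded its own, so it belongs to the channel.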
Causal consistency
• Related to the Lamport clock partial ordering
• An event is presnapshot if it occurs before the local snapshot on a process
• Postsnapshot if afterwards
• If event A happens causally before event B, and B is presnapshot, then A is too
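The last rule can be checked mechanically. In the sketch below, the event names and the `is_consistent_cut` helper are hypothetical, and the happens-before relation is given as a set of direct (A, B) pairs.

```python
def is_consistent_cut(pre_snapshot, happens_before):
    """Check the rule: if A happens before B and B is presnapshot, so is A.

    `happens_before` holds direct (A, B) edges; a set closed under direct
    edges is also closed under chains of edges.
    """
    return all(a in pre_snapshot for a, b in happens_before if b in pre_snapshot)


edges = {("P2 sends M1", "P1 receives M1")}
good_cut = {"P2 sends M1"}    # M1 is in flight, so it belongs to channel state
bad_cut = {"P1 receives M1"}  # a receive recorded without its send

good = is_consistent_cut(good_cut, edges)
bad = is_consistent_cut(bad_cut, edges)
```

Here `good` is true and `bad` is false: a snapshot may contain a message that was sent but not yet received, but never one that was received without having been sent.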