MAD lessons…
Models & Algorithms for Distributed algorithms/systems/…
Emmanuelle Anceaume, CNRS, Cidre team
Eric Fabre, INRIA, SuMo team
http://people.rennes.inria.fr/Eric.Fabre/
Overall contents
• Introduction to distributed systems/algos (today)
• Formal methods for distributed systems (4 lessons)
  - asynchrony, runs as partial orders of events
  - models for such systems, with true concurrency semantics
  - simple verification problems
• Distributed algorithms (5 lessons)
  - features of the BitCoin and of the BlockChain
• There will be concurrent teaching! (= overlapping of the two threads)
What is it about?
“A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable.”
[Leslie Lamport]
Leslie Lamport, Turing Award 2013 for his contributions to distributed systems
A distributed system is a multiprocessor system in which the time required for interprocess communication is large compared to the time for events within a single processor; in other words, it takes longer for interprocess communication than it does for a process to look at its own memory.
“they are everywhere…”
• Distributed software
  - online businesses: comparing, selecting, paying, shipping
  - dynamic web page building: social networks, mashups, …
  - search engines: map-reduce
  - distributed databases: concurrent access, consistency, atomicity of operations and of transactions
  - cooperative work: shared documents, joint editing, SVNs, GitHub
  - P2P: distributed storage, bitcoin
  - online multi-player games
  - …
“they are everywhere…”
• modern OS
  - multi-core processors
  - multi-processors + asynchronous buses (ex. in cars, planes, …): GALS
  - bytecode produced by some compilers, introducing parallelism
  - multi-thread programming
• system
  - computer grids
  - cloud computing, connected virtual machines, connected containers (Docker): dynamicity
  - power grids
“they are everywhere…”
• telecommunication networks
  - routing algorithms (OSPF): dynamicity, resilience
  - cellular networks: access to shared resources
  - ad hoc networks: fully decentralized, dynamicity
  - delay tolerant networks
  - software defined networks
  - IoT: Internet of Things, connected objects
  - …
• physical systems
  - sensor networks
  - autonomous robot networks: exploration
  - UAV formation flight: consensus problems, distributed control
  - autonomous vehicles: cars, shuttles, trains, subways
  - IoT: massively distributed sensor networks
  - RFID networks…
“they are everywhere…”
• human organizations
  - distributed manufacturing
  - multi-agent organizations: hospital, crisis management?
  - distributed data collection: Wikipedia, epidemic surveillance and reporting
  - social behaviors: gossip spreading, disease dissemination
  - non-human: ant/bee colonies
  - …
Interest and challenges
• why distributed systems?
  - some problems are intrinsically distributed: in space, among agents
  - need to access shared resources (bandwidth, database)
  - need to join forces for a common goal
  - more efficiency/performance: computing power, storage, faster processing (parallelism)
  - better robustness: redundancy of resources, resistance to failures (storage, cloud computing), better performance/cost ratio
  - scalability: on demand performance, easy to up-scale, open to numerous users, distributed management
• challenges
  - fault tolerance, resilience, resistance to dynamicity
  - management (security, control, diagnosis, performance, upgrade…)
  - design (correctness/debug)
  - dedicated algorithms (impossibility results)
Who works on the topic?
• distributed computing
  - architecture, cloud computing, networks
• distributed algorithms
  - algorithmics focused on DS (feasibility, complexity)
  - basic services (mutual exclusion, election, consensus…)
  - specific elaborate algorithms (gossip, routing, control, planning…)
• distributed programming
  - languages and programming principles for DS
• distributed systems
  - how to build them (grids, clouds, SDN, multi-cores, …)
  - formal methods (game theory, verification, concurrency models, test, diagnosis, control…)
First models
Definition: a distributed system is a network of autonomous machines/software/processes/entities, with a coordination middleware, designed to behave as a single machine.
Features:
  - no global time, only local time in components
  - communication time >> local computing time
  - often represented as an undirected graph G=(V,E)
  - nodes/machines: generally infinite state machines, can crash, join, leave, lie (malicious)
  - comm. by messages, via channels (FIFO/LIFO, (un)bounded in time and size, lossless or not, reliable or not…)
  - nodes generally have a local knowledge of the system topology (i.e. ports to neighbors)
Simple model: a finite undirected graph; nodes/vertices are processes, edges are bidirectional communication channels. The topology is fixed, but arbitrary, and locally known (each node knows how many neighbors it has). Messages are reliably transmitted through FIFO channels, with delay.

centralized setting           | distributed setting
1 component                   | several components
1 control point               | several control points
1 process                     | several processes, messages
simple failures               | complex/multiple failures
synchronous = global clock    | asynchronous = no global time
closed/fixed architecture     | open/changing architecture
Some basic services to achieve
• election
  - nodes must agree on one of them as their leader
• consensus
  - each node proposes a value; nodes must agree on one of these values
• mutual exclusion
  - no two nodes can perform some dangerous action at the same time
  - all of them get equal chance to perform this action
• recovering event causality
  - identify causal past of an event; identify possible reorderings
• definition of snapshots
  - build recovery points for distributed computations (failure protection)
• atomic broadcast
  - guarantee that broadcasts are received in the correct order, despite crashes
• self-stabilizing algorithms
  - converge to a desired property (ex. a single token in a ring)
• detection of stable properties
  - ex. detect that a distributed execution has terminated (no pending message)
Synchronous vs asynchronous
• Asynchronous
  - local computation time << communication time
  - no global clock
• Synchronous
  - communication time << computation time, or communication time is bounded
  - global time can be simulated (but it is not always relevant)
In the synchronous case
  - processes can wait for all messages to be received
  - they can detect msg losses and process crashes
  - computations are generally organized in rounds
    • all processes compute
    • all processes exchange messages
  - time complexity of an algo. is expressed in number of rounds
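The round structure can be sketched as a small simulator. This is only an illustration under assumptions: the names `run_synchronous`, `send` and `update` are hypothetical, and the "nothing changed" test is done by the simulator, not by the processes themselves.

```python
def run_synchronous(neighbors, init_state, send, update, max_rounds):
    """Run a round-based computation; return (final states, rounds used).

    neighbors: dict node -> list of neighbor nodes (undirected graph)
    send:      send(node, state) -> message for all neighbors, or None
    update:    update(node, state, inbox) -> new state (inbox = list of messages)
    """
    state = dict(init_state)
    for rnd in range(1, max_rounds + 1):
        # Phase 1: every process computes the message it sends this round.
        outgoing = {u: send(u, state[u]) for u in neighbors}
        # Phase 2: all messages of the round are delivered, then states change.
        new_state = {}
        for u in neighbors:
            inbox = [outgoing[v] for v in neighbors[u] if outgoing[v] is not None]
            new_state[u] = update(u, state[u], inbox)
        if new_state == state:          # global view: the computation is stable
            return state, rnd - 1
        state = new_state
    return state, max_rounds


# Example: flooding one bit of information along a path of 5 nodes.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
informed = {u: (u == 0) for u in path}
states, rounds = run_synchronous(
    path,
    informed,
    send=lambda u, s: True if s else None,
    update=lambda u, s, inbox: s or any(inbox),
    max_rounds=10,
)
print(states, rounds)   # every node informed; rounds == 4
```

The time complexity here is the number of rounds (4, the length of the path), not the number of messages.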
Synchronous algorithms
1. Leader election on a ring
assumptions
  - processes run the same code
  - local knowledge: nodes only see ports to their neighbors
  - port numbers are symmetric by translation on the ring
  - only one node must output “leader”, the others must output “non-leader” (in general, they should also know who is the leader)
Thm: there exists no synchronous algorithm on a ring with n ≥ 2 deterministic indistinguishable processes that solves the leader election problem.
Proof: by contradiction using symmetry
• the execution on each node is the same (same state reached, same messages sent, same reaction to messages)
• by symmetry, if node i outputs “leader”, then all nodes output “leader”
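The symmetry argument can be observed on a toy simulation. The `send` and `step` functions below are arbitrary, hypothetical stand-ins for "any deterministic code"; whatever they compute, all processes stay in the same state after every round, so none can single itself out as leader.

```python
def run_anonymous_ring(n, init, send, step, rounds):
    """n identical deterministic processes on a ring, same initial state.

    send(state) -> (message to left port, message to right port)
    step(state, from_left, from_right) -> new state
    Returns the list of global states, one entry per round.
    """
    states = [init] * n
    history = [list(states)]
    for _ in range(rounds):
        out = [send(s) for s in states]
        new_states = []
        for i in range(n):
            from_left = out[(i - 1) % n][1]    # what the left neighbor sent rightwards
            from_right = out[(i + 1) % n][0]   # what the right neighbor sent leftwards
            new_states.append(step(states[i], from_left, from_right))
        states = new_states
        history.append(list(states))
    return history


# An arbitrary deterministic "algorithm" (for illustration only).
def send(state):
    return (state * 3 + 1, state * 5 + 2)


def step(state, from_left, from_right):
    return (state + 2 * from_left + 7 * from_right) % 101


history = run_anonymous_ring(n=6, init=0, send=send, step=step, rounds=20)
# In every round, the six processes hold one and the same state.
print(all(len(set(states)) == 1 for states in history))
```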
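Solution: break the symmetry!
  - add unique identifiers (UID) to nodes
  - allow for randomness (with different seeds at each node)
With unique identifiers
  - each process sends its id. to neighbours
  - each process receives ids from its neighbors, computes the maximal UID received so far, and propagates it to neighbors
  - diffusion process
  - Q: what is a good stopping criterion?
[Figure: ring of six processes with UIDs 1, 3, 4, 6, 7, 9; the animation shows the maximal UID 9 spreading around the ring.]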
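A minimal sketch of the diffusion of the maximal UID on a bidirectional ring. The stopping criterion is left open above; this sketch simply assumes that every process knows n and stops after n // 2 rounds (the diameter of the ring). The function name and the ring order of the UIDs are illustrative.

```python
def elect_leader_on_ring(uids):
    """Synchronous max-UID flooding; uids[i] is the UID of process i on the ring.

    Assumption (not from the slides): n is known, so n // 2 rounds are run.
    Returns (outputs, best) where best[i] is the largest UID seen by process i.
    """
    n = len(uids)
    best = list(uids)
    for _ in range(n // 2):
        # Each process sends its current maximum to both neighbors ...
        received = [(best[(i - 1) % n], best[(i + 1) % n]) for i in range(n)]
        # ... and keeps the largest value among its own and the two received.
        best = [max(best[i], *received[i]) for i in range(n)]
    outputs = ["leader" if best[i] == uids[i] else "non-leader" for i in range(n)]
    return outputs, best


outputs, best = elect_leader_on_ring([1, 4, 7, 9, 3, 6])
print(outputs)   # only the process holding UID 9 outputs "leader"
print(best)      # every process knows the leader's UID
```

After k rounds a process knows the maximum over all processes at distance at most k, which is why a bound on the diameter gives a valid stopping rule.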
With randomness
  - each process draws an id. at random in the set {1,2,…,r}
  - then runs the previous algorithm
  - Rem: seeds of the random number generators are also identifiers…
[Figure: ring of six processes labeled with randomly drawn ids.]
How many UIDs do we need? How large should {1,2,…,r} be compared to the number of nodes n?
Lemma: with $r = n^2/\epsilon$, the probability that two processes out of n draw the same identifier is less than $\epsilon$.
Proof 1
nb of favorable cases = $r(r-1)\cdots(r-n+1) = \frac{r!}{(r-n)!}$
total nb of cases = $r^n$
probability to win = $p = \frac{r}{r}\cdot\frac{r-1}{r}\cdots\frac{r-n+1}{r} > \left(\frac{r-n}{r}\right)^n$
with $r = n^2/\epsilon$ one has $p > (1-\epsilon/n)^n = 1 - n\cdot\epsilon/n + \dots > 1-\epsilon$
Proof 2
probability that two given processes pick the same UID = $1/r$
union bound: probability to lose $\le$ sum of $1/r$ over all possible pairs; this is
$\frac{n(n-1)}{2r} = \epsilon\,\frac{n-1}{2n} < \epsilon/2$
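The lemma can be checked numerically with exact rational arithmetic. This is just a sanity check on a few values of n; the function names are hypothetical.

```python
import math
from fractions import Fraction


def collision_probability(n, r):
    """Exact probability that at least two of n uniform draws in {1..r} coincide."""
    p_all_distinct = Fraction(1)
    for k in range(n):
        p_all_distinct *= Fraction(r - k, r)   # k identifiers are already taken
    return 1 - p_all_distinct


def id_range(n, eps):
    """Smallest integer r with r >= n^2 / eps, as in the lemma."""
    return math.ceil(Fraction(n * n) / Fraction(eps))


eps = Fraction(1, 100)
for n in (2, 10, 50):
    r = id_range(n, eps)
    exact = collision_probability(n, r)
    union_bound = Fraction(n * (n - 1), 2 * r)   # bound of Proof 2
    print(n, r, float(exact), float(union_bound), float(eps))
```

For each n the exact collision probability stays below the union bound, which itself stays below ε/2.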
2. Maximal independent set
assumptions
  - graph of n processes, not necessarily connected
  - MIS = maximal subset of nodes that are pairwise disconnected (no two of them are neighbors)
  - processes don’t have unique ids, so the problem is unsolvable in some graphs (see election thm): we use randomness to break symmetry
  - each process should output “in” or “out”
Luby’s algorithm: alternation of 2 rounds, among active nodes
Round 1
  - each node u picks a number x(u) at random, then sends it to all its neighbors in N(u) [N(u) denotes the neighbor set of u]
  - each node receives messages from all its neighbors
  - if node u has the largest value: x(u) > x(v) for all (active) v in N(u), then it joins the MIS and outputs “in”
Round 2
  - each node u joining the MIS sends message “in” to its neighbors
  - each node v receiving message “in” from a neighbor must not join the MIS, and outputs “out”
  - each node having decided either “in” or “out” becomes inactive
Rounds 1 and 2 proceed until all nodes are inactive (i.e. have decided whether they are in or out)
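The two alternating rounds can be simulated centrally as follows. This is a sketch, not a distributed implementation: the function name is hypothetical, random values are floats (so ties are very unlikely), and a phase groups Round 1 and Round 2.

```python
import random


def luby_mis(neighbors, seed=None):
    """neighbors: dict node -> list of neighbors. Returns (decision, phases)."""
    rng = random.Random(seed)
    decision = {}                        # node -> "in" / "out", once decided
    active = set(neighbors)
    phases = 0
    while active:
        phases += 1
        # Round 1: each active node draws x(u) and compares it with the
        # values received from its active neighbors.
        x = {u: rng.random() for u in active}
        joining = {u for u in active
                   if all(x[u] > x[v] for v in neighbors[u] if v in active)}
        # Round 2: joining nodes announce "in"; their active neighbors go "out".
        for u in joining:
            decision[u] = "in"
        for u in joining:
            for v in neighbors[u]:
                if v in active and v not in joining:
                    decision[v] = "out"
        # Every node that decided becomes inactive.
        active -= set(decision)
    return decision, phases


# A 5-cycle plus one isolated node (the graph need not be connected).
graph = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0], 5: []}
decision, phases = luby_mis(graph, seed=1)
print(decision, phases)
```

An isolated node trivially has the largest value among its (empty) neighborhood and joins the MIS in the first phase.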
[Figure sequence: example run of Luby’s algorithm; nodes are labeled with the random values they draw, and a second phase is run among the nodes that are still active after the first one.]
Proof of Luby’s algorithm
ideas:
  - find and prove invariants that are preserved by each round
  - prove a monotony property (for convergence)
Invariant: the nodes that joined the MIS can’t be connected
Monotony: the number of active nodes decreases
Q: decreases strictly?
A: yes, unless two (or more) nodes pick the same maximal value, which is unlikely (see previous lemma). One can actually show that the expected number of active edges is at least halved at each phase.
Maximality: only active nodes can still be added to the current MIS (an “out” node has a neighbor that is “in”), so if all nodes are inactive at termination, the MIS is maximal
3. Spanning tree construction
assumptions
  - finite connected graph G=(V,E), unknown size and topology
  - a distinguished vertex r, called the root
  - nodes have unique identifiers (UID)
  - local knowledge of topology (nodes know their neighbors’ UIDs)
goal
  - processes must build a spanning tree, rooted at r
  - the unique path from node r to each node v must be a shortest path (in number of edges)
  - each node v (except r) should output parent(v), possibly children(v)
Algorithm: principle = diffusion from root node r
initialization
  - all nodes inactive, except r
  - $\forall v \in V,\ \mathrm{parent}(v) = \emptyset$
round k
• if node v is active
  - v sends a “search” message to all its neighbors, except to its parent
  - then v becomes inactive
• if node v receives one or more “search” messages and does not yet have a parent
  - v selects (nondeterministically) one sender u and sets $\mathrm{parent}(v) = u$
  - v may inform u that it is its child
  - v becomes active
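A centralized simulation of these rounds, as a sketch. The function name is hypothetical, and the nondeterministic choice of a sender is replaced by `min()` only to make runs reproducible; any sender would do.

```python
def synchronous_bfs_tree(neighbors, root):
    """neighbors: dict node -> list of neighbors. Returns (parent, rounds)."""
    parent = {v: None for v in neighbors}
    active = {root}
    rounds = 0
    while active:
        rounds += 1
        # Active nodes send "search" to all neighbors except their parent,
        # then become inactive.
        inbox = {}
        for v in active:
            for w in neighbors[v]:
                if w != parent[v]:
                    inbox.setdefault(w, []).append(v)
        # A node without parent that received "search" picks one sender.
        active = set()
        for w, senders in inbox.items():
            if w != root and parent[w] is None:   # the root never takes a parent
                parent[w] = min(senders)
                active.add(w)
    return parent, rounds


graph = {
    "r": ["a", "b"],
    "a": ["r", "b", "c"],
    "b": ["r", "a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
parent, rounds = synchronous_bfs_tree(graph, "r")
print(parent)
```

Because all nodes at distance k are reached in round k, the parent pointers follow shortest paths (in number of edges) to the root.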
[Figure sequence: animation of the algorithm on an example graph; “search” messages spread from the root r, round after round, until all nodes are reached.]
proof: invariant (+ monotony)
• the selected edges form a tree
• at round k, all nodes at distance k from root r have been reached and connected by a unique path of length k
Remarks
  - non-determinism does not prevent correct behavior
  - can compute as well the distance of each node to the root
  - termination is easy to detect: backpropagation from the leaves
    • leaf nodes start telling their parent that they are “done”
    • an inner node receiving a “done” message from all its children propagates “done” to its parent
    • once the root node receives “done” from all its children, it broadcasts it down the tree to all nodes
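The backpropagation of “done” messages can be sketched on an already built tree. Assumptions: each node knows its children (the optional "v may inform u that it is its child" step), the function name is hypothetical, and the final broadcast from the root is omitted.

```python
def detect_termination(parent, root):
    """parent: dict node -> parent (None for the root).

    Returns the order in which nodes report "done"; the root comes last.
    """
    children = {v: [] for v in parent}
    for v, p in parent.items():
        if v != root:
            children[p].append(v)
    pending = {v: len(children[v]) for v in parent}   # "done" messages still awaited
    ready = [v for v in parent if pending[v] == 0]    # the leaves start
    order = []
    while ready:
        v = ready.pop()
        order.append(v)             # v is done
        if v == root:
            break                   # the root now knows that everybody is done
        p = parent[v]
        pending[p] -= 1             # v sends "done" to its parent
        if pending[p] == 0:         # p heard from all its children
            ready.append(p)
    return order


tree = {"r": None, "a": "r", "b": "r", "c": "a", "d": "b", "e": "d"}
print(detect_termination(tree, "r"))
```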
Extension: Bellman-Ford algorithm (dynamic programming)
  - computes a shortest path tree rooted at r, when edges have a length
  - search messages must carry as well a distance to r
  - nodes receiving a shorter distance message get reactivated and change their parent to one closer to r
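A sketch of this synchronous extension, assuming positive edge lengths. The function name and the example graph are hypothetical; for brevity the edge length is added on the sender side, whereas in the description above the message carries the sender's distance and the receiver adds the length.

```python
import math


def synchronous_bellman_ford(weights, root):
    """weights: dict node -> {neighbor: positive edge length} (symmetric).

    Returns (dist, parent, rounds).
    """
    dist = {v: math.inf for v in weights}
    parent = {v: None for v in weights}
    dist[root] = 0
    active = {root}
    rounds = 0
    while active:
        rounds += 1
        # Active nodes advertise their current distance to all neighbors
        # except their parent, then become inactive.
        inbox = {}
        for u in active:
            for v, w in weights[u].items():
                if v != parent[u]:
                    inbox.setdefault(v, []).append((dist[u] + w, u))
        # A node that hears of a strictly shorter path updates its distance,
        # changes parent, and is (re)activated.
        active = set()
        for v, offers in inbox.items():
            d, u = min(offers)
            if d < dist[v]:
                dist[v], parent[v] = d, u
                active.add(v)
    return dist, parent, rounds


weights = {
    "r": {"a": 1, "b": 5},
    "a": {"r": 1, "b": 2, "c": 6},
    "b": {"r": 5, "a": 2, "c": 1},
    "c": {"a": 6, "b": 1},
}
dist, parent, rounds = synchronous_bellman_ford(weights, "r")
print(dist, parent, rounds)
```

In this example node b first adopts r as parent (distance 5), then is reactivated and switches to a (distance 3).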
Asynchronous algorithms
changes
  - no rounds, much more non-determinism
  - processes compute and communicate arbitrarily (atomic actions)
  - correctness and convergence must be proved for a much larger and more complex set of executions
setting
  - G=(V,E) undirected graph, V = processes, E = bidirectional channels
  - processes need not be distinguishable, know local ports to neighbours, (infinite) state machines
  - communication by lossless FIFO channels (unbounded)
  - $C_{u,v}$ is the channel (buffer) from u to v
  - primitives $\mathrm{send}_{u,v}(m)$, $\mathrm{receive}_{u,v}(m)$
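A minimal sketch of this setting: one unbounded FIFO buffer per direction of each edge, with `send` and `receive` as the only primitives. The class and method names are hypothetical.

```python
from collections import deque


class Network:
    """Lossless, unbounded FIFO channels C[u, v] on an undirected graph."""

    def __init__(self, edges):
        self.channels = {}
        for u, v in edges:                      # one buffer per direction
            self.channels[(u, v)] = deque()
            self.channels[(v, u)] = deque()

    def send(self, u, v, m):
        """send_{u,v}(m): append m at the tail of C[u, v]."""
        self.channels[(u, v)].append(m)

    def receive(self, u, v):
        """receive_{u,v}: remove and return the head of C[u, v] (FIFO order)."""
        return self.channels[(u, v)].popleft()

    def pending(self):
        """Channels that still hold at least one message."""
        return [c for c, buf in self.channels.items() if buf]


net = Network([("a", "b")])
net.send("a", "b", 1)
net.send("a", "b", 2)
net.send("b", "a", "x")
print(net.receive("a", "b"), net.receive("a", "b"))   # messages come out in sending order
print(net.pending())                                   # only C[b, a] is non-empty
```

An asynchronous execution is then any interleaving of atomic actions: a process step that sends messages, or the reception of the head of some non-empty channel.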
1. Spanning tree construction
assumptions
  - finite connected graph G=(V,E), unknown size and topology
  - a distinguished vertex r, called the root
  - nodes have unique identifiers (UID)
  - local knowledge of topology (nodes know their neighbors’ UIDs)
goal
  - processes must build a spanning tree, rooted at r
  - the unique path from node r to each node v is no longer required to be a shortest path (in number of edges)
  - each node v (except r) should output parent(v), possibly children(v)
1. Spanning tree construction
idea: take the previous synchronous algo, make it asynchronous!
Algorithm: principle = diffusion from root node r
initialization
  - all nodes inactive, except r
  - $\forall v \in V,\ \mathrm{parent}(v) = \emptyset$
step k: do one of the two actions below
• emission from an active node v
  - v sends a “search” message to all its neighbors, except to its parent
  - then v becomes inactive
• reception of a “search” message by node v from node u
  if node v (other than the root r) does not yet have a parent
  - v sets u as its parent: $\mathrm{parent}(v) = u$
  - v may inform u that it is its child
  - v becomes active
  otherwise do nothing
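A sketch of the asynchronous version, where a random scheduler picks one enabled atomic action at each step. Assumptions: the function name and the use of a seeded pseudo-random scheduler are illustrative, and the root is explicitly prevented from adopting a parent.

```python
import random
from collections import deque


def asynchronous_spanning_tree(neighbors, root, seed=None):
    """neighbors: dict node -> list of neighbors. Returns the parent map."""
    rng = random.Random(seed)
    parent = {v: None for v in neighbors}
    channel = {(u, v): deque() for u in neighbors for v in neighbors[u]}
    active = {root}
    while True:
        # Enabled actions: emission by an active node, or reception of the
        # head message of a non-empty channel.
        actions = [("emit", v) for v in sorted(active)]
        actions += [("recv", c) for c in sorted(channel) if channel[c]]
        if not actions:
            return parent                      # nothing left to do
        kind, arg = rng.choice(actions)        # arbitrary scheduling
        if kind == "emit":
            v = arg
            for w in neighbors[v]:
                if w != parent[v]:
                    channel[(v, w)].append("search")
            active.discard(v)
        else:
            u, v = arg
            channel[(u, v)].popleft()
            if v != root and parent[v] is None:
                parent[v] = u
                active.add(v)
            # otherwise the message is consumed with no effect


graph = {
    "r": ["a", "b"],
    "a": ["r", "b", "c"],
    "b": ["r", "a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
for seed in range(3):
    print(asynchronous_spanning_tree(graph, "r", seed))
```

Different schedules may produce different trees, and these trees need not follow shortest paths.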
[Figure sequence: animation of an asynchronous run on an example graph; “search” messages travel one at a time from the root r, in an arbitrary order, until all nodes have a parent.]
2. Asynchronous Bellman-Ford
context and goal
• undirected connected graph G=(V,E)
• positive weights w(e) on edges, or w(u,v) on edge (u,v)
• distinguished root node r
• compute a shortest path from r to any node
main idea
• extension of spanning tree construction to min distance computation: Bellman-Ford (with reactivation of nodes)
• desynchronization
Algorithm: principle = diffusion + distance computation from root node r, then from all updated nodes
initialization
  - all nodes inactive, except r
  - distances to the root node r: $\forall v \in V,\ \mathrm{parent}(v) = \emptyset,\ \mathrm{dist}(v) = \infty$, and $\mathrm{dist}(r) = 0$
step k: do one of the two actions below
• emission from an active node u
  - u sends message $m = \mathrm{dist}(u)$ to all its neighbors v, except to its parent
  - then u becomes inactive
• reception of a message m by node v from node u
  • if node v does not yet have a parent or if the path through u is shorter: $\mathrm{dist}(v) > m + w(u,v)$
    - v updates its distance: $\mathrm{dist}(v) := m + w(u,v)$
    - v sets u as its parent: $\mathrm{parent}(v) = u$
    - v may inform u that it is its child (and its former parent that it left…)
    - v becomes active
  • otherwise do nothing (message m consumed with no impact)
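A sketch of asynchronous Bellman-Ford with a random scheduler. Assumptions: the function name and the example graph are hypothetical, and since dist(v) starts at infinity, the single test dist(v) > m + w(u,v) covers the "no parent yet" case while keeping the root at distance 0.

```python
import math
import random
from collections import deque


def asynchronous_bellman_ford(weights, root, seed=None):
    """weights: dict node -> {neighbor: positive weight} (symmetric).

    Returns (dist, parent, steps), steps = number of atomic actions executed.
    """
    rng = random.Random(seed)
    dist = {v: math.inf for v in weights}
    parent = {v: None for v in weights}
    dist[root] = 0
    channel = {(u, v): deque() for u in weights for v in weights[u]}
    active = {root}
    steps = 0
    while True:
        actions = [("emit", u) for u in sorted(active)]
        actions += [("recv", c) for c in sorted(channel) if channel[c]]
        if not actions:
            return dist, parent, steps
        steps += 1
        kind, arg = rng.choice(actions)            # arbitrary scheduling
        if kind == "emit":
            u = arg
            for v in weights[u]:
                if v != parent[u]:
                    channel[(u, v)].append(dist[u])    # m = dist(u)
            active.discard(u)
        else:
            u, v = arg
            m = channel[(u, v)].popleft()
            if dist[v] > m + weights[u][v]:        # path through u is shorter
                dist[v] = m + weights[u][v]
                parent[v] = u
                active.add(v)                      # v is (re)activated


weights = {
    "r": {"a": 1, "b": 5},
    "a": {"r": 1, "b": 2, "c": 6},
    "b": {"r": 5, "a": 2, "c": 1},
    "c": {"a": 6, "b": 1},
}
for seed in range(3):
    print(asynchronous_bellman_ford(weights, "r", seed))
```

The final distances do not depend on the schedule, but the number of steps does, which is the point of the complexity question raised at the end of this section.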
[Figure sequence: example run of asynchronous Bellman-Ford on a weighted graph rooted at r (dist(r) = 0); successive frames show the tentative distances at the nodes being lowered as messages arrive, until they stabilize.]
remark
one can assume that messages pile up into channels, or that the last message erases the previous one
proof: invariant(s)
  - current structure is always a tree
  - the distance stored at node v is the length of some path from r to v, and the parent of v is the predecessor of v on that path
  - distance of node v to the root node r is the shortest among paths explored by the messages that reached v so far
monotony
  - distances at nodes decrease, bounded from below by true shortest distances
  - there is at least one shortest path to a node that will be progressively activated (all nodes are reached)
complexity? i.e. how many successive elementary steps?
  - homework…
  - hint: each (re)activated node may (re)initiate a message flow on the whole graph
Take home messages
distributed systems
• are everywhere in modern computer science
• require different models/assumptions to be correctly described/analyzed
a major phenomenon is asynchrony
• due to the absence of a global clock
• it becomes necessary to model and understand the concurrency of events
• this makes the study of algorithms/systems much more involved
next time
• representation of runs as partial orders of events
• vector clocks
• snapshot algorithms