Distributed Systems
Basic Algorithms
Rik Sarkar
University of Edinburgh
2016/2017
Distributed Computation
• How to send messages to all nodes efficiently
• How to compute sums of values at all nodes efficiently
• Network as a graph
• Broadcasting messages
• Computing sums in a tree
• Computing trees in a network
• Communication complexity
Distributed Systems, Edinburgh, 2016
Ref: NL
Network as a graph
• Network is a graph: G = (V, E)
• Each vertex/node is a computer/process
• Each edge is a communication link between 2 nodes
• Every node has a unique identifier known to itself
– Often used: 1, 2, 3, …, n
• Every node knows its neighbors: the nodes it can reach directly without needing other nodes to route
– Edges incident on the vertex
– For example, in LAN or WLAN, through listening to the broadcast medium
– Or by explicitly asking: "Everyone that receives this message, please report back"
• But a node does not know the rest of the network
Example: Unit disk graphs
• Suppose all nodes are wireless
• Each can communicate with nodes within distance r
• Say, r = 1
• UDG is a model
• Not perfect
• In general, networks can be any graph
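
The unit disk graph model can be sketched in a few lines. This is only an illustration under assumptions that are not in the slides: nodes are points in the plane, two nodes are neighbors when their Euclidean distance is at most r, and the function name `unit_disk_graph` and the sample coordinates are made up for the example.

```python
import math
from itertools import combinations

def unit_disk_graph(points, r=1.0):
    """Return adjacency sets: i and j are neighbors when their distance is at most r."""
    adj = {i: set() for i in range(len(points))}
    for i, j in combinations(range(len(points)), 2):
        if math.dist(points[i], points[j]) <= r:
            adj[i].add(j)
            adj[j].add(i)
    return adj

# Hypothetical node positions: three nodes on a line and one far away.
points = [(0.0, 0.0), (0.8, 0.0), (1.6, 0.0), (5.0, 5.0)]
udg = unit_disk_graph(points)
```

Here node 1 links nodes 0 and 2, which are too far apart to hear each other directly, and node 3 is isolated.
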
Directed graphs
• When A can send message to B, but B cannot send message to A
• For example, in wireless transmission, if B is in A's range, but A is not in B's range
• Or if protocol or technology limitations prevent B from communicating with A
• Protocols are more complex
• Need more messages
Network as a graph
• Distance/cost between nodes p and q in the network
– Number of edges on the shortest path between p and q (when all edges are the same: unweighted)
• Sometimes, edges can be weighted
– Each edge e = (a, b) has a weight w(e)
– w(e) is the cost of using the communication link e (may be the length of e)
– Distance/cost between p and q is the total weight of edges on the path from p to q with least weight
Network as a graph
• Diameter
– The maximum distance between 2 nodes in the network
• Radius
– Half the diameter in the simplest cases (in general, at least half the diameter and at most the diameter)
• Spanning tree of a graph:
– A subgraph which is a tree, and reaches all nodes of the graph
– If the network has n nodes
• How many edges does a spanning tree have?
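
To make the unweighted distance and the diameter concrete, here is a minimal sketch. It is a centralized computation that sees the whole graph, which a node in the network cannot do, so it only illustrates the definitions. The names `bfs_distances`, `diameter` and the four-node chain are assumptions for the example.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable node (unweighted edges)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Maximum distance between any two nodes of a connected graph."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Hypothetical chain 1 - 2 - 3 - 4.
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
dist_from_1 = bfs_distances(chain, 1)
chain_diameter = diameter(chain)
```
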
Computing sums in a tree
• Suppose root wants to know the sum of values at all nodes
• It sends a "compute" message to all children
• Each child forwards the message to all its children (unless it is a leaf node)
• The values move upward from leaves
• Each node adds values from all children and its own value
• Sends it to its parent
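
The two phases (the "compute" message going down, partial sums coming back up) can be sketched as a recursion over the tree. This is a sequential simulation, not a real message-passing program; the function name `tree_sum`, the dictionary layout and the sample tree are assumptions for illustration.

```python
def tree_sum(children, values, node, counter):
    """Return the sum of values in the subtree of `node`, counting simulated messages."""
    total = values[node]
    for child in children.get(node, []):
        counter["compute"] += 1      # parent -> child "compute" message
        total += tree_sum(children, values, child, counter)
        counter["value"] += 1        # child -> parent partial sum
    return total

# Hypothetical tree: root has children a and b; a has children c and d.
children = {"root": ["a", "b"], "a": ["c", "d"]}
values = {"root": 1, "a": 2, "b": 3, "c": 4, "d": 5}
counter = {"compute": 0, "value": 0}
total = tree_sum(children, values, "root", counter)
```

Each tree edge carries one message in each direction, which is what the two counters record.
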
Computing sums in a tree
• What can you compute other than sums?
• How many messages does it take?
• How much time does it take?
Communication complexity
• Used to represent communication cost for general scenarios
• Called Communication Complexity or Asymptotic communication complexity
• Use big oh notation: O
Big oh – upper bounds
• For a system of n nodes,
• Communication complexity c(n) is O(f(n)) means:
– There are constants a and N, such that:
– For n > N: c(n) < a*f(n)
[Figure: plot of c(n), f(n) and a*f(n) against n, with the threshold N marked]
Allowing some initial irregularity, c(n) is not bigger than a constant times f(n)
In the long run, c(n) does not grow faster than f(n)
Examples
• 3n = O(?)
• 1000n = O(?)
• n²/5 = O(?)
• 10 log n = O(?)
• 2n³ + n + log n + 200 = O(?)
• 15 = O(?)
Examples
• 3n = O(n)
• 1000n = O(n)
• n²/5 = O(n²)
• 10 log n = O(log n)
• 2n³ + n + log n + 200 = O(n³)
• 15 or any other constant = O(1)
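
As a worked check of one of these bounds against the definition, the following derivation picks explicit constants for 2n³ + n + log n + 200 = O(n³). The particular choice a = 204, N = 1 is just one that works, not the only one.

```latex
\begin{align*}
\text{For } n \ge 1:\quad & n \le n^3, \qquad \log n < n \le n^3, \qquad 200 \le 200\,n^3 \\
\Rightarrow\quad & 2n^3 + n + \log n + 200 \;<\; 2n^3 + n^3 + n^3 + 200\,n^3 \;=\; 204\,n^3 .
\end{align*}
% So c(n) < a * f(n) holds for all n > N with a = 204, N = 1, f(n) = n^3.
```
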
Example 1
• 'Star' network
• Computing sum of all values
• Communication complexity: O(n)
Example 2a
• 'Chain' topology network
• Simple protocol where everyone sends value to server
• Communication complexity: 1 + 2 + … + n = O(n²)
Example 2b
• 'Chain' network
• Protocol where each node waits for sum of previous values and sends
• Communication complexity: 1 + 1 + … + 1 = O(n)
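
The difference between the two chain protocols shows up directly when the messages are counted. The sketch below assumes n nodes in a chain, with node i sitting i hops from the server; the function names are made up for the example.

```python
def chain_direct_messages(n):
    """Example 2a: node i's value is relayed over i hops, so it costs i messages."""
    return sum(hops for hops in range(1, n + 1))

def chain_aggregated_messages(n):
    """Example 2b: each node sends exactly one message (the running sum) onward."""
    return n

direct_100 = chain_direct_messages(100)
aggregated_100 = chain_aggregated_messages(100)
```

For 100 nodes the simple protocol already needs about fifty times as many messages as the aggregating one.
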
Computing sums in a tree
• How many messages does it take?
• How much time does it take?
Global message broadcast
• Message must reach all nodes in the network
– Different from broadcast transmission in LAN
– All nodes in a large network cannot be reached with a single transmission
Flooding for broadcast
• The source sends a Flood message, with a unique message id, to all neighbors
• The message has
– Type: Flood
– Unique id: (source id, message seq)
– Data
• Every node p that receives a flood message m does the following:
– If m.id was seen before, discard m
– Otherwise, add m.id to the list of previously seen messages and send m to all neighbors of p
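
The flooding rule can be simulated for a single message as follows. This is a sketch under the assumption of synchronous rounds (every node that first hears the message in one round forwards it in the next); the function name `flood`, the return values and the sample graph are illustrative choices, not part of the slides.

```python
def flood(adj, source):
    """Simulate one flood; return (nodes reached, messages sent, rounds to reach all)."""
    seen = {source}                # nodes that have recorded m.id
    frontier = [source]            # nodes that forward m in the current round
    messages = 0
    rounds = 0
    rounds_to_reach_all = 0
    while frontier:
        # Every forwarding node sends m to all of its neighbors.
        deliveries = [(u, v) for u in frontier for v in adj[u]]
        messages += len(deliveries)
        rounds += 1
        next_frontier = []
        for _, v in deliveries:
            if v not in seen:      # first time v sees m.id: store it and forward later
                seen.add(v)
                next_frontier.append(v)
        if next_frontier:
            rounds_to_reach_all = rounds
        frontier = next_frontier
    return seen, messages, rounds_to_reach_all

# Hypothetical graph: a square 1-2-4-3-1 with the diagonal 2-3 (5 edges).
adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
seen, messages, rounds_needed = flood(adj, 1)
```

In this run every node sends once to each neighbor, so the message count is twice the number of edges.
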
Flooding for broadcast
• Storage
– Each node needs to store a list of flood ids seen before
– If a protocol requires x floods, then each node must store x ids
• (there is a way to reduce this. Think!)
Assumptions
• We are assuming:
– Nodes are working in synchronous communication rounds (e.g. transmissions occur in intervals of exactly 1 second)
– Messages from all neighbors arrive at the same time, and are processed together
– In each round, each node can successfully send 1 message to each neighbor
– Any necessary computation can be completed before the next round
Communication complexity
• The message/communication complexity is:
– O(|E|)
– Worst case: O(n²)
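
One way to see where these bounds come from: in flooding, each node forwards a given message at most once, and when it does, it sends one copy per incident edge. Summing over nodes gives the following (a short derivation added for illustration, using deg(v) for the number of neighbors of v).

```latex
\[
\text{messages} \;\le\; \sum_{v \in V} \deg(v) \;=\; 2|E| \;=\; O(|E|),
\qquad
|E| \;\le\; \binom{n}{2} \;=\; \frac{n(n-1)}{2} \;=\; O(n^2).
\]
```
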
Reducing communication complexity (slightly)
• Node p need not send message m to any node from which it has already received m
– Needs to keep track of which nodes have sent the message
– Saves some messages
– Does not change asymptotic complexity
Time complexity
• The number of rounds needed to reach all nodes: at most the diameter of G
Computing a tree from a network
• BFS tree
– The breadth-first search tree
– With a specified root node
BFS tree
• Breadth-first search tree
– Every node has a parent pointer
– And zero or more child pointers
– BFS tree construction algorithm sets these pointers
BFS tree construction algorithm
• Breadth-first search tree
– The root (source) node decides to construct a tree
– Uses flooding to construct a tree
– Every node p, on getting the message, forwards it to all neighbors
– Additionally, every node p stores a parent pointer: the node from which it first received the message
• If multiple neighbors first sent p the message in the same round, choose the parent arbitrarily, e.g. the node with the smallest id
– p informs its parent of the selection
• Parent creates a child pointer to p
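
The construction can be sketched as a round-by-round simulation. It assumes synchronous rounds and uses the smallest-id tie-break mentioned above; the function name `build_bfs_tree` and the sample graph are hypothetical.

```python
def build_bfs_tree(adj, root):
    """Simulate flooding from root; return (parent pointers, child pointers)."""
    parent = {root: None}
    children = {v: [] for v in adj}
    frontier = [root]                       # nodes forwarding in this round
    while frontier:
        # For each node not yet in the tree, collect the senders it hears this round.
        senders = {}
        for u in frontier:
            for v in adj[u]:
                if v not in parent:
                    senders.setdefault(v, []).append(u)
        for v, heard in senders.items():
            parent[v] = min(heard)          # tie-break: sender with the smallest id
            children[parent[v]].append(v)   # v informs its parent, which records v
        frontier = list(senders)
    return parent, children

# Hypothetical square 1-2-4-3-1: node 4 hears from 2 and 3 in the same round.
adj = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
parent, children = build_bfs_tree(adj, 1)
```

Node 4 receives the message from both 2 and 3 in round two and keeps 2 as its parent.
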
BFS tree
• Property: BFS tree is a shortest path tree
– For source s and any node p
– A shortest path between s and p is contained in the BFS tree
Time & message complexity
• Asymptotically the same as flooding
Tree based broadcast
• Send message to all nodes using tree
– BFS tree is a spanning tree: connects all nodes
• Flooding on the tree
• Receive message from parent, send to children
Tree based broadcast
• Simpler than flooding: send message to all children
• Communication: number of edges in spanning tree: n-1
Aggregation: Find the sum of values at all nodes
• With BFS tree
• Start from leaf nodes
– Nodes without children
– Send the value to parent
• Every other node:
– Wait for all children to report
– Sum values from children + own value
– Send to parent
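
The leaf-to-root aggregation can be simulated in rounds from the parent pointers alone. The sketch below assumes synchronous rounds in which a node reports as soon as all of its children have reported; `convergecast_sum` and the sample tree are names chosen for the example.

```python
def convergecast_sum(parent, values):
    """Return (sum at the root, messages sent, rounds used) for a tree of parent pointers."""
    waiting_for = {v: 0 for v in parent}           # children that have not reported yet
    for v, p in parent.items():
        if p is not None:
            waiting_for[p] += 1
    partial = dict(values)                          # each node starts with its own value
    ready = [v for v in parent if waiting_for[v] == 0]   # the leaves
    messages = rounds = 0
    root_total = None
    while ready:
        next_ready = []
        sent_this_round = False
        for v in ready:
            p = parent[v]
            if p is None:                           # the root has heard from everyone
                root_total = partial[v]
                continue
            partial[p] += partial[v]                # v sends its partial sum to p
            messages += 1
            sent_this_round = True
            waiting_for[p] -= 1
            if waiting_for[p] == 0:
                next_ready.append(p)
        if sent_this_round:
            rounds += 1
        ready = next_ready
    return root_total, messages, rounds

# Hypothetical tree rooted at 1: 2 and 3 are children of 1, 4 of 3, 5 of 4.
parent = {1: None, 2: 1, 3: 1, 4: 3, 5: 4}
values = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
total, messages, rounds = convergecast_sum(parent, values)
```

In this run the message count equals the number of tree edges, and the round count equals the depth of the deepest leaf.
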
Aggregation
• Without the tree
• Flood from all nodes:
– O(|E|) cost per node
– O(n*|E|) total cost: expensive
– Each node needs to store flood ids from n nodes
• Requires Ω(n) storage at each node
– Good fault tolerance
• If a few nodes fail during operation, all the rest still get some value
Aggregation
• With tree
• Also called convergecast
• Once the tree is built, any node can use it for broadcast
– Just flood on the tree
• Any node can use it for convergecast
– First flood a message on the tree requesting data
– Nodes store parent pointer
– Then receive data
• What is the drawback of tree based aggregation?
• Fault tolerance not very good
– If a node fails, the messages in its subtree will be lost
– Will need to rebuild the tree for future operations
BFS trees can be used for routing
• From each node, create a separate BFS tree
• Each node stores a parent pointer corresponding to each BFS tree
• Acts as routing table
• O(n*|E|) message complexity in computing routing table
[Figure: a network of five nodes (1 to 5), each with a routing table of (destination, next hop) pairs]
Node 1: (2, 2) (3, 3) (4, 3) (5, 3)
Node 2: (1, 1) (3, 3) (4, 3) (5, 3)
Node 3: (1, 1) (2, 2) (4, 4) (5, 4)
Node 4: (1, 3) (2, 3) (3, 3) (5, 5)
Node 5: (1, 4) (2, 4) (3, 4) (4, 4)
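
The idea that parent pointers act as a routing table can be sketched as follows: build one BFS tree per destination, and let every other node record its parent in that tree as the next hop. The computation here is centralized for brevity, and the five-node graph is an assumed example, as are the names `bfs_parents` and `routing_tables`.

```python
from collections import deque

def bfs_parents(adj, root):
    """Parent pointers of a BFS tree rooted at root."""
    parent = {root: None}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def routing_tables(adj):
    """tables[v][dest] is the neighbor v forwards to when routing toward dest."""
    tables = {v: {} for v in adj}
    for dest in adj:
        for v, p in bfs_parents(adj, dest).items():
            if p is not None:
                tables[v][dest] = p     # v's parent in dest's tree is its next hop
    return tables

# Assumed example network with edges 1-2, 1-3, 2-3, 3-4, 4-5.
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3, 5], 5: [4]}
tables = routing_tables(adj)
```

Node 5 reaches everything through node 4, while node 3 forwards traffic for 5 to node 4 and traffic for 1 or 2 directly.
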
Observation on complexity
• Suppose c(n) = n
– Then c(n) is O(n) and also O(n²)
– However, when we ask for the complexity, we are looking for the tightest possible bound, which is O(n)
Big Ω – lower bounds
• For a system of n nodes,
• Communication complexity c(n) is Ω(f(n)) means:
– There are constants b > 0 and N, such that:
– For n > N: b*f(n) < c(n)
[Figure: plot of c(n), f(n) and b*f(n) against n, with the threshold N marked]
Allowing some initial irregularity, c(n) is not smaller than a constant times f(n)
In the long run, f(n) does not grow faster than c(n)
Big θ – tight bounds: both O and Ω
• For a system of n nodes,
• Communication complexity c(n) is θ(f(n)) means:
– There are constants a, b and N, such that:
– For n > N: b*f(n) < c(n) < a*f(n)
[Figure: plot of c(n) lying between b*f(n) and a*f(n), with the threshold N marked]
Allowing some initial irregularity, c(n) and f(n) are within constant factors of each other.
In the long run, c(n) grows at the same rate as f(n), up to constant factors.
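
As a worked instance of a tight bound, take the chain protocol in which every node sends its value to the server, with cost c(n) = 1 + 2 + … + n. The constants below (b = 1/4, a = 1, N = 1) are one possible choice that satisfies the definition.

```latex
\[
c(n) \;=\; \sum_{i=1}^{n} i \;=\; \frac{n(n+1)}{2},
\qquad
\frac{1}{4}\,n^2 \;<\; \frac{n(n+1)}{2} \;<\; n^2 \quad \text{for all } n > 1,
\]
% hence c(n) is Theta(n^2) with b = 1/4, a = 1, N = 1.
```
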
Bit complexity of communication
• We have assumed that each communication is 1 message, and we counted the messages
• Sometimes, communication is evaluated by bit complexity: the number of bits communicated
• This is different from message complexity because a message may have a number of bits that depends on n or |E|
• For example, node ids in a message have size Θ(log n)
• In practice this may not be critical since log n is much smaller than packet sizes, so it does not change the number of packets communicated
• But depending on what other data the algorithm is communicating, sizes of messages may matter
Size of ids
• In a network of n nodes
• Each node id needs Θ(log n) (that is, both O(log n) and Ω(log n)) bits for storage
– The binary representation of n needs about log2 n bits (exactly ⌊log2 n⌋ + 1)
• Ω – since we need at least this many bits
– May vary by constant factors depending on base of logarithm
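
A quick check of the id size, assuming ids are the integers 1 to n and each id is stored in as many bits as the largest one needs. The helper name `id_bits` is made up for the example; `int.bit_length` is the standard Python method.

```python
def id_bits(n):
    """Bits needed to write the largest id, n, in binary (ids are 1..n)."""
    return n.bit_length()

bits_for_1000 = id_bits(1000)
bits_for_million = id_bits(1_000_000)
```

Even a million nodes need only 20 bits per id, which is why the id size rarely affects the number of packets.
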
Computing trees:
• What if the edges have weights?
Aggregation using trees:
• What if the edges have weights?
• The cost may not be O(n) since weights can be higher
• How to get the best performance?
Minimum spanning tree is
• A spanning tree (reaches all nodes)
• With minimum possible total weight
• How can we compute a minimum spanning tree efficiently in a distributed system?
• (remember, a node knows only its neighbors and edge weights)