Consensus Services for Loosely- coupled distributed...

40
Consensus Services for Loosely- coupled distributed Systems Zookeeper and Chubby

Transcript of Consensus Services for Loosely- coupled distributed...

Page 1: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

ConsensusServicesforLoosely-coupleddistributedSystems

ZookeeperandChubby

Page 2: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Mo<va<on

•  Distributedsystemsneedaservicethatprovides–  Synchroniza<on(whoistheleader?)– Membership(whoareac<venodesinmyservice?)–  Configura<onmetadata(whatareenvironmentinfo?)

•  Distributedsystemsneedaservicethatis–  reliability–  availability–  easy-to-understandseman<cs–  performance,throughput,latencyonlysecondary

Page 3: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

BeforeChubbyCameAbout…

•  Lotsofdistributedsystemswithclientsinthe10,000s

•  Howtodoprimaryelec<on?–  Adhoc(noharmfromduplicatedwork)–  Operatorinterven<on(correctnessessen<al)

•  Unprincipled•  Disorganized•  Costly•  Lowavailability

Page 4: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Whatisthispaperabout?

“BuildingChubbywasanengineeringeffort…itwasnotresearch.Weclaimnonewalgorithmsortechniques.Thepurposeofthispaperistodescribewhatwedidandwhy,ratherthantoadvocateit.”

•  Designbasedonwell-knownideas– distributedconsensus,caching,no<fica<ons,file-systeminterface

Page 5: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

LifeBeforeCubby

•  Distributedsystemsdevelopers..–  ImplementRaR(wellactuallyPaxos)•  Applica<onmustbewriUenasastatemachine•  Poten<alperformanceproblems

–  Quorumon5iseasieroverquorumof10Knodes

– Sharedcri<calregions(Exclusivelocks)•  Hardtocode/understand

–  Peoplethinktheycan…buttheycan’t!

Page 6: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

ChubbyDesign

Page 7: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

DesignDecisions:Mo<va<ngLocks?

•  Lockservicevs.consensus(RaR/Paxos)library

•  Advantages:– Noneedtorewritecode

•  Maintainprogramstructure,communica<onpaUerns•  Cansupportno<fica<onmechanism

–  Smaller#ofnodes(servers)neededtomakeprogress

•  Advisoryinsteadofmandatorylocks(why?):– HoldingalockcalledFneitherisnecessarytoaccessthefileF,norpreventsotherclientsfromdoingso

Page 8: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

DesignDecisions:LockTypes•  Coarsevs.fine-grainedlocks–  Fine-grained:grablockbeforeeveryevent–  Coarse-grained:grablockforlargegroupofevents

•  Advantagesofcoarse-grainedlocks–  Lessloadonlockserver–  Lessdelaywhenlockserverfails–  Lesslockserversandavailabilityrequired

•  Advantagesoffine-grainedlocks– Morelockserverload–  Ifneeded,couldbeimplementedonclientside

Page 9: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

SystemStructure

•  Chubbycell:asmallnumberofreplicas(e.g.,5)

•  Masterisselectedusingaconsensusprotocol(e.g.,RaR)

Page 10: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

SystemStructure

•  Clients–  Sendreads/writesonlytothemaster

–  Communicateswithmasterviaachubbylibrary

•  Everyreplicaserver–  IslistedinDNS–  Directclientstomaster–  Maintaincopiesofasimpledatabase

Page 11: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

ReadandWrites

•  Write– Masterpropagateswritetoreplica

–  RepliesaRerthewritereachesamajority(e.g.,quorum)

•  Read– Masterrepliesdirectly,asithasmostuptodatestate

–  Readsmusts<llgotothemaster

Page 12: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

ChubbyAPIandLocks

Page 13: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

SimpleUNIX-likeFileSystemInterface

•  Barebonefile&directorystructure

•  /ls/foo/wombat/pouch

Lock service; common to all names

Cell name

Name within cell

Page 14: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

SimpleUNIX-likeFileSystemInterface

•  Barebonefile&directorystructure

•  /ls/foo/wombat/pouch

•  Doesnotsupport,maintain,orreveal– Movingfiles– Path-dependentpermissionseman<cs– Directorymodified<mes,fileslast-access<mes

Page 15: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Nodes•  Node:afileordirectory– Anynodecanactasanadvisoryreader/writerlock

•  Anodemaybeeitherpermanentorephemeral– Ephemeralusedastemporaryfiles,e.g.,indicateaclientisalive

•  Metadata– ThreenamesofACLs(R/W/changeACLname)•  Authen<ca<onbuildintoROC

– 64-bitfilecontentchecksum

Page 16: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Locks•  Any node can act as lock (shared or exclusive)

•  Advisory (vs. mandatory) – Protect resources at remote services – No value in extra guards by mandatory locks

•  Write permission needed to acquire – Prevents unprivileged reader blocking progress

Page 17: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

LocksandSequences•  Poten<allockproblemsindistributedsystems

–  AholdsalockL,issuesrequestW,thenfails–  BacquiresL(becauseAfails),performsac<ons–  Warrives(out-of-order)aRerB’sac<ons

•  Solu<on1:backwardcompa<ble–  Lockserverwillpreventotherclientsfromgehngthelockifalockbecome

inaccessibleortheholderhasfailed–  Lock-delayperiodcanbespecifiedbyclients

Page 18: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

LocksandSequences•  Poten<allockproblemsindistributedsystems

–  AholdsalockL,issuesrequestW,thenfails–  BacquiresL(becauseAfails),performsac<ons–  Warrives(out-of-order)aRerB’sac<ons

•  Solu<on2:sequencer– AlockholdercanobtainasequencerfromChubby–  ItaUachesthesequencertoanyrequeststhatitsendstootherservers

– Theotherserverscanverifythesequencerinforma<on

Page 19: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Design:Events•  Clientsubscribeswhencrea<nghandle

•  Deliveredasyncviaup-callfromclientlibrary

•  Eventtypes– Filecontentsmodified– Childnodeadded/removed/modified– Chubbymasterfailedover– Handle/lockhavebecomeinvalid– Lockacquired/conflic<nglockrequest(rarelyused)

Page 20: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Design:API

•  Open() (only call using named node) – how handle will be used (access checks here) – events to subscribe to –  lock-delay – whether new file/dir should be created

•  Close() vs. Poison()

•  Other ops: –  GetContentsAndStat(), SetContents(), Delete(), Acquire(), TryAcquire(), Release(),

GetSequencer(), SetSequencer(), CheckSequencer()

Page 21: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Chubby:ProvidingPerformance

Page 22: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Whyscalingisimportant?•  Clientsconnecttoasingleinstanceofmasterinacell

–  Muchmoreclientprocessesthatnumberofmachines–  Note:MasterandclienthavesameCPU/Memory

•  Exis<ngmechanisms:–  Par$$on:MoreChubbycells(consistenthashing)–  Increaselease$me:from12sto60storeduceKeepAlivemessages(dominantrequestsinexperiments)

–  Clientcaches:reducesreads/keepalivenotwrites–  Addnewtypeofservers:AddProxyservers

Page 23: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Caching•  Client caches file data, node meta-data

–  Write-through held in memory

•  masterkeepslistofwhatclientsmayhavecached

•  Strictconsistency:easytounderstand–  Leasebased–  Masterwillinvalidatecachedcopiesuponawriterequest–  donotwanttoalterpreexis<ngcomm.protocols

•  Handles and locks cached as well –  Event informs client of conflicting lock request

Page 24: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

CachingandInvalida<on–  writesblock,mastersendsinvalida<ons–  clientsflushchangeddata,ack.withKeepAlive–  datauncachableun<linvalida<onacked

•  allowsreadstohappenwithoutdelay

Page 25: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

NewMechanisms

•  Proxies:– HandleKeepAliveandreadrequests,passwriterequeststothemaster

– Reducetrafficbutreduceavailability

•  Par<<oning:par<<onnamespace– Amasterhandlesnodeswithhash(name)modN==id

– Limitedcross-par<<onmessages

Page 26: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Scaling:Proxies•  Proxiespassrequestsfromclientstocell

–  Alayerofmiddlemanagementbetweencellandclients

•  CanhandleKeepAlivesandreadsNOTWRITES–  Notwrites,buttheyare<<1%ofworkload

•  KeepAlivetrafficbyfarmostdominant

•  Disadvantages:–  addi<onalRPCforwrites/first<mereads–  increasedunavailabilityprobability–  fail-overstrategynotideal(willcomebacktothis)

Page 27: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Scaling:Par<<oning•  Namespacepar<<onedbetweenservers– Npar<<ons,eachwithmasterandreplicas

•  NodeD/CstoredonP(D/C)=hash(D)modN– meta-dataforDmaybeondifferentpar<<on

•  LiUlecross-par<<oncomm.desirable–  permissionchecks–  directorydele<on–  cachinghelpsmi<gatethis

Page 28: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Chubby:ProvidingAvailability

Page 29: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

SessionsandKeep-Alives•  Aclientsendskeep-aliverequeststoamaster

•  Amasterrespondsbyakeep-aliveresponse

•  ImmediatelyaRergehngthekeep-aliveresponse,theclientsendsanotherrequestforextension

•  Themasterwillblockkeep-alivesun<lclosetheexpira<onofasession

•  Extensionisdefaultto12s

Page 30: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Whenthingsfail…

•  Failure==missingkeepalivemsgs

•  Server– Deleteephemeralfilesw/oopenhandlesaReraninterval

– Deleteassociatedcache

•  Client:discardinmemorystate– Sessions,handles,locks,cacheddata…

Page 31: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

LocksandAsynchFailures•  Asynchfailure:delayed,orderedorlostmessages

•  Twosolu<ons–  Sequence#s:getLockandaSequence-ID(seq-#)

•  SubmitrequestswithLock-Seq-ID•  LeadercheckersLock-seq-IDtomakesurerequestiscurrent•  Lock-seq-IDisincrementedwhenanewlockisacquired

–  LockDelay:delaygivingoutnewlocksaRerthelockholderdies•  Outstandingrequestsshouldarrivewithinthat<meperiod

Page 32: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Chubby:Prac<cal,warstories

Page 33: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

UseandObserva<ons•  Manyfilesfornaming

•  Config,ACL,meta-datacommon

•  10clientsuseeachcachedfile,onavg.

•  Fewlocksheld,nosharedlocks

•  KeepAlivesdominateRPCtraffic

Page 34: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Use:Outages•  Sampleofcells–  61outagesoverfewweeks(700cell-days)–  duetonetworkconges<on,maintenance,overload,errorsinsoRware,hardware,operators

•  52outagesunder30s –  applica<onsnotsignificantlyaffected

•  Fewdozencell-yearsofopera<on–  dataloston6occasions(bugs&operatorerror)

Page 35: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Today•  Google’sChubby– Mo<va<on– Designchoices–  Scaling/Performance– Availability

•  Yahoo’sZooKeeper(NowApache’sZookeeper)– DifferenceswithChubby

•  Summary

Page 36: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Zookeeper=Chubbywithoutlocks

•  Filesystem-basedAPI– Similartypes:Ephemeral,persistent,sequen<al– DifferentAPIcalls

•  Performance– Cachingandwatches(ZK’sinvalida<on)•  Similarlyoneshot!

•  Availability– Keepalive,leaderelec<on

Page 37: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

FileTypesChubby ZooKeeper

Filetypes Ephemeral Yes Yes

Permanent Yes Yes

Sequen<al NO Yes

FileAPI Hierarchical Yes Yes

Predefinedpath Yes NO/

services

users

apps

locks

servers

YaView

s-1

morestupidity

stupidname

EphemeralscreatedbySessionX

Sequenceappendedoncreate

•  Ephemeral:theznodewillbedeletedwhenthesessionthatcreatedit<mesoutoritisexplicitlydeleted

•  Permanent:explicitlydeletedbyclient

•  Sequence:thepathnamewillhaveamonotonicallyincreasingcounterrela<vetotheparentappended

Page 38: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

FilesystemAPI

/ls/foo/wombat/pouch

Lock service; common to all names

Cell name

Name within cell

/

services

users

apps

locks

servers

YaView

read-1

morestupidity

stupidname

Page 39: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master

Read/WriteInterac<ons•  Writes

–  Allgothroughleader–  Requiresquorum

•  Reads–  ZK:gotoanynode

•  Higherperformance–  Chubby:gotoleader

•  Performancelimita<on–  Both:Clientscacheandserverinvalidatescache

•  Chubby:invalida<onisnon-op<onal•  ZK:mustexplicitlyregisterforinvalida<onrequests

Page 40: Consensus Services for Loosely- coupled distributed Systemscs.brown.edu/courses/csci1380/s18/lectures/L21.pdf• Every replica server – Is listed in DNS – Direct clients to master