Distributed Systems - Brown University (cs.brown.edu/courses/csci1380/s19/lectures/Day16_2019.pdf)
Distributed Systems Day 16: Distributed File Systems
Mar 19, 2019
Outline
• Recap of BGP and BFT
  • BGP: Byzantine Generals Problem
  • BFT: Byzantine Fault Tolerance
• Distributed File Systems
  • Opportunistic Locking
  • AFS v. NFS
  • GFS v. Azure File System
Fault Recovery: Byzantine v. Consensus
• Consensus
  • Survive f failures
  • N > 2f, so you need at least N = 2f + 1
  • How do you get 'f'? Analyze failure distributions: 'what's the max number of server failures at any time?'
• Byzantine generals
  • Survive f failures
  • N > 3f, so you need at least N = 3f + 1
  • How do you get 'f'? Analyze system security: 'what's the max number of servers compromised at any time?'
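As a quick worked example of the two bounds above, the sketch below computes the smallest N for a given f. The helper name `min_nodes` is made up for illustration and does not come from the lecture.

```python
def min_nodes(f: int, byzantine: bool = False) -> int:
    """Smallest cluster size N that survives f failures.

    Crash-fault consensus needs N > 2f; Byzantine agreement needs N > 3f.
    """
    if f < 0:
        raise ValueError("f must be non-negative")
    return (3 * f + 1) if byzantine else (2 * f + 1)


# Print f, the consensus bound, and the Byzantine bound side by side.
for f in (1, 2, 3):
    print(f, min_nodes(f), min_nodes(f, byzantine=True))
```

For f = 1 this gives 3 nodes for consensus and 4 for Byzantine generals, which is why the 3-node Byzantine example that follows cannot work.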
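[Figure: two scenarios with three nodes. Bad LT: the general sends "Attack!" to both lieutenants; the bad LT relays "She said Retreat!" and the good LT relays "She said Attack!". Bad General: the general sends "Attack!" to one lieutenant and "Retreat!" to the other; each lieutenant relays what she was told.]
When f = 1 (liars) and N = 3 (total # of nodes), the correct processes can't differentiate between a good and a bad general.
The good LT can't differentiate between the two scenarios. It has 3 options: always pick the leader, always pick the other LT, or pick randomly.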
Consensus Properties
• C1: All loyal lieutenant generals obey the same order
• C2: If the commanding general is loyal, then every loyal lieutenant general obeys the order she sends
• Note: if the commanding general is lying then Byzantine Generals doesn't help reach consensus
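• Always pick leader: violates C1 (a bad general can send the two LTs different orders)
• Always pick other LT: violates C2 (a bad LT can misreport a loyal general's order)
• With 3 nodes and one liar: cannot satisfy both C1 and C2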
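The 3-node argument can be checked with a tiny sketch. The tuple encoding of a lieutenant's view and the names `follow_leader` and `follow_other_lt` are assumptions made for illustration only.

```python
# A lieutenant's view is (order heard from the general, order relayed by the other LT).

# Scenario A (bad LT): the loyal general orders "attack"; the bad LT relays "retreat".
good_lt_view_a = ("attack", "retreat")

# Scenario B (bad general): she tells LT1 "attack" and LT2 "retreat";
# both lieutenants are loyal and relay honestly.
lt1_view_b = ("attack", "retreat")
lt2_view_b = ("retreat", "attack")


def follow_leader(view):
    return view[0]


def follow_other_lt(view):
    return view[1]


# LT1 sees exactly the same thing in both scenarios, so it cannot tell them apart.
indistinguishable = good_lt_view_a == lt1_view_b

# "Always pick leader" breaks C1 in scenario B: the two loyal LTs disagree.
c1_broken = follow_leader(lt1_view_b) != follow_leader(lt2_view_b)

# "Always pick other LT" breaks C2 in scenario A: the loyal general's order is ignored.
c2_broken = follow_other_lt(good_lt_view_a) != "attack"

print(indistinguishable, c1_broken, c2_broken)
```

Each fixed strategy fails in one of the two scenarios, and because the views are identical the lieutenant cannot pick the strategy per scenario.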
[4,1] Byzantine Generals Problem
[Figure: two scenarios with four nodes (N = 4, f = 1). Bad General: the general sends "Attack!" to two lieutenants and "Retreat!" to the third; each lieutenant relays what she was told ("She said Attack!" or "She said Retreat!"). Bad LT: the general sends "Attack!" to all three lieutenants; the bad LT relays "She said Retreat!" while the good LTs relay "She said Attack!".]
Good LTs can't distinguish a bad Gen from a bad LT.
But good LTs will still do the same thing. [Satisfy C1]
• By looking at the majority of messages, good LTs will do the same thing.
When the Gen is truthful, "same thing" --> "leader's order". [Satisfy C2]
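A minimal sketch of the majority rule for N = 4, f = 1 follows. It simplifies the liar: a bad LT here relays the opposite of what she heard to everyone, whereas a real Byzantine node could send different lies to different peers. The function names `decide` and `run_round` are hypothetical.

```python
from collections import Counter


def decide(own_order, relays):
    """Majority of the general's order plus what the other two LTs relayed."""
    return Counter([own_order, *relays]).most_common(1)[0][0]


def run_round(orders, liars=()):
    """orders: what the general told each LT. liars: LTs that relay the opposite.

    Returns the decision of every loyal lieutenant.
    """
    flip = {"attack": "retreat", "retreat": "attack"}
    relayed = {lt: (flip[o] if lt in liars else o) for lt, o in orders.items()}
    return {
        lt: decide(orders[lt], [relayed[other] for other in orders if other != lt])
        for lt in orders
        if lt not in liars
    }


# Bad general: sends conflicting orders; all three LTs are loyal.
bad_general = run_round({"LT1": "attack", "LT2": "attack", "LT3": "retreat"})

# Bad LT: the loyal general orders "attack"; LT3 relays "retreat" to everyone.
bad_lt = run_round({"LT1": "attack", "LT2": "attack", "LT3": "attack"}, liars={"LT3"})

print(bad_general)
print(bad_lt)
```

In the first run all loyal LTs land on the same order (C1); in the second they land on the loyal general's order (C2).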
Local File System: Background and Terminology
What is a File?
• A blob of binary?
• A set of blobs?
  • Think of a book: table of contents + chapters
  • Index -> inode (maps ranges to data blocks)
  • Chapters -> data blocks
• How about directories?
• How about file permissions?
[Figure: File1 shown as a single binary blob (1010101010100100), then as an index pointing to data blocks Data1 and Data2, each of which is a binary blob.]
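To make the "index -> inode" idea concrete, here is a small sketch in which an inode maps byte ranges to data blocks. The `Inode` class, the 4-byte `BLOCK_SIZE`, and `read_file` are illustrative assumptions, not a real file system layout.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per data block; tiny on purpose so the example is readable


@dataclass
class Inode:
    """Index for one file: entry i is the ID of the block holding
    bytes [i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE)."""
    length: int = 0
    block_ids: list = field(default_factory=list)


def read_file(inode, blocks, offset, size):
    """Read `size` bytes starting at `offset` by walking the inode's block list."""
    end = min(offset + size, inode.length)
    out = bytearray()
    while offset < end:
        block = blocks[inode.block_ids[offset // BLOCK_SIZE]]
        start = offset % BLOCK_SIZE
        take = min(BLOCK_SIZE - start, end - offset)
        out += block[start:start + take]
        offset += take
    return bytes(out)


# File1 is made of two data blocks, Data1 and Data2.
blocks = {"Data1": b"abcd", "Data2": b"efgh"}
file1 = Inode(length=8, block_ids=["Data1", "Data2"])
print(read_file(file1, blocks, 2, 4))  # b'cdef'
```

The read crosses the block boundary, so the inode is consulted twice: once for Data1 and once for Data2.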
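What is a Directory?
• Directory -> maps names to file IDs
• A directory can also contain directories
• In Linux a directory ---> also a file
Root Directory
• File1 -> ID X
• File2 -> ID Y
• Dir1 -> ID Z
Dir1
• File3 -> ID M
• File4 -> ID C
[Figure: File1 points to data blocks Data1 and Data2; File5 points to data blocks Data8 and Data9.]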
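The name-to-ID mapping above can be sketched as nested lookups. The table reuses the IDs from the slide; `ID_ROOT` and the `resolve` function are hypothetical names added for the example.

```python
# Each directory maps names to file IDs; a directory is itself stored under an ID.
directories = {
    "ID_ROOT": {"File1": "IDX", "File2": "IDY", "Dir1": "IDZ"},
    "IDZ": {"File3": "IDM", "File4": "IDC"},  # contents of Dir1
}


def resolve(path, root_id="ID_ROOT"):
    """Walk a '/'-separated path one name at a time and return the final ID."""
    current = root_id
    for name in path.strip("/").split("/"):
        if current not in directories:
            raise NotADirectoryError(f"{current} is not a directory")
        entries = directories[current]
        if name not in entries:
            raise FileNotFoundError(name)
        current = entries[name]
    return current


print(resolve("/Dir1/File3"))  # IDM
print(resolve("/File2"))       # IDY
```

Because Dir1 is stored under its own ID just like a file, resolving a path is the same lookup repeated once per path component.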
What is a File System?
• File systems -> systems that manage files
• Provides
  • API for applications to interact w/ files
  • Algorithms for securing files (access control)
  • Maintains metadata about a file
[Figure: applications call the file system API; the file system manages File1 through FileN.]
File Metadata
• File length (size)
• Timestamp
• Location
• Reference count
• Type
• Access control
• Owner
[Figure labels: some metadata fields are marked "Modifiable by APP", others "Modifiable by File System".]
Traditional Distributed File Systems
Local FS -> Distributed FS
[Figure: in the local FS, applications call the file system API to reach File1 through FileN. In the distributed FS, each application uses a client API that makes RPC calls to a file system made up of a directory service, a flat file service, and data blocks.]
What are RPC semantics?
• At-least-once (1 or more calls)
• At-most-once (0 or 1 calls)
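The difference between the two semantics shows up when a reply is lost and the client retries. The sketch below assumes duplicate filtering on request IDs as the way to get at-most-once execution; the slide does not specify a mechanism, and `Server` and `call_with_retry` are illustrative names.

```python
class Server:
    def __init__(self, dedupe):
        self.dedupe = dedupe   # True -> filter duplicates (at-most-once execution)
        self.seen = {}         # request ID -> cached reply
        self.counter = 0

    def increment(self, request_id):
        """A deliberately non-idempotent operation."""
        if self.dedupe and request_id in self.seen:
            return self.seen[request_id]  # replay the old reply, do not re-execute
        self.counter += 1
        self.seen[request_id] = self.counter
        return self.counter


def call_with_retry(server, request_id, replies_lost):
    """The client resends the same request until a reply actually arrives."""
    while True:
        reply = server.increment(request_id)
        if replies_lost > 0:
            replies_lost -= 1  # reply dropped by the network; the client times out
            continue
        return reply


at_least_once = Server(dedupe=False)
call_with_retry(at_least_once, "req-1", replies_lost=2)

at_most_once = Server(dedupe=True)
call_with_retry(at_most_once, "req-1", replies_lost=2)

print(at_least_once.counter, at_most_once.counter)  # 3 1
```

With at-least-once the operation ran three times for one logical call, which is only safe if the operation is idempotent.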
File System Design Challenges
• Maintain transparency
  • Access -> same API for remote/local files
  • Location -> same "name" for remote/local files
  • Mobility -> client should be unaware of files moving
  • Performance -> as workload grows, performance is OK
  • Scalability -> as # of files grows, performance is OK
[Figure: applications use the client API, which makes RPC calls to the file system (directory service, flat file service, data blocks).]
General DFS Functionality
• Directory service
  • Maps names -> file ID
  • Maps file ID to data blocks and block location
• Flat file service
  • Unaware of directory structure
  • Unaware of names
  • Maintains data blocks
• Client module: provides transparency
  • Provides file system API to applications
  • Hides complexity: includes glue logic around RPC calls
  • Knows location of directory service
  • Deals with mobility of data
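The three roles in the general DFS design (directory service, flat file service, client module) can be sketched as three small classes. Everything here runs in one process; in a real system the client module would hold RPC stubs. All class and method names are assumptions for illustration.

```python
class DirectoryService:
    """Maps names -> file IDs and file IDs -> block locations."""
    def __init__(self):
        self.name_to_id = {}
        self.id_to_blocks = {}

    def lookup(self, name):
        file_id = self.name_to_id[name]
        return file_id, self.id_to_blocks[file_id]


class FlatFileService:
    """Stores data blocks by ID; knows nothing about names or directories."""
    def __init__(self):
        self.blocks = {}

    def read_block(self, block_id):
        return self.blocks[block_id]


class ClientModule:
    """Gives applications a local-looking API and hides the remote calls."""
    def __init__(self, directory, flat_files):
        self.directory = directory    # would be an RPC stub in a real system
        self.flat_files = flat_files  # would be an RPC stub in a real system

    def read(self, name):
        _file_id, block_ids = self.directory.lookup(name)
        return b"".join(self.flat_files.read_block(b) for b in block_ids)


directory, flat = DirectoryService(), FlatFileService()
flat.blocks.update({"Data1": b"hello ", "Data2": b"world"})
directory.name_to_id["File1"] = "IDX"
directory.id_to_blocks["IDX"] = ["Data1", "Data2"]
print(ClientModule(directory, flat).read("File1"))  # b'hello world'
```

The application only ever passes a name; block IDs and service locations stay inside the client module, which is what provides transparency.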
Flat File System Operations
• Provides at-least-once semantics
  • Calls designed to be idempotent
  • Client repeats calls that it receives no response to
  • Server is stateless: state is embedded in each call
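A sketch of why embedding state in each call makes retries safe: the write below carries its own file ID and offset, so repeating it leaves the file unchanged. `FlatFileServer` and its method signatures are hypothetical and only loosely modeled on NFS-style calls.

```python
class FlatFileServer:
    """Keeps no per-client state between calls: no open-file table, no cursor."""
    def __init__(self):
        self.files = {}  # file ID -> bytearray

    def write(self, file_id, offset, data):
        """Idempotent: the call carries the offset, so repeating it changes nothing."""
        buf = self.files.setdefault(file_id, bytearray())
        if len(buf) < offset + len(data):
            buf.extend(b"\0" * (offset + len(data) - len(buf)))
        buf[offset:offset + len(data)] = data

    def read(self, file_id, offset, size):
        return bytes(self.files[file_id][offset:offset + size])


server = FlatFileServer()
# The client got no response the first time, so it simply repeats the call.
server.write("IDX", 0, b"abcd")
server.write("IDX", 0, b"abcd")
print(server.read("IDX", 0, 4))  # b'abcd'
```

An "append" call without an explicit offset would not have this property: a retried append would write the data twice.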
Reliability Provided by Redundancy
• Server failure -> loss of data
• Overcome data loss through redundancy
[Figure: a client (application, client API, cache) makes RPC calls to two copies of the file system, each with its own directory service, flat file service, and data blocks.]
Performance Provided by Caching
• Client: caches files locally
• Eliminates need to use network to transfer data!
[Figure: a client (application, client API, cache) makes RPC calls to the file system; an axis is labeled from "NO CACHING" to "CACHING".]
• Performance is good if all operations take place on the client (everything is cached on the client)
• Strict consistency is easy if all operations take place on the server (no client caching)
Design change created by caching:
• Directory service must maintain state
• Directory service needs complex logic to deal with state
  • Especially during failures
Traditional design:
• Directory service is stateless
• Only keeps track of name -> ID mapping
Opportunistic Locks: Strict Consistency
• Common Internet File System (CIFS)
  • Microsoft's distributed file system
  • Alternate locking approach
• Features
  • Strictly consistent
https://learnteachevangelizeit.wordpress.com/2014/07/31/windows-server-2012-r2-file-and-storage-services/
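Overview of Opportunistic Locks
[Figure: message sequence between Client 1, Server, and Client 2. Client 1 sends "Open A"; the server replies "OK, OpLock". Client 2 sends "Open A"; the server sends "Revoke OpLock" to Client 1; Client 1 replies "OK, changes"; the server replies "OK" to Client 2.]
• Step 0: find location of data blocks
• Step 1: request locks
• Step 2: server grants locks
• Step 3: download file to local cache
• Step 4: modify local cache
• ...
• Step N: server revokes lock
• Step N+1: client flushes cache to disk
• Step N+2: release lock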
Managing Caching and Consistency: Opportunistic Locking
• To modify blocks, a client requests locks
• Once a client has locks
  • The client can read/write blocks using its local cache without using the network
  • Leads to significant performance improvements
• Other clients must:
  • Wait for the server to revoke locks
  • Potential wait times
• Opportunistic because the server only grants the locks if/when convenient
https://docs.microsoft.com/en-us/windows/desktop/fileio/opportunistic-locks
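The grant / cache / revoke / flush cycle of opportunistic locking can be sketched in a single process as below. This is a toy model under stated assumptions: one file, whole-file locks, synchronous revocation, and invented class names (`OplockServer`, `Client`); it is not the CIFS wire protocol.

```python
class OplockServer:
    def __init__(self):
        self.files = {"A": "v0"}  # authoritative copy on the server
        self.holder = {}          # file name -> client currently holding the oplock

    def open(self, client, name):
        current = self.holder.get(name)
        if current is not None and current is not client:
            # Revoke: the holder flushes its cached changes and releases the lock.
            self.files[name] = current.flush(name)
        self.holder[name] = client
        return self.files[name]   # the client downloads the file into its cache


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        self.cache[name] = self.server.open(self, name)

    def write(self, name, data):
        self.cache[name] = data   # local only, no network traffic

    def flush(self, name):
        return self.cache.pop(name)


server = OplockServer()
c1, c2 = Client(server), Client(server)
c1.open("A")
c1.write("A", "v1")               # the server still holds "v0" at this point
c2.open("A")                      # triggers the revoke, so c1 flushes "v1"
print(server.files["A"], c2.cache["A"])  # v1 v1
```

Client 2 never observes the stale "v0" because its open blocks until client 1 has flushed, which is where the potential wait time comes from.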
Locks: Opportunistic, Optimistic, Pessimistic
Types of Locks Discussed
Distributed transactions:
• Pessimistic locks:
  • Get all locks before a transaction starts and release them after
  • Low performance but guarantees strict serializability
• Optimistic locks:
  • Get snapshots of data before a transaction starts; on commit, compare the snapshot with the data and only commit if the data is unchanged
  • Provides high performance but many transactions may abort
Distributed file systems:
• Opportunistic locks:
  • Clients get locks before data manipulation
  • Enables performance improvements
  • Clients with a lock can manipulate data locally without using the network
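The optimistic approach (snapshot first, validate on commit) can be sketched as follows. A per-key version counter stands in for "compare snapshot with data"; that substitution and the `Store` class are assumptions for illustration.

```python
class Store:
    def __init__(self):
        self.data = {"x": 0}
        self.version = {"x": 0}

    def begin(self, key):
        """Take a snapshot: the value plus the version it was read at."""
        return {"key": key, "value": self.data[key], "version": self.version[key]}

    def commit(self, snapshot, new_value):
        """Commit only if nobody changed the data since the snapshot."""
        key = snapshot["key"]
        if self.version[key] != snapshot["version"]:
            return False  # abort; the caller may retry with a fresh snapshot
        self.data[key] = new_value
        self.version[key] += 1
        return True


store = Store()
t1 = store.begin("x")
t2 = store.begin("x")
first = store.commit(t1, t1["value"] + 1)
second = store.commit(t2, t2["value"] + 1)
print(first, second)  # True False
```

No transaction ever waits for a lock, but the second one does wasted work and aborts, which is the trade-off named in the list above.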
GFS: Google File System
Data Center FS @ 2003
• Google needed a good distributed file system
  • Redundant storage of massive amounts of data on commodity computers (cheap and unreliable)
• Why not use an existing file system?
  • Google's problems are different from others in terms of workload and design priorities
  • Google file system is designed for Google applications
  • Google applications are designed for GFS
Assumptions -> Design Decisions
Assumptions
• High component failure rates
  • Inexpensive commodity components often fail
• Modest number of huge files
  • A few million 100 MB or larger files
• Files are write-once, mostly appended to
• Large streaming reads
• High sustained bandwidth is favored over low latency
Design decisions
• Files stored as chunks
  • Fixed size (64 MB)
• Reliability through replication
  • Each chunk is replicated across 3+ chunkservers
• Single master to coordinate access and keep metadata
  • Simple centralized management
• No data caching
  • Little benefit due to large data sets, streaming reads
• Familiar interface but customize the API
  • Snapshot and record append
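With fixed-size 64 MB chunks, a client can translate a byte range into chunk indexes with plain arithmetic. The helper `chunk_span` below is a hypothetical illustration of that translation, not GFS client code.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB fixed-size chunks
MB = 1024 * 1024


def chunk_span(offset, length):
    """Return (chunk index, offset within chunk, byte count) pieces covering a range."""
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        take = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, take))
        offset += take
    return pieces


# A 100 MB read starting 10 MB into the file touches chunks 0 and 1.
for index, within, take in chunk_span(10 * MB, 100 * MB):
    print(index, within // MB, take // MB)
```

A large chunk size means a long streaming read needs only a handful of chunk lookups.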
GFS Architecture
• One master!
  • Stores metadata
  • Stores map of name -> file ID
• Many chunkservers
  • Store data blocks
  • Data blocks are called chunks
GFS Architecture
Single master: challenges
• Single point of failure
• Scalability bottleneck
GFS solutions
• Shadow master (think leader, followers)
• Reduce master involvement
  • Only used for metadata (small data, few API calls)
  • Large chunk size -> reduces # of API calls
  • Clients never use the master for data
  • On write: master gives the chunk a lease (delegates authority)
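To illustrate "reduce master involvement", the sketch below has the client ask the master only for metadata and then read the data straight from a chunkserver. It also caches the metadata on the client, which is an assumption layered on top of the slide. Leases and writes are not modeled, and all class names and the file name are hypothetical.

```python
class Master:
    """Holds metadata only: which chunks make up a file and where replicas live."""
    def __init__(self):
        self.file_chunks = {}  # file name -> list of chunk handles
        self.locations = {}    # chunk handle -> list of chunkserver names
        self.calls = 0

    def lookup(self, name, chunk_index):
        self.calls += 1
        handle = self.file_chunks[name][chunk_index]
        return handle, self.locations[handle]


class GFSClient:
    def __init__(self, master, chunkservers):
        self.master = master
        self.chunkservers = chunkservers  # server name -> {chunk handle: bytes}
        self.metadata_cache = {}

    def read_chunk(self, name, chunk_index):
        key = (name, chunk_index)
        if key not in self.metadata_cache:  # one small metadata call per chunk
            self.metadata_cache[key] = self.master.lookup(name, chunk_index)
        handle, replicas = self.metadata_cache[key]
        return self.chunkservers[replicas[0]][handle]  # data bypasses the master


master = Master()
master.file_chunks["/logs/a"] = ["h0"]
master.locations["h0"] = ["cs1", "cs2", "cs3"]
servers = {name: {"h0": b"chunk data"} for name in ("cs1", "cs2", "cs3")}

client = GFSClient(master, servers)
client.read_chunk("/logs/a", 0)
client.read_chunk("/logs/a", 0)
print(master.calls)  # 1
```

Two reads of the same chunk cost the master a single small lookup, and no file data ever flows through it.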