Distributed Systems - Brown Universitycs.brown.edu/courses/csci1380/s19/lectures/Day16_2019.pdf ·...

Distributed Systems Day 16: Distributed File Systems Mar 19, 2019

Page 1: Distributed Systems - Brown University (cs.brown.edu/courses/csci1380/s19/lectures/Day16_2019.pdf)

Distributed Systems Day 16: Distributed File Systems

Mar 19, 2019

Page 2

Outline

• Recap of BGP and BFT
  • BGP: Byzantine Generals Problem
  • BFT: Byzantine Fault Tolerance
• Distributed File Systems
  • Opportunistic Locking
  • AFS v. NFS
  • GFS v. Azure File System

Page 3

Fault Recovery: Byzantine v. Consensus

• Consensus
  • Survive f failures
  • N > 2f, so you need at least N = 2f + 1
  • How do you get 'f'?
    • Analyze failure distributions
    • 'What's the max number of server failures at any time?'
• Byzantine generals
  • Survive f failures
  • N > 3f, so you need at least N = 3f + 1
  • How do you get 'f'?
    • Analyze system security
    • 'What's the max number of servers compromised at any time?'
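The two bounds can be checked with a few lines of Python. This is only a sketch of the arithmetic on the slide; the function names are made up for illustration.

```python
def min_nodes(f: int, byzantine: bool) -> int:
    """Smallest N that survives f failures: N = 2f + 1 for crash-style
    consensus, N = 3f + 1 when the f failed nodes may lie (Byzantine)."""
    return (3 if byzantine else 2) * f + 1


def max_failures(n: int, byzantine: bool) -> int:
    """Largest f that a cluster of n nodes can survive (inverse of min_nodes)."""
    return (n - 1) // (3 if byzantine else 2)
```

For example, `max_failures(3, byzantine=True)` is 0, which matches the three-node impossibility argument on the next slide.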

Page 4

[Figure: two three-node scenarios. Bad LT: the general sends "Attack!" to both lieutenants; the bad LT relays "She said Retreat!" while the good LT relays "She said Attack!". Bad General: the general sends "Attack!" to one lieutenant and "Retreat!" to the other, and each lieutenant relays the order received.]

When f = 1 (liars) and N = 3 (total # of nodes), the correct processes can't differentiate between a good and a bad general.

A good LT can't differentiate between the two scenarios. It has 3 options: always pick the leader's order, always pick the other LT's report, or pick randomly.

Consensus Properties
• C1: All loyal lieutenant generals obey the same order
• C2: If the commanding general is loyal, then every loyal lieutenant general obeys the order she sends
• Note: if the commanding general is lying, there is no "true" order to recover, so C2 does not apply; only C1 still constrains the loyal lieutenants

• Always pick leader: violates C1
• Always pick other LT: violates C2

With 3 nodes and one liar: cannot satisfy both C1 and C2

Page 5

[4,1] Byzantine Generals Problem

[Figure: two four-node scenarios, "Bad General" and "Bad LT", showing the general's "Attack!"/"Retreat!" orders and the "She said Attack!"/"She said Retreat!" messages the lieutenants relay to each other.]

Good LTs can't distinguish a bad Gen from a bad LT.

But good LTs will still do the same thing. [Satisfy C1]
• By looking at the majority of messages, every good LT will do the same thing.

When the Gen is truthful, "same thing" --> "leader's order". [Satisfy C2]

Consensus Properties
• C1: All loyal lieutenant generals obey the same order
• C2: If the commanding general is loyal, then every loyal lieutenant general obeys the order she sends
• Note: if the commanding general is lying, there is no "true" order to recover, so C2 does not apply; only C1 still constrains the loyal lieutenants
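The majority argument can be tried out in a small Python simulation. This is a sketch under simplifying assumptions, not the full algorithm: a lying lieutenant always relays the opposite of what she heard, and a tie falls back to "Retreat". The names `decide`, `majority`, and `FLIP` are made up for illustration.

```python
from collections import Counter

FLIP = {"Attack": "Retreat", "Retreat": "Attack"}


def majority(values, default="Retreat"):
    """Most common value; a tie falls back to `default` (an assumed rule)."""
    ranked = Counter(values).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return default
    return ranked[0][0]


def decide(orders, liar=None):
    """orders maps each lieutenant to the order the general sent her.
    A lying lieutenant (simplified model) relays the opposite of what she heard.
    Returns the decision of every loyal lieutenant."""
    decisions = {}
    for lt in orders:
        if lt == liar:
            continue
        heard = [orders[lt]]  # the order received directly from the general
        for other in orders:
            if other != lt:
                relayed = orders[other]
                heard.append(FLIP[relayed] if other == liar else relayed)
        decisions[lt] = majority(heard)
    return decisions


# Bad general: sends conflicting orders, all lieutenants are loyal.
bad_general = decide({"LT1": "Attack", "LT2": "Attack", "LT3": "Retreat"})
# Bad lieutenant: the general is loyal, LT3 lies when relaying.
bad_lt = decide({"LT1": "Attack", "LT2": "Attack", "LT3": "Attack"}, liar="LT3")
```

With three lieutenants, every loyal lieutenant reaches the same decision in both scenarios (C1), and it equals the loyal general's order in the second one (C2). Calling `decide` with only two lieutenants and one liar leaves the loyal lieutenant with a one-to-one tie, so under this tie rule she can end up disobeying a loyal general.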

Page 6

Local File System: Background and Terminology

Page 7

What is a File?

• A blob of binary?
• A set of blobs?
  • Think of a book: Table of Contents + chapters
  • Index -> inode (maps ranges to data blocks)
  • Chapters -> Data blocks
• How about directories?
• How about file permissions?

[Figure: File1 shown as a single blob of bits (1010101010100100), and File1 shown as an index pointing to blocks Data1 and Data2, each holding its own bits.]
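The "table of contents + chapters" picture can be sketched in Python. This is an illustrative toy, not how any particular file system lays out data: `Inode`, `BlockStore`, and the 4-byte `BLOCK_SIZE` are assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # bytes per data block; tiny on purpose (assumption)


@dataclass
class Inode:
    """The index: file size plus the ordered list of data block IDs."""
    size: int = 0
    block_ids: list = field(default_factory=list)


class BlockStore:
    """The chapters: data blocks addressed by block ID."""

    def __init__(self):
        self.blocks = {}
        self.next_id = 0

    def allocate(self, data: bytes) -> int:
        block_id = self.next_id
        self.next_id += 1
        self.blocks[block_id] = data
        return block_id


def write_file(store: BlockStore, data: bytes) -> Inode:
    """Split data into fixed-size blocks and record them in a new inode."""
    inode = Inode(size=len(data))
    for off in range(0, len(data), BLOCK_SIZE):
        inode.block_ids.append(store.allocate(data[off:off + BLOCK_SIZE]))
    return inode


def read_range(store: BlockStore, inode: Inode, offset: int, length: int) -> bytes:
    """Map a byte range to data blocks through the inode and read it back."""
    end = min(offset + length, inode.size)
    out = bytearray()
    pos = offset
    while pos < end:
        block = store.blocks[inode.block_ids[pos // BLOCK_SIZE]]
        start = pos % BLOCK_SIZE
        take = min(BLOCK_SIZE - start, end - pos)
        out += block[start:start + take]
        pos += take
    return bytes(out)
```

Reading bytes 3 to 7 of an 11-byte file touches the first two blocks only; the caller never sees the block boundaries.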

Page 8

What is a Directory?

• Directory -> maps names to File IDs
• A directory can also contain directories
• In Linux a directory ---> also a file

[Figure]
Root Directory
• File1 -> ID X
• File2 -> ID Y
• Dir1 -> ID Z

Dir1
• File3 -> ID M
• File4 -> ID C

File1: Data1, Data2
File5: Data8, Data9
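Name lookup through nested directories can be sketched as a walk over name-to-ID maps. The IDs below mirror the figure; the `ROOT` key, the `directories` table, and the `resolve` function are hypothetical names used only for this example.

```python
# Each directory is itself addressed by an ID and maps names to IDs.
directories = {
    "ROOT": {"File1": "X", "File2": "Y", "Dir1": "Z"},
    "Z": {"File3": "M", "File4": "C"},
}


def resolve(path: str, root_id: str = "ROOT") -> str:
    """Walk the path one component at a time and return the final ID."""
    current = root_id
    for name in path.strip("/").split("/"):
        if name == "":
            continue  # handles "/" and doubled slashes
        entries = directories.get(current)
        if entries is None:
            raise NotADirectoryError(f"ID {current} is not a directory")
        if name not in entries:
            raise FileNotFoundError(name)
        current = entries[name]
    return current
```

`resolve("/Dir1/File3")` first maps `Dir1` to ID `Z` in the root directory, then maps `File3` to ID `M` inside `Dir1`.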

Page 9

What is a File System?

• File system -> system that manages files
• Provides
  • API for applications to interact w/ files
  • Algorithms for securing files (access control)
  • Maintenance of metadata about a file

[Figure: applications call the file system through an API; the file system manages File1 ... FileN.]

File metadata (the figure marks some fields "Modifiable by APP" and the others "Modifiable by File System")
• File length (size)
• Timestamp
• Location
• Reference count
• Type
• Access control
• Owner

Page 10

Traditional Distributed File Systems

Page 11

Local FS -> Distributed FS

[Figure: Local FS: applications call the file system through an API to reach File1 ... FileN. Distributed FS: each application uses a client API that makes RPC calls to a file system made of a directory service and a flat file service, which store data blocks.]

What are RPC semantics?

Semantics
• At-least-once (1 or more calls)
• At-most-once (0 or 1 calls)
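One common way to get at-most-once behavior when clients retry is for the server to remember request IDs and replay the stored reply for duplicates. The sketch below assumes that technique; the slides do not say how at-most-once is implemented, and `AtMostOnceServer` and its `increment` operation are hypothetical.

```python
class AtMostOnceServer:
    """Executes each request ID at most once and replays the saved reply."""

    def __init__(self):
        self.counter = 0
        self.replies = {}  # request_id -> reply already sent

    def increment(self, request_id: str) -> int:
        if request_id in self.replies:
            return self.replies[request_id]  # duplicate: do not execute again
        self.counter += 1
        self.replies[request_id] = self.counter
        return self.counter
```

A retried request with the same ID returns the original reply and leaves the counter unchanged, so a non-idempotent operation is still safe to retry. The cost is that the server must now keep per-request state.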

Page 12

File System Design Challenges

• Maintain Transparency
  • Access -> same API for remote/local files
  • Location -> same "name" for remote/local file
  • Mobility -> client should be unaware of files moving
  • Performance -> as workload grows: performance is OK
  • Scalability -> # of files grows: performance is OK

[Figure: applications with client APIs make RPC calls to the file system (directory service and flat file service over data blocks).]

Page 13

General DFS Functionality

• Directory service
  • Maps names -> File ID
  • Maps File ID to data blocks and block location
• Flat file service
  • Unaware of directory structure
  • Unaware of names
  • Maintains data blocks
• Client module: provides transparency
  • Provides file system API to applications
  • Hides complexity: includes glue logic around RPC calls
  • Knows location of directory service
  • Deals with mobility of data

[Figure: applications with client APIs make RPC calls to the file system (directory service and flat file service over data blocks).]
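The split between the three components can be sketched as three small Python classes. This is an illustration only: RPC is replaced by direct method calls, and the class and method names are made up rather than taken from any real DFS.

```python
class DirectoryService:
    """Maps names to file IDs; stores no file data."""

    def __init__(self):
        self.names = {}

    def bind(self, name, file_id):
        self.names[name] = file_id

    def lookup(self, name):
        return self.names[name]


class FlatFileService:
    """Stores data by file ID; unaware of names and directory structure."""

    def __init__(self):
        self.data = {}
        self.next_id = 0

    def create(self):
        file_id = self.next_id
        self.next_id += 1
        self.data[file_id] = bytearray()
        return file_id

    def write(self, file_id, offset, payload):
        buf = self.data[file_id]
        if len(buf) < offset:
            buf.extend(b"\0" * (offset - len(buf)))  # pad a hole with zeros
        buf[offset:offset + len(payload)] = payload

    def read(self, file_id, offset, length):
        return bytes(self.data[file_id][offset:offset + length])


class ClientModule:
    """Glue logic: gives applications a name-based API over the two services."""

    def __init__(self, directory, files):
        self.directory = directory
        self.files = files

    def write_file(self, name, payload):
        try:
            file_id = self.directory.lookup(name)
        except KeyError:
            file_id = self.files.create()
            self.directory.bind(name, file_id)
        self.files.write(file_id, 0, payload)

    def read_file(self, name, length=1 << 20):
        return self.files.read(self.directory.lookup(name), 0, length)
```

The application only ever passes names to `ClientModule`; the flat file service only ever sees file IDs, offsets, and bytes.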

Page 14

Flat File System Operations

• Provides at-least-once semantics
  • Calls are designed to be idempotent
  • The client repeats calls that it receives no response to
  • Server is stateless: state is embedded in each call

Semantics
• At-least-once (1 or more calls)
• At-most-once (0 or 1 calls)
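Why idempotence matters under at-least-once retries can be shown with a simulated lossy server. This is a sketch: `LossyServer`, `write_at`, `append`, and `call_at_least_once` are hypothetical names, and lost replies are modeled by raising `TimeoutError` after the request has already been applied.

```python
class LossyServer:
    """Applies every request but loses the first `drop` replies."""

    def __init__(self, drop: int):
        self.drop = drop
        self.blob = bytearray(b"....")
        self.log = bytearray()

    def _reply(self):
        if self.drop > 0:
            self.drop -= 1
            raise TimeoutError("reply lost")
        return "ok"

    def write_at(self, offset: int, data: bytes):
        """Idempotent: the call carries its own offset, so repeats are harmless."""
        self.blob[offset:offset + len(data)] = data
        return self._reply()

    def append(self, data: bytes):
        """Not idempotent: the effect depends on server-side state (the log end)."""
        self.log += data
        return self._reply()


def call_at_least_once(fn, *args, retries: int = 5):
    """Repeat the call until a reply arrives, so it runs one or more times."""
    for _ in range(retries):
        try:
            return fn(*args)
        except TimeoutError:
            continue
    raise TimeoutError("gave up after retries")
```

With two lost replies, `write_at(0, b"ab")` runs three times and the blob still ends up as `ab..`, while `append(b"ab")` leaves `ababab` in the log.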

Page 15

File System Design Challenges

• Maintain Transparency
  • Access -> same API for remote/local files
  • Location -> same "name" for remote/local file
  • Mobility -> client should be unaware of files moving
  • Performance -> as workload grows: performance is OK
  • Scalability -> # of files grows: performance is OK

[Figure: applications with client APIs make RPC calls to the file system (directory service and flat file service over data blocks).]

Page 16

Reliability Provided by Redundancy

• Server failure -> loss of data
• Overcome data loss through redundancy

[Figure: an application with a client API and cache makes RPC calls to two file systems, each with its own directory service, flat file service, and data blocks.]

Page 17

Performance Provided by Caching

• Client: caches files locally
  • Eliminates the need to use the network to transfer data that is already cached!

[Figure: the client/server architecture shown twice, labeled "NO CACHING" and "CACHING"; the caching version adds a cache next to the client API.]

Page 18

Performance Provided by Caching

• Client: caches files locally
  • Eliminates the need to use the network to transfer data that is already cached!

[Figure: the client/server architecture shown twice, labeled "NO CACHING" and "CACHING"; the caching version adds a cache next to the client API.]

Performance is good ... if all operations take place on the client ... everything is cached on the client

Strict consistency is easy ... if all operations take place on the server ... no client caching

Traditional design:
• Directory service is stateless
• Only keeps track of the Name -> ID mapping

Design change created by caching:
• Directory service must maintain state
• Directory service needs complex logic to deal with state
  • Especially during failures

Page 19

Opportunistic Locks: strict consistency

• Common Internet File System (CIFS)
  • Microsoft's distributed file system
  • Alternate locking approach
• Features
  • Strictly consistent

https://learnteachevangelizeit.wordpress.com/2014/07/31/windows-server-2012-r2-file-and-storage-services/

Page 20

Overview of Opportunistic Locks

[Figure: message sequence among Client1, Server, and Client2, in order: "Open A"; "OK, OpLock"; "Open A"; "Revoke OpLock"; "OK, changes"; "OK".]

Step 0: find location of data blocks
Step 1: request locks
Step 2: server grants locks
Step 3: download file to local cache
Step 4: modify local cache
.....
Step N: server revokes lock
Step N+1: client flushes cache to disk
Step N+2: release lock
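The grant/revoke cycle in these steps can be sketched in Python. This is a heavily simplified model and not the CIFS protocol: there is one exclusive lock per file, revocation is a synchronous method call, and the `Server` and `Client` classes are hypothetical.

```python
class Server:
    def __init__(self):
        self.files = {"A": b"v0"}
        self.holder = {}  # file name -> client currently holding the oplock

    def open(self, client, name):
        current = self.holder.get(name)
        if current is not None and current is not client:
            # Step N: revoke the lock. Steps N+1 and N+2: the old holder
            # flushes its cached copy back and gives up the lock.
            self.files[name] = current.revoke(name)
        self.holder[name] = client       # Step 2: grant the lock
        return self.files[name]          # Step 3: client downloads the file


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def open(self, name):
        if name not in self.cache:       # already cached means lock is held
            self.cache[name] = self.server.open(self, name)

    def write(self, name, data):
        self.cache[name] = data          # Step 4: local only, no network

    def read(self, name):
        return self.cache[name]

    def revoke(self, name):
        return self.cache.pop(name)      # flush and drop the cached copy
```

While Client1 holds the lock, its writes never reach the server. The server's copy changes only when Client2 opens the same file and forces the revoke, after which Client2 sees Client1's latest data.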

Page 21

Managing Caching and Consistency: Opportunistic Locking

• To modify blocks, a client requests locks
• Once a client has locks
  • The client can read/write blocks using its local cache without using the network
  • Leads to significant performance improvements
• Other clients must:
  • Wait for the server to revoke locks
  • Potential wait times

Opportunistic because the server only grants the locks if/when convenient

https://docs.microsoft.com/en-us/windows/desktop/fileio/opportunistic-locks

[Figure: an application with a client API and cache makes RPC calls to the file system (directory service and flat file service over data blocks).]

Page 22

Managing Caching and Consistency: Opportunistic Locking

• To modify blocks, a client requests locks
• Once a client has locks
  • The client can read/write blocks using its local cache without using the network
  • Leads to significant performance improvements
• Other clients must:
  • Wait for the server to revoke locks
  • Potential wait times

https://docs.microsoft.com/en-us/windows/desktop/fileio/opportunistic-locks

[Figure: an application with a client API and cache makes RPC calls to the file system (directory service and flat file service over data blocks).]

Step 1: find location of data blocks
Step 2: request locks
Step 3: download file to local cache
Step 4: modify local cache
.....
Step N: server revokes lock
Step N+1: client flushes cache to disk
Step N+2: release lock

Page 23

Locks: Opportunistic, Optimistic, Pessimistic

Page 24

Types of Locks Discussed

Distributed Transactions:
• Pessimistic locks:
  • Get all locks before a transaction starts and release them after
  • Low performance but guarantees strict serializability
• Optimistic locks:
  • Provide high performance but many transactions abort
  • Get snapshots of data before the transaction starts; on commit, compare the snapshot with the data and only commit if the data is unchanged

Distributed File Systems:
• Opportunistic locks:
  • Clients get locks before data manipulation
  • Enables performance improvements
  • Clients with a lock can manipulate data locally without using the network
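The optimistic scheme (snapshot, then validate at commit) can be sketched as follows. It is a single-threaded illustration with made-up names (`Store`, `OptimisticTxn`); a real system would also need the validate-and-write step in `commit` to be atomic.

```python
class Store:
    def __init__(self, data):
        self.data = dict(data)


class OptimisticTxn:
    def __init__(self, store, keys):
        self.store = store
        # Snapshot taken before the transaction starts.
        self.snapshot = {k: store.data[k] for k in keys}
        self.writes = {}

    def read(self, key):
        return self.writes.get(key, self.snapshot[key])

    def write(self, key, value):
        self.writes[key] = value  # buffered until commit

    def commit(self) -> bool:
        # Validate: commit only if nothing we read has changed since the snapshot.
        for key, seen in self.snapshot.items():
            if self.store.data[key] != seen:
                return False  # abort
        self.store.data.update(self.writes)
        return True
```

If two transactions snapshot the same key and both try to update it, the first commit succeeds and the second aborts because its snapshot no longer matches the store.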

Page 25

GFS: Google File System

Page 26

Data Center FS @ 2003

• Google needed a good distributed file system
  • Redundant storage of massive amounts of data on commodity computers (cheap and unreliable)
• Why not use an existing file system?
  • Google's problems are different from others in terms of workload and design priorities
  • Google file system is designed for Google applications
  • Google applications are designed for GFS

Page 27

Assumptions -> Design Decisions

Assumptions
• High component failure rates
  • Inexpensive commodity components often fail
• Modest number of huge files
  • A few million 100MB or larger files
• Files are write-once, mostly appended to
• Large streaming reads
• High sustained bandwidth is favored over low latency

Design decisions
• Files stored as chunks
  • Fixed size (64MB)
• Reliability through replication
  • Each chunk is replicated across 3+ chunkservers
• Single master to coordinate access and keep metadata
  • Simple centralized management
• No data caching
  • Little benefit due to large datasets, streaming reads
• Familiar interface but customize the API
  • Snapshot and record append
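With fixed-size chunks, turning a byte range into chunk requests is plain integer arithmetic. The sketch below assumes 64 MB means 64 * 2^20 bytes; the function name `chunks_for_range` is made up for illustration.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # fixed chunk size from the slide (64 MB)


def chunks_for_range(offset: int, length: int):
    """Split a byte range into (chunk_index, offset_in_chunk, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index = offset // CHUNK_SIZE
        within = offset % CHUNK_SIZE
        nbytes = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, nbytes))
        offset += nbytes
    return pieces
```

A 1 GiB streaming read touches only 16 chunks, which is one reason a large chunk size keeps the number of metadata lookups small.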

Page 28

GFS Architecture

[Figure: an application with a client API and cache makes RPC calls to the file system (directory service and flat file service over data blocks).]

Page 29

GFS Architecture

One Master!!!!
• Stores metadata
• Stores map of name -> File ID

Many Chunk Servers
• Store data blocks
• Data blocks are called chunks

[Figure: an application with a client API and cache makes RPC calls to the file system (directory service and flat file service over data blocks).]

Page 30

GFS Architecture

Single Master: Challenges
• Single point of failure
• Scalability bottleneck

GFS solutions
• Shadow master (think leader, followers)
• Reduce master involvement
  • Master is only used for metadata (small data, few API calls)
  • Large chunk size -> reduces # of API calls
  • Clients never use the master for data
  • On write: master gives a chunk a lease (delegates authority)
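How little the master needs to do can be sketched in Python. This is a toy model, not the GFS API: the class names, the 60-second lease length, picking the first replica as lease holder, and the client caching chunk locations are all assumptions made for illustration.

```python
import time


class Master:
    LEASE_SECONDS = 60.0  # assumed lease length

    def __init__(self):
        self.chunks = {}  # (filename, chunk_index) -> list of chunkserver names
        self.leases = {}  # (filename, chunk_index) -> (lease holder, expiry)
        self.calls = 0    # number of metadata requests served

    def locate(self, filename, index):
        """Metadata only: which chunkservers hold this chunk."""
        self.calls += 1
        return list(self.chunks[(filename, index)])

    def grant_lease(self, filename, index, now=None):
        """Delegate write authority for a chunk to one replica for a while."""
        now = time.monotonic() if now is None else now
        key = (filename, index)
        holder, expiry = self.leases.get(key, (None, 0.0))
        if holder is None or expiry <= now:
            holder = self.chunks[key][0]  # assumption: first replica is chosen
            self.leases[key] = (holder, now + self.LEASE_SECONDS)
        return holder


class Client:
    def __init__(self, master, servers):
        self.master = master
        self.servers = servers  # chunkserver name -> {chunk key: bytes}
        self.locations = {}     # remembered chunk locations (metadata only)

    def read(self, filename, index):
        key = (filename, index)
        if key not in self.locations:
            self.locations[key] = self.master.locate(filename, index)
        replica = self.locations[key][0]
        return self.servers[replica][key]  # data comes from a chunkserver
```

Reading the same chunk twice costs one call to the master; the data bytes themselves never pass through it.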