LEC16 Dist Para File Systems - ranger.uta.edu

CSE 3320 Operating Systems
Distributed and Parallel File Systems
Jia Rao
Department of Computer Science and Engineering
http://ranger.uta.edu/~jrao

Recap of Previous Classes
• File systems provide an abstraction of permanently stored data
  o Namespace: files and directories
    - Translate paths to corresponding locations on disks
  o Space management and optimizations
    - Free blocks
    - Caching and prefetching
  o Reliability and consistency
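The path translation in the recap can be sketched in Python as a walk over directory entries, one path component at a time. The in-memory inode table, the inode numbers (2, 11, 27, 40), and the block numbers are hypothetical. A real file system reads directory contents and inodes from disk and caches them.

```python
from dataclasses import dataclass, field


@dataclass
class Inode:
    is_dir: bool
    # Directory: entry name -> child inode number (hypothetical layout).
    entries: dict[str, int] = field(default_factory=dict)
    # Regular file: disk block addresses that hold its data.
    blocks: list[int] = field(default_factory=list)


ROOT_INODE = 2  # hypothetical root inode number

# Hypothetical state: /home/alice/notes.txt is stored in disk blocks 812 and 813.
inodes = {
    2: Inode(is_dir=True, entries={"home": 11}),
    11: Inode(is_dir=True, entries={"alice": 27}),
    27: Inode(is_dir=True, entries={"notes.txt": 40}),
    40: Inode(is_dir=False, blocks=[812, 813]),
}


def lookup(path: str) -> list[int]:
    """Translate an absolute path into the disk blocks of the file it names."""
    inum = ROOT_INODE
    for name in path.strip("/").split("/"):
        node = inodes[inum]
        if not node.is_dir or name not in node.entries:
            raise FileNotFoundError(path)
        inum = node.entries[name]  # descend one directory level
    node = inodes[inum]
    if node.is_dir:
        raise IsADirectoryError(path)
    return node.blocks
```

Here `lookup("/home/alice/notes.txt")` returns `[812, 813]`, and a missing component raises `FileNotFoundError`.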

Distributed and Parallel File Systems

• Provide similar abstractions of data on multiple machines
  o Namespace: pathname → machine ID : disk block address
  o Management: placement of files on machines
    - Replication
    - Striping
• Designed for performance and availability
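Across many machines, the namespace maps a pathname to (machine ID, disk block address) pairs. The minimal Python sketch below makes several assumptions for illustration: a hypothetical in-memory table `NAMESPACE`, machine names `m1` to `m3`, and per-chunk placement. It is not the metadata format of any particular system. A replicated chunk lists several locations, so a read can fall back to another machine when one fails. This fallback is where the availability comes from.

```python
from typing import NamedTuple


class Location(NamedTuple):
    machine_id: str  # storage node holding the data
    block_addr: int  # disk block address on that node


# Hypothetical namespace table: file -> list of chunks -> locations of each chunk.
# "/logs/a.txt" is replicated (one chunk, two copies on different machines);
# "/data/b.bin" is striped (consecutive chunks on different machines).
NAMESPACE: dict[str, list[list[Location]]] = {
    "/logs/a.txt": [[Location("m1", 100), Location("m2", 530)]],
    "/data/b.bin": [[Location("m1", 204)], [Location("m2", 77)], [Location("m3", 912)]],
}


def resolve(path: str, chunk: int, failed: frozenset[str] = frozenset()) -> Location:
    """Map (pathname, chunk index) to a machine ID and disk block address,
    skipping machines known to have failed."""
    for loc in NAMESPACE[path][chunk]:
        if loc.machine_id not in failed:
            return loc
    raise OSError(f"no live copy of chunk {chunk} of {path}")
```

With `m1` down, the replicated file still resolves to `m2`. Chunk 0 of the striped file becomes unreachable, because it has only one copy.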

Distributed vs. Parallel File Systems
• Design objectives
  o Fault tolerance vs. concurrent performance
• Data distribution
  o Entire file on a single node vs. striping over multiple nodes
• Symmetry
  o Storage co-located with apps vs. storage separated from apps
• Fault tolerance
  o Designed for fault tolerance vs. relying on enterprise storage
• Workload
  o Loosely coupled, distributed apps vs. coordinated HPC apps

The boundary is blurring.

Examples

• Distributed file systems
  o NFS, GFS (Google File System), HDFS (Hadoop Distributed File System), GlusterFS

• Parallel file systems
  o PVFS (Parallel Virtual File System), Lustre, OCFS2, GPFS

Design Issues (1)

• Name server
  o Maps file names to objects (files, directories, blocks)
  o Implementation options
    - Single name server
      + Simple implementation; reliability and performance issues
    - Several name servers (on different hosts)
      + Each server responsible for a domain
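Several name servers can divide the namespace by domain. One simple way to route a lookup is to treat each domain as a path prefix and pick the longest matching one. This routing rule is an assumption, not something the slides specify. The server names and the prefix table below are hypothetical.

```python
# Hypothetical split of the namespace: each name server owns a domain (path prefix).
DOMAINS = {
    "/": "ns0.example",  # fallback for everything not delegated elsewhere
    "/home": "ns1.example",
    "/projects": "ns2.example",
    "/projects/hpc": "ns3.example",
}


def name_server_for(path: str) -> str:
    """Pick the server whose domain is the longest prefix of path."""
    best = "/"
    for prefix in DOMAINS:
        # Match whole components only, so "/homework" is not inside "/home".
        inside = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if inside and len(prefix) > len(best):
            best = prefix
    return DOMAINS[best]
```

A single name server is the special case where `DOMAINS` holds only `"/"`. That design is simple, but every lookup goes to one host, which causes the reliability and performance issues listed for the single-server option.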

Design Issues (2)
• Caching
  o Caching at the client: main memory vs. disk
  o Cache consistency
    - Server-initiated
      + Server informs cache managers when data in client caches is stale
      + Client cache managers invalidate stale data or retrieve new data
      + Disadvantage: extensive communication
    - Client-initiated
      + Cache managers at the clients validate data with the server before returning it to clients
      + Disadvantage: extensive communication
    - Prohibit file caching during concurrent-write sharing
      + Several clients open a file, at least one of them for writing
      + Server informs all clients to purge that cached file
    - Lock files during concurrent-write sharing (at least one client opens for write)
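Counting messages makes the trade-off between server-initiated and client-initiated consistency concrete. The Python sketch below is a simplified model built on these assumptions:

- a per-file version number stands in for whatever staleness check a real system uses;
- writes go straight to the server;
- `run` is a hypothetical workload in which one client reads a file ten times and another client then updates it.

```python
from __future__ import annotations


class Server:
    """File server that keeps a version per file and counts client messages."""

    def __init__(self, mode: str) -> None:
        # "server": server-initiated invalidation; "client": validate on every read.
        self.mode = mode
        self.data: dict[str, bytes] = {}
        self.version: dict[str, int] = {}
        self.cachers: dict[str, set[Client]] = {}  # file -> clients caching it
        self.messages = 0

    def fetch(self, client: Client, name: str) -> tuple[bytes, int]:
        self.messages += 1
        if self.mode == "server":
            self.cachers.setdefault(name, set()).add(client)  # remember whom to notify
        return self.data[name], self.version[name]

    def get_version(self, name: str) -> int:
        self.messages += 1  # one validation round trip
        return self.version[name]

    def store(self, writer: Client, name: str, value: bytes) -> None:
        self.messages += 1
        self.data[name] = value
        self.version[name] = self.version.get(name, 0) + 1
        if self.mode == "server":
            for client in self.cachers.pop(name, set()):
                if client is not writer:
                    self.messages += 1  # one invalidation per stale client cache
                    client.invalidate(name)


class Client:
    def __init__(self, server: Server) -> None:
        self.server = server
        self.cache: dict[str, tuple[bytes, int]] = {}  # file -> (data, version)

    def invalidate(self, name: str) -> None:
        self.cache.pop(name, None)

    def read(self, name: str) -> bytes:
        if name in self.cache:
            value, version = self.cache[name]
            if self.server.mode == "server":
                return value  # the server would have invalidated a stale copy
            if self.server.get_version(name) == version:
                return value  # validated with the server: still fresh
        value, version = self.server.fetch(self, name)
        self.cache[name] = (value, version)
        return value

    def write(self, name: str, value: bytes) -> None:
        self.server.store(self, name, value)  # sent straight to the server
        self.cache.pop(name, None)  # the next read fetches the new version


def run(mode: str) -> tuple[bytes, int]:
    """Client b writes a file, client a reads it 10 times, then b updates it."""
    server = Server(mode)
    a, b = Client(server), Client(server)
    b.write("f", b"v1")
    for _ in range(10):
        a.read("f")
    b.write("f", b"v2")
    return a.read("f"), server.messages
```

`run("server")` returns `(b"v2", 5)` and `run("client")` returns `(b"v2", 14)`. With read-heavy sharing, validating on every read costs one message per read. With write-heavy sharing the balance shifts, because the server must send an invalidation to every caching client on each write.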

Design Issues (3)
• Update (write) policy
  o Once a client writes into a file (and the local cache), when should the modified cache be sent to the server?
    - Write-through: all writes at the clients are immediately transferred to the servers
      + Advantage: reliability
      + Disadvantage: performance; it does not take advantage of the cache
    - Delayed writing: delay the transfer to the servers
      + Advantage: many writes (including intermediate results) take place in the cache before a single transfer
      + Disadvantage: reliability; some data may be lost if the client crashes before the transfer
    - Delayed writing until the file is closed at the client
      + For short open intervals, same as delayed writing
      + For long intervals, reliability problems
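The three update policies differ in how many transfers reach the server and in how much is lost if the client crashes before flushing. The sketch below models one open file. Two details are assumptions for illustration:

- delayed writing flushes every 4 writes;
- the workload in `run` is ten writes alternating between two blocks, as with intermediate results.

```python
from dataclasses import dataclass, field


@dataclass
class WriteCache:
    """Client-side cache for one open file under an update policy (simplified model)."""

    policy: str  # "write-through", "delayed", or "write-on-close"
    flush_every: int = 4  # "delayed" only: flush dirty blocks every N writes (assumed)
    dirty: dict[int, bytes] = field(default_factory=dict)  # modified, not yet sent
    server: dict[int, bytes] = field(default_factory=dict)  # what the server holds
    transfers: int = 0  # blocks sent to the server
    writes: int = 0

    def _flush(self) -> None:
        self.transfers += len(self.dirty)
        self.server.update(self.dirty)
        self.dirty.clear()

    def write(self, block: int, data: bytes) -> None:
        self.writes += 1
        if self.policy == "write-through":
            self.server[block] = data  # every write goes to the server immediately
            self.transfers += 1
            return
        self.dirty[block] = data  # repeated writes to a block are absorbed locally
        if self.policy == "delayed" and self.writes % self.flush_every == 0:
            self._flush()

    def close(self) -> None:
        self._flush()  # every policy pushes remaining dirty blocks at close

    def crash(self) -> int:
        """Simulate a client crash: dirty blocks never reach the server."""
        lost = len(self.dirty)
        self.dirty.clear()
        return lost


def run(policy: str, crash: bool) -> tuple[int, int]:
    """Ten writes alternating between blocks 0 and 1; returns (transfers, lost)."""
    cache = WriteCache(policy)
    for i in range(10):
        cache.write(i % 2, f"v{i}".encode())
    lost = cache.crash() if crash else 0
    if not crash:
        cache.close()
    return cache.transfers, lost
```

`run` returns (transfers, blocks lost) for each policy:

- Write-through gives `(10, 0)` with or without a crash.
- Delayed writing gives `(6, 0)` after a clean close but `(4, 2)` after a crash.
- Write-on-close gives `(2, 0)` after a clean close but `(0, 2)` after a crash, losing every update made since the file was opened.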

Design Issues (4)
• Availability
  o What is the level of availability of files in a distributed file system?
  o Use replication to increase availability, i.e., many copies (replicas) of files are maintained at different sites/servers
  o Replication issues
    - How to keep replicas consistent
    - How to detect inconsistency among replicas
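The slides do not name a detection mechanism. One common approach, assumed here, keeps a version number per replica and compares content checksums. The sketch sorts replicas into three groups:

- current;
- stale (older version);
- conflicting (newest version but different content).

`Replica`, `check_replicas`, and the server names are hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Replica:
    server: str
    version: int  # incremented on every committed update (assumed scheme)
    content: bytes

    def digest(self) -> str:
        return hashlib.sha256(self.content).hexdigest()


def check_replicas(replicas: list[Replica]) -> dict[str, list[str]]:
    """Classify replicas as current, stale, or conflicting."""
    newest = max(r.version for r in replicas)
    latest = [r for r in replicas if r.version == newest]
    # The most common digest among the newest replicas is taken as the reference.
    digests = [r.digest() for r in latest]
    reference = max(set(digests), key=digests.count)
    return {
        "current": [r.server for r in latest if r.digest() == reference],
        "conflict": [r.server for r in latest if r.digest() != reference],
        "stale": [r.server for r in replicas if r.version < newest],
    }
```

A repair step would then copy the reference content to the stale and conflicting replicas. When the newest replicas disagree without a majority, this sketch picks the reference arbitrarily.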

Design Issues (5)
• Scalability
  o How to deal with a growing system?
  o Issues
    - Nodes joining and leaving (failing)
    - Cache consistency
    - Name server
  o Solutions
    - Replication
    - Design cache consistency protocols for scalability
    - Multiple name (metadata) servers
    - Take advantage of multi-threading and multi-core

Example – GlusterFS (DFS)

[Figure: Client-1, Client-2, …, Client-N access the Gluster Global Namespace (Gluster Native) over an IP network; storage comes from a Gluster Virtual Storage Pool built on donated partitions on each machine]

Example – GlusterFS (2)

• Three ways to place files
  o Distribute: place entire files on different servers
    - Pros: good scalability, efficient disk space usage
    - Cons: poor reliability
  o Replicate: place identical copies of files on different servers
    - Pros: reliability
    - Cons: wasted disk space, moderate scalability
  o Stripe: place only part of a file on each server
    - Pros: good performance for concurrent and random access
    - Cons: poor scalability and reliability
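The three GlusterFS placement modes can be compared by computing how much each server stores for one file and whether the file stays readable after a server fails. This is a simplified Python model, not GlusterFS code. `zlib.crc32` stands in for GlusterFS's hash-based placement. The four servers, two replicas, and 128 KiB stripe size are assumed values.

```python
import zlib

SERVERS = ["s0", "s1", "s2", "s3"]  # hypothetical storage servers


def place(name: str, size: int, mode: str, replicas: int = 2,
          stripe_size: int = 128 * 1024) -> dict[str, int]:
    """Return {server: bytes stored} for one file under the given mode."""
    n = len(SERVERS)
    start = zlib.crc32(name.encode()) % n  # stand-in for hash-based placement
    if mode == "distribute":
        return {SERVERS[start]: size}  # the whole file on one server
    if mode == "replicate":
        return {SERVERS[(start + i) % n]: size for i in range(replicas)}  # full copies
    if mode == "stripe":
        usage: dict[str, int] = {}
        for offset in range(0, size, stripe_size):  # round-robin chunks
            server = SERVERS[(offset // stripe_size) % n]
            usage[server] = usage.get(server, 0) + min(stripe_size, size - offset)
        return usage
    raise ValueError(f"unknown mode {mode!r}")


def readable_after_failure(layout: dict[str, int], failed: str, mode: str) -> bool:
    """Can the file still be read when one server is down?"""
    if mode == "replicate":
        return any(s != failed for s in layout)  # any surviving full copy suffices
    return failed not in layout  # distribute/stripe need every server holding a piece
```

For a 1 MiB file, the three modes store:

- distribute: 1 MiB on one server;
- replicate: 2 MiB across two servers;
- stripe: 256 KiB on each of the four servers.

If the failed server holds any piece of a distributed or striped file, that file becomes unreadable. A replicated file survives.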

Example – PVFS (PFS)

Significant improvement in throughput. What could be the issues?

1. Server coordination affects efficiency
2. Client QoS?
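Throughput improves because a large request on a striped file is split across servers that work in parallel. The same split also shows where server coordination comes in. The sketch below assumes simple round-robin striping with a 64 KiB strip, which is an assumed parameter. `split_request` is a hypothetical helper, not a PVFS API.

```python
def split_request(offset: int, length: int, n_servers: int,
                  strip: int = 64 * 1024) -> dict[int, list[tuple[int, int]]]:
    """Split a byte range of a round-robin striped file into per-server pieces.

    Returns {server index: [(file offset, length), ...]}.
    """
    pieces: dict[int, list[tuple[int, int]]] = {}
    end = offset + length
    while offset < end:
        strip_index = offset // strip
        server = strip_index % n_servers  # strips are dealt out round-robin
        strip_end = (strip_index + 1) * strip
        n = min(end, strip_end) - offset  # stop at the strip boundary
        pieces.setdefault(server, []).append((offset, n))
        offset += n
    return pieces
```

`split_request(0, 256 * 1024, 4)` yields one 64 KiB piece on each of servers 0 to 3, so all four servers work in parallel. An 8 KiB request starting at 60 KiB touches servers 0 and 1, so even a small access needs two servers to respond.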

DFS and PFS in the Cloud (1)

• Both approaches provide cheap, reliable, and high-performance cloud storage solutions

Use case 1

DFS and PFS in the Cloud (2)

Use case 2

Some Real Results…
• Hosted an 8-VM Hadoop cluster on 8 DELL machines
• Performed micro and real I/O-intensive workloads
• Two storage solutions: PVFS and local ext3

                                PVFS           Local ext3
Gridmix websort (20 GB data)    2391 seconds   4693 seconds

PVFS
              16k      32k      64k      256k     1M
Sequential    58.89    60.15    60.47    104.80   130.47
Random        12.34    20.84    33.51    50.43    108.71

Local ext3
              16k      32k      64k      256k     1M
Sequential    120.11   120.56   120.39   120.39   120.57
Random        4.01     7.80     14.71    43.20    92.19

Network bandwidth bottleneck
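A little arithmetic compares the slide's numbers directly. The values below are copied from the slide. The slide does not state the throughput units, so only ratios are computed.

```python
# Values copied from the slide; throughput units are not given there.
websort_seconds = {"PVFS": 2391, "local ext3": 4693}
sizes = ["16k", "32k", "64k", "256k", "1M"]
pvfs = {
    "sequential": [58.89, 60.15, 60.47, 104.80, 130.47],
    "random": [12.34, 20.84, 33.51, 50.43, 108.71],
}
ext3 = {
    "sequential": [120.11, 120.56, 120.39, 120.39, 120.57],
    "random": [4.01, 7.80, 14.71, 43.20, 92.19],
}

# Websort speedup: local ext3 completion time divided by PVFS completion time.
speedup = websort_seconds["local ext3"] / websort_seconds["PVFS"]

# PVFS throughput relative to local ext3 (> 1 means PVFS is faster).
ratios = {
    pattern: [round(p / e, 2) for p, e in zip(pvfs[pattern], ext3[pattern])]
    for pattern in ("sequential", "random")
}

for pattern, row in ratios.items():
    print(pattern, dict(zip(sizes, row)))
print(f"websort speedup: {speedup:.2f}x")
```

The ratios show three things:

- PVFS finishes websort about 1.96 times faster than local ext3.
- For random access, PVFS delivers about 3.1 times the throughput at 16k, and the advantage shrinks to about 1.2 times at 1M.
- For sequential access, PVFS reaches only about half of local ext3 up to 64k and overtakes it only at 1M.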