Distributed File Systems

Thomas Hollstegge

Agenda

Motivation

Distributed file system basics

Case studies

Summary and outlook

Motivation

ICT allows for distributed work
  Users work separated in time and space
  They need access to common data collections
  These are provided by distributed file systems (DFS)

Distributed work leads to new business models
  24/7 customer service
  Analysis of worldwide financial information (stock prices etc.)
  Economic relevance!

Different DFSs were developed in the past, so a structured discussion is necessary

Basics – Storage fundamentals

"Storage": fundamental abstraction in computing
  Data encapsulated in objects
  Explicit creation and deletion
  Unaffected by system failures

"File system": refinement of this abstraction

Three different usage dimensions
  Single user vs. multiple users
  Single-thread vs. multi-thread OS
  Single site vs. multiple sites

Basics – Requirements for DFS (1/2)

Transparency
  Users must be unaware of the internal separation of components
  Access, performance, location, scaling transparency

Availability
  System should be fault tolerant

Concurrent updates
  Simultaneous access to a single resource

Replication
  A file may be present at different locations
  Shares load between servers, enhances fault tolerance

Basics – Requirements for DFS (2/2)

Hardware and software heterogeneity
  Support for various platforms

Consistency
  Data integrity has to be maintained

Security
  Access control, user authentication, confidentiality

Efficiency
  Performance should be comparable to local file systems

Basics – Abstract file service model (1/3)

Source: [CDK01], p. 318

Basics – Abstract file service model (2/3)

Service             Operations

Directory service   Lookup(Dir, Name) → FileId         throws NotFound
                    AddName(Dir, Name, File)           throws NameDuplicate
                    UnName(Dir, Name)                  throws NotFound
                    GetNames(Dir, Pattern) → NameSeq

Flat file service   Read(FileId, i, n) → Data          throws BadPosition
                    Write(FileId, i, Data)             throws BadPosition
                    Create() → FileId
                    Delete(FileId)
                    GetAttributes(FileId) → Attr
                    SetAttributes(FileId, Attr)

Source: [CDK01], p. 319-322
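
The two services listed above can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions, not the interface definition from [CDK01]: the class names, the snake_case method names and the helper make_dir are hypothetical, and directories are kept in a plain dictionary rather than being stored as files.

```python
import fnmatch
import itertools


class NotFound(Exception):
    pass


class NameDuplicate(Exception):
    pass


class BadPosition(Exception):
    pass


class FlatFileService:
    """Files are addressed only by file IDs, never by name."""

    def __init__(self):
        self._data = {}    # file_id -> bytearray
        self._attrs = {}   # file_id -> dict of attributes
        self._ids = itertools.count(1)

    def create(self):
        file_id = next(self._ids)
        self._data[file_id] = bytearray()
        self._attrs[file_id] = {}
        return file_id

    def delete(self, file_id):
        del self._data[file_id]
        del self._attrs[file_id]

    def read(self, file_id, i, n):
        data = self._data[file_id]
        if not 0 <= i <= len(data):
            raise BadPosition(i)
        return bytes(data[i:i + n])

    def write(self, file_id, i, new_data):
        data = self._data[file_id]
        if not 0 <= i <= len(data):
            raise BadPosition(i)
        # Overwrites existing bytes and extends the file if necessary.
        data[i:i + len(new_data)] = new_data

    def get_attributes(self, file_id):
        return dict(self._attrs[file_id])

    def set_attributes(self, file_id, attrs):
        self._attrs[file_id] = dict(attrs)


class DirectoryService:
    """Maps text names to file IDs; knows nothing about file contents."""

    def __init__(self):
        self._dirs = {}  # dir_id -> {name: file_id}

    def make_dir(self, dir_id):
        # Hypothetical helper, not part of the abstract model.
        self._dirs[dir_id] = {}

    def lookup(self, dir_id, name):
        try:
            return self._dirs[dir_id][name]
        except KeyError:
            raise NotFound(name) from None

    def add_name(self, dir_id, name, file_id):
        entries = self._dirs[dir_id]
        if name in entries:
            raise NameDuplicate(name)
        entries[name] = file_id

    def un_name(self, dir_id, name):
        if name not in self._dirs[dir_id]:
            raise NotFound(name)
        del self._dirs[dir_id][name]

    def get_names(self, dir_id, pattern):
        names = self._dirs[dir_id]
        return sorted(n for n in names if fnmatch.fnmatchcase(n, pattern))
```

A client would first obtain a file ID from the directory service and then pass it to the flat file service, which keeps naming and storage independent of each other.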

Basics – Abstract file service model (3/3)

Access control
  Server-side user authorisation
  Access rights checked upon directory lookup or on every request

Hierarchical file structure
  Realised within the client module
  Directories may store references to other directories

File groups
  Set of files that can be moved between servers
  Similar to a file system
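
How a client module might build a hierarchical name space on top of a flat Lookup operation can be sketched as follows. The directory table, the root ID and the function names are hypothetical and serve only to illustrate the idea of resolving a path one component at a time.

```python
class NotFound(Exception):
    pass


# Hypothetical directory contents: dir_id -> {name: file_id}.
# A directory entry may refer to another directory.
DIRECTORIES = {
    1: {"home": 2},
    2: {"alice": 3},
    3: {"notes.txt": 42},
}
ROOT = 1


def lookup(dir_id, name):
    """Stand-in for the server operation Lookup(Dir, Name) -> FileId."""
    try:
        return DIRECTORIES[dir_id][name]
    except KeyError:
        raise NotFound(name) from None


def resolve(path, root=ROOT):
    """Client-side path resolution: one Lookup call per path component."""
    current = root
    for component in path.strip("/").split("/"):
        if component:
            current = lookup(current, component)
    return current
```

In this sketch the server only ever sees single-level lookups; the notion of a path exists purely on the client side.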

Agenda

Motivation

Distributed file system basics

Case studies
  Network File System (NFS)
  Andrew File System (AFS)
  Lustre

Summary and outlook

NFS – History

198?: NFSv1
  Developed at Sun Microsystems, unreleased

1984: NFSv2
  Developed at Sun Microsystems
  First released version, widely accepted
  Supports files < 4 GB, synchronous writes only

1992: NFSv3
  Developed by a group of researchers
  Overcomes these drawbacks: larger files, asynchronous writes

2002: NFSv4
  Enhanced security, user authentication
  Better Windows support

NFS – General description (1/2)

Source: [CDK01], p. 324

NFS – General description (2/2)

Stateless protocol
  Server does not maintain client state
  Client requests are blocking (exception: asynchronous writes)

User authentication
  Default: UNIX user ID (insecure!)
  Optional: Kerberos, DES

Caching
  Read cache: yes
  Write cache: no!

Server file system
  Not restricted, but should support unique file IDs
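
What "stateless" means in practice can be illustrated with a small sketch. It is not the NFS wire protocol; the class names and the file handle "fh1" are hypothetical. The point is that every request carries the file handle, offset and length, so the server keeps no open-file table and no per-client position.

```python
class StatelessServer:
    """Keeps only file contents; no open-file table, no per-client cursor."""

    def __init__(self, storage):
        # storage: file handle -> bytearray (stands in for the server's disk)
        self._storage = storage

    def read(self, handle, offset, count):
        return bytes(self._storage[handle][offset:offset + count])

    def write(self, handle, offset, data):
        buf = self._storage[handle]
        buf[offset:offset + len(data)] = data
        return len(data)


class Client:
    """The client, not the server, remembers the current file position."""

    def __init__(self, server, handle):
        self.server = server
        self.handle = handle
        self.position = 0

    def read_next(self, count):
        data = self.server.read(self.handle, self.position, count)
        self.position += len(data)
        return data
```

Because each request is self-contained, repeating a request after a lost reply yields the same result, and a restarted server can continue serving clients without any recovery dialogue.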

NFS – Abstract model (1/2)

Abstract file service model vs. NFS architecture (figures)

NFS – Abstract model (2/2)

Operations
  Similar to UNIX file system calls
  All abstract operations can be represented

Access control
  Checked upon every request

Hierarchical file system
  Realised within the client module

File groups
  Not supported, only manual movement of files

NFS – Requirements

Transparency

Availability

Concurrent updates

Replication

Heterogeneity

Consistency

Security

Efficiency

AFS – History

1982: Initial version
  Developed at Carnegie Mellon University (CMU), Pittsburgh
  Part of the Andrew distributed computing environment
  Provides support for teaching and research

1989: Spin-off
  Development outsourced to Transarc Inc.

1994: Transarc acquired by IBM
  All rights owned by IBM

2000: Open source
  Code was released under an open source license
  Since then: continuous development

AFS – General description (1/3)

AFS – Name spaces

AFS – General description (2/3)

Cached? No!

AFS – General description (3/3)

Caching
  "Callback promises"
  Workstations are notified when cached files change

Stateful protocol
  Server maintains client state
  Problematic when a client fails

User authentication
  Kerberos

Server file system
  Not restricted, but should support unique file IDs
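
The callback mechanism can be sketched as follows. This is a simplified illustration, not the AFS implementation: the class and method names are hypothetical, whole files are cached in memory, and a broken callback simply discards the cached copy.

```python
class Server:
    """Stateful server: remembers which clients hold a callback promise."""

    def __init__(self):
        self.files = {}        # name -> bytes
        self.callbacks = {}    # name -> set of clients with a valid promise
        self.fetch_count = 0   # counts server contacts, for illustration

    def fetch(self, client, name):
        self.fetch_count += 1
        self.callbacks.setdefault(name, set()).add(client)
        return self.files[name]

    def store(self, writer, name, data):
        self.files[name] = data
        # Notify every other client that its cached copy is now stale.
        for client in self.callbacks.pop(name, set()):
            if client is not writer:
                client.break_callback(name)
        self.callbacks[name] = {writer}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}  # name -> bytes, kept only while the promise is valid

    def open(self, name):
        if name not in self.cache:
            self.cache[name] = self.server.fetch(self, name)
        return self.cache[name]

    def close(self, name, data):
        # Changes are written back to the server when the file is closed.
        self.cache[name] = data
        self.server.store(self, name, data)

    def break_callback(self, name):
        self.cache.pop(name, None)
```

As long as a promise is valid, repeated opens are served from the local cache without contacting the server, which is where the scheme saves network traffic. The price is the per-client state in `callbacks`, which becomes stale if a client fails.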

AFS – Abstract model (1/2)

Abstract file service model vs. AFS architecture (figures)

AFS – Abstract model (2/2)

Operations
  Differ from the abstract model
  Some operations combined, callback promises added

Access control
  Rights checked upon every request
  Extended access lists per directory

Hierarchical file system
  Realised within the client module

File groups
  File identifier contains a link to the file group
  Location database maps file groups to servers
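
The relationship between file identifiers, file groups and the location database can be sketched as below. The field names, the class names and the server names are hypothetical; the sketch only shows why a file group can move between servers without changing any file identifier.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    group: int   # identifies the file group the file belongs to
    file: int    # identifies the file within that group


class LocationDatabase:
    """Maps file groups (not individual files) to servers."""

    def __init__(self):
        self._server_of = {}  # group id -> server name

    def place(self, group, server):
        # Used both for the initial placement and for moving a group.
        self._server_of[group] = server

    def server_for(self, file_id):
        return self._server_of[file_id.group]
```

For example, after `db.place(7, "server-a")` a lookup for `FileId(7, 1234)` yields "server-a"; after `db.place(7, "server-b")` the same, unchanged identifier resolves to "server-b".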

AFS – Requirements

Transparency

Availability

Concurrent updates

Replication

Heterogeneity

Consistency

Security

Efficiency

Lustre (1/3)

"Lustre": Linux Cluster
  File system especially suited for clusters
  Easily handles thousands of clients and servers

Uses object-based storage
  Objects offer methods for data access, attributes, policies
  High-level abstraction
  Lower performance than block-based storage

Three system roles
  Object Storage Targets (OST)
  Metadata Servers (MDS)
  Clients

Lustre (2/3)

Figure: interaction between the three system roles
  Clients and Object Storage Targets (OST): file operations, locking
  Metadata Servers (MDS) and OSTs: recovery, file status
  Clients and MDS: directory metadata

Source: [BS02], p. 51

Lustre (3/3)

Lustre partly follows the abstract model
  Separation of directory and flat file service
  File attributes managed by OSTs

Hierarchical file systems
  Realised within the client module

High availability
  Heavy use of redundancy
  Caching of metadata
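
The division of labour between the three roles can be sketched roughly as follows. This is a strongly simplified illustration, not the Lustre API: all class and method names are hypothetical, each file is stored as a single object, and locking, recovery and redundancy are left out.

```python
class ObjectStorageTarget:
    """Stores objects together with their attributes (flat file service)."""

    def __init__(self):
        self._objects = {}  # object id -> {"data": bytes, "attrs": dict}

    def create(self, object_id, data):
        self._objects[object_id] = {"data": data, "attrs": {"size": len(data)}}

    def read(self, object_id):
        return self._objects[object_id]["data"]

    def get_attributes(self, object_id):
        return dict(self._objects[object_id]["attrs"])


class MetadataServer:
    """Holds the name space only (directory service); stores no file data."""

    def __init__(self):
        self._namespace = {}  # path -> (OST name, object id)

    def add_entry(self, path, ost_name, object_id):
        self._namespace[path] = (ost_name, object_id)

    def lookup(self, path):
        return self._namespace[path]


class Client:
    def __init__(self, mds, osts):
        self.mds = mds
        self.osts = osts  # OST name -> ObjectStorageTarget

    def read(self, path):
        # Step 1: ask the metadata server where the object lives.
        ost_name, object_id = self.mds.lookup(path)
        # Step 2: fetch the data directly from the storage target.
        return self.osts[ost_name].read(object_id)

    def size(self, path):
        ost_name, object_id = self.mds.lookup(path)
        return self.osts[ost_name].get_attributes(object_id)["size"]
```

In this sketch the metadata server is consulted only for name resolution, while data and attributes come directly from the storage targets, so bulk data transfer does not pass through the metadata server.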

Summary and outlook

Abstract file service model
  Developed to meet many requirements for DFSs

Different implementations
  NFS: stateless, concurrency control
  AFS: stateful, heavy use of caching, better performance

Other approach: Lustre
  Modularised approach, especially suited for clusters

Future developments
  Large-scale environments
  Cloud computing
  Issues: data security, privacy

Any questions?

Thank you for your attention!

Literature

[BS02] Peter J. Braam, Philip Schwan: Lustre: The intergalactic file system, Proceedings of the 2002 Ottawa Linux Symposium, pp. 50–54, 2002.

[CDK01] George Coulouris, Jean Dollimore, Tim Kindberg: Distributed Systems: Concepts and Design, 3rd ed., Addison-Wesley, 2001.

[Kir06] Olaf Kirch: Why NFS Sucks, Proceedings of the Linux Symposium, 2nd ed., pp. 51–63, 2006.

[MSC+86] James H. Morris, Mahadev Satyanarayanan, Michael H. Conner, John H. Howard, David S. H. Rosenthal, F. Donelson Smith: Andrew: A distributed personal computing environment, Communications of the ACM, 29(3), pp. 184–201, Association for Computing Machinery, 1986.

[PJS+94] Brian Pawlowski, Chet Juszczak, Peter Staubach, Carl Smith, Diane Lebel, David Hitz: NFS Version 3: Design and Implementation, Proceedings of the Summer 1994 USENIX Technical Conference, pp. 137–151, 1994.

[Sat89] Mahadev Satyanarayanan: Distributed file systems, in: S. Mullender (ed.), Distributed Systems, pp. 149–188, ACM Press, 1989.

[Sch03] Philip Schwan: Lustre: Building a file system for 1000-node clusters, Proceedings of the 2003 Ottawa Linux Symposium, pp. 380–386, 2003.

[Tan03] Andrew S. Tanenbaum: Moderne Betriebssysteme, 2nd ed., Prentice Hall, 2003.

[Tv07] Andrew S. Tanenbaum, Maarten van Steen: Distributed Systems: Principles and Paradigms, 2nd ed., Prentice Hall, 2007.