Distributed File System: Data Storage for Networks Large and Small
Pei Cao, Cisco Systems, Inc.
Review: DFS Design Considerations
1. Name space construction
2. AAA
3. Operator batching
4. Client caching
5. Data consistency
6. Locking
Summing it Up: CIFS as an Example
• Network transport in CIFS
  – Use SMB (Server Message Block) messages over a reliable connection-oriented transport
    • TCP
    • NetBIOS over TCP
  – Use persistent connections called “sessions”
    • If a session is broken, the client does the recovery
Design Choices in CIFS
• Name space construction:
  – Per-client linkage, multiple methods for server resolution
    • file://fs.xyz.com/users/alice/stuff.doc
    • \\cifsserver\users\alice\stuff.doc
    • E:\stuff.doc
  – CIFS also offers a “redirection” method
    • A share can be replicated on multiple servers or moved
    • Client opens a file → server replies “STATUS_DFS_PATH_NOT_COVERED” → client issues “TRANS2_DFS_GET_REFERRAL” → server replies with the new server
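The redirection exchange above can be sketched as a toy in-memory model. This is only an illustration: the `Server` class, the `open_with_referral` helper, and the server names are hypothetical and do not correspond to any real CIFS client library.

```python
# Toy model of CIFS DFS redirection; all names here are illustrative assumptions.
PATH_NOT_COVERED = "STATUS_DFS_PATH_NOT_COVERED"

class Server:
    def __init__(self, files, referrals=None):
        self.files = files                  # path -> contents held locally
        self.referrals = referrals or {}    # share prefix -> server now holding it

    def open(self, path):
        if path in self.files:
            return ("OK", self.files[path])
        return (PATH_NOT_COVERED, None)

    def get_referral(self, path):
        # Stands in for the TRANS2_DFS_GET_REFERRAL exchange.
        for prefix, target in self.referrals.items():
            if path.startswith(prefix):
                return target
        return None

def open_with_referral(servers, start, path, max_hops=3):
    """Follow referrals until some server covers the path."""
    current = start
    for _ in range(max_hops):
        status, data = servers[current].open(path)
        if status == "OK":
            return current, data
        current = servers[current].get_referral(path)
        if current is None:
            break
    raise FileNotFoundError(path)

servers = {
    "old": Server({}, {"/users/": "new"}),
    "new": Server({"/users/alice/stuff.doc": b"hello"}),
}
result = open_with_referral(servers, "old", "/users/alice/stuff.doc")
```

Here the open against `old` is answered with the not-covered status, the referral names `new`, and the retried open succeeds there.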
Design Choices in CIFS
• AAA: Kerberos
  – Older systems use NTLM
• Operator batching: supported
  – These methods have “AndX” variations: TREE_CONNECT, OPEN, CREATE, READ, WRITE, LOCK
  – Server implicitly takes results of preceding operations as input for subsequent operations
  – First command that encounters an error stops all subsequent processing in the batch
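The two batching rules above (results chain forward, first error ends the batch) can be illustrated with a minimal sketch. The `run_batch` function and the handler names are assumptions made for this example; real AndX messages are binary SMB commands, not Python calls.

```python
# Toy AndX-style batch executor; command names and handlers are illustrative.
def run_batch(commands, handlers):
    """Run commands in order, feeding each result to the next; stop at first error."""
    results = []
    previous = None
    for name, args in commands:
        try:
            previous = handlers[name](previous, **args)
        except Exception as exc:
            results.append((name, f"ERROR: {exc}"))
            break               # later commands in the batch are not processed
        results.append((name, previous))
    return results

files = {"a.txt": b"hello world"}

def open_file(_prev, path):
    if path not in files:
        raise FileNotFoundError(path)
    return path                 # acts as the "file handle" for the next command

def read_file(handle, offset, length):
    return files[handle][offset:offset + length]

handlers = {"OPEN": open_file, "READ": read_file}

ok = run_batch([("OPEN", {"path": "a.txt"}),
                ("READ", {"offset": 0, "length": 5})], handlers)
failed = run_batch([("OPEN", {"path": "missing.txt"}),
                    ("READ", {"offset": 0, "length": 5})], handlers)
```

In the second batch the READ is never attempted, because the OPEN before it failed.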
Design Choices in CIFS
• Client caching
  – Cache both file data and file metadata, write-back cache, can read-ahead
  – Offers strong cache consistency using an invalidation-based approach
• Data access consistency
  – Oplocks: similar to “tokens” in AFS v3
    • “level II oplock”: read-only data locks
    • “exclusive oplock”: exclusive read/write data lock
    • “batch oplock”: exclusive read/write “open” lock and data lock and metadata lock
  – Transition among the oplocks
  – Observation: can have a hierarchy of lock managers
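One possible set of oplock transitions is sketched below for a single file. The policy is a simplified assumption (batch oplocks are left out, and the class and level names are made up for the example); it is meant only to show how invalidation-style "breaks" keep client caches consistent.

```python
# Toy oplock manager for a single file; the policy is a simplified assumption.
class OplockManager:
    def __init__(self):
        self.holders = {}       # client -> "exclusive" | "level2" | "none"
        self.breaks = []        # (client, new_level) break notifications "sent"

    def _break(self, client, new_level):
        self.holders[client] = new_level
        self.breaks.append((client, new_level))

    def open(self, client):
        if not self.holders:
            self.holders[client] = "exclusive"    # sole opener may cache reads and writes
        else:
            for other, level in list(self.holders.items()):
                if level == "exclusive":
                    self._break(other, "level2")  # holder flushes dirty data, keeps read cache
            self.holders[client] = "level2"
        return self.holders[client]

    def write(self, client):
        if self.holders.get(client) == "exclusive":
            return              # exclusive holder writes into its own cache
        for other, level in list(self.holders.items()):
            if other != client and level != "none":
                self._break(other, "none")        # invalidate other clients' cached data
        self.holders[client] = "none"

m = OplockManager()
first = m.open("A")     # A is alone and gets an exclusive oplock
second = m.open("B")    # A is broken down to level II, B gets level II
m.write("B")            # A's read cache is invalidated
```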
Design Choices in CIFS
• File and data record locking
  – Offer “shared” (read-only) and “exclusive” (read/write) locks
  – Part of the file system; mandatory
  – Can lock either a whole file or a byte range in the file
  – Lock request can specify a timeout for waiting
  – Enables atomic writes with the “AndX” batching with writes
    • “Lock/write/unlock” as a batched command sequence
• Additional capability: “directory change notification”
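A minimal sketch of shared/exclusive byte-range locks with mandatory enforcement follows. The `RangeLockTable` class is hypothetical, ranges are taken as half-open `[start, end)`, and the wait timeout is omitted (a conflicting request simply fails).

```python
# Toy byte-range lock table; ranges are half-open [start, end).
class RangeLockTable:
    def __init__(self):
        self.locks = []         # (owner, start, end, mode); mode is "shared" or "exclusive"

    def try_lock(self, owner, start, end, mode):
        for o, s, e, m in self.locks:
            overlaps = start < e and s < end
            conflicts = mode == "exclusive" or m == "exclusive"
            if o != owner and overlaps and conflicts:
                return False
        self.locks.append((owner, start, end, mode))
        return True

    def unlock(self, owner, start, end):
        self.locks = [l for l in self.locks if l[:3] != (owner, start, end)]

    def check_io(self, owner, start, end, is_write):
        """Mandatory enforcement: reject I/O that violates another owner's lock."""
        for o, s, e, m in self.locks:
            if o != owner and start < e and s < end:
                if is_write or m == "exclusive":
                    return False
        return True

t = RangeLockTable()
a_shared = t.try_lock("A", 0, 100, "shared")
b_shared = t.try_lock("B", 50, 150, "shared")       # shared locks may overlap
c_blocked = t.try_lock("C", 90, 110, "exclusive")   # overlaps existing shared locks
c_granted = t.try_lock("C", 150, 200, "exclusive")  # disjoint range
a_read_denied = t.check_io("A", 160, 170, is_write=False)
```

Because the locks are mandatory, `check_io` is consulted on every read and write, not only when a client asks for a lock.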
DFS for Mobile Networks
• What properties of DFS are desirable:
  – Handle frequent connection and disconnection
  – Enable clients to operate in disconnected state for an extended period of time
  – Ways to resolve/merge conflicts
Design Issues for DFS in Mobile Networks
• What should be kept in the client cache?
• How to update the client cache copies with changes made on the server?
• How to upload changes made by the client to the server?
• How to resolve conflicts when more than one client changes a file during disconnected state?
Example System: Coda
• Client cache content:
  – User can specify which directories should always be cached on the client
  – Also cache recently used files
  – Cache replacement: walk over the cached items every 10 min to reevaluate their priorities
• Updates from server to client:
  – The server keeps a log of callbacks that couldn’t be delivered and delivers them upon client connection
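The periodic priority walk can be pictured with the sketch below. The scoring formula, the weight, and the paths are assumptions chosen for illustration; the slide does not give Coda's actual priority function.

```python
# Toy cache-priority walk; the scoring formula and weights are assumptions.
def walk_cache(entries, now, capacity, hoard_weight=0.75):
    """Re-score every cached item and return the names to keep, best first.

    entries: name -> (user-assigned hoard priority 0..100, last-use time in seconds)
    """
    def score(item):
        hoard, last_use = entries[item]
        age_min = (now - last_use) / 60.0
        recency = 100.0 / (1.0 + age_min)       # decays as the item goes unused
        return hoard_weight * hoard + (1 - hoard_weight) * recency

    ranked = sorted(entries, key=score, reverse=True)
    return ranked[:capacity]

entries = {
    "/coda/usr/alice/thesis": (100, 0),     # pinned by the user, not used recently
    "/coda/tmp/scratch": (0, 3540),         # not pinned, used one minute ago
    "/coda/tmp/old": (0, 0),                # not pinned, not used recently
}
kept = walk_cache(entries, now=3600, capacity=2)
```

With room for two items, the user-pinned directory and the recently used file stay cached, and the stale unpinned file is the one evicted.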
Coda File System
• Upload the changes from client to server
  – The client has to keep a “replay log”
    • Contents of the “replay log”
  – Ways to reduce the “replay log” size
• Handling conflicts
  – Detecting conflicts
  – Resolving conflicts
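A minimal sketch of a replay log is shown below, with one size-reducing rule and version-based conflict detection. Both the rule (a later store of a file supersedes an earlier one) and the per-file version counter are assumptions made for the example; the slide leaves the log contents and the detection method open.

```python
# Toy replay log with one size-reducing rule and version-based conflict detection.
class ReplayLog:
    def __init__(self):
        self.records = []       # (op, path, data, base_version)

    def log_store(self, path, data, base_version):
        # Assumed optimization: a newer store of the same file supersedes an older one.
        self.records = [r for r in self.records
                        if not (r[0] == "store" and r[1] == path)]
        self.records.append(("store", path, data, base_version))

def reintegrate(log, server):
    """Replay logged stores; server maps path -> (version, data). Returns conflicting paths."""
    conflicts = []
    for op, path, data, base_version in log.records:
        version, _ = server.get(path, (0, b""))
        if version != base_version:
            conflicts.append(path)          # the file changed on the server meanwhile
        else:
            server[path] = (version + 1, data)
    log.records = []
    return conflicts

server = {"a": (1, b"old"), "b": (2, b"x")}
log = ReplayLog()
log.log_store("a", b"v1", 1)
log.log_store("a", b"v2", 1)        # replaces the earlier store of "a"
log.log_store("b", b"mine", 1)      # cached at version 1, server is now at version 2
pending = len(log.records)
conflicts = reintegrate(log, server)
```

Detected conflicts are only reported here; resolving them (automatically or by asking the user) is a separate step.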
Performance Issues in File Servers
• Components of server load
  – Network protocol handling
  – File system implementation
  – Disk accesses
• Read operations
  – Metadata
  – Data
• Write operations
  – Metadata
  – Data
• Workload characterization
DFS for High-Speed Networks: DAFS
• Proposal from Network Appliance and other companies
• Goal: eliminate memory copies and protocol processing
  – Standard implementation: network buffers → file system buffer cache → user-level application buffers
• Designed to take advantage of RDMA (“Remote DMA”) network protocols
  – Network transport provides direct memory-to-memory transfer
  – Protocol processing is provided in hardware
• Suitable for high-bandwidth, low-error-rate, low-latency networks
DAFS Protocol
• Data read from the client:
  – RDMA request from the server to copy file data directly into the application buffer
• Data write from the client:
  – RDMA request from the server to copy the application buffer into server memory
• Implementation:
  – As a library linked to the user application, interfacing with the RDMA network library directly
    • Eliminates two data copies
  – As a new file system implementation in the kernel
    • Eliminates one data copy
• Performance advantage:
  – Example: 90 usec/op in NFS vs. 25 usec/op in DAFS
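To put the two figures in perspective, the short calculation below converts them into a ratio and into operations per second. It assumes the numbers are per-operation costs on a single, strictly serialized request stream, which the slide does not state.

```python
# Per-operation costs quoted on the slide, in microseconds.
nfs_usec, dafs_usec = 90, 25

speedup = nfs_usec / dafs_usec              # 3.6x less time per operation
nfs_ops_per_sec = 1_000_000 / nfs_usec      # roughly 11,000 ops/s
dafs_ops_per_sec = 1_000_000 / dafs_usec    # 40,000 ops/s
```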
DAFS Features
• Session-based
• Offer authentication of client machines
• Flow control by server
• Stateful lock implementation with leases
• Offers atomic writes
• Offers operator batching
Clustered File Servers
• Goal: scalability in file service
  – Build a high-performance file service using a collection of cheap file servers
• Methods for partitioning the workload
  – Each server can support one “subtree”
    • Advantages
    • Disadvantages
  – Each server can support a group of clients
    • Advantages
    • Disadvantages
  – Client requests are sent to servers in round-robin or load-balanced fashion
    • Advantages
    • Disadvantages
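The three partitioning methods can be contrasted as three routing functions. This is a sketch only: the server names, the subtree table, and the hash choice are assumptions for illustration.

```python
import itertools
import zlib

SERVERS = ["fs0", "fs1", "fs2"]     # hypothetical server names

def by_subtree(path, table):
    """Subtree partitioning: the longest matching path prefix picks the server."""
    best = max((p for p in table if path.startswith(p)), key=len)
    return table[best]

def by_client(client_id):
    """Client partitioning: each client is pinned to one server by a stable hash."""
    return SERVERS[zlib.crc32(client_id.encode()) % len(SERVERS)]

_rotation = itertools.cycle(SERVERS)

def round_robin():
    """Request spreading: successive requests rotate over all servers."""
    return next(_rotation)

table = {"/": "fs0", "/users": "fs1", "/projects": "fs2"}
subtree_choice = by_subtree("/users/alice/stuff.doc", table)
client_choice = by_client("alice-laptop")
rotation = [round_robin() for _ in range(4)]
```

With the first two methods a given file or client always reaches the same server, so that server's cache stays useful. With round-robin any server may see any file, which is where the consistency and data-placement questions of non-subtree partitioning come from.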
Non-Subtree-Partition Clustered File Servers
• Design issues
  – On which disks should the data be stored?
  – Management of memory cache in file servers
  – Data consistency management
    • Metadata operation consistency
    • Data operation consistency
  – Server failure management
    • Single server failure fault tolerance
    • Disk failure fault tolerance
Mapping Between Disks and Servers
• Direct-attached disks
• Network-attached disks
  – Fibre Channel attached disks
  – iSCSI attached disks
• Managing the network-attached disks: “volume manager”
Functionalities of a Volume Manager
• Group multiple disk partitions into a “logical” disk volume
• Volume can expand or shrink in size without affecting existing data
• Volume can be RAID-0/1/5, with RAID-1 and RAID-5 tolerating disk failures
• Volume can offer “snapshot” functionalities for easy backup
• Volumes are “self-evident”
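The first two functions in the list (grouping partitions into one logical volume and growing it without disturbing existing data) can be sketched as a block-address mapping. The `LogicalVolume` class and the partition names are hypothetical, and only simple concatenation is shown, not RAID or snapshots.

```python
# Toy logical volume: concatenates partitions and maps logical to physical blocks.
class LogicalVolume:
    def __init__(self):
        self.extents = []       # (partition name, number of blocks)

    def add_partition(self, name, blocks):
        # Appending grows the volume; mappings of existing blocks are unchanged.
        self.extents.append((name, blocks))

    @property
    def size(self):
        return sum(blocks for _, blocks in self.extents)

    def locate(self, logical_block):
        """Return (partition, block offset within that partition)."""
        if not 0 <= logical_block < self.size:
            raise IndexError(logical_block)
        for name, blocks in self.extents:
            if logical_block < blocks:
                return name, logical_block
            logical_block -= blocks

vol = LogicalVolume()
vol.add_partition("sda1", 100)
vol.add_partition("sdb1", 50)
before = vol.locate(120)            # falls in the second partition
vol.add_partition("sdc1", 200)      # expand the volume
after = vol.locate(120)             # same physical location as before
new_space = vol.locate(150)         # first block of the added partition
```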
Implementations of Volume Manager
• In-kernel implementation
  – Example: Linux volume manager, Veritas volume manager, etc.
• Disk server implementation
  – Example: EMC storage systems