A Low-bandwidth Network File System Presentation by Joseph Thompson.

Problem

• Without a network file system, people have two methods of editing files remotely:
  – Make and edit local copies of files
    • Risk of update conflicts
  – Use remote login
    • Over high-latency networks, unresponsive applications become a problem.

• Most network file systems are designed for local, high-bandwidth networks.

Goal

• To create a network file system capable of operating over a WAN by building a system that consumes less bandwidth than most current file systems.

Plan

• Provide traditional file system semantics and consistency.

• Exploit cross-file similarities.

• Use positive aspects of other file systems
  – NFS, AFS, Echo, JetFile, CODA, Bayou, OceanStore, TACT, Rsync.

Provide traditional file system semantics and consistency

• LBFS provides close-to-open consistency.
  – After a client has successfully written and closed a file, the data is safely stored on the server.

• Wanted to build a file system that could directly substitute for a widely accepted network file system in use today.

Consistency, cont’d (more detail)

• Server issues read leases to clients accessing a file.
  – “The lease is a commitment on the part of the server to notify the client of any modifications made to that file during the term of the lease”

• Files are committed atomically.
  – “If multiple clients are writing the same file, then the last one to close the file will win and overwrite changes from the others”
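The lease idea can be sketched in a few lines. This is only an illustration: the `LeaseServer` class, its method names, and the 60-second lease term are assumptions made for the example, not details taken from the slides.

```python
class LeaseServer:
    """Hypothetical server that tracks read leases per file."""

    def __init__(self, lease_seconds=60.0):  # lease term is an assumed value
        self.lease_seconds = lease_seconds
        self.leases = {}  # path -> {client: lease expiry time}

    def grant_read_lease(self, client, path, now):
        """Record a lease for this client and return its expiry time."""
        expiry = now + self.lease_seconds
        self.leases.setdefault(path, {})[client] = expiry
        return expiry

    def file_modified(self, path, writer, now):
        """Return the clients that must be notified of this modification."""
        holders = self.leases.get(path, {})
        return sorted(
            client
            for client, expiry in holders.items()
            if client != writer and expiry > now  # only unexpired leases count
        )


srv = LeaseServer(lease_seconds=60.0)
srv.grant_read_lease("alice", "/notes.txt", now=0.0)
srv.grant_read_lease("bob", "/notes.txt", now=0.0)

during_lease = srv.file_modified("/notes.txt", writer="bob", now=30.0)
after_expiry = srv.file_modified("/notes.txt", writer="carol", now=90.0)
print(during_lease, after_expiry)
```

Once a lease has expired the server owes the client nothing, so the client has to check with the server again before trusting its cached copy.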

Exploit cross-file similarities

• LBFS institutes a large client-side persistent file cache.

• When a process needs a new file it checks its file cache to see if it can reuse already downloaded segments of a file.

• When the client writes data, it sends only the chunks that differ from the server’s copy.

Indexing file chunks

• Uses the SHA-1 hash function to hash file chunks.

• Assumes that there are no hashing collisions between different file chunks.

• The implication is that any chunk that hashes to the same index contains the same data.

• Using this method, we can determine whether or not we need to send the file data by first sending only the hashes of file chunks.
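A minimal sketch of the indexing idea, using Python's standard `hashlib`. The `ChunkIndex` class and the `chunks_to_send` helper are hypothetical names chosen for the example; the slides only say that chunks are indexed by their SHA-1 hash.

```python
import hashlib


def chunk_key(chunk: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest used to index a chunk."""
    return hashlib.sha1(chunk).digest()


class ChunkIndex:
    """Hypothetical in-memory index: SHA-1 digest -> chunk bytes."""

    def __init__(self):
        self._chunks = {}

    def add(self, chunk: bytes) -> bytes:
        key = chunk_key(chunk)
        self._chunks[key] = chunk
        return key

    def has(self, key: bytes) -> bool:
        return key in self._chunks


def chunks_to_send(chunks, receiver_index):
    """Keep only the chunks whose hash the receiver does not already hold."""
    return [c for c in chunks if not receiver_index.has(chunk_key(c))]


server = ChunkIndex()
for c in (b"header", b"body v1"):
    server.add(c)

pending = chunks_to_send([b"header", b"body v2"], server)
print(pending)  # [b'body v2']
```

Equal hashes are treated as equal data here, which is exactly the no-collision assumption stated above.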

Indexing Woes

• Imagine fixed chunk offsets (every 8 bytes).
  – If you insert one byte at the front of the file, every chunk’s contents shift by one byte and all potential savings are lost.

• Rsync looks at two files with the same name and compares them to see which parts need to be resent.
  – This method misses the savings available from renamed files, files that are built from other files, and files that have similar segments because they were written by the same application.
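The fixed-offset problem is easy to reproduce. The sketch below hashes 8-byte blocks of a small buffer and counts how many block hashes survive an edit; the helper name and the sample data are made up for the illustration.

```python
import hashlib


def fixed_block_hashes(data: bytes, block_size: int = 8):
    """Hash every aligned block of block_size bytes."""
    return {
        hashlib.sha1(data[i:i + block_size]).digest()
        for i in range(0, len(data), block_size)
    }


original = bytes(range(64))  # eight distinct 8-byte blocks
old = fixed_block_hashes(original)

# Insert one byte at the front: every block boundary now falls one byte late.
shifted = fixed_block_hashes(b"\xff" + original)
# Append one byte at the end: the existing blocks keep their alignment.
appended = fixed_block_hashes(original + b"\xff")

print(len(old & shifted))   # 0 blocks reusable after the insert
print(len(old & appended))  # 8 blocks reusable after the append
```

The same single-byte change costs nothing at the end of the file and everything at the front, which is why the chunk boundaries need to depend on content rather than on position.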

LBFS’ Solution

• Divide files into chunks dynamically with each modification

• Examine every overlapping 48-byte region of the file, using its Rabin fingerprint to choose breakpoints (chunk boundaries).
  – Rabin fingerprints are used because they are efficient to compute and have highly uniform distribution properties.
  – Given the probability that a region is a breakpoint, the expected chunk size is about 8 KB.
  – To avoid pathological cases, minimum and maximum chunk sizes are enforced: 2 KB and 64 KB.
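A rough sketch of content-defined chunking under the parameters above (48-byte window, 2 KB minimum, 64 KB maximum). To stay short it swaps in a plain polynomial rolling hash where LBFS uses Rabin fingerprints, and the 13-bit mask, the `MAGIC` value, and the function names are assumptions for the example. A 13-bit mask gives each position a 1-in-8192 chance of being a breakpoint, which is consistent with the 8 KB expected chunk size.

```python
import random

WINDOW = 48                 # bytes hashed at each position
MASK = (1 << 13) - 1        # assumed: compare the low 13 bits of the hash
MAGIC = 0x78                # assumed: arbitrary value marking a breakpoint
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024
BASE = 257                  # polynomial rolling hash, NOT a Rabin fingerprint
MOD = (1 << 61) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def chunk_boundaries(data: bytes):
    """Return the end offset of every chunk in data."""
    cuts, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        size = i + 1 - start
        at_breakpoint = i + 1 >= WINDOW and (h & MASK) == MAGIC
        if (at_breakpoint and size >= MIN_CHUNK) or size >= MAX_CHUNK:
            cuts.append(i + 1)
            start = i + 1
    if start < len(data):
        cuts.append(len(data))  # the final chunk may be short
    return cuts


def split(data: bytes):
    """Split data into chunks at the computed boundaries."""
    cuts = chunk_boundaries(data)
    return [data[a:b] for a, b in zip([0] + cuts[:-1], cuts)]


rng = random.Random(0)
data = bytes(rng.getrandbits(8) for _ in range(128 * 1024))
old_chunks = set(split(data))
new_chunks = set(split(b"!" + data))  # one byte inserted at the front
print(len(old_chunks & new_chunks), "of", len(old_chunks), "chunks unchanged")
```

Because each breakpoint depends only on the 48 bytes before it, an insertion disturbs the chunk it lands in and the boundaries after it fall back into place.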

Example Explained

• Example shows how new chunks are created/destroyed based on file modifications.

File Reads

• New RPC GETHASH function returns a vector (array) containing the hash values of all chunks in a file.
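One way a client could use GETHASH on a read, sketched with an in-memory stand-in for the server. The `FakeServer` class, its `read_chunk` method, and the (hash, size) shape of the returned vector are assumptions for the example, not the actual RPC interface.

```python
import hashlib


def sha1(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


class FakeServer:
    """Hypothetical server holding each file as a list of chunks."""

    def __init__(self, files):
        self.files = files      # name -> list of chunk bytes
        self.bytes_sent = 0     # file data actually transferred

    def gethash(self, name):
        """Stand-in for GETHASH: one (hash, size) entry per chunk."""
        return [(sha1(c), len(c)) for c in self.files[name]]

    def read_chunk(self, name, index):
        """Stand-in for an ordinary read of one chunk's byte range."""
        chunk = self.files[name][index]
        self.bytes_sent += len(chunk)
        return chunk


def read_file(server, name, cache):
    """Rebuild a file, fetching only chunks missing from the local cache."""
    parts = []
    for index, (digest, _size) in enumerate(server.gethash(name)):
        if digest not in cache:
            cache[digest] = server.read_chunk(name, index)
        parts.append(cache[digest])
    return b"".join(parts)


server = FakeServer({"report": [b"A" * 10, b"B" * 5]})
cache = {sha1(b"A" * 10): b"A" * 10}   # first chunk already cached locally
contents = read_file(server, "report", cache)
print(len(contents), server.bytes_sent)  # 15 5
```

Only the 5-byte chunk crosses the network; the 10-byte chunk comes out of the client's cache.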

File Writes

• “LBFS uses temporary files to implement atomic updates. The server first creates a unique temporary file, writes the temporary file, and only then atomically commits the contents to the real file being updated”

• Four new RPC functions:
  – MKTMPFILE
    • Creates a temporary file for use in an atomic update
  – TMPWRITE
    • Writes to the temporary file on the server instead of the permanent one
  – CONDWRITE
    • Includes a hash value in place of the data; if the server does not already have that chunk and it needs to be written, the server returns a HASHNOTFOUND message
  – COMMITTMP
    • If no errors have occurred during any of the previous calls, COMMITTMP merges the temporary file into the permanent version and updates the file chunks.
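The four calls fit together roughly as follows. This is a toy in-memory model: the RPC names come from the slide, but the Python signatures, the offset handling, and the `write_file` helper are assumptions made for the illustration.

```python
import hashlib

OK, HASHNOTFOUND = "OK", "HASHNOTFOUND"


def sha1(chunk: bytes) -> bytes:
    return hashlib.sha1(chunk).digest()


class FakeServer:
    """Hypothetical server with a chunk database and temporary files."""

    def __init__(self):
        self.files = {}     # name -> committed contents
        self.chunk_db = {}  # SHA-1 digest -> chunk bytes
        self.tmp = {}       # temp file id -> {offset: chunk}
        self._next_fd = 0

    def mktmpfile(self):
        fd, self._next_fd = self._next_fd, self._next_fd + 1
        self.tmp[fd] = {}
        return fd

    def condwrite(self, fd, offset, digest):
        chunk = self.chunk_db.get(digest)
        if chunk is None:
            return HASHNOTFOUND       # client must send the data itself
        self.tmp[fd][offset] = chunk  # reuse data the server already has
        return OK

    def tmpwrite(self, fd, offset, chunk):
        self.tmp[fd][offset] = chunk

    def committmp(self, fd, name):
        pieces = self.tmp.pop(fd)
        self.files[name] = b"".join(pieces[o] for o in sorted(pieces))
        for chunk in pieces.values():  # index the chunks for later writes
            self.chunk_db[sha1(chunk)] = chunk


def write_file(server, name, chunks):
    """Write chunks to name; return how many bytes of data were sent."""
    fd, offset, sent = server.mktmpfile(), 0, 0
    for chunk in chunks:
        if server.condwrite(fd, offset, sha1(chunk)) == HASHNOTFOUND:
            server.tmpwrite(fd, offset, chunk)
            sent += len(chunk)
        offset += len(chunk)
    server.committmp(fd, name)
    return sent


server = FakeServer()
first = write_file(server, "doc", [b"aaaa", b"bbbb"])
second = write_file(server, "doc", [b"aaaa", b"cccc", b"bbbb"])
print(first, second, server.files["doc"])  # 8 4 b'aaaaccccbbbb'
```

The second write inserts a chunk in the middle of the file, yet only that new chunk's 4 bytes are transferred; the permanent file changes only at the commit step.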

File Writes, cont’d

Graphs explained

Graphs, cont’d

Graphs, cont’d

Graphs finished

Paper’s Summary

• “In many situations, LBFS makes transparent remote file access a viable and less frustrating alternative to running interactive programs on remote machines”