A LOW-BANDWIDTH NETWORK FILE SYSTEM
![Page 1: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/1.jpg)
A LOW-BANDWIDTH NETWORK FILE SYSTEM
A. Muthitacharoen, MIT
B. Chen, MIT
D. Mazières, New York University
![Page 2: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/2.jpg)
Highlights
• A file system for slow or wide-area networks
• Exploits similarities between files or versions of the same file
  – Avoids sending data that can be found in the server’s file system or the client’s cache
• Also uses conventional compression and caching
• Requires 90% less bandwidth than traditional network file systems
![Page 3: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/3.jpg)
Working on slow networks
• Can work with local copies
  – Must then worry about update conflicts
• Can use remote login
  – Only for text-based applications
• Should instead use a low-bandwidth file system
  – Better than remote login
  – Must then deal with issues like big autosaves blocking the editor for the duration of the transfer
![Page 4: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/4.jpg)
LBFS (I)
• Client keeps all recently accessed files in its cache
• LBFS exploits cross-file similarities to reduce data transfers between client and server
  – File server divides the files it stores into variable-size chunks
  – Indexes these chunks by their hash values
![Page 5: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/5.jpg)
LBFS (II)
• When transferring a file between the client and the server
  – LBFS identifies the chunks the receiving side already has
  – Only transmits the other chunks
• Provides close-to-open consistency
  – Same as Coda (and newer versions of NFS)
![Page 6: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/6.jpg)
Related work (I)
• AFS used callbacks to reduce network traffic
• Leases are callbacks with an expiration date
• Coda supports slow networks and disconnected operation through optimistic replication
• Bayou and OceanStore investigate conflict resolution for optimistic updates
• Lee et al. have extended Coda to support operation-based updates
![Page 7: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/7.jpg)
Related Work (II)
• Spring and Wetherall use large client and server caches to eliminate redundant network traffic:
  – Can send address of data already in cache of receiver rather than data themselves
• Rsync exploits similarities between directory trees containing similar subtrees
![Page 8: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/8.jpg)
LBFS Design
• Key ideas:
  – Close-to-open consistency
  – Have a large persistent file cache at client
    • IDE disks are now large enough for that
  – Exploits similarities between files (and file versions)
    • Only transmits data chunks containing new data
![Page 9: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/9.jpg)
Identifying Similar Data Chunks
• LBFS uses the collision-resistant property of the SHA-1 hash function
  – Assumes no hash collisions
• Central challenges are
  – Keeping the index a reasonable size
  – Dealing with shifting offsets
![Page 10: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/10.jpg)
The Case against Fixed-Size Blocks
[Figure: File F, and File F after an insertion, each divided into fixed-size blocks]
The two files do not have a single block in common
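The effect described on this slide can be reproduced in a few lines. The following is a minimal sketch, assuming an 8-byte block size and a one-byte insertion at the start of the file; `fixed_blocks` and both values are illustrative choices, not anything LBFS uses.

```python
import hashlib


def fixed_blocks(data: bytes, size: int = 8) -> list:
    """Hash every fixed-size block of `data` (block size is an assumption)."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


old = b"Four score and seven years ago our fathers brought forth"
new = b"X" + old  # a single byte inserted at the front

# Every block boundary in `new` is shifted by one byte, so no block hash
# of the old file reappears in the new one.
shared = set(fixed_blocks(old)) & set(fixed_blocks(new))
print(len(shared))
```

Because every block after the insertion point starts one byte later, an index keyed on fixed-size blocks finds nothing to reuse, which is the shifting-offset problem the next slides address.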
![Page 11: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/11.jpg)
The Case against “Diffs”
• “Diffs” are used by several UNIX utilities
  – Computed by comparing contents of file with another file
  – Very efficient
• Must know which file(s) to compare to
• Difficult in a file system
  – Obscure naming of editor buffer files and other temp files
![Page 12: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/12.jpg)
Dividing Files into Chunks
• LBFS
  – Only looks for non-overlapping chunks in files
  – Sets chunk boundaries based on file contents
• To divide a file into chunks, LBFS
  – Examines every (overlapping) 48-byte region of the file
  – Uses Rabin’s fingerprints to select boundary regions or breakpoints
![Page 13: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/13.jpg)
Using Rabin’s Fingerprints
• Polynomial representation of data in 48-byte region modulo an irreducible polynomial
• Boundary regions have the 13 least significant bits of their fingerprint equal to an arbitrary predefined value
  – Assuming random data, expected chunk size is 2^13 bytes = 8K
• Method is reasonably fast
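A content-defined chunker along these lines can be sketched as follows. This is an illustration under stated assumptions, not the LBFS implementation: it swaps Rabin's fingerprint for a plain polynomial rolling hash, and `BASE`, `MOD`, `MAGIC`, `breakpoints`, and `split` are hypothetical names and values. Only the 48-byte window and the 13-bit test come from the slides.

```python
import hashlib
import random

WINDOW = 48              # sliding-window width in bytes (from the slides)
MASK = (1 << 13) - 1     # test the 13 least significant bits (from the slides)
MAGIC = 0x78             # arbitrary predefined value (hypothetical choice)
BASE = 257               # rolling-hash base (assumption, not Rabin's scheme)
MOD = (1 << 61) - 1      # rolling-hash modulus (assumption)
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window


def breakpoints(data: bytes) -> list:
    """Offsets just past each 48-byte window whose hash matches MAGIC."""
    cuts = []
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Drop the contribution of the byte that slides out of the window.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        if i + 1 >= WINDOW and (h & MASK) == MAGIC:
            cuts.append(i + 1)
    return cuts


def split(data: bytes) -> list:
    """Cut `data` into chunks at the content-defined breakpoints."""
    edges = [0] + breakpoints(data)
    if edges[-1] != len(data):
        edges.append(len(data))
    return [data[a:b] for a, b in zip(edges, edges[1:])]


# Demo on pseudo-random data: insert 16 bytes in the middle of a file.
rng = random.Random(0)
old = bytes(rng.getrandbits(8) for _ in range(200_000))
new = old[:100_000] + b"sixteen new byte" + old[100_000:]

old_hashes = {hashlib.sha1(c).digest() for c in split(old)}
new_chunks = split(new)
unseen = [c for c in new_chunks if hashlib.sha1(c).digest() not in old_hashes]
print(len(new_chunks), "chunks,", len(unseen), "not already known")
```

Since each breakpoint depends only on the 48 bytes before it, breakpoints outside the edited region are unchanged, and only the chunk or chunks around the insertion need to be sent.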
![Page 14: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/14.jpg)
How it works
[Figure: a file X partitioned into three chunks, and the same file X after one insertion inside the middle chunk; only the middle chunk becomes a new chunk]
Chunk boundaries are arbitrary and identified by the content of their boundary regions
![Page 15: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/15.jpg)
Another way to look at it (I)
• Old File:
Four score and seven years ago our fathers brought forth,
a new country, conceived in liberty,
and dedicated to the proposition that "all men are created equal."
![Page 16: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/16.jpg)
Another way to look at it (II)
• New File:
Four score and seven years ago our fathers brought forth,
upon this continent,
a new nation, conceived in liberty,
and dedicated to the proposition that "all men are created equal"
![Page 17: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/17.jpg)
Another way to look at it (III)
• Identify Chunks:
Four score and seven years ago our fathers brought forth,
upon this continent,
a new nation, conceived in liberty,
and dedicated to the proposition that "all men are created equal"
![Page 18: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/18.jpg)
Another way to look at it (IV)
• Send back to server the modified chunk:
upon this continent,
a new nation, conceived in liberty,
in compressed form
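The four steps of this example can be mimicked with a toy model. The sketch below assumes that cutting after every comma stands in for content-defined breakpoints, and that the server's knowledge is a set of SHA-1 digests. `comma_chunks`, `server_index`, and `to_send` are hypothetical names, and the two strings are the slide text with spacing and the closing quote normalized.

```python
import hashlib
import zlib

OLD = ('Four score and seven years ago our fathers brought forth, '
       'a new country, conceived in liberty, and dedicated to the '
       'proposition that "all men are created equal"')
NEW = ('Four score and seven years ago our fathers brought forth, '
       'upon this continent, a new nation, conceived in liberty, and '
       'dedicated to the proposition that "all men are created equal"')


def comma_chunks(text: str) -> list:
    """Toy stand-in for content-defined chunking: cut after every comma."""
    parts = text.split(",")
    pieces = [p + "," for p in parts[:-1]] + [parts[-1]]
    return [p.encode() for p in pieces if p]


# What the server already holds, indexed by chunk hash.
server_index = {hashlib.sha1(c).digest() for c in comma_chunks(OLD)}

# The client sends only chunks whose hashes the server does not know,
# and compresses what it sends.
to_send = [c for c in comma_chunks(NEW)
           if hashlib.sha1(c).digest() not in server_index]
payload = zlib.compress(b"".join(to_send))
print(to_send)
```

With these toy boundaries the unknown chunks are the two phrases that changed. The slide's own chunking is coarser, so its modified chunk also carries "conceived in liberty,".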
![Page 19: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/19.jpg)
Pathological cases
• Having too many chunks requires too much aggregate bandwidth
• Very large chunks would be too difficult to send in a single RPC
• Chunk sizes must be between 2K and 64K
  – May have to artificially insert chunk boundaries when files are full of repeated sequences
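One way to enforce these limits is to post-process the candidate breakpoints. The sketch below assumes the candidates come from some content-defined chunker; `clamp_breakpoints` is a hypothetical helper, and only the 2K and 64K limits come from the slide.

```python
MIN_CHUNK = 2 * 1024    # minimum chunk size from the slide
MAX_CHUNK = 64 * 1024   # maximum chunk size from the slide


def clamp_breakpoints(length: int, candidates: list) -> list:
    """Filter candidate cuts so every chunk but the last is 2K..64K long."""
    cuts = []
    start = 0
    for cand in sorted(candidates) + [length]:
        # No usable breakpoint within 64K: insert artificial boundaries.
        while cand - start > MAX_CHUNK:
            start += MAX_CHUNK
            cuts.append(start)
        if cand == length:
            break  # the end of the file is an implicit boundary
        # Ignore breakpoints that would create a chunk shorter than 2K.
        if cand - start >= MIN_CHUNK:
            cuts.append(cand)
            start = cand
    return cuts


print(clamp_breakpoints(200_000, [100, 3000, 3500, 150_000]))
# → [3000, 68536, 134072, 150000]
```

In the usage line, the candidates at 100 and 3500 are dropped for being too close to the previous boundary, and two artificial boundaries fill the long gap before 150000.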
![Page 20: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/20.jpg)
The chunk database (I)
• The chunk database
  – Indexes chunks by first 64 bits of SHA-1 hash
  – Maps keys to (file, offset, count) triples
• How to keep this database up to date?
  – Must update it whenever file is updated
  – Can still have problems with local updates at server site
  – Crashes can corrupt database contents
![Page 21: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/21.jpg)
The chunk database (II)
• Best solution is to tolerate inconsistencies:
  – LBFS recomputes hash of any data chunk before using it
  – Recomputed value is also used to detect collisions
    • Very improbable but still possible
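The "tolerate inconsistencies" approach can be sketched with an in-memory index. This is a simplified model, not the LBFS database: `ChunkDB` is a hypothetical class, and the `files` dictionary (path to contents) is a stand-in for the server's file system.

```python
import hashlib


class ChunkDB:
    """Toy chunk index: 64-bit key -> (file, offset, count)."""

    def __init__(self):
        self._index = {}

    def add(self, files: dict, path: str, offset: int, count: int) -> bytes:
        """Index one chunk of files[path]; return its full SHA-1 digest."""
        digest = hashlib.sha1(files[path][offset:offset + count]).digest()
        self._index[digest[:8]] = (path, offset, count)  # first 64 bits
        return digest

    def lookup(self, files: dict, digest: bytes):
        """Return the chunk only if its recomputed hash still matches."""
        entry = self._index.get(digest[:8])
        if entry is None:
            return None
        path, offset, count = entry
        data = files.get(path, b"")[offset:offset + count]
        if hashlib.sha1(data).digest() != digest:
            # Stale entry (file changed behind our back) or a key collision.
            del self._index[digest[:8]]
            return None
        return data


files = {"/a.txt": b"hello chunk world"}
db = ChunkDB()
key = db.add(files, "/a.txt", 6, 5)
hit = db.lookup(files, key)            # index and file agree
files["/a.txt"] = b"hello CHUNK world" # local update the index never saw
miss = db.lookup(files, key)           # recomputed hash no longer matches
```

Because every hit is re-verified against the full digest, a stale or corrupted index can cost a missed optimization but never wrong data.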
![Page 22: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/22.jpg)
Protocol
• NFS with some changes:
  – Uses leases to implement close-to-open consistency (callbacks with limited lifetime)
  – Practices aggressive pipelining of RPC calls
  – Compresses all RPC traffic
![Page 23: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/23.jpg)
Leases
• Leases are callbacks with
  – A limited lifetime (a few seconds)
  – A guarantee that server will not accept updates during lease lifetime without first notifying client
• Advantages:
  – No problems with lost callbacks
  – Automatically expire when server crashes
![Page 24: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/24.jpg)
An example (I)
[Figure: timeline between Alice and the server. Alice requests a lease; for the duration of the lease, Alice controls the file; when it expires, she must renew it.]
![Page 25: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/25.jpg)
An example (II)
[Figure: timeline between Alice, Bob, and the server. Alice got a lease and controls the file for its duration; Bob also requests a lease.]
![Page 26: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/26.jpg)
An example (III)
• When server receives Bob's request,
  – It will try to contact Alice and break the lease
    • Alice will then flush all the blocks she had updated and invalidate the contents of her cache
  – If Alice does not answer, server must wait until Alice's lease expires
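The lease behaviour in this example can be modelled with a small table of expiry times. This is a sketch of the general lease idea as the slides describe it, not LBFS's protocol code: `LeaseTable`, `request`, and `surrender` are hypothetical names, and the 5-second duration and injectable clock are assumptions made so the demo is deterministic.

```python
import time


class LeaseTable:
    """Toy lease manager: one holder per file, with an expiry time."""

    def __init__(self, duration: float = 5.0, clock=time.monotonic):
        self._duration = duration
        self._clock = clock
        self._leases = {}  # path -> (holder, expiry time)

    def request(self, path: str, client: str) -> bool:
        """Grant or renew a lease unless another client holds a live one."""
        now = self._clock()
        lease = self._leases.get(path)
        if lease is not None:
            holder, expiry = lease
            if holder != client and expiry > now:
                return False  # server must break the lease or wait
        self._leases[path] = (client, now + self._duration)
        return True

    def surrender(self, path: str, client: str) -> None:
        """The holder answered a break request and gives the lease up."""
        lease = self._leases.get(path)
        if lease is not None and lease[0] == client:
            del self._leases[path]


now = [0.0]  # fake clock so the example does not sleep
table = LeaseTable(duration=5.0, clock=lambda: now[0])
alice_ok = table.request("/doc.txt", "alice")     # Alice gets the lease
bob_blocked = table.request("/doc.txt", "bob")    # refused while lease is live
now[0] = 6.0                                      # Alice never answers
bob_ok = table.request("/doc.txt", "bob")         # lease expired, Bob proceeds
```

The expiry time is what makes a lost callback or a silent client harmless: the server never waits longer than one lease period.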
![Page 27: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/27.jpg)
File Consistency
• LBFS
  – Caches entire files
  – Implements close-to-open consistency
• Client
  – Gets a lease first time a file is opened for read
  – Renews expired leases by requesting file attributes
  – Will then check if cached copy is still current
![Page 28: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/28.jpg)
Reads and writes
• Use additional calls not in NFS
  – GETHASH for reads
  – MKTMPFILE and three others for writes
• Server ensures atomicity of updates by writing them first into a temporary file
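The temporary-file idea has a familiar local analogue, sketched below. It is not the LBFS server code: it assumes a POSIX-style file system where a rename within one directory is atomic, and `atomic_write` is a hypothetical helper.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers never see a partial update."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays within one file system.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # the old or the new contents, never a mix
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


with tempfile.TemporaryDirectory() as d:
    target = os.path.join(d, "file.txt")
    atomic_write(target, b"version 1")
    atomic_write(target, b"version 2")
    with open(target, "rb") as f:
        final = f.read()
    leftovers = sorted(os.listdir(d))
```

If the transfer is interrupted before the final step, the target file still holds its previous contents and only the temporary file is lost.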
![Page 29: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/29.jpg)
Security
• More of an issue than in a well-controlled LAN
• Uses SFS security infrastructure
  – Servers have public keys and authenticate themselves to clients
• New Problem:
  – All LBFS users can check whether file system contains a specific chunk of data
  – Requires observing subtle timing differences
![Page 30: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/30.jpg)
Implementation
• Some problems with the way NFS allocates i-node numbers
![Page 31: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/31.jpg)
Evaluation (I)
• Compared upstream and downstream bandwidth of LBFS with those of
  – CIFS (Common Internet File System)
  – NFS
  – AFS
  – LBFS with leases and gzip but w/o chunking
• Downstream traffic benefits most from chunking
![Page 32: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/32.jpg)
Evaluation (II)
First four bars of each workload show upstream bandwidth, second four downstream bandwidth
![Page 33: A LOW-BANDWIDTH NETWORK FILE SYSTEM](https://reader030.fdocuments.net/reader030/viewer/2022020717/56814385550346895db0018f/html5/thumbnails/33.jpg)
Conclusions
• LBFS bandwidth usage is one order of magnitude less than that of conventional file systems