Simplifying Wide-Area Application Development with WheelFS
![Page 1: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/1.jpg)
Simplifying Wide-Area Application Development with WheelFS
Jeremy Stribling
In collaboration with Jinyang Li, Frans Kaashoek, Robert Morris
MIT CSAIL & New York University
![Page 2: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/2.jpg)
2
Resources Spread over Wide-Area Net
PlanetLab, Google datacenters, China grid
![Page 3: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/3.jpg)
3
Grid Computations Share Data
Nodes in a distributed computation share:
– Program binaries
– Initial input data
– Processed output from one node as intermediary input to another node
![Page 4: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/4.jpg)
4
So Do Users and Distributed Apps
• Apps aggregate disk/computing at hundreds of nodes
• Example apps
– Content distribution networks (CDNs)
– Backends for web services
– Distributed digital research library
All applications need distributed storage
![Page 5: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/5.jpg)
5
State of the Art in Wide-Area Storage
• Existing wide-area file systems are inadequate
– Designed only to store files for users
– E.g., hundreds of nodes cannot write files to the same dir
– E.g., strong consistency at the cost of availability
• Each app builds its own storage!
– Distributed Hash Tables (DHTs)
– ftp, scp, wget, etc.
![Page 6: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/6.jpg)
6
Current Solutions
Usual drawbacks:
– All data flows through one node
– File systems are too transparent
  • Mask failures
  • Incur long delays
[Figure: testbed/Grid nodes all copy file foo through a central file server.]
![Page 7: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/7.jpg)
7
If We Had a Good Wide-Area FS?
• FS makes building apps simpler
[Figure: CDN nodes call write(“/a/foo”) and write(“/a/bar”) on a wide-area FS that stores File foo and File bar; a read(“/a/foo”) call is also shown.]
![Page 8: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/8.jpg)
8
Why Is It Hard?
• Apps care a lot about performance
• WAN is often the bottleneck
– High latency, low bandwidth
– Transient failures
• How to give app control without sacrificing ease of programmability?
![Page 9: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/9.jpg)
9
Our Contribution: WheelFS
• Suitable for wide-area apps
• Gives app control through cues over:
– Consistency vs. availability tradeoffs
– Data placement
– Timing, reliability, etc.
• Prototype implementation
![Page 10: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/10.jpg)
10
Talk Outline
• Challenges & our approach
• Basic design
• Application control
• Running a Grid computation over WheelFS
![Page 11: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/11.jpg)
11
What Does a File System Buy You?
• Re-use existing software
• Simplify the construction of new applications
– A hierarchical namespace
– A familiar interface
– Language-independent usage
![Page 12: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/12.jpg)
12
Why Is It Hard To Build a FS on WAN?
• High latency, low bandwidth
– 100s of ms instead of a few ms of latency
– 10s of Mbps instead of 1000s of Mbps of bandwidth
• Transient failures are common
– 32 outages over 64 hours across 132 paths [Andersen’01]
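As a rough back-of-the-envelope illustration of the latency and bandwidth gap above, the sketch below estimates the time to fetch a file as one round trip plus the time to push the bytes through the link. The 1 MB file size and the specific LAN/WAN figures (1 ms and 1000 Mbps vs. 100 ms and 10 Mbps) are illustrative assumptions picked from the orders of magnitude on the slide, not measurements.

```python
def transfer_time_s(size_bytes: int, rtt_ms: float, bandwidth_mbps: float) -> float:
    """Simplified model: one round trip plus serialization time on the link."""
    return rtt_ms / 1000.0 + (size_bytes * 8) / (bandwidth_mbps * 1_000_000)


ONE_MB = 1_000_000  # assumed file size, for illustration only

# Assumed LAN: 1 ms round trip, 1000 Mbps.
lan_seconds = transfer_time_s(ONE_MB, rtt_ms=1, bandwidth_mbps=1000)

# Assumed WAN: 100 ms round trip, 10 Mbps.
wan_seconds = transfer_time_s(ONE_MB, rtt_ms=100, bandwidth_mbps=10)

print(f"LAN: {lan_seconds:.3f} s, WAN: {wan_seconds:.3f} s")
```

Under these assumptions the same fetch is about two orders of magnitude slower over the wide area, which is why avoiding unnecessary wide-area transfers matters so much.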
![Page 13: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/13.jpg)
13
What If Grid Uses AFS over WAN?
[Figure: three Grid nodes (AFS clients) access a.dat through an AFS server. Annotations: potentially unnecessary data transfer; blocks forever under failure despite an available cached copy.]
![Page 14: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/14.jpg)
14
Design Challenges
• High latency
– Store data close to where it is needed
• Low wide-area bandwidth
– Avoid wide-area communication if possible
• Transient failures are common
– Cannot block all access during partial failures
Only applications have the needed information!
![Page 15: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/15.jpg)
15
WheelFS Gives Apps Control
| | AFS, NFS, GFS | WheelFS |
|---|---|---|
| Goal | Total network transparency | Application control |
| Can apps control how to handle failures? | X | ✓ |
| Can apps control data placement? | X | ✓ |
![Page 16: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/16.jpg)
16
WheelFS: Main Ideas
• Apps control
– Apps embed semantic cues to inform FS about failure handling, data placement ...
• Good default policy
– Write locally, strict consistency
![Page 17: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/17.jpg)
17
Talk Outline
• Challenges & our approach
• Basic design
• Application control
• Running a Grid computation over WheelFS
![Page 18: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/18.jpg)
18
File Systems 101
• Basic FS operations:
– Name resolution: hierarchical name → flat id, e.g. open(“/wfs/a/foo”, …) → id: 1235
– Data operations: read/write file data, e.g. read(1235, …), write(1235, …)
– Namespace operations: add/remove files or dirs, e.g. mkdir(“/wfs/b”, …)
![Page 19: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/19.jpg)
19
File Systems 101
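[Figure: a directory tree stored as objects keyed by flat id. The root directory (id: 0) holds the entries a: 246 and b: 357; the directories id: 246 and id: 357 hold the entries file2: 468, file3: 579, and file1: 135; id: 135 is a file.]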
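The id-keyed layout in the figure can be sketched as a lookup table in which directories map names to ids, with name resolution walking one path component at a time. This is only an illustrative model: the `OBJECTS` table and `resolve` function are hypothetical names, and which of the two directories holds which files is an assumption, since the extracted figure does not make that clear.

```python
# Every object lives under a flat id; a directory is a dict of name -> id.
OBJECTS = {
    0: {"a": 246, "b": 357},             # root directory (from the slide)
    246: {"file2": 468, "file3": 579},   # assumed contents of /a
    357: {"file1": 135},                 # assumed contents of /b
    135: b"contents of file1",           # a regular file
}


def resolve(path: str, root_id: int = 0) -> int:
    """Translate a hierarchical name into a flat id, one component at a time."""
    current = root_id
    for name in (part for part in path.split("/") if part):
        entry = OBJECTS[current]
        if not isinstance(entry, dict):
            raise NotADirectoryError(path)
        if name not in entry:
            raise FileNotFoundError(path)
        current = entry[name]
    return current


file_id = resolve("/b/file1")   # walks root (0) -> b (357) -> file1
data = OBJECTS[file_id]         # data operations then use the flat id
```

Because every step is just a lookup by id, the objects do not need to live on one machine, which is what the next slide builds on.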
![Page 20: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/20.jpg)
20
Distribute a FS across nodes
[Figure: the same id-keyed directory tree (ids 0, 135, 246, 357) with its objects spread across five nodes.]
• Must locate files/dirs using ids
• Must automatically configure node membership
• Must keep files versioned to distinguish new and old
![Page 21: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/21.jpg)
21
Basic Design of WheelFS
[Figure: six nodes (076, 150, 257, 402, 554, 653). File id 135 and its later versions 135v2 and 135v3 appear as copies on several nodes. The consistency servers are shown with the node list 076, 150, 257, 402, 554, 653.]
![Page 22: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/22.jpg)
22
Default Behavior: Write Locally, Strict Consistency
• Write locally: Store newly created files at the writing node
– Writes are fast
– Transfer data lazily for reads when necessary
• Strict consistency: data behaves as in a local FS
– Once new data is written and the file is closed, the next open will see the new data
![Page 23: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/23.jpg)
23
Write Locally
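[Figure: six nodes (076, 150, 257, 402, 554, 653). One node creates foo/bar in three steps:]
1. Choose an ID
2. Create dir entry
3. Write local file
[The chosen ID is 550: File 550 (bar) is written locally and the entry bar = 550 is added to Dir 209 (foo). A read of foo/bar is also shown.]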
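The three creation steps can be sketched as follows. This is a toy model, not the WheelFS implementation: the `Node` class, the `create` function, and the counter used to pick ids are all hypothetical, and the slide does not say how the real system chooses an id. The values 209 and 550 are taken from the slide.

```python
import itertools


class Node:
    """Hypothetical storage node holding files and directories by id."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.files = {}   # file id -> bytes
        self.dirs = {}    # dir id -> {name: file id}


# Assumption: ids come from a simple counter; the real scheme is not shown.
_next_id = itertools.count(550)


def create(writer, dir_owner, dir_id, name, data):
    """Create a file whose data stays on the writing node."""
    file_id = next(_next_id)                  # 1. choose an ID
    dir_owner.dirs[dir_id][name] = file_id    # 2. create dir entry (may be remote)
    writer.files[file_id] = data              # 3. write local file
    return file_id
```

Only step 2 may involve another node; the file data itself is written locally, which is why writes are fast and data moves only when a reader later asks for it.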
![Page 24: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/24.jpg)
24
Strict Consistency
• All writes go to the same node
• All reads do too
[Figure: six nodes (076, 150, 257, 402, 554, 653). Two "Write File 135" requests go to the node holding File 135, producing File 135 v2; a reader asks "File 135?".]
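One way to picture the default behavior (all reads and writes go through one node, and a closed write is visible to the next open) is the sketch below. The `Primary` and `Client` classes and the version-number check are assumptions made for illustration; the slides only state the guarantee, not the mechanism.

```python
class Primary:
    """Hypothetical single node responsible for one file."""

    def __init__(self):
        self.version = 0
        self.data = b""

    def commit(self, data: bytes) -> int:
        self.data = data
        self.version += 1
        return self.version


class Client:
    def __init__(self, primary: Primary):
        self.primary = primary
        self.cached_version = None
        self.cached_data = b""

    def open_and_read(self) -> bytes:
        # On open, check with the primary; refetch if the cached copy is old.
        if self.cached_version != self.primary.version:
            self.cached_data = self.primary.data
            self.cached_version = self.primary.version
        return self.cached_data

    def write_and_close(self, data: bytes) -> None:
        # New data becomes visible to other clients once the file is closed.
        self.cached_version = self.primary.commit(data)
        self.cached_data = data
```

The cost of this guarantee is that every open must reach the primary, so an unreachable primary blocks the application. The cues described next let an application relax that.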
![Page 25: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/25.jpg)
25
Talk Outline
• Challenges & our approach
• Basic design
• Application control
• Running a Grid computation over WheelFS
![Page 26: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/26.jpg)
26
WheelFS Gives Apps Control with Cues
• Apps want to control consistency, data placement ...
• How? Embed cues in path names
– Flexible and minimal interface change
• Coarse-grained: Cues apply recursively over an entire subtree of files
– /wfs/cache/.cue/a/b/
• Fine-grained: Cues can apply to a single file
– /wfs/cache/a/b/.cue/foo
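To illustrate why embedding cues in path names needs so little interface change, here is a minimal sketch that separates cue components from the ordinary path. The `split_cues` function and `KNOWN_CUES` set are hypothetical; the cue names are the ones that appear in this talk, and the value `200` for `MaxTime=` is a made-up example. The sketch does not model the difference in scope between coarse-grained and fine-grained cues.

```python
# Cue names mentioned in the talk; treating this as a closed set is an assumption.
KNOWN_CUES = {"EventualConsistency", "MaxTime", "Site", "Node", "HotSpot"}


def split_cues(path: str):
    """Return (plain_path, cues) for a path with embedded .Cue components."""
    plain, cues = [], {}
    for part in path.split("/"):
        name, _, value = part[1:].partition("=")
        if part.startswith(".") and name in KNOWN_CUES:
            cues[name] = value if value else True
        else:
            plain.append(part)   # ordinary component, including dotfiles
    return "/".join(plain), cues


plain_path, cues = split_cues("/wfs/cache/.EventualConsistency/.MaxTime=200/a/b")
```

Because the cues travel inside the path string, an unmodified application can pass them through `open()` or a configuration file without any new system call.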
![Page 27: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/27.jpg)
27
Eventual Consistency: Reads
• Read any version of the file you can find
• In a given time limit
[Figure: six nodes (076, 150, 257, 402, 554, 653). A reader asking "File 135?" can be served from File 135 or from a cached copy of 135.]
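The read rule above (accept any version you can find, within a time limit) might be sketched like this. The `read_eventual` function and the idea of passing candidate copies as callables are assumptions for illustration. A real client would also need per-request timeouts, since this sketch only checks the deadline between attempts.

```python
import time


def read_eventual(sources, max_time_s: float) -> bytes:
    """Try each copy in order until one answers or the time limit expires."""
    deadline = time.monotonic() + max_time_s
    for fetch in sources:
        if time.monotonic() >= deadline:
            break
        try:
            return fetch()            # any version is acceptable
        except ConnectionError:
            continue                  # this copy is unreachable; try the next
    raise TimeoutError("no copy of the file could be read within the limit")


def unreachable_server() -> bytes:
    raise ConnectionError("file server cannot be contacted")


def local_cache() -> bytes:
    return b"cached copy of 135"


result = read_eventual([unreachable_server, local_cache], max_time_s=1.0)
```

With strict consistency the same read would block on the unreachable server. Here the application trades freshness for availability.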
![Page 28: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/28.jpg)
28
Eventual Consistency: Writes
• Write to any replica of the file
[Figure: six nodes (076, 150, 257, 402, 554, 653). Two "Write File 135" requests are applied at different replicas of 135, producing 135 v2 copies at more than one node.]
![Page 29: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/29.jpg)
29
Handle Read Hotspots
1. Contact node
2. Receive list
3. Get chunks
[Figure: six nodes (076, 150, 257, 402, 554, 653). Readers of file 135 contact the node holding File 135, receive a list of nodes with cached copies (076 and 653, later 076, 554, and 653), and fetch chunks from those caches.]
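The three hotspot steps can be sketched with a toy in-memory model. The `FileServer` and `Reader` classes, the round-robin choice of peer, and the way the server records readers are all assumptions; the slide only shows that a reader contacts the file's node, receives a list of caching nodes, and gets chunks from them.

```python
class FileServer:
    """Hypothetical node that stores the file and tracks who caches it."""

    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.cachers = []   # readers known to hold cached chunks
        self.served = 0     # chunks sent directly by this server

    def list_cachers(self, reader):
        current = list(self.cachers)
        self.cachers.append(reader)   # the reader will hold a copy soon
        return current

    def get_chunk(self, index):
        self.served += 1
        return self.chunks[index]


class Reader:
    def __init__(self, name):
        self.name = name
        self.cache = {}     # chunk index -> bytes

    def read(self, server):
        peers = server.list_cachers(self)    # 1. contact node, 2. receive list
        count = len(server.chunks)
        for i in range(count):               # 3. get chunks
            holders = [p for p in peers if i in p.cache]
            if holders:
                self.cache[i] = holders[i % len(holders)].cache[i]
            else:
                self.cache[i] = server.get_chunk(i)
        return b"".join(self.cache[i] for i in range(count))
```

In this model only the first reader pulls data from the file's own node; later readers are served from peer caches, so the load on the hot file's node stays flat as readers are added.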
![Page 30: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/30.jpg)
30
WheelFS Cues
| Name | Purpose | Category |
|---|---|---|
| EventualConsistency | Control whether reads must see fresh data, and whether writes must be serialized | Control consistency |
| MaxTime= | Specify time limit for operations | Control consistency |
| Site=, Node= | Hint which node or group of nodes a file should be stored on | Hint about data placement |
| HotSpot | This file will be read simultaneously by many nodes, so use p2p caching | Large reads |

Other types of cues: Durability, system information, etc.
![Page 31: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/31.jpg)
31
Example Use of Cues: Content Distribution Networks
• CDNs prefer availability over consistency
[Figure: four wfs nodes, each running an Apache caching proxy.]
Proxy logic:
– If $url exists in cache dir: read $url from WheelFS
– Else: get page from web server, store page in WheelFS
One line change in Apache config file: /wfs/cache/$URL
– Blocks under failure with default strong consistency
![Page 32: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/32.jpg)
32
Example Use of Cues: CDN
• Apache proxy handles potentially stale files well
– The freshness of cached web pages can be determined from saved HTTP headers
Cache dir: /wfs/cache/.EventualConsistency/.HotSpot
• .EventualConsistency
– Tells WheelFS to read a cached file even when the corresponding file server cannot be contacted
– Tells WheelFS to write the file data anywhere even when the corresponding file server cannot be contacted
• .HotSpot
– Tells WheelFS to read data from the nearest client cache it can find
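The caching-proxy logic from the CDN example amounts to ordinary file operations on a cache directory, which is why pointing that directory at a WheelFS path is enough. The sketch below is not Apache's implementation: `cached_fetch` and `fetch_from_origin` are hypothetical names, and encoding the URL into a file name with `quote` is an assumption made for illustration.

```python
import os
from urllib.parse import quote


def cached_fetch(url: str, cache_dir: str, fetch_from_origin) -> bytes:
    """Serve url from cache_dir if present; otherwise fetch it and store it."""
    path = os.path.join(cache_dir, quote(url, safe=""))
    if os.path.exists(path):              # page already in the shared cache
        with open(path, "rb") as f:
            return f.read()
    page = fetch_from_origin(url)         # cache miss: go to the web server
    with open(path, "wb") as f:           # store it for every other proxy
        f.write(page)
    return page
```

On a machine with WheelFS mounted, `cache_dir` would be a path such as `/wfs/cache/.EventualConsistency/.HotSpot`. The code itself contains nothing specific to a distributed file system, and the cues in the path select the failure handling and caching behavior.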
![Page 33: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/33.jpg)
33
Example Use of Cues: BLAST Grid Computation
• DNA alignment tool run on Grids
• Copy separate DB portions and queries to many nodes
• Run separate computations
• Later fetch and combine results
• Read binary using .HotSpot
• Write output using .EventualConsistency
![Page 34: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/34.jpg)
34
Talk Outline
• Challenges & our approach
• Basic design
• Application control
• Running a Grid computation over WheelFS
![Page 35: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/35.jpg)
35
Experiment Setup
• Up to 16 nodes run WheelFS on Emulab
– 100 Mbps access links
– 12 ms delay
– 3 GHz CPUs
• “nr” protein database (673 MB), 16 partitions
• 19 queries sent to all nodes
![Page 36: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/36.jpg)
36
BLAST Achieves Near-Ideal Speedup on WheelFS
[Figure: speedup (y-axis, 0 to 16) vs. number of nodes (x-axis, 0 to 16). Series: BLAST on WheelFS; Ideal speedup.]
![Page 37: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/37.jpg)
37
Related Work
• Cluster FS: Farsite, GFS, xFS, Ceph
• Wide-area FS: JetFile, CFS, Shark
• Grid: LegionFS, GridFTP, IBP
• POSIX I/O High Performance Computing Extensions
![Page 38: Simplifying Wide-Area Application Development with WheelFS](https://reader035.fdocuments.net/reader035/viewer/2022062422/568140fc550346895dacc6ae/html5/thumbnails/38.jpg)
38
Conclusion
• A WAN FS simplifies app construction
• FS must let app control data placement & consistency
• WheelFS exposes such control via cues
Building apps is easy with WheelFS