Principles of Reliable Distributed Systems
Tutorial 12: Frangipani
Spring 2009
Alex Shraer
Frangipani
• Scalable file system built at SRC-DEC
• Published in SOSP’97
• Uses failure detection, Paxos, leases,…
• Two layers:
– Petal: virtual disk from many “storage bricks”
– Frangipani file system and lock service
Motivation
• Large-scale distributed file systems are hard to administer
• Hard to add/remove machines (servers)
• Hard to add/remove disks (storage space)
• Hard to manage set of current components
• Hard to manage locks
Petal: Distributed Virtual Disks
C. A. Thekkath and E. K. Lee
Systems Research Center, Digital Equipment Corporation
ASPLOS’96
Petal Overview
• Petal provides virtual disks
– Large (2^64 bytes), sparse virtual space
– Disk storage allocated on demand
– Accessible to all file servers over a network
• Virtual disks implemented by
– Cooperating CPUs executing Petal software
– Ordinary disks attached to the CPUs
– A scalable interconnection network
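The "sparse, allocated on demand" idea can be illustrated with a toy, single-process model. This is only a sketch: the class name `SparseVirtualDisk`, the 512-byte block size, and the dictionary-based block map are assumptions made for illustration and are not taken from Petal, which distributes this mapping across many servers.

```python
BLOCK_SIZE = 512          # bytes per block (an assumption for this sketch)
VIRTUAL_SIZE = 2 ** 64    # size of the virtual address space in bytes


class SparseVirtualDisk:
    """Toy model of a sparse virtual disk with on-demand allocation."""

    def __init__(self):
        # Maps block number -> bytearray. Blocks never written are absent
        # and therefore consume no storage.
        self.blocks = {}

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > VIRTUAL_SIZE:
            raise ValueError("write outside the virtual address space")
        pos = 0
        while pos < len(data):
            block_no, start = divmod(offset + pos, BLOCK_SIZE)
            chunk = data[pos:pos + BLOCK_SIZE - start]
            # Storage for a block is allocated on its first write.
            block = self.blocks.setdefault(block_no, bytearray(BLOCK_SIZE))
            block[start:start + len(chunk)] = chunk
            pos += len(chunk)

    def read(self, offset, length):
        if offset < 0 or offset + length > VIRTUAL_SIZE:
            raise ValueError("read outside the virtual address space")
        out = bytearray()
        while len(out) < length:
            block_no, start = divmod(offset + len(out), BLOCK_SIZE)
            take = min(BLOCK_SIZE - start, length - len(out))
            block = self.blocks.get(block_no)
            # Unallocated regions read back as zeros.
            out += block[start:start + take] if block is not None else bytes(take)
        return bytes(out)

    def allocated_bytes(self):
        return len(self.blocks) * BLOCK_SIZE
```

Writing a few bytes at a very large offset allocates a single block, which is why a 2^64-byte address space is affordable: only the touched blocks cost physical storage.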
Global State Management
• Uses Paxos
– Global state is replicated across all servers
• Metadata (disk allocation) only!
– Consistent in the face of server and network failures
– A majority is needed to update the global state
– Any server can be added/removed in the presence of failed servers
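The majority requirement can be made concrete with a small sketch. It shows only the quorum condition, not the Paxos protocol itself; the function names `has_majority` and `try_update_membership` are hypothetical and chosen for this illustration.

```python
def has_majority(reachable, members):
    """True if the reachable servers form a strict majority of the members."""
    members = set(members)
    return len(members & set(reachable)) > len(members) // 2


def try_update_membership(members, reachable, new_members):
    """Change the membership only if a majority of current members is reachable.

    Returns (membership, updated). Real Paxos also has to agree on the
    order of updates; this sketch checks the quorum condition only.
    """
    if not has_majority(reachable, members):
        return set(members), False
    return set(new_members), True
```

With five servers, any three of them can remove a failed server, while a partition containing only two cannot change the state. Because any two majorities share at least one server, two sides of a partition can never both update the global state.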
Key Petal Features
• Storage is incrementally expandable
• Data is optionally mirrored over multiple servers
• Metadata is replicated on all servers
• Transparent addition and deletion of servers
• Supports read-only snapshots of virtual disks
• Client API looks like block-level disk device
• Throughput
– Scales linearly with additional servers
– Degrades gracefully with failures
Frangipani: A Scalable Distributed File System
C. A. Thekkath, T. Mann, and E. K. Lee
Systems Research Center, Digital Equipment Corporation
SOSP’97
Frangipani Features
• Behaves like a local file system
– Multiple machines cooperatively manage a Petal disk
– Users on any machine see a consistent view of data
• Exhibits good performance, scaling, and load balancing
• Easy to administer
Ease of Administration
• Frangipani machines are modular
– Can be added and deleted transparently
• Common free space pool
– Users don’t have to be moved
• Automatically recovers from crashes
• Consistent backup without halting the system
Frangipani Structure
• Distributed file system built atop a shared virtual disk (Petal)
• Frangipani servers do not communicate with each other directly
– Only through Petal
• Simplifies management
– Addition/removal of servers
Components of Frangipani
• File system core
– Implements the file system (FS) interface
– Uses FS mechanisms (buffer cache etc.)
– Exploits Petal’s large virtual space
• Locks with leases
– Granted for finite time, must be refreshed
• Write-ahead redo log
– Performance optimization + failure recovery
Locks
• Multiple reader/single writer
• Granularity: lock per entire file or directory
• A lock is really a lease – it expires
– After 30 seconds in their implementation
• Assumption?
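A lease can be sketched as a lock with an expiry time that the holder must keep refreshing. The class below is a minimal illustration, not Frangipani's code: the `Lease` class, the injectable `now` argument (used to make the example testable), and the `margin` parameter are assumptions. The margin hints at the assumption the slide asks about: the holder and the lock service must agree closely enough on time, so a cautious holder stops using a lease slightly before it expires.

```python
import time

LEASE_SECONDS = 30.0  # lease duration quoted on the slide


class Lease:
    """A lock that expires unless the holder refreshes it in time."""

    def __init__(self, now=None, duration=LEASE_SECONDS):
        now = time.monotonic() if now is None else now
        self.duration = duration
        self.expires_at = now + duration

    def is_valid(self, now=None, margin=0.0):
        # `margin` is a hypothetical safety buffer for clock differences.
        now = time.monotonic() if now is None else now
        return now + margin < self.expires_at

    def refresh(self, now=None):
        """Extend the lease; fails if it has already expired."""
        now = time.monotonic() if now is None else now
        if now >= self.expires_at:
            return False
        self.expires_at = now + self.duration
        return True
```

A holder would check `is_valid(margin=...)` before acting on the lock and call `refresh()` periodically; once a refresh fails, the lock must be treated as lost.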
Using Locks
• Frangipani servers are clients of lock service
• Dirty data is written to disk (Petal) before the lock is given to another machine
• Locks are cached by servers that acquire them
– Soft state: no need to explicitly release locks
– Uses lease timeouts for lock recovery
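The rule that dirty data reaches Petal before a lock changes hands can be sketched as follows. This is a simplified, single-process model under stated assumptions: the classes `FileServer` and `LockService` are hypothetical, a plain dictionary stands in for Petal, and only exclusive (writer) locks are modeled.

```python
class FileServer:
    """A Frangipani-like server that caches locks and buffers writes."""

    def __init__(self, disk):
        self.disk = disk            # shared dict standing in for Petal
        self.cached_locks = set()   # locks kept until someone else needs them
        self.dirty = {}             # file name -> data not yet written to disk

    def write(self, name, data):
        if name not in self.cached_locks:
            raise PermissionError("lock not held for " + name)
        self.dirty[name] = data     # buffered, not yet on disk

    def revoke(self, name):
        # Flush dirty data first, and only then give up the lock.
        if name in self.dirty:
            self.disk[name] = self.dirty.pop(name)
        self.cached_locks.discard(name)


class LockService:
    """Grants exclusive locks, revoking them from the previous holder."""

    def __init__(self):
        self.holder = {}            # file name -> FileServer holding the lock

    def acquire(self, name, server):
        current = self.holder.get(name)
        if current is not None and current is not server:
            current.revoke(name)    # previous holder writes back before release
        self.holder[name] = server
        server.cached_locks.add(name)
```

Because the previous holder flushes inside `revoke`, the next holder always finds the latest data on the shared disk, even though the servers never talk to each other directly.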
Distributed Lock Management
• A set of lock servers collaboratively manage locks
– Run Paxos among them
– Consensus on global state: set of locks each server is responsible for, list of current lock servers, lock allocation to clients
– Need majority to make progress
• Using leases requires assuming loosely synchronized clocks
– Expired leases should not be accepted
• Why Paxos then?
– To overcome network partitions
Logging
• Frangipani uses a write-ahead redo log for metadata
– Log records are kept on Petal (why?)
• Data is written to Petal
– On sync, fsync, or every 30 seconds
– On lock revocation or when the log wraps
• Each server has a separate log
– Reduces contention
– Independent recovery
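The write-ahead discipline with one log per server can be sketched like this. The classes `Petal` and `Server` and their fields are assumptions for illustration; in particular, the real log is a bounded region on the virtual disk rather than a Python list.

```python
class Petal:
    """Stand-in for the shared virtual disk."""

    def __init__(self):
        self.logs = {}       # server id -> list of redo records
        self.metadata = {}   # metadata block id -> value


class Server:
    """Logs each metadata update on Petal before applying it in place."""

    def __init__(self, server_id, petal):
        self.id = server_id
        self.petal = petal
        self.petal.logs.setdefault(server_id, [])  # this server's own log
        self.pending = []    # updates logged but not yet applied in place

    def update_metadata(self, block, value):
        record = (block, value)
        self.petal.logs[self.id].append(record)  # 1. write the redo record
        self.pending.append(record)               # 2. apply in place later

    def flush(self):
        # Apply logged updates to their permanent locations.
        for block, value in self.pending:
            self.petal.metadata[block] = value
        self.pending.clear()
        # Records whose updates are in place can be reclaimed.
        self.petal.logs[self.id].clear()
```

Since each server appends only to its own log, servers do not contend on a shared log, and because the log lives on the shared disk rather than on the server's local disk, it remains readable after that server crashes.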
Recovery
• Recovery initiated due to failure detection
– By the lock service
– Failure detection implemented using heartbeats
• Any server can recover operations for a failed server
– Log is available via Petal
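Replaying a failed server's redo log might look like the sketch below. The per-block version numbers are an assumption added here so that replay is safe to repeat and never overwrites newer data; the slides do not describe the record format, and the function name `recover` is hypothetical.

```python
def recover(log, metadata):
    """Replay a failed server's redo log against the shared metadata.

    log:      list of (block, version, value) redo records, oldest first
    metadata: dict mapping block -> (version, value), updated in place
    Returns the number of records that were actually applied.
    """
    applied = 0
    for block, version, value in log:
        current_version = metadata.get(block, (0, None))[0]
        # Apply only updates that are newer than what is already on disk.
        if version > current_version:
            metadata[block] = (version, value)
            applied += 1
    return applied
```

Any surviving server can run this against the failed server's log, because both the log and the metadata live on the shared disk. Running it twice is harmless: the second pass finds every block already at the logged version and applies nothing.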