A Locality Preserving Decentralized File System Jeffrey Pang, Suman Nath, Srini Seshan Carnegie Mellon University Haifeng Yu, Phil Gibbons, Michael Kaminsky Intel Research Pittsburgh



Project Intro

• Defragmenting DHT: data layout for
  • Improved availability for entire tasks
  • Amortized data lookup latency

[Figure: placement of the objects in two tasks. Current DHT data layout: random placement. Defragmented DHT data layout: sequential placement.]


Background

EXISTING DHT STORAGE SYSTEMS

• Each server is responsible for a pseudo-random range of the ID space
• Objects are given pseudo-random IDs

[Figure: servers responsible for ID ranges 150-210, 211-400, 401-513, and 800-999; example objects with IDs 324, 987, and 160.]
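The placement rule on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the toy ID space of 1000, the helper names `object_id` and `responsible_server`, and the filler ranges 0-149 and 514-799 are assumptions added so the example is complete.

```python
import bisect
import hashlib
from typing import Sequence

ID_SPACE = 1000  # assumed toy ID space, sized to fit the ranges on the slide


def object_id(data: bytes) -> int:
    """Pseudo-random object ID: SHA-1 of the content, folded into the toy ID space."""
    return int.from_bytes(hashlib.sha1(data).digest(), "big") % ID_SPACE


def responsible_server(range_ends: Sequence[int], oid: int) -> int:
    """Index of the server whose ID range contains oid.

    range_ends holds the sorted, inclusive upper bound of each server's range.
    IDs above the last bound wrap around to the first server.
    """
    return bisect.bisect_left(range_ends, oid) % len(range_ends)


# Ranges: 0-149 (assumed), 150-210, 211-400, 401-513, 514-799 (assumed), 800-999
RANGE_ENDS = [149, 210, 400, 513, 799, 999]

for oid in (324, 987, 160):
    print(oid, "-> server", responsible_server(RANGE_ENDS, oid))
```

Because IDs come from a hash, blocks of the same file scatter across unrelated servers, which is the layout the rest of the talk sets out to change.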


Preserving Object Locality

• Motivation
  • Fate sharing: all objects in a single operation are more likely to be available at once
  • Effective caching/prefetching: servers I’ve contacted recently are more likely to have what I want next
• Design options:
  • Namespace locality (e.g., filesystem hierarchy)
  • Dynamic clustering (e.g., based on observed access patterns)
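A rough way to see the fate-sharing argument: if servers fail independently, a task succeeds only when every server it touches is up, so touching fewer servers helps. The sketch below is an illustration under that independence assumption, not the availability calculation used in the talk; the function name and the sample numbers are hypothetical.

```python
def task_success_probability(servers_touched: int, server_availability: float) -> float:
    """Probability that all servers a task depends on are up, assuming independent failures."""
    if servers_touched < 0 or not 0.0 <= server_availability <= 1.0:
        raise ValueError("invalid arguments")
    return server_availability ** servers_touched


# A task whose objects are scattered over 20 servers vs. packed onto 2 servers,
# with each server up 99% of the time (sample values only).
scattered = task_success_probability(20, 0.99)
packed = task_success_probability(2, 0.99)
print(f"scattered: {scattered:.4f}, packed: {packed:.4f}")
```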


Is Namespace Locality Good Enough?


Encoding Object Names

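[Figure: example key encoding. Key fields: userid, path encode, blockid. Tree labels: Bill, Bob, Docs. Example keys: 6 1 0 …, 6 1 1 …, 6 1 2 …. Server ID ranges: 570-600, 601-660, 661-700, each holding blocks (bid).]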


Dynamic Load Balancing

• Motivation
  • Hash function is no longer uniform
  • Uniform ID assignments to nodes lead to load imbalance
• Design options:
  • Simple item balancing (MIT)
  • Mercury (CMU)

[Figure: load balance with 1024 nodes using the Harvard trace. x-axis: node number; y-axis: storage load.]


Results

• How much improvement in availability and lookup latency can we expect?
• What is the overhead of load balancing?
• Setup
  • Trace-based simulation with Harvard trace
  • File blocks named using our encoding scheme
  • Same availability calculation as before
  • Clients keep open connections to 1-100 of the most recently contacted data servers
  • 1024 servers
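The client-side part of this setup can be approximated as an LRU cache of server connections: an access needs a lookup only when its server is not among the most recently contacted ones. The sketch below is an assumption about how such a count could be made, not the simulator used for the results; `lookup_fraction` and the sample access sequences are hypothetical.

```python
from collections import OrderedDict
from typing import Sequence


def lookup_fraction(accesses: Sequence[int], cache_size: int) -> float:
    """Fraction of accesses whose server is not in the client's LRU connection cache.

    accesses is the sequence of server indices a client touches, in order.
    """
    cache = OrderedDict()  # server index -> None, ordered from least to most recent
    lookups = 0
    for server in accesses:
        if server in cache:
            cache.move_to_end(server)       # refresh recency, no lookup needed
        else:
            lookups += 1                    # unknown server: a DHT lookup is required
            cache[server] = None
            if len(cache) > cache_size:
                cache.popitem(last=False)   # evict the least recently used server
    return lookups / len(accesses) if accesses else 0.0


# Ordered placement keeps consecutive blocks on the same server;
# random placement bounces between servers (sample sequences only).
ordered = [0, 0, 0, 0, 1, 1, 1, 1]
scattered = [0, 5, 0, 5, 3, 5, 0, 3]
print(lookup_fraction(ordered, 1), lookup_fraction(scattered, 1))
```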


Potential Reduction in Lookups

[Figure: % accesses requiring lookup (log scale, 1 to 100) vs. client cache size (# servers: 1, 4, 10, 100). Series: random, ordered.]


Potential Availability Improvement

[Figure: series Random (expected), Ordered (unif), Optimal.]

• Our encoding has nearly the same failure probability as the “alphabetical” encoding (differs by ~0.0002)


Data Migration Overhead


Summary

• Designed a DHT-based filesystem that preserves namespace locality

• Potentially improves availability and lookup latency by an order of magnitude

• Load balancing overhead is low

• Todo: completing actual implementation, evaluation for NSDI


Extra Slides


Related Work

• Namespace Locality:
  • Cylinder group allocation [FFS]
  • Co-locating data+meta-data [C-FFS]
  • Isolating user data in clusters [Archipelago]
  • Namespace flattening in object based storage [Self-*]
• Load Balancing + Data Indirection:
  • DHT Item Balancing [SkipNets, Mercury]
  • Data Indirection [Total Recall]


Encoding Object Names

• Leverage:
  • Large key space (amortized cost over wide-area is minimal)
  • Workload properties (e.g., 99% of the time directory depth < 12)
• Corner cases:
  • Depth or width overflow: use 1 bit to signal overflow region and just use SHA1(filepath)

Traditional DHT key encoding:

  data -> SHA-1 hash -> SHA1(data)   (160 bits)

Namespace locality preserving encoding:

  field:  SHA1(pkey) | dir1    | dir2    | ... | file    | block no. | ver. hash
  size:   160 bits   | 16 bits | 16 bits | ... | 16 bits | 64 bits   | 64 bits
  total:  480 bits

  Slide annotations: depth: 12; width: 65k; petabytes; b-tree-like 8kb blocks; hash(data)

Example:

  intel.pkey /home/bob/Mail/INBOX  ->  0000 . 0001 . 0004 . 0003 . 0000 . …
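The field layout on this slide can be illustrated with a small packing routine. This is a sketch under stated assumptions rather than the project's encoder: it assumes 12 path slots of 16 bits each (160 + 12 x 16 + 64 + 64 = 480 bits), zero-padding of unused slots, and per-directory indexes that are already assigned. The name `encode_key` and the sample path indexes are hypothetical, and the overflow case is only signalled, not implemented.

```python
import hashlib
from typing import Sequence

MAX_DEPTH = 12      # assumed number of 16-bit path slots (slide: "depth: 12")
SLOT_BITS = 16      # bits per path component (slide: "width: 65k")
BLOCK_BITS = 64     # block number field
VERSION_BITS = 64   # version hash field


def encode_key(pkey: bytes, path_ids: Sequence[int], block_no: int, version_hash: int) -> int:
    """Pack the fields into one 480-bit integer, most significant field first."""
    if len(path_ids) > MAX_DEPTH or any(not 0 <= p < 2 ** SLOT_BITS for p in path_ids):
        # The slides handle this case by falling back to SHA1(filepath).
        raise OverflowError("depth or width overflow")
    if not 0 <= block_no < 2 ** BLOCK_BITS or not 0 <= version_hash < 2 ** VERSION_BITS:
        raise ValueError("block number or version hash out of range")
    key = int.from_bytes(hashlib.sha1(pkey).digest(), "big")  # 160-bit user prefix
    for slot in range(MAX_DEPTH):
        # Unused path slots are zero-padded (an assumption of this sketch).
        key = (key << SLOT_BITS) | (path_ids[slot] if slot < len(path_ids) else 0)
    key = (key << BLOCK_BITS) | block_no
    return (key << VERSION_BITS) | version_hash


# Consecutive blocks of one file get adjacent keys (sample path indexes only).
k0 = encode_key(b"intel.pkey", [0, 1, 4, 3], block_no=0, version_hash=0)
k1 = encode_key(b"intel.pkey", [0, 1, 4, 3], block_no=1, version_hash=0)
print(k1 - k0 == 2 ** VERSION_BITS)
```

Putting the user prefix and path components in the most significant bits is what makes keys sort by user, then directory, then file, then block, so a range-partitioned DHT keeps related blocks on the same or neighbouring servers.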


Handling Temporary Resource Constraints

• Drastic storage distribution changes can cause frequent data movement

• Node storage can be temporarily constrained (i.e., no more disk space)

• Solution:
  • Lazy data movement
  • Node responsible for a key keeps a pointer to actual data blocks
  • Data blocks can be stored anywhere in system
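The pointer indirection described above might look roughly like the following toy model. It is an assumption-laden sketch, not the system's design: the `Node` class, its capacity counted in blocks, and the way a spare node is passed in by the caller are all hypothetical.

```python
from typing import Dict, Optional


class Node:
    """Toy node that owns keys but may hold only pointers to blocks stored elsewhere."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity                 # max number of blocks held locally
        self.blocks: Dict[int, bytes] = {}       # key -> data stored on this node
        self.pointers: Dict[int, "Node"] = {}    # key -> node that holds the data

    def has_space(self) -> bool:
        return len(self.blocks) < self.capacity

    def write(self, key: int, data: bytes, spare: "Node") -> None:
        """Store locally if possible; otherwise put the block on `spare` and keep a pointer."""
        if self.has_space():
            self.blocks[key] = data
        elif spare.has_space():
            spare.blocks[key] = data
            self.pointers[key] = spare
        else:
            raise RuntimeError("no space on owner or spare node")

    def read(self, key: int) -> Optional[bytes]:
        """Return the block, following the pointer if the data lives elsewhere."""
        if key in self.blocks:
            return self.blocks[key]
        holder = self.pointers.get(key)
        return holder.blocks.get(key) if holder is not None else None

    def migrate_lazily(self) -> None:
        """Pull pointed-to blocks home once local space frees up."""
        for key, holder in list(self.pointers.items()):
            if not self.has_space():
                break
            self.blocks[key] = holder.blocks.pop(key)
            del self.pointers[key]
```

The owner stays the lookup target for its keys throughout, so readers are unaffected by where the bytes currently sit.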


Handling Temporary Resource Constraints

[Figure: a WRITE of data to replicas rep1, rep2, and rep3, where some nodes report "NO SPACE!".]


Load Balancing Algorithm

• Basic Idea:
  • Contact a random node in the ring
  • If myLoad > delta*hisLoad (or vice versa), the lighter node changes its ID to move before the heavier node.
  • Heavy node’s load splits in two.
  • Node load within factor of 4 in O(log(n)) steps
• Mercury optimizations:
  • Continuous sampling of load around the ring
  • Use estimated load histogram to do informed probes
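One probe of the basic idea can be simulated on a list of loads kept in ring order. This is a simplified sketch, not the Mercury implementation: loads are plain numbers instead of ID ranges, the departing light node's load is assumed to go to its ring successor, and `balance_step` and the value of `delta` are hypothetical.

```python
import random
from typing import List


def balance_step(loads: List[float], delta: float, rng: random.Random) -> bool:
    """One random probe. loads[i] is the load of the i-th node in ring order.

    Returns True if a node changed position, False if the pair was balanced enough.
    """
    n = len(loads)
    i, j = rng.sample(range(n), 2)
    if loads[i] > delta * loads[j]:
        heavy, light = i, j
    elif loads[j] > delta * loads[i]:
        heavy, light = j, i
    else:
        return False
    # The light node leaves; its successor absorbs its load (assumption of this sketch).
    loads[(light + 1) % n] += loads[light]
    # The light node rejoins just before the heavy node and takes half of its load.
    half = loads[heavy] / 2
    loads[heavy] -= half
    del loads[light]
    heavy_pos = heavy - 1 if light < heavy else heavy
    loads.insert(heavy_pos, half)
    return True


rng = random.Random(0)
loads = [64.0] + [1.0] * 15
for _ in range(200):
    balance_step(loads, delta=4.0, rng=rng)
print(sorted(loads))
```

The total load is conserved by every step; only its distribution across ring positions changes.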


Load Balance Over Time