Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems


Description

Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems. Peter Druschel, Rice University; Antony Rowstron, Microsoft Research Cambridge, UK. Modified and presented by JinYoung You.

Outline: Background, Pastry, Pastry proximity routing, Related Work.

Transcript of Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Page 1: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Peter Druschel, Rice University

Antony Rowstron, Microsoft Research Cambridge, UK

Modified and Presented by: JinYoung You

1

Page 2: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Outline

• Background
• Pastry
• Pastry proximity routing
• Related Work
• Conclusions

2

Page 3: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Background

Peer-to-peer systems

• distribution
• decentralized control
• self-organization
• symmetry (communication, node roles)

3

Page 4: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Common issues

• Organize, maintain overlay network
  – node arrivals
  – node failures
• Resource allocation/load balancing
• Resource location
• Network proximity routing

Idea: provide a generic p2p substrate

4

Page 5: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Architecture

[Layered architecture diagram: a p2p application layer (e.g., network storage, event notification) runs on top of the Pastry p2p substrate (a self-organizing overlay network), which in turn runs over TCP/IP on the Internet.]

5

Page 6: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Structured p2p overlays

One primitive: route(M, X) routes message M to the live node with nodeID closest to key X

• nodeIDs and keys are from a large, sparse id space

6

Page 7: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Distributed Hash Tables (DHT)

[Figure: a p2p overlay network of nodes storing key-value pairs (k1,v1) through (k6,v6).]

Operations: insert(k, v), lookup(k)

• p2p overlay maps keys to nodes
• completely decentralized and self-organizing
• robust, scalable

7
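
(Added note, not part of the original slides.) A minimal Java sketch of how insert(k, v) and lookup(k) can be layered on the single route(M, X) primitive from the previous slide; the Overlay interface, the Put/Get message types, and the local store are hypothetical names used only for illustration.

import java.math.BigInteger;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a DHT built on a generic route(M, X) primitive.
public class SimpleDht {

    // The one primitive offered by a structured p2p overlay.
    public interface Overlay {
        void route(Object message, BigInteger key); // deliver at the live node with nodeID closest to key
    }

    // Messages carried by the overlay.
    public static final class Put { final BigInteger key; final byte[] value;
        Put(BigInteger key, byte[] value) { this.key = key; this.value = value; } }
    public static final class Get { final BigInteger key;
        Get(BigInteger key) { this.key = key; } }

    private final Overlay overlay;
    private final Map<BigInteger, byte[]> localStore = new HashMap<>();

    public SimpleDht(Overlay overlay) { this.overlay = overlay; }

    // insert(k, v): route the pair to the node responsible for k.
    public void insert(BigInteger key, byte[] value) {
        overlay.route(new Put(key, value), key);
    }

    // lookup(k): route a query to the node responsible for k.
    public void lookup(BigInteger key) {
        overlay.route(new Get(key), key);
    }

    // Called by the overlay when a message arrives at the responsible node.
    public void deliver(Object message) {
        if (message instanceof Put) {
            Put p = (Put) message;
            localStore.put(p.key, p.value);      // store the value locally
        } else if (message instanceof Get) {
            Get g = (Get) message;
            byte[] v = localStore.get(g.key);    // a real system would send v back to the requester
        }
    }
}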

Page 8: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Why structured p2p overlays?

• Leverage pooled resources (storage, bandwidth, CPU)

• Leverage resource diversity (geographic, ownership)

• Leverage existing shared infrastructure

• Scalability
• Robustness
• Self-organization

8

Page 9: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Outline

• Background
• Pastry
• Pastry proximity routing
• Related Work
• Conclusions

9

Page 10: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Object distribution

• Consistent hashing [Karger et al. '97]
• 128-bit circular id space: 0 to 2^128 - 1
• nodeIDs and objIDs are assigned uniformly at random
• Invariant: the node with the numerically closest nodeID maintains the object

[Figure: nodeIDs and objIDs placed on the circular id space.]

11
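
(Added note, not part of the original slides.) A small sketch of how 128-bit ids on the circular space might be produced and compared; using MD5 (which happens to yield exactly 128 bits) and the circularDistance helper are assumptions of this sketch, not details given on the slide.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class IdSpace {
    // The circular id space covers 0 .. 2^128 - 1.
    public static final BigInteger ID_SPACE = BigInteger.ONE.shiftLeft(128);

    // Derive a 128-bit id from an arbitrary name (effectively uniform over the id space).
    public static BigInteger idFor(String name) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");   // MD5 yields exactly 128 bits
        byte[] digest = md.digest(name.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest);                      // non-negative, < 2^128
    }

    // One plausible notion of distance on the circular id space (shorter way around the ring).
    public static BigInteger circularDistance(BigInteger a, BigInteger b) {
        BigInteger d = a.subtract(b).mod(ID_SPACE);
        return d.min(ID_SPACE.subtract(d));
    }

    public static void main(String[] args) throws Exception {
        BigInteger nodeId = idFor("node-65a1fc");
        BigInteger objId = idFor("some-object");
        System.out.println("nodeId   = " + nodeId.toString(16));
        System.out.println("objId    = " + objId.toString(16));
        System.out.println("distance = " + circularDistance(nodeId, objId));
    }
}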

Page 11: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Object insertion/lookup

• A message with key X is routed to the live node with nodeID closest to X
• Problem: a complete routing table is not feasible

[Figure: Route(X) on the circular id space 0 to 2^128 - 1.]

12

Page 12: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Routing

Tradeoff

• O(log N) routing table size
• O(log N) message forwarding steps

13

Page 13: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Routing table (#65a1fc)

• Row 0: one entry per differing first digit (0x, 1x, 2x, 3x, 4x, 5x, 7x, 8x, 9x, ax, bx, cx, dx, ex, fx)
• Row 1: entries sharing the prefix 6 (60x, 61x, 62x, 63x, 64x, 66x, 67x, 68x, 69x, 6ax, 6bx, 6cx, 6dx, 6ex, 6fx)
• Row 2: entries sharing the prefix 65 (650x through 659x, 65bx through 65fx)
• Row 3: entries sharing the prefix 65a (65a0x, 65a2x through 65afx)
• log16 N rows in total

14
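
(Added note, not part of the original slides.) A compact sketch of the table structure this slide depicts: with hexadecimal digits (b = 4), row i holds up to 15 entries for nodes that share the first i digits with the local node but differ in digit i, so only about log16 N rows are ever populated. Class and method names are hypothetical.

import java.net.InetSocketAddress;

public class RoutingTable {
    private static final int DIGITS_PER_ID = 128 / 4;    // 32 hexadecimal digits in a 128-bit id
    private static final int RADIX = 16;                 // base-16 digits (b = 4)

    private final String localId;                        // e.g. "65a1fc..." as hex digits
    // rows[i][d]: a node whose id shares the first i digits with localId and has digit d at position i.
    private final InetSocketAddress[][] rows = new InetSocketAddress[DIGITS_PER_ID][RADIX];

    public RoutingTable(String localId) { this.localId = localId; }

    // Length of the common hex-digit prefix between the local id and another id.
    public int sharedPrefixLength(String otherId) {
        int i = 0;
        while (i < Math.min(localId.length(), otherId.length())
                && localId.charAt(i) == otherId.charAt(i)) {
            i++;
        }
        return i;
    }

    // Insert a candidate entry into the row/column determined by its id.
    public void consider(String nodeId, InetSocketAddress addr) {
        int row = sharedPrefixLength(nodeId);
        if (row >= DIGITS_PER_ID) return;                 // same id as the local node
        int col = Character.digit(nodeId.charAt(row), RADIX);
        if (rows[row][col] == null) {                     // keep the first candidate seen for this slot
            rows[row][col] = addr;
        }
    }

    // Look up the entry R[row][digit], or null if empty.
    public InetSocketAddress entry(int row, int digit) {
        return rows[row][digit];
    }
}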

Page 14: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Routing

Properties
• log16 N steps
• O(log N) state

[Figure: example route for key d46a1c from node 65a1fc via d13da3, d4213f, and d462ba to d467c4 (d471f1 nearby) on the circular id space.]

15

Page 15: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Leaf sets

Each node maintains IP addresses of the nodes with the L/2 numerically closest larger and smaller nodeIDs, respectively.
• routing efficiency/robustness
• fault detection (keep-alive)
• application-specific local coordination

16

Page 16: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Routing procedure

if (destination D is within range of our leaf set)
    forward to the numerically closest leaf set member
else
    let l = length of the prefix shared with D
    let d = value of the l-th digit in D's address
    if (R[l][d] exists)
        forward to R[l][d]
    else
        forward to a known node that
        (a) shares at least as long a prefix with D, and
        (b) is numerically closer to D than this node

17
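
(Added note, not part of the original slides.) The routing procedure above rendered as a Java sketch; the LeafSet and RoutingTable interfaces are hypothetical stand-ins for the state described on the earlier slides, and the final fallback scans only the leaf set for brevity.

import java.math.BigInteger;
import java.util.Set;

public class Router {

    public interface LeafSet {
        boolean covers(BigInteger key);                 // key falls within the leaf set's id range
        String numericallyClosest(BigInteger key);      // leaf-set member numerically closest to key
        Set<String> members();                          // all leaf-set members (hex id strings)
    }

    public interface RoutingTable {
        String entry(int row, int digit);               // R[row][digit], or null if empty
    }

    private final String localId;
    private final LeafSet leafSet;
    private final RoutingTable table;

    public Router(String localId, LeafSet leafSet, RoutingTable table) {
        this.localId = localId; this.leafSet = leafSet; this.table = table;
    }

    // Next hop for a message with the given key (hex string of the same length as node ids).
    public String nextHop(String keyHex) {
        BigInteger key = new BigInteger(keyHex, 16);
        if (leafSet.covers(key)) {
            return leafSet.numericallyClosest(key);          // destination within leaf-set range
        }
        int l = sharedPrefix(localId, keyHex);               // length of prefix shared with the key
        int d = Character.digit(keyHex.charAt(l), 16);       // value of the l-th digit of the key
        String candidate = table.entry(l, d);
        if (candidate != null) {
            return candidate;                                // forward to R[l][d]
        }
        // Rare case: forward to a known node that (a) shares at least as long a prefix with the
        // key and (b) is numerically closer to the key than the local node. (A full implementation
        // would also scan routing table and neighborhood set entries, not only the leaf set.)
        BigInteger localDist = new BigInteger(localId, 16).subtract(key).abs();
        for (String node : leafSet.members()) {
            if (sharedPrefix(node, keyHex) >= l
                    && new BigInteger(node, 16).subtract(key).abs().compareTo(localDist) < 0) {
                return node;
            }
        }
        return localId;                                      // this node is the best known choice
    }

    private static int sharedPrefix(String a, String b) {
        int i = 0;
        while (i < Math.min(a.length(), b.length()) && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}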

Page 17: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Performance

Integrity of overlay / message delivery:
• guaranteed unless L/2 nodes with adjacent nodeIDs fail simultaneously

Number of routing hops:
• No failures: < log16 N expected, 128/b + 1 maximum
• During failure recovery: O(N) worst case, average case much better

18
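
(Worked example, not on the slide.) With b = 4 (hexadecimal digits, as in the routing table shown earlier) and N = 100,000 nodes, the expected number of hops is log16 100,000 ≈ 4.2 and the maximum is 128/4 + 1 = 33; the measured hop distribution on the following slides is consistent with this.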

Page 18: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Self-organization

Initializing and maintaining routing tables and leaf sets

• Node addition
• Node departure (failure)

19

Page 19: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Node addition

New node: d46a1c

[Figure: the new node d46a1c joins by having node 65a1fc route a message with key d46a1c; the route passes d13da3, d4213f, and d462ba toward d467c4.]

20

Page 20: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Node departure (failure)

Leaf set members exchange keep-alive messages

• Leaf set repair (eager): request set from farthest live node in set

• Routing table repair (lazy): get table from peers in the same row, then higher rows

21
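
(Added note, not part of the original slides.) A hedged sketch of the lazy routing-table repair described above: when the entry R[l][d] is found dead, ask peers in the same row for their (l, d) entry, then peers in higher rows. The Table and RemoteNode interfaces are hypothetical.

public class TableRepair {

    public interface RemoteNode {
        // Ask a remote node for its routing table entry at (row, digit); null if it has none.
        String queryEntry(int row, int digit);
    }

    public interface Table {
        int rows();
        String get(int row, int digit);
        void set(int row, int digit, String nodeId);
        RemoteNode contact(String nodeId);               // open a connection to a known node
    }

    // Repair a dead entry at (l, d): first try peers in row l, then peers in higher rows.
    public static void repair(Table table, int l, int d) {
        for (int row = l; row < table.rows(); row++) {
            for (int digit = 0; digit < 16; digit++) {
                String peerId = table.get(row, digit);
                if (peerId == null || (row == l && digit == d)) continue;  // skip empty and the dead slot
                String replacement = table.contact(peerId).queryEntry(l, d);
                if (replacement != null) {
                    table.set(l, d, replacement);        // found a substitute for R[l][d]
                    return;
                }
            }
        }
    }
}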

Page 21: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Experimental results

Prototype
• implemented in Java
• emulated network
• deployed testbed (currently ~25 sites worldwide)

22

Page 22: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Average # of hops

[Plot: average number of hops vs. number of nodes (1,000 to 100,000), with curves for Pastry and Log(N).]

L=16, 100k random queries

23

Page 23: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: # of hops (100k nodes)

Probability distribution of the number of routing hops:

Number of hops:  0       1       2       3       4       5       6
Probability:     0.0000  0.0006  0.0156  0.1643  0.6449  0.1745  0.0000

L=16, 100k random queries

24

Page 24: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: # routing hops (failures)

L=16, 100k random queries, 5k nodes, 500 failures

Average hops per lookup:
• No failure: 2.73
• Failure: 2.96
• After routing table repair: 2.74

25

Page 25: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Outline

• Background
• Pastry
• Pastry proximity routing
• Related Work
• Conclusions

26

Page 26: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Proximity routing

Assumption: scalar proximity metric
• e.g. ping delay, # IP hops
• a node can probe the distance to any other node

Proximity invariant: each routing table entry refers to a node close to the local node (in the proximity space), among all nodes with the appropriate nodeID prefix.

27
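
(Added note, not part of the original slides.) A minimal sketch of one way to maintain the proximity invariant: among candidate nodes with the appropriate nodeID prefix for a routing table slot, keep the one with the smallest measured proximity (e.g., ping delay). The ProximityProbe interface is hypothetical.

public class ProximityAwareSlot {

    public interface ProximityProbe {
        // Scalar proximity metric to another node, e.g. measured ping delay in milliseconds.
        double distanceTo(String nodeId);
    }

    private final ProximityProbe probe;
    private String currentEntry;            // node currently held in this routing table slot
    private double currentDistance = Double.POSITIVE_INFINITY;

    public ProximityAwareSlot(ProximityProbe probe) { this.probe = probe; }

    // Consider a candidate with the appropriate nodeID prefix for this slot;
    // keep it only if it is closer in the proximity space than the current entry.
    public void consider(String candidateId) {
        double d = probe.distanceTo(candidateId);
        if (d < currentDistance) {
            currentEntry = candidateId;
            currentDistance = d;
        }
    }

    public String entry() { return currentEntry; }
}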

Page 27: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Routes in proximity space

[Figure: the example route for key d46a1c from node 65a1fc (via d13da3, d4213f, and d462ba to d467c4) shown both in nodeID space and in proximity space.]

28

Page 28: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Distance traveled

[Plot: relative distance traveled vs. number of nodes (1,000 to 100,000), comparing Pastry against a complete routing table.]

L=16, 100k random queries, Euclidean proximity space

29

Page 29: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Locality properties

1) The expected distance traveled by a message in the proximity space is within a small constant of the minimum
2) Routes of messages sent by nearby nodes with the same key converge at a node near the source nodes
3) Among the k nodes with nodeIDs closest to the key, the message is likely to reach the node closest to the source node first

30

Page 30: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Node addition

New node: d46a1c

[Figure: the join route for key d46a1c from node 65a1fc (via d13da3, d4213f, and d462ba toward d467c4) shown both in nodeID space and in proximity space.]

31

Page 31: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry delay

[Scatter plot: distance traveled by a Pastry message vs. distance between source and destination; mean = 1.59.]

GATech topology, .5M hosts, 60K nodes, 20K random messages

32

Page 32: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: API

• route(M, X): route message M to node with nodeID numerically closest to X

• deliver(M): deliver message M to application

• forwarding(M, X): message M is being forwarded towards key X

• newLeaf(L): report change in leaf set L to application

33
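
(Added note, not part of the original slides.) The API above expressed as a Java sketch: route is called by the application, while deliver, forwarding, and newLeaf are upcalls from Pastry into the application. The type names are hypothetical and do not reflect any real Pastry implementation's class names.

import java.math.BigInteger;
import java.util.List;

// Calls the application makes into the Pastry substrate.
public interface PastrySubstrate {
    // route(M, X): route message M to the node with nodeID numerically closest to X.
    void route(byte[] message, BigInteger key);
}

// Upcalls Pastry makes into the application.
interface PastryApplication {
    // deliver(M): M has arrived at the node responsible for its key.
    void deliver(byte[] message);

    // forwarding(M, X): M is passing through this node on its way toward key X
    // (the application may inspect or act on it, e.g. to cache content).
    void forwarding(byte[] message, BigInteger key);

    // newLeaf(L): the node's leaf set L has changed.
    void newLeaf(List<BigInteger> leafSet);
}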

Page 33: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Security

• Secure nodeID assignment
• Secure node join protocols
• Randomized routing
• Byzantine fault-tolerant leaf set membership protocol

34

Page 34: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Outline

• Background
• Pastry
• Pastry proximity routing
• Related Work
• Conclusions

35

Page 35: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Pastry: Related work

• Chord [Sigcomm'01]
• CAN [Sigcomm'01]
• Tapestry [TR UCB/CSD-01-1141]
• PAST [SOSP'01]
• SCRIBE [NGC'01]

36

Page 36: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Outline

• Background
• Pastry
• Pastry proximity routing
• Related Work
• Conclusions

37

Page 37: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Conclusions

• Generic p2p overlay network
• Scalable, fault resilient, self-organizing, secure
• O(log N) routing steps (expected)
• O(log N) routing table size
• Network proximity routing
• For more information: http://www.cs.rice.edu/CS/Systems/Pastry

38

Page 38: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Thanks

• Any Questions?

39


Page 41: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Outline

• PAST
• SCRIBE

42

Page 42: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Cooperative, archival file storage and distribution

• Layered on top of Pastry
• Strong persistence
• High availability
• Scalability
• Reduced cost (no backup)
• Efficient use of pooled resources

43

Page 43: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST API

• Insert - store replica of a file at k diverse storage nodes

• Lookup - retrieve file from a nearby live storage node that holds a copy

• Reclaim - free storage associated with a file

Files are immutable

44
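
(Added note, not part of the original slides.) A Java sketch of the three PAST operations listed above; the fileID type, parameters, and return values are hypothetical simplifications.

import java.math.BigInteger;

public interface PastStore {
    // Insert: store replicas of the (immutable) file on k diverse storage nodes;
    // returns the fileID under which the file was stored.
    BigInteger insert(byte[] fileContents, int k);

    // Lookup: retrieve the file from a nearby live storage node that holds a copy.
    byte[] lookup(BigInteger fileId);

    // Reclaim: free the storage associated with the file.
    void reclaim(BigInteger fileId);
}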

Page 44: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: File storage

[Figure: an Insert message with key fileID routed on the circular id space.]

45

Page 45: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: File storage

Storage invariant: file “replicas” are stored on the k nodes with nodeIDs closest to the fileID (k is bounded by the leaf set size).

[Figure: Insert fileID; the k = 4 nodes with nodeIDs closest to fileID each store a replica.]

46

Page 46: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: File Retrieval

• File located in log16 N steps (expected)
• Usually locates the replica nearest to client C

[Figure: client C issues a Lookup for fileID; k replicas are stored on the nodes closest to fileID.]

47

Page 47: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Exploiting Pastry

• Random, uniformly distributed nodeIDs
  – replicas stored on diverse nodes
• Uniformly distributed fileIDs
  – e.g. SHA-1(filename, public key, salt)
  – approximate load balance
• Pastry routes to the closest live nodeID
  – availability, fault tolerance

48
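
(Added note, not part of the original slides.) A small sketch of fileID generation following the SHA-1(filename, public key, salt) bullet above; truncating the 160-bit SHA-1 digest to 128 bits to match Pastry's id space is an assumption of this sketch.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class FileIds {
    // fileID = hash(filename, owner's public key, salt), giving uniformly distributed ids.
    public static BigInteger fileId(String filename, byte[] publicKey, byte[] salt) throws Exception {
        MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
        sha1.update(filename.getBytes(StandardCharsets.UTF_8));
        sha1.update(publicKey);
        sha1.update(salt);
        byte[] digest = sha1.digest();                        // 160 bits
        byte[] truncated = new byte[16];                      // keep the first 128 bits
        System.arraycopy(digest, 0, truncated, 0, 16);
        return new BigInteger(1, truncated);
    }
}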

Page 48: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Storage management

• Maintain the storage invariant
• Balance free space when global utilization is high
  – statistical variation in the assignment of files to nodes (fileID/nodeID)
  – file size variations
  – node storage capacity variations
• Local coordination only (leaf sets)

49

Page 49: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Experimental setup

• Web proxy traces from NLANR
  – 18.7 Gbytes, 10.5K mean, 1.4K median, 0 min, 138MB max
• Filesystem
  – 166.6 Gbytes, 88K mean, 4.5K median, 0 min, 2.7 GB max
• 2250 PAST nodes (k = 5)
  – truncated normal distributions of node storage sizes, mean = 27/270 MB

50

Page 50: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Need for storage management

• No diversion (t_pri = 1, t_div = 0):
  – max utilization 60.8%
  – 51.1% of inserts failed
• Replica/file diversion (t_pri = 0.1, t_div = 0.05):
  – max utilization > 98%
  – < 1% of inserts failed

51

Page 51: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: File insertion failures

[Plot: failed insertions (file size in bytes, left axis) and failure ratio (0% to 30%, right axis) vs. global utilization (0% to 100%).]

52

Page 52: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Caching

• Nodes cache files in the unused portion of their allocated disk space

• Files are cached on nodes along the route of lookup and insert messages

Goals:
• maximize query throughput for popular documents
• balance query load
• improve client latency

53
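
(Added note, not part of the original slides.) A minimal sketch of an LRU file cache bounded by the unused portion of a node's disk space; LRU is one of the replacement policies that appears in the measurements two slides ahead, and GD-S would additionally weight entries by size. All names here are hypothetical.

import java.math.BigInteger;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// LRU file cache bounded by a byte budget (the node's unused disk space).
public class LruFileCache {
    private long usedBytes = 0;
    private final long capacityBytes;
    // Access-ordered map: iteration order is least-recently-used first.
    private final LinkedHashMap<BigInteger, byte[]> cache = new LinkedHashMap<>(16, 0.75f, true);

    public LruFileCache(long capacityBytes) { this.capacityBytes = capacityBytes; }

    // Called as lookup/insert messages are forwarded through this node.
    public synchronized void cacheFile(BigInteger fileId, byte[] contents) {
        if (contents.length > capacityBytes) return;           // file larger than the whole budget
        byte[] previous = cache.put(fileId, contents);
        if (previous != null) usedBytes -= previous.length;
        usedBytes += contents.length;
        Iterator<Map.Entry<BigInteger, byte[]>> it = cache.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Map.Entry<BigInteger, byte[]> eldest = it.next();
            if (eldest.getKey().equals(fileId)) continue;      // never evict the file just cached
            usedBytes -= eldest.getValue().length;
            it.remove();                                       // evict the least-recently-used file
        }
    }

    public synchronized byte[] get(BigInteger fileId) {
        return cache.get(fileId);                              // refreshes the entry's LRU position
    }
}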

Page 53: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Caching

[Figure: caching along the route of a Lookup for fileID.]

54

Page 54: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Caching

[Plot: global cache hit rate (left axis) and average number of routing hops (right axis) vs. utilization (0% to 100%), for the GD-S and LRU caching policies and for no caching (# hops only).]

55

Page 55: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Security

• No read access control; users may encrypt content for privacy

• File authenticity: file certificates
• System integrity: nodeIDs and fileIDs non-forgeable, sensitive messages signed

• Routing randomized

56

Page 56: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Storage quotas

Balance storage supply and demand
• user holds a smartcard issued by brokers
  – hides the user's private key and usage quota
  – debits the quota upon issuing a file certificate
• storage nodes hold smartcards
  – advertise supply quota
  – storage nodes are subject to random audits within leaf sets

57

Page 57: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

PAST: Related Work

• CFS [SOSP'01]
• OceanStore [ASPLOS 2000]
• FarSite [Sigmetrics 2000]

58

Page 58: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Outline

• PAST
• SCRIBE

59

Page 59: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

SCRIBE: Large-scale, decentralized multicast

• Infrastructure to support topic-based publish-subscribe applications

• Scalable: large numbers of topics, subscribers, wide range of subscribers/topic

• Efficient: low delay, low link stress, low node overhead

60

Page 60: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

SCRIBE: Large scale multicast

[Figure: Subscribe topicID and Publish topicID messages routed toward the node with nodeID closest to topicID.]

61
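
(Added note, not part of the original slides.) A sketch of how the Subscribe and Publish operations pictured above can sit on Pastry's route primitive: a topicID is derived by hashing the topic name, and both message types are routed toward the node with nodeID closest to the topicID. Tree maintenance and event dissemination are omitted; all names are hypothetical.

import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ScribeClient {

    public interface Overlay {
        void route(Object message, BigInteger key);     // Pastry's route(M, X) primitive
    }

    public static final class Subscribe { final BigInteger topicId; final String subscriberId;
        Subscribe(BigInteger topicId, String subscriberId) { this.topicId = topicId; this.subscriberId = subscriberId; } }
    public static final class Publish { final BigInteger topicId; final byte[] event;
        Publish(BigInteger topicId, byte[] event) { this.topicId = topicId; this.event = event; } }

    private final Overlay overlay;
    private final String localNodeId;

    public ScribeClient(Overlay overlay, String localNodeId) {
        this.overlay = overlay; this.localNodeId = localNodeId;
    }

    // topicID derived by hashing the topic's textual name (truncated to 128 bits).
    public static BigInteger topicId(String topicName) throws Exception {
        byte[] digest = MessageDigest.getInstance("SHA-1")
                .digest(topicName.getBytes(StandardCharsets.UTF_8));
        byte[] truncated = new byte[16];
        System.arraycopy(digest, 0, truncated, 0, 16);
        return new BigInteger(1, truncated);
    }

    // Subscribe: route a Subscribe message toward the topicID.
    public void subscribe(BigInteger topicId) {
        overlay.route(new Subscribe(topicId, localNodeId), topicId);
    }

    // Publish: route the event toward the topicID.
    public void publish(BigInteger topicId, byte[] event) {
        overlay.route(new Publish(topicId, event), topicId);
    }
}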

Page 61: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Scribe: Results

• Simulation results
• Comparison with IP multicast: delay, node stress and link stress
• Experimental setup
  – Georgia Tech Transit-Stub model
  – 100,000 nodes randomly selected out of .5M
  – Zipf-like subscription distribution, 1500 topics

62

Page 62: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Scribe: Topic popularity

[Plot: group size (log scale) vs. group rank, 0 to 1500.]

gsize(r) = floor(N * r^(-1.25) + 0.5); N = 100,000; 1500 topics

63

Page 63: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Scribe: Delay penalty

[Plot: cumulative number of groups vs. delay penalty (0 to 5), RAD and RMD curves.]

Relative delay penalty, average and maximum

64

Page 64: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Scribe: Node stress

[Plots: number of nodes vs. total number of children table entries per node (full distribution and a zoomed view of the tail).]

65

Page 65: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Scribe: Link stress

[Plot: number of links vs. link stress (log scale), for Scribe and IP multicast; the maximum stress is marked.]

One message published in each of the 1,500 topics

66

Page 66: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Scribe: Summary

Self-configuring P2P framework for topic-based publish-subscribe

• Scribe achieves reasonable performance when compared to IP multicast
  – scales to a large number of subscribers
  – scales to a large number of topics
  – good distribution of load
• For more information: http://www.cs.rice.edu/CS/Systems/Pastry

67

Page 67: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Status

Functional prototypes
• Pastry [Middleware 2001]
• PAST [HotOS-VIII, SOSP'01]
• SCRIBE [NGC 2001, IEEE JSAC]
• SplitStream [submitted]
• Squirrel [PODC'02]

http://www.cs.rice.edu/CS/Systems/Pastry

68

Page 68: Pastry Scalable, decentralized object location and routing for large-scale peer-to-peer systems

Current Work

• Security
  – secure routing/overlay maintenance/nodeID assignment
  – quota system
• Keyword search capabilities
• Support for mutable files in PAST
• Anonymity/anti-censorship
• New applications
• Free software releases

69