Replication Strategies in Unstructured Peer-to-Peer Networks

28
Aug 22, 2002 Sigcomm 2002 Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen AT&T Labs-research Scott Shenker ICIR

description

Replication Strategies in Unstructured Peer-to-Peer Networks. Edith Cohen AT&T Labs-research. Scott Shenker ICIR. Peer-to-peer Networks. Peers are connected by an overlay network. Users cooperate to share files (e.g., music, videos, etc.). (Search in) Basic P2P Architectures. - PowerPoint PPT Presentation

Transcript of Replication Strategies in Unstructured Peer-to-Peer Networks

Page 1: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Replication Strategies in Unstructured Peer-to-Peer

Networks

Edith CohenAT&T Labs-research

Scott ShenkerICIR

Page 2: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Peer-to-peer Networks• Peers are connected by an

overlay network.• Users cooperate to share

files (e.g., music, videos, etc.)

Page 3: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

(Search in) Basic P2P Architectures

• Centralized: central directory server. (Napster) Supports versatile queries, scope, legal troubles• Decentralized: search is performed by

probing peers– Structured (DHTs): (Freenet, Can, Chord,…)

location is coupled with topology - search is routed by the query. Scope, Only exact-match queries, tightly controlled overlay.

– Unstructured: (Gnutella, FastTrack); search is “blind” - probed peers are unrelated to query. Resilient to transient peers; versatile queries; Harsh scope/scalability tradeoff.

Page 4: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

(replication in) P2P architectures

• No proactive replication (Gnutella)– Hosts store and serve only what they

requested– A copy can be found only by probing a host

with a copy• Proactive replication of “keys” (= meta

data + pointer) for search efficiency (FastTrack, DHTs)

• Proactive replication of “copies” – for search and download efficiency, anonymity. (Freenet)

Page 5: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Question: how to use replication to improve search efficiency in unstructured networks with a proactive replication mechanism ?

Page 6: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Search and replication model

• Search: probe hosts, uniformly at random, until the query is satisfied (or the search max size is exceeded)

Goal: minimize average search size (number of probes till query is satisfied)

• Replication: Each host can store up to copies (or keys=metadata+pointer) of items.

Unstructured networks with replication of keys or copies. Peers probed (in the search and replication process) are unrelated to query/item - Probe success likelihood can not be better, on average, than random probes.

Page 7: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

What is the search size of a query ?• Insoluble queries: maximum search size• Soluble queries: number of probes until

answer is found. We look at the Expected Search Size (ESS) of

each item. The ESS is inversely proportional to the fraction of peers with a copy of the item.

Search size• Query is soluble if there are sufficiently many

copies of the item.• Query is insoluble if item is rare or non

existent.

Page 8: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Search Example

2 probes 4 probes

Page 9: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Expected Search Size (ESS)

• Allocation : p1, p2, p3,…, pm i pi = 1 ith item is allocated pi fraction of

storage. (keys placed in pi fraction of hosts)

• m items with relative query rates q1 > q2 > q3 > … > qm. i qi = 1

• Search size for ith item is a Geometric r.v. with mean Ai = 1/( pi ).

• ESS is i qi Ai = (i qi / pi)/

Page 10: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Uniform and Proportional Replication

Two natural strategies:• Uniform Allocation: pi = 1/m

•Simple, resources are divided equally• Proportional Allocation: pi = qi

•“Fair”, resources per item proportional to demand• Reflects current P2P practices

Example: 3 items, q1=1/2, q2=1/3, q3=1/6Uniform Proportional

Page 11: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Basic Questions• How do Uniform and Proportional

allocations perform/compare ?• Which strategy minimizes the Expected

Search Size (ESS) ?• Is there a simple protocol that achieves

optimal replication in decentralized unstructured networks ?

Page 12: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Insoluble queries• Search always extends to the maximum allowed

search size.• If we fix the available storage for copies, the query

rate distribution, and the number of items that we wish to be “locatable”, then

• The maximum required search size depends on the smallest allocation of an item. Thus,

• Uniform allocation minimizes this maximum and thus the cost induced by insoluble queries.

What about the cost of soluble queries? Answer is more surprising …

Page 13: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

ESS under Uniform and Proportional Allocations

(soluble queries)• Lemma: The ESS under either Uniform or Proportional allocations is m/– Independent of query rates (!!!)– Same ESS for Proportional and Uniform (!!!)

Proportional:ASS is (i qi / pi)/(i qi / qi)/m/

Uniform:ASS is (i qi / pi)/(i m qi)/m/i qi m/

• Proof…

Page 14: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Space of Possible AllocationsDefinition: Allocation p1, p2, p3,…, pm is “in-between”

Uniform and Proportional if for 1i <m, q i+1/q i < p i+1/p i < 1

Theorem1: All (strictly) in-between strategies are (strictly) better than Uniform and Proportional

Theorem2: p is worse than Uniform/Proportional if for all i, p i+1/p i > 1 (more popular gets less) OR for all i, q i+1/q i > p i+1/p i (less popular gets less than

“fair share”)

Proportional and Uniform are the worst “reasonable” strategies (!!!)

Page 15: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002q2/q1

p2/p

1Space of allocations on 2 items

Worse than prop/uniMore popular item gets less.

Worse than prop/uni

More popular gets more thanits proportional share

Better than prop/uni

Uniform

Proportional

SR

Page 16: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

So, what is the best strategy for soluble queries ?

Page 17: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Square-Root Allocationpi is proportional to square-root(qi)

m

jj

ii

q

qp

1

• Lies “In-between” Uniform and Proportional

• Theorem: Square-Root allocation minimizes the ESS (on soluble queries)

Minimize i qi / pi such that i pi = 1

Page 18: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

How much can we gain by using SR ?wi iq −∝Zipf-like query rates

Page 19: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

OK• SR is best for soluble queries• Uniform minimizes cost of insoluble queries

OPT is a hybrid of Uniform and SR

Tuned to balance cost of soluble and insoluble queries.

What is the optimal strategy?

Page 20: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

UniformSR

10^4 items, Zipf-like w=1.5

All Soluble

85% Soluble

All Insoluble

Page 21: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

We now know what we need.

How do we get there?

Page 22: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Replication Algorithms

• Fully distributed where peers communicate through random probes; minimal bookkeeping; and no more communication than what is needed for search.

• Converge to/obtain SR allocation when query rates remain steady.

• Uniform and Proportional are “easy” :-– Uniform: When item is created, replicate its key

in a fixed number of hosts.– Proportional: for each query, replicate the key

in a fixed number of hostsDesired properties of algorithm:

Page 23: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Model for Copy Creation/Deletion

• Creation: after a successful search, C(s) new copies are created at random hosts.

• Deletion: is independent of the identity of the item; copy survival chances are non-decreasing with creation time. (i.e., FIFO at each node)

<Ci> average value of C used to replicate ith item.

Claim: If <Ci>/<Cj> remains fixed over time, and <Ci>, <Cj> , then pi/pj qi <Ci>/qj <Cj>

Property of the process:

Page 24: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Creation/Deletion Process

ii qC 1∝ jiji qqpp →If then

Corollary:

Algorithm for square-root allocation needs to have <Ci> equal to or converge to a value inversely proportional to

iq

Page 25: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

SR Replication Algorithms• Path replication: number of new copies C(s) is proportional

to the size of the search (Freenet)– Converges to SR allocation (+reasonable conditions)– Convergence unstable with delayed creations

• Sibling memory: each copy remembers the number of sibling copies,– Quickly “on target”– For “good estimates” need to find several copies.

• Probe memory: each peer records number and combined search size of probes it sees for each item. C(S) is determined by collecting this info from number of peers proportional to search size. – Immediately “on target”– Extra communication (proportional to that needed for search).

Page 26: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Alg1: Path Replication• Number of new copies produced per query,

<Ci>, is proportional to search size 1/pi• Creation rate is proportional to qi <Ci>• Steady state: creation rate proportional to

allocation pi, thus

iiiii ppqCq ∝∝

ii qp ∝

Page 27: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Simulation

Path replicationSibling number

Host

s with

cop

y

time

Delay = 0.25 * copy lifetime; 10000 hosts

Page 28: Replication Strategies in Unstructured Peer-to-Peer Networks

Aug 22, 2002 Sigcomm 2002

Summary• Random Search/replication Model: probes to

“random” hosts• Proportional allocation – current practice• Uniform allocation – best for insoluble queries• Soluble queries:

• Proportional and Uniform allocations are two extremes with same average performance

• Square-Root allocation minimizes Average Search Size

• OPT (all queries) lies between SR and Uniform• SR/OPT allocation can be realized by simple

algorithms.