Search and Replication in Unstructured Peer-to-Peer Networks
Pei Cao, Christine Lv, Edith Cohen, Kai Li and Scott Shenker
ICS 2002

Outline

• Brief survey of P2P architectures

• Evaluation Methodology

• Search Methods

• Replication

• Conclusions

Peer-to-Peer Networks

• Peers are connected by an overlay network.

• Users cooperate to share files (e.g., music, videos)

• Dynamic: nodes join or leave frequently

P2P Network Architectures I

• Centralized:
  – Uses a central directory server (CDS)
  – Peers query the CDS to find other peers that hold the desired object

Pros: very efficient

Cons: scales poorly; single point of failure

P2P Network Architectures II

• Decentralized: no central directory server
  – Structured:
    • P2P network topology is tightly controlled
    • Files are placed at specified locations
  – Unstructured:
    • No control over network topology or file placement

P2P Network Architectures III

Decentralized but Structured

• “Loosely structured”
  – Placement of files is based on hints

• “Tightly structured”
  – Precisely specifies
    • the structure of the P2P network and
    • file placement
  – Uses a distributed hash table

Pros: efficient satisfaction of queries; good scaling

Cons: not yet shown to work in practice

P2P Network Architectures IV

Decentralized and Unstructured

• Placement of files is not based on knowledge of the topology

• Finding files
  – A node queries its neighbors (usually by flooding)

Pros: extremely resilient to network changes

Cons: extremely unscalable; generates large loads

Evaluation Methodology I

Terminology

• Network topology: the graph formed by the nodes in the network at a given instant

• Query distribution: frequency of lookups for each file

• Replication distribution: percentage of nodes that have a particular file

Evaluation Methodology II

• Network topologies
  – Power-Law Random Graph (PLRG)
    • Max node degree: 1746, median: 1, average: 4.46
  – Normal Random Graph (Random)
    • Average and median node degree is 4
  – Gnutella graph (Gnutella)
    • Oct 2000 snapshot
    • Max degree: 136, median: 2, average: 5.5
  – Two-dimensional grid
    • 100x100, 10000 nodes

Evaluation Methodology III

• Object query distribution $q_i$
  – Uniform
  – Zipf-like

• Object replication density distribution $r_i$
  – Uniform
  – Proportional: $r_i \propto q_i$
  – Square-Root: $r_i \propto \sqrt{q_i}$
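
As a rough illustration of these distributions, the Python sketch below builds a Zipf-like query distribution and the three replication allocations. The exponent `alpha`, the object count `m`, and the function names are assumptions made for the example; none of them come from the slides.

```python
import math

def zipf_like(m, alpha=1.0):
    """Query probabilities q_i proportional to 1 / i**alpha, normalized to sum to 1."""
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def replication(q, strategy):
    """Share of the total replica budget given to each object (sums to 1)."""
    if strategy == "uniform":
        weights = [1.0] * len(q)
    elif strategy == "proportional":
        weights = list(q)
    elif strategy == "square-root":
        weights = [math.sqrt(x) for x in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [w / total for w in weights]

q = zipf_like(m=4)  # m=4 is an arbitrary toy size
for name in ("uniform", "proportional", "square-root"):
    print(name, [round(r, 3) for r in replication(q, name)])
```

The square-root shares are less skewed than the proportional ones but more skewed than the uniform ones.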

Evaluation Methodology IV

• Metrics
  – User aspects
    • Pr(success)
    • #hops
  – Load aspects
    • Average #messages per node
    • #nodes visited
    • Peak #messages

Limitation of Flooding I

• Gnutella uses a TTL (time-to-live) to limit the number of hops a query travels

• Problems:
  – Hard to choose the TTL:
    • For objects that are widely present in the network, small TTLs suffice
    • For objects that are rare in the network, large TTLs are necessary
  – The number of query messages grows exponentially as the TTL grows

Limitation of Flooding II

• A node may receive the same message more than once

• Duplicate-detection mechanisms are needed

• Even so, duplication increases as the TTL increases in flooding
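
The effect of the TTL on duplicates can be seen in a small simulation. The following is a minimal sketch, assuming an adjacency-list graph and a simple "forward only on first receipt" rule as the duplicate-detection mechanism; the function name `flood` and the toy 4-node ring are hypothetical and not taken from the slides.

```python
from collections import deque

def flood(graph, source, ttl):
    """Flood a query from `source` for `ttl` hops.

    Returns (nodes_reached, messages_sent, duplicate_deliveries).
    A node forwards a query only the first time it sees it (duplicate
    detection), but duplicates are still delivered and counted here.
    """
    seen = {source}
    frontier = deque([(source, None, ttl)])  # (node, sender, remaining TTL)
    messages = duplicates = 0
    while frontier:
        node, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue  # TTL expired: do not forward any further
        for neighbor in graph[node]:
            if neighbor == sender:
                continue  # do not send the query straight back
            messages += 1
            if neighbor in seen:
                duplicates += 1
            else:
                seen.add(neighbor)
                frontier.append((neighbor, node, remaining - 1))
    return len(seen), messages, duplicates

# Toy topology (assumption): a ring of 4 nodes.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
for ttl in (1, 2, 3):
    print(ttl, flood(ring, 0, ttl))
```

On this toy ring the duplicate count rises from 0 to 1 to 2 as the TTL goes from 1 to 3, even though no new node is reached at TTL 3.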

Limitation of Flooding Conclusion

• Flooding increases per-node overhead

• Need for more scalable search methods:
  – Expanding Ring
  – Random Walks

Expanding Ring

• Adaptively adjust the TTL
  – Multiple floods: start with TTL=1; increment TTL by 2 each time until the search succeeds

Still produces duplicate messages
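
The expanding-ring idea can be sketched in a few lines of Python. This is an illustration under stated assumptions: the helper `flood_reach`, the `max_ttl` cutoff, and the 6-node path graph are hypothetical, and messages are counted naively (one per edge traversal, including sends back toward the sender).

```python
def flood_reach(graph, source, ttl):
    """Return (set of nodes within `ttl` hops of `source`, messages sent)."""
    seen = {source}
    frontier = [source]
    messages = 0
    for _ in range(ttl):
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                messages += 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return seen, messages

def expanding_ring(graph, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Re-flood with a growing TTL until some node in `holders` is reached.

    Returns (ttl that succeeded, or None; total messages over all floods).
    `max_ttl` is an assumed safety cutoff so the loop always terminates.
    """
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:
        reached, messages = flood_reach(graph, source, ttl)
        total += messages
        if reached & holders:
            return ttl, total
        ttl += step
    return None, total

# Toy topology (assumption): a path 0 - 1 - 2 - 3 - 4 - 5.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j < 6] for i in range(6)}
print(expanding_ring(path, 0, holders={3}))
print(expanding_ring(path, 0, holders={5}))
```

Each retry repeats the work of the earlier, smaller floods, which is one source of the duplicate messages noted above.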

Random Walk

• Simple random walk
  – Takes too long to find anything

• Multiple-walker random walk
  – K walkers, each walking T steps, visit about as many nodes as 1 walker walking K*T steps
  – More messages mean more overhead
  – When to terminate the search:
    • TTL
    • Checking: walkers check back with the query originator once every C steps
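
A multiple-walker search with both termination rules might look like the sketch below. It is a simplified model, not the authors' simulator: walkers move in lockstep, every hop and every check with the originator costs one message, and the names `k_walker_search`, `check_every`, and the 3-node test graph are assumptions. The graph must have no isolated nodes.

```python
import random

def k_walker_search(graph, source, holders, k=4, ttl=64, check_every=4, rng=None):
    """Run k random walkers from `source` until TTL expiry or a successful check.

    Returns (step at which the object was first found, or None; messages used).
    Walkers keep walking after a hit until their next check with the originator.
    """
    rng = rng or random.Random()
    if source in holders:
        return 0, 0
    walkers = [source] * k
    found_step = None
    messages = 0
    for step in range(1, ttl + 1):
        for i, node in enumerate(walkers):
            walkers[i] = rng.choice(graph[node])
            messages += 1  # one message per hop
            if found_step is None and walkers[i] in holders:
                found_step = step
        if step % check_every == 0:
            messages += k  # each walker checks back with the originator
            if found_step is not None:
                break  # originator reports success; all walkers stop
    return found_step, messages

# Toy topology (assumption): a triangle where both neighbors hold the object.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
print(k_walker_search(triangle, 0, {1, 2}, k=2, ttl=16, check_every=4,
                      rng=random.Random(0)))
```

A larger `check_every` saves check messages but lets walkers run longer after the query has already been satisfied.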

Search Traffic Comparison

Avg. # msgs per node per query

          Flood   Ring    Walk
Random    1.863   0.053   0.027
Gnutella  2.85    0.961   0.031

Search Delay Comparison

# hops till success

          Flood   Ring    Walk
Random    2.51    4.03    9.12
Gnutella  2.39    3.4     7.3

Lessons Learned about Search Methods

• Key: Cover the right number of nodes as quickly as possible and with as little overhead as possible

• Pay attention to:
  – Adaptive termination
  – Minimizing message duplication
  – Small expansion in each step

Replication

• In unstructured P2P systems, search success is essentially about coverage: visiting enough nodes to find the object => replication density matters

• Goal: minimize average search size (number of probes till query is satisfied)

• Theoretical optimum: copy everything everywhere
  – Not feasible: node storage is limited

Replication Strategies

• Uniform Replication
  – $p_i = 1/m$
  – Simple, resources are divided equally

• Proportional Replication
  – $p_i = q_i$
  – “Fair”, resources per item proportional to demand
  – Reflects current P2P practices

Square-Root Replication

• $p_i$ is proportional to $\sqrt{q_i}$

• Lies “in between” Uniform and Proportional
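
A short worked derivation may help show how the three allocations compare on average search size. It assumes a simplified model that is not stated on the slide: each probe hits a uniformly random node, object $i$ is held by a fraction $\rho\,p_i$ of the nodes (with $\rho$ a hypothetical constant set by total storage), and $\sum_i q_i = \sum_i p_i = 1$ over $m$ objects.

```latex
% Expected number of probes to find object i (geometric trials),
% and the resulting average search size A over all queries:
\mathbb{E}[\text{probes}_i] = \frac{1}{\rho\, p_i}
\qquad\Longrightarrow\qquad
A = \sum_{i=1}^{m} \frac{q_i}{\rho\, p_i}

% Uniform, p_i = 1/m:
A_{\text{uniform}} = \frac{1}{\rho}\sum_{i=1}^{m} q_i\, m = \frac{m}{\rho}

% Proportional, p_i = q_i:
A_{\text{proportional}} = \frac{1}{\rho}\sum_{i=1}^{m} 1 = \frac{m}{\rho}

% Square-root, p_i = \sqrt{q_i} / \sum_j \sqrt{q_j}:
A_{\text{square-root}} = \frac{1}{\rho}\Bigl(\sum_{i=1}^{m}\sqrt{q_i}\Bigr)^{2}
\le \frac{m}{\rho}
\quad\text{(Cauchy-Schwarz)}
```

Under these assumptions Uniform and Proportional give the same average search size, and the square-root allocation is never worse, with equality only when all $q_i$ are equal.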

Achieving Square-Root Replication I

• Assume that each query keeps track of the number of probes needed

• Store an object at a number of nodes that is proportional to the number of probes

• Two implementations:
  – Path replication: store the object along the path of a successful “walk”
  – Random replication: store the object randomly among nodes visited by the agents
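
The two implementations can be contrasted with a small sketch. It assumes that `store` maps each node to the set of objects it holds, that the successful walker's path is available as a list, and that random replication creates as many copies as there are nodes on that path; these names and that copy count are assumptions for illustration.

```python
import random

def path_replicate(store, obj, path):
    """Path replication: copy `obj` to every node on the successful walker's path."""
    for node in path:
        store.setdefault(node, set()).add(obj)
    return len(path)

def random_replicate(store, obj, path, visited, rng=None):
    """Random replication: copy `obj` to len(path) nodes drawn at random
    from all nodes visited by any walker."""
    rng = rng or random.Random()
    pool = sorted(visited)  # sorted so a seeded rng gives repeatable picks
    chosen = rng.sample(pool, min(len(path), len(pool)))
    for node in chosen:
        store.setdefault(node, set()).add(obj)
    return len(chosen)

path_store, random_store = {}, {}
print(path_replicate(path_store, "song", [1, 2, 3]), sorted(path_store))
print(random_replicate(random_store, "song", [1, 2, 3], {1, 2, 3, 4, 5, 6},
                       rng=random.Random(1)), sorted(random_store))
```

Both variants create the same number of copies per successful query; they differ only in where the copies land.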

Achieving Square-Root Replication II

Evaluation of Replication Methods I

• Metrics
  – Overall message traffic
  – Search delay

• Dynamic simulation
  – Assume Zipf-like object query probability
  – Queries arrive as a Poisson process at 5 queries/sec
  – Results are collected during 5000-9000 sec
  – Search method: 32-walker random walk with state keeping, checking every 4 steps
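
A workload of this shape could be generated roughly as follows. This is only a sketch: the function name `generate_queries`, the Zipf exponent `alpha`, and the object count are assumptions, and the slides do not say how the original simulator produced its queries.

```python
import random

def generate_queries(m, rate=5.0, duration=10.0, alpha=1.0, rng=None):
    """Return a list of (arrival_time, object_index) pairs.

    Arrivals form a Poisson process with `rate` queries/sec (exponential
    inter-arrival times); objects are drawn from a Zipf-like distribution.
    """
    rng = rng or random.Random()
    weights = [1.0 / (i ** alpha) for i in range(1, m + 1)]
    queries = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)  # time until the next query
        if t > duration:
            break
        obj = rng.choices(range(m), weights=weights)[0]
        queries.append((t, obj))
    return queries

workload = generate_queries(m=100, rate=5.0, duration=100.0, rng=random.Random(42))
print(len(workload), workload[:3])
```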

Evaluation of Replication Methods II

Square-Root Replication reduces search traffic

[Bar chart: avg. # msgs per node (5000-9000 sec) for Owner, Path, and Random replication; y-axis from 0 to 60000]

Evaluation of Replication Methods III

[Plot: dynamic simulation, hop distribution (5000-9000 sec); x-axis: #hops (1, 2, 4, ..., 256); y-axis: queries finished (%); series: Owner Replication, Path Replication, Random Replication]

Conclusions

• Multi-walker random walk scales much better than flooding
  – Can find data more quickly
  – Reduces the traffic overhead

• Square-root replication distribution is desirable
  – Minimizes search delay
  – Minimizes the overall search traffic