P2P Search COP6731 Advanced Database Systems. P2P Computing Powerful personal computer Share...

P2P Search COP6731 Advanced Database Systems


Page 1: P2P Search COP6731 Advanced Database Systems. P2P Computing  Powerful personal computer Share computing resources P2P Computing  Advantages: Shared.

P2P Search

COP6731 Advanced Database Systems

Page 2

P2P Computing

Powerful personal computers

Share computing resources

Advantages:
  Shared infrastructure costs
  Highly scalable
  No single point of failure (SPOF)
  Censorship resistance

Page 3

P2P Search Techniques

Centralized P2P systems, e.g., Napster, SETI@home

Decentralized and unstructured P2P systems, e.g., Gnutella

Hybrid (partially decentralized) systems, e.g., Freenet

Structured P2P systems
  DHT systems (CAN / Chord / Pastry / Tapestry)
  Skip-list-based systems

Page 4

Napster

MP3 file sharing with a centralized catalog

Peers hold the files
Napster Inc.’s servers hold the catalog
File transfer is P2P, using a proprietary protocol

Page 5

Napster: Publish a File

Users upload their IP address and the music titles they wish to share

[Figure: peer 192.1.2.3 registers the entry (xyz.mp3, 192.1.2.3) with the central Napster server]

Page 6

Napster: Query for a File

Users search for peers from which to download the desired files

[Figure: a peer sends the query "xyz.mp3 ?" to the central Napster server, which answers with 192.1.2.3]

Page 7

Napster: Transfer Requested File

File transfer is P2P, using a proprietary protocol

[Figure: the requesting peer sends "xyz.mp3 ?" directly to peer 192.1.2.3; the central Napster server is shown alongside]
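
The publish and query steps on the Napster slides can be sketched in a few lines of Python. This is only an illustration of a centralized directory, not Napster's actual (proprietary) protocol; the class name `CentralDirectory` and its methods are hypothetical.

```python
from collections import defaultdict


class CentralDirectory:
    """Hypothetical stand-in for the central catalog server."""

    def __init__(self):
        # title -> set of peer IP addresses that share it
        self.catalog = defaultdict(set)

    def publish(self, ip, titles):
        """A peer uploads its IP address and the titles it shares."""
        for title in titles:
            self.catalog[title].add(ip)

    def query(self, title):
        """Return the peers known to hold the title (empty list if none)."""
        return sorted(self.catalog.get(title, ()))


directory = CentralDirectory()
directory.publish("192.1.2.3", ["xyz.mp3"])
providers = directory.query("xyz.mp3")
# The requester would now contact a provider directly; the file
# transfer itself does not pass through the directory.
```

Every publish and every query touches the one `directory` object, which is where the bottleneck and the single point of failure come from.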

Page 8

Disadvantage of Centralized Directory

Performance bottleneck
Single point of failure

Can we do it without a directory?

Page 9

Gnutella

No catalog
Pings the network to locate Gnutella peers
File requests are broadcast to peers
  Flooding, or breadth-first search
When a provider is located, the file is transferred via HTTP

Page 10

Gnutella: Issue a Request

[Figure: a peer issues the request "xyz.mp3 ?"]

Page 11

Gnutella: Flood the Request

Page 12

Gnutella: Reply with the File

[Figure: the peer holding xyz.mp3 replies with the file]

Page 13

Gnutella - Disadvantages

Network flooding generates unnecessary network traffic

Using a TTL limits the flood, but some files might not be found

Alternatives: use ultranodes (or supernodes), or use depth-first search as in Freenet
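
The trade-off between flooding cost and TTL can be seen in a small simulation. The sketch below is a simplified model, not the Gnutella wire protocol: the overlay, the function name `flood_query`, and the message accounting (one message per forwarded copy, never sent back to the sender) are assumptions made for illustration.

```python
from collections import deque


def flood_query(neighbors, files, start, wanted, ttl):
    """Breadth-first flood from `start`; each hop uses up one unit of TTL.

    Returns (providers, messages): the peers found holding `wanted`
    and the number of query messages sent.
    """
    seen = {start}
    queue = deque([(start, None, ttl)])  # (peer, sender, remaining TTL)
    providers, messages = [], 0
    while queue:
        peer, sender, remaining = queue.popleft()
        if wanted in files.get(peer, ()):
            providers.append(peer)
        if remaining == 0:
            continue  # TTL expired: do not forward any further
        for nxt in neighbors[peer]:
            if nxt == sender:
                continue  # do not send the query back where it came from
            messages += 1  # every forwarded copy costs a message
            if nxt not in seen:  # duplicates are dropped on receipt
                seen.add(nxt)
                queue.append((nxt, peer, remaining - 1))
    return providers, messages


# Hypothetical overlay: a chain A-B-C-D plus a shortcut A-C.
neighbors = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B", "D"],
    "D": ["C"],
}
files = {"D": {"xyz.mp3"}}

near = flood_query(neighbors, files, "A", "xyz.mp3", ttl=1)
far = flood_query(neighbors, files, "A", "xyz.mp3", ttl=2)
```

With `ttl=1` the query never reaches D, so the file is missed; with `ttl=2` it is found, at the cost of extra messages, some of which are redundant copies.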

Page 14

Morpheus, Kazaa

[Figure: peers A through I grouped into three clusters. The center node of each cluster holds the index for its cluster, and the centers together form the supernode layer. Query: "Who has file X?" Reply: "Peer H has file X." Download file X from Peer H.]

Page 15

Using Ultranodes

Queries flood only the network of ultranodes

Other peer nodes are shielded from query traffic

Combines the benefits of centralized and decentralized search

Takes advantage of the heterogeneity in peer capabilities
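
A rough sketch of the ultranode idea follows. It assumes each ultranode keeps an index of the files in its own cluster and that queries are flooded only among ultranodes; the names `supernode_query`, `S1` to `S3`, and the assignment of peers to clusters are hypothetical and not taken from Morpheus or Kazaa.

```python
def supernode_query(clusters, super_links, start_super, wanted):
    """Flood a query among supernodes only.

    clusters: supernode -> {peer: set of files}, i.e. the index each
    supernode keeps for its own cluster.
    Returns (holders, number of supernodes contacted).
    """
    visited, frontier, holders = {start_super}, [start_super], []
    while frontier:
        next_frontier = []
        for sn in frontier:
            # The supernode answers from its cluster index; ordinary
            # peers never see the query.
            holders += [p for p, fs in clusters[sn].items() if wanted in fs]
            for other in super_links[sn]:
                if other not in visited:
                    visited.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return sorted(holders), len(visited)


# Hypothetical layout: three clusters of three peers each.
clusters = {
    "S1": {"A": set(), "B": set(), "C": set()},
    "S2": {"D": set(), "E": set(), "F": set()},
    "S3": {"G": set(), "H": {"X"}, "I": set()},
}
super_links = {"S1": ["S2"], "S2": ["S1", "S3"], "S3": ["S2"]}

holders, contacted = supernode_query(clusters, super_links, "S1", "X")
```

Only three nodes handle the query here, although the network has nine ordinary peers.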

Page 16

Freenet - Depth-First Search

[Figure: peers A, B, C, D, and E. The query "Who has file X?" is forwarded from peer to peer in depth-first order; file X is then downloaded from Peer E.]

Page 17

Freenet – File not Found

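[Figure: peers A through F. The query hits a dead end ("NOT FOUND!") while another peer holds the file ("I HAVE FILE X!"); file X is finally downloaded from Peer E.]

The requested file is not found at first because of a poor routing decision made at peer D

In this case, the query backs out of the dead end and tries another peer in depth-first manner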
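
Depth-first search with backtracking can be sketched as a short recursive function. This is an illustration only: the topology below is hypothetical, and the order of each neighbor list stands in for the routing preference (Freenet's real key-based routing is not modeled).

```python
def dfs_search(neighbors, files, peer, wanted, hops, trace):
    """Depth-first search with backtracking.

    Appends every peer visited to `trace` and returns the path to a
    peer holding `wanted`, or None if the search fails.
    """
    trace.append(peer)
    if wanted in files.get(peer, ()):
        return [peer]
    if hops == 0:
        return None  # hop limit reached: report failure upstream
    for nxt in neighbors[peer]:
        if nxt in trace:
            continue  # already visited on this query
        path = dfs_search(neighbors, files, nxt, wanted, hops - 1, trace)
        if path is not None:
            return [peer] + path
        # Dead end below `nxt`: back out and try the next neighbor.
    return None


# Hypothetical overlay in which D tries F (a dead end) before C.
neighbors = {
    "A": ["B"],
    "B": ["A", "D"],
    "D": ["B", "F", "C"],
    "F": ["D"],
    "C": ["D", "E"],
    "E": ["C"],
}
files = {"E": {"X"}}

trace = []
path = dfs_search(neighbors, files, "A", "X", hops=5, trace=trace)
```

Here `trace` records the detour through F, while `path` contains only the successful route from A to E.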

Page 18

Structured P2P Systems

DHT-based
  Chord / Pastry / Tapestry: hash into a one-dimensional space
  CAN: hash into a multi-dimensional space
  P-Grid: hash into a virtual binary search tree

Skip-list-based
  Skip Graph / SkipNet

Index-tree-based
  BATON

Page 19

DHT Design Goals

An “overlay” network with:
  Flexible mapping of keys to physical nodes
  Data independence
  Small network diameter
  Small degree (fan-out)
  Local routing decisions
  Robustness to churn
  Routing flexibility
  Proximity

A “storage” or “memory” mechanism with:
  No guarantees on persistence
  Maintenance via soft state

Page 20

Metrics

Searching/Lookup
  Number of hops in searching
  Number of messages

Database-related metrics:
  Total disk I/O
  Response time
  Accuracy

Maintenance
  Number of hops
  Number of messages

Page 21

How to Bound Search Space?

[Figure: the network]

Work on placement!

Page 22

Basic Idea - Hashing

Objects have hash keys: object “y” has hash key H(y)

Peer nodes also have hash keys in the same hash space: peer “x” has hash key H(x)

A peer joins the P2P network with Join(H(x)); an object is published with Publish(H(y))

Place each object on the peer with the closest hash key
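
The placement rule can be illustrated with a short sketch. It assumes a 128-bit key space, takes the first 16 bytes of SHA-256 as the hash function `H`, and reads "closest" as the smallest absolute difference between keys; the slides do not fix any of these choices, and real systems such as Chord define closeness differently (for example, as the successor on a ring).

```python
import hashlib

BITS = 128  # assumed width of the shared key space


def H(name):
    """Hash a peer or object name into the shared key space."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[: BITS // 8], "big")


def closest_peer(peer_keys, key):
    """Return the peer whose hash key is numerically closest to `key`."""
    return min(peer_keys, key=lambda peer: abs(peer_keys[peer] - key))


# Join: each peer gets a key in the same space as the objects.
peers = {name: H(name) for name in ["peer-x1", "peer-x2", "peer-x3"]}

# Publish: the object is placed on the peer with the closest key.
home = closest_peer(peers, H("object-y"))
```

Because peers and objects share one key space, any node that knows the peer keys can compute where an object lives without asking a directory.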

Page 23

Viewed as a Distributed Hash Table

[Figure: a hash table spanning keys 0 to 2^128 - 1, partitioned among peer nodes that sit at the edge of the Internet]

Each peer is responsible for a range of the hash table, according to the peer's hash key

Objects are placed on the peer with the closest key

Note that peers are Internet edges

Page 24

How to Find an Object?

[Figure: a hash table spanning keys 0 to 2^128 - 1, with a peer node]

Simplest idea: everyone knows everyone else!
  One hop to find the object

Want to keep only a few entries!

Page 25

Using Distributed Hash Table (DHT)

A peer only needs to know its logical neighbors
Search is based on multihop routing

[Figure: a hash table spanning keys 0 to 2^128 - 1, with a peer node]
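
Multihop routing with only local knowledge might look like the sketch below. It assumes the simplest possible overlay, in which each peer knows just its two neighbors in key order and forwards greedily toward the target key; the small integer keys and the function names are made up for illustration. Practical DHTs keep a few more carefully chosen neighbors so that routes are much shorter.

```python
def build_neighbors(peer_keys):
    """Give each peer only its two logical neighbors in key order."""
    ordered = sorted(peer_keys)
    table = {}
    for i, key in enumerate(ordered):
        table[key] = [ordered[j] for j in (i - 1, i + 1) if 0 <= j < len(ordered)]
    return table


def route(neighbors, start, target):
    """Forward hop by hop to whichever neighbor is closer to `target`."""
    path = [start]
    current = start
    while True:
        # `current` is listed first so that ties keep the message in place.
        candidates = [current] + neighbors[current]
        best = min(candidates, key=lambda k: abs(k - target))
        if best == current:
            return path  # no neighbor is closer: this peer owns the key
        current = best
        path.append(current)


neighbors = build_neighbors([5, 20, 40, 70, 95])
path = route(neighbors, start=5, target=68)
hops = len(path) - 1
```

Each peer stores at most two entries, but the lookup takes several hops; knowing every peer would take one hop at the cost of a full table.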

Page 26

DHT in action

[Figure: peer nodes, each holding a table of (K, V) pairs]

Page 28

DHT in action

[Figure: peer nodes, each holding a table of (K, V) pairs]

Operation: take key as input; route messages to node holding key

Page 29

DHT in action: put()

[Figure: a node issues insert(K1,V1)]

Operation: take key as input; route messages to node holding key

Page 30

DHT in action: put()

[Figure: the insert(K1,V1) message is routed through the overlay]

Operation: take key as input; route messages to node holding key

Page 31

DHT in action: put()

[Figure: (K1,V1) is stored at the node responsible for K1]

Operation: take key as input; route messages to node holding key

Page 32

DHT in action: get()

[Figure: a node issues retrieve(K1)]

Operation: take key as input; route messages to node holding key

Page 33

DHT in action

[Figure: the retrieve(K1) request is routed toward the node holding K1]
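
The put()/get() interface from these slides can be modeled in a single process. The class `TinyDHT` below is hypothetical: each node's (K, V) table is a plain dictionary, the key space is shrunk to 16 bits, and the multihop routing step is collapsed into one lookup of the responsible node.

```python
import hashlib


class TinyDHT:
    """Hypothetical in-process model: one dict per node stands in for its table."""

    def __init__(self, node_ids):
        self.tables = {node_id: {} for node_id in node_ids}

    @staticmethod
    def key_hash(key, space=2**16):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big") % space

    def responsible_node(self, key):
        """Node whose ID is closest to the key's hash.

        A real DHT would reach this node by multihop routing; here the
        lookup is collapsed into one step.
        """
        h = self.key_hash(key)
        return min(self.tables, key=lambda node_id: abs(node_id - h))

    def put(self, key, value):
        """insert(K, V): store the pair at the responsible node."""
        node = self.responsible_node(key)
        self.tables[node][key] = value
        return node

    def get(self, key):
        """retrieve(K): ask the responsible node for the value."""
        node = self.responsible_node(key)
        return self.tables[node].get(key)


dht = TinyDHT([1000, 20000, 45000, 60000])
stored_at = dht.put("K1", "V1")
value = dht.get("K1")
```

Both operations take only the key as input, so a reader needs no knowledge of which node performed the insert.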

Page 34

CAN – Content Addressable Network

Each peer is responsible for one zone, i.e., stores all (key, value) pairs of the zone

Each peer knows the neighbors of its zone

Random assignment of peers to zones at startup

Dimension-ordered multihop routing

Page 35

CAN: Object Publishing

node I::publish(K,V)

[Figure: node I in the CAN coordinate space]

Page 36

CAN: Object Publishing

node I::publish(K,V)

(1) a = h_x(K)

[Figure: node I in the coordinate space; the line x = a is marked]

Page 37

CAN: Object Publishing

node I::publish(K,V)

(1) a = h_x(K), b = h_y(K)

[Figure: node I in the coordinate space; the lines x = a and y = b are marked]

Page 38

CAN: Object Publishing

node I::publish(K,V)

(1) a = h_x(K), b = h_y(K)

(2) route (K,V) -> J

[Figure: the request travels from node I to node J]

Page 39

CAN: Object Publishing

node I::publish(K,V)

(1) a = h_x(K), b = h_y(K)

(2) route (K,V) -> J

(3) J stores (K,V)

[Figure: (K,V) is now stored at node J]

Page 40

CAN: Object Retrieval

node I::retrieve(K)

(1) a = h_x(K), b = h_y(K)

(2) route “retrieve(K)” to node J, which is in charge of the point (a,b)

[Figure: node I routes the request to node J, which stores (K,V)]
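
The publish and retrieve steps can be put together in a small two-dimensional sketch. Several simplifications are assumed here: the coordinate space is the unit square cut into a fixed 4 x 4 grid of equal zones (CAN itself creates zones by splitting them as peers join), `h_x` and `h_y` are derived from SHA-256 with different prefixes, and a shared dictionary `store` stands in for the per-node tables.

```python
import hashlib

GRID = 4  # assumed: the space is split into a GRID x GRID array of equal zones


def _h(key, salt):
    digest = hashlib.sha256((salt + key).encode()).digest()
    # Keep 53 bits so the result is an exact float in [0, 1).
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def h_x(key):
    return _h(key, "x:")


def h_y(key):
    return _h(key, "y:")


def zone_of(a, b):
    """Zone (column, row) that contains the point (a, b)."""
    return (int(a * GRID), int(b * GRID))


def route(src, dst):
    """Dimension-ordered routing: fix x first, then y, one neighbor zone per hop."""
    path = [src]
    col, row = src
    while col != dst[0]:
        col += 1 if dst[0] > col else -1
        path.append((col, row))
    while row != dst[1]:
        row += 1 if dst[1] > row else -1
        path.append((col, row))
    return path


store = {}  # zone -> {key: value}; one table per zone owner


def publish(src, key, value):
    dst = zone_of(h_x(key), h_y(key))       # (1) a = h_x(K), b = h_y(K)
    path = route(src, dst)                  # (2) route (K,V) to the owner J
    store.setdefault(dst, {})[key] = value  # (3) J stores (K,V)
    return path


def retrieve(src, key):
    dst = zone_of(h_x(key), h_y(key))  # same hashes, so same owner J
    route(src, dst)
    return store.get(dst, {}).get(key)


publish_path = publish((0, 0), "K", "V")
found = retrieve((3, 3), "K")
```

Each hop moves to an adjacent zone, so a node only needs to know the neighbors of its own zone, and the number of hops equals the grid distance between source and destination zones.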

Page 41

Some Research Topics

Content-based Image Retrieval in P2P
Location Management in P2P
Security Considerations for DHT
P2P Backup
Wireless P2P