An Introduction to Peer-to-Peer System Diganta Goswami IIT Guwahati.


An Introduction to Peer-to-Peer System

Diganta Goswami, IIT Guwahati


Outline

• Overview of P2P overlay networks
• Applications of overlay networks
• Classification of overlay networks
  • Structured overlay networks
  • Unstructured overlay networks
  • Overlay multicast networks


Overview of P2P overlay networks

• What is a P2P system? P2P refers to applications that take advantage of resources (storage, cycles, content, …) available at the end systems of the Internet.
• What is an overlay network? An overlay network is a network constructed on top of another network (e.g., IP).
• What is a P2P overlay network? Any overlay network that is constructed by Internet peers in the application layer on top of the IP network.


What is a P2P system?

• Multiple sites (at the edge)
• Distributed resources
• Sites are autonomous (different owners)
• Sites are both clients and servers
• Sites have equal functionality


Internet P2P Traffic Statistics

• Between 50 and 65 percent of all download traffic is P2P-related.
• Between 75 and 90 percent of all upload traffic is P2P-related.
• More people appear to be using P2P today.
• So what do people download?
  • 61.4% video
  • 11.3% audio
  • 27.2% games/software/etc.

Source: http://torrentfreak.com/peer-to-peer-traffic-statistics/


P2P overlay network properties

• Efficient use of resources
• Self-organizing
  • All peers organize themselves into an application-layer network on top of IP.
• Scalability
  • Consumers of resources also donate resources.
  • Aggregate resources grow naturally with utilization.


P2P overlay network properties

• Reliability
  • No single point of failure
  • Redundant overlay links between the peers
  • Redundant data sources
• Ease of deployment and administration
  • The nodes are self-organized.
  • No need to deploy servers to satisfy demand.
  • Built-in fault tolerance, replication, and load balancing
  • No changes are needed in the underlying IP network.


P2P Applications

• P2P file sharing: Napster, Gnutella, Kazaa, eDonkey, BitTorrent; Chord, CAN, Pastry/Tapestry, Kademlia
• P2P communications: Skype, social networking apps
• P2P distributed computing: SETI@home


Popular file sharing P2P Systems

• Napster, Gnutella, Kazaa, Freenet
• Large-scale sharing of files
  • User A makes files (music, video, etc.) on their computer available to others.
  • User B connects to the network, searches for files and downloads them directly from user A.
• Issues of copyright infringement


P2P/Grid Distributed Processing

• SETI@home: search for extraterrestrial intelligence
  • A central site collects radio telescope data.
  • The data is divided into work chunks of 300 KB.
  • The user obtains a client, which runs in the background.
  • The peer sets up a TCP connection to the central computer and downloads a chunk.
  • The peer performs an FFT on the chunk, uploads the results, and gets a new chunk.
• Not P2P communication, but it exploits peer computing power.


Key Issues

• Management
  • How to maintain the P2P system efficiently under a high rate of churn
  • Application reliability is difficult to guarantee
• Lookup
  • How to find the appropriate content/resource that a user wants
• Throughput
  • Content distribution/dissemination applications
  • How to copy content fast, efficiently, reliably


Management Issue

• A P2P network must be self-organizing.
  • Join and leave operations must be self-managed.
  • The infrastructure is untrusted and the components are unreliable.
  • The number of faulty nodes grows linearly with system size.
• Tolerance to failures and churn
  • Content replication, multiple paths
  • Leverage knowledge of the executing application
• Load balancing
• Dealing with free riders
  • Free rider: a rational or selfish user who consumes more than their fair share of a public resource, or shoulders less than a fair share of the costs of its production.


Lookup Issue

How do you locate data/files/objects in a large P2P system built around a dynamic set of nodes in a scalable manner without any centralized server or hierarchy?

• Efficient routing even if the structure of the network is unpredictable
  • Unstructured P2P: Napster, Gnutella, Kazaa
  • Structured P2P: Chord, CAN, Pastry/Tapestry, Kademlia


Classification of overlay networks

• Structured overlay networks
  • Are based on Distributed Hash Tables (DHTs): the overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer.
• Unstructured overlay networks
  • The overlay network organizes peers in a random graph, in a flat or hierarchical manner.
• Overlay multicast networks
  • The peers organize themselves into an overlay tree for multicasting.


Structured overlay networks

• Overlay topology construction is based on NodeIDs that are generated by the hash function of a Distributed Hash Table (DHT).
• The overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer.
• This structured graph enables efficient discovery of data items using the given keys.
• Typical DHT designs (e.g., Chord, Pastry) locate an object in O(log n) hops, where n is the number of peers.


Unstructured P2P overlay networks

• An unstructured system is composed of peers that join the network following some loose rules, without any prior knowledge of the topology.
• The network uses flooding or random walks as the mechanism to send queries across the overlay with a limited scope.


Unstructured P2P File Sharing Networks

• Centralized directory-based P2P systems
• Pure P2P systems
• Hybrid P2P systems


Unstructured P2P File Sharing Networks: Centralized Directory-based P2P Systems

• All peers are connected to a central entity.
• Peers establish connections between each other on demand to exchange user data (e.g., MP3-compressed data).
• The central entity is necessary to provide the service.
• The central entity is some kind of index/group database.
• The central entity acts as the lookup/routing table.
• Examples: Napster, BitTorrent


Napster was used primarily for file sharing.

• NOT a pure P2P network => hybrid system
• Ways of action:
  • The client sends the server the query; the server asks everyone and responds to the client.
  • The client gets a list of clients from the server.
  • All clients send the IDs of the data they hold to the server, and when a client asks for data, the server responds with specific addresses.
• A peer downloads directly from other peer(s).
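The central-directory scheme above can be sketched as an in-memory index held by the server. This is a minimal illustration under assumptions, not Napster's actual protocol: the class `CentralIndex`, its methods and the peer names are hypothetical, and only the register/query bookkeeping is modeled (the download itself happens directly between peers).

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central directory: file ID -> set of peer addresses."""

    def __init__(self):
        self._holders = defaultdict(set)

    def register(self, peer, file_ids):
        # Each client reports the IDs of the data it holds.
        for fid in file_ids:
            self._holders[fid].add(peer)

    def unregister(self, peer):
        # Remove a departing peer from every entry.
        for holders in self._holders.values():
            holders.discard(peer)

    def query(self, file_id):
        # The server answers with specific addresses; the transfer itself
        # then runs directly between peers, outside this index.
        return sorted(self._holders.get(file_id, ()))


index = CentralIndex()
index.register("peerA", ["song.mp3", "talk.mp3"])
index.register("peerB", ["song.mp3"])
print(index.query("song.mp3"))  # ['peerA', 'peerB']
```

Every query passes through this one object, which is exactly the single point of failure and scalability limit attributed to Napster.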


Centralized Network

Napster model
• Nodes register their contents with the server
• Centralized server for searches
• File access done on a peer-to-peer basis
– Poor scalability
– Single point of failure

[Figure: clients send a Query to the server and receive a Reply; the File Transfer then takes place directly between clients.]


Napster

• Further services: chat program, instant messaging service, tracking program, …
• Centralized system
  • Single point of failure => limited fault tolerance
  • Limited scalability (server farms with load balancing)
• Query is fast, and an upper bound for its duration can be given.


Gnutella

• Pure peer-to-peer
• Very simple protocol
• No routing "intelligence"
• Constrained broadcast
  • Lifetime of packets limited by TTL (typically set to 7)
  • Packets have unique IDs to detect loops


Query flooding: Gnutella

• Fully distributed: no central server
• Public-domain protocol
• Many Gnutella clients implement the protocol
• Overlay network: a graph with an edge between peers X and Y if there is a TCP connection between them
  • All active peers and edges form the overlay network
  • An edge is not a physical link
  • A given peer will typically be connected to fewer than 10 overlay neighbors


Gnutella: protocol

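[Figure: Query messages fan out across the overlay; QueryHit messages return toward the querying peer; the file transfer uses HTTP.]

• Query messages are sent over existing TCP connections.
• Peers forward Query messages.
• QueryHit messages are sent over the reverse path.
• File transfer: HTTP
• Scalability: limited-scope flooding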
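The limited-scope flooding above can be simulated on an in-memory overlay graph. This is a minimal sketch under stated assumptions: the overlay is a dict of neighbor lists, forwarding is synchronous and breadth-first, and the names `flood_query`, `overlay` and `shares` are hypothetical. A per-query `parent` map stands in for Gnutella's unique message IDs (a peer that already saw the Query drops the duplicate), and each QueryHit is traced back along the reverse path recorded in that map.

```python
from collections import deque


def flood_query(overlay, shares, source, wanted, ttl):
    """Flood a Query from `source`; return {hit_peer: path back to source}.

    overlay: {peer: [neighbor peers]}, one entry per TCP connection end
    shares:  {peer: set of file names the peer offers}
    """
    parent = {source: None}            # peer that forwarded the Query to us
    frontier = deque([(source, ttl)])  # (peer, remaining TTL)
    hits = {}
    while frontier:
        peer, remaining = frontier.popleft()
        if peer != source and wanted in shares.get(peer, set()):
            # QueryHit goes back hop by hop along the reverse path.
            path, node = [], peer
            while node is not None:
                path.append(node)
                node = parent[node]
            hits[peer] = path
        if remaining == 0:
            continue                   # TTL exhausted: stop forwarding
        for nbr in overlay[peer]:
            if nbr not in parent:      # a duplicate would be dropped; skip it
                parent[nbr] = peer
                frontier.append((nbr, remaining - 1))
    return hits


overlay = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
           "D": ["B", "C", "E"], "E": ["D"]}
shares = {"C": {"hey_jude.mp3"}, "E": {"hey_jude.mp3"}}
print(flood_query(overlay, shares, "A", "hey_jude.mp3", ttl=2))  # {'C': ['C', 'A']}
print(flood_query(overlay, shares, "A", "hey_jude.mp3", ttl=3))
```

With `ttl=2` the copy at E, three hops away, is never found; raising the TTL to 3 finds it, at the cost of contacting more peers.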


Gnutella: Scenario

Step 0: Join the network
Step 1: Determining who is on the network
• A "Ping" packet is used to announce your presence on the network.
• Other peers respond with a "Pong" packet and also forward your Ping to other connected peers.
• A Pong packet also contains:
  • an IP address
  • a port number
  • the amount of data that peer is sharing
• Pong packets come back via the same route.
Step 2: Searching
• A Gnutella "Query" asks other peers (usually 7) whether they have the file you desire.
• A Query packet might ask, "Do you have any content that matches the string 'Hey Jude'?"
• Peers check whether they have matches and respond if they have any; if not, they send the packet on to their connected peers (usually 7).
• This continues for TTL hops (how many hops a packet can go before it dies, typically 7).
Step 3: Downloading
• Peers respond with a "QueryHit" (contains contact info).
• File transfers use a direct connection using the HTTP protocol's GET method.


Gnutella: Peer joining

1. Joining peer X must find some other peer in the Gnutella network: it uses a list of candidate peers.

2. X sequentially attempts to make a TCP connection with peers on the list until a connection is set up with Y.

3. X sends a Ping message to Y; Y forwards the Ping message.

4. All peers receiving the Ping message respond with a Pong message.

5. X receives many Pong messages. It can then set up additional TCP connections.


Gnutella - PING/PONG

[Figure: Ping/Pong exchange among peers 1 to 8. Peer 1's Ping is forwarded through the overlay, and Pongs return along the reverse path, so peer 1's list of known hosts grows from 2 to 3, 4, 5 and then 6, 7, 8.]

Query/Response messages propagate analogously.


Unstructured Blind - Gnutella

[Figure: breadth-first search (BFS) flood from the source; the legend distinguishes the source, forwarded queries, processed queries, found results and forwarded responses.]


Unstructured Blind - Gnutella

• A node/peer connects to a set of Gnutella neighbors.
• It forwards queries to its neighbors.
• The client that has the information responds.
• The network is flooded, with a TTL for termination.

+ Results are complete
– Bandwidth wastage


Gnutella: Reachable Users (analytical estimate)

T: TTL, N: number of neighbors a Query is sent to
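The slide's estimate itself did not survive extraction. A common back-of-the-envelope model, stated here as an assumption rather than as the slide's exact formula, treats the overlay as a tree: the source sends the Query to N neighbors and every other peer forwards it to its N - 1 other neighbors, so with TTL T at most N + N(N-1) + ... + N(N-1)^(T-1) distinct peers are reached. Cycles and shared neighbors in a real overlay make the true number smaller. The helper name `reachable_users` is hypothetical.

```python
def reachable_users(ttl, neighbors):
    """Tree-model upper bound on peers reached by a Query with TTL `ttl`.

    Hop 1 reaches `neighbors` peers; each later hop multiplies the newest
    layer by (neighbors - 1), since a peer does not send back to its sender.
    """
    total, layer = 0, neighbors
    for _ in range(ttl):
        total += layer
        layer *= neighbors - 1
    return total


for ttl in range(1, 8):
    print(ttl, reachable_users(ttl, 4))
```

The count grows by roughly a factor of N - 1 per extra hop, which is why the TTL must stay small.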


Gnutella: Search Issue

• Flooding-based search is extremely wasteful of bandwidth.
  • A large (linear) part of the network is covered irrespective of the hits found.
  • An enormous number of redundant messages is generated.
  • All users do this in parallel: local load grows linearly with network size.
• What can be done?
  • Controlling the topology to allow for better search: random walk, degree-biased random walk
  • Controlling the placement of objects: replication


Gnutella: Random Walk

Basic strategy
• In a scale-free graph, high-degree nodes are easy to find by a (biased) random walk.
  • A scale-free graph is a graph whose degree distribution follows a power law.
• High-degree nodes can store the index for a large portion of the network.
• Random walk that avoids revisiting the last visited node
• Degree-biased random walk
  • Select the highest-degree neighbor that has not been visited.
  • This first climbs to the highest-degree node, then climbs down the degree sequence.
  • Provably optimal coverage
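A deterministic sketch of the degree-biased walk described above, assuming the overlay is an undirected graph stored as adjacency lists and that the walker can read each neighbor's degree. Breaking ties by peer name is a simplification for reproducibility, and `degree_biased_walk` is a hypothetical name.

```python
def degree_biased_walk(overlay, start, max_steps):
    """Return the peers visited by a degree-biased walk starting at `start`.

    At each step the walker moves to the unvisited neighbor with the highest
    degree (ties broken by the larger peer name) and stops at a dead end.
    """
    path, seen, current = [start], {start}, start
    for _ in range(max_steps):
        candidates = [p for p in overlay[current] if p not in seen]
        if not candidates:
            break  # every neighbor has already been visited
        current = max(candidates, key=lambda p: (len(overlay[p]), p))
        path.append(current)
        seen.add(current)
    return path


overlay = {
    "a": ["hub"], "b": ["hub", "c"], "c": ["b"],
    "hub": ["a", "b", "d", "e"], "d": ["hub", "e"],
    "e": ["hub", "d", "f"], "f": ["e"],
}
print(degree_biased_walk(overlay, "a", max_steps=10))  # ['a', 'hub', 'e', 'd']
```

Starting from a degree-1 peer, the walk climbs to the degree-4 hub and then descends through degrees 3 and 2, the climb-up/climb-down pattern described above.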


Gnutella: Replication

• Spread copies of objects to peers: more popular objects can be found more easily.
• Replication strategies:
  • Owner replication
  • Path replication
  • Random replication
• But rare objects are still difficult to find.


Random Walkers

Improved Unstructured Blind
• Similar structure to Gnutella
• Forward the query (called a walker) to a random subset of its neighbors
+ Reduced bandwidth requirements
– Incomplete results
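A sketch of the random-walker idea, assuming a common variant in which the source launches k walkers, each forwarded to one randomly chosen neighbor per hop (avoiding the peer it just came from when possible) until it finds the object or its TTL runs out. The function name and parameters are hypothetical; the function also counts messages so the bandwidth cost can be compared with flooding.

```python
import random


def k_random_walkers(overlay, shares, source, wanted, k, ttl, rng):
    """Return (peers found holding `wanted`, total messages sent)."""
    found, messages = set(), 0
    for _ in range(k):
        previous, current = None, source
        for _ in range(ttl):
            neighbors = overlay[current]
            if not neighbors:
                break
            # Prefer not to step straight back to the previous peer.
            options = [p for p in neighbors if p != previous] or neighbors
            previous, current = current, rng.choice(options)
            messages += 1
            if wanted in shares.get(current, set()):
                found.add(current)
                break  # this walker stops on a hit
    return found, messages


line = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(k_random_walkers(line, {"C": {"x"}}, "A", "x", k=3, ttl=5, rng=random.Random(0)))
```

The message count is bounded by k × TTL regardless of network size, but a walker that wanders away from the copies simply misses them, which is the incompleteness noted above.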


Unstructured Informed Networks

• Zero in on the target based on information about the query and the neighbors.
• Intelligent routing

+ Reduces the number of messages
+ Not complete, but more accurate
– Cost: must flood in order to get initial information


Informed Searches: Local Indices

Node keeps track of information available within a radius of r hops around it.

Queries are made to neighbors just beyond the r radius.

+ Flooding limited to bounded part of network
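One way a node could build the radius-r index described above is a bounded breadth-first traversal. This sketch assumes a static snapshot of the overlay and of each peer's shared objects; a real system would also have to refresh the index as peers join and leave. `build_local_index` is a hypothetical name.

```python
from collections import deque


def build_local_index(overlay, shares, node, radius):
    """Map each object to the peers within `radius` hops of `node` holding it."""
    index = {}
    dist = {node: 0}
    queue = deque([node])
    while queue:
        peer = queue.popleft()
        for obj in shares.get(peer, ()):
            index.setdefault(obj, set()).add(peer)
        if dist[peer] == radius:
            continue  # do not look beyond the radius
        for nbr in overlay[peer]:
            if nbr not in dist:
                dist[nbr] = dist[peer] + 1
                queue.append(nbr)
    return index


path = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
shares = {"A": {"z"}, "B": {"x"}, "D": {"y"}}
print(build_local_index(path, shares, "A", radius=2))  # {'z': {'A'}, 'x': {'B'}}
```

A query for an object missing from this index (here `y`, three hops away) would then be sent only to the peers just beyond the radius.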


Routing Indices

• For each query, calculate the goodness of each neighbor.
• Calculating goodness:
  • Categorize or separate the query into themes.
  • Rank the best neighbors for a given theme based on the number of matching documents.
• Follow the chain of neighbors that are expected to yield the best results.
• Backtracking is possible.
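A sketch of the neighbor-ranking step, assuming each node keeps, per neighbor, a count of matching documents for each theme. Summing those counts over the query's themes is a simplifying goodness function chosen for illustration, not necessarily the one intended on the slide; `rank_neighbors` is a hypothetical name. The query would be sent to the first neighbor in the ranking and, on failure, backtrack to the next.

```python
def rank_neighbors(routing_index, query_themes):
    """Order neighbors by goodness for a query (best first).

    routing_index: {neighbor: {theme: number of matching documents}}
    Goodness is the total document count over the query's themes.
    """
    def goodness(neighbor):
        counts = routing_index[neighbor]
        return sum(counts.get(theme, 0) for theme in query_themes)

    # Highest goodness first; ties broken by neighbor name for determinism.
    return sorted(routing_index, key=lambda n: (-goodness(n), n))


routing_index = {
    "B": {"networks": 30, "databases": 5},
    "C": {"networks": 10, "theory": 40},
    "D": {"databases": 50},
}
print(rank_neighbors(routing_index, ["networks"]))            # ['B', 'C', 'D']
print(rank_neighbors(routing_index, ["networks", "theory"]))  # ['C', 'B', 'D']
```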


Free riding

• File-sharing networks rely on users sharing data.
• Two types of free riding:
  • Downloading but not sharing any data
  • Not sharing any interesting data
• On Gnutella:
  • 15% of users contribute 94% of the content.
  • 63% of users never responded to a query (they didn't have "interesting" data).


Gnutella: summary

• Hit rates are high.
• High fault tolerance
• Adapts well and dynamically to changing peer populations
• High network traffic
• No estimates on the duration of queries
• No estimate of the probability that a query succeeds
• Topology is unknown => the algorithm cannot exploit it
• Free riding is a problem
  • A significant portion of Gnutella peers are free riders.
  • Free riders are distributed evenly across domains.
  • Often hosts share files nobody is interested in.


Gnutella discussion

• Search types: any possible string comparison
• Scalability
  • Search: very poor with respect to the number of messages
  • Updates: excellent, nothing to do
  • Routing information: low cost
• Robustness: high, since many paths are explored
• Autonomy
  • Storage: no restriction; peers store the keys of their files
  • Routing: peers are the target of all kinds of requests
• Global knowledge: none required


Exploiting heterogeneity: KaZaA

• Each peer is either a group leader or assigned to a group leader.
  • TCP connection between a peer and its group leader.
  • TCP connections between some pairs of group leaders.
• A group leader tracks the content in all its children.

[Figure legend: ordinary peer; group-leader peer; neighboring relationships in overlay network]


iMesh, Kazaa

• Hybrid of centralized Napster and decentralized Gnutella
• Super-peers act as local search hubs
  • Each super-peer is similar to a Napster server for a small portion of the network.
  • Super-peers are automatically chosen by the system based on their capacities (storage, bandwidth, etc.) and availability (connection time).
• Users upload their list of files to a super-peer.
• Super-peers periodically exchange file lists.
• Queries are sent to a super-peer for files of interest.
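A sketch of this two-level search, with hypothetical class and method names. It models the periodic exchange of file lists simply by letting a super-peer read its neighboring super-peers' indices when answering (one super-peer hop only); that is an assumption for illustration, not KaZaA's actual wire protocol.

```python
class SuperPeer:
    """Hypothetical super-peer acting as a Napster-like index for its group."""

    def __init__(self, name):
        self.name = name
        self.local = {}      # file name -> set of ordinary peers holding it
        self.neighbors = []  # other super-peers this one is connected to

    def upload_list(self, child, files):
        # An ordinary peer uploads its file list to its super-peer.
        for f in files:
            self.local.setdefault(f, set()).add(child)

    def query(self, wanted):
        # Answer from the local index, then from neighboring super-peers'
        # indices (stand-in for the periodically exchanged file lists).
        hits = set(self.local.get(wanted, ()))
        for sp in self.neighbors:
            hits |= sp.local.get(wanted, set())
        return hits


s1, s2 = SuperPeer("s1"), SuperPeer("s2")
s1.neighbors.append(s2)
s2.neighbors.append(s1)
s1.upload_list("p1", ["a.mp3"])
s2.upload_list("p2", ["a.mp3", "b.avi"])
print(sorted(s1.query("a.mp3")))  # ['p1', 'p2']
```

Ordinary peers never see the query traffic; only the super-peers do, which is how the hybrid design avoids both Napster's single server and Gnutella's full flooding.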


Overlay Multicasting

• IP multicast has not been widely deployed over the Internet because of fundamental problems in congestion control, flow control, security, group management, etc.
• Emerging applications such as multimedia streaming require an Internet multicast service.
• Solution: overlay multicasting
  • Overlay multicasting (or application-layer multicasting) is increasingly being used to overcome the non-ubiquitous deployment of IP multicast across heterogeneous networks.


Overlay Multicasting

Main idea
• Internet peers organize themselves into an overlay tree on top of the Internet.
• Packet replication and forwarding are performed by peers in the application layer by using the IP unicast service.
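The main idea can be sketched as a traversal of a given overlay tree in which each peer sends one unicast copy to each of its children. The tree is assumed to be already built (choosing a good tree is a separate problem), and `multicast` is a hypothetical name.

```python
def multicast(tree, root, packet):
    """Push `packet` down an overlay tree using unicast sends only.

    tree: {peer: [child peers]}. Returns (delivered, sends): the packet each
    peer received and the list of (sender, receiver) unicast transmissions.
    """
    delivered = {root: packet}
    sends = []
    stack = [root]
    while stack:
        peer = stack.pop()
        for child in tree.get(peer, []):
            # Application-layer replication: one unicast copy per child.
            sends.append((peer, child))
            delivered[child] = packet
            stack.append(child)
    return delivered, sends


tree = {"S": ["A", "B"], "A": ["C", "D"], "B": ["E"]}
delivered, sends = multicast(tree, "S", b"frame-1")
print(sorted(delivered), len(sends))  # ['A', 'B', 'C', 'D', 'E', 'S'] 5
```

The source S uploads only two copies; the other three are replicated by A and B with their own uplinks, so no router support is needed.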


Overlay Multicasting

Overlay multicasting benefits
• Easy deployment
  • It is self-organized.
  • It is based on the IP unicast service.
  • No protocol support is required from Internet routers.
• Scalability
  • It scales with the number of multicast groups and the number of members in each group.
• Efficient resource usage
  • The uplink resources of Internet peers are used for multicast data distribution.
  • It is not necessary to use dedicated infrastructure and bandwidth for massive data distribution in the Internet.


Overlay Multicasting

Overlay multicast approaches
• DHT based
• Tree based
• Mesh-tree based


Overlay Multicasting

• DHT based
  • The overlay tree is constructed on top of a DHT-based P2P routing infrastructure such as Pastry, CAN, Chord, etc.
  • Example: Scribe, in which the overlay tree is constructed on a Pastry network by using a multicast routing algorithm.


Structured Overlay Networks / DHTs

[Figure: keys of values and the set of nodes are both hashed, giving value identifiers and node identifiers in a common identifier space; the nodes are then connected smartly.]

Examples: Chord, Pastry, Tapestry, CAN, Kademlia, P-Grid, Viceroy


The Principle of Distributed Hash Tables

A dynamic distribution of a hash table onto a set of cooperating nodes

Key   Value
1     Algorithms
9     Routing
11    DS
12    Peer-to-Peer
21    Networks
22    Grids

• Basic service: lookup operation
  • Key resolution from any node
• Each node has a routing table
  • Pointers to some other nodes
  • Typically, a constant or a logarithmic number of pointers

[Figure: the table is distributed over nodes A, B, C and D; example: node D performs lookup(9).]


DHT Desirable Properties

Keys mapped evenly to all nodes in the network

Each node maintains information about only a few other nodes

Messages can be routed to a node efficiently

Node arrival/departures only affect a few nodes


Chord [MIT]

• Problem addressed: efficient node localization
• Distributed lookup protocol
• Simplicity, provable performance, proven correctness
• Support of just one operation: given a key, Chord maps the key onto a node


The Chord algorithm – Construction of the Chord ring

• The consistent hash function assigns each node and each key an m-bit identifier using SHA-1 (Secure Hash Standard).
• m = any number big enough to make collisions improbable
• Key identifier = SHA-1(key)
• Node identifier = SHA-1(IP address)
• Both are uniformly distributed.
• Both exist in the same ID space.
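A sketch of identifier assignment with Python's standard `hashlib.sha1`. Chord uses the full 160-bit SHA-1 output; reducing it modulo 2^m for a small m, as done here, is only an illustrative assumption so that IDs fit a small ring. The function name, the sample IP address and the sample key are hypothetical.

```python
import hashlib


def chord_id(text, m):
    """Map a string (an IP address or a key) to an m-bit ring identifier."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") % (2 ** m)


m = 7
node_id = chord_id("192.0.2.17", m)   # Node identifier = SHA-1(IP address)
key_id = chord_id("hey_jude.mp3", m)  # Key identifier = SHA-1(key)
print(node_id, key_id)                # both lie in the same range 0 .. 2**m - 1
```

Because nodes and keys go through the same hash, they land in one shared identifier space, which is what lets a key be assigned to a node by position on the ring.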


Chord

• Consistent hashing (SHA-1) assigns each node and object an m-bit ID.
• IDs are ordered in an ID circle ranging from 0 to 2^m - 1.
• New nodes assume slots in the ID circle according to their ID.
• Key k is assigned to the first node whose ID ≥ k, wrapping around past 2^m - 1 back to 0; this node is called successor(k).


Consistent Hashing - Successor Nodes

[Figure: identifier circle for m = 3 (identifiers 0 to 7) with nodes 0, 1 and 3 and keys 1, 2 and 6.]

successor(1) = 1
successor(2) = 3
successor(6) = 0
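The successor rule can be sketched with a sorted list of node IDs and binary search from the standard `bisect` module; the function name is hypothetical. The example uses m = 3 and nodes 0, 1 and 3, and reproduces the three successor values listed above.

```python
import bisect


def successor(node_ids, key, m):
    """First node ID >= key on the 2**m ring, wrapping to the smallest ID."""
    ring = sorted(node_ids)
    i = bisect.bisect_left(ring, key % (2 ** m))
    return ring[i] if i < len(ring) else ring[0]


nodes = [0, 1, 3]
for key in (1, 2, 6):
    print(f"successor({key}) = {successor(nodes, key, m=3)}")
```

Key 6 finds no node ID ≥ 6, so the search wraps around the circle to node 0.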


Consistent Hashing – Join and Departure

When a node n joins the network, certain keys previously assigned to n’s successor now become assigned to n.

When node n leaves the network, all of its assigned keys are reassigned to n’s successor.
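The join and departure rules can be checked with a small sketch that recomputes key ownership before and after a membership change. Keys are assumed to be already hashed into the identifier range, the helpers `owner` and `moved_keys` are hypothetical, and the join of node 6 is a made-up example on a ring with nodes 0, 1, 3 and keys 1, 2, 6.

```python
import bisect


def owner(ring, key):
    """successor(key) on a sorted list of node IDs, wrapping to the smallest."""
    i = bisect.bisect_left(ring, key)
    return ring[i] if i < len(ring) else ring[0]


def moved_keys(before, after, keys):
    """Return {key: (old owner, new owner)} for keys whose owner changes."""
    old, new = sorted(before), sorted(after)
    return {k: (owner(old, k), owner(new, k))
            for k in keys if owner(old, k) != owner(new, k)}


keys = [1, 2, 6]
print(moved_keys([0, 1, 3], [0, 1, 3, 6], keys))  # join of 6: {6: (0, 6)}
print(moved_keys([0, 1, 3], [0, 3], keys))        # departure of 1: {1: (1, 3)}
```

Only keys held by the new node's successor (node 0) or by the departing node move; every other assignment is untouched.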


Consistent Hashing – Node Join

[Figure: identifier circle 0 to 7 showing which keys each node holds after a new node joins.]


Consistent Hashing – Node Departure

[Figure: identifier circle 0 to 7 showing which keys each node holds after a node departs.]


Simple node localization

// ask node n to find the successor of id
n.find_successor(id)
    if (id ∈ (n, successor])
        return successor;
    else
        // forward the query around the circle
        return successor.find_successor(id);

=> The number of messages is linear in the number of nodes!
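A runnable sketch of the linear lookup above, assuming each simulated node object holds a direct reference to its successor; `in_interval`, `Node` and `build_ring` are hypothetical names. The helper handles the wrap-around of the half-open interval (n, successor], and the hop counter makes the linear cost visible.

```python
def in_interval(x, a, b):
    """True if x is in the half-open interval (a, b] on the identifier circle."""
    if a < b:
        return a < x <= b
    return x > a or x <= b  # interval wraps past 0 (a == b covers the whole circle)


class Node:
    def __init__(self, ident):
        self.id = ident
        self.successor = self  # replaced once the ring is wired up

    def find_successor(self, ident, hops=0):
        """Return (node responsible for ident, forwarding hops used)."""
        if in_interval(ident, self.id, self.successor.id):
            return self.successor, hops
        # forward the query around the circle
        return self.successor.find_successor(ident, hops + 1)


def build_ring(ids):
    """Create nodes and link each one to the next ID on the circle."""
    nodes = [Node(i) for i in sorted(ids)]
    for a, b in zip(nodes, nodes[1:] + nodes[:1]):
        a.successor = b
    return nodes


ring = build_ring([0, 1, 3])
node, hops = ring[0].find_successor(6)
print(node.id, hops)  # 0 2

big = build_ring(range(64))
print(big[0].find_successor(63)[1])  # 62 hops: linear in the number of nodes
```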


Scalable Key Location – Finger Tables

• To accelerate lookups, Chord maintains additional routing information.
• This additional information is not essential for correctness, which is achieved as long as each node knows its correct successor.
• Each node n maintains a routing table with up to m entries (m is the number of bits in identifiers), called the finger table.
• The i-th entry in the table at node n contains the identity of the first node s that succeeds n by at least 2^(i-1) on the identifier circle.
• s = successor(n + 2^(i-1)), with all arithmetic modulo 2^m. s is called the i-th finger of node n, denoted by n.finger(i).
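A sketch that computes finger tables directly from the definition s = successor(n + 2^(i-1)), with all arithmetic modulo 2^m. It assumes global knowledge of the node IDs, which a real Chord node does not have (it fills its fingers through lookups), and the function name is hypothetical. The example uses the ring with m = 3 and nodes 0, 1 and 3.

```python
import bisect


def finger_table(n, node_ids, m):
    """Return [n.finger(1), ..., n.finger(m)] for node n."""
    ring = sorted(node_ids)

    def successor(key):
        i = bisect.bisect_left(ring, key)
        return ring[i] if i < len(ring) else ring[0]

    return [successor((n + 2 ** (i - 1)) % 2 ** m) for i in range(1, m + 1)]


for n in (0, 1, 3):
    print(n, finger_table(n, [0, 1, 3], m=3))
```

Because finger i jumps at least 2^(i-1) ahead, forwarding a lookup to the closest preceding finger roughly halves the remaining distance each hop, which is how Chord reduces lookups to O(log n) hops with high probability.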
