Introduction to Peer-to-Peer Networks - HAW Hamburgschmidt/maw/p2p... · 2007-03-25 · Connect to...

Introduction to Peer-to-Peer Networks

The Story of Peer-to-PeerThe Nature of Peer-to-Peer: Generals & ParadigmsUnstructured Peer-to-Peer SystemsStructured Peer-to-Peer Systems

Chord

1 Prof. Dr. Thomas Schmidt http:/www.informatik.haw-hamburg.de/~schmidt

CAN

A Peer-to-Peer system is a self-organizing system of equal, autonomous entities (peers) which aims for the shared usage of distributed resources in a networked environment avoiding central services.

Andy Oram


The Old Days


NetNews (nntp)Usenet since 1979, initially based on UUCP

Exchange (replication) of news articles by subscription

Group creation/deletion decentralised

DNSDistributed delegation of name authorities: file sharing of host tables

Name “Servers” act as peers

Hierarchical information space permits exponential growth

Systems are manually configured distributed peers

SETI@home: Distributed Computing

From Anderson et. al.: SETI@home, Comm. ACM, 45 (11), Nov. 2002


Search for Extraterrestrial Intelligence (SETI)

Analyse radio sig-nals from space

Globally sharedcomputing res.

Idea 1995

First version 1998

2002 ≈ 4 Mio clnt

E.g. Screensaver

http://setiathome.berkeley.edu/ - ongoing

Napster

MAY 1999: Disruption of the Internet communityFirst Generation of File sharing: Introduction of Napster

Users not only consume and download, but also offer content

Users establish a virtual network, entirely independent from physical network and administrative authorities or restrictions

Basis: UDP and TCP connections between the peers

Napster provides centralised indexingClients upload their file list to Napster Server

Clients query Index Server and receive full provider list

Data exchange directly between peers


Napster

December 1999: RIAA files a lawsuit against Napster Inc.

March 2000: University of Wisconsin reports 25 % of its IP traffic is Napster traffic

February 2001: 2.79 billion files per month exchanged via NapsterJuly 2001: Napster Inc. is convicted

Target of the RIAA: the central lookup server of NapsterNapster has to stop the operation of the Napster serverNapster network breaks down

Napster failed (technically & legally) at its single server point.



GnutellaFile sharing fully decentralised

Open source software

March 2000:

Release 0.4 –with network flooding

Spring 2001:

Release 0.6 –improved scalability

Gnutella 0.4

Pure P2P system – no central indexing server

Operations:1. Connect to at least one active peer (address received

from bootstrap)2. Explore your neighborhood (PING/PONG)3. Submit Query with a list of keywords to your

neighbors (they forward it)4. Select “best” of correct answers (which we receive

after a while)5. Connect to providing host/peer

Scaling Problems due to network flooding


Gnutella 0.4: How Does It Work


Gnutella 0.6

Hybrid P2P System – Introduction of Superpeers

Improved scalability: signalling reduced to Superpeers

Election mechanism decides which node becomes a Superpeer or a Leafnode (depending on capabilities (storage, processing power) network connection, the uptime of a node,…)

Leafnodes announce their shared content to the Superpeer they are connected to

Superpeers carry local routing tables


Gnutella 0.6: How Does It Work


From: J. Eberspächer, R. Schollmeier: First and Second Generation Peer-to-Peer Systems, in LNCS 3485

The Gnutella Network

Measurements from May 2002




SkypeVoIP conferencing system

Released 2003

Central login server

Hybrid P2P system otherwise

Main focuses:

Detect users

Traverse NAT & Firewalls (STUN)

Elects Superpeersaccording to network connectivity

Uses Superpeers as relays

Impacts of P2P at the Abilene Backbone


0 %

1 0 %

2 0 %

3 0 %

4 0 %

5 0 %

6 0 %

7 0 %

8 0 %

9 0 %

1 0 0 %

18.02.2

00218.0

4.2002

18.06.2

00218.0

8.2002

18.10.2

00218.1

2.2002

18.02.2

00318.04.2

00318.0

6.2003

18.08.2

00318.1

0.2003

18.12.2

00318.0

2.2004

18.04.2

00418.0

6.2004

18.08.2

004

Traf

fic p

ortio

ns in

% p

er w

eek

U n id e n t if ie dD a ta _ T ra n s fe r sF ile _ S h a r in g

Core of Internet2 infrastructure, connecting 190 US universities and research

centers

Only Signaling

Possible data transfers

Unidentified + data_transfers + file_sharingcauses 90% of the traffic

Unidentified traffic and data_transfersincreased significantly

Parts of P2P is hidden (port hopping,…)

Some P2P applications use port 80 data_transfers

Data source: http://netflow.internet2.edu/weekly/

The Nature of P2P

P2P Networks overlay network infrastructure

Implemented on applicationlayer

Overlay Topology forms avirtual signaling network established via TCP connects

Peers are content provider+content requestor+router in the overlay network

Address: General Unique ID


P2P & Distributed Systems Paradigm

Coordination among equal components

Decentralised & self organising

Independence of individual peers

Scalability over tremendous ranges

High dynamic from volatile members

Fault resilience against infrastructure & nodes

Incentives instead of control


P2P & Internetworking Paradigm


Loose, stateless coupling among peers

Serverless & without infrastructural entities

Dynamic adaptation to network infrastructure

Overcome of NATs or port barriers

Client-Server principle reduced to communication programming, not an application paradigm anymore

Somewhat “Back to the Internet roots”:

Freedom of information

Freedom of scale

But: Freedom of Internet infrastructure & regulation

P2P Application Areas

File sharing

Media Conferencing

Resource Sharing: Grids

Collaborative Communities

Content based networking: e.g. Semantic Nets

Business Applications: e.g. “Market Management”

Ubiquitous Computing: e.g. Vehicular Communication

…



Psychoacoustics


The Challenge in Peer-to-Peer Systems


Location of resources (data items) distributed among systemsWhere shall the item be stored by the provider?

How does a requester find the actual location of an item?

Scalability: limit the complexity for communication and storage

Robustness and resilience in case of faults and frequent changes

D

?

Data item „D“

distributed system

7.31.10.25

peer-to-peer.info

12.5.7.31

95.7.6.10

86.8.10.18

planet-lab.orgberkeley.edu 89.11.20.15

?

I have item „D“.Where to place „D“?

?

I want item „D“.Where can I find „D“?

Unstructured P2P Revisited


Basically two approaches:

Centralized

Simple, flexible searches at server (O(1))

Single point of failure, O(N) node states at server

Decentralized Flooding

Fault tolerant, O(1) node states

Communication overhead ≥ O(N2), search may fail

But:

No reference structure between nodes imposed

Unstructured P2P: Complexities


Com

mun

icat

ion

Ove

rhea

d

Flooding

CentralServer

O(N)

O(N)O(1)

O(1)

O(log N)

O(log N)

Bottleneck:•CommunicationOverhead

•False negativesBottlenecks:•Memory, CPU, Network•Availability?

Scalable solutionbetween both

extremes?

Idea: Distributed Indexing


Initial ideas from distributed shared memories (1987 ff.)

Nodes are structured according to some address space

Data is mapped into the same address space

Intermediate nodes maintain routing information to target nodes

Efficient forwarding to „destination“ (content – not location)

Definitive statement about existence of content

H(„my data“)= 3107

2207

29063485

201116221008709

611

?

H(„my data“)= 3107

2207

29063485

201116221008709

611

?

Scalability of Distributed IndexingCommunication effort: O(log(N)) hops

Node state: O(log(N)) routing entries


H(„my data“)= 3107

2207

7.31.10.25

peer-to-peer.info

12.5.7.31

95.7.6.10

86.8.10.18

planet-lab.orgberkeley.edu

29063485

201116221008709

611

89.11.20.15

?

Routing in O(log(N)) steps to the node storing the data

Nodes store O(log(N)) routing information to

other nodes

Distributed Indexing: Complexities


Com

mun

icat

ion

Ove

rhea

d

Node State

Flooding

CentralServer

O(N)

O(N)O(1)

O(1)

O(log N)

O(log N)

Bottleneck:•CommunicationOverhead

•False negativesBottlenecks:•Memory, CPU, Network•AvailabilityDistributed

Hash Table

Scalability: O(log N)

No false negatives

Resistant against changes

Failures, Attacks

Short time users

Fundamentals of Distributed Hash Tables


Desired Characteristics: Flexibility, Reliability, Scalability

Challenges for designing DHTs

Equal distribution of content among nodesCrucial for efficient lookup of content

Permanent adaptation to faults, joins, exits of nodesAssignment of responsibilities to new nodes

Re-assignment and re-distribution of responsibilities in case of node failure or departure

Maintenance of routing information

Distributed Management of Data

1. Mapping of nodes and data into same address space

Peers and content are addressed using flat identifiers (IDs)

Nodes are responsible for data in certain parts of the address space

Association of data to nodes may change since nodes may disappear

2. Storing / Looking up data in the DHT

Search for data = routing to the responsible nodeResponsible node not necessarily known in advance

Deterministic statement about availability of data


Addressing in Distributed Hash TablesStep 1: Mapping of content/nodes into linear space

Usually: 0, …, 2m-1 à number of objects to be stored

Mapping of data and nodes into an address space (with hash function)

E.g., Hash(String) mod 2m: H(„my data“) 2313

Association of parts of address space to DHT nodes


H(Node Y)=3485

3485 -610

1622 -2010

611 -709

2011 -2206

2207-2905

(3485 -610)

2906 -3484

1008 -1621

Y

X

2m-1 0

Often, the address space is viewed as a circle.

Data item “D”:H(“D”)=3107 H(Node X)=2906

Mapping Address Space to NodesEach node is responsible for part of the value range

Often with redundancy (overlapping of parts)

Continuous adaptation

Real (underlay) and logical (overlay) topology so far uncorrelated


Logical view of the Distributed Hash Table

Mapping on the real topology

2207

29063485

201116221008709

611

Node 3485 is responsible for data items in range 2907 to 3485 (in case of a Chord-DHT)

Routing to a Data Item


Step 2: Locating the data (content-based routing)Routing to a Key-Value-pair

Start lookup at arbitrary node of DHT

Routing to requested data item (key) recursively according to node tables

(3107, (ip, port))

Value = pointer to location of data

Key = H(“my data”)

Node 3485 manages keys 2907-3485,

Initial node(arbitrary)

H(„my data“)= 3107

2207

29063485

201116221008

709

611

?

Routing to a Data Item (2)

Getting the contentK/V-pair is delivered to requester

Requester analyzes K/V-tuple(and downloads data from actual location – in case of indirect storage)


H(„my data“)= 3107

2207

29063485

201116221008

709

611?

Get_Data(ip, port)

Node 3485 sends (3107, (ip/port)) to requester

In case of indirect storage:After knowing the actual

Location, data is requested

DHT Algorithms


Lookup algorithm for nearby objects (Plaxton et al 1997)

Before P2P … later used in Tapestry

Chord (Stoica et al 2001)

Straight forward 1-dim. DHT

Pastry (Rowstron & Druschel 2001)

Proximity neighbour selection

CAN (Ratnasamy et al 2001)

Route optimisation in a multidimensional identifier space

Kademlia (Maymounkov & Mazières 2002) …

Chord: Overview


Early and successful algorithm

Simple & eleganteasy to understand and implement

many improvements and optimizations exist

Main responsibilities:Routing

Flat logical address space: l-bit identifiers instead of IPs

Efficient routing in large systems: log(N) hops, with N number of total nodes

Self-organizationHandle node arrival, departure, and failure

Chord: Topology

Hash-table storageput (key, value) inserts data into Chord

Value = get (key) retrieves data from Chord

Identifiers from consistent hashingUses monotonic, load balancing hash function

E.g. SHA-1, 160-bit output → 0 <= identifier < 2160

Key associated with data itemE.g. key = sha-1(value)

ID associated with hostE.g. id = sha-1 (IP address, port)


Chord: Topology


Keys and IDs on ring, i.e., all arithmetic modulo 2160

(key, value) pairs managed by clockwise next node: successor 6

1

2

6

0

4

26

5

1

3

7

2ChordRing

IdentifierNode

X Key

successor(1) = 1

successor(2) = 3successor(6) = 0

Chord: Topology

Topology determined by links between nodes

Link: knowledge about another node

Stored in routing table on each node

Simplest topology: circular linked list

Each node has link to clockwise next node


0

4

26

5

1

3

7

Routing on Ring ?


Primitive routing:Forward query for key x until successor(x) is found

Return result to source of query

Pros:Simple

Little node state

Cons:Poor lookup efficiency:O(1/2 * N) hops on average(with N nodes)

Node failure breaks circle

0

4

26

5

1

3

7

1

2

6

Key 6?

Node 0

Improved Routing on Ring?


Improved routing:Store links to z next neighbors, Forward queries for k to farthest known predecessor of k

For z = N: fully meshed routing systemLookup efficiency: O(1)

Per-node state: O(N)

Still poor scalability in linear routing progress

Scalable routing:Mix of short- and long-distance links required:

Accurate routing in node’s vicinity

Fast routing progress over large distances

Bounded number of links per node

Chord: RoutingChord’s routing table: finger table

Stores log(N) links per node

Covers exponentially increasing distances:Node n: entry i points to successor(n + 2i) (i-th finger)


0

4

26

5

1

3

7

finger tablei succ.

keys1

012

330

start235

finger tablei succ.

keys2

012

000

start457

124

130

finger tablestart succ.

keys6

012

i

Chord: Routing


Chord’s routing algorithm:

Each node n forwards query for key k clockwiseTo farthest finger preceding k

Until n = predecessor(k) and successor(n) = successor(k)

Return successor(n) to source of query63

4

7

16

1413

19

23

263033

3739

42

45

49

52

5456

60

i 2 i Target Link0 1 53 541 2 54 542 4 56 563 8 60 604 16 4 45 32 20 23

i 2î Target Link0 1 24 261 2 25 262 4 27 303 8 31 334 16 39 395 32 55 56

i 2î Target Link0 1 40 421 2 41 422 4 43 453 8 47 494 16 55 565 32 7 7

45

42

49

i 2î Target Link0 1 43 451 2 44 452 4 46 493 8 50 524 16 58 605 32 10 13 44

lookup (44)lookup (44) = 45

Chord: Self-Organization


Handle changing network environmentFailure of nodes

Network failures

Arrival of new nodes

Departure of participating nodes

Maintain consistent system state for routingKeep routing information up to date

Routing correctness depends on correct successor information

Routing efficiency depends on correct finger tables

Failure tolerance required for all operations

Chord: Failure Tolerance in Storage

Layered designChord DHT mainly responsible for routing

Data storage managed by applicationpersistence

consistency

Chord soft-state approach:Nodes delete (key, value) pairs after timeout

Applications need to refresh (key, value) pairs periodically

Worst case: data unavailable for refresh interval after node failure


Chord: Failure Tolerance in Routing


Finger failures during routingquery cannot be forwarded to finger

forward to previous finger (do not overshoot destination node)

trigger repair mechanism: replace finger with its successor

Active finger maintenanceperiodically check fingers“fix_fingers”

replace with correct nodes onfailures

trade-off: maintenance trafficvs. correctness & timeliness

634

7

16

1413

19

23

263033

3739

42

45

49

52

5456

60

45

42

49

44

Chord: Failure Tolerance in Routing


Successor failure during routingLast step of routing can return node failure to source of query-> all queries for successor fail

Store n successors in successor listsuccessor[0] fails -> use successor[1] etc.

routing fails only if n consecutive nodes fail simultaneously

Active maintenance of successor listperiodic checks similar to finger table maintenance“stabilize” uses predecessor pointer

crucial for correct routing

Chord: Node ArrivalNew node picks IDContact existing node (known from bootstrapping) Initiates search on personal ID -> successor returnedRetrieve (key, value) pairs from successorConstruct finger table via standard routing/lookup()


0

4

26

5

1

3

7

finger tablei succ.

keys1

012

330

start235

finger tablei succ.

keys2

012

000

start457

124

130


keys6

012

i

702

003


keys

012

i

Chord: Node Departure


Deliberate node departureclean shutdown instead of failure

For simplicity: treat as failuresystem already failure tolerant

soft state: automatic state restoration

state is lost briefly

invalid finger table entries: reduced routing efficiency

For efficiency: handle explicitlynotification by departing node to

successor, predecessor, nodes at finger distances

copy (key, value) pairs before shutdown

Chord: Summary


ComplexityMessages per lookup: O(log N)

Memory per node: O(log N)

Messages per management action (join/leave/fail): O(log² N)

AdvantagesTheoretical models and proofs about complexity

Simple & flexible

DisadvantagesNo notion of node proximity and proximity-based routing optimizations

Chord rings may become disjoint in realistic settings

Many improvements publishede.g. proximity, bi-directional links, load balancing, etc.

CAN: Overview


Maps node IDs to regions, which partition d-dimensional space

Keys correspondingly are coordinate points in a d-dim. torus: <k1, …, kd>

Routing from neighbour to neighbour –neighbourhood enhanced in high dimensionality

d tuning parameter of the system

CAN: Space Partitioning


Keys mapped into [0,1]d (or other numerical interval)

Node’s regions always cover the entire torus

Data is placed on node, who owns zone of its key

Zone management isdone by splitting / re-merging regions

Dimensional orderingto retain spatial coherence

CAN Routing


Each node maintains a coordinate neighbour set(Neighbours overlap in (d-1) dim. and abut in the remaining dim.)

Routing is done from neighbour to neighbour along the straight line path from source to destination:

Forwarding is done to that neighbour with coordinate zoneclosest to destination

CAN Node Arrival


The new node

1. Picks a random coordinate

2. Contacts any CAN node and routes a join to the owner of the corresponding zone

3. Splits zone to acquire region of its picked point & learns neighbours from previous owner

4. Advertises its presence to neighbours

Node Failure / Departure

Node failure detected by missing update messages

Leaving gracefully, a node notifies neighbours and copies its content

On node’s disappearance zone needs re-occupation in a size-balancing approach:

Neighbours start timers invers. proportional to their zone size

On timeout a neighbour requests ‘takeover’, responded only by those nodes with smaller zone sizes


CAN Optimisations

Redundancy: Multiple simultaneous coordinate spaces - Realities

Expedited Routing: Cartesian Distance weighted by network-level measures

Path-length reduction: Overloading coordinate zones

Proximity neighbouring: Topologically sensitive construction of overlay (landmarking)

…


CAN: Summary


ComplexityMessages per lookup: O(N1/d )

Messages per mgmt. action (join/leave/fail): O( d/2 N1/d)/O(2 d)

Memory per node: O(d)

AdvantagesPerformance parametrisable through dimensionality

Simple basic principle, easy to analyse & improve

DisadvantagesLookup complexity is not logarithmically bound

Due to its simple construction, CAN is open to many variants, improvements and customisations

References


• Andy Oram (ed.): Peer-to-Peer, O‘Reilly, 2001.• R. Steinmetz, K. Wehrle (eds.): Peer-to-Peer Systems and Applications,

Springer LNCS 3485, 2005. • C.Plaxton, R. Rajaraman, A. Richa: Accessing Nearby Copies of Replicated Objects in a Distributed Environment, Proc. of 9th ACM Sympos. on parallel Algor. and Arch. (SPAA), pp.311-330, June 1997.• I. Stoica, R. Morris, D. Karger, F. Kaashoek, and H. Balakrishnan: Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications. Proc. of the 2001ACM SigComm, pp. 149 – 160, ACM Press, 2001. • S. Ratnasamy, P. Francis, M. Handley, R. Karp: A Scalable Content-Addressable Network. Proc. of the 2001 ACM SigComm, pp. 161 – 172, ACM Press, 2001. .

Introduction to Peer-to-Peer Networks - HAW Hamburgschmidt/maw/p2p... · 2007-03-25 · Connect to...

Documents

Transcript of Introduction to Peer-to-Peer Networks - HAW Hamburgschmidt/maw/p2p... · 2007-03-25 · Connect to...