Peer-to-Peer Systems · The eDonkey Network - Principle Distributed server(s) ! Set up and RUN BY...

Peer-to-Peer Systems Winter semester 2014 Jun.-Prof. Dr.-Ing. Kalman Graffi Heinrich Heine University Düsseldorf


Peer-to-Peer Systems

Unstructured P2P Overlay Networks – Unstructured Heterogeneous Overlays

This slide set is based on the lecture "Communication Networks 2" of Prof. Dr.-Ing. Ralf Steinmetz at TU Darmstadt


3 HHU – Technology of Social Networks – JProf. Dr. Kalman Graffi – Peer-to-Peer Systems – http://tsn.hhu.de/teaching/lectures/2014ws/p2p.html

Unstructured P2P: Centralized P2P · Homogeneous P2P · Heterogeneous P2P
Structured P2P: DHT-Based · Heterogeneous P2P

Centralized P2P (unstructured)
1.  All features of Peer-to-Peer included
2.  Central entity is necessary to provide the service
3.  Central entity is some kind of index/group database
Examples: §  Napster

Homogeneous P2P (unstructured)
1.  All features of Peer-to-Peer included
2.  Any terminal entity can be removed without loss of functionality
3.  → no central entities
Examples: §  Gnutella 0.4 §  Freenet

Heterogeneous P2P (unstructured)
1.  All features of Peer-to-Peer included
2.  Any terminal entity can be removed without loss of functionality
3.  → dynamic central entities
Examples: §  Gnutella 0.6 §  Fasttrack §  eDonkey

DHT-Based (structured)
1.  All features of Peer-to-Peer included
2.  Any terminal entity can be removed without loss of functionality
3.  → no central entities
4.  Connections in the overlay are “fixed”
Examples: §  Chord §  CAN §  Kademlia

Heterogeneous P2P (structured)
1.  All features of Peer-to-Peer included
2.  Peers are organized in a hierarchical manner
3.  Any terminal entity can be removed without loss of functionality
Examples: •  AH-Chord •  Globase.KOM

from R.Schollmeier and J.Eberspächer, TU München

Unstructured Heterogeneous P2P Overlays


Principles – Hierarchical / Heterogeneous

Approach: to combine best of both worlds
§  Robustness by distributed indexing
§  Fast searches by server queries

Components
§  Supernodes
   •  Mini servers / super peers
   •  Used as servers for queries
      –  To build a sub-network between supernodes
      –  Queries distributed at sub-network between supernodes
§  “Normal” peers
   •  Have only overlay connections to supernodes

++ Advantages
§  More robust than centralized solutions
§  Faster searches than in pure P2P systems

-- Disadvantages
§  Need of algorithms to choose reliable supernodes

Picture from R.Schollmeier and J.Eberspächer, TU München
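
The supernode principle above can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions: the class `SuperNode`, its methods, and the one-hop forwarding rule between supernodes are hypothetical and not taken from any real protocol.

```python
from collections import defaultdict


class SuperNode:
    """Hypothetical supernode: indexes the files of its attached normal peers."""

    def __init__(self, name):
        self.name = name
        self.index = defaultdict(set)  # filename -> names of peers offering it
        self.neighbors = []            # other supernodes in the sub-network

    def register(self, peer, filenames):
        # A normal peer only talks to its supernode and reports its files there.
        for filename in filenames:
            self.index[filename].add(peer)

    def search(self, filename, forward=True):
        hits = set(self.index.get(filename, ()))
        if forward:
            # Assumed rule: distribute the query one hop within the supernode sub-network.
            for supernode in self.neighbors:
                hits |= supernode.search(filename, forward=False)
        return hits


s1, s2 = SuperNode("S1"), SuperNode("S2")
s1.neighbors.append(s2)
s2.neighbors.append(s1)
s1.register("peerA", ["a.mp3"])
s2.register("peerB", ["a.mp3", "b.pdf"])
result = s1.search("a.mp3")
```

A normal peer never floods the overlay itself; it asks one supernode, which is why searches can be faster than in a pure P2P system.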


History of P2P Filesharing Networks

[Figure: history of P2P file-sharing networks. Source: Fraunhofer-Gesellschaft 2012, “Peer-to-Peer Filesharing”, slide 8 / 25]


Decentralized File Sharing with Distributed Servers

For example: eDonkey, see e.g.
•  http://www.overnet.org/
•  http://www.emule-project.net/
•  http://savannah.gnu.org/projects/mldonkey/

eDonkey file-sharing protocol
§  Most successful/used file-sharing protocol in e.g. Germany & France in 2003 [see sandvine.org]
   –  52% of generated P2P file-sharing traffic
   –  KaZaA accounted for only 44% in Germany
§  Stopped by law
   •  In February 2006 the largest server, "Razorback 2.0", was disconnected by Belgian police
      –  http://www.heise.de/newsticker/eDonkey-Betreiber-wirft-endgueltig-das-Handtuch--/meldung/78093


The eDonkey Network - Principle

Distributed server(s)
§  Set up and RUN BY POWER-USERS
§  → nearly impossible to shut down all servers
§  Exchange their server lists with other servers
   •  using UDP as transport protocol
§  Manage file indices

Client application
§  Connects to one random server and stays connected
   •  using a TCP connection
§  Searches are directed to the server
§  Clients can also extend their search
   •  by sending UDP search messages to additional servers


eDonkey functionality

eDonkey hash can be used for several queries
§  eDonkey server
   •  Search for peers
      –  Servers block requests if too many requests are sent
§  Kad network → additional structured P2P overlay
   •  Search for peers (including peers behind a firewall)
      –  Very efficient (10 requests per second)
      –  Finds more peers than found using servers
   •  Ratings and comments for all Kad peers
      –  Not used very widely
§  Queries to peers: directly from the peer (requests for a specific file)
   •  Query for the filename
      –  About 65% of all peers answer with the filename
   •  Ratings and comments of the peer
   •  Search for further peers


The eDonkey Network

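[Figure: the eDonkey network. Legend: supernodes (servers) and nodes (clients); TCP and UDP connections; interactions: server list exchange, search, extended search, download]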


The eDonkey Network

Procedure
§  New servers send
   •  their port + IP to other servers (UDP)
§  Servers send
   •  server lists (other servers they know) to the clients
§  Server lists can also be downloaded on various websites

[Figure: eDonkey network diagram with the same legend as before: supernode, node, TCP, UDP, server list exchange, search, extended search, download]


The eDonkey Network

Files are identified by
§  Unique MD4 file hashes
   •  MD4 = Message-Digest Algorithm 4, RFC 1186
   •  16 bytes long
§  Are not identified by filenames
§  This helps in
   •  Resuming a download from a different source
   •  Downloading the same file from multiple sources at the same time
   •  Verification that the file has been correctly downloaded
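
Why content hashes help can be shown with a small Python sketch. It rests on one loud assumption: MD4 is often unavailable in current Python builds, so `hashlib.md5` (also a 16-byte digest) is used here purely as a stand-in. The function name `file_id` and the sample data are hypothetical.

```python
import hashlib


def file_id(content: bytes) -> bytes:
    # Stand-in only: eDonkey uses MD4; MD5 is used here because it is
    # reliably available in hashlib and also yields a 16-byte digest.
    return hashlib.md5(content).digest()


shared = {
    "holiday.avi": b"same bytes",
    "renamed_copy.avi": b"same bytes",      # same content, different filename
    "other.avi": b"different bytes",
}
ids = {name: file_id(data) for name, data in shared.items()}

# Sources that offer the same content can be grouped regardless of filename.
same_file = ids["holiday.avi"] == ids["renamed_copy.avi"]
```

Because the identifier depends only on the content, two sources with differently named copies are interchangeable for resuming or parallel downloading, and recomputing the hash after the download verifies the result.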

eDonkey (source: Fraunhofer-Gesellschaft 2012, slide 10 / 25)

● Filesharing network with most files

● Centralized P2P network with many eDonkey servers

● Additional DHT: Kad

● eDonkey hash is created directly from file content


The eDonkey Network

→ the SEARCH consists of two steps

1. Full text search to
   •  Connected server (TCP) or
   •  Extended search with UDP to other known servers
   §  Search results are the hashes of matching files
2. Query Sources
   •  Query servers for clients offering a file with a certain hash

Later
•  Download from these sources

Status: 1,229,568 users, 37,399,014 files (30.08.2012)
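
The two search steps can be sketched as a small in-memory simulation. Everything here is a hypothetical stand-in: the `Server` class, the substring match used as "full text search", and the rule that the extended search is only tried when the connected server returns nothing are assumptions for illustration, not the eDonkey wire protocol.

```python
class Server:
    """Hypothetical in-memory stand-in for an eDonkey server index."""

    def __init__(self):
        self.names = {}    # file hash -> filename
        self.sources = {}  # file hash -> set of client addresses

    def publish(self, client, file_hash, filename):
        self.names[file_hash] = filename
        self.sources.setdefault(file_hash, set()).add(client)

    def text_search(self, keyword):
        # Step 1: full text search returns the hashes of matching files.
        return [h for h, n in self.names.items() if keyword.lower() in n.lower()]

    def query_sources(self, file_hash):
        # Step 2: which clients offer the file with this hash?
        return self.sources.get(file_hash, set())


def search(keyword, connected, others=()):
    hashes = connected.text_search(keyword)
    if not hashes:
        # Assumed policy: extend the search to other known servers on a miss.
        for server in others:
            hashes += server.text_search(keyword)
    servers = (connected, *others)
    return {h: set().union(*(s.query_sources(h) for s in servers)) for h in hashes}


a, b = Server(), Server()
a.publish("10.0.0.1:4662", "h1", "Lecture P2P.pdf")
b.publish("10.0.0.2:4662", "h1", "lecture-p2p.pdf")
b.publish("10.0.0.3:4662", "h2", "Holiday.avi")
found = search("lecture", a, [b])
found_extended = search("holiday", a, [b])
```

The download itself would then contact the returned client addresses directly.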


The eDonkey Network

→ the alternate SEARCH consists of two steps

0. Participate in the KAD network
1. Know MD4 hash of file
2. Query Sources in KAD
   §  Send lookup to node responsible for file hash
   §  Query responsible node for clients offering the file

Later
•  Download from these sources

Status: 600k-2M users, 200M-600M files (30.08.2012)


Testing the Content in eDonkey Networks

Forensic test set: consists of about 1500 files
§  Images, music, videos, documents and miscellaneous
§  Images: Fraunhofer, Windows 7, KDE, 4chan
§  Music: mainly three big music collections
§  Videos: YouTube, Open Source Films, P2P, ...
§  Documents: diverse PDFs, Fraunhofer, BitTorrent
§  Miscellaneous: Zips, executables, Malware, ...

Hits (source: Fraunhofer-Gesellschaft 2012, slide 21 / 25)

● Hitrate: 385 / 1479 (26 %)


KaZaA: Decentralized File Sharing with Super Nodes

see §  www.kazaa.com, gift.sourceforge.net, http://www.my-k-lite.com/

System §  Developer: Fasttrack §  Clients: KaZaA

Properties: §  Most successful P2P network in USA in 2002/3

Architecture: neither completely central nor decentralized §  Supernodes to reduce communication overhead

P2P system   #users        #files        Terabytes   #downloads (from download.com)
Fasttrack    2.6 million   472 million   3550        4 million
eDonkey      230,000       13 million    650-2600    600,000
Gnutella     120,000       28 million    105         ca. 525,000

Numbers are from October 2002


Decentralized File Sharing with Super Nodes

Examples: KaZaA, Gnutella 0.6 (Morpheus, mldonkey)

Peers
§  Connected only to some super nodes
§  Send IP address and file names only to super peers

Super nodes - super peers:
§  Peers with high-performance network connections
§  Take the role of the central server and proxy for simple peers
§  Answer search messages for all peers (reduction of comm. load)
§  One or more supernodes can be removed without problems

Additionally, the communication between nodes is encrypted

[Figure: superpeers and peers. Interactions shown: search, service delivery, download]


Example for KaZaA


Decentralized File Sharing with Complete Files

[Figure with two panels, “At the beginning” and “Later”. From www.wtata.com]

Queen Bee has
§  100 MB file
§  50 KB/s upload rate in total

Drone 1 receives
§  25% of the file
§  at 12.5 KB/s rate

Drone 1 has
§  50 KB/s upload rate
§  not utilized until he has whole file


Issues with KaZaA / Gnutella 0.6

Keyword-based search
§  You do not know what you get
§  Pollution is a problem
   •  Music companies flooded the network with false files
   •  Chance to get a “good” file ~ 10%
   •  Problem for “small” files

Full file download before uploading
§  Users go offline after the download has finished
§  Only few uploaders online
§  Problem for “large” files


Google Trends for KaZaA, Limewire, Torrent, Emule

http://www.google.com/trends?q=kazaa,+limewire,+torrent,+emule


Unstructured Hybrid Resource Sharing: Skype

Offered Services
§  IP Telephony features
§  File exchange
§  Instant Messaging

Features
§  KaZaA technology
§  Encrypted high media quality
§  Support for teleconferences
§  Multi-platform

Further Information
§  Very popular, low-cost IP telephony
§  SkypeOut extension to call regular phone numbers (not free)
§  Great business potential if combined with free WIFIs

Oct. 2011 bought by Microsoft
§  Super nodes are now servers

From www.skype.com


Skype

Network Architecture
§  formerly KaZaA based

[Figure: Skype network architecture. Labels: Skype login server, message exchange at login, super node, regular node]


Exercise

[Figure: exercise graphs with numbered nodes; the node layout was lost in extraction]


Problem 2.2 - Super Hyper Hierarchical Networks

Let us imagine a hierarchical overlay with three hierarchy levels: there are normal peers, super peers, and hyper peers.

a) Number of hyper peers needed
§  Assume that a super peer cares for 100 normal peers, and a hyper peer is responsible for 100 super peers. How many hyper peers would we need in a network of 999 000 peers in total?

b) Querying in the hyper-super-overlay
§  Suggest a way how such a network could handle search queries. Which information should a super peer maintain? What should a hyper peer know? How would a search query be processed?
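
One possible way to approach part a) is sketched below. It assumes a particular reading of the problem: the 999 000 peers include normal, super, and hyper peers, and every group is filled completely. The function name and parameters are hypothetical, and other readings (for example, counting only normal peers) lead to a different number.

```python
import math


def hyper_peers_needed(total_peers, normals_per_super=100, supers_per_hyper=100):
    # Under the assumed reading, one hyper peer accounts for itself,
    # its super peers, and all normal peers below those super peers.
    covered_per_hyper = 1 + supers_per_hyper + supers_per_hyper * normals_per_super
    return math.ceil(total_peers / covered_per_hyper)


hyper = hyper_peers_needed(999_000)
```

With the default parameters each hyper peer accounts for 1 + 100 + 10 000 = 10 101 peers, so the result is the smallest integer h with 10 101 · h ≥ 999 000.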


Peer-to-Peer Systems

Structured Homogeneous P2P Overlay Networks – Distributed Indexing and DHTs

This slide set is based on the lecture "Communication Networks 2" of Prof. Dr.-Ing. Ralf Steinmetz at TU Darmstadt


Unstructured P2P: Centralized P2P · Pure P2P · Hybrid P2P
Structured P2P: DHT-Based · Hybrid P2P

Centralized P2P (unstructured)
1.  All features of Peer-to-Peer included
2.  Central entity is necessary to provide the service
3.  Central entity is some kind of index/group database
Examples: §  Napster

Pure P2P (unstructured)
1.  All features of Peer-to-Peer included
2.  Any terminal entity can be removed without loss of functionality
3.  → no central entities
Examples: §  Gnutella 0.4 §  Freenet

Hybrid P2P (unstructured)
1.  All features of Peer-to-Peer included
2.  Any terminal entity can be removed without loss of functionality
3.  → dynamic central entities
Examples: §  Gnutella 0.6 §  Fasttrack §  eDonkey

DHT-Based (structured)
1.  All features of Peer-to-Peer included
2.  Any terminal entity can be removed without loss of functionality
3.  → no central entities
4.  Connections in the overlay are “fixed”
Examples: §  Chord §  CAN §  Kademlia

Hybrid P2P (structured)
1.  All features of Peer-to-Peer included
2.  Peers are organized in a hierarchical manner
3.  Any terminal entity can be removed without loss of functionality
Examples: •  AH-Chord •  Globase.KOM

Structured P2P Overlays: Principles


Structured Overlay Networks: Interconnection Networks

Structured Overlay Networks
§  Give peers and objects (unique) identifiers
   •  PeerIDs and ObjectIDs shall be from the SAME key set
   •  Each peer is responsible for a specific range of ObjectIDs
§  Indexing (knowledge on location of resources) to be distributed
§  No search needed anymore (local indexing)
§  No server knowing all (global indexing) available

New challenge: to find peer(s) with specific ID in overlay
§  Lookup:
   •  “Route” queries across the overlay network to peer with specific ID
§  Once peer is found
   •  Initiate direct communication
   •  Upload / download resources


Functions in a Structured P2P Overlay (all)

IsMyKey(K) → true if node is responsible for Key K
Route(K, M, hint) → send message M to node responsible for K
§  Hint: Optional first hop
GetNodeHandle(K, hint) → get contact details of responsible node
Send(M, q) → send message M to node q


Schematic View on Distributed Hash Table


Additional Functions in a Distributed Hash Table

Put(Data D, Key K) → copies Data to node responsible for K
GetData(Key K) → gets Data stored under the Key K
Optional further functions:
§  Replication
§  Secure Communication
§  Access Control

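[Figure: DHT ring with node IDs 611, 709, 1008, 1622, 2011, 2207, 2906, 3485; the key H("my data") = 3107 has to be located ("?"). Node addresses shown: 7.31.10.25, peer-to-peer.info, 12.5.7.31, 95.7.6.10, 86.8.10.18, planet-lab.org, berkeley.edu, 89.11.20.15]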
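
The Put / GetData functions, together with IsMyKey, can be sketched as a single-process toy. The node IDs and the key 3107 mirror the slide's figure; the responsibility rule (the first node ID at or after the key, wrapping around) is an assumption in the style of Chord and is not stated on the slide. The class `ToyDHT` is hypothetical, and real routing, replication, and security are left out.

```python
import bisect


class ToyDHT:
    """Hypothetical single-process model of a DHT; no network, no routing."""

    def __init__(self, node_ids):
        self.ids = sorted(node_ids)
        self.store = {n: {} for n in self.ids}  # per-node local storage

    def responsible(self, key):
        # Assumed rule: first node ID >= key, wrapping around the ring.
        i = bisect.bisect_left(self.ids, key)
        return self.ids[i % len(self.ids)]

    def is_my_key(self, node, key):
        return self.responsible(key) == node

    def put(self, key, data):
        self.store[self.responsible(key)][key] = data

    def get_data(self, key):
        return self.store[self.responsible(key)].get(key)


dht = ToyDHT([611, 709, 1008, 1622, 2011, 2207, 2906, 3485])
dht.put(3107, "my data")
value = dht.get_data(3107)
```

In a real overlay `responsible` is not a local computation: finding that node is exactly what Route and GetNodeHandle are for.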


Look up in Structured P2P Systems

Principle
§  Location of the objects is found via routing
   •  1. Node A (provider) advertises object at responsible peer B
      »  Advertisement is routed to B.
   •  2. Node C looking for object sends query
      »  Query is routed to responsible node.
   •  3. Node B replies to C by sending contact information of A

[Figure: Nodes A, B, and C. 1. Publish link at responsible peer. 2. “Routing” to / lookup of desired object. 3. P2P communication; get link to object.]
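
The three steps (advertise, lookup, direct contact) can be sketched as follows. The `route` function is a deliberately trivial stand-in for overlay routing (object ID modulo the number of nodes); that rule, the class `Node`, and all names are assumptions for illustration only.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.links = {}    # object ID -> provider node (role of responsible peer B)
        self.objects = {}  # object ID -> content (role of provider A)


def route(nodes, object_id):
    # Stand-in for overlay routing: assumed rule "ID mod number of nodes".
    return nodes[object_id % len(nodes)]


def advertise(nodes, provider, object_id, content):
    provider.objects[object_id] = content
    route(nodes, object_id).links[object_id] = provider   # step 1: publish link at B


def lookup(nodes, object_id):
    responsible = route(nodes, object_id)                  # step 2: query routed to B
    provider = responsible.links.get(object_id)            # B replies with A's contact
    if provider is None:
        return None
    return provider.objects[object_id]                     # step 3: fetch directly from A


a, b, c = Node("A"), Node("B"), Node("C")
nodes = [a, b, c]
advertise(nodes, a, 4, "file bytes")
fetched = lookup(nodes, 4)
```

Note that the responsible peer only stores a link to the provider; the object itself stays with A.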


Strategies for Data Retrieval: Distributed Indexing

Goal is scalable complexity for
§  Communication effort: O(log(N)) hops
§  Node state: O(log(N)) routing entries

[Figure: DHT ring with node IDs 611, 709, 1008, 1622, 2011, 2207, 2906, 3485; the key H("my data") = 3107 has to be located ("?"). Node addresses shown: 7.31.10.25, peer-to-peer.info, 12.5.7.31, 95.7.6.10, 86.8.10.18, planet-lab.org, berkeley.edu, 89.11.20.15]

Routing in O(log(N)) steps to the node storing the data

Nodes store O(log(N)) routing information to other nodes
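
To get a feeling for what O(log(N)) means in practice, the following sketch evaluates the logarithm for a few network sizes. Base 2 and a constant factor of 1 are assumptions; the actual constants depend on the concrete overlay.

```python
import math


def log_bound(n_nodes):
    # Assumed: base-2 logarithm with constant factor 1, rounded up.
    return math.ceil(math.log2(n_nodes))


sizes = (1_000, 1_000_000, 1_000_000_000)
bounds = {n: log_bound(n) for n in sizes}
```

Growing the network by a factor of 1000 adds only about 10 hops and 10 routing entries under these assumptions.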

The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle


Recall Hash Function & Hash Table

Hash function H(x)
§  maps
   •  large input domain
§  onto
   •  smaller target domain/range (most often subset of integer)
§  such that
   •  we get few collisions
   •  i.e.
      –  it would be possible to uniquely identify most of these strings using this hash

Hash table
§  data structure that provides fast lookup
§  of a record indexed by a key
§  where
   •  the domain of the key is too large for simple indexing; as would occur if an array were used

Like arrays, hash tables can provide O(1) lookup
§  with respect to the number of records in the table.

And .. Question
§  IF H(x) ≠ H(y)
   •  THEN (implies) x ≠ y ? (yes)
§  IF H(x) = H(y)
   •  THEN (implies) x = y ? (no)


Recall Hash Tables & Hash Functions

Hash tables are a well-known data structure
§  A fixed-size array
§  Elements of array also called “hash buckets”

Properties
§  allow insertions
§  allow deletions
§  allow to find entry in constant (average) time

Hash functions
§  map keys onto elements of (in) the array

Properties of good hash functions:
§  Fast to compute
§  Good distribution of keys into hash table
§  Example: SHA-1 algorithm
   •  SHA = Secure Hash Algorithm


Hash Tables: An Example

Assume a network of N=10 nodes
Hash function:
§  hash(x) = x mod 10 (=N)

Example
§  Insert numbers 0, 1, 4, 9, 16, and 25

Properties
§  Easy to find if a given key is present in the table

Hash Table, an example

Keys   Values
0      0
1      1
2
3
4      4
5      25
6      16
7
8
9      9
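
The example can be reproduced directly. The sketch below uses a Python dict of lists as the bucket array; the helper names `h` and `contains` are hypothetical.

```python
N = 10


def h(x):
    return x % N  # hash(x) = x mod 10


table = {}
for x in (0, 1, 4, 9, 16, 25):
    table.setdefault(h(x), []).append(x)


def contains(x):
    # Only the single bucket h(x) has to be inspected.
    return x in table.get(h(x), [])


# Equal hashes do not imply equal keys: 5 and 25 share bucket 5.
collision = h(5) == h(25) and 5 != 25
```

This also illustrates the earlier question: H(x) = H(y) does not imply x = y, whereas different hashes always mean different keys.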


Hash Tables: An Example

Drawback of the example
§  Collisions are likely to happen
§  Time to search grows linearly with amount of peers
§  To insert and remove a peer scales also linearly
§  Hash function must be adapted to the amount of available peers and it is extremely time consuming

Distributed Hash Table DHT
§  Huge hash table (2^160 entries)
§  Assigns concatenated input RANGE to peers
   •  (instead of individual numbers)
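
Why adapting the hash function to the number of peers is so costly, and why assigning ranges helps, can be checked with a small experiment. The key space of 1000 keys, the ten equal ranges, and the new peer joining at position 950 are arbitrary assumptions chosen for illustration.

```python
import bisect


def mod_owner(key, n_peers):
    return key % n_peers  # scheme of the example: hash depends on the peer count


def range_owner(key, starts):
    # starts: sorted range start points; a peer owns [start, next start).
    return starts[bisect.bisect_right(starts, key) - 1]


keys = range(1000)

# Scheme 1: one peer joins, so the hash function changes from mod 10 to mod 11.
moved_mod = sum(mod_owner(k, 10) != mod_owner(k, 11) for k in keys)

# Scheme 2: one peer joins and takes over the upper half of the last range.
before = list(range(0, 1000, 100))
after = sorted(before + [950])
moved_range = sum(range_owner(k, before) != range_owner(k, after) for k in keys)
```

In this setting the mod-N scheme reassigns 900 of the 1000 keys, while the range scheme only reassigns the 50 keys of the split range.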

[Figure: the hash table example again, with buckets 0 to 9 holding the values 0, 1, 4, 25, 16, and 9]


Design Aspects for Distributed Hash Tables

1. Choice of an identifier space
2. Mapping of resources and peers to the identifier space
3. Management of the identifier space by the peers
4. Graph embedding (structure of the logical network)
5. Routing strategy
6. Maintenance strategy


Overlay Network: Design Decisions

Group of peers P
Group of resources R
Overlay maps resources R and peers P on identifier space I:
$F_P: P \to I$
$F_R: R \to I$

Example I:
§  Chord: [0, 2^160[
§  Pastry: [0, 2^128[
§  CAN: multidimensional


(1) Choice of Identifier Space

Importance of Identifier Space:
§  Identifier space needed for addressing resources and peers
   •  Often: ID(object) = hash(object content)
§  Identifier space should be large to support large systems
§  Identifier space independent from physical location of peer → mobility of peers
§  Clustering of resources due to closeness metric of identifier space
§  Message routing uses identifier space


(1) Choice of Identifier Space

→ Main addressing ID space

The identifier space must possess a closeness metric $d: I \times I \to \mathbb{R}$

Which MUST satisfy the following conditions:
$\forall x, y \in I: d(x, y) \ge 0$
$\forall x \in I: d(x, x) = 0$
$\forall x, y \in I: d(x, y) = 0 \Rightarrow x = y$

And SHOULD satisfy the following conditions:
$\forall x, y \in I: d(x, y) = d(y, x)$
$\forall x, y, z \in I: d(x, z) \le d(x, y) + d(y, z)$
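
The MUST and SHOULD conditions can be checked by brute force on a small identifier space. The two candidate metrics below are illustrations only: XOR distance as used by Kademlia, and the clockwise ring distance used by Chord-style overlays. The 4-bit space and the function names are assumptions.

```python
from itertools import product

SIZE = 16  # assumed toy identifier space I = {0, ..., 15}


def check_metric(d, ids):
    ids = list(ids)
    pairs = list(product(ids, repeat=2))
    must = (all(d(x, y) >= 0 for x, y in pairs)
            and all(d(x, x) == 0 for x in ids)
            and all(x == y for x, y in pairs if d(x, y) == 0))
    should = (all(d(x, y) == d(y, x) for x, y in pairs)
              and all(d(x, z) <= d(x, y) + d(y, z)
                      for x, y, z in product(ids, repeat=3)))
    return must, should


def xor_distance(x, y):
    return x ^ y


def clockwise_distance(x, y):
    return (y - x) % SIZE


xor_result = check_metric(xor_distance, range(SIZE))
clockwise_result = check_metric(clockwise_distance, range(SIZE))
```

The clockwise distance passes the MUST conditions but is not symmetric, which shows why symmetry and the triangle inequality are listed only as SHOULD.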


(2) Mapping to the Identifier Space

→ Assigning ID addresses to peers (→ Peer ID)

Possible design decisions:
§  Completeness: $F_P$ may be complete or partial
§  Mapping to the identifier space should be injective: $\forall p, q \in P: p \ne q \Rightarrow F_P(p) \ne F_P(q)$
§  Dynamicity: $F_P$ may be fixed or change dynamically over time


(3) Management of Identifier Space

Peers are responsible for resource identifiers
Identifier space I is managed by peers P:
Responsibility function: $M: I \to 2^P$

Which associates
§  the identifiers $i = F_R(r) \in I$ of a resource $r \in R$
§  with a set of peers managing the resource

Through M
§  each peer p is assigned responsibility
§  for a set of identifiers $M^{-1}(p)$

Locating a resource $r \in R$ corresponds to finding a peer p in $M(F_R(r))$


Responsibility Function

Basic properties of $M$:

Completeness: $\forall i \in I: \exists p \in P: p \in M(i)$

Cardinality:
§  One or more peers are responsible for a given identifier
§  $M$ is often induced by proximity: identifiers are associated with the peers that are numerically closest: $p \in M(i) \Rightarrow d(F_P(p), i) = \min_{q \in P} d(F_P(q), i)$

Dynamicity:
§  the responsibility function typically changes as peers join and leave

Uniformity / non-uniformity of replication
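A proximity-induced responsibility function can be sketched in a few lines. The example assumes a ring of 64 identifiers, a symmetric ring distance and four hypothetical peers with hand-picked IDs; ties in distance yield more than one responsible peer.

```python
from typing import Dict, Set

ID_SPACE = 64  # assumed size of the identifier ring

# Hypothetical peers and their identifiers F_P(p).
PEER_IDS: Dict[str, int] = {"A": 5, "B": 21, "C": 41, "D": 58}


def d(a: int, b: int) -> int:
    """Symmetric distance on the ring (assumed example metric)."""
    diff = abs(a - b)
    return min(diff, ID_SPACE - diff)


def M(i: int) -> Set[str]:
    """All peers whose identifier has minimal distance to i."""
    best = min(d(pid, i) for pid in PEER_IDS.values())
    return {p for p, pid in PEER_IDS.items() if d(pid, i) == best}


# Completeness: every identifier has at least one responsible peer.
print(all(M(i) for i in range(ID_SPACE)))
print(M(0))    # peer A (distance 5) is closer than peer D (distance 6)
print(M(13))   # A and B are both at distance 8, so both are responsible
```

Removing a peer from `PEER_IDS` changes the result of `M` for the identifiers it covered, which is the dynamicity property: after peer A leaves, identifier 0 falls to peer D.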


(4) Graph Embedding

→ Creating a graph / network with the peers

Overlay can be modeled as a graph $G = (P, E)$
A neighborhood function $N: P \rightarrow 2^P$ defines the neighbor set $N(p)$ of the peer $p$
Which means that $p \in P$ maintains all connected nodes as neighbors: $\exists q \in N(p): (p, q) \in E$

Here the overlays typically differ in the implementation:
§  Which nodes to connect to
§  How to set up the routing table
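The neighborhood function can be derived directly from the edge set of the overlay graph. This is a minimal sketch, reading the condition as N(p) containing exactly the peers q with (p, q) in E, which is an interpretation; the peers and directed edges are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical overlay graph G = (P, E) with directed edges.
P: Set[str] = {"A", "B", "C", "D"}
E: Set[Tuple[str, str]] = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")}


def neighborhood(peers: Iterable[str],
                 edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Build N: P -> 2^P so that q is in N(p) exactly when (p, q) is an edge."""
    table: Dict[str, Set[str]] = {p: set() for p in peers}
    for p, q in edges:
        table[p].add(q)
    return table


N = neighborhood(P, E)
print(N["C"])   # the two peers C points to: A and D
print(N["D"])   # D has no outgoing edges in this example
```

In a real overlay the table `N` is each peer's routing table; the design question is which entries to put into it.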


(4) Graph Embedding – Desired properties

Desired properties of the graph are:

Graph diameter:
§  a small graph diameter bounds the number of routing hops and thus keeps routing latency low

Connectivity:
§  the graph should be connected at any time

Local connectivity:
§  a peer should be connected to a subset of its immediate neighbors

Long-range connectivity:
§  overlay connectivity should be structurally similar to small-world graphs and should satisfy the condition: $P[q \in N(p)] \approx d(F_P(p), F_P(q))^{-d}$
§  with the exponent $d$ denoting the dimension of the identifier space
§  (Many “close” contacts, few “far away” contacts)
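The long-range condition can be illustrated by sampling contacts with a probability that decays with distance. This is a minimal sketch, assuming a one-dimensional ring of 128 identifiers in which every identifier is occupied by a peer; the ring size, the number of contacts and the function names are assumptions made for the example.

```python
import random
from typing import List, Set

RING = 128  # assumed ring size, every identifier hosts a peer
DIM = 1     # dimension of the identifier space (a ring is one-dimensional)


def ring_distance(a: int, b: int) -> int:
    """Symmetric distance on the ring."""
    diff = abs(a - b)
    return min(diff, RING - diff)


def contact_weights(p: int) -> List[float]:
    """Unnormalised weight d(p, q)^(-DIM) for every q; zero for q == p."""
    return [0.0 if q == p else ring_distance(p, q) ** -DIM
            for q in range(RING)]


def sample_long_range_contacts(p: int, k: int, rng: random.Random) -> Set[int]:
    """Draw k distinct contacts, nearer peers being proportionally likelier."""
    weights = contact_weights(p)
    contacts: Set[int] = set()
    while len(contacts) < k:
        contacts.add(rng.choices(range(RING), weights=weights)[0])
    return contacts


weights = contact_weights(0)
print(weights[1], weights[2], weights[64])  # weight shrinks with distance
print(sorted(sample_long_range_contacts(0, 4, random.Random(1))))
```

Because the weights fall off with distance, most sampled contacts are close to the peer and only a few are far away.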


(5) Routing strategy

→ Routing in the network to the queried identifier

Routing is modeled as asynchronous message passing route(p,i,m)
§  which forwards a message m with identifier i to peer p

A routing strategy is defined as a non-deterministic function: $R: P \times I \rightarrow 2^P$

Which selects
§  for a given element of $I$ as destination ID
§  at a given peer $p$
§  with neighborhood $N(p)$
§  the set of next peers $R(p, i) \subseteq N(p)$


Routing strategy: Greedy routing

Mostly used in structured overlays

For a routing step from peer $p$ to peer $q$ with $q \in R(p, i)$ the following condition holds: $d(F_P(q), i) \leq d(F_P(p), i)$

Which means that
§  the distance to the target
§  after one routing step is
§  less than or equal to the distance before
•  (Ideally: distance is halved in order to reach O(log(N)) routing steps)
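Greedy routing can be sketched on a small ring. The example assumes 64 peers with IDs 0..63 (so F_P is the identity), neighbors at offsets of plus or minus powers of two, and a routing function that returns a single next hop rather than a set; all of these are simplifying assumptions for illustration.

```python
from typing import List

BITS = 6
SIZE = 2 ** BITS  # 64 peers with IDs 0..63, F_P(p) = p (assumption)


def d(a: int, b: int) -> int:
    """Symmetric distance on the ring."""
    diff = abs(a - b)
    return min(diff, SIZE - diff)


def N(p: int) -> List[int]:
    """Neighbors of p at offsets +/- 2^k for k = 0..BITS-1."""
    out = set()
    for k in range(BITS):
        out.add((p + 2 ** k) % SIZE)
        out.add((p - 2 ** k) % SIZE)
    return sorted(out)


def R(p: int, i: int) -> int:
    """Greedy choice: the neighbor of p that is closest to identifier i."""
    return min(N(p), key=lambda q: d(q, i))


def route(p: int, i: int) -> List[int]:
    """Follow greedy hops from p until the peer with identifier i is reached."""
    path = [p]
    while d(path[-1], i) > 0:
        nxt = R(path[-1], i)
        if d(nxt, i) >= d(path[-1], i):
            break  # no neighbor is closer; greedy routing is stuck
        path.append(nxt)
    return path


print(route(0, 37))  # [0, 32, 36, 37]
```

With power-of-two offsets the remaining distance at least halves per hop, so every lookup in this toy ring finishes within BITS hops.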


(6) Maintenance Strategy

→ Keeping the routing information up-to-date

Maintenance strategies can be classified into:
§  Proactive correction (e.g. using heartbeat messages)
§  Reactive mechanisms
•  Correction on use
•  Correction on failure
•  Correction on change

Goal for maintenance strategy:
§  Sufficient level of consistency
§  Minimize effort
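The difference between proactive and reactive correction can be sketched with a routing table that records when each neighbor was last heard from. The class name, the 30-second timeout and the method names are hypothetical; time is passed in explicitly to keep the example deterministic.

```python
from typing import Dict, List

HEARTBEAT_TIMEOUT = 30.0  # seconds; assumed value for illustration


class RoutingTable:
    """Toy routing table keyed by peer name, storing last-seen timestamps."""

    def __init__(self) -> None:
        self.last_seen: Dict[str, float] = {}

    def on_heartbeat(self, peer: str, now: float) -> None:
        """Proactive correction: refresh an entry when a heartbeat arrives."""
        self.last_seen[peer] = now

    def expire(self, now: float) -> List[str]:
        """Proactive correction: drop peers that have been silent too long."""
        stale = [p for p, t in self.last_seen.items()
                 if now - t > HEARTBEAT_TIMEOUT]
        for p in stale:
            del self.last_seen[p]
        return stale

    def on_send_failure(self, peer: str) -> None:
        """Reactive correction on failure: drop a peer after a failed send."""
        self.last_seen.pop(peer, None)


table = RoutingTable()
table.on_heartbeat("A", now=0.0)
table.on_heartbeat("B", now=20.0)
print(table.expire(now=40.0))   # ['A'] has been silent for more than 30 s
table.on_send_failure("B")      # a message to B could not be delivered
print(table.last_seen)          # {}
```

Heartbeats cost messages even when nothing changes, while reactive correction costs nothing until a lookup hits a stale entry; this is the consistency versus effort trade-off named in the goals.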


Design Concepts for P2P Overlays

Key design concepts:
•  choice of an identifier space
•  mapping of resources and peers to the identifier space
•  management of the identifier space by the peers
•  graph embedding (structure of the logical network, selection of contacts)
•  routing strategy
•  maintenance strategy

Unstructured vs. structured
§  Structured: object IDs and peer IDs share the same ID space
•  Every object ID is assigned to one single peer
•  Lookup possible := routing to peer being responsible for desired object ID


Motivation Distributed Indexing

Communication overhead (i.e. no. of hops) vs. state(s) of node (i.e. number of routing entries stored in a node, e.g. a server)

[Figure: communication overhead (vertical axis) vs. state(s) of node (horizontal axis); tick labels O(1), O(log N), O(N) on the state axis and O(1), O(log N), O(Nk) on the overhead axis]

§  Flooding: highest communication overhead, O(1) node state. Bottlenecks: communication overhead, false negatives (many false reports)
§  Central server: O(1) communication overhead, O(N) node state. Bottlenecks: memory, CPU, network; availability?

Scalable solution between both extremes?

The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle


Motivation Distributed Indexing

Communication overhead vs. node state

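[Figure: communication overhead (vertical axis) vs. node state (horizontal axis); both axes have tick labels O(1), O(log N), O(N)]

§  Flooding: O(N) communication overhead, O(1) node state. Bottlenecks: communication overhead, false negatives
§  Central server: O(1) communication overhead, O(N) node state. Bottlenecks: memory, CPU, network; availability
§  Distributed Hash Table: O(log N) communication overhead, O(log N) node state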

Scalability: O(log N)

No false negatives
§  i.e. never answer NO if the item IS there

More resistant against changes
§  Failures, Attacks
§  Short-time users
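To get a feeling for the three growth rates, they can be evaluated for concrete network sizes. This is a rough sketch that sets all constant factors to 1 and uses base-2 logarithms, both of which are assumptions; it compares orders of growth only and does not measure any real system.

```python
import math
from typing import Dict, Tuple


def lookup_costs(n: int) -> Dict[str, Tuple[int, int]]:
    """(communication overhead, node state) per approach for n nodes."""
    log_n = math.ceil(math.log2(n))
    return {
        "flooding": (n, 1),          # O(N) messages, O(1) state
        "central server": (1, n),    # O(1) messages, O(N) state
        "DHT": (log_n, log_n),       # O(log N) messages and state
    }


for n in (1024, 1_000_000):
    print(n, lookup_costs(n))
```

For a million nodes the DHT row stays at about 20 on both axes, while each of the two extremes reaches a million on one of them.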

The content of this slide has been adapted from “Peer-to-Peer Systems and Applications”, ed. by Steinmetz, Wehrle


Fundamentals of Distributed Hash Tables

Challenges for designing DHTs:

1. Desired Characteristics
§  Flexibility
§  Reliability
§  Scalability

2. Load balancing
§  Equal content “load” for all nodes
§  vs. content load proportional to node capacity
§  vs. content load proportional to content consumption

3. Permanent adaptation to faults, join, leave of nodes
§  Assignment of responsibilities to new nodes
§  Re-assignment and re-distribution of responsibilities in case of node failure or departure
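Re-assignment of responsibilities on join and leave can be illustrated with a successor-based hash ring, as used in consistent hashing. This is a minimal sketch under assumptions: the class `Ring`, the 64-bit truncated SHA-1 identifiers and the node and key names are all hypothetical, and replication and failure detection are left out.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def h(name: str) -> int:
    """64-bit identifier from a truncated SHA-1 digest (assumed mapping)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy identifier ring: a key belongs to the first node clockwise from it."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.ids: List[int] = []          # sorted node identifiers
        self.owner: Dict[int, str] = {}   # identifier -> node name
        for node in nodes:
            self.join(node)

    def join(self, node: str) -> None:
        nid = h(node)
        bisect.insort(self.ids, nid)
        self.owner[nid] = node

    def leave(self, node: str) -> None:
        nid = h(node)
        self.ids.remove(nid)
        del self.owner[nid]

    def responsible(self, key: str) -> str:
        """Successor of the key's identifier, wrapping around the ring."""
        idx = bisect.bisect_left(self.ids, h(key)) % len(self.ids)
        return self.owner[self.ids[idx]]


ring = Ring(["n1", "n2", "n3", "n4"])
keys = [f"file-{k}" for k in range(200)]
before = {k: ring.responsible(k) for k in keys}
ring.leave("n3")
after = {k: ring.responsible(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "keys changed owner after n3 left")
```

Only the keys that belonged to the departing node change owner; all other assignments stay untouched, which keeps the re-distribution effort local.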