An Introduction to Peer-to-Peer networks Diganta Goswami IIT Guwahati

  • Slide 1

An Introduction to Peer-to-Peer Networks
Diganta Goswami, IIT Guwahati

  • Slide 2: Outline

Overview of P2P overlay networks
Applications of overlay networks
Classification of overlay networks
Structured overlay networks
Unstructured overlay networks
Overlay multicast networks

  • Slide 3: Overview of P2P overlay networks

What are P2P systems? P2P refers to applications that take advantage of resources (storage, cycles, content, human presence) available at the end systems of the Internet.
What are overlay networks? Overlay networks are networks constructed on top of another network (e.g., IP).
What is a P2P overlay network? Any overlay network that Internet peers construct in the application layer on top of the IP network.

  • Slide 4: What are P2P systems?

Multiple sites (at the edge)
Distributed resources
Sites are autonomous (different owners)
Sites are both clients and servers
Sites have equal functionality

  • Slide 5: Internet P2P traffic statistics

Between 50 and 65 percent of all download traffic is P2P related.
Between 75 and 90 percent of all upload traffic is P2P related.
And it seems that more people are using P2P today.
So what do people download?
  61.4% video
  11.3% audio
  27.2% games/software/etc.
Source: statistics/ statistics/

  • Slide 6: P2P overlay network properties

Efficient use of resources
Self-organizing: all peers organize themselves into an application-layer network on top of IP.
Scalability: consumers of resources also donate resources, so aggregate resources grow naturally with utilization.

  • Slide 7: P2P overlay network properties (continued)

Reliability
  No single point of failure
  Redundant overlay links between the peers
  Redundant data sources
Ease of deployment and administration
  The nodes are self-organized.
  No need to deploy servers to satisfy demand.
  Built-in fault tolerance, replication, and load balancing.
  No changes needed in the underlying IP network.

  • Slide 8: P2P applications

P2P file sharing: Napster, Gnutella, Kazaa, eDonkey, BitTorrent; Chord, CAN, Pastry/Tapestry, Kademlia
P2P communications: Skype, social networking apps
P2P distributed computing: SETI@home

  • Slide 9: Popular file-sharing P2P systems

Napster, Gnutella, Kazaa, Freenet
Large-scale sharing of files.
User A makes files (music, video, etc.) on their computer available to others.
User B connects to the network, searches for files, and downloads them directly from user A.
Issues of copyright infringement.

  • Slide 10: P2P/Grid distributed processing

SETI@home: search for extraterrestrial (ET) intelligence.
A central site collects radio telescope data.
The data is divided into work chunks of 300 KB.
The user obtains a client, which runs in the background.
The peer sets up a TCP connection to the central computer and downloads a chunk.
The peer runs an FFT on the chunk, uploads the results, and gets a new chunk.
This is not P2P communication, but it exploits peer computing power.

  • Slide 11: Key issues

Management: how to maintain the P2P system efficiently under a high rate of churn; application reliability is difficult to guarantee.
Lookup: how to find the content/resource that a user wants.
Throughput: for content distribution/dissemination applications, how to copy content quickly, efficiently, and reliably.

  • Slide 12: Management issue

A P2P network must be self-organizing: join and leave operations must be self-managed.
The infrastructure is untrusted and the components are unreliable.
The number of faulty nodes grows linearly with system size.
Tolerance to failures and churn: content replication, multiple paths.
Leverage knowledge of the executing application.
Load balancing.
Dealing with free riders. A free rider is a rational or selfish user who consumes more than their fair share of a public resource, or shoulders less than a fair share of the costs of its production.
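Slide 12 names content replication as one way to tolerate failures and churn. The sketch below quantifies that idea under a simplifying assumption that is not part of the slides: each peer hosting a replica is online independently with the same probability p, so an item stored on k peers is reachable with probability 1 - (1 - p)^k. The helper names `availability` and `replicas_needed` are hypothetical, chosen for illustration only.

```python
"""Replica availability under churn: a simplified, hypothetical model."""


def availability(p: float, k: int) -> float:
    """Probability that at least one of k replicas is reachable.

    Assumes (hypothetically) that each hosting peer is online
    independently with the same probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be at least 1")
    # All k replicas are offline at once with probability (1 - p)^k.
    return 1.0 - (1.0 - p) ** k


def replicas_needed(p: float, target: float) -> int:
    """Smallest k such that availability(p, k) >= target."""
    # Reject p values for which (1 - p) rounds to 1.0 and the loop would never end.
    if not 0.0 < p <= 1.0 or 1.0 - p == 1.0:
        raise ValueError("p must be a usable probability in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")
    k = 1
    # Add replicas one at a time until the target availability is met.
    while availability(p, k) < target:
        k += 1
    return k


if __name__ == "__main__":
    # Peers online half the time: how many replicas for 99% availability?
    print(replicas_needed(0.5, 0.99))  # 7
    for k in (1, 2, 4, 8):
        print(k, availability(0.5, k))
```

Under the independence assumption, availability climbs quickly with k. Correlated departures (for example, many peers leaving at the same time of day) would make real-world figures worse than this model suggests.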
  • Slide 13: Lookup issue

How do you locate data/files/objects in a large P2P system built around a dynamic set of nodes, in a scalable manner, without any centralized server or hierarchy?
Efficient routing even if the structure of the network is unpredictable.
Unstructured P2P: Napster, Gnutella, Kazaa
Structured P2P: Chord, CAN, Pastry/Tapestry, Kademlia

  • Slide 14: Classification of overlay networks

Structured overlay networks: based on Distributed Hash Tables (DHTs). The overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer.
Unstructured overlay networks: the overlay network organizes peers in a random graph, in a flat or hierarchical manner.
Overlay multicast networks: the peers organize themselves into an overlay tree for multicasting.

  • Slide 15: Structured overlay networks

Overlay topology construction is based on node IDs generated by a hash function, as in Distributed Hash Tables (DHTs).
The overlay network assigns keys to data items and organizes its peers into a graph that maps each data key to a peer.
This structured graph enables efficient discovery of data items using the given keys.
DHTs such as Chord and Pastry guarantee object location in O(log n) hops, where n is the number of peers (the exact bound depends on the DHT; CAN, for example, differs).

  • Slide 16: Unstructured P2P overlay networks

An unstructured system is composed of peers that join the network following some loose rules, without any prior knowledge of the topology.
The network sends queries across the overlay using flooding or random walks with a limited scope.

  • Slide 17: Unstructured P2P file-sharing networks

Centralized directory-based P2P systems
Pure P2P systems
Hybrid P2P systems

  • Slide 18: Unstructured P2P file-sharing networks

Centralized directory-based P2P systems
  All peers are connected to a central entity.
  Peers establish connections with each other on demand to exchange user data (e.g., MP3-compressed data).
  The central entity is necessary to provide the service.
  The central entity is some kind of index/group database.
  The central entity acts as the lookup/routing table.
  Examples: Napster, BitTorrent

  • Slide 19: Napster

Used primarily for file sharing.
NOT a pure P2P network => a hybrid system.
Modes of operation:
  The client sends the query to the server; the server asks everyone and responds to the client.
  The client gets a list of clients from the server.
  All clients send the IDs of the data they hold to the server; when a client asks for data, the server responds with specific addresses.
  The peer downloads directly from the other peer(s).

  • Slide 20: Centralized network: the Napster model

Nodes register their contents with the server.
Centralized server for searches.
File access is done on a peer-to-peer basis.
Poor scalability.
Single point of failure.
[Figure: a client sends a Query to the server and receives a Reply; the File Transfer takes place directly between clients.]

  • Slide 21: Napster

Further services: chat program, instant messaging service, tracking program.
Centralized system:
  Single point of failure => limited fault tolerance.
  Limited scalability (server farms with load balancing).
  Queries are fast, and an upper bound on their duration can be given.

  • Slide 22: Gnutella

Pure peer-to-peer.
Very simple protocol.
No routing "intelligence".
Constrained broadcast:
  The lifetime of packets is limited by the TTL (typically set to 7).
  Packets have unique IDs to detect loops.

  • Slide 23: Query flooding: Gnutella

Fully distributed: no central server.
Public-domain protocol; many Gnutella clients implement it.
Overlay network (graph):
  There is an edge between peers X and Y if there's a TCP connection between them.
  All active peers and edges form the overlay network.
  An edge is not a physical link.
  A given peer is typically connected to fewer than 10 overlay neighbors.

  • Slide 24: Gnutella: protocol

[Figure: Query messages travel from peer to peer, QueryHit messages return, and the file transfer uses HTTP.]
The Query message is sent over existing TCP connections.
Peers forward the Query message.
The QueryHit is sent over the reverse path.
Scalability: limited-scope flooding.

  • Slide 25: Gnutella: scenario

Step 0: Join the network.
Step 1: Determine who is on the network.
  A "Ping" packet is used to announce your presence on the network.
  Other peers respond with a "Pong" packet and also forward your Ping to their other connected peers.
  A Pong packet also contains:
    an IP address
    a port number
    the amount of data that peer is sharing
  Pong packets come back via the same route.
Step 2: Search.
  A Gnutella "Query" asks other peers (usually 7) whether they have the file you want.
  A Query packet might ask, "Do you have any content that matches the string 'Hey Jude'?"
  Peers check whether they have matches and respond if they do; if not, they send the packet on to their connected peers (usually 7).
  This continues for the TTL (the number of hops a packet can travel before it dies, typically 10).
Step 3: Download.
  Peers respond with a QueryHit (which contains contact information).
  File transfers use a direct connection via the HTTP GET method.

  • Slide 26: Gnutella: peer joining

1. A joining peer X must find some other peer in the Gnutella network: it uses a list of candidate peers.
2. X sequentially attempts to make a TCP connection with peers on the list until a connection is set up with some peer Y.
3. X sends a Ping message to Y; Y forwards the Ping message.
4. All peers receiving the Ping message respond with a Pong message.
5. X receives many Pong messages. It can then set up additional TCP connections.

  • Slide 27: Gnutella: PING/PONG

[Figure: Ping/Pong exchange among peers 1 to 8; Pongs return along the reverse path, and peer 1's known hosts become 2; 3, 4, 5; 6, 7, 8.]
Query/Response works analogously.

  • Slide 28: Unstructured blind search: Gnutella

[Figure legend: forward query, processed query, source, found result, forward response.]
The search proceeds as a Breadth-First Search (BFS).

  • Slide 29: Unstructured blind search: Gnutella

A node/peer connects to a set of Gnutella neighbors.
It forwards queries to its neighbors.
The client that has the information responds.
Flood the network, using the TTL for termination.
+ Results are complete.
- Bandwidth wastage.

  • Slide 30: Gnutella: reachable users (analytical estimate)

T: TTL, N: neighbors for query

  • Slide 31: Gnutella: search issue

Flooding-based search is extremely wasteful of bandwidth.
A large (linear) part of the network is covered irrespective of hits found.
Enormous number of redundant messages.
All users do this in parallel: local load grows linearly with size.
Wh