Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems


Page 1: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Distributed Systems Concepts and Design Chapter 10: Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho

Page 2: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 2

10.1: Introduction Peer-to-Peer Systems

Where data and computational resources are contributed by many hosts

The objective is to balance network traffic and reduce the load on the primary host

Management requires knowledge of all hosts: their accessibility (distance in number of hops), availability and performance.

They exploit existing naming, routing, data replication and security techniques in new ways

Page 3: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 3

10.1: Introduction Goal of Peer-to-Peer Systems

Sharing data and resources on a very large scale

‘Applications that exploit resources available at the edges of the Internet – storage, cycles, content, human presence’ (Shirky 2000)

Uses data and computing resources available in personal computers and workstations

Page 4: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 4

10.1: Introduction Characteristics of Peer-to-Peer Systems

Each computer contributes resources
All the nodes have the same functional capabilities and responsibilities
No centrally administered system
Offers a limited degree of anonymity
Algorithm for placing and accessing the data: balance workload and ensure availability without adding undue overhead

Page 5: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 5

10.1: Introduction Evolution of Peer-to-Peer Systems

Napster – download music, return the address of a peer holding it
Freenet, Gnutella, Kazaa and BitTorrent – more sophisticated: greater scalability, anonymity and fault tolerance
Pastry, Tapestry, CAN, Chord, Kademlia – peer-to-peer middleware

Page 6: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 6

10.1: Introduction Evolution (Continued)

Immutable files (music, video)
GUIDs (Globally Unique Identifiers)
Middleware to provide better routing algorithms and react to outages
Evolve to mutable files
Application within one company’s intranet

Page 7: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 7

10.2: Napster and its Legacy Napster

Provided a means for users to share music files – primarily MP3s

Launched 1999 – several million users
Not fully peer-to-peer since it used central servers to maintain lists of connected systems and the files they provided, while actual transactions were conducted directly between machines

Proved feasibility of a service using hardware and data owned by ordinary Internet users

Page 8: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 8

10.2: Napster and its Legacy

[Figure: Napster’s method of locating and sharing files – 1. file location request to a Napster server’s index; 2. list of peers offering the file; 3. file request sent directly to a peer; 4. file delivered; 5. index update]
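
The figure reduces to a simple pattern: a central index plus direct peer-to-peer transfers. The Python sketch below illustrates those five steps; the class and method names (CentralIndex, Peer, register, locate, download) are purely illustrative and are not Napster’s actual protocol.

```python
class CentralIndex:
    """Napster-style server: maps file names to the peers that offer them."""
    def __init__(self):
        self.index = {}                              # file name -> set of Peer objects

    def register(self, peer, file_names):            # step 5: index update
        for name in file_names:
            self.index.setdefault(name, set()).add(peer)

    def locate(self, file_name):                     # steps 1-2: request / list of peers
        return list(self.index.get(file_name, set()))


class Peer:
    def __init__(self, name, files, index_server):
        self.name, self.files, self.index = name, dict(files), index_server
        self.index.register(self, self.files)        # advertise local files on joining

    def download(self, file_name):
        """Steps 3-4: ask the index for providers, then fetch directly from a peer."""
        for provider in self.index.locate(file_name):
            if file_name in provider.files:
                data = provider.files[file_name]     # direct peer-to-peer transfer
                self.files[file_name] = data
                self.index.register(self, [file_name])   # step 5: update the index
                return data
        return None
```

The index never carries file contents, only locations: this is why Napster is described as not fully peer-to-peer.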

Page 9: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 9

10.2: Napster and its Legacy BitTorrent

Designed and implemented in 2001
Next generation after Napster – true peer-to-peer (P2P)
Can handle large files, e.g. WAV, DVD, FLAC (one CD is roughly 500 MB)
After the initial pieces transfer from the seed, the pieces are individually transferred from client to client; the original seeder only needs to send out one copy of the file for all the clients to receive a copy
Tracker URL hosted at a BitTorrent site, e.g. Traders Den

Page 10: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 10

10.2: Napster and its Legacy BitTorrent (contd)

Many BitTorrent clients, e.g. Vuze
Keep track of seeders and leechers
Torrent – contains metadata about the files to be shared and about the tracker (see the sketch below)
Tracker – coordinates the file distribution and tells peers which other peers to download pieces of the file from
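
To make the role of the torrent metadata concrete, here is a rough Python sketch of the information a .torrent (“metainfo”) file carries. The field names follow the common metainfo convention (announce, info, piece length, pieces), but this is only an illustration, not a full bencoder or a complete description of the format.

```python
import hashlib

PIECE_LENGTH = 256 * 1024          # 256 KiB pieces; real clients choose various sizes

def make_metainfo(file_name, data, tracker_url):
    """Sketch of the data a torrent's metadata holds. Real torrents are
    bencoded; here we just build the equivalent dictionary."""
    pieces = [data[i:i + PIECE_LENGTH] for i in range(0, len(data), PIECE_LENGTH)]
    return {
        "announce": tracker_url,                   # where the tracker lives
        "info": {
            "name": file_name,
            "piece length": PIECE_LENGTH,
            "length": len(data),
            # concatenated SHA-1 digests, one per piece, so that a peer can
            # verify each piece independently as it arrives from another peer
            "pieces": b"".join(hashlib.sha1(p).digest() for p in pieces),
        },
    }
```

The per-piece hashes are what let pieces arrive from many different, untrusted peers and still be checked against the original seed’s content.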

Page 11: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 11

10.2: Napster and its Legacy

Page 12: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 12

10.2: Napster and its Legacy

Page 13: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 13

10.2: Napster and its Legacy

Page 14: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 14

10.2: Napster and its Legacy

Page 15: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 15

10.2: Napster and its Legacy

Page 16: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 16

10.2: Napster and its Legacy

Page 17: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 17

10.3: Peer-to-Peer Middleware Peer-to-Peer Middleware

To provide a mechanism enabling access to data resources anywhere in the network

Functional requirements:
Simplify the construction of services across many hosts in a wide network
Add and remove resources at will
Add and remove new hosts at will
The interface offered to application programmers should be simple and independent of the types of distributed resources

Page 18: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 18

10.3: Peer-to-Peer Middleware Peer-to-Peer Middleware (contd)

Non-functional requirements:
Global scalability
Load balancing
Optimization for local interactions between neighboring peers
Accommodating highly dynamic host availability
Security of data in an environment with heterogeneous trust
Anonymity, deniability and resistance to censorship

Page 19: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 19

10.3: Peer-to-Peer Middleware Peer-to-Peer Middleware (contd)

Global scalability, dynamic host availability, and load sharing and balancing across large numbers of computers pose major design challenges.

Design of the middleware layer:
Knowledge of the locations of objects must be distributed throughout the network
Replication is used to achieve this

Page 20: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 20

10.4: Routing Overlays Routing Overlays

Sub-systems (APIs) within the peer-to-peer middleware
Responsible for locating nodes and objects
Implement a routing mechanism in the application layer, separate from any other routing mechanism such as IP routing
Ensure that any node can access any object by routing each request through a sequence of nodes
Exploit knowledge at each node to locate the destination

Page 21: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 21

10.4: Routing Overlays GUIDs

‘Pure’ names or opaque identifiers – reveal nothing about the locations of the objects
Building blocks for routing overlays
Computed from all or part of the state of the object using a function that delivers a value that is very likely to be unique; uniqueness is then checked against all other GUIDs (a sketch of this hashing step follows below)

Not human readable
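
As a concrete illustration of the hashing step, the following Python sketch derives a 128-bit GUID from an object’s state using SHA-1, truncating its 160-bit digest to the 128-bit identifier size used by Pastry-style overlays. The function name is illustrative.

```python
import hashlib

def compute_guid(state: bytes) -> int:
    """Sketch: derive a 128-bit GUID from all or part of an object's state.
    SHA-1 yields 160 bits; the first 128 are kept here. Collisions are
    astronomically unlikely, but uniqueness would still be checked against
    existing GUIDs, as the slide notes."""
    digest = hashlib.sha1(state).digest()
    return int.from_bytes(digest[:16], "big")       # opaque, not human readable

guid = compute_guid(b"contents or name of the object")
print(f"{guid:032x}")                               # printed as 32 hex digits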

Page 22: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 22

10.4: Routing Overlays Tasks of a routing overlay

A client submits a request including the object’s GUID; the routing overlay routes the request to a node at which a replica of the object resides
A node introduces a new object by computing its GUID and announcing it to the routing overlay
Clients can remove an object
Nodes may join and leave the service

Page 23: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 23

10.4: Routing Overlays Types of Routing Overlays

DHT – Distributed Hash Table: the GUID is computed from all or part of the state of the object
DOLR – Distributed Object Location and Routing: a layer over the DHT that maps GUIDs to the addresses of the nodes holding the objects
DHT – data items are stored at nodes chosen according to the hash value of their GUIDs
DOLR – the address of the host holding an object is announced using the publish() operation (a sketch of the two interfaces follows below)
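
The two programming models can be summarized as abstract interfaces. The operation names below (put/get/remove for the DHT, publish/unpublish/sendToObj for the DOLR) follow the abstract interfaces described in the textbook; the Python rendering and type signatures are only a sketch.

```python
from abc import ABC, abstractmethod

class DistributedHashTable(ABC):
    """DHT model: the overlay itself decides where data is stored,
    at the nodes whose GUIDs are numerically closest to the data's GUID."""
    @abstractmethod
    def put(self, guid: int, data: bytes) -> None: ...
    @abstractmethod
    def get(self, guid: int) -> bytes: ...
    @abstractmethod
    def remove(self, guid: int) -> None: ...

class DistributedObjectLocationAndRouting(ABC):
    """DOLR model: objects stay wherever their hosts place them; hosts
    publish the GUIDs they hold and the overlay routes messages to them."""
    @abstractmethod
    def publish(self, guid: int) -> None: ...
    @abstractmethod
    def unpublish(self, guid: int) -> None: ...
    @abstractmethod
    def send_to_obj(self, msg: bytes, guid: int, n: int = 1) -> None: ...
```

A concrete overlay such as Pastry or Tapestry would implement one of these interfaces on top of its routing layer.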

Page 24: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 24

10.5: Overlay Case Studies: Pastry, Tapestry

Both Pastry and Tapestry adopt the prefix routing approach

Pastry has a straightforward but effective design. It is the message routing infrastructure deployed in applications such as PAST, an archival file system

Page 25: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 25

10.5: Overlay Case Studies: Pastry, Tapestry

Tapestry is the basis for OceanStore storage system. It has a more complex architecture than Pastry because it aims to support a wider range of locality approaches

Page 26: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 26

10.5: Overlay Case Studies: Pastry, Tapestry

Let’s talk about Pastry

Page 27: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 27

10.5: Overlay Case Studies: Pastry, Tapestry

Pastry – a routing overlay with the common characteristics:
All the nodes and objects are assigned 128-bit GUIDs
Node GUIDs are computed by applying a secure hash function such as SHA-1 to the public key with which each node is provided

Page 28: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 28

10.5: Overlay Case Studies: Pastry, Tapestry

For objects such as files, the GUID is computed by applying a secure hash function to the object’s name or to some part of the object’s stored state
The resulting GUID has the usual properties of secure hash values: randomly distributed in the range 0 to 2^128 - 1
In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(log N) steps

Page 29: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 29

10.5: Overlay Case Studies: Pastry, Tapestry

If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise it is delivered to the active node whose GUID is numerically closest to it
Active nodes take responsibility for processing requests addressed to all objects in their numerical neighborhood
Each routing step involves the use of an underlying transport protocol (normally UDP) to transfer the message to a Pastry node that is ‘closer’ to its destination

Page 30: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 30

10.5: Overlay Case Studies: Pastry, Tapestry

The real transport of a message across the Internet between two Pastry nodes may require a substantial number of IP hops

Pastry uses a locality metric based on network distance in the underlying network to select appropriate neighbors when setting up the routing tables used at each node

Page 31: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 31

10.5: Overlay Case Studies: Pastry, Tapestry

Participating hosts are fully self-organizing: they obtain the data needed to construct a routing table and other required state from existing members in O(log N) messages, where N is the number of hosts participating in the overlay

When a node fails, the remaining nodes can detect its absence and cooperatively reconfigure to reflect the required changes in the routing structure

Page 32: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 32

10.5: Overlay Case Studies: Pastry, Tapestry

Pastry Routing Algorithm
The algorithm involves the use of a routing table at each node to route messages efficiently; it is described in two stages

Stage 1: a simplified form that routes messages correctly but inefficiently, without a routing table

Stage 2: the full routing algorithm, with a routing table, which routes a request to any node in O(log N) messages

Page 33: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 33

10.5: Overlay Case Studies: Pastry, Tapestry

Stage 1:
Each active node stores a leaf set – a vector L of size 2l
The vector contains the GUIDs and IP addresses of the nodes whose GUIDs are numerically closest on either side of its own (l above and l below)

Leaf sets are maintained by Pastry as nodes join and leave

Page 34: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 34

10.5: Overlay Case Studies: Pastry, Tapestry

Even after node failures, leaf sets are repaired within a short time, provided failures occur below a defined maximum rate

The GUID space is treated as circular

Page 35: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 35

10.5: Overlay Case Studies: Pastry, Tapestry

Stage 1: Circular routing alone is correct but inefficient

[Figure: Node A (GUID 65A1FC) receives message M with destination address D (D46A1C); using only leaf sets, M is forwarded around the circular GUID space towards destination D]
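
A minimal sketch of this stage-1 rule, assuming each node object exposes guid, address and a leaf_set mapping GUIDs to addresses (names chosen for illustration only): forward to the leaf-set member circularly closest to the destination, and deliver when the current node itself is the closest.

```python
GUID_SPACE = 2 ** 128

def circular_distance(a: int, b: int) -> int:
    """Distance between two GUIDs on the circular identifier space."""
    d = abs(a - b)
    return min(d, GUID_SPACE - d)

def stage1_route(node, dest: int):
    """Leaf-set-only routing: correct, but each hop only advances past the
    l nodes in the leaf set, so delivery can take O(N) hops overall."""
    candidates = dict(node.leaf_set)                 # GUID -> address
    candidates[node.guid] = node.address             # the node itself is a candidate
    closest = min(candidates, key=lambda g: circular_distance(g, dest))
    if closest == node.guid:
        return ("deliver", node.address)             # numerically closest: deliver here
    return ("forward", candidates[closest])          # hop one leaf set closer to dest
```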

Page 36: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 36

10.5: Overlay Case Studies: Pastry, Tapestry

Stage 2: Full Pastry algorithm
Efficient routing is achieved with the aid of routing tables
Each Pastry node maintains a tree-structured routing table giving GUIDs and IP addresses for a set of nodes spread throughout the entire range of 2^128 possible GUID values

Page 37: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 37

10.5: Overlay Case Studies: Pastry, Tapestry

Structure of a routing table
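
As a sketch of that structure: a routing table for 128-bit GUIDs written as 32 hex digits can be pictured as 32 rows (one per shared-prefix length) of 16 entries (one per possible next digit). The Python representation below is illustrative only.

```python
ROWS, COLS = 32, 16        # 32 hex digit positions in a 128-bit GUID, 16 digit values

def empty_routing_table():
    """Row p, column d holds the GUID (None until filled) of some known node
    whose GUID shares its first p hex digits with ours and whose next digit
    is d. Only about log16(N) rows contain useful entries in practice."""
    return [[None] * COLS for _ in range(ROWS)]
```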

Page 38: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 38

10.5: Overlay Case Studies: Pastry, Tapestry

With the aid of the routing table, a message can be delivered in approximately log16(N) hops.

Page 39: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 39

10.5: Overlay Case Studies: Pastry, Tapestry

Pastry’s routing algorithm
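
The routing rule can be sketched as follows, under assumed data structures (node.guid, node.leaf_set as a set of GUIDs, node.table as the 32x16 array sketched earlier, with entries holding node GUIDs or None). It follows the cases described on the previous slides but is not the textbook’s exact pseudocode.

```python
HEX_DIGITS = 32                               # a 128-bit GUID written as 32 hex digits

def as_hex(guid: int) -> str:
    return f"{guid:032x}"

def shared_prefix_len(a: int, b: int) -> int:
    """Number of leading hex digits two GUIDs have in common."""
    da, db = as_hex(a), as_hex(b)
    p = 0
    while p < HEX_DIGITS and da[p] == db[p]:
        p += 1
    return p

def route(node, dest: int):
    """Returns the GUID of the next hop, or 'deliver' if this node is the target."""
    # 1. If dest lies within the leaf set range, one final hop reaches the
    #    numerically closest node.
    leaf_guids = set(node.leaf_set) | {node.guid}
    if min(leaf_guids) <= dest <= max(leaf_guids):
        closest = min(leaf_guids, key=lambda g: abs(g - dest))
        return "deliver" if closest == node.guid else ("forward", closest)

    # 2. Otherwise pick the routing table entry that extends the prefix shared
    #    with dest by one more hex digit.
    p = shared_prefix_len(node.guid, dest)
    entry = node.table[p][int(as_hex(dest)[p], 16)]
    if entry is not None:
        return ("forward", entry)

    # 3. Rare case: that entry is empty, so fall back to any known node that
    #    shares at least as long a prefix and is numerically closer to dest.
    known = leaf_guids | {g for row in node.table for g in row if g is not None}
    better = [g for g in known
              if shared_prefix_len(g, dest) >= p and abs(g - dest) < abs(node.guid - dest)]
    return ("forward", min(better, key=lambda g: abs(g - dest))) if better else "deliver"
```

Because each routing-table hop fixes at least one more hex digit of the destination, the number of hops is bounded by roughly log16(N), as stated above.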

Page 40: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 40

10.5: Overlay Case Studies: Pastry, Tapestry

Host integration
New nodes use a joining protocol to acquire their routing table and leaf set contents
They then notify other nodes of the changes they must make to their tables

Page 41: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 41

10.5: Overlay Case Studies: Pastry, Tapestry

Host failure or departure
Nodes in the Pastry infrastructure may fail or depart without warning
A node is considered failed when its immediate neighbours can no longer communicate with it
The leaf sets that contain the failed node’s GUID must then be repaired

Page 42: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 42

10.5: Overlay Case Studies: Pastry, Tapestry

Locality
The locality metric is used to compare candidates and the closest available node is chosen
This mechanism cannot produce globally optimal routings because the available information is not comprehensive

Page 43: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 43

10.5: Overlay Case Studies: Pastry, Tapestry

Fault tolerance
Use an ‘at-least-once’ delivery mechanism and repeat requests several times in the absence of a response, to allow Pastry a longer time window to detect and repair node failures

Page 44: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 44

10.5: Overlay Case Studies: Pastry, Tapestry

Both Pastry and Tapestry adopt the prefix routing approach
Pastry has a straightforward but effective design
Tapestry has a more complex architecture than Pastry because it aims to support a wider range of locality approaches

Page 45: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 45

10.5: Overlay Case Studies: Pastry, Tapestry

Let’s talk about Pastry

Page 46: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 46

10.5: Overlay Case Studies: Pastry, Tapestry

Pastry – a routing overlay network
All the nodes and objects are assigned 128-bit GUIDs
Node GUIDs are computed by applying a secure hash function such as SHA-1 to the public key with which each node is provided

Page 47: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 47

10.5: Overlay Case Studies: Pastry, Tapestry

For objects such as files, the GUID is computed by applying a secure hash function to the object’s name or to some part of the object’s stored state
The resulting GUID has the usual properties of secure hash values: randomly distributed in the range 0 to 2^128 - 1
In a network with N participating nodes, the Pastry routing algorithm will correctly route a message addressed to any GUID in O(log N) steps

Page 48: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 48

10.5: Overlay Case Studies: Pastry, Tapestry

If the GUID identifies a node that is currently active, the message is delivered to that node; otherwise it is delivered to the active node whose GUID is numerically closest to it
Active nodes take responsibility for processing requests addressed to all objects in their numerical neighborhood
Each routing step involves the use of an underlying transport protocol (normally UDP) to transfer the message to a Pastry node that is ‘closer’ to its destination

Page 49: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 49

10.5: Overlay Case Studies: Pastry, Tapestry

The real transport of a message across the Internet between two Pastry nodes may require a substantial number of IP hops

Pastry uses a locality metric based on network distance in the underlying network to select appropriate neighbors when setting up the routing tables used at each node

Page 50: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 50

10.5: Overlay Case Studies: Pastry, Tapestry

Participating hosts are fully self-organizing
Nodes obtain the data needed to construct a routing table and other required state from existing members
When a node fails, the remaining nodes reconfigure to reflect the required changes in the routing structure

Page 51: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 51

10.5: Overlay Case Studies: Pastry, Tapestry

Pastry Routing Algorithm
The algorithm involves the use of a routing table at each node to route messages efficiently; it is described in two stages

Stage 1: a simplified form that routes messages correctly but inefficiently, without a routing table

Stage 2: the full routing algorithm, with a routing table, which routes a request to any node in O(log N) messages

Page 52: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 52

10.5: Overlay Case Studies: Pastry, Tapestry

Stage 1:
Each active node stores a leaf set – a vector L of size 2l
The vector contains the GUIDs and IP addresses of the nodes whose GUIDs are numerically closest on either side of its own (l above and l below)

Leaf sets are maintained by Pastry as nodes join and leave

Page 53: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 53

10.5: Overlay Case Studies: Pastry, Tapestry

Even after node failures, leaf sets are repaired within a short time, provided failures occur below a defined maximum rate

The GUID space is treated as circular

Page 54: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 54

10.5: Overlay Case Studies: Pastry, Tapestry

Stage 1: Circular routing alone is correct but inefficient

[Figure: Node A (GUID 65A1FC) receives message M with destination address D (D46A1C); using only leaf sets, M is forwarded around the circular GUID space towards destination D]

Page 55: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 55

10.5: Overlay Case Studies: Pastry, Tapestry

Stage 2: Full Pastry algorithm
Efficient routing is achieved with the aid of routing tables
Each Pastry node maintains a tree-structured routing table giving GUIDs and IP addresses for a set of nodes spread throughout the entire range of 2^128 possible GUID values

Page 56: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 56

10.5: Overlay Case Studies: Pastry, Tapestry

Structure of a routing table

Page 57: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 57

10.5: Overlay Case Studies: Pastry, Tapestry

With the aid of the routing table, a message can be delivered in approximately log16(N) hops.

[Figure: routing a message from node A with the aid of the routing table]

Page 58: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 58

10.5: Overlay Case Studies: Pastry, Tapestry

Pastry’s routing algorithm

Page 59: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 59

10.5: Overlay Case Studies: Pastry, Tapestry

Host integration
New nodes use a joining protocol to acquire their routing table and leaf set contents
They then notify other nodes of the changes they must make to their tables

Page 60: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 60

10.5: Overlay Case Studies: Pastry, Tapestry

Host failure or departure
Nodes in the Pastry infrastructure may fail or depart without warning
A node is considered failed when its immediate neighbours can no longer communicate with it
The leaf sets that contain the failed node’s GUID must then be repaired

Page 61: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 61

10.5: Overlay Case Studies: Pastry, Tapestry Locality

The Pastry routing structure is highly redundant

The locality metric is used to compare candidates and the closest available node is chosen

This mechanism cannot produce globally optimal routings because the available information is not comprehensive

Page 62: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 62

10.5: Overlay Case Studies: Pastry, Tapestry

Fault tolerance
Send ‘heartbeat’ messages to neighboring nodes
Use an ‘at-least-once’ delivery mechanism and repeat requests several times in the absence of a response, to allow Pastry a longer time window to detect and repair node failures
Introduce randomness into the Pastry routing algorithm to reduce the impact of faulty or malicious nodes on routing

Page 63: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 63

10.5: Overlay Case Studies: Pastry, Tapestry

Dependability
Additional dependability measures and some performance optimizations in the host management algorithms were included by the authors in an updated version called MSPastry

Page 64: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 64

10.5: Overlay Case Studies: Pastry, Tapestry

Evaluation work
MSPastry was used to evaluate the impact of the host join/leave rate and the associated dependability mechanisms on performance and dependability

Page 65: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 65

10.5: Overlay Case Studies: Pastry, Tapestry

Tapestry
Implements a distributed hash table and routes messages to nodes based on GUIDs associated with resources, using prefix routing in a manner similar to Pastry
Its API conceals the distributed hash table from applications behind a Distributed Object Location and Routing (DOLR) interface

Page 66: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 66

10.6: Application Case Studies: Squirrel, OceanStore, Ivy

Squirrel web cache
OceanStore file store
Ivy file system

Page 67: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 67

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Squirrel

The SHA-1 secure hash function is applied to the URL of each cached object to produce a 128-bit Pastry GUID

Based on the end-to-end argument, the authors note that the HTTPS protocol should be used to achieve a much better guarantee for those interactions that require it

Page 68: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 68

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Squirrel implementation

The node whose GUID is numerically closest to the GUID of an object becomes that object’s home node, responsible for holding any cached copy of the object
Client nodes cache web objects locally and, acting as home nodes, also cache objects on behalf of other clients
A client requests a fresh copy of an object from its home node if there is no copy in its local cache (a sketch of this flow follows below)
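
A simplified sketch of that request flow, with freshness checks omitted; client.cache, client.route_to_home, home.cache and home.fetch_from_origin are assumed names used only for illustration.

```python
import hashlib

def url_guid(url: str) -> int:
    """Squirrel maps each URL to a 128-bit Pastry GUID with SHA-1."""
    return int.from_bytes(hashlib.sha1(url.encode()).digest()[:16], "big")

def squirrel_get(client, url: str):
    """Simplified flow: local cache, then the object's home node, then the
    origin web server."""
    if url in client.cache:                      # 1. hit in the local cache
        return client.cache[url]

    home = client.route_to_home(url_guid(url))   # 2. Pastry routes to the home node
    obj = home.cache.get(url)
    if obj is None:                              # 3. home node misses too:
        obj = home.fetch_from_origin(url)        #    fetch from the origin web server
        home.cache[url] = obj                    #    and keep a copy at the home node

    client.cache[url] = obj                      # cache locally for next time
    return obj
```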

Page 69: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 69

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Squirrel Evaluation of Squirrel

The reduction in total external bandwidth used
The latency perceived by users for access to web objects
The computational and storage load imposed on client nodes

Page 70: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 70

10.6: Application Case Studies: Squirrel, OceanStore, Ivy OceanStore file store

Aims to provide a very large scale, incrementally scalable, persistent storage facility for mutable data objects
Long-term persistence and reliability despite constantly changing network and computing resources
Intended uses: an NFS-like file service, electronic mail hosting, database sharing, and persistent storage of large numbers of data objects

Page 71: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 71

10.6: Application Case Studies: Squirrel, OceanStore, Ivy

A prototype, called Pond, was built to validate the OceanStore design and compare its performance with traditional approaches

Pond uses the Tapestry routing overlay mechanism to place blocks of data at nodes distributed throughout the Internet and to dispatch requests to them

Page 72: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 72

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Squirrel

Page 73: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 73

10.6: Application Case Studies: Squirrel, OceanStore, Ivy OceanStore/Pond storage organization

Data objects are analogous to files, with their data stored in a set of blocks
Each object represents an ordered sequence of immutable versions
Three types of identifier are used in the store (see the sketch below):
BGUID – secure hash of a data block
VGUID – BGUID of the root block of a version
AGUID – uniquely identifies all the versions of an object
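
A rough sketch of how the three identifiers relate, assuming SHA-1 as the secure hash; real OceanStore root blocks and the derivation of AGUIDs carry considerably more structure than shown here.

```python
import hashlib

def bguid(block: bytes) -> bytes:
    """BGUID: the secure hash of a data block names the block."""
    return hashlib.sha1(block).digest()

def make_version(data_blocks):
    """One immutable version: a root block listing the BGUIDs of its data
    blocks; the version's VGUID is simply the BGUID of that root block.
    (Real OceanStore root blocks hold more metadata than this.)"""
    root_block = b"".join(bguid(b) for b in data_blocks)
    return bguid(root_block), root_block          # (VGUID, root block)

# AGUID: a single identifier covering every version of an object; sketched
# here only as a key mapping to the object's growing list of VGUIDs.
versions_of = {}                                  # AGUID -> [VGUID, VGUID, ...]
```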

Page 74: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 74

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Performance of OceanStore/Pond

Pond is a prototype built to prove the feasibility of a scalable peer-to-peer file service
Evaluated against several purpose-designed benchmarks, including the Andrew benchmark
Uses a simple emulation of an NFS client and server

Page 75: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 75

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Conclusions of OceanStore/Pond

When operating over a wide-area network:
Substantially exceeds NFS for reading
Within a factor of three of NFS for updating files and directories
LAN results are substantially worse
Overall, the results suggest that an Internet-scale peer-to-peer file service would be an effective solution for the distribution of files that do not change very rapidly

Page 76: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 76

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Ivy file system

A read/write file system that supports multiple readers and writers
Implemented over an overlay routing layer and a distributed hash-addressed data store
Emulates a Sun NFS server
Stores the state of files as logs and scans the logs to reconstruct the files (a sketch follows below)
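
A sketch of the log-scanning idea, under the simplifying assumption that records carry a global sequence number rather than Ivy’s version vectors; the record fields (file, type, seq, offset, data) are illustrative only.

```python
def reconstruct(file_id, logs):
    """Each participant appends immutable log records describing its updates;
    a file's current contents are rebuilt by replaying the relevant write
    records in order."""
    writes = [r for log in logs for r in log
              if r["file"] == file_id and r["type"] == "write"]
    writes.sort(key=lambda r: r["seq"])           # oldest update first

    data = bytearray()
    for r in writes:
        end = r["offset"] + len(r["data"])
        if end > len(data):
            data.extend(bytes(end - len(data)))   # grow the file if needed
        data[r["offset"]:end] = r["data"]         # apply the write in place
    return bytes(data)
```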

Page 77: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 77

10.6: Application Case Studies: Squirrel, OceanStore, Ivy Ivy file system

Issues resolved in order to host files on partially trusted or unreliable machines:
The maintenance of consistent file metadata
Partial trust between participants and vulnerability
Continued operation during network partitions

Page 78: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 78

10.7: Summary

Napster – immutable data, unsophisticated routing
Current systems – mutable data, routing overlays, sophisticated algorithms
Deployed over the Internet or within a company intranet
Support for distributed computing (e.g. SETI@home)

Page 79: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 79

10.7: Summary Benefits of Peer-to-Peer Systems

Ability to exploit unused resources (storage, processing) in the host computers

Scalability to support large numbers of clients and hosts with load balancing of network links and host computer resources

Self-organizing properties of the middleware platforms reduce costs

Page 80: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 80

10.7: Summary Weaknesses of Peer-to-Peer Systems

Costly for the storage of mutable data compared to a trusted, centralized service

Cannot yet guarantee anonymity to hosts

Page 81: Distributed Systems Concepts and Design  Chapter 10:  Peer-to-Peer Systems

Bruce Hammer, Steve Wallis, Raymond Ho 81

10: Peer-to-Peer Systems Questions????

Comments??