Routing Indices For P-to-P Systems ICDCS 2002. Introduction Search in a P2P system –Mechanisms...
Routing Indices For P-to-P Systems
ICDCS 2002
Introduction
• Search in a P2P system
– Mechanisms without an index
– Mechanisms with specialized index nodes (centralized search)
– Mechanisms with indices at each node
• Structured P2P network
• Unstructured P2P network
• Parallel vs. sequential search
– Response time
– Network traffic
Routing indices (RI)
• Query
– Documents are on zero or more “topics”, and queries request documents on particular topics.
– Document topics are independent
• Local index
• RI
– Each node has a local routing index that contains the following information
• The number of documents along each path
• The number of documents on each topic of interest
– Allows a node to select the “best” neighbors to send a query to
• The RI may be “coarser” than the local indices
– Overcounts
– Undercounts
• Goodness measure
– Number of results in a path
• Using routing indices
– Storage space
• N: number of nodes in the P2P network
• b: branching factor
• c: number of categories
• s: counter size in bytes
Centralized index: s*(c+1)*N
Distributed system: s*(c+1)*b (each node)
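The two storage formulas above can be evaluated directly. The following is a minimal sketch; the function names and the sample parameter values are illustrative assumptions, not figures from the slides.

```python
def centralized_index_bytes(s: int, c: int, n: int) -> int:
    """Storage for one central index: s * (c + 1) * N bytes."""
    return s * (c + 1) * n


def per_node_ri_bytes(s: int, c: int, b: int) -> int:
    """Storage for the routing index held by a single node: s * (c + 1) * b bytes."""
    return s * (c + 1) * b


if __name__ == "__main__":
    # Hypothetical values: 4-byte counters, 1000 categories,
    # 100,000 nodes, branching factor 4.
    s, c, n, b = 4, 1000, 100_000, 4
    print("centralized:", centralized_index_bytes(s, c, n), "bytes")
    print("per node:   ", per_node_ri_bytes(s, c, b), "bytes")
```

The per-node cost depends on the branching factor b rather than on the network size N, which is why the distributed index stays small at each node as the network grows.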
• Creating routing indices
• Maintaining routing indices
– Trade-off between RI freshness and update cost
– Does not require the participation of a disconnecting node
• Discussion
– What if the search topics are dependent?
– Can the number of “hops” necessary to reach a document be estimated?
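As an illustration of how a node might use its RI to pick the “best” neighbors, here is a minimal sketch. It assumes that topics are independent (as stated on the Query slide) and estimates the number of results through a neighbor as the total document count multiplied by the fraction of documents on each queried topic. The class and function names, and the sample counts, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RIEntry:
    """One RI row: what is reachable through a single neighbor."""
    total_docs: int
    per_topic: Dict[str, int] = field(default_factory=dict)


def estimate_results(entry: RIEntry, query_topics: List[str]) -> float:
    """Estimated number of matching documents, assuming independent topics."""
    if entry.total_docs == 0:
        return 0.0
    estimate = float(entry.total_docs)
    for topic in query_topics:
        estimate *= entry.per_topic.get(topic, 0) / entry.total_docs
    return estimate


def rank_neighbors(ri: Dict[str, RIEntry], query_topics: List[str]) -> List[str]:
    """Neighbors ordered from highest to lowest estimated goodness."""
    return sorted(ri, key=lambda n: estimate_results(ri[n], query_topics),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical routing index of one node with three neighbors.
    ri = {
        "B": RIEntry(100, {"db": 20, "net": 10}),
        "C": RIEntry(1000, {"db": 0, "net": 300}),
        "D": RIEntry(200, {"db": 100, "net": 150}),
    }
    print(rank_neighbors(ri, ["db", "net"]))
```

If the topics are in fact dependent, which is the first discussion question above, this product can overcount or undercount the true number of results.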
Alternative Routing Indices
• Hop-count RI
– Aggregated RIs for each “hop” up to a maximum number of hops are stored
– Search cost
• Number of messages
– The goodness of a neighbor
• The ratio between the number of documents available through that neighbor and the number of messages required to get those documents
– Regular tree with fanout F
– It takes F^h messages to find all documents at hop h
– Storage cost?
• Exponentially aggregated RI
– Store the result of applying the regular-tree cost formula to a hop-count RI
– How to compute the goodness of a path for a query containing several topics?
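The hop-count goodness described above (documents found per message, with F^h messages needed at hop h in a regular tree) can be sketched as follows. Numbering hops from 1 and summing the per-hop ratios are assumptions made for this illustration; the original paper may index the hops differently.

```python
from typing import Sequence


def hop_count_goodness(docs_per_hop: Sequence[float], fanout: int) -> float:
    """Sum of (documents at hop h) / (fanout ** h), with hops numbered from 1.

    docs_per_hop[0] is the estimated number of matching documents one hop
    away through the neighbor, docs_per_hop[1] two hops away, and so on.
    """
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    return sum(docs / fanout ** hop
               for hop, docs in enumerate(docs_per_hop, start=1))


if __name__ == "__main__":
    # Hypothetical counts: X has fewer documents in total, but they are closer.
    x = hop_count_goodness([13, 10], fanout=3)
    y = hop_count_goodness([0, 31], fanout=3)
    print("X:", x, "Y:", y)
```

In this example neighbor X ranks above neighbor Y even though Y leads to more documents overall, because Y's documents cost more messages to reach.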
Cycles in the P2P network (HW)
Improving Search in Peer-to-Peer Networks
ICDCS 2002
Beverly Yang, Hector Garcia-Molina
Outline
• Introduction
• Techniques
• Experiment
Introduction
• We present three techniques for efficient search in P2P systems.
– The basic idea is to reduce the number of nodes that process a query
Current Techniques
• Gnutella
– BFS with depth limit D.
– Wastes bandwidth and processing resources.
• Freenet
– DFS with depth limit D.
– Poor response time.
Iterative Deepening
• Under policy P = {a, b, c}; waiting time W
• See example.
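A minimal, single-process sketch of iterative deepening under a policy P follows. It searches breadth-first to the first depth in the policy, stops if enough results were found, and otherwise continues from the frozen frontier to the next depth. The waiting time W is not modeled, since this simulation is synchronous, and the graph representation and all names are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def iterative_deepening(
    graph: Dict[str, List[str]],
    has_result: Callable[[str], bool],
    source: str,
    policy: Sequence[int],
    wanted: int,
) -> Tuple[List[str], int]:
    """Return (nodes holding results, depth reached)."""
    visited = {source}
    frontier = [source]          # nodes at the current depth
    depth = 0
    results: List[str] = []
    for limit in sorted(policy):
        # Extend the search from the frozen frontier up to the next limit.
        while depth < limit and frontier:
            next_frontier = []
            for node in frontier:
                for neighbor in graph.get(node, []):
                    if neighbor not in visited:
                        visited.add(neighbor)
                        next_frontier.append(neighbor)
                        if has_result(neighbor):
                            results.append(neighbor)
            frontier = next_frontier
            depth += 1
        if len(results) >= wanted:
            break                # query satisfied, do not go deeper
    return results, depth


if __name__ == "__main__":
    # Hypothetical overlay: results are stored at nodes E and F.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": ["F"]}
    holders = {"E", "F"}
    print(iterative_deepening(graph, holders.__contains__, "A", [1, 2, 3], 1))
```

When one result is enough, the search stops at depth 2 and node F is never contacted, which is the saving that iterative deepening aims for.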
Directed BFS
• A source sends query messages to just a subset of its neighbors
• A node maintains simple statistics on its neighbors
– Number of results received from each neighbor
– Latency of connection
Candidate nodes
• Returned the highest number of results
• Low hop-count
• High message count
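The neighbor-selection step of Directed BFS can be sketched as below. This version ranks neighbors by the number of results they returned for earlier queries and breaks ties by lower latency; that particular combination, the field names, and the sample statistics are assumptions for illustration, and any of the candidate heuristics listed above could be substituted.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NeighborStats:
    results_returned: int   # results received through this neighbor so far
    avg_latency_ms: float   # observed latency of the connection


def pick_neighbors(stats: Dict[str, NeighborStats], k: int) -> List[str]:
    """Choose the k neighbors to which the query will be sent."""
    ranked = sorted(
        stats,
        key=lambda n: (-stats[n].results_returned, stats[n].avg_latency_ms),
    )
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical statistics kept by the query source.
    stats = {
        "n1": NeighborStats(12, 80.0),
        "n2": NeighborStats(30, 200.0),
        "n3": NeighborStats(12, 40.0),
        "n4": NeighborStats(0, 10.0),
    }
    print(pick_neighbors(stats, k=2))
```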
Local Indices
• Each node n maintains an index over the data of all nodes within r hops radius.
• All nodes at depths not listed in the policy simply forward the query.
• Example: policy P= { 1, 5}
Experimental Setup
• For each response, we log:
– Number of hops taken
– IP address from which the Response message came
– Response time
– Individual results
Experimental result
Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems
Kunwadee Sripanidkulchai
Bruce Maggs
Hui Zhang
IEEE INFOCOM 2003
Motivation
• Although flooding is simple and robust, it is not scalable.
• The proposal is a content location solution in which peers are organized into an interest-based structure on top of Gnutella.
• The algorithm is called interest-based shortcuts.
Interest-based locality
Shortcuts Architecture and Design Goals
• To create additional links on top of a peer-to-peer system’s overlay
• As a separate performance enhancement layer on top of existing content location mechanisms
Content location paths
Shortcut Discovery
• The first lookup returns a set of peers that store the content
• These are potential candidates.
• One peer is selected at random from the set and added
• For scalability, each peer allocates a fixed-size amount of storage to implement shortcuts.
Shortcut selection
• We rank shortcuts based on their perceived utility
• A peer sequentially asks the shortcuts on its list, in rank order.
Ranking metrics
• Probability of providing content
• Latency of the path to the shortcut
• Load at the shortcut
• A combination of metrics can be used based on each peer’s preference
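Shortcut discovery, fixed-size storage, and ranked sequential lookup can be combined in one small structure, sketched below. It ranks shortcuts by observed success rate, used here as a stand-in for the "probability of providing content" metric, and evicts the lowest-ranked shortcut when the list is full. The eviction rule, class name, and method names are assumptions for illustration.

```python
from typing import Callable, Dict, List, Optional


class ShortcutList:
    """Fixed-size list of shortcut peers ranked by observed success rate."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.stats: Dict[str, List[int]] = {}   # peer -> [hits, attempts]

    def utility(self, peer: str) -> float:
        hits, attempts = self.stats[peer]
        return hits / attempts if attempts else 0.0

    def add(self, peer: str) -> None:
        """Add a newly discovered peer, evicting the lowest-ranked if full."""
        if peer in self.stats:
            return
        if len(self.stats) >= self.capacity:
            worst = min(self.stats, key=self.utility)
            del self.stats[worst]
        self.stats[peer] = [0, 0]

    def ranked(self) -> List[str]:
        return sorted(self.stats, key=self.utility, reverse=True)

    def lookup(self, item: str,
               peer_has: Callable[[str, str], bool]) -> Optional[str]:
        """Ask shortcuts one at a time in rank order; None means fall back."""
        for peer in self.ranked():
            self.stats[peer][1] += 1
            if peer_has(peer, item):
                self.stats[peer][0] += 1
                return peer
        return None


if __name__ == "__main__":
    # Hypothetical content held by three peers.
    content = {"p1": {"a"}, "p2": {"a", "b"}, "p3": {"c"}}
    shortcuts = ShortcutList(capacity=2)
    shortcuts.add("p1")
    shortcuts.add("p2")
    print(shortcuts.lookup("b", lambda peer, item: item in content[peer]))
```

A return value of None corresponds to the case where no shortcut has the content and the peer falls back to the underlying content location mechanism, such as Gnutella flooding.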
Performance indices
• Success rate
• Load characteristics
• Query scope
• Minimum reply path lengths
• Additional state
Potential and Limitations
• Adding 5 shortcuts at a time produces success rates that are close to the best possible.
• Slightly increasing the shortest path length from 1 to 2 hops yields a better success rate.
Conclusion
• A simple and practical mechanism was proposed.
Similarity Discovery in structured P2P Overlays
ICPP
Introduction
• Structured P2P network
– Only supports search with a single keyword
• Similarity between two documents
– Keyword sets
– Vector space
– Measure
• Problems
– Search problem
– New keyword?
$\theta_{ab} = \cos^{-1}\left(\dfrac{a \cdot b}{|a|\,|b|}\right)$
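The angle between two keyword vectors, used as the similarity measure in the vector space model, can be computed as in the sketch below. Representing a document as a plain list of keyword weights is an assumption made for illustration.

```python
import math
from typing import Sequence


def angle_between(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in radians between two keyword vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("angle is undefined for a zero vector")
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


if __name__ == "__main__":
    # Hypothetical 3-keyword vocabulary; a smaller angle means more similar.
    print(angle_between([1, 1, 0], [1, 0, 0]))
    print(angle_between([1, 0, 0], [0, 0, 1]))
```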
Meteorograph
• Absolute angle
Publishing and Searching
• Publish
– Hash
– Publish the item to a node np with the hash key closest to the hash value
• Search problem
– Nearest answers
– K-nearest answers
• Partial
• Comprehensive
• Search strategy
• Discussions
• What happened when keyword vector is represented by ?
Other issues
• Load balance
• Changes of vector space– Republished?– Comprehensive set of keywords– Other methods?
SWAM: A Family of Access Methods for Similarity-Search in Peer-to-Peer Data Networks
Farnoush Banaei-Kashani
Cyrus Shahabi
(CIKM04)
PDN access method
• Defines
• How to organize the PDN topology to an index-like structure
• How to use the index structure
Hilbert space
• Hilbert space (V, Lp)
• Key k = (a1, a2, …, ad)
– d: the dimension of the vector space
– The domain is a contiguous and finite interval of R
• The Lp norm, with p belonging to Z+
– The distance function to measure the dissimilarity
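The Lp norm used as the dissimilarity measure between two keys can be written out as below. This is a minimal sketch; the function name is an assumption.

```python
from typing import Sequence


def lp_distance(x: Sequence[float], y: Sequence[float], p: int) -> float:
    """Lp distance between two d-dimensional keys, for a positive integer p."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    if len(x) != len(y):
        raise ValueError("keys must have the same dimension d")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


if __name__ == "__main__":
    k1, k2 = (0.0, 0.0), (3.0, 4.0)
    print("L1:", lp_distance(k1, k2, 1))   # Manhattan distance
    print("L2:", lp_distance(k1, k2, 2))   # Euclidean distance
```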
Topology
• Topology of a PDN can be modelled as a directed graph G(N, E)
• A(n) is the set of neighbors for node n
• A node maintains
– A limited amount of information about its neighbors, including
• The keys of the tuples maintained at neighbors
• The physical addresses of neighbors
• The processing of the query is completed when all expected tuples in the relevant result set are visited
• Access methods
– Join, leave for virtual nodes
– Forward for using local information to process queries and make forwarding decisions
The small world example
• Grid component
• Random graph component
• The processing of queries (exact, range, kNN) in a topology with high locality
Flat partitioning
• SWAM also employs the space partitioning idea: flat partitioning
Query Processing
• Exact-Match query processing
• Range query processing
• kNN Query processing
Data Indexing in Peer-to-Peer DHT Networks
ICDCS 2004
• Locating data using incomplete information.
– How to search data in a DHT
• Data descriptors and queries
– Semi-structured XML data
– Query
• Most specific query for d
• Relationship between queries
• Given the most specific query, finding the location of the file is simple
• How about less specific queries?
• Solution
– Provide query-to-query service
• For a given query q, the index service returns a list of more specific queries, covered by q
– DHT storage system must be extended
• Insert(q, qi), q->qi, adds a mapping (q; qi) to the index of the node responsible for key q.
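The query-to-query service can be sketched with in-memory dictionaries standing in for the DHT. In a real deployment each key q would be hashed to the node responsible for it. The class name, the `store` and `resolve` helpers, and the query strings are hypothetical; only `insert(q, qi)` corresponds to the operation named on the slide.

```python
from collections import defaultdict
from typing import Dict, Set


class QueryIndex:
    """Toy stand-in for a DHT extended with query-to-query mappings."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)  # q -> more specific queries
        self.files: Dict[str, str] = {}                     # most specific query -> location

    def insert(self, q: str, qi: str) -> None:
        """Add the mapping (q; qi), where qi is more specific than q."""
        self.index[q].add(qi)

    def store(self, most_specific_query: str, location: str) -> None:
        self.files[most_specific_query] = location

    def resolve(self, q: str) -> Set[str]:
        """Follow mappings from q until most specific queries are reached."""
        found: Set[str] = set()
        seen: Set[str] = set()
        stack = [q]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            if current in self.files:
                found.add(self.files[current])
            stack.extend(self.index.get(current, ()))
        return found


if __name__ == "__main__":
    # Hypothetical descriptors for one file.
    msq = "author=Smith&title=TCP&year=1989"
    dht = QueryIndex()
    dht.store(msq, "node-17:/files/tcp.pdf")
    dht.insert("author=Smith", "author=Smith&title=TCP")
    dht.insert("author=Smith&title=TCP", msq)
    print(dht.resolve("author=Smith"))
```

A user who knows only the author can thus reach the file in a few index lookups, each returning more specific queries, without the DHT supporting partial-match search itself.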