Index and Distributed Index Methods
Zachary G. Ives, University of Pennsylvania
CIS 455 / 555 – Internet and Web Systems
February 19, 2008
Some portions derived from slides by Raghu Ramakrishnan
Publish-Subscribe Model Summarized
XFilter has an elegant model for matching XPaths
  A good deal more complex than HW2, in that it supports wildcards (*) and //
Currently not commonly used
  Partly because XML isn't that widespread
  This may change with the adoption of an XML format called RSS (Rich Site Summary or Really Simple Syndication)
  Many news sites, web logs, mailing lists, etc. use RSS to publish daily articles
  Seems like a good fit for publish-subscribe models!
Finding a Happy Medium
We've seen two approaches:
  Do all the work at the data stores: flood the network with requests
  Do all the work via a central crawler: record profiles and disseminate matches
An alternative, two-step process: build a content index over what's out there, then answer queries against the index
  An index is a key -> value map
  Typically limited in what kinds of queries can be supported
  Most common instance: an index of document keywords
Inverted Indices
A conceptually very simple data structure:
  <keyword, {list of occurrences}>
In its simplest form, each occurrence includes a document pointer (e.g., URI), perhaps a count and/or position
Requires two components, an indexer and a retrieval system
We'll consider the cost of building the index, plus searching the index using a single keyword
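As a sketch, the data structure above is just a map from keyword to a list of (document, count) occurrences, with an indexer (`add`) and a single-keyword retrieval method (`lookup`). The class and method names below are illustrative, not from any particular system:

```java
import java.util.*;

// Minimal in-memory inverted index: keyword -> {list of occurrences},
// where each occurrence holds a document pointer (URI) and a count.
public class InvertedIndex {
    static final class Occurrence {
        final String uri;   // document pointer
        int count;          // number of times the keyword appears
        Occurrence(String uri) { this.uri = uri; }
    }

    // keyword -> (uri -> occurrence record)
    private final Map<String, Map<String, Occurrence>> postings = new HashMap<>();

    // Indexer: record one keyword occurrence in a document.
    public void add(String keyword, String uri) {
        postings.computeIfAbsent(keyword, k -> new LinkedHashMap<>())
                .computeIfAbsent(uri, Occurrence::new)
                .count++;
    }

    // Retrieval: single-keyword lookup returns the occurrence list.
    public Collection<Occurrence> lookup(String keyword) {
        return postings.getOrDefault(keyword, Collections.emptyMap()).values();
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.add("fox", "doc1"); idx.add("fox", "doc1"); idx.add("fox", "doc2");
        for (Occurrence o : idx.lookup("fox"))
            System.out.println(o.uri + ":" + o.count);
    }
}
```

A real indexer would also store positions and persist the postings; this sketch only shows the shape of the map.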
How Do We Lay Out an Inverted Index?
Some options:
  Unordered list
  Ordered list
  Tree
  Hash table
Unordered and Ordered Lists
Assume that we have entries such as:
  <keyword, #items, {list of occurrences}>
What does ordering buy us?
Assume that we adopt a model in which we use:
  <keyword, item> <keyword, item>
Do we get any additional benefits?
How about <keyword, {items}>, where we fix the size of the keyword and the number of items?
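One thing ordering clearly buys is binary search: a single-keyword lookup over sorted entries is O(log n) rather than an O(n) scan, and fixing the entry size additionally lets us jump straight to the i-th record by offset. A small sketch using Java's built-in binary search (the keywords are arbitrary examples):

```java
import java.util.Arrays;

// With entries sorted by keyword, lookup is binary search (O(log n))
// instead of a linear scan of an unordered list (O(n)).
public class OrderedLookup {
    public static void main(String[] args) {
        String[] keywords = { "ant", "art", "be", "best", "bob", "dog" };
        // Arrays.binarySearch requires the array to be sorted.
        int pos = Arrays.binarySearch(keywords, "best");
        System.out.println(pos >= 0 ? "found at " + pos : "absent"); // found at 3
        // A miss encodes the insertion point as -(insertionPoint) - 1,
        // which tells us where the keyword *would* go.
        int miss = Arrays.binarySearch(keywords, "bit");
        System.out.println("insertion point for 'bit': " + (-miss - 1)); // 4
    }
}
```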
Tree-Based Indices
Trees have several benefits over lists:
  Potentially logarithmic search time, as with a well-designed sorted list, IF it's balanced
  Ability to handle variable-length records
We've already seen how trees might make a natural way of distributing data, as well
How does a binary search tree fare? Cost of building? Cost of finding an item in it?
B+ Tree: A Flexible, Height-Balanced, High-Fanout Tree
Insert/delete at log_F N cost (F = fanout, N = # leaf pages)
Keep tree height-balanced
Minimum 50% occupancy (except for root)
  Each node contains d <= m <= 2d entries
  d is called the order of the tree
Can search efficiently based on equality (or also range, though we don't need that here)
[Figure: index entries in the internal nodes direct search down to the data entries in the leaves (the "sequence set")]
Example B+ Tree
Data (inverted list ptrs) is at leaves; intermediate nodes have copies of search keys
Search begins at root, and key comparisons direct it to a leaf
Search for be↓, bobcat↓, ...
Based on the search for bobcat↓, we know it is not in the tree!
[Figure: root with key art; internal node with keys best, but, dog; leaves holding a↓ am↓ an↓ ant↓ art↓ be↓ best↓ bit↓ bob↓ but↓ can↓ cry↓ dog↓ dry↓ elf↓ fox↓]
Inserting Data into a B+ Tree
Find correct leaf L
Put data entry onto L
  If L has enough space, done!
  Else, must split L (into L and a new node L2)
    Redistribute entries evenly, copy up middle key
    Insert index entry pointing to L2 into parent of L
This can happen recursively
  To split an index node, redistribute entries evenly, but push up the middle key. (Contrast with leaf splits.)
Splits "grow" the tree; a root split increases its height
  Tree growth: gets wider or one level taller at top
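The leaf-split step above can be sketched in a few lines: redistribute the entries evenly and copy up the first key of the new right-hand leaf. This toy version operates on a plain sorted list standing in for a leaf page:

```java
import java.util.*;

// Sketch of a B+ tree leaf split: when a leaf overflows, move the upper
// half of its entries to a new leaf and *copy up* the first key of the
// new leaf as the separator for the parent. (An index-node split would
// *push up* the middle key instead, removing it from the children.)
public class LeafSplit {
    public static void main(String[] args) {
        // Overflowed leaf, in sorted order (the "and" insertion example).
        List<String> leaf = new ArrayList<>(
            Arrays.asList("a", "am", "an", "and", "ant"));
        int mid = leaf.size() / 2;
        // Upper half moves to the new leaf L2.
        List<String> newLeaf = new ArrayList<>(leaf.subList(mid, leaf.size()));
        leaf.subList(mid, leaf.size()).clear();
        // Separator is copied up: it also remains in the leaf level.
        String separator = newLeaf.get(0);
        System.out.println(leaf + " | " + separator + " | " + newLeaf);
    }
}
```

Running this reproduces the slide's example: the left leaf keeps a↓ am↓, the separator "an" goes to the parent, and the right leaf holds an↓ and↓ ant↓.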
Inserting "and↓" Example: Copy Up
Want to insert into the leftmost leaf; no room, so split & copy up:
  The overflowed leaf a↓ am↓ an↓ ant↓ and↓ splits, and "an" becomes the entry to be inserted in the parent node. (Note that key "an" is copied up and continues to appear in the leaf.)
[Figure: root with key art; internal node with keys best, but, dog; leaves holding a↓ am↓ an↓ ant↓ art↓ be↓ best↓ bit↓ bob↓ but↓ can↓ cry↓ dog↓ dry↓ elf↓ fox↓]
Inserting "and↓" Example: Push Up 1/2
The copied-up key "an" must go into an index node that is already full: need to split the node & push up
[Figure: after the leaf split, the index node holds keys an, art, best, but, dog and overflows]
Inserting "and↓" Example: Push Up 2/2
Key "best" is the entry to be inserted in the parent node. (Note that best is pushed up and only appears once in the index. Contrast this with a leaf split.)
[Figure: new root holds best; the split index nodes hold an, art and but, dog; leaf entries are unchanged]
Copying vs. Splitting, Summarized
Every keyword (search key) appears in at most one intermediate node
  Hence, in splitting an intermediate node, we push up
Every inverted list entry must appear in the leaf
  We may also need it in an intermediate node to define a partition point in the tree
  We must copy up the key of this entry
Note that B+ trees easily accommodate multiple occurrences of a keyword
Virtues of the B+ Tree
B+ trees and other indices are quite efficient:
  Height-balanced; log_F N cost to search
  High fanout (F) means depth is rarely more than 3 or 4
  Almost always better than maintaining a sorted file
  Typically 67% occupancy on average
The Berkeley DB library (C, C++, Java; Oracle) is a toolkit for B+ trees that you are using
  Interface: open a B+ tree; get and put items based on key
  Handles concurrency, caching, etc.
How Do We Distribute a B+ Tree?
We need to host the root at one machine and distribute the rest
What are the implications for scalability?
Consider building the index as well as searching it
Eliminating the Root
Sometimes we don't want a tree-structured system because the higher levels can be a central point of congestion or failure
Two strategies:
  Modified tree structure (e.g., BATON, Jagadish et al.)
  Non-hierarchical structure
A "Flatter" Scheme: Hashing
Start with a hash function with a uniform distribution of values:
  h(name) -> a value (e.g., a 32-bit integer)
Map from values to hash buckets
  Generally using mod (# buckets)
Put items into the buckets
  May have "collisions" and need to chain
[Figure: h(x) values 0, 4, 8, 12, … map mod 4 into buckets 0-3; colliding entries hang off a bucket's overflow chain]
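A minimal sketch of this bucketing scheme, with chaining for collisions (the bucket count and key set are arbitrary):

```java
import java.util.*;

// Hash bucketing sketch: hash the key, then map the hash value to a
// bucket using mod(#buckets); collisions are chained within a bucket.
public class HashBuckets {
    static final int NUM_BUCKETS = 4;
    // Each bucket is a chain (list) of the keys that hashed to it.
    static final List<List<String>> buckets = new ArrayList<>();

    static int bucketOf(String key) {
        // Math.floorMod avoids negative buckets for negative hash codes.
        return Math.floorMod(key.hashCode(), NUM_BUCKETS);
    }

    public static void main(String[] args) {
        for (int i = 0; i < NUM_BUCKETS; i++) buckets.add(new ArrayList<>());
        for (String k : new String[] { "ant", "bob", "cry", "dog", "elf" })
            buckets.get(bucketOf(k)).add(k);
        for (int i = 0; i < NUM_BUCKETS; i++)
            System.out.println("bucket " + i + ": " + buckets.get(i));
    }
}
```

This is exactly the structure that becomes awkward to distribute: changing NUM_BUCKETS forces nearly every key to move, which motivates the consistent hashing discussed below.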
Dividing Hash Tables Across Machines
Simple distribution: allocate some number of hash buckets to various machines
  Can give this information to every client, or provide a central directory
  Can evenly or unevenly distribute buckets
  Lookup is very straightforward
A possible issue is data skew: some ranges of values occur frequently
  Can use dynamic hashing techniques
  Can use a better hash function, e.g., SHA-1 (160-bit key)
Some Issues Not Solved with Conventional Hashing
What if the set of servers holding the inverted index is dynamic?
  Our number of buckets changes
  How much work is required to reorganize the hash table?
Solution: consistent hashing
Consistent Hashing – the Basis of "Structured P2P"
Intuition: we want to build a distributed hash table where the number of buckets stays constant, even if the number of machines changes
  Requires a mapping from hash entries to nodes
  Don't need to re-hash everything if a node joins/leaves
  Only the mapping (and allocation of buckets) needs to change when the number of nodes changes
Many examples: CAN, Pastry, Chord
  For this course, you'll use Pastry
  But Chord is simpler to understand, so we'll look at it
Basic Ideas
We're going to use a giant hash key space
  SHA-1 hash: 20 bytes, or 160 bits
We'll arrange it into a "circular ring" (it wraps around at 2^160 to become 0)
We'll actually map both objects' keys (in our case, keywords) and nodes' IP addresses into the same hash key space:
  "abacus" -> SHA-1 -> k10
  130.140.59.2 -> SHA-1 -> N12
Chord Hashes a Key to its Successor
Nodes and blocks have randomly distributed IDs in the circular hash ID space
Successor: the node with the next highest ID
[Figure: ring with nodes N10, N32, N60, N80, N100; key hashes k10, k11, k30, k33, k40, k52, k65, k70, k99, k112, k120 are each stored at their successor node]
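The successor relation is easy to sketch with a sorted map: a key belongs to the first node whose ID is greater than or equal to the key's hash, wrapping around the ring. The node and key IDs below mirror the figure; a real system would derive them with SHA-1:

```java
import java.util.*;

// Consistent-hashing sketch: nodes and keys share one circular ID space,
// and a key is stored at its successor (first node ID clockwise from it).
public class Ring {
    // Sorted map from node ID to node name; TreeMap gives successor lookup.
    static final TreeMap<Integer, String> nodes = new TreeMap<>();

    static String successor(int keyId) {
        Map.Entry<Integer, String> e = nodes.ceilingEntry(keyId);
        // Wrap around the ring if the key is past the highest node ID.
        return (e != null ? e : nodes.firstEntry()).getValue();
    }

    public static void main(String[] args) {
        for (int id : new int[] { 10, 32, 60, 80, 100 }) nodes.put(id, "N" + id);
        System.out.println(successor(30));   // N32
        System.out.println(successor(70));   // N80
        System.out.println(successor(112));  // wraps around to N10
        // If N32 leaves, only its keys move (to N60); all others stay put.
        nodes.remove(32);
        System.out.println(successor(30));   // N60
    }
}
```

The last two lines show the payoff over mod-based bucketing: a node join or leave moves only the keys in one arc of the ring.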
Basic Lookup: Linear Time
Lookups find the ID's predecessor
Correct if successors are correct
[Figure: ring with nodes N5, N10, N20, N32, N40, N60, N80, N99, N110; the query "Where is k70?" is forwarded from successor to successor until N80 answers "N80"]
"Finger Table" Allows O(log N) Lookups
Goal: shortcut across the ring – binary search
Reasonable lookup latency
[Figure: node N80 keeps fingers 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
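The finger-table idea can be sketched directly from its definition: node n's i-th finger points at the successor of n + 2^i (mod the ring size), so each hop can at least halve the remaining distance to the target. A toy ring of 2^7 IDs (the node ID and ring size are chosen only for readability; Chord's real ring has 2^160 IDs):

```java
// Finger-table sketch for Chord-style O(log N) lookup: node n keeps a
// pointer ("finger") to the successor of n + 2^i for each i.
public class Fingers {
    public static void main(String[] args) {
        int n = 80;              // this node's ID (toy value)
        int ringBits = 7;        // toy ring: 2^7 = 128 IDs
        int ringSize = 1 << ringBits;
        for (int i = 0; i < ringBits; i++) {
            // Finger targets are 1, 2, 4, ... half the ring away from n.
            int target = (n + (1 << i)) % ringSize;
            System.out.println("finger[" + i + "] -> successor(" + target + ")");
        }
    }
}
```

To route, a node forwards the query to the finger closest to (but not past) the key, which is what turns the linear walk of the previous slide into a binary-search-like hop sequence.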
Node Joins
How does the node know where to go? (Suppose it knows 1 peer)
What would need to happen to maintain connectivity?
What data needs to be shipped around?
[Figure: new node N120 joining a ring of N5, N10, N20, N32, N40, N60, N80, N99, N110]
A Graceful Exit: Node Leaves
What would need to happen to maintain connectivity?
What data needs to be shipped around?
[Figure: ring of N5, N10, N20, N32, N40, N60, N80, N99, N110]
What about Node Failure? Suppose a node just dies?
What techniques have we seen that might help?
Successor Lists Ensure Connectivity
Each node stores r successors, r = 2 log N
Lookup can skip over dead nodes to find objects
[Figure: ring of N5, N10, N20, N32, N40, N60, N80, N99, N110, each annotated with its successor list, e.g. N5 -> (N10, N20, N32), N110 -> (N5, N10, N20)]
Objects are Replicated as Well
When a "dead" peer is detected, repair the successor lists of those that pointed to it
Can take the same scheme and replicate objects on each peer in the successor list
Do we need to change the lookup protocol to find objects if a peer dies?
Would there be a good reason to change the lookup protocol in the presence of replication?
What model of consistency is supported here? Why?
Stepping Back for a Moment: DHTs vs. Gnutella and Napster 1.0
Napster 1.0: central directory; data on peers
Gnutella: no directory; flood peers with requests
Chord, CAN, Pastry: no directory; hashing scheme to look for data
Clearly, Chord, CAN, and Pastry have guarantees about finding items, and they are decentralized
But non-research P2P systems haven't adopted this paradigm: Kazaa, BitTorrent, … still use variations of the Gnutella approach
Why? There must be some drawbacks to DHTs...
Distributed Hash Tables, Summarized
Provide a way of deterministically finding an entity in a distributed system, without a directory, and without worrying about failure
Can also be a way of dividing up work: instead of sending data to a node, might send a task
Note that it's up to the individual nodes to do things like store data on disk (if necessary; e.g., using B+ trees)
Applications of Distributed Hash Tables
To build distributed file systems (CFS, PAST, …)
To distribute "latent semantic indexing" (U. Rochester)
As the basis of distributed data integration (U. Penn, U. Toronto, EPFL) and databases (UC Berkeley)
To archive library content (Stanford)
Distributed Hash Tables and Your Project
If you're building a mini-Google, how might DHTs be useful in:
  Crawling + indexing URIs by keyword?
  Storing and retrieving query results?
The hard parts:
  Coordinating different crawlers to avoid redundancy
  Ranking different sites (often more difficult to distribute)
  What if a search contains 2+ keywords?
(You'll initially get to test out DHTs in Homework 3)
From Chord to Pastry
What we saw were the basic algorithms of the Chord system
Pastry is slightly different:
  It uses a different mapping mechanism than the ring (but one that works similarly)
  It doesn't exactly use a hash table abstraction – instead there's a notion of routing messages
  It allows for replication of data and finds the closest replica
  It's written in Java, not C
… And you'll be using it in your projects!
Pastry API Basics (v 1.4.3_02)
See freepastry.org for details and downloads
Nodes have identifiers that will be hashed: interface rice.p2p.commonapi.Id
  2 main kinds of NodeIdFactories – we'll use socket-based
Nodes are logical entities: can have more than one virtual node
  Several kinds of NodeFactories create virtual Pastry nodes
All Pastry nodes have built-in functionality to manage routing
  Derive from the "common API" class rice.p2p.commonapi.Application
Creating a P2P Network
Example code in DistTutorial.java
Create a Pastry node:

Environment env = new Environment();
PastryNodeFactory d = new SocketPastryNodeFactory(new NodeFactory(keySize), env);
// Need to compute InetSocketAddress of a host to be addr
NodeHandle aKnownNode = ((SocketPastryNodeFactory) d).getNodeHandle(addr);
PastryNode pn = d.newNode(aKnownNode);
MyApp app = new MyApp(pn); // Base class of your application!

No need to call a simulator – this is real!
Pastry Client APIs
Based on a model of routing messages
  Derive your message from class rice.p2p.commonapi.Message
Every node has an Id (NodeId implementation)
Every message gets an Id corresponding to its key
Call endpoint.route(id, msg, hint) (aka routeMsg) to send a message (endpoint is an instance of Endpoint)
  The hint is the starting point, of type NodeHandle
At each intermediate point, Pastry calls a notification: forward(id, msg, nextHop)
At the end, Pastry calls a final notification: deliver(id, msg), aka messageForAppl
IDs
Pastry has mechanisms for creating node IDs itself
Obviously, we need to be able to create IDs for keys
Need to use java.security.MessageDigest:

MessageDigest md = MessageDigest.getInstance("SHA");
byte[] content = myString.getBytes();
md.update(content);
byte[] shaDigest = md.digest();
rice.pastry.Id keyId = new rice.pastry.Id(shaDigest);
How Do We Create a Hash Table (Hash Map/Multiset) Abstraction?
We want the following:
  put(key, value)
  remove(key)
  valueSet = get(key)
How can we use Pastry to do this?
Next Time
We've been looking at data distribution to this point
We saw XML as a means of message passing – one of several means of communication, in some ways analogous to event-based scheduling
We'll talk about remote procedure calls, remote method invocations, and Web services