Data Structure - Yazdcs.yazd.ac.ir/farshi/Teaching/RandAlg3931/Slides/ch8_data... · 2015. 1....

Data Structure

Mohsen Arab

Yazd University

January 13, 2015

Mohsen Arab (Yazd University ) Data Structure January 13, 2015 1 / 86

Table of Contents

Binary Search Tree

Treaps

Skip Lists

Hash Tables

Fundamental Data-structuring Problem

Fundamental data-structuring problem: maintain a collection S1, S2, ... of sets of items so as to efficiently support certain types of queries and operations:

MAKESET(S): create a new (empty) set S.

INSERT(i, S): insert item i into the set S.

DELETE(k, S): delete the item indexed by the key value k from the set S.

FIND(k, S): return the item indexed by the key value k in the set S.

JOIN(S1, i, S2): replace the sets S1 and S2 by the new set S = S1 ∪ {i} ∪ S2, where

1. for all items j ∈ S1, k(j) < k(i);
2. for all items j ∈ S2, k(j) > k(i).

Fundamental Data-structuring Problem (cont.)

PASTE(S1, S2): replace the sets S1 and S2 by the new set S = S1 ∪ S2, where for all items i ∈ S1 and j ∈ S2, k(i) < k(j).

SPLIT(k, S): replace the set S by the new sets S1 and S2, where

S1 = {j ∈ S | k(j) < k},  S2 = {j ∈ S | k(j) > k}.

binary search tree

Binary search tree: a binary tree in which the keys satisfy the search tree property.

Definition

Search tree property: for every node with key value k, the left sub-tree contains only key values smaller than k and the right sub-tree contains only key values larger than k.

The key values in a binary tree are in symmetric order if they satisfy the search tree property.

We will assume that BSTs are endogenous.

Definition

Endogenous: all key values are stored at internal nodes, and all leaf nodes are empty.

This ensures that the trees are full, which means that every non-leaf (internal) node has exactly two children.

standard implementations of operations

MakeSet(S): initialize an empty tree for the set S.

JOIN(S1, k, S2): create a node containing key k as the root, and make S1 and S2 its left and right sub-trees, respectively.

search

Example: FIND(4, S)

Insert

Perform FIND(k, S) and insert k where the search fails (into the empty leaf node).
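
The FIND and INSERT procedures above can be sketched in Python as follows. This is a minimal illustration, assuming distinct comparable keys and representing each empty leaf of the endogenous tree by `None`; the names `Node`, `find`, `insert` and `height` are hypothetical, not taken from the slides.

```python
from typing import Optional


class Node:
    """Internal node of an endogenous BST; an empty leaf is represented by None."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def find(root: Optional[Node], k: int) -> Optional[Node]:
    """FIND(k, S): go left or right by comparison until k or an empty leaf is reached."""
    v = root
    while v is not None and v.key != k:
        v = v.left if k < v.key else v.right
    return v  # None: the search ended at an empty leaf


def insert(root: Optional[Node], k: int) -> Node:
    """INSERT(k, S): put a new node at the empty leaf where FIND(k, S) fails."""
    if root is None:
        return Node(k)
    v = root
    while v.key != k:  # if k is already present, nothing changes
        if k < v.key:
            if v.left is None:
                v.left = Node(k)
                break
            v = v.left
        else:
            if v.right is None:
                v.right = Node(k)
                break
            v = v.right
    return root


def height(v: Optional[Node]) -> int:
    """Number of internal nodes on the longest root-to-leaf path."""
    return 0 if v is None else 1 + max(height(v.left), height(v.right))
```

For example, inserting 5, 2, 8, 1, 4 in that order gives a tree of height 3, while inserting 1, 2, ..., n in increasing order produces a path of height n.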

implementations of operations (Delete)

DELETE(k, S):

1. If the node v containing k has a leaf as one of its two children, replace v by its other child. For example, if the right child of v is a leaf, then replace v by its left child L(v) as the child of its parent P(v).

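implementations of operations (Delete)

2. If neither of the children is a leaf, let k′ be the key value that is the predecessor of k in the set S, i.e., the largest key in the left sub-tree of v.

The node containing k′ has a leaf as its right child (otherwise its right sub-tree would contain a key larger than k′ and smaller than k). Hence we can delete that node as in case 1, and replace the key value k by k′ in the node v.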
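
A sketch of DELETE covering both cases, under the same assumptions (distinct keys, `None` for empty leaves). The hypothetical `delete` returns the new root of the sub-tree, so "replace v by L(v) as the child of P(v)" becomes returning `v.left` to the parent's assignment; the names are illustrative.

```python
from typing import List, Optional


class Node:
    def __init__(self, key, left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.key = key
        self.left = left    # None stands for an empty leaf
        self.right = right


def delete(v: Optional[Node], k) -> Optional[Node]:
    """DELETE(k, S) in the sub-tree rooted at v; returns the new sub-tree root."""
    if v is None:
        return None                        # k is not present
    if k < v.key:
        v.left = delete(v.left, k)
    elif k > v.key:
        v.right = delete(v.right, k)
    elif v.right is None:                  # case 1: right child is a leaf, replace v by L(v)
        return v.left
    elif v.left is None:                   # case 1, mirrored
        return v.right
    else:                                  # case 2: copy the predecessor k' into v
        pred = v.left
        while pred.right is not None:      # k' is the rightmost key of the left sub-tree
            pred = pred.right
        v.key = pred.key
        v.left = delete(v.left, pred.key)  # its node has a leaf as right child: case 1
    return v


def inorder(v: Optional[Node]) -> List:
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)
```

Deleting 5 from the tree with root 5, left sub-tree {1, 2, 3, 4} and right child 8 moves the predecessor 4 into the root.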

implementations of operations (cont.)

PASTE(S1, S2):
1. Delete the largest key value, say k, from S1.
2. Apply JOIN(S1, k, S2).

Note

k can be found by doing a FIND(∞, S1).

SPLIT(k, S):

If k is at the root of S, do the reverse of the steps employed in JOIN(S1, k, S2). Otherwise, first use rotations to move k to the root.
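
A minimal sketch of JOIN and PASTE on a plain BST, assuming every key of S1 is smaller than every key of S2 and using `None` for empty leaves; `delete_max` plays the role of FIND(∞, S1) followed by a case-1 deletion. All function names are illustrative. SPLIT is not shown, since it additionally needs rotations to bring k to the root.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key, left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.key = key
        self.left = left    # None stands for an empty leaf
        self.right = right


def join(s1: Optional[Node], k, s2: Optional[Node]) -> Node:
    """JOIN(S1, k, S2): a new root holding k with S1 and S2 as its sub-trees."""
    return Node(k, s1, s2)


def delete_max(v: Node) -> Tuple[Optional[Node], object]:
    """Remove the largest key of a non-empty tree; return (new root, that key).
    Following right children is exactly the path of FIND(inf, S1)."""
    if v.right is None:            # right child is a leaf: case 1 of DELETE
        return v.left, v.key
    v.right, k = delete_max(v.right)
    return v, k


def paste(s1: Optional[Node], s2: Optional[Node]) -> Optional[Node]:
    """PASTE(S1, S2), assuming every key of S1 is smaller than every key of S2."""
    if s1 is None:                 # nothing to move up from an empty S1
        return s2
    s1, k = delete_max(s1)
    return join(s1, k, s2)


def inorder(v: Optional[Node]) -> List:
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)
```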

Problem:

Each operation can be performed in time proportional to the height of the tree. However, there are sequences of INSERT operations that result in a tree of height linear in n (for example, inserting the keys in increasing order).

Solution:

Perform rotations during update operations to ensure that all leaves are at distance O(log n) from the root.

Rotations

Each type of rotation moves a node together with one of its sub-trees closer to the root (and some other nodes away from the root), while preserving the search tree property.
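
The two rotations can be sketched as follows, using a hypothetical pair `rotate_right`/`rotate_left` on nodes with `left`/`right` fields. Each returns the new root of the rotated sub-tree; the in-order key sequence, and hence the search tree property, is unchanged.

```python
from typing import List, Optional


class Node:
    def __init__(self, key, left: Optional["Node"] = None, right: Optional["Node"] = None):
        self.key = key
        self.left = left    # None stands for an empty leaf
        self.right = right


def rotate_right(w: Node) -> Node:
    """Rotation at w: its left child v moves up together with v's left sub-tree;
    w moves down together with its right sub-tree. v's old right sub-tree
    (keys between v and w) becomes w's new left sub-tree."""
    v = w.left
    w.left = v.right
    v.right = w
    return v


def rotate_left(w: Node) -> Node:
    """Mirror image: w's right child v becomes the parent of w."""
    v = w.right
    w.right = v.left
    v.left = w
    return v


def inorder(v: Optional[Node]) -> List:
    return [] if v is None else inorder(v.left) + [v.key] + inorder(v.right)
```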

A different strategy: Splaying in self-adjusting search tree

Splaying

The splay operation moves a specified node to the root via a sequence of rotations.

Amortization

Amortization is the partitioning of the total cost of a sequence of operations among the individual operations in that sequence.

Thus, an amortized time bound can be viewed as the average cost of the operations in a sequence.

idea behind self-adjusting trees

Use a particular implementation of the splay operation to move each node accessed by a FIND operation to the root.

How it can benefit us

Nodes that are accessed often stay close to the root, so accessing them is cheap and adds little to the total running time.

An infrequently accessed node may be deep in the tree, but since it is rarely accessed, it does not increase the total running time very much in any case.

Note

These self-adjusting trees guarantee only amortized logarithmic time peroperation.

Advantages and drawbacks of self-adjusting trees

Advantages

They are relatively simple to implement.

They do not require explicit balance information to be stored at nodes.

Splay trees can be shown to be optimal with respect to arbitrary access frequencies for the items being stored.

Drawbacks

They restructure the entire tree during updates and even during simple search operations.

During any given operation, splay trees may perform a logarithmic number of rotations.

There is no guarantee that every individual operation will run quickly.

Treaps

Treaps are an efficient randomized alternative to balanced trees and self-adjusting trees.

Treaps achieve essentially the same time bounds in the expected sense, but with the following advantages:

1. They do not require any explicit balance information.
2. The expected number of rotations performed is small for each operation.
3. They are extremely simple to implement.

binary search tree

A (full, endogenous) binary tree whose nodes have key values associated with them is a binary search tree if the key values are in symmetric order.

heap

If the key values decrease monotonically along any root-leaf path, we call the structure a heap and say that the keys are stored in heap order.

treap

Consider a binary tree where each node v contains a pair of values: a key k(v) as well as a priority p(v). We call this structure a treap if it is a binary search tree with respect to the key values and, simultaneously, a heap with respect to the priorities.

example of treaps

S = {(k1, p1), ..., (kn, pn)}

S = {(2, 13), (4, 26), (6, 19), (7, 30), (9, 14), (11, 27), (12, 22)}

Theorem 8.1

Let S = {(k1, p1), ..., (kn, pn)} be any set of key-priority pairs such that the keys and the priorities are distinct. Then there exists a unique treap T(S) for it.

proof:

The theorem clearly holds for n = 0 and for n = 1.

Suppose now that n ≥ 2, and assume that (k1, p1) has the highest priority in S. By the heap order, item 1 must be at the root of T(S).

By the search tree property, the items of S with key value smaller (larger) than k1 must form the left (right) sub-tree of item 1, and each of these sub-trees must itself be a treap. By induction, these treaps exist and are unique, and they can be constructed recursively. Hence T(S) exists and is unique.
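
The recursive construction in the proof translates directly into code. This sketch, with hypothetical names, picks the highest-priority pair as the root and recurses on the smaller and larger keys; applied to the example set S from the previous slide, it yields root (7, 30) with sub-treaps rooted at (4, 26) and (11, 27). It takes O(n²) time in the worst case and is meant only to illustrate the proof.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key, priority, left=None, right=None):
        self.key, self.priority = key, priority
        self.left, self.right = left, right   # None stands for an empty leaf


def build_treap(items: List[Tuple[int, int]]) -> Optional[Node]:
    """Construct T(S) as in the proof: the highest-priority pair is the root,
    smaller keys form the left sub-treap and larger keys the right one."""
    if not items:
        return None
    root_key, root_pri = max(items, key=lambda kp: kp[1])
    smaller = [kp for kp in items if kp[0] < root_key]
    larger = [kp for kp in items if kp[0] > root_key]
    return Node(root_key, root_pri, build_treap(smaller), build_treap(larger))


def preorder(v: Optional[Node]) -> List[Tuple[int, int]]:
    if v is None:
        return []
    return [(v.key, v.priority)] + preorder(v.left) + preorder(v.right)


S = [(2, 13), (4, 26), (6, 19), (7, 30), (9, 14), (11, 27), (12, 22)]
print(preorder(build_treap(S)))
# [(7, 30), (4, 26), (2, 13), (6, 19), (11, 27), (9, 14), (12, 22)]
```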

implementation of Operations using treap

MAKESET(S) and FIND(k, S) are implemented exactly as before.

INSERT(k, S):

Do FIND(k, S) and insert k at the empty leaf node where the search terminates with failure. If the heap order property is violated (parent(k).p < k.p):

Repeat: decrease k's depth by performing a rotation at node w = parent(k) so that k becomes the parent of w, until k either becomes the root or parent(k).p > k.p.
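
A recursive sketch of treap INSERT, assuming distinct keys and, when no priority is given, a priority drawn with `random.random()` (the slides do not fix a priority distribution). The upward rotations happen as the recursion unwinds: whenever the child on the insertion path has a higher priority than the current node w, we rotate at w. Names are illustrative.

```python
import random
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key, priority: float):
        self.key, self.priority = key, priority
        self.left: Optional["Node"] = None    # None stands for an empty leaf
        self.right: Optional["Node"] = None


def rotate_right(w: Node) -> Node:
    """w's left child v becomes the parent of w."""
    v = w.left
    w.left, v.right = v.right, w
    return v


def rotate_left(w: Node) -> Node:
    """w's right child v becomes the parent of w."""
    v = w.right
    w.right, v.left = v.left, w
    return v


def insert(v: Optional[Node], key, priority: Optional[float] = None) -> Node:
    """INSERT(k, S): BST insertion at the empty leaf; then, while the new node's
    priority exceeds its parent's, rotate at the parent so k moves up."""
    if priority is None:
        priority = random.random()  # assumption: priorities uniform in [0, 1)
    if v is None:
        return Node(key, priority)
    if key < v.key:
        v.left = insert(v.left, key, priority)
        if v.left.priority > v.priority:   # heap order violated below v
            v = rotate_right(v)
    elif key > v.key:
        v.right = insert(v.right, key, priority)
        if v.right.priority > v.priority:
            v = rotate_left(v)
    return v  # an already present key leaves the treap unchanged


def preorder(v: Optional[Node]) -> List[Tuple]:
    if v is None:
        return []
    return [(v.key, v.priority)] + preorder(v.left) + preorder(v.right)
```

Inserting the example pairs in any order with their given priorities yields the same treap, with (7, 30) at the root.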

implementation of Operations using treap: Add(), Example

implementation of Operations using treap: Delete(), Example

DELETE(k, S): this operation is exactly the reverse of an insertion: rotate the node containing k downward until both its children are leaves, and then simply discard the node.

Note: The choice of the rotation (left or right) at each stage depends on the relative order of the priorities of the children of the node being deleted: the child with the higher priority is rotated up.
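
A sketch of treap DELETE under the same assumptions (distinct keys, `None` for empty leaves, hypothetical names): the node holding k is rotated downward, each time lifting the child with the higher priority, until both its children are leaves, and then it is discarded.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key, priority, left=None, right=None):
        self.key, self.priority = key, priority
        self.left, self.right = left, right   # None stands for an empty leaf


def rotate_right(w):
    v = w.left
    w.left, v.right = v.right, w
    return v


def rotate_left(w):
    v = w.right
    w.right, v.left = v.left, w
    return v


def delete(v: Optional[Node], key) -> Optional[Node]:
    """DELETE(k, S): rotate the node holding k downward until both of its
    children are leaves, then discard it. Returns the new sub-tree root."""
    if v is None:
        return None                                  # k is not present
    if key < v.key:
        v.left = delete(v.left, key)
    elif key > v.key:
        v.right = delete(v.right, key)
    elif v.left is None and v.right is None:
        return None                                  # both children are leaves
    elif v.right is None or (v.left is not None and v.left.priority > v.right.priority):
        v = rotate_right(v)                          # lift the higher-priority left child
        v.right = delete(v.right, key)               # k is now one level lower
    else:
        v = rotate_left(v)                           # lift the higher-priority right child
        v.left = delete(v.left, key)
    return v


def preorder(v: Optional[Node]) -> List[Tuple]:
    if v is None:
        return []
    return [(v.key, v.priority)] + preorder(v.left) + preorder(v.right)
```

Deleting 7 from the example treap yields the unique treap for the remaining six pairs, now rooted at (11, 27).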

Delete(), Example

JOIN(S1, k, S2): performed as before; the resulting structure is a treap provided the priority of k is higher than that of any item in S1 or S2. If the new root (containing k) violates the heap order, we simply rotate that node downward until each of the two children of the node has a smaller priority or is a leaf.

PASTE(S1, S2): as in a BST.

SPLIT(k, S):
1. Delete k from S.
2. Insert it back into S with a priority of ∞. Since k now has the highest priority, it ends up at the root, and its left and right sub-trees are the treaps for S1 and S2.
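
A sketch of SPLIT, assuming distinct keys and numeric priorities, with illustrative names. To keep it short, this hypothetical `insert` overwrites the priority of k in place when k is already present instead of performing a separate DELETE. That is valid here because the priority only increases, and because the treap for a given set of key-priority pairs is unique (Theorem 8.1), the result is the same treap that delete-then-insert would produce.

```python
import math


class Node:
    def __init__(self, key, priority, left=None, right=None):
        self.key, self.priority = key, priority
        self.left, self.right = left, right   # None stands for an empty leaf


def rotate_right(w):
    v = w.left
    w.left, v.right = v.right, w
    return v


def rotate_left(w):
    v = w.right
    w.right, v.left = v.left, w
    return v


def insert(v, key, priority):
    """Treap INSERT; an existing key only gets its priority raised in place."""
    if v is None:
        return Node(key, priority)
    if key == v.key:
        v.priority = priority                 # children keep lower priorities
    elif key < v.key:
        v.left = insert(v.left, key, priority)
        if v.left.priority > v.priority:
            v = rotate_right(v)
    else:
        v.right = insert(v.right, key, priority)
        if v.right.priority > v.priority:
            v = rotate_left(v)
    return v


def split(root, k):
    """SPLIT(k, S): with priority infinity, k rotates up to the root; its
    sub-trees are then the treaps for S1 (keys < k) and S2 (keys > k)."""
    root = insert(root, k, math.inf)
    return root.left, root.right


def keys(v):
    return [] if v is None else keys(v.left) + [v.key] + keys(v.right)
```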

left spine of a tree: the path obtained by starting at the root and repeatedly moving to the left child until a leaf is reached;

the right spine is defined similarly.

Mulmuley Games

Mulmuley games are useful abstractions of processes underlying the behavior of certain geometric algorithms. The cast of characters in these games is:

players P = {P1, ..., Pp}, stoppers S = {S1, ..., Ss}, triggers T = {T1, ..., Tt}, bystanders B = {B1, ..., Bb}.

The set P ∪ S is drawn from a totally ordered universe, and all players are smaller than all stoppers: for all i and j, Pi < Sj.

Exercise 8.5:

Let $H_k = \sum_{i=1}^{k} 1/i$ denote the kth Harmonic number. Show that

$$\sum_{k=1}^{n} H_k = (n+1)H_{n+1} - (n+1).$$

Recall that $H_k = \ln k + O(1)$ (Proposition B.4).
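
One way to prove the identity is to exchange the order of summation: each term 1/i appears in $H_i, \ldots, H_n$, so the sum equals $\sum_{i=1}^{n} (n+1-i)/i = (n+1)H_n - n$, and $(n+1)H_n = (n+1)H_{n+1} - 1$. The sketch below is only an exact sanity check for small n, using Python's `fractions.Fraction` to avoid rounding; the function names are illustrative.

```python
from fractions import Fraction


def harmonic(k: int) -> Fraction:
    """H_k = 1 + 1/2 + ... + 1/k as an exact fraction (H_0 = 0)."""
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def identity_holds(n: int) -> bool:
    """Exact check of sum_{k=1}^{n} H_k == (n + 1) H_{n+1} - (n + 1)."""
    lhs = sum((harmonic(k) for k in range(1, n + 1)), Fraction(0))
    rhs = (n + 1) * harmonic(n + 1) - (n + 1)
    return lhs == rhs


print(all(identity_holds(n) for n in range(30)))  # True
```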

Depending upon the set of active characters, we formulate four differentgames, with each game being more general than the previous one.

Game A.

initial set of characters X = P ∪ B. The game proceeds by repeatedly sampling from X without replacement, until the set X becomes empty.

random variable V: the number of samples in which a player Pi is chosen such that Pi is larger than all previously chosen players.

value of the game: $A_p = E[V]$.
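
Game A can be evaluated exactly for small p and b by averaging V over all orderings of X, since sampling without replacement until X is empty produces a uniformly random ordering. The encoding (players as 1..p with larger numbers for larger players, bystanders as 0) and the function names are illustrative assumptions; the computed values can be compared with $H_p$ from Lemma 8.2.

```python
from fractions import Fraction
from itertools import permutations


def harmonic(k: int) -> Fraction:
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def game_a_value(p: int, b: int) -> Fraction:
    """Exact value A_p of Game A with p players and b bystanders.

    Every ordering of the characters is equally likely, so E[V] is the
    average of V over all orderings (duplicates of 0 keep equal weight)."""
    characters = list(range(1, p + 1)) + [0] * b
    total = count = 0
    for order in permutations(characters):
        best = v = 0             # best: largest player chosen so far
        for c in order:
            if c > best:         # a bystander (0) never counts
                best = c
                v += 1
        total += v
        count += 1
    return Fraction(total, count)
```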

Lemma 8.2:

For all p ≥ 0, $A_p = H_p$.

Proof:

Assume that the set of players is ordered as P1 > P2 > ... > Pp.

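In Game A, the bystanders never contribute to V, so we may assume b = 0.

If the first chosen player is $P_i$ (which happens with probability 1/p), then only the i − 1 players larger than $P_i$ can contribute afterwards, and the expected value of the game is $1 + A_{i-1}$. Hence

$$A_p = \sum_{i=1}^{p} \frac{1 + A_{i-1}}{p} = 1 + \frac{\sum_{i=1}^{p} A_{i-1}}{p}.$$

Upon rearrangement, using the fact that $A_0 = 0$,

$$\sum_{i=1}^{p-1} A_i = pA_p - p.$$

By Exercise 8.5, the Harmonic numbers are the solution to the above equation.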
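
The last step can be made explicit by induction on p, with base case $A_0 = 0 = H_0$: substitute the induction hypothesis into the rearranged equation and apply Exercise 8.5 with n = p − 1.

```latex
% Induction hypothesis: A_i = H_i for all i < p.
\begin{aligned}
pA_p - p &= \sum_{i=1}^{p-1} A_i = \sum_{i=1}^{p-1} H_i
          && \text{(induction hypothesis)} \\
         &= pH_p - p
          && \text{(Exercise 8.5 with } n = p-1\text{)} \\
\Longrightarrow\quad A_p &= H_p .
\end{aligned}
```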

Game C.

initial set of characters X = P ∪ B ∪ S.

The stoppers are treated as players, but the game stops when a stopper is chosen for the first time.

value of the game: $C^s_p = E[V + 1] = E[V] + 1$

Note

Since all players are smaller than all stoppers, we always get a contribution of 1 to the game value from the first stopper.

Lemma 8.3

For all p, s ≥ 0, $C^s_p = 1 + H_{s+p} - H_s$.

Proof

Assume that the set of players is ordered as P1 > P2 > ... > Pp.

As in Game A, bystanders do not affect the game, so we may assume b = 0.

If the first sample is $P_i$, the probability of this event is 1/(s + p), and the expected game value is $1 + C^s_{i-1}$.

If the first sample is a stopper, the probability of this event is s/(s + p), and the game value is 1.

Proof of Lemma 8.3

$$C^s_p = \left(\frac{s}{s+p} \times 1\right) + \left(\frac{1}{s+p} \times \sum_{i=1}^{p} \left(1 + C^s_{i-1}\right)\right).$$

Upon rearrangement, using the fact that $C^s_0 = 1$, we obtain that

$$C^s_p = \frac{s+p+1}{s+p} + \frac{\sum_{i=1}^{p-1} C^s_i}{s+p},$$

which is equivalent to

$$\sum_{i=1}^{p-1} C^s_i = (s+p)\,C^s_p - (s+p+1).$$

Once again, using Exercise 8.5 it can be verified that the solution to the recurrence is given by $C^s_p = 1 + H_{s+p} - H_s$.
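
The verification can be written out by induction on p, with base case $C^s_0 = 1$: substitute the formula for $C^s_i$, i < p, into the left-hand side and apply Exercise 8.5 with n = s + p − 1 and with n = s. The result equals $(s+p)C^s_p - (s+p+1)$ exactly when $C^s_p = 1 + H_{s+p} - H_s$.

```latex
% Induction hypothesis: C^s_i = 1 + H_{s+i} - H_s for all i < p.
\begin{aligned}
\sum_{i=1}^{p-1} C^s_i
  &= (p-1)(1 - H_s) + \sum_{j=s+1}^{s+p-1} H_j \\
  &= (p-1)(1 - H_s) + \bigl[(s+p)H_{s+p} - (s+p)\bigr] - \bigl[(s+1)H_{s+1} - (s+1)\bigr]
     && \text{(Exercise 8.5 with } n = s+p-1 \text{ and } n = s\text{)} \\
  &= (s+p)H_{s+p} - (s+p)H_s - 1
     && \text{(using } (s+1)H_{s+1} = (s+1)H_s + 1\text{)} \\
  &= (s+p)\bigl(1 + H_{s+p} - H_s\bigr) - (s+p+1).
\end{aligned}
```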

Game D and E.

Games D and E are similar to Games A and C, but:

in Game D, X = P ∪ B ∪ T, and in Game E, X = P ∪ B ∪ S ∪ T.

The role of the triggers is that the counting process begins only after the first trigger has been chosen.

That is,

a player or a stopper contributes to V only if it is sampled after a trigger and before any stopper (and, of course, it is larger than all previously chosen players).

Lemma 8.4: For all p, t ≥ 0, $D^t_p = H_p + H_t - H_{p+t}$.

Lemma 8.5: For all p, s, t ≥ 0,

$$E^{s,t}_p = \frac{t}{s+t} + (H_{s+p} - H_s) - (H_{s+p+t} - H_{s+t}).$$
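
The games can be checked against Lemmas 8.3 to 8.5 by brute force for small parameters. In this sketch (names and encoding are illustrative), players are 1..p and stoppers are p+1..p+s, so every stopper exceeds every player; a new record adds 1 to V only while counting is active, counting starts at the first trigger when `triggered=True` (Games D and E), and the first stopper ends the game. The comparisons for Games C and E use s ≥ 1, since with no stoppers there is no first stopper to contribute.

```python
from fractions import Fraction
from itertools import permutations


def harmonic(k: int) -> Fraction:
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def game_value(p: int, s: int = 0, t: int = 0, b: int = 0, triggered: bool = False) -> Fraction:
    """Exact expected value of a Mulmuley game, averaging V over all orderings."""
    chars = list(range(1, p + s + 1)) + ["T"] * t + ["B"] * b
    total = count = 0
    for order in permutations(chars):
        counting = not triggered      # Games A and C count from the start
        best = v = 0                  # best: largest player/stopper chosen so far
        for c in order:
            if c == "T":
                counting = True       # the first trigger starts the counting
            elif c != "B" and c > best:
                best = c
                if counting:
                    v += 1
                if c > p:             # the first stopper ends the game
                    break
        total += v
        count += 1
    return Fraction(total, count)
```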

Analysis of Treaps

memoryless property

Since the random priorities for the elements of S are chosen independently, we can assume that the priorities are chosen before the insertion process is initiated.

Once the priorities have been fixed, Theorem 8.1 implies that the treap T is uniquely determined.

This implies that the order in which the elements are inserted does not affect the structure of the tree.

Without loss of generality, we can assume that the elements of set S are inserted into T in the order of decreasing priority.

An advantage of this view is that it implies that all insertions take place at the leaves and no rotations are required to ensure the heap order on the priorities.
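
This view can be checked on the example set S: inserting the pairs in decreasing priority order with plain leaf insertion (no rotations) should give exactly the treap that the rotating INSERT builds for any insertion order. A sketch with illustrative names, assuming distinct keys and priorities:

```python
import random


class Node:
    def __init__(self, key, priority):
        self.key, self.priority = key, priority
        self.left = self.right = None   # None stands for an empty leaf


def rotate_right(w):
    v = w.left
    w.left, v.right = v.right, w
    return v


def rotate_left(w):
    v = w.right
    w.right, v.left = v.left, w
    return v


def treap_insert(v, key, priority):
    """Treap INSERT: leaf insertion followed by upward rotations (distinct keys)."""
    if v is None:
        return Node(key, priority)
    if key < v.key:
        v.left = treap_insert(v.left, key, priority)
        if v.left.priority > v.priority:
            v = rotate_right(v)
    else:
        v.right = treap_insert(v.right, key, priority)
        if v.right.priority > v.priority:
            v = rotate_left(v)
    return v


def leaf_insert(v, key, priority):
    """Plain BST insertion at the empty leaf; no rotations at all."""
    if v is None:
        return Node(key, priority)
    if key < v.key:
        v.left = leaf_insert(v.left, key, priority)
    else:
        v.right = leaf_insert(v.right, key, priority)
    return v


def shape(v):
    """Nested tuples describing the whole tree, for equality tests."""
    return None if v is None else (v.key, v.priority, shape(v.left), shape(v.right))


def build(items, insert_fn):
    root = None
    for key, priority in items:
        root = insert_fn(root, key, priority)
    return root
```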

Lemma 8.6

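Let T be a random treap for a set S of size n. For an element x ∈ S having rank k,

$$E[\mathrm{depth}(x)] = H_k + H_{n-k+1} - 1.$$

idea of proof

$S^- = \{y \in S \mid y \le x\}$, $S^+ = \{y \in S \mid y \ge x\}$. Since x has rank k, it follows that $|S^-| = k$ and $|S^+| = n - k + 1$.

$Q_x \subseteq S$: the ancestors of x (counting x itself, so that depth(x) = $|Q_x|$ and the root has depth 1).

$Q^-_x = S^- \cap Q_x$, $Q^+_x = S^+ \cap Q_x$.

We will establish that $E[|Q^-_x|] = H_k$. By symmetry, it follows that $E[|Q^+_x|] = H_{n-k+1}$. Since x belongs to both $Q^-_x$ and $Q^+_x$, $E[\mathrm{depth}(x)] = E[|Q^-_x|] + E[|Q^+_x|] - 1 = H_k + H_{n-k+1} - 1$.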

Consider any ancestor $y \in Q^-_x$ of the node x, with y ≠ x.

By the memoryless assumption, y must have been inserted prior to x, i.e., $p_y > p_x$.

Since y < x, it must be the case that x lies in the right sub-tree of y.

The search for every element z whose value lies between y and x (y < z < x) must follow the path from the root to y, and in fact go into the right sub-tree of y.

We conclude that y is an ancestor of every node containing an element of value between y and x. By our assumption, each such z must have been inserted after y, and hence has lower priority than y.

Continuation of the proof

The preceding argument establishes that an element $y \in S^-$ is an ancestor of x, i.e., a member of $Q^-_x$, if and only if it was the largest element of $S^-$ in the treap at the time of its insertion.

The order of insertion is determined by the order of the priorities, and since the priorities are chosen independently at random, this order is uniformly distributed over all permutations of S.

Thus, the order of insertion can be viewed as being determined by uniform sampling without replacement from the pool S.

We can now claim that the distribution of $|Q^-_x|$ is the same as that of the value of Game A when $P = S^-$ and $B = S \setminus S^-$. Since $|S^-| = k$, the expected size of $Q^-_x$ is $E[|Q^-_x|] = H_k$.
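
Since the insertion order is a uniformly random permutation, Lemma 8.6 can be checked exactly for small n by averaging over all n! orders. The sketch below (hypothetical names; keys 1..n, so key k has rank k) builds the tree by leaf insertion and counts x itself in its depth, so the root has depth 1, which is the convention under which the formula gives 1 for n = 1.

```python
from fractions import Fraction
from itertools import permutations


def harmonic(k: int) -> Fraction:
    return sum((Fraction(1, i) for i in range(1, k + 1)), Fraction(0))


def depth_in_bst(order, x) -> int:
    """Depth of x (root = 1) in the BST built by inserting `order` at the leaves,
    which is the treap whose priorities decrease along `order`."""
    left, right = {}, {}                 # child maps: key -> child key
    root = order[0]
    for y in order[1:]:
        v = root
        while True:
            children = left if y < v else right
            if v not in children:
                children[v] = y
                break
            v = children[v]
    depth, v = 1, root
    while v != x:
        v = left[v] if x < v else right[v]
        depth += 1
    return depth


def expected_depth(n: int, k: int) -> Fraction:
    """Average depth of the rank-k key over all n! equally likely priority orders."""
    total = count = 0
    for order in permutations(range(1, n + 1)):
        total += depth_in_bst(order, k)
        count += 1
    return Fraction(total, count)
```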

For any element x in a treap,

$L_x$: length of the left spine of the right sub-tree of x;

$R_x$: length of the right spine of the left sub-tree of x.

Lemma 8.7

Let T be a random treap for a set S of size n. For an element x ∈ S of rank k,

$$E[R_x] = 1 - \frac{1}{k}, \qquad E[L_x] = 1 - \frac{1}{n-k+1}.$$

proof: (1) an element z < x lies on the right spine of the left sub-tree of x if and only if (2) z is inserted after x, and all elements y whose values lie between z and x (z < y < x) are inserted after z.

proof

(2) ⇒ (1): suppose z is inserted after x, and all elements y whose values lie between z and x (z < y < x) are inserted after z. We show that z lies on the right spine of the left sub-tree of x.

a. If x is an ancestor of z: since z < x, z lies in the left sub-tree of x. If z did not lie on the right spine of that sub-tree, the path from x to z would turn left at some node u of the left sub-tree, so that z < u < x; since u is an ancestor of z, it was inserted before z (contradiction).

b. If x is not an ancestor of z: since z was inserted after x, z is not an ancestor of x either. Let w be the lowest common ancestor of z and x. Then z < w < x, and since w is an ancestor of z, it must have been inserted before z (contradiction).

Proof (1) ⇒ (2): if an element z < x lies on the right spine of the left sub-tree of x, then z is inserted after x, and all elements y whose values lie between z and x (z < y < x) are inserted after z.

Since x is an ancestor of z, it must have been inserted before z. Also, since z lies on the right spine of the left sub-tree of x, every element y with z < y < x lies in the right sub-tree of z; hence y is a descendant of z and was inserted after z.
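
The same brute-force approach checks Lemma 8.7 for small n. This sketch (illustrative names; keys 1..n, so key k has rank k) measures a spine length as the number of nodes on it, so an empty sub-tree has spine length 0.

```python
from fractions import Fraction
from itertools import permutations


def build_bst(order):
    """Insert keys left to right at the leaves (the treap whose priorities
    decrease along `order`); return (root, left, right) child maps."""
    left, right = {}, {}
    root = order[0]
    for y in order[1:]:
        v = root
        while True:
            children = left if y < v else right
            if v not in children:
                children[v] = y
                break
            v = children[v]
    return root, left, right


def spine_length(start, children) -> int:
    """Count the nodes met by following `children` from `start` to an empty leaf."""
    length, v = 0, start
    while v is not None:
        length += 1
        v = children.get(v)
    return length


def expected_spines(n: int, k: int):
    """Exact (E[R_x], E[L_x]) for the rank-k key x = k among keys 1..n."""
    total_r = total_l = count = 0
    for order in permutations(range(1, n + 1)):
        _, left, right = build_bst(order)
        total_r += spine_length(left.get(k), right)   # right spine of the left sub-tree
        total_l += spine_length(right.get(k), left)   # left spine of the right sub-tree
        count += 1
    return Fraction(total_r, count), Fraction(total_l, count)
```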

Search in Skip List

We search for a key x in a skip list as follows:

We start at the first position of the top list.

At the current position p, we compare x with y ← key(next(p)):

x = y: we return element(next(p));
x > y: we scan forward;
x < y: we drop down (if we drop down past the bottom list, x is not in the set).

Example: search for 78
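
A sketch of this search on linked lists, assuming each list is framed by −∞ and +∞ sentinel positions, that `down` links a position to the same key in the list below, and that searched keys are finite; these are assumptions of the sketch rather than details given on the slide, and all names are illustrative.

```python
import math
from typing import Dict, List, Optional


class SkipNode:
    def __init__(self, key: float, nxt: Optional["SkipNode"], down: Optional["SkipNode"]):
        self.key = key
        self.next = nxt    # next position in the same list
        self.down = down   # position holding the same key one list lower


def build(levels: List[List[float]]) -> SkipNode:
    """Link the lists bottom-up and return the first position of the top list.
    levels[0] is the bottom list; each list is a sorted subset of the one below."""
    below: Dict[float, SkipNode] = {}
    head = None
    for keys in levels:
        node = SkipNode(math.inf, None, below.get(math.inf))
        current = {math.inf: node}
        for k in reversed(keys):
            node = SkipNode(k, node, below.get(k))
            current[k] = node
        head = SkipNode(-math.inf, node, below.get(-math.inf))
        current[-math.inf] = head
        below = current
    return head


def search(head: SkipNode, x: float) -> bool:
    """Return True if x is stored (standing in for returning element(next(p)))."""
    p: Optional[SkipNode] = head
    while p is not None:
        y = p.next.key
        if x == y:
            return True
        if x > y:
            p = p.next        # scan forward
        else:
            p = p.down        # drop down; None means we fell below the bottom list
    return False
```

For example, with the made-up levels `[[12, 23, 26, 31, 34, 44, 56, 64, 78], [23, 31, 34, 64, 78], [31, 64], [64], []]`, `search(build(levels), 78)` returns `True` and `search(build(levels), 50)` returns `False`.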

Tree representation of a skip list

Analyzing Random Skip Lists

A random leveling of the set S is defined as follows:

Given the choice of level $L_i$, the level $L_{i+1}$ is defined by independently choosing to retain each element $x \in L_i$ with probability 1/2.

The process starts with $L_1 = S$ and terminates when a newly constructed level is empty.

alternate view:

Let the levels l(x) for x ∈ S be independent random variables, each with the geometric distribution with parameter p = 1/2.

Let $r = \max_{x \in S} l(x) + 1$.

Place x in each of the levels $L_1, \ldots, L_{l(x)}$.

As with random treaps, a random level is chosen for every element of S upon its insertion and remains fixed until the element is deleted.
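
Both views of a random leveling can be sketched in a few lines; the function names and the use of Python's `random.Random` are assumptions. In the first view each level keeps every element of the previous one with probability 1/2; in the second, l(x) starts at 1 and grows by one for each success of a fair coin, which gives $\Pr[l(x) = j] = 2^{-j}$.

```python
import random
from typing import Dict, List


def random_leveling(S: List[int], rng: random.Random) -> List[List[int]]:
    """First view: L_1 = S, and L_{i+1} keeps each x in L_i independently with
    probability 1/2; stop after the first empty level."""
    levels = [sorted(S)]
    while levels[-1]:
        levels.append([x for x in levels[-1] if rng.random() < 0.5])
    return levels


def geometric_leveling(S: List[int], rng: random.Random) -> List[List[int]]:
    """Alternate view: draw l(x) ~ Geometric(1/2) on {1, 2, ...} independently,
    set r = max l(x) + 1, and place x in L_1, ..., L_{l(x)}."""
    l: Dict[int, int] = {}
    for x in S:
        level = 1
        while rng.random() < 0.5:   # each success (probability 1/2) adds one level
            level += 1
        l[x] = level
    r = max(l.values(), default=0) + 1
    return [sorted(x for x in S if l[x] >= i) for i in range(1, r + 1)]
```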

Lemma 8.9

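The number of levels r in a random leveling of a set S of size n has expected value E[r] = O(log n). Moreover, r = O(log n) with high probability.

Proof:

$r = \max_{x \in S} l(x) + 1$.

The levels l(x) are i.i.d. random variables, distributed geometrically with parameter 1/2.

For n i.i.d. geometric random variables $X_1, \dots, X_n$ with parameter p, a union bound gives

$$\Pr\left[\max_i X_i > t\right] \le n(1-p)^t = \frac{n}{2^t}.$$

With p = 1/2, choosing $t = \alpha \log n$ (logarithms to base 2) and using $r - 1 = \max_i X_i$, we obtain

$$\Pr[r > \alpha \log n + 1] \le \frac{1}{n^{\alpha-1}}$$

for any α > 1.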
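To see how the union bound in this proof compares with the truth, the sketch below (function names are ours) computes the exact tail $1 - (1 - 2^{-t})^n$ of the maximum of n independent Geometric(1/2) levels alongside the bound $n/2^t$.

```python
import math


def exact_tail(n: int, t: int) -> float:
    """Pr[max of n i.i.d. Geometric(1/2) levels > t] = 1 - (1 - 2^-t)^n."""
    # expm1/log1p keep the result accurate when 2^-t is tiny.
    return -math.expm1(n * math.log1p(-(2.0 ** -t)))


def union_bound(n: int, t: int) -> float:
    """The bound n (1 - p)^t = n / 2^t from the proof, with p = 1/2."""
    return n / 2.0 ** t
```

For instance, with n = 1024 and α = 2 we get t = 20 and `union_bound(1024, 20)` equals 1/1024 = $1/n^{\alpha-1}$, while the exact tail is slightly smaller.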


Lemma 8.10

Define $I_j(y)$ as the interval at level j that contains y. For an interval I at level i + 1, c(I) denotes the number of children it has at level i.

Lemma 8.9 (recalled)

The number of levels r in a random leveling of a set S of size n has expected value E[r] = O(log n). Moreover, r = O(log n) with high probability.


Hash Tables

1. Static dictionary: we are given a set of keys S and must organize it into a data structure that supports the efficient processing of FIND queries.

2. Dynamic dictionary: the set S is not provided in advance. Instead, it is constructed by a series of INSERT and DELETE operations that are intermingled with the FIND queries.

Data-structuring problem: all data structures discussed earlier require Θ(log n) time to process any search or update operation.

These time bounds are optimal for data structures based on pointers and search trees, where we are faced with a logarithmic lower bound. This lower bound rests on the fact that the only computation we can perform on the keys is to compare them and thereby determine their relationship in the underlying total order.


Hash Tables

Suppose:

Keys in S are chosen from a totally ordered universe M of size m; w.l.o.g., M = {0, ..., m − 1}.

Keys are distinct.

The idea: create an array T[0..m − 1] of size m in which

T[k] = 1 if k ∈ S, and T[k] = NULL otherwise.

This is called a direct-address table.

Operations take O(1) time. So what's the problem?
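A direct-address table is just an array indexed by the key itself. The minimal sketch below (class and method names are ours) stores 1 for present keys and `None` for NULL, as on the slide; every operation is a single array access, but the constructor must allocate and initialize all m slots.

```python
class DirectAddressTable:
    """Direct-address table over the universe M = {0, ..., m-1}."""

    def __init__(self, m: int):
        self.slots = [None] * m  # O(m) space and initialization time

    def insert(self, key: int, item=1) -> None:
        self.slots[key] = item  # T[k] = 1 if k is in S

    def delete(self, key: int) -> None:
        self.slots[key] = None  # T[k] = NULL otherwise

    def find(self, key: int):
        return self.slots[key]  # one array access: O(1)
```

The O(m) cost of `__init__` is exactly what becomes a problem when m is large.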


Direct addressing works well when the range m of keys is relatively small.

But what if the keys are 32-bit integers?

Problem 1: the direct-address table will have $2^{32}$ entries, more than 4 billion.

Problem 2: even if memory is not an issue, the time to initialize the elements to NULL may be.

We want to reduce the size of the table to a value close to |S|, while maintaining the property that a search or update can be performed in O(1) time.


A table T consisting of n cells indexed by N = {0, ..., n − 1}, and a hash function h, which is a mapping from M into N.

n < m; otherwise we could use a direct-address table.

A collision occurs when two distinct keys x and y map to the same location, i.e., h(x) = h(y).

Goal: maintain a small table, and use the hash function h to map keys into this table. If h behaves randomly, we should not get too many collisions.


Hash Tables Chaining

Chaining puts elements that collide in a linked list: all keys that hash to cell i are kept in a list attached to T[i].
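A minimal chaining sketch, assuming the hash function is passed in as a parameter (a hypothetical interface of ours, so that any h from M into N, for example one drawn from a universal family, can be plugged in). Python lists stand in for the linked lists.

```python
from typing import Callable


class ChainedHashTable:
    """Table T[0..n-1] in which T[i] holds every stored key x with h(x) = i."""

    def __init__(self, n: int, h: Callable[[int], int]):
        self.h = h  # any map from keys into {0, ..., n-1}
        self.table = [[] for _ in range(n)]  # lists stand in for linked lists

    def insert(self, x: int) -> None:
        chain = self.table[self.h(x)]
        if x not in chain:  # keys are distinct
            chain.append(x)

    def delete(self, x: int) -> None:
        chain = self.table[self.h(x)]
        if x in chain:
            chain.remove(x)

    def find(self, x: int) -> bool:
        # Work: evaluate h(x) once, then scan the chain stored at T[h(x)].
        return x in self.table[self.h(x)]
```

With `h = lambda x: x % 4`, the keys 1, 5 and 9 all land in the chain at T[1], so a search for any of them scans that chain.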


Universal Hash Families

2-universal

Let M = {0, ..., m − 1} and N = {0, ..., n − 1}, with m ≥ n. A family H of functions from M into N is said to be 2-universal if, for all x, y ∈ M such that x ≠ y, and for h chosen uniformly at random from H,

$$\Pr[h(x) = h(y)] \le \frac{1}{n}.$$


Define the following indicator function for a collision between the keys x and y under the hash function h:

$$\delta(x, y, h) = \begin{cases} 1 & \text{if } h(x) = h(y) \text{ and } x \neq y, \\ 0 & \text{otherwise.} \end{cases}$$

For all X, Y ⊆ M, define the following extensions of the indicator function δ:

$$\delta(x, y, H) = \sum_{h \in H} \delta(x, y, h),$$

$$\delta(x, Y, h) = \sum_{y \in Y} \delta(x, y, h),$$

$$\delta(X, Y, h) = \sum_{x \in X} \delta(x, Y, h),$$

$$\delta(x, Y, H) = \sum_{y \in Y} \delta(x, y, H),$$

$$\delta(X, Y, H) = \sum_{h \in H} \delta(X, Y, h).$$
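The definitions of δ and its extensions can be transcribed literally into Python. In this sketch the function names and the helper `all_functions` (the family of all functions from {0, ..., m − 1} to {0, ..., n − 1}) are our own, chosen for illustration; a hash function is any Python callable.

```python
from itertools import product


def delta(x, y, h) -> int:
    """delta(x, y, h): 1 if x != y and h(x) == h(y), else 0."""
    return 1 if x != y and h(x) == h(y) else 0


def delta_xYh(x, Y, h) -> int:
    """delta(x, Y, h) = sum over y in Y of delta(x, y, h)."""
    return sum(delta(x, y, h) for y in Y)


def delta_XYh(X, Y, h) -> int:
    """delta(X, Y, h) = sum over x in X of delta(x, Y, h)."""
    return sum(delta_xYh(x, Y, h) for x in X)


def delta_xyH(x, y, H) -> int:
    """delta(x, y, H) = sum over h in H of delta(x, y, h)."""
    return sum(delta(x, y, h) for h in H)


def delta_xYH(x, Y, H) -> int:
    """delta(x, Y, H) = sum over y in Y of delta(x, y, H)."""
    return sum(delta_xyH(x, y, H) for y in Y)


def delta_XYH(X, Y, H) -> int:
    """delta(X, Y, H) = sum over h in H of delta(X, Y, h)."""
    return sum(delta_XYh(X, Y, h) for h in H)


def all_functions(m: int, n: int):
    """Hypothetical helper: all n**m functions {0..m-1} -> {0..n-1} as lambdas."""
    return [lambda x, t=t: t[x] for t in product(range(n), repeat=m)]
```

For the 8 functions from {0, 1, 2} to {0, 1}, each pair x ≠ y collides under exactly half of them, so δ(0, 1, H) = 4 = |H|/n.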


Note

For a 2-universal family H and any x ≠ y, we have δ(x, y, H) ≤ |H|/n.

Theorem 8.12:

For any family H of functions from M to N, there exist x, y ∈ M such that

$$\delta(x, y, H) > \frac{|H|}{n} - \frac{|H|}{m}.$$


Proof of Theorem 8.12

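Fix some function h ∈ H, and for each z ∈ N define the set of elements of M mapped to z as

$$A_z = \{x \in M \mid h(x) = z\}.$$

The sets $A_z$, for z ∈ N, form a partition of M. It is easy to verify that

$$\delta(A_w, A_z, h) = \begin{cases} 0 & w \neq z, \\ |A_z|(|A_z| - 1) & w = z. \end{cases}$$

The total number of collisions between all possible pairs of elements is minimized when the sets $A_z$ all have the same size m/n, because $\sum_z |A_z|(|A_z| - 1) = \sum_z |A_z|^2 - m$ is a convex function of the sizes, whose sum is fixed at m. We obtain

$$\delta(M, M, h) = \sum_{z \in N} |A_z|(|A_z| - 1) \ge n \cdot \frac{m}{n}\left(\frac{m}{n} - 1\right) = m^2\left(\frac{1}{n} - \frac{1}{m}\right).$$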


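Proof (cont.)

$$\delta(M, M, H) = \sum_{h \in H} \delta(M, M, h) \ge |H|\, m^2\left(\frac{1}{n} - \frac{1}{m}\right).$$

By the pigeonhole principle, there exist x, y ∈ M such that

$$\delta(x, y, H) \ge \frac{\delta(M, M, H)}{m^2} \ge \frac{|H|\, m^2\left(\frac{1}{n} - \frac{1}{m}\right)}{m^2} = |H|\left(\frac{1}{n} - \frac{1}{m}\right).$$

The inequality is in fact strict: since δ(x, x, H) = 0, the total δ(M, M, H), which is positive when n < m, is spread over only m(m − 1) < m² ordered pairs with x ≠ y.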
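Both steps of the proof can be checked numerically on a small family. In the sketch below (function names ours), `collisions` computes $\delta(M, M, h) = \sum_z |A_z|(|A_z| - 1)$, `lower_bound` is $m^2(1/n - 1/m)$, and `worst_pair` is $\max_{x \neq y} \delta(x, y, H)$; exact rational arithmetic via `fractions.Fraction` avoids rounding.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def collisions(h, m: int) -> int:
    """delta(M, M, h) = sum over z of |A_z|(|A_z| - 1), where A_z = h^{-1}(z)."""
    sizes = Counter(h(x) for x in range(m)).values()
    return sum(a * (a - 1) for a in sizes)


def lower_bound(m: int, n: int) -> Fraction:
    """m^2 (1/n - 1/m) = m^2/n - m, attained when every A_z has size m/n."""
    return Fraction(m * m, n) - m


def worst_pair(H, m: int) -> int:
    """max over x != y of delta(x, y, H)."""
    return max(
        sum(1 for h in H if h(x) == h(y))
        for x in range(m)
        for y in range(m)
        if x != y
    )
```

For example, x ↦ x mod 2 on M = {0, 1, 2, 3} splits M into two sets of size 2 and meets the lower bound 4 exactly.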


Lemma 8.13:

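For all x ∈ M, S ⊆ M, and h chosen uniformly at random from a 2-universal family H,

$$E[\delta(x, S, h)] \le \frac{|S|}{n}.$$

Proof:

$$\begin{aligned} E[\delta(x, S, h)] &= \sum_{h \in H} \frac{\delta(x, S, h)}{|H|} = \frac{1}{|H|} \sum_{h \in H} \sum_{y \in S} \delta(x, y, h) \\ &= \frac{1}{|H|} \sum_{y \in S} \sum_{h \in H} \delta(x, y, h) = \frac{1}{|H|} \sum_{y \in S} \delta(x, y, H) \\ &\le \frac{1}{|H|} \sum_{y \in S} \frac{|H|}{n} = \frac{|S|}{n}, \end{aligned}$$

using δ(x, y, H) ≤ |H|/n for y ≠ x and δ(x, x, H) = 0.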
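Lemma 8.13 can be checked exactly on small families by averaging over every h. The sketch below uses two illustrative families of our choosing: all functions from {0, ..., m − 1} to {0, ..., n − 1} (2-universal with collision probability exactly 1/n), and the family $h_{a,b}(x) = ((ax + b) \bmod p) \bmod n$ with a ≠ 0, which the slides construct later.

```python
from fractions import Fraction
from itertools import product


def expected_delta(x, S, H) -> Fraction:
    """Exact E[delta(x, S, h)] for h drawn uniformly from the finite family H."""
    total = sum(1 for h in H for y in S if y != x and h(x) == h(y))
    return Fraction(total, len(H))


def all_functions(m: int, n: int):
    """All functions {0..m-1} -> {0..n-1}; Pr[h(x) = h(y)] = 1/n for x != y."""
    return [lambda x, t=t: t[x] for t in product(range(n), repeat=m)]


def linear_family(p: int, n: int):
    """h_{a,b}(x) = ((a x + b) mod p) mod n for a in 1..p-1 and b in 0..p-1."""
    return [
        lambda x, a=a, b=b: ((a * x + b) % p) % n
        for a in range(1, p)
        for b in range(p)
    ]
```

With all 243 functions from 5 keys into 3 cells, a key x ∉ S = {1, 2, 3} has expected δ exactly 3 · (1/3) = 1 = |S|/n.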


Notes on our dynamic dictionary scheme:

A hash function h ∈ H is chosen uniformly at random and remains fixed during the entire sequence of updates and queries.

An inserted key x is stored at the location h(x), and due to collisions there could be other keys also stored at that location.

The keys colliding at a given location are organized into a linked list.

Assuming that the set of keys currently stored in the table is S ⊆ M, the number of keys other than x in the linked list at location h(x) is δ(x, S, h), which by Lemma 8.13 has expectation at most |S|/n.


Theorem 8.14:

Consider a request sequence $R = R_1, R_2, \dots, R_r$ of update and search operations starting with an empty hash table. Suppose that this sequence contains s INSERT operations. Let ρ(h, R) denote the total cost of processing these requests using the hash function h ∈ H.

For any sequence R of length r with s INSERTs, and h chosen uniformly at random from a 2-universal family H,

$$E[\rho(h, R)] \le r\left(1 + \frac{s}{n}\right).$$


Constructing Universal Hash Families

Fix m and n. Choose a prime p ≥ m. We will work over the field $\mathbb{Z}_p = \{0, 1, \dots, p - 1\}$. Let $g : \mathbb{Z}_p \to N$ be the function given by $g(x) = x \bmod n$.

For all $a, b \in \mathbb{Z}_p$, define the linear function $f_{a,b} : \mathbb{Z}_p \to \mathbb{Z}_p$ and the hash function $h_{a,b} : \mathbb{Z}_p \to N$ as follows:

$$f_{a,b}(x) = (ax + b) \bmod p, \qquad h_{a,b}(x) = g(f_{a,b}(x)) = ((ax + b) \bmod p) \bmod n.$$
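A sketch of drawing $h_{a,b}$ at random, assuming the prime p is found by simple trial division (adequate for a sketch, not for large m); the helper names `is_prime`, `next_prime`, `make_hash` and `random_hash` are ours.

```python
import random


def is_prime(q: int) -> bool:
    """Trial division; adequate for the small sizes in this sketch."""
    if q < 2:
        return False
    d = 2
    while d * d <= q:
        if q % d == 0:
            return False
        d += 1
    return True


def next_prime(m: int) -> int:
    """Smallest prime p >= m."""
    p = max(m, 2)
    while not is_prime(p):
        p += 1
    return p


def make_hash(a: int, b: int, p: int, n: int):
    """h_{a,b}(x) = g(f_{a,b}(x)) = ((a x + b) mod p) mod n."""
    return lambda x: ((a * x + b) % p) % n


def random_hash(m: int, n: int, rng: random.Random):
    """Draw h_{a,b} with a uniform in {1, ..., p-1} and b uniform in Z_p."""
    p = next_prime(m)
    a = rng.randrange(1, p)  # a != 0
    b = rng.randrange(0, p)
    return make_hash(a, b, p, n)
```

For example, `make_hash(3, 4, 7, 5)(2)` computes ((3 · 2 + 4) mod 7) mod 5 = 3 mod 5 = 3.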


We use the family of hash functions $H = \{h_{a,b} \mid a, b \in \mathbb{Z}_p,\ a \neq 0\}$.

Lemma 8.15

For all $x, y \in \mathbb{Z}_p$ such that x ≠ y,

$$\delta(x, y, H) = \delta(\mathbb{Z}_p, \mathbb{Z}_p, g).$$


Proof

Suppose that x and y collide under a specific function $h_{a,b}$. Let $f_{a,b}(x) = r$ and $f_{a,b}(y) = s$. Observe that r ≠ s since a ≠ 0 and x ≠ y. A collision takes place if and only if g(r) = g(s), or equivalently, r ≡ s (mod n).


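Now, having fixed x and y, for each such choice of r ≠ s, the values of a and b are uniquely determined by the solution of

$$ax + b \equiv r \pmod{p}, \qquad ay + b \equiv s \pmod{p}.$$

Hence the pairs (a, b) with a ≠ 0 correspond one-to-one to the ordered pairs (r, s) with r ≠ s, so the number of functions in H under which x and y collide equals the number of ordered pairs r ≠ s in $\mathbb{Z}_p$ with g(r) = g(s), which is $\delta(\mathbb{Z}_p, \mathbb{Z}_p, g)$.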
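The linear system can be solved explicitly: subtracting the two congruences gives $a(x - y) \equiv r - s$, so $a = (r - s)(x - y)^{-1}$ and $b = r - ax$ (mod p). The sketch below (function name ours) uses Python's built-in three-argument `pow` with exponent −1 for the modular inverse, which requires Python 3.8 or later.

```python
def solve_ab(x: int, y: int, r: int, s: int, p: int) -> tuple[int, int]:
    """Unique (a, b) in Z_p x Z_p with a*x + b = r and a*y + b = s (mod p).

    Requires p prime and x != y (mod p). Then a != 0 exactly when r != s.
    """
    inv = pow((x - y) % p, -1, p)  # (x - y)^{-1} mod p, Python 3.8+
    a = (r - s) * inv % p
    b = (r - a * x) % p
    return a, b
```

Enumerating all p(p − 1) pairs r ≠ s for fixed x ≠ y therefore produces each (a, b) with a ≠ 0 exactly once, which is the bijection used in the proof.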


Theorem 8.16:

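The family $H = \{h_{a,b} \mid a, b \in \mathbb{Z}_p,\ a \neq 0\}$ is a 2-universal family.

Proof: For each z ∈ N, let $A_z = \{x \in \mathbb{Z}_p \mid g(x) = z\}$; it is clear that $|A_z| \le \lceil p/n \rceil$. In other words, for every $r \in \mathbb{Z}_p$ there are at most ⌈p/n⌉ − 1 different choices of $s \in \mathbb{Z}_p$ with s ≠ r such that g(r) = g(s). Since there are p different choices of $r \in \mathbb{Z}_p$ to start with,

$$\delta(\mathbb{Z}_p, \mathbb{Z}_p, g) \le p\left(\left\lceil \frac{p}{n} \right\rceil - 1\right) \le \frac{p(p-1)}{n},$$

where the last step uses ⌈p/n⌉ ≤ (p + n − 1)/n.

By Lemma 8.15, $\delta(x, y, H) = \delta(\mathbb{Z}_p, \mathbb{Z}_p, g)$, so $\delta(x, y, H) \le p(p-1)/n$. Since |H| = p(p − 1), therefore δ(x, y, H) ≤ |H|/n.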
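For small primes, Lemma 8.15 and Theorem 8.16 can be verified by brute force over the whole family. In this sketch (function names ours), `collision_count` is δ(x, y, H) for H = {h_{a,b} : a ≠ 0}, `g_collisions` is $\delta(\mathbb{Z}_p, \mathbb{Z}_p, g)$ counted over ordered pairs, and `is_two_universal` checks δ(x, y, H) · n ≤ |H| for all x ≠ y.

```python
def collision_count(x: int, y: int, p: int, n: int) -> int:
    """delta(x, y, H) for H = {h_{a,b} : a in 1..p-1, b in 0..p-1}."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * x + b) % p) % n == ((a * y + b) % p) % n
    )


def g_collisions(p: int, n: int) -> int:
    """delta(Z_p, Z_p, g) for g(x) = x mod n: ordered pairs r != s with g(r) = g(s)."""
    return sum(1 for r in range(p) for s in range(p) if r != s and r % n == s % n)


def is_two_universal(p: int, n: int) -> bool:
    """Check delta(x, y, H) <= |H| / n for every x != y, where |H| = p (p - 1)."""
    size_H = p * (p - 1)
    return all(
        collision_count(x, y, p, n) * n <= size_H
        for x in range(p)
        for y in range(p)
        if x != y
    )
```

For p = 7 and n = 3, g splits $\mathbb{Z}_7$ into classes of sizes 3, 2, 2, so every pair x ≠ y collides under exactly 6 + 2 + 2 = 10 of the 42 functions, below |H|/n = 14.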


Definition 8.6

Let M = {0, 1, ..., m − 1} and N = {0, 1, ..., n − 1}, with m ≥ n. A family H of functions from M into N is said to be strongly 2-universal if for all $x_1 \neq x_2 \in M$, any $y_1, y_2 \in N$, and h chosen uniformly at random from H,

$$\Pr[h(x_1) = y_1 \text{ and } h(x_2) = y_2] = \frac{1}{n^2}.$$
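As an illustration of strong 2-universality, consider (as an assumption, a family different from the H of Theorem 8.16) M = N = $\mathbb{Z}_p$ and the maps x ↦ (ax + b) mod p with (a, b) uniform over all of $\mathbb{Z}_p \times \mathbb{Z}_p$, including a = 0. For x₁ ≠ x₂ each target pair (y₁, y₂) is hit by exactly one (a, b), so the joint probability is 1/p². The sketch below (function name ours) checks this exactly.

```python
from fractions import Fraction


def joint_probability(p: int, x1: int, x2: int, y1: int, y2: int) -> Fraction:
    """Pr[h(x1) = y1 and h(x2) = y2] for h(x) = (a x + b) mod p, (a, b) uniform in Z_p^2."""
    hits = sum(
        1
        for a in range(p)
        for b in range(p)
        if (a * x1 + b) % p == y1 and (a * x2 + b) % p == y2
    )
    return Fraction(hits, p * p)
```

Allowing a = 0 matters here: without it, the pair y₁ = y₂ could never be reached, and the family would not be strongly 2-universal.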


Definition 8.7

A hash function h is perfect for S if it causes no collisions among the keys of S. A family of hash functions H = {h : M → N} is said to be a perfect hash family if for each set S ⊂ M of size s ≤ n there exists a hash function h ∈ H that is perfect for S.

Note: It is clear that perfect hash families exist: for example, the family of all possible functions from M to N is a perfect hash family. Given a perfect hash family H, we solve the static dictionary problem by:

1. finding h ∈ H that is perfect for S;

2. storing each key x ∈ S at the location T[h(x)];

3. responding to a search query for a key q by examining the contents of T[h(q)].
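The three steps above can be sketched as follows. The family used for illustration, `linear_family`, is an assumption of ours (the maps $((ax + b) \bmod p) \bmod n$, generated lazily); it is not claimed to be a perfect hash family in general, so `build_static_dictionary` raises an error if no member is perfect for S. Step 1 here is a plain linear scan of the family.

```python
def linear_family(p: int, n: int):
    """Hypothetical family for illustration: x -> ((a x + b) mod p) mod n."""
    for a in range(1, p):
        for b in range(p):
            yield lambda x, a=a, b=b: ((a * x + b) % p) % n


def find_perfect(S, family):
    """Step 1: the first h in `family` that is perfect (collision-free) on S."""
    for h in family:
        if len({h(x) for x in S}) == len(S):
            return h
    return None


def build_static_dictionary(S, n: int, family):
    """Step 2: store each key x in S at T[h(x)]."""
    h = find_perfect(S, family)
    if h is None:
        raise ValueError("no function in the family is perfect for S")
    T = [None] * n
    for x in S:
        T[h(x)] = x
    return h, T


def lookup(q, h, T) -> bool:
    """Step 3: answer a query by examining T[h(q)] only."""
    return T[h(q)] == q
```

For S = {3, 17, 42, 61, 100} and n = 8, the first function x ↦ (x mod 101) mod 8 already places the keys in distinct cells 3, 1, 2, 5, 4.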


Preprocessing cost: depends on the cost of identifying a perfect hash function for a specific choice of S.

Search cost: depends on the time required to evaluate the hash function.


Since the choice of the hash function depends on the set S, its description must also be stored in the table.

Suppose that the size of the perfect hash family H is r.

Storing the description of a hash function from H will require Ω(log r) bits.

It is essential that the description of the hash function fit into O(1) locations in the table T.

A cell in the table can be used to encode at most log m bits of information.

Note

Therefore, we will only be interested in constructing hash families whose size r is bounded by a polynomial in m: if $r \le m^c$ for a constant c, then log r ≤ c log m, so the description fits into c cells.


Exercise 8.13:

Assume for simplicity that n = s. Show that for $m = 2^{\Omega(s)}$, there exist perfect hash families of size polynomial in m.

Thus, the existence of a perfect hash family of polynomial size is guaranteed only for values of m that are extremely large relative to n.

Exercise 8.14:

Assuming that n = s, show that any perfect hash family must have size $2^{\Omega(s)}$.

Thus, we need to have $m = 2^{\Omega(s)}$, or s = O(log m), to guarantee even the existence of a perfect hash family of size polynomial in m. Unfortunately, in practice the case s = O(log m) is not very interesting for typical values of m, e.g., for $m = 2^{32}$. Solution: use double hashing.


Definition 8.8

Let S ⊂ M and h : M → N. For each table location 0 ≤ i ≤ n − 1, we define the bin

$$B_i(h, S) = \{x \in S \mid h(x) = i\}.$$

The size of a bin is denoted by $b_i(h, S) = |B_i(h, S)|$.

Definition 8.9:

A hash function h is b-perfect for S if $b_i(h, S) \le b$ for each i. A family H of hash functions h : M → N is said to be a b-perfect hash family if for each S ⊂ M of size s there exists a hash function h ∈ H that is b-perfect for S.
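Bins and b-perfection are straightforward to compute. The sketch below (function names ours) returns the bin sizes $b_i(h, S)$ for i = 0, ..., n − 1 and tests whether h is b-perfect for S; note that 1-perfect is the same as perfect.

```python
from collections import Counter


def bin_sizes(h, S, n: int) -> list[int]:
    """b_i(h, S) = |{x in S : h(x) = i}| for i = 0, ..., n-1."""
    counts = Counter(h(x) for x in S)
    return [counts[i] for i in range(n)]


def is_b_perfect(h, S, n: int, b: int) -> bool:
    """h is b-perfect for S iff every bin holds at most b keys."""
    return max(bin_sizes(h, S, n), default=0) <= b
```

For h(x) = x mod 4 and S = {0, 1, 4, 8, 5}, the bins have sizes 3, 2, 0, 0, so h is 3-perfect but not 2-perfect.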


Exercise 8.15:

Show that there exists a b-perfect hash family H such that b = O(log n) and |H| ≤ m, for any m ≥ n.

Double hashing:

At the first level we use a (log m)-perfect hash function h to map S into the primary table T.

Consider the bin $B_i$ consisting of all keys from S mapped into a particular cell T[i].

The elements of the bin $B_i$ are mapped into the secondary table $T_i$ associated with that location using a secondary hash function $h_i$.

Since the size of $B_i$ is bounded by b, we can find a hash function $h_i$ that is perfect for $B_i$ provided $2^b$ is polynomially bounded in m. For b = O(log m) this condition holds.


The double hashing scheme can be implemented with O(1) query time, for any m ≥ n.

The goal of the primary hash function should be to create bins small enough that perfect hash functions can be used as the secondary hash functions.

Exercise 8.16:

Consider a table of size r indexed by R = {0, ..., r − 1}. Show that there exists a perfect hash family H = {h : M → R} with |H| ≤ m provided that $r = \Omega(s^2)$, for all m ≥ s.


Towards our final solution

We will use a primary table of size n = s, choosing a primary hash function that ensures that the bin sizes are small.

The perfect hash functions from Exercise 8.16 are then used to resolve the collisions by using secondary hash tables of size quadratic in the bin sizes.

Total space required by the double hashing scheme:

$$s + O\left(\sum_{i=0}^{s-1} b_i^2\right)$$


Achieving Bounded Query Time

Our goal now is:

1. To find primary hash functions which ensure that the sum of the squares of the bin sizes is linear.

2. To find perfect hash functions for the secondary tables, which use at most quadratic space.


Definition 8.10:

Consider any V ⊆ M with |V| = v, and let R = {0, ..., r − 1} with r ≥ v. For 1 ≤ k ≤ p − 1, define the function $h_k : M \to R$ as follows:

$$h_k(x) = (kx \bmod p) \bmod r.$$

For each i ∈ R, the bins corresponding to the keys colliding at i are denoted as

$$B_i(k, r, V) = \{x \in V \mid h_k(x) = i\},$$

and their sizes are denoted by $b_i(k, r, V) = |B_i(k, r, V)|$.


Lemma 8.17:

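For all V ⊆ M of size v, and all r ≥ v,

$$\sum_{k=1}^{p-1} \sum_{i=0}^{r-1} \binom{b_i(k, r, V)}{2} < \frac{(p-1)v^2}{r} = \frac{mv^2}{r}. \qquad (8.2)$$

Proof: The left-hand side of (8.2) counts the number of tuples (k, x, y), with {x, y} an unordered pair, such that $h_k$ causes x and y to collide, i.e.,

1. x, y ∈ V with x ≠ y, and

2. ((kx mod p) mod r) = ((ky mod p) mod r).

The relation between k and x, y is as follows: kx mod p and ky mod p are distinct values in {0, ..., p − 1} that agree modulo r, so their difference is a nonzero multiple of r, and

$$k(x - y) \equiv jr \pmod{p} \quad \text{for some } j \in \{\pm 1, \pm 2, \pm 3, \dots, \pm \lfloor (p-1)/r \rfloor\}.$$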


Proof (cont.)

Since p is a prime and $\mathbb{Z}_p$ is a field, for any fixed value of x − y there is a unique solution for k satisfying the equation

$$k(x - y) \equiv jr \pmod{p}$$

for any value of j. This immediately implies that the number of values of k that cause a collision between x and y is at most $2\lfloor (p-1)/r \rfloor \le 2(p-1)/r$.

Finally, noting that the number of choices of the pair x, y is $\binom{v}{2}$, we obtain

$$\sum_{k=1}^{p-1} \sum_{i=0}^{r-1} \binom{b_i(k, r, V)}{2} \le \binom{v}{2} \frac{2(p-1)}{r} < \frac{(p-1)v^2}{r}.$$
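Lemma 8.17 can be checked numerically on a small example. This sketch assumes the keys of V lie in {0, ..., p − 1} with p prime (so x − y is invertible mod p for x ≠ y); the function names are ours. `colliding_pairs` computes $\sum_i \binom{b_i(k, r, V)}{2}$ for one k, and `lemma_8_17_holds` compares the sum over all k with $(p-1)v^2/r$ in integer arithmetic.

```python
from math import comb


def h_k(k: int, x: int, p: int, r: int) -> int:
    """Definition 8.10: h_k(x) = (k x mod p) mod r."""
    return (k * x % p) % r


def colliding_pairs(k: int, V, p: int, r: int) -> int:
    """sum_i C(b_i(k, r, V), 2): unordered pairs of V sent to the same cell by h_k."""
    sizes = [0] * r
    for x in V:
        sizes[h_k(k, x, p, r)] += 1
    return sum(comb(b, 2) for b in sizes)


def lemma_8_17_holds(V, p: int, r: int) -> bool:
    """sum_{k=1}^{p-1} sum_i C(b_i, 2) < (p-1) v^2 / r, checked as total * r < (p-1) v^2."""
    total = sum(colliding_pairs(k, V, p, r) for k in range(1, p))
    return total * r < (p - 1) * len(V) ** 2
```

Averaging over the p − 1 values of k also shows that some single k has fewer than $v^2/r$ colliding pairs, which is the content of the corollary that follows.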


Corollary 8.18

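For all V ⊆ M of size v, and all r ≥ v, there exists k ∈ {1, ..., m} such that

$$\sum_{i=0}^{r-1} \binom{b_i(k, r, V)}{2} < \frac{v^2}{r}.$$

This follows from Lemma 8.17 by averaging over the p − 1 = m choices of k.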


Theorem 8.19

For any S ⊆ M with |S| = s and m ≥ s, there exists a hash table representation of S that uses space O(s) and permits the processing of a FIND operation in O(1) time.

Proof: The double hashing scheme is as described above, and all that remains to be shown is that there are choices of the primary hash function $h_k$ and the secondary hash functions $h_{k_1}, \dots, h_{k_s}$ that ensure the promised performance bounds.


Proof (cont.)

Consider first the primary hash function $h_k$. The only property desired of this function is that the sum of squares of the sizes of the colliding sets (the bins) be linear in s (recall n = s), to ensure that the space used by the secondary hash tables is O(s). Applying Corollary 8.18 to the case where V = S and R = T, implying that v = r = s, we obtain that there exists a k ∈ {1, ..., m} such that

$$\sum_{i=0}^{s-1} \binom{b_i(k, s, S)}{2} < s,$$

or that $\sum_{i=0}^{s-1} b_i(k, s, S)\,(b_i(k, s, S) - 1) < 2s$.

Since $\bigcup_{i=0}^{s-1} B_i(k, s, S) = S$ and $\sum_{i=0}^{s-1} b_i(k, s, S) = s$,

$$\sum_{i=0}^{s-1} b_i(k, s, S)^2 < 2s + \sum_{i=0}^{s-1} b_i(k, s, S) = 3s.$$

Consider now the secondary hash function $h_{k_i}$ for the set $S_i = B_i(k, s, S)$ of size $s_i$. Applying Corollary 8.18 to the case where $V = S_i$ (so $v = s_i$) and using a secondary hash table of size $r = s_i^2$, it follows that there exists a $k_i \in \{1, \dots, m\}$ such that

$$\sum_{j=0}^{s_i^2 - 1} \binom{b_j(k_i, s_i^2, S_i)}{2} < 1,$$

where $b_j(k_i, s_i^2, S_i)$ is the number of keys mapped to the jth location of the secondary hash table for T[i]. This can be the case only when each term of the summation is zero, implying that $b_j(k_i, s_i^2, S_i) \le 1$ for all j. Thus, there exists a perfect secondary hash function $h_{k_i}$.
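Putting the pieces together, here is a minimal sketch of the two-level construction from the proof of Theorem 8.19. Assumptions (ours, not the slides'): keys are distinct elements of {0, ..., p − 1} for a prime p, suitable multipliers k and $k_i$ are found by linear search (Corollary 8.18 guarantees they exist, but this sketch does not address preprocessing time), and the names `find_k`, `build_fks` and `fks_find` are hypothetical.

```python
def h_k(k: int, x: int, p: int, r: int) -> int:
    """h_k(x) = (k x mod p) mod r (Definition 8.10)."""
    return (k * x % p) % r


def find_k(V, p: int, r: int, ok):
    """Smallest k in {1, ..., p-1} whose bin sizes satisfy `ok`."""
    for k in range(1, p):
        sizes = [0] * r
        for x in V:
            sizes[h_k(k, x, p, r)] += 1
        if ok(sizes):
            return k
    raise ValueError("no suitable k")


def build_fks(S, p: int):
    """Primary table of size s; bin i gets a secondary table of size s_i^2."""
    s = len(S)
    # Primary: sum_i b_i (b_i - 1) < 2s, hence sum_i b_i^2 < 3s.
    k = find_k(S, p, s, lambda b: sum(c * (c - 1) for c in b) < 2 * s)
    bins = [[] for _ in range(s)]
    for x in S:
        bins[h_k(k, x, p, s)].append(x)
    secondary = []
    for B in bins:
        r = len(B) ** 2
        if r == 0:
            secondary.append((0, 0, []))  # empty bin: no secondary table
            continue
        # Secondary: every cell of the size-s_i^2 table holds at most one key.
        k_i = find_k(B, p, r, lambda b: max(b) <= 1)
        table = [None] * r
        for x in B:
            table[h_k(k_i, x, p, r)] = x
        secondary.append((k_i, r, table))
    return k, secondary


def fks_find(q: int, p: int, k: int, secondary) -> bool:
    """FIND in O(1): two hash evaluations and one comparison."""
    k_i, r, table = secondary[h_k(k, q, p, len(secondary))]
    return r > 0 and table[h_k(k_i, q, p, r)] == q
```

The total secondary space $\sum_i s_i^2$ stays below 3s by the primary condition, so the whole structure uses O(s) cells plus the stored multipliers.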
