The Limit of Buffering Dynamic External...

41
1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin Zhang Hong Kong University of Science & Technology Joint work with Zhewei Wei and Ke Yi SPAA 2009

Transcript of The Limit of Buffering Dynamic External...

Page 1: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

1-1

Dynamic External Hashing:The Limit of Buffering

August 13, 2009

Qin Zhang

Hong Kong University of Science & Technology

Joint work with Zhewei Wei and Ke Yi

SPAA 2009

Page 2: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

2-1

(internal) Hashing!

null

nullnull

nullnull

null

nullnullnull

universalhashing

dynamichashing

distributedhashing

externalhashing

perfecthashing

One of the most important datastructures in computer science!

Page 3: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

3-1

External hashing!

A cell has b (= 2) words

null

null

null null

null

externalhashing

universalhashing

dynamichashing

distributedhashing

perfecthashing Extremely useful in

Database System!

nullnull

Page 4: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

4-1

Model and problem

Model: External memory model.Block size b words, cache size mwords. Cost is “number of blocksread/write (I/Os)”

Problem: Maintain a hash table tosupport update and query.

Try to understand “the inherenttradeoff between queries and up-dates”

Hashing

Page 5: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

5-1

Results

Page 6: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

6-1

Hashing in the internal memory is well under-stood (under random inputs).

Knuth, 1970s: tq = 12(1+1/(1−α)), tu = 1+

12(1 + 1/(1− α)2). α: load factor: minimum

storage should be use/storage actually used

Previous results

Page 7: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

6-2

Hashing in the internal memory is well under-stood (under random inputs).

Knuth, 1970s: tq = 12(1+1/(1−α)), tu = 1+

12(1 + 1/(1− α)2). α: load factor: minimum

storage should be use/storage actually used

Previous results

In external memory (random inputs)

Knuth, 1970s: Expected average cost of aquery is 1 + 1/2Ω(b) I/Os, provided the loadfactor α is less than a constant smaller than1. Update has a similar bound.

Page 8: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

7-1

Exact Numbers Calculated by D. E. Knuth

The Art of Computer Programming, volume 3, 1998, page 542

Page 9: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

8-1

Can we batch updates?

However, in external memory model, disk reads/writesare expensive and powerful. Can we hope for lower than1 I/O per update?

Page 10: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

8-2

Can we batch updates?

However, in external memory model, disk reads/writesare expensive and powerful. Can we hope for lower than1 I/O per update?

No, if there is no space in main memory for buffering.But, not the case in reality!

Page 11: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

8-3

Can we batch updates?

However, in external memory model, disk reads/writesare expensive and powerful. Can we hope for lower than1 I/O per update?

No, if there is no space in main memory for buffering.But, not the case in reality!

Maybe yes, if we have an Ω(b) main memory for buffer-ing! Like numerous problems in external memory, e.g.stack. More: priority queue, buffer tree ...

Can the amortized update cost be something likeO(1/bc) (for some 0 < c ≤ 1) for hashing?

Page 12: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

8-4

Can we batch updates?

However, in external memory model, disk reads/writesare expensive and powerful. Can we hope for lower than1 I/O per update?

No, if there is no space in main memory for buffering.But, not the case in reality!

Maybe yes, if we have an Ω(b) main memory for buffer-ing! Like numerous problems in external memory, e.g.stack. More: priority queue, buffer tree ...

Can the amortized update cost be something likeO(1/bc) (for some 0 < c ≤ 1) for hashing?

Conjectured by Jensen and Pagh (2007):

The insertion cost must be Ω(1) I/Os if the query cost is

required to be O(1) I/Os.

Page 13: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

9-1

Our results

1 + 1/2Ω(b)

1−O(1/b(c−1)/6)

Ω(bc−1)

O(bc−1)

Ω(1)

O(1)

Insertion

Query

1 + Θ(1/b)

1 + Θ(1/bc), c < 11

upper bounds

lower bounds

1+Θ(1/bc)c > 1

expectedamortized

expected average

Due to Knuth (on random inputs)

successful queries

Page 14: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

9-2

Our results

1 + 1/2Ω(b)

1−O(1/b(c−1)/6)

Ω(bc−1)

O(bc−1)

Ω(1)

O(1)

Insertion

Query

1 + Θ(1/b)

1 + Θ(1/bc), c < 11

upper bounds

lower bounds

1+Θ(1/bc)c > 1

expectedamortized

expected average

Due to Knuth (on random inputs)

successful queries

Page 15: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

9-3

Our results

1 + 1/2Ω(b)

1−O(1/b(c−1)/6)

Ω(bc−1)

O(bc−1)

Ω(1)

O(1)

Insertion

Query

1 + Θ(1/b)

1 + Θ(1/bc), c < 11

upper bounds

lower bounds

1+Θ(1/bc)c > 1

Almost a completeunderstanding forsuccessful queries!

expectedamortized

expected average

Due to Knuth (on random inputs)

successful queries

Page 16: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

10-1

Other related results

Upper bounds

Remove ideal hash function assumption [Carter and Wegman1979], making query worst-case [i.e. Fredman, Komlos andSzemeredi, 1984] ... (internal)

Queries and updates in 1 + O(1/b12 ) I/Os with α = 1 −

O(1/b12 ) [Jensen and Pagh, 2007]. (external, no memory)

Page 17: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

10-2

Other related results

Upper bounds

Remove ideal hash function assumption [Carter and Wegman1979], making query worst-case [i.e. Fredman, Komlos andSzemeredi, 1984] ... (internal)

Lower bounds (internal)

Very sparse, only with some strong requirements, e.g., the al-gorithm is deterministic and query is worst-case [Dietzfelbingeret. al. 1994].

Queries and updates in 1 + O(1/b12 ) I/Os with α = 1 −

O(1/b12 ) [Jensen and Pagh, 2007]. (external, no memory)

Page 18: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

10-3

Other related results

Upper bounds

Remove ideal hash function assumption [Carter and Wegman1979], making query worst-case [i.e. Fredman, Komlos andSzemeredi, 1984] ... (internal)

Lower bounds (internal)

Very sparse, only with some strong requirements, e.g., the al-gorithm is deterministic and query is worst-case [Dietzfelbingeret. al. 1994].

Lower bounds in other dynamic external memory problems

Only known are query-update tradeoffs for the predecessor[Fagerberg and Brodal 2003], range reporting [Yi 2009].

Queries and updates in 1 + O(1/b12 ) I/Os with α = 1 −

O(1/b12 ) [Jensen and Pagh, 2007]. (external, no memory)

Page 19: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

11-1

Technical details:

Lowerbounds

Page 20: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

12-1

Preliminaries

U = 0, 1, . . . , u− 1: universe. |U | = u.

m: size of main memory. n: total number of items.b: size of one block. All in words.

Page 21: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

12-2

Preliminaries

U = 0, 1, . . . , u− 1: universe. |U | = u.

Some mild assumptions

Atomic elements

n ≥ Ω(m log u · b2c

)for some constant c > 0

m: size of main memory. n: total number of items.b: size of one block. All in words.

Page 22: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

12-3

Preliminaries

U = 0, 1, . . . , u− 1: universe. |U | = u.

Deterministic data structure + a random distrib. of inputs

(Via a method similar to Yao’s Minimax Principle) =⇒Randomized data structure

Some mild assumptions

Atomic elements

n ≥ Ω(m log u · b2c

)for some constant c > 0

m: size of main memory. n: total number of items.b: size of one block. All in words.

Page 23: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

13-1

Observations

Two extreme cases

One exterme: only use afixed mapping for all items.

1, 11

21, 31

32 13, 73

43, 23

63, 33

null

Update is expensive!b = 2

15, 55

45

Page 24: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

13-2

Observations

Two extreme cases

One exterme: only use afixed mapping for all items.

1, 11

21, 31

32 13, 73

43, 23

63, 33

null

Update is expensive!

Another exterme: for every bitems come, write to a newblock.

Too manypossiblemappings.b = 2

15, 55

45

Page 25: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

13-3

Observations

Two extreme cases

One exterme: only use afixed mapping for all items.

1, 11

21, 31

32 13, 73

43, 23

63, 33

null

Update is expensive!

Also easy to see

If with only the information in memory, the hash table cannotlocate the item, then querying it takes at least 2 I/Os.

Another exterme: for every bitems come, write to a newblock.

Too manypossiblemappings.b = 2

15, 55

45

Page 26: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

14-1

The abstraction

Consider the layout of a hash table at any snapshot. Denoteall the blocks on disk by B0, B1, B2, . . . , Bd (B0 = M). Letf : U → 0, 1, . . . , d be any function computable withinmemory.

When querying x ∈ U , f(x): index of the first block the DSwill probe. If f(x) = 0, the DS will still probe the memory.

Page 27: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

14-2

The abstraction

Consider the layout of a hash table at any snapshot. Denoteall the blocks on disk by B0, B1, B2, . . . , Bd (B0 = M). Letf : U → 0, 1, . . . , d be any function computable withinmemory.

When querying x ∈ U , f(x): index of the first block the DSwill probe. If f(x) = 0, the DS will still probe the memory.

Memory zone M : set of itemsstored in memory. tq = 0.

Fast zone F : set ofitems x such thatx ∈ Bf(x). tq = 1.

Slow zone S:The rest of items.tq ≥ 2.

Memory

Disk

We divide items inserted into 3 zones with respect to f .

Page 28: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

15-1

The key idea

The hash table can employ a family F of at most2m log u distinct f ’s.

Note that the current f adopted by the hash ta-ble is dependent upon the already inserted items,but the family F has to be fixed beforehand.

Page 29: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

16-1

Size of the slow zone is small.

Suppose the hash table answers a successful query withan expected average cost of tq = 1 + δ I/Os. Considerthe snapshot when k random items have been inserted.

E[|S|] ≤ m + δk. Memory zone M : tq = 0

Fast zone F : tq = 1 Slow zone S: tq ≥ 2

small!

Page 30: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

16-2

Size of the slow zone is small.

Suppose the hash table answers a successful query withan expected average cost of tq = 1 + δ I/Os. Considerthe snapshot when k random items have been inserted.

E[|S|] ≤ m + δk.

The high-probability version.

LEMMA 1. Let φ ≥ 1/b(c−1)/4 and let k ≥ φn. Atthe snapshot when k items have been inserted, withprobability at least 1− 2φ, |S| ≤ m + δ

φk.

Memory zone M : tq = 0

Fast zone F : tq = 1 Slow zone S: tq ≥ 2

small!

Page 31: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

17-1

Basic idea of the lower bound proof

Consider any f : U → 0, 1, . . . , d. For i = 0, . . . , d,let αi = |f−1(i)|/u, and we call (α0, α1, . . . , αd) thecharacteristic vector of f .

Page 32: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

17-2

Basic idea of the lower bound proof

Consider any f : U → 0, 1, . . . , d. For i = 0, . . . , d,let αi = |f−1(i)|/u, and we call (α0, α1, . . . , αd) thecharacteristic vector of f .

After φn random insertions. Pick a fixed threshold ρ.Assume α0 ≥ α1 ≥ . . . ≥ αd

α0 α1 α2 αk αd

ρ

f

If ∃ too many large αi’s,S too large, violatingthe query requirement(LEMMA 1).

(A)

Page 33: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

17-3

Basic idea of the lower bound proof

Consider any f : U → 0, 1, . . . , d. For i = 0, . . . , d,let αi = |f−1(i)|/u, and we call (α0, α1, . . . , αd) thecharacteristic vector of f .

After φn random insertions. Pick a fixed threshold ρ.Assume α0 ≥ α1 ≥ . . . ≥ αd

α0 α1 α2 αk αdf αj

ρElse f is likely to distributerecent randomly inserteditems evenly, leading to ahigh insertion cost.

(B)

Page 34: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

17-4

Basic idea of the lower bound proof

Consider any f : U → 0, 1, . . . , d. For i = 0, . . . , d,let αi = |f−1(i)|/u, and we call (α0, α1, . . . , αd) thecharacteristic vector of f .

After φn random insertions. Pick a fixed threshold ρ.Assume α0 ≥ α1 ≥ . . . ≥ αd

α0 α1 α2 αk αd

ρ

f

If ∃ too many large αi’s,S too large, violatingthe query requirement(LEMMA 1).

αj

ρElse f is likely to distributerecent randomly inserteditems evenly, leading to ahigh insertion cost.

Both hold with very high probability, even after taking unionof all O(2m log u) different f .

(A) (B)

Page 35: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

18-1

Upper bounds

Easy!

Logarithmic method

+ Query start from the last (biggest) layer

+ Tricks to keep the last layer large

Page 36: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

19-1

Beyond hashing:Subsequent and future work

Page 37: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

20-1

Beyond hashing

Hashing (successful)

Page 38: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

20-2

Beyond hashing

Hashing (successful)

Membership

Problem: Maintain a set S ⊆ U . Given anx ∈ U , answer x ∈ S?

Again, tradeoffs between update and query

Membership: if tq ≤ 1 + δ(0 ≤ δ < 1/2), then tu ≥ Ω(1)[Yi and Zhang 2009]

(1) Without atomic assumption(2) Consider both successful andunseccessful query

Page 39: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

20-3

Beyond hashing

Hashing (successful)

Membership

Problem: Maintain a set S ⊆ U . Given anx ∈ U , answer x ∈ S?

Again, tradeoffs between update and query

Membership: if tq ≤ 1 + δ(0 ≤ δ < 1/2), then tu ≥ Ω(1)[Yi and Zhang 2009]

(1) Without atomic assumption(2) Consider both successful andunseccessful query

General Membership

General Hashing

Page 40: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

21-1

Lower bounds of other dynamic problems in thecell probe with cache setting.

1. predecessor, range-sum

2. union-find

3. . . .

More problems

Page 41: The Limit of Buffering Dynamic External Hashinghomes.sice.indiana.edu/qzhangcs/papers/hashinglbslides.pdf1-1 Dynamic External Hashing: The Limit of Buffering August 13, 2009 Qin

22-1

The End

T HANK YOUQ and A

The Banff National Park