Equivalence Between Priority Queues and Sorting in External Memory
Zhewei Wei, Renmin University of China
MADALGO, Aarhus University
Ke Yi, The Hong Kong University of Science and Technology
Priority Queue
• Maintain a set of keys
• Support insertions, deletions, and findmin (deletemin)
• Fundamental data structure
• Used as a subroutine in greedy algorithms
  – Dijkstra’s single-source shortest path algorithm
  – Prim’s minimum spanning tree algorithm
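To make the greedy-algorithm use concrete, here is a minimal sketch of Dijkstra's algorithm driven by a priority queue. It assumes Python's in-memory binary heap (`heapq`) as the priority queue and a graph given as a dict of adjacency lists with non-negative weights. It re-inserts a node with a smaller key instead of using decrease-key, and it skips stale entries when they are popped. This is a common variant, not anything specified on the slide.

```python
import heapq

def dijkstra(graph, source):
    """graph: dict mapping node -> list of (neighbor, non-negative weight)."""
    dist = {source: 0}
    pq = [(0, source)]                       # priority queue keyed by tentative distance
    while pq:
        d, u = heapq.heappop(pq)             # deletemin
        if d > dist.get(u, float("inf")):
            continue                         # stale entry: u was settled with a smaller key
        for v, w in graph.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))  # insert (used instead of decrease-key)
    return dist
```

For example, `dijkstra({'a': [('b', 1), ('c', 4)], 'b': [('c', 2), ('d', 6)], 'c': [('d', 3)], 'd': []}, 'a')` returns `{'a': 0, 'b': 1, 'c': 3, 'd': 6}`.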
Sorting to Priority Queue
• Priority queue can do sorting
• Given N unsorted keys:
  – Insert the keys into the priority queue
  – Perform N deletemin operations (find the minimum and delete it)
• If a priority queue supports insertion, deletion, and findmin in S(N) time, then the sorting algorithm runs in O(N·S(N)) time.
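A minimal sketch of this sorting-to-priority-queue direction follows. It assumes Python's binary heap (`heapq`) as the priority queue, so S(N) = O(log N) and the sort takes O(N log N) time. The function name `pq_sort` is illustrative.

```python
import heapq

def pq_sort(keys):
    """Sort with N insertions followed by N deletemin operations."""
    pq = []
    for x in keys:                  # N insertions
        heapq.heappush(pq, x)
    # N deletemins return the keys in non-decreasing order
    return [heapq.heappop(pq) for _ in range(len(pq))]
```

For example, `pq_sort([5, 2, 9, 1, 5])` returns `[1, 2, 5, 5, 9]`.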
Priority Queue to Sorting
• Thorup [2007]: sorting can do priority queue!
  – Given a sorting algorithm that sorts N keys in N·S(N) time in the RAM model,
  – use it as a black box to obtain a priority queue that supports all operations in O(S(N)) time
• O(N log log N) sorting -> O(log log N) priority queue
• $O(N\sqrt{\log\log N})$ expected-time sorting -> $O(\sqrt{\log\log N})$ expected-time priority queue
The I/O Model [Aggarwal and Vitter 1988]
[Figure: CPU, Memory (size M), Disk (unlimited size); data moves between memory and disk in blocks of size B]
• Complexity: # of block transfers (I/Os)
• CPU computations and memory accesses are free
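The cost measure can be illustrated with a toy simulator that counts block transfers. This sketch makes some assumptions: memory holds M // B blocks, and it uses LRU replacement, whereas the I/O model lets the algorithm choose which blocks to evict. The class name `IOCounter` is hypothetical. Under these assumptions, a scan of N consecutive keys costs ⌈N/B⌉ I/Os.

```python
from collections import OrderedDict

class IOCounter:
    """Toy I/O-model simulator (hypothetical): memory holds M // B blocks with
    LRU replacement; touching a key whose block is not in memory costs one I/O."""

    def __init__(self, M, B):
        self.B = B
        self.capacity = M // B          # number of blocks that fit in memory
        self.cache = OrderedDict()      # block id -> None, kept in LRU order
        self.ios = 0

    def access(self, index):
        block = index // self.B
        if block in self.cache:
            self.cache.move_to_end(block)   # hit: memory accesses are free
            return
        self.ios += 1                       # miss: one block transfer
        self.cache[block] = None
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used block
```

Scanning keys 0..999 with M = 64 and B = 8 costs 125 I/Os. Scanning a 64-key range repeatedly costs only 8 I/Os, because all 8 of its blocks stay in memory.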
Cache-Oblivious Model
[Figure: CPU, Memory (size ?), Disk (unlimited size), Block (size ?)]
• Optimal without knowledge of M and B
• Optimal for all M and B
Sorting in the I/O Model
• Sorting bound: $\mathrm{Sort}(N) = \Theta\big(\frac{N}{B}\log_{M/B}\frac{N}{B}\big)$ I/Os
• Upper bound: external merge sort
• Lower bound: holds in the comparison model or under the indivisibility assumption (keys are treated as atoms)
• Conjecture: the lower bound holds for B not too small, even without the indivisibility assumption
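The upper bound comes from external merge sort. The sketch below simulates it in memory and counts merge passes rather than real disk transfers. It assumes initial runs of M keys and a fan-in of about M/B runs per merge; the function name is hypothetical. Each pass reads and writes every key once, which is O(N/B) I/Os. Run formation plus ⌈log_{M/B}(N/M)⌉ merge passes gives (N/B)(1 + log_{M/B}(N/M)) = (N/B) log_{M/B}(N/B).

```python
import heapq

def external_merge_sort(keys, M, B):
    """Simulated external merge sort (hypothetical helper).
    Returns (sorted keys, number of merge passes)."""
    # Run formation: sort memory-sized chunks of M keys (one pass over the data).
    runs = [sorted(keys[i:i + M]) for i in range(0, len(keys), M)]
    # About M/B runs are merged at once: one B-key input block per run fits in memory.
    fan_in = max(2, M // B)
    passes = 0
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1
    return (runs[0] if runs else []), passes
```

With N = 10,000, M = 100, and B = 10, run formation produces 100 runs. One 10-way merge pass leaves 10 runs, and a second pass leaves 1, so two merge passes are needed.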
Priority Queue in External Memory
• I/O model: $O\big(\frac{1}{B}\log_{M/B}\frac{N}{B}\big)$ amortized cost
  – Buffer tree [Arge 1995]
  – M/B-ary heaps [Fadel et al. 1999]
  – Array heaps [Brodal and Katajainen 1998]
• Tree-based: do not give any priority-queue-to-sorting reduction
Priority Queue in External Memory
• Cache-oblivious priority queue [Arge et al. 2002]
  – $O\big(\frac{1}{B}\log_{M/B}\frac{N}{B}\big)$ amortized cost under the tall-cache assumption $M > B^2$
  – Keys move around in log log N levels
• Reduction: given an external sorting algorithm that sorts N keys in N·S(N)/B I/Os, there is an external priority queue that supports all operations in O(S(N) log log N / B) amortized I/Os
Our Results
• Given: a sorting algorithm that sorts N keys in N·S(N)/B I/Os in the I/O model
• Using the sorting algorithm as a black box, we obtain a priority queue that supports all operations in $O\big(\frac{1}{B}\sum_{i\ge 0} S(B\log^{(i)}(N/B))\big)$ amortized I/Os, i.e., $\frac{1}{B}\big(S(N) + S(B\log N) + S(B\log\log N) + \cdots\big)$
• This is O(S(N)/B) if $S(N) = \Omega(2^{\log^* N})$, or if $M = \Omega(B\log^{(c)} N)$ for some constant c
• Otherwise, it is $O(S(N)\log^* N / B)$
• No new bounds for external priority queues
• An external priority queue lower bound implies an external sorting lower bound
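The bound can be evaluated numerically to see how the terms of the series behave. This is a sketch under assumptions: logarithms are base 2, the series stops once the iterated log drops to 1, and both function names are hypothetical. With a constant S, every term counts equally, so the total grows with the number of terms, which is about log*(N/B). A faster-growing S lets the first term S(N)/B dominate.

```python
import math

def iterated_logs(x):
    """Yield x, log2(x), log2(log2(x)), ... while the value is greater than 1."""
    while x > 1:
        yield x
        x = math.log2(x)

def reduction_cost(S, N, B):
    """(1/B) * sum_{i>=0} S(B * log^(i)(N/B)), with log^(i) the i-times iterated log."""
    return sum(S(B * v) for v in iterated_logs(N / B)) / B
```

For N = 2^20 and B = 16, the iterated logs of N/B are 65536, 16, 4, and 2. With S ≡ 1 the cost is 4/16 = 0.25. With S(n) = log2 n the cost is (20 + 8 + 6 + 5)/16 = 2.4375.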
Outline
• How Thorup did it (on a high level)
• How we extend it in external memory (on a high level)
• Open problems
Thorup’s Reduction
• Word RAM model:
  – each word consists of w ≥ log N bits
  – a constant number of registers, each with capacity for one word
• Atomic heap [Han 2004]: supports insertions, deletions, and predecessor queries on a set of size $O(\log^2 N)$ in constant time
Thorup’s Reduction – O(S(N)*log N)
[Figure: O(log N) levels of geometrically increasing size: c keys (the head), 2c keys, …, N/4 keys, N/2 keys, N keys]
• Keep the min in the head
• Invariant: keys in a higher level are larger than keys in a lower level
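A simplified in-memory illustration of the level invariant follows. It is a hypothetical sketch, not Thorup's construction. Level i holds at most c·2^i keys, and no key in a level exceeds any key in a later level. `sorted()` stands in for the black-box sorting algorithm and is used whenever a level is rebalanced. The sketch scans levels with `max()`, so it shows the invariant and the sort-based rebalancing but not the cost bounds. It also omits arbitrary deletions.

```python
class LeveledPQ:
    """Hypothetical sketch: level i holds at most c * 2**i keys, and no key in
    level i exceeds any key in a later level. sorted() plays the black-box sort."""

    def __init__(self, c=4):
        self.c = c
        self.levels = [[]]                    # levels[0] is the head

    def _cap(self, i):
        return self.c * 2 ** i

    def insert(self, x):
        # Put x in the first level whose largest key is >= x; otherwise the last level.
        i = next((i for i, lv in enumerate(self.levels) if lv and max(lv) >= x),
                 len(self.levels) - 1)
        self.levels[i].append(x)
        # Rebalance: sort an overflowing level, keep its smaller half, and move
        # the larger half up (those keys are still <= every key in later levels).
        while len(self.levels[i]) > self._cap(i):
            lv = sorted(self.levels[i])
            keep = self._cap(i) // 2
            self.levels[i] = lv[:keep]
            if i + 1 == len(self.levels):
                self.levels.append([])
            self.levels[i + 1].extend(lv[keep:])
            i += 1

    def deletemin(self):
        if not self.levels[0]:
            # Refill the head with the c smallest keys of the first non-empty level.
            j = next((j for j, lv in enumerate(self.levels) if lv), None)
            if j is None:
                raise IndexError("deletemin from an empty queue")
            lv = sorted(self.levels[j])
            self.levels[0], self.levels[j] = lv[:self.c], lv[self.c:]
        head = self.levels[0]
        m = min(head)          # by the invariant, the head holds the global minimum
        head.remove(m)
        return m
```

Inserting any sequence and then calling `deletemin` repeatedly returns the keys in sorted order, because the head always contains the overall minimum.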
Thorup’s Reduction – O(S(N)*log N)
• Rebalance cost for level $2^j$: $2^j S(N)$
• # of sorts in N updates: $N/2^j$
• Amortized cost in level $2^j$: S(N)
• log N levels
[Figure: O(log N) levels holding c, 2c, …, N/4, N/2, N keys]
Cost: O(S(N) log N)
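The amortization on this slide can be written out as follows. This assumes S is non-decreasing and that each rebalance of level $2^j$ is one black-box sort costing $2^j S(N)$:

```latex
\begin{aligned}
\text{amortized cost per update}
  &= \frac{1}{N}\sum_{j}\underbrace{\frac{N}{2^j}}_{\#\text{ sorts of level } 2^j}\cdot\underbrace{2^j S(N)}_{\text{cost per sort}}
   = \sum_{j} S(N) \\
  &= O(S(N)\log N) \qquad \text{(summing over the } O(\log N) \text{ levels)}
\end{aligned}
```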
Thorup’s Reduction
[Figure: O(log N) levels holding 1, 2, …, N/(4 log N), N/(2 log N), N/log N base sets; each base set has size log N]
• Split/merge base sets: S(N) amortized
• Rebalancing level $2^j$: $2^j S(N)/\log N$
• # of rebalances in N updates: $N/2^j$
• Amortized cost for level $2^j$: $S(N)/\log N$
O(S(N)) amortized cost
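With base sets of size log N, each level's rebalancing cost shrinks by a log N factor. The total over the O(log N) levels then matches the split/merge cost. A worked version, under the same assumptions as the slide's per-level figures:

```latex
\begin{aligned}
\text{amortized cost per update}
  &= \underbrace{O(S(N))}_{\text{split/merge base sets}}
   + \frac{1}{N}\sum_{j}\frac{N}{2^j}\cdot\frac{2^j S(N)}{\log N} \\
  &= O(S(N)) + O(\log N)\cdot\frac{S(N)}{\log N} = O(S(N))
\end{aligned}
```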
Thorup’s Reduction
[Figure: O(log N) levels holding 1, 2, …, N/(4 log N), N/(2 log N), N/log N base sets of size log N; the head is an atomic heap of size log N]
• Atomic heap (head): O(1) cost
• Levels: O(S(N)) amortized cost
Thorup’s Reduction
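Amortized cost: O(S(N))
[Figure: the levels of base sets, each now with a buffer: N/log N base sets with a buffer of size N/log N, N/(2 log N) base sets with a buffer of size N/(2 log N), N/(4 log N) base sets with a buffer of size N/(4 log N), …, 2 base sets, 1 base set; the head is an atomic heap of size log N]
• Atomic heap (head): O(1) cost
• Levels and buffers: O(S(N)) amortized cost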
Externalize Thorup’s Reduction
• Where does B come in?
• How to replace atomic heap?
• How to handle deletions in external memory?
Where does B come in?
[Figure: levels holding 1, 2, …, N/(4B log N), N/(2B log N), N/(B log N) base sets of size B log N, with level buffers of size N/log N, N/(2 log N), N/(4 log N), …; the head is a buffer of size B log N]
I/O-efficient Flush Operation
[Figure: a buffer of size |R| above k substructures]
• Sort the keys in the buffer: O(R·S(R)/B)
• Distribute the keys to the k substructures: O(R/B + k)
Total I/O cost: O(R·S(N)/B + k), since S(R) ≤ S(N) for non-decreasing S
• If k = O(R/B), the total flush cost is O(R·S(N)/B), so the amortized cost per key is O(S(N)/B)
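A sketch of the flush steps follows. The buffer is sorted with the black-box sort, and the sorted keys are then routed to the k substructures in one merge-like scan. Because the keys arrive in order, the cursor over the substructures only moves forward, which is where O(R/B + k) comes from. The names `flush`, `pivots`, and `substructures` are hypothetical. Substructures are plain lists here, and the returned number estimates only the distribution I/Os.

```python
import math

def flush(buffer, pivots, substructures, B, sort=sorted):
    """Hypothetical flush: pivots is sorted, and pivots[i] is the smallest key
    allowed in substructures[i + 1]. Returns ceil(R/B) + k, an estimate of the
    distribution I/Os (the sorting cost O(R*S(R)/B) is not counted)."""
    keys = sort(buffer)                       # black-box sort of the R buffered keys
    i = 0
    for x in keys:
        # Keys arrive in sorted order, so the target substructure index never decreases.
        while i < len(pivots) and x >= pivots[i]:
            i += 1
        substructures[i].append(x)
    buffer.clear()
    return math.ceil(len(keys) / B) + len(substructures)
```

Flushing `[5, 1, 9, 3, 7, 2]` with pivots `[3, 7]` into three empty substructures with B = 2 produces `[[1, 2], [3, 5], [7, 9]]` and returns 3 + 3 = 6.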
Where does B come in?
If a level holds $2^j$ keys:
• Largest buffer size: $2^j/\log N$
• Largest # of base sets: $2^j/(B\log N)$
• Smallest base set (head) size: $B\log N$
Since the number of base sets is at most (buffer size)/B, i.e., k = O(R/B), the flush bound applies.
Amortized I/O cost for flushing level buffers: O(S(N)/B)
Replacing Atomic Heap
[Figure: a head buffer of size B log N above the levels]
• R = B log N, k = log N, so k = R/B
Replacing Atomic Heap
• Head of size O(B log N), with a buffer of size B log N
• Amortized I/O cost: O(S(N)/B)
• Recursively build the structure in the head
Recursively Build Layers
[Figure: O(log* N) layers holding N keys, $B\log(N/B)$ keys, $B\log\log(N/B)$ keys, …, $2^c B$ keys, $cB$ keys]
• Levels rebalancing:
  – Move base sets around
  – Redistribute buffer
  – S(N)/(B log N) for one level
  – S(N)/B for one layer
  – S(N) log* N / B amortized I/O cost
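The number of layers, O(log* N), can be checked numerically. This is a sketch under assumptions: base-2 logs, and recursion that stops once a layer would hold at most c·B keys, where c is a hypothetical constant. The function names are hypothetical too. `log_star` counts how many times log2 must be applied before the value drops to 1 or below.

```python
import math

def log_star(x):
    """Number of times log2 is applied before the value drops to <= 1."""
    count = 0
    while x > 1:
        x = math.log2(x)
        count += 1
    return count

def layer_sizes(N, B, c=4):
    """Hypothetical layer sizes N, B*log(N/B), B*loglog(N/B), ...,
    stopping once a layer would hold at most c*B keys."""
    sizes, x = [N], N / B
    while True:
        x = math.log2(x)
        if B * x <= c * B:
            break
        sizes.append(B * x)
    return sizes
```

For N = 2^20 and B = 16, `layer_sizes` gives `[1048576, 256.0]`. The next layer would hold 16·4 = 64 = cB keys, so the recursion stops there. `log_star(65536)` is 4.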
Recursively Build Layers
[Figure: O(log* N) layers holding N, $B\log(N/B)$, $B\log\log(N/B)$, …, $2^c B$, $cB$ keys]
• Layers rebalancing:
  – Rebuild the first (last) level
  – S(N)/B for one layer
  – S(N) log* N / B amortized I/O cost
Recursively Build Layers
[Figure: O(log* N) layers (N, $B\log(N/B)$, $B\log\log(N/B)$, …, $2^c B$, $cB$ keys) with a memory buffer of size O(B) on top]
• R = B, k = log* N
Recursively Build Layers
[Figure: O(log* N) layers with a memory buffer of size O(B) on top]
• Amortized cost: log* N / B (a flush costs O(R/B + k) = O(1 + log* N) I/Os per B keys)
• I/O cost per update: O(S(N) log* N / B)
Handle Deletions
• Following a pointer to perform a deletion takes 1 I/O per deletion, far more than the O(S(N)/B) target
• Delete signals: Delete x -> Insert (-, x)
• Perform the actual deletion afterwards
• Unlike in the buffer tree, we don’t have access to the “leaves” (base sets)
• Invariant: only process delete signals in the head
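A minimal sketch of delete signals follows. It assumes an in-memory binary heap (`heapq`) in place of the external structure, so it shows only the semantics, not the I/O behavior. `delete(x)` inserts a signal instead of locating x. A signal is matched against x only when both reach the front of the heap, which plays the role of the head. Signals sort before equal keys, so a signal is always seen before the key it cancels. The class name `SignalPQ` is hypothetical, and the sketch assumes every deleted key is actually present.

```python
import heapq
from collections import Counter

class SignalPQ:
    """Hypothetical sketch: deletions become signals resolved only at the head."""
    SIGNAL, KEY = 0, 1                 # a signal (x, 0) sorts before the key (x, 1)

    def __init__(self):
        self.heap = []
        self.pending = Counter()       # signals seen at the head, not yet matched

    def insert(self, x):
        heapq.heappush(self.heap, (x, self.KEY))

    def delete(self, x):
        # "Delete x -> Insert (-, x)": assumes x is currently in the queue.
        heapq.heappush(self.heap, (x, self.SIGNAL))

    def _settle(self):
        # Process signals and cancelled keys at the head until a live key is on top.
        while self.heap:
            x, kind = self.heap[0]
            if kind == self.SIGNAL:
                heapq.heappop(self.heap)
                self.pending[x] += 1
            elif self.pending[x] > 0:
                heapq.heappop(self.heap)
                self.pending[x] -= 1
            else:
                return

    def findmin(self):
        self._settle()
        return self.heap[0][0]

    def deletemin(self):
        self._settle()
        return heapq.heappop(self.heap)[0]
```

After inserting 5, 3, 8, 1 and deleting 3 and 1, `findmin()` returns 5, and two `deletemin()` calls return 5 and then 8.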
Schedule
• Avoid repeated sorting
• If the head or the memory buffer is unbalanced:
  – Flush stage: flush all overflowed buffers and rebalance all unbalanced base sets
  – Push stage: rebalance all overflowed layers and levels (expand)
  – Pull stage: deal with delete signals and rebalance all underflowed layers and levels (shrink)
Open problems
• Optimal reduction?
  – A priority queue that supports insertions/deletions in O(1/B) I/Os for a set of size $O(B\log^{(c)} N)$
  – A new reduction framework
• Better (than log log N) reduction in the cache-oblivious model?
  – Hard to do I/O-efficient flushing and rebalancing without knowing B
Thank You!