Lars Arge
Presented by Or Ozery
I/O Model
Previously defined:
  N = # of elements in input
  M = # of elements that fit into memory
  B = # of elements per block
Measuring in terms of # of blocks:
  n = N / B
  m = M / B
The Buffer Tree
I/O Model vs. RAM Model
Operation                 RAM Model       I/O Model
Scanning                  Θ(N)            Θ(n)
List merging              Θ(N)            Θ(n)
Sorting                   Θ(N log_2 N)    Θ(n log_m n)
Searching                 Θ(log_2 N)      Θ(log_B N)
Sorting using a B-tree    Θ(N log_2 N)    Θ(N log_B N)
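To get a feel for the gap between the last two rows, the bounds in the table can be evaluated for concrete sizes. The following is a minimal sketch: the function name `io_bounds` and the sample values of N, M, and B are assumptions chosen for illustration, and all constant factors are dropped.

```python
import math

def io_bounds(N, M, B):
    """Leading terms (constants dropped) of some I/O-model bounds."""
    n = N / B  # input size in blocks
    m = M / B  # memory size in blocks
    return {
        "scan": n,                          # Θ(n)
        "sort": n * math.log(n, m),         # Θ(n log_m n)
        "search": math.log(N, B),           # Θ(log_B N)
        "btree_sort": N * math.log(N, B),   # Θ(N log_B N)
    }

# Hypothetical sizes: 2^30 elements, 2^20 fit in memory, 2^10 per block.
bounds = io_bounds(N=2**30, M=2**20, B=2**10)
print(bounds["sort"], bounds["btree_sort"])
```

For these sample values, sorting through a B-tree costs roughly 1,500 times more I/Os than the optimal sorting bound.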
Online vs. Batched
Online problems:
  A single command is given each time.
  It must be processed before other commands are given.
  Should be performed in good worst-case time.
  For example: searching.
Batched problems:
  A stream of commands is given.
  Commands can be performed in any legal order.
  Should be performed in good amortized time.
  For example: sorting.
Motivation
We’ve seen that using an online-efficient data structure (B-tree) for a batched problem (sorting) is inefficient.
We would thus like to design a data structure for efficient use on batched problems, such as:
  Sorting
  Minimum reporting (priority queue)
  Range searching
  Interval stabbing
The Main Idea
There are two reasons why B-tree sorting is inefficient:
  We work element-wise instead of block-wise.
  We don’t take advantage of the memory size m.
We can fix both problems by using buffers:
  Buffers allow us to accumulate elements into blocks.
  Using buffers of size Θ(m), we fully utilize the memory.
The Buffer Tree
(m/4, m)-tree ⇒ branching factor Θ(m).
Elements are stored in leaves, in blocks ⇒ O(n) leaves.
Each internal node has a buffer of size m.
Basic Properties
The height of the tree is O(log_m n).
The number of internal nodes is O(n/m).
From now on define:
  Leaf nodes: nodes whose children are leaves.
  Internal nodes: nodes that are not leaf nodes.
The buffer tree uses linear space:
  Each leaf takes O(1) blocks, and there are O(n) leaves ⇒ O(n) space.
  Each node takes O(m) blocks, and there are O(n/m) nodes ⇒ O(n) space.
Processing Commands
We wait until we have a block of commands, then we insert it into the buffer of the root.
Because we process commands in a lazy way, we need to time-stamp them.
When the buffer of the root gets full, we empty it using a buffer-emptying process (BEP):
  We distribute elements to the buffers one level down.
  If any of the child buffers gets full, we continue recursively.
Internal Node BEP
1. Sort the elements in the buffer while deleting corresponding insert and delete elements.
2. Scan through the sorted buffer and distribute the elements to the appropriate buffers one level down.
3. If any of the child buffers is now full, run the appropriate BEP recursively.
Internal node BEP takes O(x + m) I/Os, where x is the number of blocks of elements in the buffer.
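Steps 1 and 2 above can be sketched in memory as follows. This is an illustration only, not the I/O-efficient procedure: the `Node` class, the `(key, timestamp, kind)` tuple layout, and the routing convention are assumptions made for the example, and keys are assumed distinct so that an insert followed later by a delete of the same key cancels out. Step 3 (the recursion) is left to the caller, which receives the list of children that became full.

```python
from bisect import bisect_right
from dataclasses import dataclass, field

@dataclass
class Node:
    routers: list    # sorted routing keys; child i holds keys below routers[i]
    children: list   # len(children) == len(routers) + 1 for internal nodes
    buffer: list = field(default_factory=list)  # (key, timestamp, kind) tuples

def empty_internal_buffer(node, capacity):
    """Empty node.buffer into the child buffers; return the children now full."""
    # Step 1: sort by key, then by time, cancelling insert/delete pairs.
    ops = sorted(node.buffer, key=lambda op: (op[0], op[1]))
    node.buffer = []
    survivors = []
    for op in ops:
        prev = survivors[-1] if survivors else None
        if prev and prev[0] == op[0] and prev[2] == "ins" and op[2] == "del":
            survivors.pop()          # the insert and its later delete annihilate
        else:
            survivors.append(op)
    # Step 2: one scan distributes the surviving elements one level down.
    for op in survivors:
        child = node.children[bisect_right(node.routers, op[0])]
        child.buffer.append(op)
    return [c for c in node.children if len(c.buffer) >= capacity]

left, right = Node([], []), Node([], [])
root = Node(routers=[10], children=[left, right])
root.buffer = [(5, 1, "ins"), (12, 2, "ins"), (5, 3, "del"),
               (7, 4, "ins"), (12, 5, "del"), (12, 6, "ins")]
full = empty_internal_buffer(root, capacity=4)
```

In this run the pairs for keys 5 and 12 cancel, so only `(7, 4, "ins")` reaches the left child and `(12, 6, "ins")` reaches the right child.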
Leaf Node BEP
1. Sort the elements in the buffer as for internal nodes.
2. Merge the sorted buffer with the leaves of the node.
3. If the number of leaves increased:
  1. Place the smallest elements in the leaves of the node.
  2. Repeatedly insert one block of elements and rebalance.
4. If the number of leaves decreased:
  1. Place the elements in sorted order in the leaves, and append “dummy blocks” at the end.
  2. Repeatedly delete one dummy block and rebalance.
Rebalancing - Fission
Rebalancing - Fusion
Rebalancing Cost
Rebalancing starts when inserting/deleting a block.
The leaf node that sparked the rebalancing will not cause rebalancing again for the next Ω(m) inserts/deletes.
Thus the total number of rebalancing operations on leaf nodes is O(n/m).
Each rebalancing operation on a leaf node can trigger O(log_m n) rebalancing operations further up the tree.
So there are O((n/m) log_m n) rebalancing operations, each costing O(m) ⇒ rebalancing takes O(n log_m n).
Summing Up
We’ve seen that rebalancing takes O(n log_m n).
BEP cost:
  BEP of full buffers is linear in the number of blocks in the buffer ⇒ each element pays O(1/B) to be pushed one level down the tree.
  Because there are O(log_m n) levels in the tree, each element pays O(log_m n / B) ⇒ BEP takes O(n log_m n).
Therefore, a sequence of N operations on an empty buffer tree takes O(n log_m n).
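Written out, the two contributions to the total are as follows. This is only a restatement of the accounting above, using n = N/B:

```latex
\[
\underbrace{N \cdot O\!\left(\frac{\log_m n}{B}\right)}_{\text{buffer emptying}}
  = O\!\left(\frac{N}{B}\,\log_m n\right)
  = O(n \log_m n),
\qquad
\underbrace{O\!\left(\frac{n}{m}\,\log_m n\right) \cdot O(m)}_{\text{rebalancing}}
  = O(n \log_m n).
\]
```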
Sorting
After inserting all N items into the tree, we need to empty all the buffers. We do this in BFS order.
How much does emptying all buffers cost?
  Emptying a buffer takes O(m) amortized.
  There are O(n/m) buffers ⇒ total cost is O(n).
Thus sorting using a buffer tree takes O(n log_m n).
Priority Queue
We can easily transform our buffer tree into a PQ by adding support for a delete-min operation:
  The smallest element is found on the path from the root to the leftmost leaf.
  Therefore a delete-min operation will empty all the buffers on that path in O(m log_m n).
  To make up for this cost, we delete the M/4 smallest elements and keep them in memory.
  This way we can answer the next M/4 delete-mins for free.
Thus our PQ supports N operations in O(n log_m n).
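The caching idea can be sketched as below. Everything here is an assumption made for illustration: the class name `CachedMinPQ`, the use of a binary heap as a stand-in for the buffer tree, and the rule that an insert smaller than the largest cached element goes into the cache (needed so that cached minima stay correct when inserts arrive between delete-mins).

```python
import bisect
import heapq

class CachedMinPQ:
    """Keep a batch of the smallest elements in memory; refill only when empty."""

    def __init__(self, cache_size):
        self.cache_size = cache_size  # plays the role of M/4
        self.external = []            # stand-in for the buffer tree (a heap here)
        self.cache = []               # sorted; every item <= every external item
        self.refills = 0              # counts the expensive path-emptying steps

    def insert(self, x):
        if self.cache and x < self.cache[-1]:
            bisect.insort(self.cache, x)
            if len(self.cache) > self.cache_size:
                # Evict the largest cached element back to the external store.
                heapq.heappush(self.external, self.cache.pop())
        else:
            heapq.heappush(self.external, x)

    def delete_min(self):
        if not self.cache:
            self._refill()
        return self.cache.pop(0)  # raises IndexError if the queue is empty

    def _refill(self):
        # The expensive step: pull the next batch of minima into memory.
        self.refills += 1
        while self.external and len(self.cache) < self.cache_size:
            self.cache.append(heapq.heappop(self.external))

pq = CachedMinPQ(cache_size=2)
for x in (5, 3, 8, 1):
    pq.insert(x)
out = [pq.delete_min()]
pq.insert(2)
pq.insert(0)
out += [pq.delete_min() for _ in range(5)]
```

Each refill serves up to `cache_size` delete-mins, which is what lets the cost of emptying the leftmost path be amortized.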
Time-Forward Processing
The problem:
  We are given a topologically ordered DAG.
  For each vertex v there is a function f_v which depends on all f_u where u is a predecessor of v.
  The goal is to compute f_v for all v.
TFP Using Our PQ
1. For each vertex v (in topological order):
  1. Extract the d⁻(v) smallest elements from the PQ, where d⁻(v) is the in-degree of v.
  2. Use the extracted elements to compute f_v.
  3. For each edge (v, u), insert f_v into the PQ with priority u.
The above works in O(n log_m n).
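The loop above can be sketched with an in-memory heap standing in for the external PQ. The function name, the vertex numbering (0 to n-1 in topological order), and the choice f_v = weight of v plus the sum of its predecessors' values are assumptions picked to keep the example small.

```python
import heapq

def time_forward_process(weights, edges):
    """Compute f_v = weights[v] + sum of f_u over predecessors u of v.

    Vertices are 0..n-1 and are assumed to be numbered in topological
    order, so every edge (v, u) satisfies v < u.
    """
    n = len(weights)
    out_edges = [[] for _ in range(n)]
    in_degree = [0] * n
    for v, u in edges:
        assert v < u, "edges must respect the topological numbering"
        out_edges[v].append(u)
        in_degree[u] += 1

    pq = []        # entries are (destination vertex, value sent to it)
    f = [0] * n
    for v in range(n):
        # The in_degree[v] smallest entries are exactly those addressed to v,
        # because everything addressed to earlier vertices was already removed.
        incoming = [heapq.heappop(pq)[1] for _ in range(in_degree[v])]
        f[v] = weights[v] + sum(incoming)
        for u in out_edges[v]:
            heapq.heappush(pq, (u, f[v]))  # send f_v forward in time to u
    return f

result = time_forward_process([1, 2, 3, 4], [(0, 1), (0, 2), (1, 3), (2, 3)])
print(result)  # [1, 3, 4, 11]
```

The point of the technique is that values are never looked up at their source; they are sent ahead and arrive at the front of the queue exactly when their destination is processed.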
Buffered Range Tree
We want to extend our tree to support range queries:
  Given an interval [x1, x2], report all elements of the tree that are contained in it.
How will we distribute the query elements when emptying a buffer?
  As long as the interval is contained in a sub-tree, send the query element to the root buffer of that sub-tree.
  Otherwise, we split the query into its 2 query elements, and report the elements in the relevant sub-trees.
Time Order Representation
We say that a list of elements is in time order representation (TOR) if it’s of the form D-S-I, where:
  D is a sorted list of delete elements.
  S is a sorted list of query elements.
  I is a sorted list of insert elements.
Lemma 1:
  A non-full buffer can be brought into TOR in O(m + r), where r · B is the number of queries reported in the process.
Merging of TOR Lists
Lemma 2:
  Let S1 and S2 be TOR lists such that all elements of S2 are older than the elements of S1.
  S1 and S2 can be merged into a TOR list in O(s1 + s2 + r), where s1 and s2 are the sizes in blocks of S1 and S2, and r · B is the number of queries reported in the process.
Proof of Lemma 2
Let Sj = dj - sj - ij. The sub-lists are reordered step by step (the horizontal axis is time):
  d2 s2 i2 d1 s1 i1
  d2 s2 d1 i2 s1 i1
  d2 d1 s2 i2 s1 i1
  d2 d1 s2 s1 i2 i1
  d  s  i
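The sequence of swaps can be sketched in memory as follows. This is one reading of the proof rather than the exact procedure: it assumes distinct keys, assumes every delete targets an element that is present at that time, represents a query as a hypothetical `(lo, hi, query_id)` tuple, and uses nested loops where the I/O-efficient version would use simultaneous scans of sorted lists. Whenever a swap moves an element past a query that should have seen it, the pair is reported immediately.

```python
from heapq import merge

def merge_tor(older, newer):
    """Merge TOR lists older=(d2, s2, i2) and newer=(d1, s1, i1).

    Returns ((d, s, i), reported), where reported holds (query_id, key) pairs
    that had to be answered while reordering.
    """
    d2, s2, i2 = older
    d1, s1, i1 = newer
    reported = []

    # Swap i2 and d1: an older insert and a newer delete of the same key cancel.
    cancelled = set(i2) & set(d1)
    i2 = [k for k in i2 if k not in cancelled]
    d1 = [k for k in d1 if k not in cancelled]

    # Swap s2 and d1: older queries must still see keys the newer deletes remove.
    for lo, hi, qid in s2:
        reported += [(qid, k) for k in d1 if lo <= k <= hi]

    # Swap i2 and s1: newer queries must still see keys the older inserts add.
    for lo, hi, qid in s1:
        reported += [(qid, k) for k in i2 if lo <= k <= hi]

    # Each pair of like sub-lists is sorted, so a plain merge finishes the job.
    merged = (list(merge(d2, d1)), list(merge(s2, s1)), list(merge(i2, i1)))
    return merged, reported

older = ([1], [(0, 10, "q2")], [5, 7])
newer = ([3, 5], [(4, 8, "q1")], [6])
merged, reported = merge_tor(older, newer)
```

In this example key 5 is inserted and then deleted, so it disappears; query q2 is told about key 3 before the delete of 3 overtakes it, and query q1 is told about key 7 before it overtakes that insert.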
Full Sub-Tree Reporting
Lemma 3:
  All buffers of a sub-tree with x leaves can be emptied and collected into a TOR list in O(x + r).
Proof:
  1. For each level, prepare a TOR list of its elements.
  2. Merge the TOR lists of all levels.
[Figure: the sub-tree after step 1 and after step 2.]
Internal Node BEP
1. Compute the TOR of the buffer.
2. Scan the delete elements and distribute them.
3. Scan the range search elements and determine which sub-trees should have their elements reported.
4. For each such sub-tree:
  1. Remove the delete elements from (2) and store them in a temporary place.
  2. Collect the elements of the sub-tree into a TOR list.
  3. Merge this TOR list with the TOR list of the removed delete elements.
  4. Distribute the insert and delete elements to the leaf buffers.
  5. Merge a copy of the leaves with the TOR list.
  6. Remove the range search elements from the TOR list.
  7. Report the resulting elements to the relevant queries.
5. Distribute the range search elements.
6. Distribute the insert elements (if a sub-tree was emptied, to its leaf buffers).
7. If any child buffer became full, apply the BEP recursively.
Leaf Node BEP
1. Construct the TOR of the elements in the buffer.
2. Merge the TOR with the leaves.
3. Remove all range search elements and continue the BEP as in the normal buffer tree.
Analysis
The main difference from the normal buffer tree is the action of reporting all elements of a sub-tree.
By Lemma 3, this action has a linear cost.
We can thus split this cost between the delete elements and the query elements, as each element gets either deleted or reported.
Thus, a series of N operations on our buffered range tree costs O(n log_m n + r).
Orthogonal Line Intersection
The problem:
  Given N line segments parallel to the axes, report all intersections of orthogonal segments.
OLI Using Our Range Tree
1. Sort the segments, once by their top y coordinate and once by their bottom y coordinate.
2. Merge the 2 sorted lists of segments:
  1. When encountering the top coordinate of a vertical segment, insert its x coordinate into the tree.
  2. When encountering the bottom coordinate of a vertical segment, delete its x coordinate from the tree.
  3. When encountering a horizontal segment, insert a range query for the interval between its endpoints.
The above takes an optimal O(n log_m n + r).
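The sweep can be sketched in memory with a sorted list standing in for the buffered range tree. The tuple layouts, the function name, and the tie-breaking rule (at equal y, inserts before queries before deletes, so that segments touching at an endpoint count as intersecting) are assumptions made for the example.

```python
import bisect

def orthogonal_intersections(vertical, horizontal):
    """Report (vertical_id, horizontal_id) pairs of intersecting segments.

    vertical:   list of (x, y_bottom, y_top)
    horizontal: list of (x_left, x_right, y)
    """
    INSERT, QUERY, DELETE = 0, 1, 2
    events = []
    for vid, (x, y_bottom, y_top) in enumerate(vertical):
        events.append((-y_top, INSERT, vid))
        events.append((-y_bottom, DELETE, vid))
    for hid, (_, _, y) in enumerate(horizontal):
        events.append((-y, QUERY, hid))
    events.sort()  # sweep from the top down; ties resolved by event kind

    active = []    # sorted (x, vid) pairs of vertical segments cut by the sweep
    result = []
    for _, kind, idx in events:
        if kind == INSERT:
            bisect.insort(active, (vertical[idx][0], idx))
        elif kind == DELETE:
            del active[bisect.bisect_left(active, (vertical[idx][0], idx))]
        else:
            x_left, x_right, _ = horizontal[idx]
            lo = bisect.bisect_left(active, (x_left, -1))
            hi = bisect.bisect_right(active, (x_right, len(vertical)))
            result += [(vid, idx) for _, vid in active[lo:hi]]
    return result

vertical = [(2, 0, 5), (6, 3, 8)]
horizontal = [(0, 7, 4), (0, 3, 7), (5, 9, 8)]
pairs = sorted(orthogonal_intersections(vertical, horizontal))
```

In the batched setting the queries are not answered on the spot as they are here; they travel down the tree lazily with the inserts and deletes, which is why time stamps are needed.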
Buffered Segment Tree
We switch roles between points and intervals:
  We insert and delete intervals from the tree.
  We use points as queries, and get a report of all intervals stabbed by a point.
We assume the intervals have (distinct) endpoints from a fixed given set E of size N.
  The elements in the leaves will be the points of E.
  We build our tree bottom-up in O(n).
Buffered Segment Tree
Define: slabs, multi-slabs, short/long segments.
[Figure with labels A to F.]
Internal Node BEP
1. Repeatedly load m/2 blocks of elements into memory, and perform the following:
  1. For every multi-slab list, insert the relevant long segments.
  2. For every multi-slab list that is stabbed by a point, report intervals and remove expired ones.
  3. Distribute segments and queries.
2. If there’s a full child buffer, apply the BEP recursively.
The above costs O(m + x + r) = O(x + r) amortized.
Analysis
Because the tree structure is static, there is no rebalancing, and also no emptying of non-full buffers.
Therefore the only cost is the emptying of full buffers, which is linear.
Thus a series of N operations on our segment tree takes O(n log_m n + r).
A write (flush) operation takes O(n log_m n).
Therefore we have the desired O(n log_m n + r).
Batched Range Searching
The problem:
  Given N points and N axis-parallel rectangles in the plane, report all points inside each rectangle.
BRS Using Our Segment Tree
1. Sort points and rectangles by their top y coordinate.
2. Scan the sorted list:
  1. For each rectangle, insert the interval that corresponds to its horizontal side, with a delete time matching its bottom y coordinate.
  2. For each point, insert a stabbing query.
3. Flush the tree (empty all buffers).
The above takes an optimal O(n log_m n + r).
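The scan can be sketched in memory with a plain list of active rectangles standing in for the buffered segment tree. The tuple layouts and the function name are assumptions; rectangles are treated as closed, and the "delete time" is modeled by dropping a rectangle once the sweep has passed below its bottom edge.

```python
def batched_range_search(points, rects):
    """Report (point_id, rect_id) pairs with the point inside the rectangle.

    points: list of (x, y)
    rects:  list of (x_left, x_right, y_bottom, y_top)
    """
    RECT, POINT = 0, 1
    events = []
    for rid, (_, _, _, y_top) in enumerate(rects):
        events.append((-y_top, RECT, rid))
    for pid, (_, y) in enumerate(points):
        events.append((-y, POINT, pid))
    events.sort()  # top-down sweep; at equal y a rectangle precedes a point

    active = []    # ids of rectangles whose interval is currently in the tree
    result = []
    for neg_y, kind, idx in events:
        if kind == RECT:
            active.append(idx)
            continue
        y = -neg_y
        px = points[idx][0]
        # Expire intervals whose delete time (bottom y) has been passed.
        active = [r for r in active if rects[r][2] <= y]
        # Stabbing query: every active interval that contains px.
        result += [(idx, r) for r in active if rects[r][0] <= px <= rects[r][1]]
    return result

rects = [(0, 4, 0, 4), (2, 6, 3, 8)]
points = [(3, 3.5), (5, 5), (1, 1), (7, 7)]
hits = sorted(batched_range_search(points, rects))
```

Scanning the active list for every point is quadratic in the worst case; the segment tree with multi-slab lists is what brings this down to the stated bound.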
Pairwise Rectangle Intersection
The problem:
  Given N axis-parallel rectangles in the plane, report all intersecting pairs.
PRI Using Our Segment Tree
Two rectangles in the plane intersect ⇔ one of the following holds:
  1. They have intersecting edges.
  2. One contains the other ⇒ one contains the other’s midpoint.
We have shown an O(n log_m n + r) solution for both (1) and (2): case (1) is orthogonal line intersection, and case (2) is batched range searching on the midpoints.
Therefore we have an optimal O(n log_m n + r) solution for the PRI problem.