I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries
Second Year Project Presentation
Ke Yi
Advisor: Lars Arge
Committee: Pankaj K. Agarwal and Jun Yang
2
Problem Definition: Range Max Queries
• Range-aggregate queries: range-count, range-sum, range-max
• N points in R^d
• Each point p is associated with a weight w(p)
• Query rectangle Q
• Compute max{w(p) | p ∈ Q}
• Static and dynamic
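As a point of reference for the definition above, here is a minimal brute-force sketch in Python. It scans all N points, so it illustrates only the semantics of a range max query, not the I/O-efficient structures of this talk. The function name `range_max` and the closed-box representation are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]
Box = Sequence[Tuple[float, float]]  # one closed interval (lo, hi) per dimension


def range_max(points: Sequence[Point],
              weights: Sequence[float],
              box: Box) -> Optional[float]:
    """Return max{w(p) | p in Q} by a linear scan, or None if Q holds no point."""
    best = None
    for p, w in zip(points, weights):
        # p lies in Q when every coordinate falls inside the matching interval.
        if all(lo <= x <= hi for x, (lo, hi) in zip(p, box)):
            if best is None or w > best:
                best = w
    return best
```

Any structure for the problem should return the same answers as this scan; that makes it a convenient correctness oracle in tests.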
3
Problem Definition: Stabbing Max Queries
• N hyper-rectangles in R^d
• Each rectangle γ is associated with a weight w(γ)
• Query point q
• Compute max{w(γ) | q ∈ γ}
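The stabbing max definition can be pinned down with a similar brute-force sketch. The name `stabbing_max` and the choice of closed rectangles are assumptions; the slides do not say whether boundaries count as stabbed.

```python
from typing import Optional, Sequence, Tuple

Rect = Sequence[Tuple[float, float]]  # one closed interval (lo, hi) per dimension


def stabbing_max(rects: Sequence[Rect],
                 weights: Sequence[float],
                 q: Sequence[float]) -> Optional[float]:
    """Return max{w(g) | q in g} by a linear scan, or None if nothing is stabbed."""
    best = None
    for rect, w in zip(rects, weights):
        # q stabs the rectangle when it lies inside it in every dimension.
        if all(lo <= x <= hi for x, (lo, hi) in zip(q, rect)):
            if best is None or w > best:
                best = w
    return best
```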
4
Model
• I/O Model
– N : Elements in structure
– B : Elements per block
– M : Elements in main memory
– n = N/B
• Assumptions
– M > B^2
– Each word holds log_2 N bits
– Any coordinate or weight can be stored in one word
[Figure: I/O model diagram with labels D, P, M and "Block I/O"]
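To give a feel for the model parameters, the sketch below computes n = N/B (rounded up), the height of a fan-out B tree over n blocks, and whether the assumption M > B^2 holds. The function name `io_model_summary` and the sample values are illustrative assumptions, not figures from the talk.

```python
def io_model_summary(N: int, B: int, M: int) -> dict:
    """Summarize I/O-model quantities for N elements, block size B, memory size M."""
    if B < 2:
        raise ValueError("block size B must be at least 2")
    n = -(-N // B)  # ceiling of N / B: number of blocks needed for N elements
    # Height of a fan-out B tree over n leaf blocks: smallest h with B**h >= n.
    height, capacity = 0, 1
    while capacity < n:
        capacity *= B
        height += 1
    return {"n": n, "btree_height": height, "tall_cache": M > B * B}
```

For example, `io_model_summary(10**9, 1000, 2 * 10**6)` reports one million blocks and a tree height of 2, against a height of 30 for a binary tree over the same N elements. That gap is the reason the bounds are stated in terms of log_B n.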
5
Related Work & Our Results: Range Queries
• 1D range queries are easy: B-tree
* O(n) space, O(log_B n) query & update
• 2D range queries:
– Poly-logarithmic query: CRB-tree [AAG03]
* O(n log_B n) space, O(log_B^2 n) query
– Linear space: kdB-tree, cross-tree, O-tree
* O(√n) query, O(log_B n) update
• Our results:
6
Related Work & Our Results: Stabbing Queries
• 1D stabbing queries
– SB-tree [YW01]
* O(n) space, O(log_B n) query & insert
* Does not allow deletions!
• 2D stabbing queries
– No structures with worst-case guarantee
• Our results:
7
2D Range Max Queries
• The external version of Chazelle’s structure [C88]
– Linear space,
– Static: O(log^{1+ε} N) query
– Dynamic: O(log^3 N log log N) query & update
• Overall structure
– A normal B-tree Φ on the y-coordinates of all the points
– A fan-out √B base B-tree T on the x-coordinates
* P_v: all points stored in the subtree of v
* Each internal node v stores two secondary structures C_v and M_v, which hold information about P_v in compressed form
* C_v and M_v have size O(|P_v| / log_B n) → linear size in total
* Weights of points are stored explicitly at the leaves
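The step from the per-node size bound to linear total size can be spelled out as follows. This is a sketch under two assumptions that the slide leaves implicit: sizes are counted in the same units as |P_v|, and the base tree T has O(log_B n) levels. It uses the fact that the sets P_v of the nodes on any one level partition the N points.

```latex
% Assumptions: sizes are in the same units as |P_v|; T has O(\log_B n) levels.
\[
\sum_{v \in T} \bigl(|C_v| + |M_v|\bigr)
  = \sum_{\ell = 1}^{O(\log_B n)} \; \sum_{v \text{ on level } \ell}
      O\!\left(\frac{|P_v|}{\log_B n}\right)
  = \sum_{\ell = 1}^{O(\log_B n)} O\!\left(\frac{N}{\log_B n}\right)
  = O(N)
\]
```

O(N) units of storage occupy O(N/B) = O(n) blocks, which is the linear-space bound.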
8
2D Range Max Queries
• C_v borrowed from CRB-tree
– Compute the ranks of the points one level down in O(1) I/Os
– Identify the weight of a point explicitly in O(log_B n) I/Os
• M_v computes the maximum weight in a multislab in O(log_B n) I/Os
• Answering a query:
– Use Φ to compute the ranks in the root of T
– Use M_v to compute the maximum at each level
– For a total of O(log_B^2 n) I/Os
[Figure: node v of T with children v1, ..., v6]
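The total query cost follows from adding up the steps listed above. The accounting below is a sketch that assumes the query visits O(1) nodes on each of the O(log_B n) levels of T, which the slide does not state explicitly.

```latex
% Assumption: O(1) nodes of T are visited per level.
\[
\underbrace{O(\log_B n)}_{\text{rank lookup in } \Phi}
\;+\;
\underbrace{O(\log_B n)}_{\text{levels of } T}
\cdot
\underbrace{O(\log_B n)}_{M_v \text{ query per level}}
\;=\; O(\log_B^2 n)
\]
```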
9
2D Range Max Queries: Mv
• Divide P_v into chunks of size B log_B N
• Divide each chunk into minichunks of size B
• Three-level structure
– M_v = (Ψ1, Ψ2, Ψ3)
– Each of size O(|P_v| / log_B n)
10
2D Range Max Queries: Mv
• Basic idea: encode the range max information in a compressed manner, then identify the maximum point using C_v once its rank is found
• Ψ3[l]: for each minichunk, stores a (slab index, weight rank) pair for each point inside the minichunk
– Find the rank of the maximum-weight point in O(1) I/Os
– Identify it in O(log_B N) I/Os
• Ψ2[k]: for each chunk, encodes a Cartesian tree on the O(log_B N) minichunks for each of the O(B) multislabs
– Find the minichunk containing the maximum-weight point in O(1) I/Os
– Use Ψ3 to find the exact point in O(log_B N) I/Os
• Ψ1: a fan-out √B B-tree on the O(|P_v| / (B log_B n)) chunks
– Find the maximum-weight point in O(log_B N) I/Os
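The Cartesian tree used in Ψ2 can be illustrated with a small in-memory sketch. It builds a max-ordered Cartesian tree over a plain array and answers a range max query by descending from the root. This shows only the underlying idea; the compressed per-multislab encoding on minichunks described above is not reproduced, and the function names are assumptions.

```python
from typing import List, Sequence, Tuple


def build_cartesian_tree(values: Sequence[float]) -> Tuple[int, List[int], List[int]]:
    """Build a max-ordered Cartesian tree in O(len(values)) time with a stack.

    Returns (root, left, right), where left[i] and right[i] are child indices
    or -1 when the child is absent.
    """
    n = len(values)
    left, right = [-1] * n, [-1] * n
    stack: List[int] = []  # indices on the rightmost path, values non-increasing
    for i in range(n):
        last = -1
        while stack and values[stack[-1]] < values[i]:
            last = stack.pop()
        left[i] = last            # popped run becomes the left subtree of i
        if stack:
            right[stack[-1]] = i  # i becomes the right child of the new top
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_max_index(root: int, left: List[int], right: List[int],
                    i: int, j: int) -> int:
    """Return the index of a maximum value in positions i..j (inclusive)."""
    node = root
    # The first node on the root path whose index falls in [i, j] is the answer.
    while not (i <= node <= j):
        node = right[node] if node < i else left[node]
    return node
```

The tree stores only the shape of the maxima, not the values themselves, which is what makes a compact encoding possible.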
11
2D Range Max Queries
• Static structures
– O(n) size, O(log_B^2 N) query, O(n log_B N) construction
– O(n) size, O(log_B^{1+ε} N) query, O(N log_B N) construction
• Dynamization:
– Throw away Ψ2 and expand Ψ3
– O(n log_B log_B N) size
– O(log_B^3 N) query, worst case
– O(log_B^2 N log_{M/B} log_B N) insert, amortized
– O(log_B^2 N) delete, amortized
• Extending to d dimensions
– Standard technique
– Pay an extra O(log_B^{d-2} N) factor on all these bounds
12
1D Stabbing Max Queries
• Modify the external interval tree [AV96] to support max
• Fan-out √B base B-tree on x-coordinates
– Each interval is stored in the highest node v where it contains a slab boundary
– It is stored in one left (right) slab structure and in the multislab structure
• Answering a query
– Search down the tree and visit O(log_B N) nodes
– Compute the maximum weight in the left (right) slab structure and the multislab structure
[Figure: base-tree node v and query point q]
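The query pattern above (walk one root-to-leaf path and take the maximum stored along it) can be illustrated with an in-memory analogue. The sketch below is a plain binary segment tree, not the external interval tree of [AV96]: each interval is pushed to O(log N) canonical nodes, and a query takes the maximum over the nodes on one leaf-to-root path. The class name `StabbingMaxTree` and the half-open intervals [lo, hi) are assumptions made to keep the code short.

```python
import bisect
from typing import List, Optional, Sequence, Tuple


class StabbingMaxTree:
    """Static 1D stabbing-max over half-open intervals [lo, hi) (assumed convention)."""

    def __init__(self, intervals: Sequence[Tuple[float, float, float]]):
        # Elementary slabs lie between consecutive distinct endpoints.
        self.xs: List[float] = sorted({x for lo, hi, _ in intervals for x in (lo, hi)})
        self.m = max(len(self.xs) - 1, 0)          # number of slabs
        self.best = [float("-inf")] * (2 * self.m)  # max weight stored per tree node
        for lo, hi, w in intervals:
            l = bisect.bisect_left(self.xs, lo) + self.m
            r = bisect.bisect_left(self.xs, hi) + self.m
            # Record w at the canonical nodes that exactly cover slabs [l, r).
            while l < r:
                if l & 1:
                    self.best[l] = max(self.best[l], w)
                    l += 1
                if r & 1:
                    r -= 1
                    self.best[r] = max(self.best[r], w)
                l >>= 1
                r >>= 1

    def query(self, q: float) -> Optional[float]:
        """Return the max weight over intervals containing q, or None."""
        slab = bisect.bisect_right(self.xs, q) - 1
        if slab < 0 or slab >= self.m:
            return None
        ans = float("-inf")
        node = slab + self.m
        while node >= 1:  # walk from the leaf slab up to the root
            ans = max(ans, self.best[node])
            node >>= 1
        return None if ans == float("-inf") else ans
```

Like the structure on the slide, this keeps only one maximum per node, so it supports insertions of weights but not deletions; handling deletions is where the slab structures of the external version come in.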
13
1D Stabbing Max Queries
• Slab structures are implemented using B-trees
– Query and update: O(log_B N) I/Os
• Multislab structure: fan-out √B B-tree
– At each internal node, we store the maximum weight for each of the √B slabs and for each of the √B children
– Query: O(1) I/Os (only look at the root)
– Update: O(log_B N) I/Os
• Rebalancing the base tree: O(log_B N) I/Os
– Weight-balanced B-trees
• Overall cost: size O(n), query O(log_B^2 N), update O(log_B N).
14
1D Stabbing Max Queries
• Space-time tradeoff:
– O(n log_B^ε N) size
– O(log_B^{2-ε} N) query
• Can handle general semigroup queries
– A semigroup (S, +)
– Each weight w(γ) ∈ S
– Want to compute ∑_{q ∈ γ} w(γ)
• Ideas can also be used to improve the internal-memory algorithm
– Linear size, O(log^2 N / log log N) query and update
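The semigroup generalization replaces max with an arbitrary associative operation. The brute-force sketch below makes the query semantics concrete. The name `stabbing_semigroup`, the closed intervals, and the use of `None` for an empty result are assumptions, since a semigroup need not have an identity element.

```python
from functools import reduce
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")


def stabbing_semigroup(intervals: Sequence[Tuple[float, float, S]],
                       q: float,
                       op: Callable[[S, S], S]) -> Optional[S]:
    """Combine the weights of all closed intervals [lo, hi] containing q with op."""
    hits = [w for lo, hi, w in intervals if lo <= q <= hi]
    # A semigroup has no guaranteed identity, so an empty result is None.
    return reduce(op, hits) if hits else None
```

Passing `max` recovers the stabbing max query, while `operator.add` gives a stabbing sum.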
15
2D Stabbing Max Queries
• Extend our 1D stabbing query structure
• Use our 2D range query structure as a building block
• Extending to d dimensions
– Standard technique
– Pay an extra O(log_B^{d-2} N) factor on all these bounds
16
Conclusions and Open Problems
• In this project, we developed I/O-efficient
– linear-space structures with poly-logarithmic query cost for static 2D range max queries
– near-linear-space structures with poly-logarithmic query & update cost for dynamic 2D range max queries
– linear-space structures with poly-logarithmic query cost for dynamic 1D stabbing max queries
– near-linear-space structures with poly-logarithmic query & update cost for dynamic 2D stabbing max queries
• Open problems
– Linear size dynamic structures for the 2D range & stabbing max queries?
– General semigroup queries?
THE END
Thank you!