I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

Second Year Project Presentation. Ke Yi. Advisor: Lars Arge. Committee: Pankaj K. Agarwal and Jun Yang.


Transcript of I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

Page 1: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

Second Year Project Presentation

Ke Yi

Advisor: Lars Arge

Committee: Pankaj K. Agarwal and Jun Yang

Page 2: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries


Problem Definition: Range Max Queries

• Range-aggregate queries: range-count, range-sum, range-max

• N points in R^d

• Each point p is associated with a weight w(p)

• Query rectangle Q

• Compute max{w(p) | p ∈ Q}

• Static and dynamic
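As a point of reference for the definition above, here is a minimal brute-force sketch in Python. It is not one of the structures from the talk: it scans all N points, so it only pins down what a range max query is supposed to return. The name `range_max`, the tuple representation of points, and the choice of a closed query box Q = [lo, hi] are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]


def range_max(
    points: Sequence[Point],
    weights: Sequence[float],
    lo: Point,
    hi: Point,
) -> Optional[float]:
    """Return max{w(p) | p in Q} for the closed box Q = [lo, hi], or None if Q is empty.

    Brute force: inspects every point, so it costs O(N) time per query.
    """
    best: Optional[float] = None
    for p, w in zip(points, weights):
        # p lies in Q when every coordinate falls inside the box.
        inside = all(l <= x <= h for l, x, h in zip(lo, p, hi))
        if inside and (best is None or w > best):
            best = w
    return best


if __name__ == "__main__":
    pts = [(1, 1), (2, 3), (4, 2)]
    wts = [5.0, 9.0, 7.0]
    print(range_max(pts, wts, (0, 0), (2, 3)))
```

A baseline like this is mainly useful for checking the answers of a more elaborate structure on small inputs.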

Page 3: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries


Problem Definition: Stabbing Max Queries

• N hyper-rectangles in R^d

• Each rectangle γ is associated with a weight w(γ)

• Query point q

• Compute max{w(γ) | q ∈ γ}
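The stabbing max definition can be made concrete with a similar brute-force sketch. Again this is only a reference for the query semantics, not a structure from the talk. The name `stabbing_max`, the representation of a hyper-rectangle as a pair of corner tuples `(lo, hi)`, and the use of closed rectangles are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lo corner, hi corner), assumed closed


def stabbing_max(
    rects: Sequence[Rect],
    weights: Sequence[float],
    q: Point,
) -> Optional[float]:
    """Return max{w(r) | q in r}, or None if no rectangle contains q.

    Brute force: tests every rectangle, so it costs O(N) time per query.
    """
    best: Optional[float] = None
    for (lo, hi), w in zip(rects, weights):
        # q stabs the rectangle when each coordinate lies within its extent.
        stabbed = all(l <= x <= h for l, x, h in zip(lo, q, hi))
        if stabbed and (best is None or w > best):
            best = w
    return best


if __name__ == "__main__":
    rs = [((0, 0), (4, 4)), ((2, 2), (6, 6)), ((5, 0), (7, 1))]
    ws = [3.0, 8.0, 10.0]
    print(stabbing_max(rs, ws, (3, 3)))
```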

Page 4: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

Model

• I/O Model

– N : Elements in structure

– B : Elements per block

– M : Elements in main memory

– n = N/B

• Assumptions

– M > B^2

– Each word holds log_2 N bits

– Any coordinate or weight can be stored in one word
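To get a feel for the quantities in this model, the following small Python sketch computes n = N/B (rounded up to whole blocks), evaluates log_B of a value, and checks the assumption M > B^2. The helper names `num_blocks`, `log_base`, and `satisfies_memory_assumption`, and the sample values of N, B, and M, are assumptions made for illustration only.

```python
import math


def num_blocks(N: int, B: int) -> int:
    """n = N / B, rounded up: blocks needed to store N elements."""
    return -(-N // B)


def log_base(x: float, B: float) -> float:
    """log_B x, the factor that appears in the I/O bounds."""
    return math.log(x) / math.log(B)


def satisfies_memory_assumption(M: int, B: int) -> bool:
    """Check the assumption M > B^2 from the model."""
    return M > B * B


if __name__ == "__main__":
    N, B, M = 10**9, 10**3, 2 * 10**6  # sample values, not from the talk
    n = num_blocks(N, B)
    print("n =", n)
    print("log_B n is about", round(log_base(n, B), 3))
    print("log_2 N is about", round(math.log2(N), 3))
    print("M > B^2:", satisfies_memory_assumption(M, B))
```

With block sizes in the thousands, log_B n stays a small constant for realistic N, which is why bounds are stated in terms of log_B rather than log_2.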

[Figure: I/O model diagram with components labeled P, M, and D, and block I/O between them]

Page 5: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

Related Work & Our Results: Range Queries

• 1D range queries are easy: B-tree

* O(n) space, O(log_B n) query & update

• 2D range queries:

– Poly-logarithmic query: CRB-tree [AAG03]

* O(n log_B n) space, O(log_B^2 n) query

– Linear space: kdB-tree, cross-tree, O-tree

* O(√n) query, O(log_B n) update

• Our results:

Page 6: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

Related Work & Our Results: Stabbing Queries

• 1D stabbing queries

– SB-tree [YW01]

* O(n) space, O(log_B n) query & insert

* Does not allow deletions!

• 2D stabbing queries

– No structures with worst-case guarantee

• Our results:


Page 7: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

2D Range Max Queries

• The external version of Chazelle’s structure [C88]

– Linear space

– Static: O(log^(1+ε) N) query

– Dynamic: O(log^3 N log log N) query & update

• Overall structure

– A normal B-tree Φ on y-coordinates of all the points

– A fan-out Θ(√B) base B-tree T on x-coordinates

* P_v: all points stored in the subtree of v

* Each internal node v stores two secondary structures C_v and M_v, which store information about P_v in a compressed manner

* C_v and M_v are of size O(|P_v| / log_B n) → linear size in total

* Weights of points are stored explicitly at the leaves

Page 8: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries


2D Range Max Queries

• C_v borrowed from CRB-tree

– Compute the ranks of the points one level down in O(1) I/Os

– Identify the weight of a point explicitly in O(log_B n) I/Os

• M_v computes the maximum weight in a multislab in O(log_B n) I/Os

• Answering a query:

– Use Φ to compute the ranks in the root of T

– Use M_v to compute the maximum at each level

– For a total of O(log_B^2 n) I/Os

[Figure: node v of T with children v1, ..., v6]

Page 9: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries


2D Range Max Queries: Mv

• Divide P_v into chunks of size B log_B N

• Divide each chunk into minichunks of size B

• Three-level structures

– M_v = (Ψ1, Ψ2, Ψ3)

– each of size O(|P_v| / log_B n)

Page 10: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries


2D Range Max Queries: Mv

• Basic idea: encode the range max information in a compressed manner, then identify the maximum point using C_v once its rank is found

• Ψ3[l]: for each minichunk, stores a (slab index, weight rank) pair for each point inside the minichunk

– Find the rank of the maximum-weight point in O(1) I/Os;

– Identify it in O(log_B N) I/Os.

• Ψ2[k]: for each chunk, encodes a Cartesian tree on the O(log_B N) minichunks for each of the O(B) multislabs

– Find the minichunk containing the maximum-weight point in O(1) I/Os;

– Use Ψ3 to find the exact point in O(log_B N) I/Os.

• Ψ1: a fan-out Θ(√B) B-tree on the O(|P_v| / (B log_B n)) chunks

– Find the maximum-weight point in O(log_B N) I/Os.
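The Cartesian tree used in Ψ2 can be illustrated with a small in-memory Python sketch. It is not the compressed encoding from the talk: it builds an ordinary max-Cartesian tree over a list of values (think of one value per minichunk, namely its maximum weight within a fixed multislab) and answers a range max query by walking down from the root. The names `build_cartesian_tree` and `range_argmax` and the array-of-children layout are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def build_cartesian_tree(values: Sequence[float]) -> Tuple[int, List[int], List[int]]:
    """Build a max-Cartesian tree in O(len(values)) time with a stack.

    Returns (root, left, right), where left[i] / right[i] are child indices
    or -1. In-order traversal visits indices in increasing order, and every
    node holds the maximum value of its subtree.
    """
    n = len(values)
    left = [-1] * n
    right = [-1] * n
    stack: List[int] = []  # indices on the rightmost path, values decreasing
    for i in range(n):
        last = -1
        # Pop everything smaller than values[i]; it becomes i's left subtree.
        while stack and values[stack[-1]] < values[i]:
            last = stack.pop()
        left[i] = last
        if stack:
            right[stack[-1]] = i
        stack.append(i)
    root = stack[0] if stack else -1
    return root, left, right


def range_argmax(root: int, left: List[int], right: List[int], i: int, j: int) -> int:
    """Index of a maximum value among positions i..j (inclusive, i <= j).

    The first node on the root-to-leaf walk whose index falls in [i, j]
    is the lowest common ancestor of i and j, which holds the range max.
    """
    node = root
    while not (i <= node <= j):
        node = left[node] if node > j else right[node]
    return node


if __name__ == "__main__":
    vals = [3, 1, 4, 1, 5]
    root, L, R = build_cartesian_tree(vals)
    print(range_argmax(root, L, R, 1, 3))
```

The property that matters for a compressed encoding is that the answer depends only on the shape of the tree, not on the stored values, so the shape alone is enough to locate the position of the maximum.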


Page 11: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

2D Range Max Queries

• Static structures

– O(n) size, O(log_B^2 N) query, O(n log_B N) construction

– O(n) size, O(log_B^(1+ε) N) query, O(N log_B N) construction

• Dynamization:

– Throw away Ψ2 and expand Ψ3

– O(n log_B log_B N) size

– O(log_B^3 N) query, worst case

– O(log_B^2 N · log_(M/B) log_B N) insert, amortized

– O(log_B^2 N) delete, amortized

• Extending to d dimensions

– Standard technique

– Pay an extra O(log_B^(d-2) N) factor on all these bounds

Page 12: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

1D Stabbing Max Queries

• Modify the external interval tree [AV96] to support max

• Fan-out Θ(√B) base B-tree on x-coordinates

– Each interval is stored in the highest node v where it contains a slab boundary

– It is stored in one left (right) slab structure and in the multislab structure

• Answering a query

– Search down the tree and visit O(log_B N) nodes

– Compute the maximum weight in the left (right) slab structure and the multislab structure

[Figure: base tree node v and query point q]

Page 13: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

1D Stabbing Max Queries

• Slab structures are implemented using B-trees

– Query and update: O(log_B N) I/Os

• Multislab structure: fan-out Θ(√B) B-tree

– At each internal node, we store the maximum weight for each of the Θ(√B) slabs and for each of the Θ(√B) children

– Query: O(1) I/Os (only look at the root)

– Update: O(log_B N) I/Os

• Rebalancing the base tree: O(log_B N) I/Os

– Weight-balanced B-trees

• Overall cost: size O(n), query O(log_B^2 N), update O(log_B N).
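For readers who want something runnable, here is a small internal-memory stand-in for 1D stabbing max in Python. It deliberately swaps in a different, simpler technique than the external interval tree described in the talk: a binary segment tree over elementary slots, where each inserted interval leaves its weight at O(log N) canonical nodes and a query takes the maximum along one leaf-to-root path. It supports insertions only, and it assumes the set of possible endpoints is known in advance. The class name `StabbingMax1D` and its methods are hypothetical names for illustration.

```python
import bisect
from typing import List, Optional, Sequence

NEG_INF = float("-inf")


class StabbingMax1D:
    """Insert-only stabbing max over closed intervals with known endpoints."""

    def __init__(self, endpoints: Sequence[float]) -> None:
        self.xs: List[float] = sorted(set(endpoints))
        # Slot 2i is the point xs[i]; slot 2i+1 is the open gap (xs[i], xs[i+1]).
        self.size = max(2 * len(self.xs) - 1, 1)
        self.best: List[float] = [NEG_INF] * (2 * self.size)

    def _index(self, x: float) -> int:
        i = bisect.bisect_left(self.xs, x)
        if i == len(self.xs) or self.xs[i] != x:
            raise ValueError(f"unknown endpoint {x!r}")
        return i

    def insert(self, a: float, b: float, w: float) -> None:
        """Insert the closed interval [a, b] with weight w."""
        lo = 2 * self._index(a) + self.size
        hi = 2 * self._index(b) + self.size + 1  # half-open leaf range
        while lo < hi:
            if lo & 1:
                self.best[lo] = max(self.best[lo], w)
                lo += 1
            if hi & 1:
                hi -= 1
                self.best[hi] = max(self.best[hi], w)
            lo >>= 1
            hi >>= 1

    def query(self, q: float) -> Optional[float]:
        """Return the maximum weight of an interval containing q, or None."""
        if not self.xs or q < self.xs[0] or q > self.xs[-1]:
            return None
        i = bisect.bisect_left(self.xs, q)
        slot = 2 * i if self.xs[i] == q else 2 * i - 1
        pos = slot + self.size
        result = NEG_INF
        while pos >= 1:  # walk from the leaf up to the root
            result = max(result, self.best[pos])
            pos >>= 1
        return None if result == NEG_INF else result


if __name__ == "__main__":
    s = StabbingMax1D([1, 3, 5, 8, 10])
    s.insert(1, 5, 4.0)
    s.insert(3, 8, 7.0)
    s.insert(8, 10, 2.0)
    print(s.query(6))
```

Deletions are the hard part that this sketch avoids: a stored maximum cannot be undone without knowing the remaining weights at that node, which is one reason the structures in the talk keep more information per node.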


Page 14: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

1D Stabbing Max Queries

• Space-time tradeoff:

– O(n log_B^ε N) size

– O(log_B^(2-ε) N) query

• Can handle general semigroup queries

– A semigroup (S, +)

– Each weight w(γ) ∈ S

– Want to compute ∑_{γ : q ∈ γ} w(γ)

• Ideas can also be used to improve the internal memory algorithm

– Linear size, O(log^2 N / log log N) query and update
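The semigroup formulation can be spelled out with a brute-force Python sketch in which the operation + is passed in as a function. This only illustrates the query semantics, not the structure from the talk. The name `stabbing_fold`, the pair representation of intervals, and the assumption that the operation is associative and commutative (so the order of combination does not matter) are all choices made for illustration.

```python
from functools import reduce
from typing import Callable, Optional, Sequence, Tuple, TypeVar

S = TypeVar("S")


def stabbing_fold(
    intervals: Sequence[Tuple[float, float]],
    weights: Sequence[S],
    q: float,
    op: Callable[[S, S], S],
) -> Optional[S]:
    """Combine w(g) over all closed intervals g containing q using op.

    Returns None when no interval contains q, since a semigroup need not
    have an identity element to return in that case.
    """
    stabbed = [w for (a, b), w in zip(intervals, weights) if a <= q <= b]
    return reduce(op, stabbed) if stabbed else None


if __name__ == "__main__":
    ivs = [(1, 5), (3, 8), (8, 10)]
    ws = [4, 7, 2]
    print(stabbing_fold(ivs, ws, 4, max))                  # stabbing max
    print(stabbing_fold(ivs, ws, 4, lambda x, y: x + y))   # stabbing sum
```

Passing `max` recovers the stabbing max query, while passing addition gives a stabbing sum, so both are instances of the same semigroup query.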

Page 15: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

2D Stabbing Max Queries

• Extend our 1D stabbing query structure

• Use our 2D range query structure as a building block

• Extending to d dimensions

– Standard technique

– Pay an extra O(log_B^(d-2) N) factor on all these bounds

Page 16: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

Conclusions and Open Problems

• In this project, we developed I/O-efficient

– linear space structures with poly-logarithmic query cost for the static 2D range max queries

– near linear space structures with poly-logarithmic query & update cost for the dynamic 2D range max queries

– linear space structures with poly-logarithmic query cost for the dynamic 1D stabbing max queries

– near linear space structures with poly-logarithmic query & update cost for the dynamic 2D stabbing max queries

• Open problems

– Linear size dynamic structures for the 2D range & stabbing max queries?

– General semigroup queries?

Page 17: I/O-Efficient Structures for Orthogonal Range Max and Stabbing Max Queries

THE END

Thank you!