Efficiently Processing Queries on Interval-and-Value Tuples in Relational Databases

Jost Enderle, Nicole Schneider, Thomas Seidl
RWTH Aachen University, Germany
VLDB 2005, Trondheim
Data Management and Exploration, Prof. Dr. Thomas Seidl



Enderle, Schneider, Seidl Queries on Interval-and-Value Tuples in RDBs VLDB 05 - 2


Outline

• Interval-and-Value (IaV) Data and Applications

• Relational Interval Tree (RI-tree)

• Managing Interval-and-Value Tuples Using RI-tree

• Experimental Results


Interval-and-Value Data: Example

Contracts table: stores the period and budget of contracts

    CREATE TABLE contracts (
        -- key:
        c_no VARCHAR(10),
        -- simple-valued attribute:
        c_budget DECIMAL(10,2),
        -- interval:
        c_period ROW (
            c_start DATE,
            c_end DATE))

    No.  Budget (k€)  Period start  Period end
    C1           250  2005-03-01    2005-07-31
    C2          5300  2002-02-17    2003-05-06
    C3         10700  1999-05-27    2001-12-17
    C4          1600  2001-02-28    2002-11-02
    C5           870  2002-06-25    2002-08-12


Interval-and-Value Data: Query

• Sample query on contracts table:

    -- Find all contracts
    SELECT c_no FROM contracts
    -- within certain budget range
    WHERE c_budget BETWEEN 500 AND 2000
    -- running during certain time interval
    AND c_period OVERLAPS (DATE '2003-03-01', DATE '2004-01-31')

• Special cases of this general Range-Interval query:
  – Value-Interval Query: value range is a single point
  – Range-Stabbing Query: query interval is a single point
  – Value-Stabbing Query: both restrictions hold
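The Range-Interval query and its special cases can be tried out on the sample rows. The following is a minimal sketch in Python with SQLite, not the authors' setup. It rests on these assumptions: the ROW-typed c_period is flattened into two columns c_start and c_end, OVERLAPS is approximated by a closed-interval intersection test, and the helper names make_db and range_interval_query are made up for illustration.

```python
import sqlite3

# Sample rows of the contracts table (budget in k EUR).
# Dates are ISO strings (YYYY-MM-DD) so that string comparison orders them correctly.
ROWS = [
    ("C1", 250, "2005-03-01", "2005-07-31"),
    ("C2", 5300, "2002-02-17", "2003-05-06"),
    ("C3", 10700, "1999-05-27", "2001-12-17"),
    ("C4", 1600, "2001-02-28", "2002-11-02"),
    ("C5", 870, "2002-06-25", "2002-08-12"),
]


def make_db():
    """Create an in-memory database holding the sample contracts."""
    conn = sqlite3.connect(":memory:")
    # Assumption: c_period ROW(c_start, c_end) is stored as two plain columns.
    conn.execute(
        "CREATE TABLE contracts ("
        "c_no TEXT PRIMARY KEY, c_budget NUMERIC, c_start TEXT, c_end TEXT)"
    )
    conn.executemany("INSERT INTO contracts VALUES (?, ?, ?, ?)", ROWS)
    return conn


def range_interval_query(conn, budget_lo, budget_hi, q_start, q_end):
    """Contracts with budget in [budget_lo, budget_hi] whose period intersects
    the closed query interval [q_start, q_end]."""
    sql = """
        SELECT c_no FROM contracts
        WHERE c_budget BETWEEN ? AND ?
          AND c_start <= ? AND c_end >= ?
        ORDER BY c_no
    """
    return [row[0] for row in conn.execute(sql, (budget_lo, budget_hi, q_end, q_start))]


conn = make_db()
# General Range-Interval query with a query interval in the second half of 2002.
print(range_interval_query(conn, 500, 2000, "2002-07-01", "2002-12-31"))
# Value-Stabbing query: the value range and the query interval are single points.
print(range_interval_query(conn, 870, 870, "2002-07-01", "2002-07-01"))
```

Under these assumptions, the query from the slide (budget 500 to 2000, interval 2003-03-01 to 2004-01-31) returns no rows on the five sample contracts, because the only contracts in the budget range (C4 and C5) end in 2002.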


Motivation of Relational Indexing

• Main Memory Structures
  – no persistence, no disk block structure
• Secondary Storage Structures
  + persistence, high block-oriented efficiency
  – integration into DBMS kernel typically not supported (GiST?)
• Relational Storage Structures
  + basic idea: don't extend, just use RDBMS (virtual storage machine)
  + sound formal foundation, little implementation effort
  + immediate industrial strength (availability, robustness, ACID, …)
  + high efficiency by exploiting built-in indexing structures (B+-tree)



Relational Interval Tree

• Two relational indexes (B+-trees) store the interval bounds
  – lowerIndex (node, start, id): (4, 1, C3), (8, 3, C4), (8, 5, C2), (12, 10, C1)
  – upperIndex (node, end, id): (4, 7, C3), (8, 13, C2), (8, 15, C4), (12, 15, C1)
• Supported by any RDBMS: No modification of built-in B+-trees
• Optimal complexities for space, updates, and intersection queries

[Figure: virtual backbone tree over the values 1 to 15 with root = 2^(h-1) = 8; the intervals C1 to C4 are registered at nodes of this tree.]

[Kriegel, Pötke, Seidl: VLDB 2000], based on [Edelsbrunner 1980]
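How an interval ends up at a node of the virtual backbone can be sketched in a few lines. The code below is an illustrative reading of the RI-tree registration rule, not the authors' implementation: an interval is registered at the first node, on the descent from the root, that lies inside it. The function names fork_node and build_indexes are hypothetical; the four intervals and their bounds are taken from the index entries listed on the slide.

```python
def fork_node(start, end, root):
    """Descend the virtual backbone tree until a node lies inside [start, end]."""
    node, step = root, root // 2
    while step >= 1 and not (start <= node <= end):
        # Node left of the interval: go right; node right of it: go left.
        node += step if node < start else -step
        step //= 2
    return node


def build_indexes(intervals, height):
    """Return the sorted contents of lowerIndex (node, start, id)
    and upperIndex (node, end, id) for a backbone of the given height."""
    root = 2 ** (height - 1)
    lower, upper = [], []
    for ident, (start, end) in intervals.items():
        node = fork_node(start, end, root)
        lower.append((node, start, ident))
        upper.append((node, end, ident))
    return sorted(lower), sorted(upper)


# Intervals as they appear in the index entries of the slide.
INTERVALS = {"C1": (10, 15), "C2": (5, 13), "C3": (1, 7), "C4": (3, 15)}
lower_index, upper_index = build_indexes(INTERVALS, height=4)
print(lower_index)
print(upper_index)
```

No tree structure is materialized here: the backbone exists only as arithmetic on node values, which is why two ordinary B+-trees on (node, bound, id) suffice.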


Single Interval Query Processing

Two steps to process an interval query:

1. Transform the interval query into a set of range queries
   – The generated queries are collected in transient tables (no I/Os)
2. Perform a single SQL query
   – Join the transient query tables with the relational indexes


Preprocessing: Generate Query Ranges

• Generate a set of range queries for lowerIndex and upperIndex
  – At nodes left of start: report entries i with i.end ≥ start (nodes 32, 48, 52)
  – At nodes right of end: report entries i with i.start ≤ end (node 56)
  – For nodes between start and end: report all entries (nodes 54 to 55)

• Resulting query ranges in the example
  – upperIndex: nodes 32, 48, 52
  – lowerIndex: node 56; nodes 54 to 55

[Figure: virtual backbone tree over the nodes 1 to 63 with root 32; the query interval from start to end is marked.]
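The node lists for the two transient tables can be derived arithmetically. The sketch below assumes a query interval of [54, 55], which is consistent with the node lists on the slide, and a backbone with root 32; path_to and query_nodes are hypothetical helper names, and the code is one possible reading of the preprocessing step rather than the original algorithm text.

```python
def path_to(target, root):
    """Nodes visited when descending the virtual backbone from root to target."""
    node, step = root, root // 2
    path = [node]
    while node != target and step >= 1:
        node += step if target > node else -step
        step //= 2
        path.append(node)
    return path


def query_nodes(start, end, root):
    """Return (left_nodes, right_nodes) for the query interval [start, end].

    left_nodes:  nodes on the path to start that lie left of start
                 (probed in upperIndex with the condition end >= start)
    right_nodes: nodes on the path to end that lie right of end
                 (probed in lowerIndex with the condition start <= end)
    Nodes inside [start, end] are covered by one range scan and are not listed.
    """
    left = [n for n in path_to(start, root) if n < start]
    right = [n for n in path_to(end, root) if n > end]
    return left, right


left_queries, right_queries = query_nodes(54, 55, root=32)
print(left_queries)   # nodes to probe in upperIndex
print(right_queries)  # nodes to probe in lowerIndex
```

Both lists have at most as many entries as the tree has levels, so the preprocessing itself needs no I/O and produces only a logarithmic number of probes.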


Processing by a Single SQL Query

• Join transient query tables with B+-tree indexes:

    SELECT id
    FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start
    UNION ALL
    SELECT id
    FROM lowerIndex AS i JOIN :rightQueries USING (node)
    WHERE i.start <= :end
    UNION ALL
    SELECT id
    FROM lowerIndex  -- or upperIndex
    WHERE node BETWEEN :start AND :end

• No duplicates are produced → UNION ALL

• Blocked output of index range scans is guaranteed
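The three-part query can be run end to end on a small example. The following sketch uses SQLite from Python and the four intervals of the RI-tree slide (backbone root 8). Several details are assumptions made for the sake of a runnable example: the bound column is called bound in both index tables (END is a keyword in SQLite), the transient tables are ordinary temporary tables, the bind variables are named :qstart and :qend, and path_to, make_db and intersect are hypothetical helper names.

```python
import sqlite3


def path_to(target, root):
    """Nodes visited when descending the virtual backbone from root to target."""
    node, step = root, root // 2
    path = [node]
    while node != target and step >= 1:
        node += step if target > node else -step
        step //= 2
        path.append(node)
    return path


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE lowerIndex (node INTEGER, bound INTEGER, id TEXT);
        CREATE TABLE upperIndex (node INTEGER, bound INTEGER, id TEXT);
        CREATE INDEX lowerIdx ON lowerIndex (node, bound, id);
        CREATE INDEX upperIdx ON upperIndex (node, bound, id);
        CREATE TEMP TABLE leftQueries (node INTEGER);
        CREATE TEMP TABLE rightQueries (node INTEGER);
    """)
    conn.executemany("INSERT INTO lowerIndex VALUES (?, ?, ?)",
                     [(4, 1, "C3"), (8, 3, "C4"), (8, 5, "C2"), (12, 10, "C1")])
    conn.executemany("INSERT INTO upperIndex VALUES (?, ?, ?)",
                     [(4, 7, "C3"), (8, 13, "C2"), (8, 15, "C4"), (12, 15, "C1")])
    return conn


def intersect(conn, qstart, qend, root):
    """Ids of all stored intervals intersecting [qstart, qend]."""
    left = [(n,) for n in path_to(qstart, root) if n < qstart]
    right = [(n,) for n in path_to(qend, root) if n > qend]
    # Step 1: fill the transient tables with the generated node lists.
    conn.execute("DELETE FROM leftQueries")
    conn.execute("DELETE FROM rightQueries")
    conn.executemany("INSERT INTO leftQueries VALUES (?)", left)
    conn.executemany("INSERT INTO rightQueries VALUES (?)", right)
    # Step 2: a single SQL statement joins them with the two indexes.
    sql = """
        SELECT id FROM upperIndex AS i JOIN leftQueries USING (node)
        WHERE i.bound >= :qstart
        UNION ALL
        SELECT id FROM lowerIndex AS i JOIN rightQueries USING (node)
        WHERE i.bound <= :qend
        UNION ALL
        SELECT id FROM lowerIndex
        WHERE node BETWEEN :qstart AND :qend
    """
    rows = conn.execute(sql, {"qstart": qstart, "qend": qend}).fetchall()
    return sorted(row[0] for row in rows)


conn = make_db()
print(intersect(conn, 14, 15, root=8))
print(intersect(conn, 1, 2, root=8))
```

Each stored interval is registered at exactly one node, and that node falls into at most one of the three groups (left of start, right of end, in between), which is why UNION ALL does not produce duplicates.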


Extending the RI-tree for IaV Support (1)

• Add value predicate to RI-tree query:

    SELECT id                              -- lower subquery
    FROM upperIndex AS i JOIN :leftQueries USING (node)
    WHERE i.end >= :start
      AND i.value BETWEEN :Value1 AND :Value2
    UNION ALL
    ...                                    -- upper subquery
    UNION ALL
    SELECT id                              -- inner subquery
    FROM lowerIndex                        -- or upperIndex
    WHERE node BETWEEN :start AND :end
      AND value BETWEEN :Value1 AND :Value2

• Integrate simple value attribute into lower-/upperIndex
  – old schema: (node, bound, id)
  – new schema: ? → depends on type of query to support


Extending the RI-tree for IaV Support (2)

• Viable schemas for new lower-/upperIndexes (estimate access cost for each query type):
  – (value, node, bound, id)
  – (node, value, bound, id)
  – (node, bound, value, id)
• Observations (see paper for details):
  – Value Queries: best supported by (value, node, bound, id) index
    • simple attribute predicates = point queries
    • evaluation requires the same number of disk accesses as the original RI-tree procedure
  – Range Queries: choice of index not obvious
    • inner subquery of Range-Stabbing Queries best supported by (node, value, bound, id)
    • otherwise: depends on stored data and values of query variables
• Question: Can Range Queries be further enhanced?
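Why the column order matters for Value Queries can be made concrete with a toy cost model. The sketch below is not the cost estimation of the paper; it assumes that one index range scan is delimited by the leading equality columns plus at most one following range column, and that all remaining predicates are applied as filters on the scanned entries. The function scan_cost and the synthetic entries are made up for illustration.

```python
from bisect import bisect_left, bisect_right


def scan_cost(entries, order, eq, rng):
    """Return (scanned, matched) for one range scan on a composite index.

    entries: list of dicts with the keys node, value, bound, id
    order:   column order of the index, e.g. ("value", "node", "bound", "id")
    eq:      equality predicates {column: value}
    rng:     range predicates {column: (low, high)}
    """
    keys = sorted(tuple(e[c] for c in order) for e in entries)
    low, high = [], []
    for col in order:
        if col in eq:                 # equality column extends the key prefix
            low.append(eq[col])
            high.append(eq[col])
        elif col in rng:              # first range column ends the prefix
            low.append(rng[col][0])
            high.append(rng[col][1])
            break
        else:
            break
    n = len(low)
    prefixes = [k[:n] for k in keys]  # still sorted, since keys are sorted
    scanned = bisect_right(prefixes, tuple(high)) - bisect_left(prefixes, tuple(low))
    matched = sum(
        all(e[c] == v for c, v in eq.items())
        and all(lo <= e[c] <= hi for c, (lo, hi) in rng.items())
        for e in entries
    )
    return scanned, matched


# Synthetic data: one entry per (node, value) pair, 15 nodes x 10 values.
ENTRIES = [{"node": n, "value": v, "bound": n, "id": f"{n}-{v}"}
           for n in range(1, 16) for v in range(10)]

# Inner subquery of a Value-Interval query: value = 3, node BETWEEN 4 AND 7.
eq, rng = {"value": 3}, {"node": (4, 7)}
print(scan_cost(ENTRIES, ("value", "node", "bound", "id"), eq, rng))
print(scan_cost(ENTRIES, ("node", "value", "bound", "id"), eq, rng))
```

In this model the (value, node, bound, id) order touches only the four matching entries, while the (node, value, bound, id) order scans all 40 entries of the node range to find the same four matches.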


Improving Range Query Processing (1)

• Problem of composite indexes for multiple attributes
  – queries may contain range predicates on two or more of the indexed attributes
  – tuples satisfying first predicate lie in contiguous disk area
  – tuples satisfying both/all predicates are scattered within this area
• Common solution: using space-filling curves
  – mapping multi-dimensional data to one-dimensional values
  – similar values of original data are mapped to similar index values
  – ranges of indexed attributes will be found in adjacent disk areas
• Application to the RI-tree scenario
  – combining some attributes of lower-/upperIndex
  – depends on type of query to support
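As an illustration of such a mapping, the following sketch implements the Z-order (Morton) curve by bit interleaving. It is a generic textbook encoding and not necessarily the exact encoding, bit width, or attribute scaling used by the authors; z_encode and z_decode are hypothetical names, and the choice of placing x on the even bit positions is an assumption.

```python
def z_encode(x, y, bits=16):
    """Interleave the bits of x and y into one Z-order value.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def z_decode(z, bits=16):
    """Inverse of z_encode: recover (x, y) from a Z-order value."""
    x = y = 0
    for i in range(bits):
        x |= ((z >> (2 * i)) & 1) << i
        y |= ((z >> (2 * i + 1)) & 1) << i
    return x, y


# The four cells of a 2 x 2 block receive four consecutive Z-values.
print([z_encode(x, y) for y in (0, 1) for x in (0, 1)])
print(z_encode(3, 5), z_decode(z_encode(3, 5)))
```

A useful property of this encoding is that it is monotone in each coordinate: if x1 <= x2 and y1 <= y2, then z_encode(x1, y1) <= z_encode(x2, y2). All points of a query rectangle therefore fall between the Z-values of its lower-left and upper-right corners, although that Z-range may also contain points outside the rectangle.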


Improving Range Query Processing (2)

Identifying viable schemas for new lower-/upperIndexes

– find subqueries containing several range predicates
  • for Range Queries: lower and upper subqueries (bound, value)
  • for Range-Interval Queries: inner subquery (node, value)
– combine respective attributes (x, y) within space-filling curve {x, y}
– useful combinations for lower-/upperIndex:
  • (node, {value, bound})
  • ({node, value}, bound)

[Figure; labels: node, value, lower]


Improving Range Query Processing (3)

• Observations:
  – lower and upper subqueries of Range Queries will benefit from a (node, {value, bound}) index
  – inner subquery of Range-Interval Queries will benefit from a ({node, value}, bound) index
  – Value Queries will not benefit from "space-filling indexes"
• Intermediate result
  – space-filling indexes can reduce disk accesses in certain cases
  – there is no "universal" index supporting all queries to the same extent
  – different subqueries will benefit from different indexes


Employing index mixes

Identifying best indexes for each query type
– Value Queries: best supported by (value, node, bound, id) index
– Range Queries: depends on data and space-filling curve (if used)
  • different subqueries best supported by different indexes
  • subqueries may be evaluated separately using best index
  • drawback: higher cost for index updates and storage requirements

    Queries          Lower/Upper Subquery      Inner Subquery
    Value-Stabbing   (value, node, bound)      (value, node, bound)
    Value-Interval   (value, node, bound)      (value, node, bound)
    Range-Stabbing   (node, {value, bound})    (node, value, bound)
    Range-Interval   (node, {value, bound})    ({node, value})
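The table can be read as a lookup from query type to the index used per subquery. The sketch below encodes it as such; the classification rule follows the special cases defined earlier (a value range or query interval that collapses to a single point), while the names INDEX_MIX, classify and choose_indexes are hypothetical and the index schemas are kept as plain strings exactly as they appear in the table.

```python
# Index per subquery, copied from the table: (lower/upper subquery, inner subquery).
INDEX_MIX = {
    "Value-Stabbing": ("(value, node, bound)", "(value, node, bound)"),
    "Value-Interval": ("(value, node, bound)", "(value, node, bound)"),
    "Range-Stabbing": ("(node, {value, bound})", "(node, value, bound)"),
    "Range-Interval": ("(node, {value, bound})", "({node, value})"),
}


def classify(value_lo, value_hi, q_start, q_end):
    """Name the query type from its value range and query interval."""
    value_part = "Value" if value_lo == value_hi else "Range"
    interval_part = "Stabbing" if q_start == q_end else "Interval"
    return f"{value_part}-{interval_part}"


def choose_indexes(value_lo, value_hi, q_start, q_end):
    """Pick the index for each subquery according to the table."""
    lower_upper, inner = INDEX_MIX[classify(value_lo, value_hi, q_start, q_end)]
    return {"lower/upper": lower_upper, "inner": inner}


print(classify(1, 12, 3, 6))
print(choose_indexes(1, 12, 3, 6))
print(choose_indexes(870, 870, 5, 5))
```

Routing subqueries this way presupposes that all indexes named in the table are maintained side by side, which is the update and storage drawback noted on the slide.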


Adapting the RI-tree Algorithms (1)

Example: Evaluate a contracts query using "space-filling index"

Contracts table:
– Node and Z-order value calculated for each tuple
– B-tree index on (node, Z(budget, start), no)

    No.  Budget (k€)  Start  End  Node  Z(budget, start)
    C1             2      1    5     4                 4
    C2             5      2    9     8                50
    C3            10      8   17    16               221
    C4             6     14   19    16               149
    C5             8     21   26    24               186


Adapting the RI-tree Algorithms (2)

Range-Interval Query: value range (1,12); interval (3,6)

[Figure: evaluation of the upper subquery with the Z-order index; axes: start and budget; query region: budget in Range(1, 12), start <= end.]


Access Cost with Varying Table Sizes

[Figure, two panels: Value-Stabbing Queries and Value-Interval Queries. x-axis: table size [number of tuples], 1.0E+05 to 1.0E+07; y-axis: access cost [number of I/Os]; series: RI(VNB), RI(NVB), RI(NBV), RI({NV})h, RI(N{VB})h.]


Access Cost with Varying Table Sizes

[Figure, two panels: Range-Stabbing Queries and Range-Interval Queries. x-axis: table size [number of tuples], 1.0E+05 to 1.0E+07; y-axis: access cost [number of I/Os]; series in both panels: RI(VNB), RI(NVB), RI(NBV), RI({NV})h, RI(N{VB})h; additional index mix for Range-Stabbing Queries: RI(NVB) + RI(N{VB})h; additional index mix for Range-Interval Queries: RI({NV})h + RI(N{VB})h.]


Access cost for varying length of ranges

[Figure, two panels: Stabbing Queries and Interval Queries. x-axis: length of query range [% of attr. domain], 0 to 50; y-axis: access cost [number of I/Os].
Series for Stabbing Queries: (VNB), (NVB), ({NV}B), (NVB) + (N{VB}), (VNB) + (N{VB}), (VNB) + (NVB) + (N{VB}).
Series for Interval Queries: (VNB), ({NV}B), ({NV}B)+(N{VB}), (VNB)+({NV}B), (VNB)+({NV}B)+(N{VB}).]


Access cost for varying length of ranges

[Figure, one panel: Range Queries. x-axis: length of query interval [% of int. domain], 0 to 50; y-axis: access cost [number of I/Os]; series: ({NV}B), (N{VB}), (NVB)+(N{VB}), ({NV}B)+(N{VB}), (VNB)+({NV}B)+(N{VB}).]


Comparison with competing techniques

[Figure: bar chart of access cost [number of I/Os], axis from 0 to 6000, for the following techniques:
(VNB)+(NVB)+({NV}B)+(N{VB}); (VNB)+({NV}B)+(N{VB}); (VNB)+(NVB)+(N{VB}); (VNB)+(N{VB}); (VNB)+({NV}B); (VNB)+(NVB); ({NV}B)+(N{VB}); (NVB)+(N{VB}); (N{VB}) Hilbert; (N{VB}) z-curve; ({NV}B) Hilbert; ({NV}B) z-curve; (NBV); (NVB); (VNB); Spatial RI-tree; R-tree; RI-tree → B-tree; B-tree → RI-tree; B-tree ∩ RI-tree; (VLU); (LUV).
Bar values in extraction order (pairing with the techniques is not recoverable): 595, 703, 664, 1544, 1002, 1558, 1148, 901, 1995, 2151, 1572, 1740, 2680, 1639, 1773, 4527, 3842, 2268, 2457, 2949.]


Conclusions

• Processing Interval-and-Value Tuples in SQL databases

• Extensions of the Relational Interval Tree

• Various types of queries
  – Range vs. Value Queries
  – Interval vs. Stabbing Queries
• Experiments demonstrate high performance
• Future work:
  – Extend proposed techniques to more complex queries (joins)
  – Cost models to predict benefits for evolving query workload