Computer Science Spatio-Temporal Aggregation Using Sketches Yufei Tao, George Kollios, Jeffrey...


Page 1: Computer Science Spatio-Temporal Aggregation Using Sketches Yufei Tao, George Kollios, Jeffrey Considine, Feifei Li, Dimitris Papadias Department of Computer.

Computer Science

Spatio-Temporal Aggregation Using Sketches

Yufei Tao, George Kollios, Jeffrey Considine, Feifei Li, Dimitris Papadias

Department of Computer Science

City University of Hong Kong, Boston University, Hong Kong University of Science and Technology

18 March 2004

Page 2

Outline

• Applications and motivation

• Preliminaries – aggregate trees and sketch techniques

• Distinct spatio-temporal aggregation

• Performance study

• Extensions

• Conclusion

Page 3

Spatio-Temporal Aggregate Query -- Applications

• Traffic Supervision Systems

– Monitoring the number of vehicles in a district; this information can be used, for example, to identify traffic jam areas.

• Mobile Computing Applications

– Allocating bandwidth according to the usage of each region

Example: A wireless company would like to know the number of cell phone users in a particular region during a specified period. It is also interesting to know the total number of phone calls made by all users who qualify for the first query.

Page 4

Spatio-Temporal Aggregate Query

• Spatio-temporal applications require the retrieval of summarized information about moving objects.

• Given a query region qr (a rectangle) and a query interval qt, a spatio-temporal aggregate query retrieves information about the objects that appeared in qr during qt.

– Spatio-Temporal Count: returns the total number of qualifying objects.

– Spatio-Temporal Sum: each object is associated with a measure; the query outputs the sum of the measures of the qualifying objects.

Existing approach: multi-tree structures based on R-trees and B-trees

– Problem: if an object remains in the query region for several timestamps during the query interval, it is counted (or summed) multiple times in the result.
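The double-counting problem can be illustrated with a minimal sketch. The observation tuples, object names, and function names below are hypothetical, made up only for this illustration; they are not part of the presented system.

```python
# Hypothetical observations: (timestamp, region, object_id, measure).
observations = [
    (1, "r1", "car_a", 3),
    (2, "r1", "car_a", 3),   # car_a stays in r1 for a second timestamp
    (2, "r1", "car_b", 5),
    (3, "r2", "car_c", 7),   # outside the query region used below
]

def naive_aggregates(obs, regions, t_start, t_end):
    """Add up per-timestamp data: an object is counted once per timestamp."""
    hits = [o for o in obs if o[1] in regions and t_start <= o[0] <= t_end]
    return len(hits), sum(o[3] for o in hits)

def distinct_aggregates(obs, regions, t_start, t_end):
    """Count each qualifying object once, however long it stays."""
    measures = {o[2]: o[3] for o in obs
                if o[1] in regions and t_start <= o[0] <= t_end}
    return len(measures), sum(measures.values())

naive = naive_aggregates(observations, {"r1"}, 1, 3)
distinct = distinct_aggregates(observations, {"r1"}, 1, 3)
print(naive, distinct)
```

Here car_a is present in r1 at two timestamps, so the naive count and sum both include it twice, while the distinct variant includes it once.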

Page 5

Spatio-Temporal Aggregate Query (cont.)

Motivation: Distinct Spatio-Temporal Aggregate Query

• Enables a much richer range of decision-making queries.

• But: there is no way to summarize distinct objects exactly that is substantially better than simply enumerating all of them.

Solution:

• Spatio-Temporal Aggregation Index Trees

• Sketch Techniques

How to answer a "Distinct Aggregate Query"? E.g., how many cars are present in a district?

[Figure: map illustration; surviving labels are "Stadium" and "90".]

Page 6

Example

The query retrieves the aggregate sum (during time T1-T3) of all region rectangles that intersect the query rectangle qr.

[Figure: aggregate value of each region at each timestamp]

region   t=1   t=2   t=3   t=4   t=5
r1       150   150   145   135   130
r2        75    80    85    90    90
r3        12    12    12    12    12
r4       132   127   125   127   127

[Figure: spatial layout of the regions r1-r4, grouped under the R-tree entries R1 and R2, with the query rectangle qr.]

Page 7

Preliminaries -- Aggregate RB-tree

In the aRB-tree, the extents of all regions (in this case r1, r2, ..., r4) are stored in an R-tree. Each (leaf or non-leaf) entry of the R-tree is associated with a pointer to a B-tree that stores historical aggregate data about the entry.

[Figure: the R-tree over the spatial dimensions has the entries R1 and R2 above the regions r1-r4; each entry points to a B-tree of <time, aggregate> entries, listed below.]

B-tree   non-leaf entries     leaf entries
R1       <1,685> <4,445>      <1,225> <2,230> <4,225> <5,220>
R2       <1,283> <3,405>      <1,144> <2,139> <3,137> <4,139>
r1       <1,445> <4,265>      <1,150> <3,145> <4,135> <5,130>
r2       <1,155> <3,265>      <1,75> <2,80> <3,85> <4,90>
r3       -                    <1,12>
r4       <1,259> <3,379>      <1,132> <2,127> <3,125> <4,127>
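The temporal side of the aRB-tree can be sketched as follows. This is a simplified illustration, not the actual index: each B-tree is stood in for by a sorted Python list of change points, with the assumption that a value stays valid until the next change point. The names `HISTORY`, `value_at`, and `temporal_sum` are hypothetical; the numbers are the r1 and r2 entries from the example.

```python
import bisect

# Per-region history as sorted (timestamp, value) change points.
# Assumption: a value holds until the next change point.
HISTORY = {
    "r1": [(1, 150), (3, 145), (4, 135), (5, 130)],
    "r2": [(1, 75), (2, 80), (3, 85), (4, 90)],
}

def value_at(history, t):
    """Return the aggregate valid at timestamp t (0 before the first entry)."""
    times = [ts for ts, _ in history]
    pos = bisect.bisect_right(times, t) - 1
    return history[pos][1] if pos >= 0 else 0

def temporal_sum(history, t_start, t_end):
    """Sum the per-timestamp aggregates over [t_start, t_end]."""
    return sum(value_at(history, t) for t in range(t_start, t_end + 1))

print(temporal_sum(HISTORY["r1"], 1, 3))  # 150 + 150 + 145
print(temporal_sum(HISTORY["r1"], 4, 5))  # 135 + 130
```

Under this reading, the sums for r1 over timestamps 1-3 and 4-5 come out to 445 and 265, the values that appear in the example's B-tree for r1. A real B-tree would answer the same query from its non-leaf entries without touching every timestamp.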

Page 8

Preliminaries – Flajolet-Martin sketches

• Goal: a small-space representation of a set of items.

• The sketch of a union of sets is the bitwise OR of their bitmaps.

Prerequisite: Let h be a random, binary hash function.

Sketch of an item

For each unique item with ID x,

  for each integer 1 ≤ i ≤ k in turn,

    compute h(x, i);

    stop when h(x, i) = 1, and set bit i.

X      0 0 1 0 0

Z      1 0 0 0 0

X ∪ Z  1 0 1 0 0
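The insertion and union rules can be sketched in a few lines. The random binary hash h is replaced here by one bit of SHA-256, which is an assumption made to keep the example deterministic; the bitmap length `K` and the function names are also hypothetical.

```python
import hashlib

K = 32  # number of bits per bitmap (an assumed size)

def h(x, i):
    """Stand-in for the random binary hash h(x, i): one bit of SHA-256."""
    digest = hashlib.sha256(f"{x}:{i}".encode()).digest()
    return digest[0] & 1

def item_sketch(x):
    """Bitmap with at most one bit set: the first i where h(x, i) = 1."""
    for i in range(1, K + 1):
        if h(x, i) == 1:
            return 1 << (i - 1)
    return 0  # no position hit; very unlikely for a well-mixed hash

def set_sketch(items):
    """Sketch of a set: OR of the sketches of its items."""
    sketch = 0
    for x in items:
        sketch |= item_sketch(x)
    return sketch

xz = set_sketch(["X", "Z"])
print(format(xz, f"0{K}b"))
```

Because insertion only ever ORs bits in, inserting the same item repeatedly leaves the sketch unchanged, and the sketch of a union equals the OR of the individual sketches. Those two properties are what make the structure usable for distinct counting.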

Page 9

Preliminaries – Flajolet-Martin sketches (cont.)

Estimating COUNT

Take the bitmap of a set of N items.

Let j be the position of the leftmost zero in the bitmap.

j is an estimator of log2(0.77 N)

Example: S = 1 1 1 0 1, so j = 3 (positions counted from 0). Best guess: COUNT ~ 11

Fixable drawbacks:

• Variance in the estimate is large.
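Inverting the relation j ≈ log2(0.77 N) gives N ≈ 2^j / 0.77. The sketch below is one possible reading of the estimator, assuming positions are counted from 0 and that bit 0 of the integer corresponds to the leftmost position drawn on the slide; the function names are hypothetical.

```python
def lowest_zero(bitmap):
    """0-based position of the first zero bit, scanning upward from bit 0."""
    j = 0
    while bitmap & (1 << j):
        j += 1
    return j

def estimate_count(bitmap, phi=0.77):
    """Invert j ~ log2(phi * N) to obtain N ~ 2**j / phi."""
    return 2 ** lowest_zero(bitmap) / phi

# Bits 0, 1, 2 and 4 are set; bit 3 is the first zero, so j = 3.
s = 0b10111
print(lowest_zero(s), estimate_count(s))
```

For j = 3 this evaluates to 8 / 0.77, a little above 10, which is in the same ballpark as the best guess given on the slide.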

Page 10

Preliminaries – Flajolet-Martin sketches (cont.)

Standard variance reduction methods apply.

• Compute m independent bitmaps in parallel.

• Generate m independent estimates of N.

• Take the mean of the estimates.

Provable tradeoffs between m and variance of the estimator.
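The variance-reduction step can be sketched as below, under the assumption that each of the m bitmaps uses its own hash (simulated here by salting SHA-256 with the bitmap index) and that every item is inserted into all m bitmaps. The names and the bitmap length `K` are hypothetical.

```python
import hashlib

K = 32  # bits per bitmap (an assumed size)

def first_one(x, b):
    """First 0-based position i where the hash bit for (bitmap b, item x, i) is 1."""
    for i in range(K):
        if hashlib.sha256(f"{b}:{x}:{i}".encode()).digest()[0] & 1:
            return i
    return None

def build_bitmaps(items, m):
    """Insert every item into m bitmaps, each with its own salted hash."""
    bitmaps = [0] * m
    for x in items:
        for b in range(m):
            i = first_one(x, b)
            if i is not None:
                bitmaps[b] |= 1 << i
    return bitmaps

def lowest_zero(bitmap):
    j = 0
    while bitmap & (1 << j):
        j += 1
    return j

def estimate(bitmaps, phi=0.77):
    """Mean of the m per-bitmap estimates 2**j / phi."""
    return sum(2 ** lowest_zero(bm) / phi for bm in bitmaps) / len(bitmaps)

bitmaps = build_bitmaps(range(1000), m=16)
print(round(estimate(bitmaps)))
```

Raising m costs proportionally more space and insertion work in exchange for a steadier estimate, which is the tradeoff the slide refers to.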

Page 11

Distinct Spatio-Temporal Aggregation

Exact Solution

If n is the number of distinct objects and T is the total number of timestamps in history, the exact solution requires space on the order of n∙T.

Existing Aggregation Approach

The aRB-tree stores only the summarized data; information about individual objects is lost, so the problem cannot be solved.

Our Solution

• Combine the aRB-tree with the FM sketch technique: for each region ri and every timestamp t, we maintain a sketch si(t) that captures the (ids of the) objects in ri at t.

• Requires space on the order of m∙R∙T∙log n, where R is the number of regions and m is an adjustable constant specifying the number of bitmaps used by one sketch (m determines the tradeoff between overhead and approximation accuracy).

Page 12

System Architecture

[Figure: the regions r1, r2, r3 send object ids or weights to sketch producers; the resulting sketches are stored in a database, which answers aggregate queries with approximate results.]

[Figure: grid of example sketches (5-bit bitmaps such as 01000, 10100, 11000), one per region r1-r4 and timestamp 1-5.]

The sketches can be stored in a two-dimensional array.
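A minimal sketch of the two-dimensional layout, assuming one single-bitmap sketch per (region, timestamp) cell and a SHA-256 bit as a stand-in for the random hash. The object ids, the `present` table, and the function names are hypothetical.

```python
import hashlib

K = 32  # bits per bitmap (an assumed size)

def item_bit(x):
    """Bitmap with the single bit chosen for item x (0 if no position hits)."""
    for i in range(K):
        if hashlib.sha256(f"{x}:{i}".encode()).digest()[0] & 1:
            return 1 << i
    return 0

def sketch_of(ids):
    s = 0
    for x in ids:
        s |= item_bit(x)
    return s

# Hypothetical reports: present[i][t] = ids of objects in region i at timestamp t.
present = [
    [{"a", "b"}, {"a"}, {"c"}],       # region 0
    [{"d"}, {"b", "d"}, {"d", "e"}],  # region 1
]

# sketches[i][t] plays the role of the per-region, per-timestamp sketch.
sketches = [[sketch_of(ids) for ids in row] for row in present]

def query_sketch(regions, t_start, t_end):
    """OR together the sketches of every qualifying region and timestamp."""
    result = 0
    for i in regions:
        for t in range(t_start, t_end + 1):
            result |= sketches[i][t]
    return result

print(format(query_sketch([0, 1], 0, 2), f"0{K}b"))
```

The OR over cells yields the same bitmap as sketching the union of all ids directly, so an object that stays in the query region for several timestamps affects the result only once.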

Page 13

Sketch Indexing Structures

Each B-tree entry has the form <time, sketch>.

The sketch of a non-leaf entry in a B-tree equals the OR of all the sketches in its sub-tree.

[Figure: sketch index for the running example. The R-tree over the spatial dimensions (entries R1, R2 and regions r1-r4) points to one B-tree per entry, with node labels N1-N4; the B-tree entries are <time, sketch> pairs. Example query: rectangle qr with qt = (1, 4).]
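The rule for non-leaf entries can be sketched with a tiny two-level tree. The `Entry` class, the helper `make_parent`, and the bitmaps are hypothetical and chosen only to show how sketches propagate upward; a real B-tree node would also carry fan-out and paging details.

```python
from dataclasses import dataclass, field
from functools import reduce

@dataclass
class Entry:
    time: int                 # first timestamp covered by this entry
    sketch: int = 0           # bitmap stored with the entry
    children: list = field(default_factory=list)  # empty for leaf entries

def make_parent(children):
    """Non-leaf entry: starts at its first child's time; sketch = OR of children."""
    combined = reduce(lambda acc, c: acc | c.sketch, children, 0)
    return Entry(time=children[0].time, sketch=combined, children=children)

# Hypothetical <time, sketch> leaf entries for one region.
leaves = [Entry(1, 0b00011), Entry(2, 0b00001), Entry(3, 0b00101), Entry(4, 0b00001)]
left = make_parent(leaves[:2])
right = make_parent(leaves[2:])
root = make_parent([left, right])
print(format(root.sketch, "05b"))
```

Since each parent already summarizes its whole sub-tree, a query whose interval covers an entry completely can use the parent's sketch without descending.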

Page 14

Query Processing

• Similar to the query processing technique of the aRB-tree.

Basic idea: the spatial and temporal search conditions are applied alternately, and the result sketch is updated incrementally.

• Can be improved by applying pruning techniques.

Heuristic 1: Let RS be the current result sketch, and e a non-leaf B-tree entry whose associated sketch is se. Then the sub-tree of e can be pruned if (se OR RS) = RS.

Heuristic 2: Given a set of entries that cannot be pruned by Heuristic 1, we visit their child nodes in descending order of the number of 1's in their sketches.

And more heuristics!
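The two heuristics can be sketched on plain bitmaps. This is a simplification: "visiting" an entry is modeled as ORing its sketch into RS, whereas the real index would descend into child nodes. The function names and the example bitmaps are hypothetical.

```python
def popcount(x):
    """Number of 1 bits in a bitmap."""
    return bin(x).count("1")

def can_prune(se, rs):
    """Heuristic 1: e adds nothing if every bit of se is already set in RS."""
    return (se | rs) == rs

def visit_order(entry_sketches, rs):
    """Heuristic 2: drop prunable entries, then order by number of 1's, descending."""
    survivors = [se for se in entry_sketches if not can_prune(se, rs)]
    return sorted(survivors, key=popcount, reverse=True)

def process(entry_sketches, rs=0):
    """Fold entries into RS, re-checking Heuristic 1 before each visit."""
    visited = 0
    for se in visit_order(entry_sketches, rs):
        if can_prune(se, rs):
            continue  # became redundant after earlier visits
        rs |= se
        visited += 1
    return rs, visited

entries = [0b00001, 0b00111, 0b00110, 0b01000]
result, visited = process(entries, rs=0b00001)
print(format(result, "05b"), visited)
```

Visiting the densest sketches first fills RS quickly, so later entries are more likely to satisfy the pruning test. In this example only two of the four entries end up being visited.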

Page 15

Query Processing – Supporting Distinct Sum Query

Extending FM sketches

• FM sketches can already handle sums: to insert a value of 500, perform 500 distinct item insertions.

• Our observation: we can simulate a large number of insertions into an FM sketch more efficiently.
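The naive route described in the first bullet can be sketched as follows. This shows only the straightforward "one insertion per unit of value" approach, not the authors' more efficient simulation, which the transcript does not detail. The sub-item naming scheme (`obj_id#j`), the SHA-256 stand-in hash, and the function names are assumptions.

```python
import hashlib

K = 32  # bits per bitmap (an assumed size)

def item_bit(key):
    """Bitmap with the single bit chosen for key (0 if no position hits)."""
    for i in range(K):
        if hashlib.sha256(f"{key}:{i}".encode()).digest()[0] & 1:
            return 1 << i
    return 0

def insert_value(sketch, obj_id, value):
    """Naive distinct-sum insertion: `value` distinct sub-items obj_id#1..obj_id#value."""
    for j in range(1, value + 1):
        sketch |= item_bit(f"{obj_id}#{j}")
    return sketch

s = insert_value(0, "bus7", 500)
s_again = insert_value(s, "bus7", 500)  # the same object reported again
print(s == s_again)
```

Because the sub-items are derived from the object id, reporting the same object and value again sets no new bits, so the estimated sum counts each object's measure once. The cost is linear in the value, which is the inefficiency the second bullet addresses.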

Page 16

Performance

• Dataset settings

– Number of cities = 10,000

– Number of buses = 100,000

– History length = 1,00 timestamps

– Number of passengers for each bus = [200,300]

– At each timestamp, each bus reports to its nearest city with a tuple <time t, city c, bus b, passenger # a>

• Each query has 2 parameters: the spatial extent and the interval length

• A count query retrieves the number of distinct buses that report to cities in qr during qt, while a sum query returns the sum of these buses’ passengers

• Compare the sketch-index to the relational approach: index the 4-tuple table <t,c,b,a> using a B-tree on the time t column
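The relational baseline can be sketched as a scan over the time range followed by de-duplication. The rows below are hypothetical, a sorted list with `bisect` stands in for the B-tree on the time column, and the sketch assumes each bus carries a fixed passenger number.

```python
import bisect

# Hypothetical rows of the 4-tuple table <t, c, b, a>, kept sorted by time t.
rows = [
    (1, "c1", "b1", 250),
    (2, "c1", "b1", 250),
    (2, "c2", "b2", 300),
    (3, "c3", "b3", 200),
    (5, "c1", "b4", 220),
]
times = [r[0] for r in rows]  # stand-in for the B-tree on the time column

def exact_distinct(cities_in_qr, t_start, t_end):
    """Range-scan rows with t in qt, keep cities in qr, de-duplicate by bus id."""
    lo = bisect.bisect_left(times, t_start)
    hi = bisect.bisect_right(times, t_end)
    passengers = {}
    for _, c, b, a in rows[lo:hi]:
        if c in cities_in_qr:
            passengers[b] = a  # assumes a fixed passenger number per bus
    return len(passengers), sum(passengers.values())

print(exact_distinct({"c1", "c2"}, 1, 3))
```

The answer is exact, but every tuple in the time range has to be read, which is why the cost of this approach grows with the query interval.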

Page 17

Results (Space Consumption)

[Figure: index size in megabytes (y-axis, 0 to 160) vs. number of bitmaps per sketch (x-axis: 8, 16, 32), shown against the database size.]

The size of the sketch index could be further reduced by applying simple compression techniques.

Page 18

Results (Sketch Pruning in Query)

[Figure (a): Cost vs. qrlen (qtlen=10). y-axis: number of disk accesses (0 to 900); x-axis: query rectangle length (0.05 to 0.25); series: sketch-pruning, naive, relational.]

Page 19

Results (Sketch Pruning in Query)

[Figure (b): Cost vs. qtlen (qrlen=0.15). y-axis: number of disk accesses (0 to 600); x-axis: query interval length (1 to 20); series: sketch-pruning, naive, relational.]

Page 20

Results (Accuracy of Approximate Results)

[Figure (a): Error vs. qrlen (qtlen=10, count). y-axis: relative error (0% to 35%); x-axis: query rectangle length (0.05 to 0.25); series: 32-bitmap, 16-bitmap, 8-bitmap.]

Page 21

Results (Accuracy of Approximate Results)

[Figure (b): Error vs. qrlen (qtlen=10, sum). y-axis: relative error (0% to 25%); x-axis: query rectangle length (0.05 to 0.25); series: 32-bitmap, 16-bitmap, 8-bitmap.]

Page 22

Results (Costs of Indexes)

[Figure (a): Cost vs. qrlen (qtlen=10). y-axis: number of disk accesses (0 to 400); x-axis: query rectangle length (0.05 to 0.25); series: 32-bitmap, 16-bitmap, 8-bitmap.]

Page 23

Results (Costs of Indexes)

[Figure (b): Cost vs. qtlen (qrlen=0.15). y-axis: number of disk accesses (0 to 350); x-axis: query interval length (1 to 20); series: 32-bitmap, 16-bitmap, 8-bitmap.]

Page 24

Extensions

• Approximating general moving data

Problem: each object o reports its location <x,y> at each timestamp t, so the size of the database grows continuously, on the order of n∙T.

• Solution: impose a res × res regular grid over the data space and apply the sketch index, treating the grid cells as the finest aggregate granularity. The space is O(res²∙T∙log n), or O(T∙log n) when res is a constant.

[Figure: multi-level grid (Level 0, Level 1, ..., Level L) with B-trees attached to the cells.]
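Mapping a reported location to a grid cell can be sketched as below. The unit-square data space, the row-major cell numbering, and the function name `grid_cell` are assumptions made for illustration.

```python
def grid_cell(x, y, res, extent=1.0):
    """Map a location in [0, extent] x [0, extent] to a cell id in a res x res grid."""
    col = min(int(x / extent * res), res - 1)  # clamp the upper boundary
    row = min(int(y / extent * res), res - 1)
    return row * res + col

# Hypothetical reports <object, x, y> at one timestamp, grouped by cell.
reports = [("o1", 0.30, 0.60), ("o2", 0.32, 0.61), ("o3", 0.99, 0.00)]
cells = {}
for obj, x, y in reports:
    cells.setdefault(grid_cell(x, y, res=4), set()).add(obj)
print(cells)
```

Each cell then plays the role of a region: its object ids at a timestamp would be fed into that cell's sketch, so the index size depends on res and T rather than on the number of objects' reported positions.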

Page 25

Conclusion

• We propose a sketch index that integrates traditional approximate counting techniques with spatio-temporal indexes for efficient distinct aggregate query processing in spatio-temporal databases.

• The sketch index consumes less space than a conventional database and answers queries an order of magnitude faster, at the cost of a small aggregate error.

• Extensions and Future work

– Other possible sketches

– More sophisticated algorithms for mining association rules