OLAP (Online Analytical Processing): Dynamic Data Cubes

Excerpt from Presentation by S. Geffner

USC - CSCI585 - Spring 2008 - Farnoush Banaei-Kashani - 4/9/2008

Content

• Introduction to Multidimensional Databases (from A.R. 20 and 21)
• Focus Application: OLAP
• Prefix-Sum Data Cube (from A.R. 16)
• Dynamic Data Cube (from A.R. 17)
• Iterative Data Cube (from A.R. 18)
• Wavelet-based approaches
    • Compact Data Cube (from A.R. 19)
    • ProPolyne (from A.R. 22 and 23)

The Dynamic Data Cube

S. Geffner, D. Agrawal, and A. El Abbadi

Outline

• Problem Description
• Dynamic Data Cube
• Improving Update
• Conclusion

Problem Description: with original array A

Index   0   1   2   3   4   5   6   7
0       3   5   1   2   2   4   6   3
1       7   3   2   6   8   7   1   2
2       2   4   2   3   3   3   4   5
3       3   2   1   5   3   5   2   8
4       4   2   1   3   3   4   7   1
5       2   3   3   6   1   8   5   2
6       4   5   2   7   1   9   3   3
7       2   4   2   2   3   1   9   1

Size = N^2

Problem Description: with original array A

• Complexity:
    • Arbitrary range queries: O(n^d)
    • Update: O(1)

where:
d: number of dimensions
n: size in each dimension
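The two costs above can be illustrated with a minimal sketch, assuming the 8 x 8 array A from the slides is held as a plain list of lists. The function names `range_sum` and `update` are chosen here for illustration and do not come from the presentation.

```python
# Array A from the slides (rows and columns indexed 0..7).
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def range_sum(a, r1, c1, r2, c2):
    """Sum a[r1..r2][c1..c2] (inclusive) by visiting every cell in the range."""
    return sum(a[r][c] for r in range(r1, r2 + 1) for c in range(c1, c2 + 1))


def update(a, r, c, value):
    """Overwrite one cell; no other stored value depends on it, so this is O(1)."""
    a[r][c] = value


print(range_sum(A, 0, 0, 5, 6))  # 151
```

A query over the whole array visits all n^d cells, which is the O(n^d) worst case; an update touches exactly one cell.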

Problem Description: with prefix-sum array P

Original array A: same values as on the slide "Problem Description: with original array A".

Precomputed array P (each cell holds the sum of all cells of A with row index and column index less than or equal to its own):

Index     0     1     2     3     4     5     6     7
0         3     8     9    11    13    17    23    26
1        10    18    21    29    39    50    57    62
2        12    24    29    40    53    67    78    88
3        15    29    35    51    67    86    99   117
4        19    35    42    61    80   103   123   142
5        21    40    50    75    95   126   151   172
6        25    49    61    93   114   154   182   206
7        27    55    69   103   127   168   205   230
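As a rough sketch of how P can be precomputed from A, the following accumulates running sums along each row and then down each column. The helper name `build_prefix` is an assumption made for this example.

```python
from itertools import accumulate

# Array A from the slides (rows and columns indexed 0..7).
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def build_prefix(a):
    """Return P with P[r][c] = sum of a[0..r][0..c]."""
    rows = [list(accumulate(row)) for row in a]            # running sums along each row
    cols = [list(accumulate(col)) for col in zip(*rows)]   # then down each column
    return [list(row) for row in zip(*cols)]               # transpose back to row order


P = build_prefix(A)
for row in P:
    print(row)
```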

Query with Prefix-sum?

[Figure: prefix-sum array P (Size = N^2), same values as on the slide "Problem Description: with prefix-sum array P".]

Update with Prefix-sum?

[Figure: prefix-sum array P (Size = N^2), with the updated cell marked *.]

Problem Description: with prefix-sum array P

• Complexity:
    • Arbitrary range queries: O(1)
    • Update in worst case: O(n^d)

where:
d: number of dimensions
n: size in each dimension
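The trade-off stated above can be seen in a short sketch: a range sum needs at most four lookups in P (inclusion-exclusion), while changing one cell of A forces every prefix sum at or beyond that cell to be rewritten. The names `build_prefix`, `range_sum` and `update_cell` are illustrative assumptions, not part of the presentation.

```python
from itertools import accumulate

# Array A from the slides (rows and columns indexed 0..7).
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def build_prefix(a):
    """Return P with P[r][c] = sum of a[0..r][0..c]."""
    rows = [list(accumulate(row)) for row in a]
    cols = [list(accumulate(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


def range_sum(p, r1, c1, r2, c2):
    """Sum of A[r1..r2][c1..c2] from at most four lookups in P."""
    def at(r, c):
        return p[r][c] if r >= 0 and c >= 0 else 0
    return at(r2, c2) - at(r1 - 1, c2) - at(r2, c1 - 1) + at(r1 - 1, c1 - 1)


def update_cell(p, r, c, delta):
    """Add delta to A[r][c]; every P[i][j] with i >= r and j >= c must change."""
    touched = 0
    for i in range(r, len(p)):
        for j in range(c, len(p[i])):
            p[i][j] += delta
            touched += 1
    return touched  # number of prefix sums rewritten


P = build_prefix(A)
print(range_sum(P, 2, 3, 5, 6))  # 65
print(update_cell(P, 0, 0, 1))   # 64: the whole array is rewritten
```

Updating cell (0, 0) is the worst case because it contributes to all n^d prefix sums.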

Solution: Dynamic Data Cube

• As with the prefix-sum method, DDC can also be applied to obtain COUNT (a special case of SUM) and AVERAGE (SUM/COUNT).
• Generalization: applicable to any binary operator "+" which has an inverse operator "-" such that a + b - b = a
• We focus on SUM
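To make the generalization concrete, here is a small one-dimensional sketch, assuming cumulative values are kept for an arbitrary operator and a range is recovered with its inverse. The helper names `prefix` and `range_agg` are hypothetical, and XOR is used only as an example of an operator that is its own inverse.

```python
from itertools import accumulate
import operator


def prefix(values, op):
    """Cumulative aggregate of values under the binary operator op."""
    return list(accumulate(values, op))


def range_agg(pre, lo, hi, inv):
    """Aggregate of values[lo..hi], recovered as pre[hi] 'minus' pre[lo - 1]."""
    return pre[hi] if lo == 0 else inv(pre[hi], pre[lo - 1])


row = [3, 5, 1, 2, 2, 4, 6, 3]  # first row of array A

sums = prefix(row, operator.add)
counts = prefix([1] * len(row), operator.add)   # COUNT is SUM over ones

total = range_agg(sums, 2, 5, operator.sub)     # 1 + 2 + 2 + 4 = 9
count = range_agg(counts, 2, 5, operator.sub)   # 4 cells
average = total / count                         # AVERAGE = SUM / COUNT = 2.25

xors = prefix(row, operator.xor)
xor_range = range_agg(xors, 2, 5, operator.xor)  # 1 ^ 2 ^ 2 ^ 4 = 5

print(total, count, average, xor_range)
```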

Basic Data Structure: Overlay Box

• Overlay Box
    • Step 1: A set of disjoint rectangles of equal size that completely partition the cells of array A
    • Step 2: Each box stores exactly k^d - (k-1)^d values, where
        k: the length of the overlay box in each dimension
        d: number of dimensions

Overlay Box: Step 1

[Figure: original array A, same values as on the slide "Problem Description: with original array A".]

Overlay Box: Step 2

n = 8, k = n/2 = 4, number of values in each overlay box = k^d - (k-1)^d = 7

Each k x k overlay box stores cumulative values only along its last row and last column ("." marks a cell where nothing is stored):

Index    0    1    2    3    4    5    6    7
0        .    .    .   11    .    .    .   15
1        .    .    .   29    .    .    .   33
2        .    .    .   40    .    .    .   48
3       15   29   35   51   16   35   48   66
4        .    .    .   10    .    .    .   15
5        .    .    .   24    .    .    .   31
6        .    .    .   42    .    .    .   47
7       12   26   34   52    8   30   54   61
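The stored values of one overlay box can be derived from A as in the sketch below. It assumes a two-dimensional box whose last row holds cumulative column sums (ending in the box subtotal) and whose last column holds cumulative row sums; the names `overlay_box` and `stored_values` are illustrative.

```python
from itertools import accumulate

# Array A from the slides (rows and columns indexed 0..7).
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def overlay_box(a, r0, c0, k):
    """Values stored by the k x k overlay box whose top-left cell is (r0, c0)."""
    col_totals = [sum(a[r0 + i][c0 + j] for i in range(k)) for j in range(k)]
    row_totals = [sum(a[r0 + i][c0 + j] for j in range(k)) for i in range(k)]
    bottom = list(accumulate(col_totals))      # k values; the last is the subtotal
    right = list(accumulate(row_totals))[:-1]  # k - 1 values; subtotal not repeated
    return bottom, right


def stored_values(k, d):
    """Number of values per overlay box: k^d - (k-1)^d."""
    return k ** d - (k - 1) ** d


bottom, right = overlay_box(A, 0, 0, 4)
print(bottom, right)        # [15, 29, 35, 51] [11, 29, 40]
print(stored_values(4, 2))  # 7
```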

Dynamic Data Cube

• A tree structure which recursively partitions original array A into overlay boxes
• Each overlay box will contain information regarding relative sums of the corresponding regions of A
• Organize overlay boxes into a tree to recursively partition array A into non-overlapping regions
• Each node forms children by dividing its range in each dimension in half

How to Construct DDC?

[Figure: empty 8 x 8 grid with row and column indices 0 to 7.]

How to Construct DDC? Level 2

n = 8, k = n/2, number of values in each overlay box = k^d - (k-1)^d

[Figure: empty 8 x 8 grid with row and column indices 0 to 7.]

How to Construct DDC? Level 2

n = 8, k = n/2, number of values in each overlay box = k^d - (k-1)^d

[Figure: the four level-2 overlay boxes, same values as on the slide "Overlay Box: Step 2".]

How to Construct DDC? Level 1

The level-2 overlay box covering rows 0 to 3 and columns 0 to 3 (k = n/2 = 4):

Index    0    1    2    3
0        .    .    .   11
1        .    .    .   29
2        .    .    .   40
3       15   29   35   51

The same region partitioned into level-1 overlay boxes (k = n/4 = 2):

Index    0    1    2    3
0        .    8    .    3
1       10   18    3   11
2        .    6    .    5
3        5   11    3   11

Number of values in each overlay box = k^d - (k-1)^d = 3

How to Construct DDC? Level 0

Level-1 overlay boxes for rows 0 to 3 and columns 0 to 3 (k = n/4 = 2):

Index    0    1    2    3
0        .    8    .    3
1       10   18    3   11
2        .    6    .    5
3        5   11    3   11

The same region partitioned into level-0 overlay boxes (k = n/8 = 1); each box stores a single value, the corresponding cell of A:

Index    0    1    2    3
0        3    5    1    2
1        7    3    2    6
2        2    4    2    3
3        3    2    1    5

The DDC Hierarchy

Level 2 (root node), k = 4:

Index    0    1    2    3    4    5    6    7
0        .    .    .   11    .    .    .   15
1        .    .    .   29    .    .    .   33
2        .    .    .   40    .    .    .   48
3       15   29   35   51   16   35   48   66
4        .    .    .   10    .    .    .   15
5        .    .    .   24    .    .    .   31
6        .    .    .   42    .    .    .   47
7       12   26   34   52    8   30   54   61

Level 1, k = 2 (four nodes, one per quadrant of the root):

Index    0    1    2    3    4    5    6    7
0        .    8    .    3    .    6    .    9
1       10   18    3   11   10   21    7   12
2        .    6    .    5    .    6    .    9
3        5   11    3   11    6   14    6   19
4        .    6    .    4    .    7    .    8
5        6   11    4   13    4   16   12   15
6        .    9    .    9    .   10    .    6
7        6   15    4   13    4   14   12   16

Level 0 (leaf nodes), k = 1: sixteen nodes of 2 x 2 cells each; every box stores a single value, the corresponding cell of A.

How to Query DDC?

[Figure: level-2 overlay boxes, same values as on the slide "Overlay Box: Step 2", with the target cell (row 5, column 6) marked *.]

How to Query DDC? (cont’d)

[Figure: level-2 overlay boxes with the target cell (row 5, column 6) marked *.]

How to Query DDC? (cont’d)

[Figure: Level 2 (root node) and Level 1 overlay boxes, with the target cell (row 5, column 6) marked *.]

51 + 48 + 24 + 16 + 12 = 151

At Level 2, the top-left box is fully covered and contributes its subtotal 51, the top-right box contributes 48 (columns 4 to 6), and the bottom-left box contributes 24 (rows 4 to 5). The target cell lies inside the bottom-right box, so the query descends to Level 1, where two boxes contribute 16 and 12.
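The query walk above can be sketched as a small recursive structure. This is a minimal two-dimensional sketch, assuming n is a power of two; the node layout (a tuple of k and a dictionary of boxes) and the names `box_values`, `build` and `prefix_sum` are choices made for this example, not the authors' implementation. For simplicity each box keeps its subtotal at the end of both lists.

```python
from itertools import accumulate

# Array A from the slides (rows and columns indexed 0..7).
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def box_values(a, r0, c0, k):
    """Cumulative column sums (last row) and row sums (last column) of one box."""
    cols = [sum(a[r0 + i][c0 + j] for i in range(k)) for j in range(k)]
    rows = [sum(a[r0 + i][c0 + j] for j in range(k)) for i in range(k)]
    return list(accumulate(cols)), list(accumulate(rows))


def build(a, r0, c0, size):
    """Node covering a size x size region, split into four boxes of side k."""
    k = size // 2
    boxes = {}
    for qr in (0, 1):
        for qc in (0, 1):
            br, bc = r0 + qr * k, c0 + qc * k
            bottom, right = box_values(a, br, bc, k)
            child = build(a, br, bc, k) if k > 1 else None
            boxes[qr, qc] = (bottom, right, child)
    return k, boxes


def prefix_sum(node, r, c):
    """Sum of the node's region from its origin to (r, c), relative coordinates."""
    k, boxes = node
    total = 0
    for (qr, qc), (bottom, right, child) in boxes.items():
        rr, cc = r - qr * k, c - qc * k
        if rr < 0 or cc < 0:
            continue                          # box lies outside the query region
        if rr >= k - 1:
            total += bottom[min(cc, k - 1)]   # all rows of the box are covered
        elif cc >= k - 1:
            total += right[rr]                # all columns of the box are covered
        else:
            total += prefix_sum(child, rr, cc)  # target is inside: descend one level
    return total


root = build(A, 0, 0, 8)
print(prefix_sum(root, 5, 6))  # 151 = 51 + 48 + 24 + 16 + 12
```

At most one box per level contains the target cell, so only one recursive call is made per level. An arbitrary range sum can then be assembled from four such prefix queries by inclusion-exclusion.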

How to Update DDC?

[Figure: the DDC hierarchy (Level 2 root node, Level 1, Level 0 leaf nodes) before the update. The cell at row 5, column 6 is to be changed from 5 to 6.]

How to Update DDC? (cont’d)

[Figure: the DDC hierarchy with the overlay box containing the updated cell (row 5, column 6) marked * at Level 2, Level 1 and Level 0. The new cell value is 6.]

How to Update DDC? (cont’d)

[Figure: the DDC hierarchy after the update.]

Values changed by the update:

• Level 2 (root node), box covering rows 4 to 7 and columns 4 to 7: last row becomes 8, 30, 55, 62 (was 8, 30, 54, 61); last column becomes 15, 32, 48 (was 15, 31, 47)
• Level 1, box covering rows 4 to 5 and columns 6 to 7: last row becomes 13, 16 (was 12, 15)
• Level 0 (leaf node): the cell at row 5, column 6 becomes 6 (was 5)
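The update walk can be sketched in the same way: one box per level contains the cell, and inside that box every cumulative value at or after the cell's position changes. This is a minimal sketch under the same assumptions as a simple two-dimensional tree with n a power of two; the node layout and the names `box_values`, `build` and `update` are illustrative, and each box repeats its subtotal at the end of both lists.

```python
from itertools import accumulate

# Array A from the slides (rows and columns indexed 0..7).
A = [
    [3, 5, 1, 2, 2, 4, 6, 3],
    [7, 3, 2, 6, 8, 7, 1, 2],
    [2, 4, 2, 3, 3, 3, 4, 5],
    [3, 2, 1, 5, 3, 5, 2, 8],
    [4, 2, 1, 3, 3, 4, 7, 1],
    [2, 3, 3, 6, 1, 8, 5, 2],
    [4, 5, 2, 7, 1, 9, 3, 3],
    [2, 4, 2, 2, 3, 1, 9, 1],
]


def box_values(a, r0, c0, k):
    """Cumulative column sums (last row) and row sums (last column) of one box."""
    cols = [sum(a[r0 + i][c0 + j] for i in range(k)) for j in range(k)]
    rows = [sum(a[r0 + i][c0 + j] for j in range(k)) for i in range(k)]
    return list(accumulate(cols)), list(accumulate(rows))


def build(a, r0, c0, size):
    """Node covering a size x size region, split into four boxes of side k."""
    k = size // 2
    boxes = {}
    for qr in (0, 1):
        for qc in (0, 1):
            br, bc = r0 + qr * k, c0 + qc * k
            bottom, right = box_values(a, br, bc, k)
            child = build(a, br, bc, k) if k > 1 else None
            boxes[qr, qc] = (bottom, right, child)
    return k, boxes


def update(node, r, c, delta):
    """Add delta to the cell at (r, c), given relative to the node's origin."""
    k, boxes = node
    qr, qc = r // k, c // k                  # the one box holding the cell
    bottom, right, child = boxes[qr, qc]
    rr, cc = r - qr * k, c - qc * k
    for j in range(cc, k):                   # cumulative column sums from cc onward
        bottom[j] += delta
    for i in range(rr, k):                   # cumulative row sums from rr onward
        right[i] += delta
    if child is not None:
        update(child, rr, cc, delta)         # repeat one level down


root = build(A, 0, 0, 8)
update(root, 5, 6, +1)                       # cell (5, 6) changes from 5 to 6
bottom, right, child = root[1][1, 1]
print(bottom, right)                         # [8, 30, 55, 62] [15, 32, 48, 62]
```

The two inner loops are where the cost grows: a cell near the origin of its box forces almost every stored value of that box to be rewritten.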

Query and Update Complexity with DDC

• Query cost for arbitrary range queries: O(log n)
• Update cost in the worst case should be evaluated considering:
    1. O(log n) hierarchy traversal cost
        • Whichever cell you update, only one overlay box is updated at each tree level
    2. The cost of updating the values in the relevant overlay box at each level

The second cost might be high, depending on the placement of the updated cell. Hence, the total cost may become as bad as O(n).

Improving Update: Problem

• The high update cost of the overlay boxes is a consequence of dependencies between successive row sum values

[Figure: a sequence of row sum cells X1, X2, X3, X4, X5, X6, X7, X8.]

Improving Update: Solution

Storing row sum values in an array

Store row sum values separately in a Cumulative B Tree (BC Tree)

How to Construct BC Tree?

• Each leaf of the BC tree corresponds to one row-sum cell
• Interior nodes of the BC tree maintain subtree sums (STS)
• For each node entry, the STS stores the sum of the subtree from the left branch associated with leaf value

How to Construct BC Tree?

Overlay box row sum values (cumulative): 14, 23, 33, 45, 53, 66

Root node: Key 4 (STS: 33)
Left interior node: Key 2 (STS: 14), Key 3 (STS: 9)
Right interior node: Key 5 (STS: 12), Key 6 (STS: 8)

Leaf 1: value 14
Leaf 2: value 9 (= 23 - 14)
Leaf 3: value 10 (= 33 - 23)
Leaf 4: value 12
Leaf 5: value 8
Leaf 6: value 13

Query with BC Tree?

• Traverse the tree using the cell’s index as the key
• If descending to a node’s right branch, we sum each preceding STS in the node with the key less than or equal to the query key

[Figure: the BC tree over the row sum values 14, 23, 33, 45, 53, 66 (cells 1 to 6), with the path for query key 5 highlighted: Key 4 (STS: 33), then Key 5 (STS: 12), then Leaf 5 (value 8).]

Example for query key 5: 33 + 12 + 8 = 53. Key 6 is not added because it does not precede key 5.

Update with BC Tree?

• Reflect the change using a bottom-up method

[Figure: the BC tree over cells 1 to 6, with Leaf 3 changed from value 10 to value 15.]

Key 4 STS: 33 becomes 38

Row sum values 33, 45, 53, 66 become 38, 50, 58, 71

Key 3 is not updated because leaf 3 is not in its left subtree
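The query and update rules for the BC tree can be sketched on the six-leaf example from the slides. The `Node` class, the hard-coded two-level tree, and the functions `prefix` and `update` are assumptions made for this illustration; the sketch does not include node splitting or any other B-tree maintenance.

```python
class Node:
    def __init__(self, keys, sts, children):
        self.keys = keys          # separator keys in ascending order
        self.sts = sts            # sts[i]: sum of the subtree just left of keys[i]
        self.children = children  # len(keys) + 1 entries: Node objects or leaf values


# The example tree: leaves 1..6 hold 14, 9, 10, 12, 8, 13.
left = Node([2, 3], [14, 9], [14, 9, 10])
right = Node([5, 6], [12, 8], [12, 8, 13])
root = Node([4], [33], [left, right])


def prefix(node, key):
    """Cumulative row sum for cell `key`: add each STS whose key is <= the query key."""
    total = 0
    while isinstance(node, Node):
        i = 0
        while i < len(node.keys) and node.keys[i] <= key:
            total += node.sts[i]
            i += 1
        node = node.children[i]
    return total + node           # finally add the leaf value itself


def update(node, key, new_value):
    """Set leaf `key` to new_value; STS values are adjusted on the way back up."""
    i = sum(1 for k in node.keys if k <= key)   # branch that leads to the leaf
    child = node.children[i]
    if isinstance(child, Node):
        delta = update(child, key, new_value)
    else:
        delta = new_value - child
        node.children[i] = new_value
    if i < len(node.keys):        # the leaf lies in the left subtree of keys[i]
        node.sts[i] += delta
    return delta


print(prefix(root, 5))                         # 53 = 33 + 12 + 8
update(root, 3, 15)                            # leaf 3: 10 -> 15
print([prefix(root, k) for k in range(1, 7)])  # [14, 23, 38, 50, 58, 71]
```

Only the STS entries on the path from the leaf to the root can change, which is why the update avoids rewriting every later cumulative value.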

Query and Update Complexity with Improved DDC

• A two-dimensional overlay box has two groups of row sum values, each of which is one-dimensional
• In general, an overlay box of d dimensions has d groups of row sum values, and each group is (d-1)-dimensional
• Hence, both query and update complexity are O(log^d n)

Conclusion

Method            Query         Update
Naïve approach    O(n^d)        O(1)
Prefix sum        O(1)          O(n^d)
DDC               O(log^d n)    O(log^d n)