Iceberg query evaluation using bitmap index
Iceberg Query Evaluation Using Bitmap Index
Name: Om Pawar
Roll No.: 3253
Guide: Prof. A. Phakatkar
1) Introduction to Iceberg Query
2) Bitmap Index
3) Bitmap Index Example
4) Dynamic Pruning
5) Vector Alignment
6) Experimental Evaluation
7) Conclusion
8) References
Index
What is an iceberg query? An iceberg query is a special class of aggregation query that returns only the groups whose aggregate value is at or above a given threshold.
The general form of an iceberg query on relation R(C1, C2, ..., Cn) is:
SELECT Ci, Cj, ..., Cm, AGG(*) FROM R
GROUP BY Ci, Cj, ..., Cm
HAVING AGG(*) >= T
Introduction to Iceberg Query
Example: SELECT Product, State, COUNT(*) FROM Sales
GROUP BY Product, State HAVING COUNT(*) >= 100000
In this example, rows are grouped by product and state and aggregated with the COUNT function. Only (product, state) groups whose counts are at least 100,000 are included in the result set.
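As a rough illustration of the query's semantics (not of how a database engine evaluates it), the following Python sketch groups a small made-up list of (product, state) rows and keeps only the groups whose count reaches the threshold. The sample rows, the function name `iceberg_count`, and the threshold of 3 are assumptions chosen for this example.

```python
from collections import Counter

def iceberg_count(rows, threshold):
    """Return {group: count} for groups whose COUNT(*) >= threshold."""
    counts = Counter(rows)  # GROUP BY every column of each row tuple
    return {group: n for group, n in counts.items() if n >= threshold}

# Hypothetical Sales rows: (Product, State)
sales = [
    ("pen", "NY"), ("pen", "NY"), ("pen", "NY"),
    ("ink", "CA"), ("ink", "CA"),
    ("pen", "CA"),
]

print(iceberg_count(sales, threshold=3))  # {('pen', 'NY'): 3}
```

Most groups fall below the threshold and are discarded, which is why only the "tip of the iceberg" appears in the result.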
What is a bitmap index? A bitmap index is a special type of index structure used by most high-end database management systems to optimize search and retrieval for low-cardinality data such as gender (M, F).
It consists of a collection of bitmap vectors, each created to represent a distinct value.
Each distinct value of a column is encoded using a number of bits, each of which is stored in a bitmap vector.
Bitmap Index
Example of Bitmap index
Fig: An example of a bitmap index
Column: Gender {Male, Female}
Bitmap vectors: {B_male, B_female}
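A minimal sketch of how such an index could be built, assuming the column is held as a Python list and each bitmap vector is a list of 0/1 values with one bit per row. The function name `build_bitmap_index` and the sample rows are hypothetical.

```python
def build_bitmap_index(column):
    """Map each distinct value to a bitmap vector with one bit per row."""
    index = {}
    for row, value in enumerate(column):
        # Create an all-zero vector the first time a value is seen
        vector = index.setdefault(value, [0] * len(column))
        vector[row] = 1
    return index

gender = ["Male", "Female", "Female", "Male", "Female"]  # hypothetical rows
index = build_bitmap_index(gender)
print(index["Male"])    # [1, 0, 0, 1, 0]
print(index["Female"])  # [0, 1, 1, 0, 1]
```

Each row has a 1 in exactly one vector, so the vectors of a single column never overlap.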
1) Bitwise operations save computation time.
2) Well suited to low-cardinality columns (few distinct values).
3) Null values can be included in the index.
4) Compression reduces the storage size and improves performance.
Advantages of bitmap index
A bitmap for an attribute (column) of a table can be viewed as a v × r matrix, where v is the number of distinct values of the column and r is the number of tuples (rows) in the table.
An uncompressed bitmap can be much larger than the original data, so compression is typically used to reduce the storage size and improve performance.
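To give a feel for the sizes involved, the sketch below compares the v × r bits of an uncompressed bitmap with a column stored as fixed-width codes. The fixed-width encoding of the original data, the function names, and the sample figures (1,000 distinct values, 1 million rows) are assumptions made only for this comparison.

```python
def uncompressed_bitmap_bits(v, r):
    """One bit per (distinct value, row) pair: a v x r matrix."""
    return v * r

def packed_column_bits(v, r):
    """Assumed baseline: each row stores a code just wide enough for v values."""
    bits_per_value = max(1, (v - 1).bit_length())
    return r * bits_per_value

v, r = 1000, 1_000_000
bitmap = uncompressed_bitmap_bits(v, r)
packed = packed_column_bits(v, r)
print(bitmap, packed, bitmap // packed)  # 1000000000 10000000 100
```

Under these assumptions the uncompressed bitmap is 100 times larger than the packed column, and the gap widens as the number of distinct values grows.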
Bitmap index and its compression
Algorithm for iceberg processing
One way to process an iceberg query on two attributes A and B using bitmap indices is to conduct pairwise bitwise-AND operations between each vector of A and each vector of B.
With dynamic pruning, a single bitwise-AND operation between vectors X and Y carries out the following three actions:
1. Z = X AND Y
2. X = X XOR Z
3. Y = Y XOR Z
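The three actions can be sketched in Python by treating an integer as a bitmap vector. This is only an illustration: the function name `pruning_and` and the 6-bit sample vectors are assumptions, not part of the original algorithm listing.

```python
def pruning_and(x, y):
    """Return (z, x', y') where z = x AND y and z's 1-bits are cleared from x and y."""
    z = x & y          # 1. Z = X AND Y
    return z, x ^ z, y ^ z  # 2. X = X XOR Z, 3. Y = Y XOR Z

x = 0b101101
y = 0b100111
z, x, y = pruning_and(x, y)
print(format(z, "06b"), format(x, "06b"), format(y, "06b"))
# 100101 001000 000010
```

After the operation, X and Y keep only the 1-bits that have not yet been matched. Once the number of remaining 1-bits in a vector drops below the threshold T, that vector cannot contribute to any further result and can be pruned.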
Dynamic Pruning
Many bitwise-AND operations produce an empty result.
Performance is slower.
Queries take more time to evaluate.
Disadvantages of Dynamic Pruning
First 1-bit position: the position of the first 1-bit in a bitmap vector.
Vector alignment: two bitmap vectors are aligned if their first 1-bit positions are the same.
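Both definitions can be checked with a few lines of Python. In this sketch a bitmap vector is assumed to be a string of '0' and '1' characters, positions are counted from 0, and the helper names `first_1bit_position` and `are_aligned` are hypothetical.

```python
def first_1bit_position(vector, start=0):
    """Index of the first '1' at or after `start`, or None if there is none."""
    position = vector.find("1", start)
    return None if position == -1 else position

def are_aligned(x, y):
    """Two vectors are aligned when their first 1-bit positions are equal."""
    px, py = first_1bit_position(x), first_1bit_position(y)
    return px is not None and px == py

a1 = "010010001000"
b3 = "010010001000"
b1 = "001001010011"
print(first_1bit_position(a1))  # 1
print(are_aligned(a1, b3))      # True
print(are_aligned(a1, b1))      # False
```

Aligned vectors share at least one row, so their bitwise AND is guaranteed to be non-empty.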
Vector Alignment
icebergPQ(attribute A, attribute B, threshold T)
Output: iceberg results
PQA.clear, PQB.clear
for each vector a of attribute A do
    a.count = BIT1_COUNT(a)
    if a.count >= T then
        a.next1 = first1BitPosition(a, 0)
        PQA.push(a)
for each vector b of attribute B do
    b.count = BIT1_COUNT(b)
    if b.count >= T then
        b.next1 = first1BitPosition(b, 0)
        PQB.push(b)
R = ∅
Algorithm: Iceberg Processing with Vector Alignment and Dynamic Pruning
a, b = nextAlignedVectors(PQA, PQB, T)
while a ≠ null and b ≠ null do
    PQA.pop
    PQB.pop
    r = BITWISE_AND(a, b)
    if r.count >= T then
        Add iceberg result (a.value, b.value, r.count) into R
    a.count = a.count - r.count
    b.count = b.count - r.count
    if a.count >= T then
        a.next1 = first1BitPosition(a, a.next1 + 1)
        if a.next1 ≠ null then
            PQA.push(a)
    if b.count >= T then
        b.next1 = first1BitPosition(b, b.next1 + 1)
        if b.next1 ≠ null then
            PQB.push(b)
    a, b = nextAlignedVectors(PQA, PQB, T)
return R
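The pseudocode can be turned into a small runnable Python sketch. Several details here are assumptions rather than part of the listing: bitmap vectors are Python integers (bit i stands for row i, counted from 0), the priority queues are `heapq` heaps ordered by next1, and `next_aligned` is one plausible reading of nextAlignedVectors, which the slides do not define. It advances whichever queue head lags behind until both heads share the same first 1-bit position.

```python
import heapq

def popcount(bits):
    return bin(bits).count("1")

def first_1bit_position(bits, start):
    """Position of the lowest set bit at index >= start, or None."""
    shifted = bits >> start
    if shifted == 0:
        return None
    return start + (shifted & -shifted).bit_length() - 1

def build_index(column):
    """Bitmap index as {value: int}; bit i is set when row i holds the value."""
    index = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index

def build_queue(index, threshold):
    """Heap of [next1, value, bits, count], keeping vectors with count >= threshold."""
    heap = []
    for value, bits in index.items():
        count = popcount(bits)
        if count >= threshold:
            heapq.heappush(heap, [first_1bit_position(bits, 0), value, bits, count])
    return heap

def next_aligned(pqa, pqb):
    """Assumed behaviour: skip the lagging head forward until both heads align."""
    while pqa and pqb:
        a, b = pqa[0], pqb[0]
        if a[0] == b[0]:
            return True
        lagging, target = (pqa, b[0]) if a[0] < b[0] else (pqb, a[0])
        entry = heapq.heappop(lagging)
        entry[0] = first_1bit_position(entry[2], target)
        if entry[0] is not None:
            heapq.heappush(lagging, entry)
    return False

def iceberg_pq(index_a, index_b, threshold):
    pqa = build_queue(index_a, threshold)
    pqb = build_queue(index_b, threshold)
    results = []
    while next_aligned(pqa, pqb):
        a, b = heapq.heappop(pqa), heapq.heappop(pqb)
        r = a[2] & b[2]
        r_count = popcount(r)
        if r_count >= threshold:
            results.append((a[1], b[1], r_count))
        for entry, queue in ((a, pqa), (b, pqb)):
            entry[2] ^= r          # dynamic pruning: clear the matched bits
            entry[3] -= r_count
            if entry[3] >= threshold:
                entry[0] = first_1bit_position(entry[2], entry[0] + 1)
                if entry[0] is not None:
                    heapq.heappush(queue, entry)
    return results

# Columns A and B of a 12-row table; COUNT(*) > 2 is written as threshold 3
col_a = ["A2", "A1", "A2", "A2", "A1", "A2", "A2", "A2", "A1", "A2", "A3", "A3"]
col_b = ["B2", "B3", "B1", "B2", "B3", "B1", "B2", "B1", "B3", "B2", "B1", "B1"]
print(iceberg_pq(build_index(col_a), build_index(col_b), 3))
# [('A2', 'B2', 4), ('A1', 'B3', 3), ('A2', 'B1', 3)]
```

On this data only three bitwise-AND operations are performed, one per result group, and none of them is empty.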
Table R          Bitmap index for A    Bitmap index for B
A   B   C        A1  A2  A3            B1  B2  B3
A2  B2  1.23     0   1   0             0   1   0
A1  B3  2.34     1   0   0             0   0   1
A2  B1  5.56     0   1   0             1   0   0
A2  B2  8.36     0   1   0             0   1   0
A1  B3  3.27     1   0   0             0   0   1
A2  B1  9.45     0   1   0             1   0   0
A2  B2  6.23     0   1   0             0   1   0
A2  B1  1.98     0   1   0             1   0   0
A1  B3  8.23     1   0   0             0   0   1
A2  B2  0.11     0   1   0             0   1   0
A3  B1  3.44     0   0   1             1   0   0
A3  B1  2.08     0   0   1             1   0   0
Fig. Table R (left) and bitmap indices for A and B (right)
SELECT A, B, COUNT(*) FROM R
GROUP BY A, B
HAVING COUNT(*) > 2
Initial bitmap vectors
Priority Queue 1      Priority Queue 2
A2 1011 0111 0100     B2 1001 0010 0100
A1 0100 1000 1000     B3 0100 1000 1000
A3 0000 0000 0011     B1 0010 0101 0011
The number of 1s in A3 is not larger than 2.
Example of Vector Alignment
Bitmap vectors after first alignment
Priority Queue 1      Priority Queue 2
A1 0100 1000 1000     B3 0100 1000 1000
A2 0010 0101 0000     B1 0010 0101 0011
B2 is removed.
Bitmap vectors after second alignment
Priority Queue 1      Priority Queue 2
A2 0010 0101 0000     B1 0010 0101 0011
The experimental evaluation is based on:
1) Data size (number of tuples)
2) Time
3) Number of distinct values
4) Number of bitwise-AND operations
Experimental evaluation
Performance of Dynamic Pruning and Vector Alignment
Fig 5: Performance of icebergDP and icebergPQ (x-axis: number of tuples in millions; y-axis: time in seconds)
icebergPQ:
1) It is faster than icebergDP.
2) It needs only 0.404 seconds to finish processing 1 million tuples.
3) It scales well as the data size increases.
icebergDP:
1) It is slower than icebergPQ.
2) It needs 10.688 seconds to finish processing 1 million tuples.
3) Its performance is unacceptable for practical data sizes.
1. Data Warehousing
2. Information Retrieval
3. Market Analysis
4. Data Mining
Applications Of Iceberg Query
Investigating the processing of iceberg queries that lack the anti-monotone property.
Finding the optimal order in which to process attributes (when there are three or more aggregation attributes) to gain better efficiency.
Handling data so large that the bitmap of a single column does not fit in main memory.
Future Scope
1. B. He, H.-I. Hsiao, Z. Liu, Y. Huang, and Y. Chen, “Iceberg Query Evaluation Using Bitmap Index,” 2012.
2. F. Deliège and T.B. Pedersen, “Position List Word Aligned Hybrid: Optimizing Space and Performance for Compressed Bitmaps,” Proc. Int’l Conf. Extending Database Technology (EDBT), pp. 228-239, 2010.
3. A. Ferro, R. Giugno, P.L. Puglisi, and A. Pulvirenti, “BitCube: A Bottom-Up Cubing Engineering,” Proc. Int’l Conf. Data Warehousing and Knowledge Discovery (DaWaK), pp. 189-203, 2009.
4. M. Fang, N. Shivakumar, H. Garcia-Molina, R. Motwani, and J.D. Ullman, “Computing Iceberg Queries Efficiently,” Proc. Int’l Conf. Very Large Data Bases (VLDB), pp. 299-310, 1998.
5. K. Wu, E.J. Otoo, and A. Shoshani, “Optimizing Bitmap Indices with Efficient Compression,” ACM Trans. Database Systems, vol. 31, no. 1, pp. 1-38, 2006.
References used
THANK YOU!!!