Data Compression for Multidimensional Data Warehouses
Data Compression for Large Multidimensional Data Warehouses
Supervisor: Dr. K.M. Azharul Hasan,
Associate Professor,
Head of the Department,
Department of CSE, KUET
Presented by: Abdullah Al Mahmud,
Roll : 0507006
Md. Mushfiqur Rahman,
Roll : 0507029
These slides were prepared by Muhammad Mushfiqur Rahman & Abdullah Al Mahmud for the thesis presentation
Presentation Layout
Objectives
Existing Compression Schemes
Traditional Extendible Array
Proposed Compression Scheme
EXCS (Extendible Array Based Compression Scheme)
Comparative Analysis
Conclusion
Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
Objectives
Data compression technology:
reduces the effective price of logical data storage capacity
improves query performance
Multidimensional arrays are widely used in a large number of scientific research applications.
Efficient compression of multidimensional arrays makes it possible to handle the large multidimensional data sets of data warehouses.
Existing Compression Schemes (1/ 3)
Bitmap compression
Run Length Encoding
Header compression
Compressed Column Storage
Compressed Row Storage
Existing Compression Schemes (2/ 3)
(a) A sparse array. (b) The CRS scheme
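A minimal Python sketch of the CRS idea named in the caption, assuming the usual three-vector layout: VL holds the non-zero values row by row, CO holds the column index of each value, and RO holds the position in VL where each row starts. The function names `crs_compress` and `crs_get`, the 0-based indexing, and the sample array are illustrative assumptions, not taken from the slides.

```python
def crs_compress(dense):
    """Compress a 2-D list into CRS vectors (RO, CO, VL), 0-based."""
    ro, co, vl = [0], [], []
    for row in dense:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)      # column of this non-zero value
                vl.append(value)  # the value itself
        ro.append(len(vl))        # where the next row starts inside VL
    return ro, co, vl


def crs_get(ro, co, vl, i, j):
    """Return element (i, j); zero if it is not stored."""
    for k in range(ro[i], ro[i + 1]):
        if co[k] == j:
            return vl[k]
    return 0


# Hypothetical sample array (not the one in the figure).
sparse = [
    [0, 5, 0, 0],
    [0, 0, 0, 0],
    [7, 0, 0, 2],
]
ro, co, vl = crs_compress(sparse)
print(ro, co, vl)  # [0, 1, 1, 3] [1, 0, 3] [5, 7, 2]
```

Only the three non-zero values are stored, at the cost of one RO entry per row plus one CO entry per value.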
Existing Compression Schemes (3/ 3)
Classical methods cannot support updates without completely readjusting runs.
They are designed for compressing sparse arrays
They do not support extendibility
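The update problem can be illustrated with run-length encoding as one example of a classical scheme. This is a sketch under assumptions: the helper names and the sample data are hypothetical, and the decode, modify, re-encode path is only the simplest correct way to apply an update.

```python
from itertools import groupby


def rle_encode(values):
    """Encode a sequence as a list of (value, run_length) pairs."""
    return [(v, len(list(group))) for v, group in groupby(values)]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into a flat list."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out


data = [0, 0, 0, 0, 0, 0, 9, 9]
runs = rle_encode(data)          # [(0, 6), (9, 2)]

# Update a single element: decode, modify, re-encode.
updated = rle_decode(runs)
updated[2] = 4
new_runs = rle_encode(updated)   # [(0, 2), (4, 1), (0, 3), (9, 2)]
print(runs, new_runs)
```

Changing one element splits one run into three, so every run stored after it moves in the compressed stream.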
Traditional Extendible Array
TEA supports dynamic extension of dimension size.
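Figure 1: TEA Construction And Access
(3 x 4 extendible array; History Counter runs over 0 1 2 3 4 5)
Cell addresses:
row 0: 0 1 4 9
row 1: 2 3 5 10
row 2: 6 7 8 11
Dimension 1: History Table = 0 2 4, Address Table = 0 2 6
Dimension 2: History Table = 0 1 3 5, Address Table = 0 1 4 9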
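Example: Position <1,3>
H1[1] = 2 < H2[3] = 5, so the cell lies in the sub-array added when dimension 2 was extended to subscript 3
Address of Cell = Address2[3] + 1 = 9 + 1 = 10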
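The access rule in the example can be sketched in Python for the two-dimensional case. This is a hedged illustration, not the authors' implementation: the class name, the method names, and the extension order are assumptions chosen so that the tables match Figure 1, and the code uses 0-based dimension indices (index 0 is dimension 1 of the slides, index 1 is dimension 2).

```python
class ExtendibleArray2D:
    """Two-dimensional extendible array addressed via history and address tables."""

    def __init__(self):
        # history[d][k]: history counter value when subscript k of dimension d was added
        self.history = [[0], [0]]
        # address[d][k]: first address of the sub-array added for that subscript
        self.address = [[0], [0]]
        self.counter = 0   # history counter
        self.size = 1      # number of cells allocated so far

    def extend(self, dim):
        """Grow dimension `dim` (0 or 1) by one by appending a new sub-array."""
        other = 1 - dim
        self.counter += 1
        self.history[dim].append(self.counter)
        self.address[dim].append(self.size)
        self.size += len(self.history[other])  # sub-array spans the other dimension

    def cell_address(self, i, j):
        """Map position <i, j> to its linear address."""
        if self.history[0][i] > self.history[1][j]:
            return self.address[0][i] + j   # cell belongs to a dimension-1 sub-array
        return self.address[1][j] + i       # cell belongs to a dimension-2 sub-array


tea = ExtendibleArray2D()
for dim in (1, 0, 1, 0, 1):   # assumed extension order
    tea.extend(dim)
print(tea.cell_address(1, 3))  # 10
```

Existing cells never move when a dimension grows; only the two small tables gain an entry.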
Proposed Compression Scheme
Multidimensional arrays are important for sparse array operations
Multidimensional arrays need to be extendible
A compression technique is needed that can work on multidimensional extendible arrays
Our proposed compression scheme is EXCS (Extendible array based Compression Scheme)
Extendible array based Compression Scheme (EXCS) 1/3
We implemented the multidimensional extendible array in secondary memory
We considered dimension n = 3 in our experiments
The sub-arrays are kept distinct so that each can be stored individually in secondary memory
Extendible array based Compression Scheme (EXCS) 2/3
Each sub-array has n - 1 (= 2) dimensions
A large number of sub-arrays are generated for compression
Sub-arrays are taken as input dynamically
Only the maximum number of sub-arrays has to be given
Extendible array based Compression Scheme (EXCS) 3/3
Each sub-array is compressed individually
The compression technique used is similar to CRS
The compressed elements are written to secondary memory as the RO, CO and VL vectors of subarray_1, subarray_2, ..., subarray_N
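The storage step described above can be sketched as follows. This is an illustration under assumptions rather than the EXCS implementation: the CRS-like routine, the length-prefixed 32-bit integer layout, the function names, and the use of an in-memory buffer in place of a file on secondary memory are all hypothetical.

```python
import io
import struct


def crs_compress(sub):
    """CRS-like compression of one 2-D sub-array into (RO, CO, VL)."""
    ro, co, vl = [0], [], []
    for row in sub:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl


def write_vector(out, vec):
    """Write a length-prefixed vector of 32-bit signed integers."""
    out.write(struct.pack("<i", len(vec)))
    out.write(struct.pack(f"<{len(vec)}i", *vec))


def read_vector(inp):
    """Read back one vector written by write_vector."""
    (n,) = struct.unpack("<i", inp.read(4))
    return list(struct.unpack(f"<{n}i", inp.read(4 * n)))


def store_subarrays(out, subarrays):
    """Compress each sub-array on its own and append its RO, CO, VL to `out`."""
    for sub in subarrays:
        for vec in crs_compress(sub):
            write_vector(out, vec)


# Two hypothetical 2-D sub-arrays of a 3-D extendible array.
subarrays = [
    [[0, 3], [0, 0]],
    [[1, 0], [0, 4]],
]
buf = io.BytesIO()   # stands in for a file on secondary memory
store_subarrays(buf, subarrays)
buf.seek(0)
first = [read_vector(buf) for _ in range(3)]   # RO, CO, VL of subarray_1
print(first)  # [[0, 1, 1], [1], [3]]
```

Because each sub-array is compressed on its own, adding a new sub-array appends three vectors and leaves the earlier ones untouched.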
Performance Measurement
Performance is measured against two key factors of the compression schemes:
Data Density
Length of Dimension / Number of Data
compression ratio = (compressed data / original data)
space savings = 1 - compression ratio
We report space savings in percent
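As a small worked example of the two formulas, the following Python sketch computes the space savings in percent. The function names and the sizes are hypothetical and are not measurements from the thesis.

```python
def compression_ratio(compressed_size, original_size):
    """compression ratio = compressed data / original data"""
    return compressed_size / original_size


def space_savings_percent(compressed_size, original_size):
    """space savings = 1 - compression ratio, expressed in percent"""
    return (1 - compression_ratio(compressed_size, original_size)) * 100


print(space_savings_percent(16, 64))   # 75.0
print(space_savings_percent(80, 64))   # -25.0
```

The second call shows that space savings become negative whenever the compressed form, including its index vectors, is larger than the original data.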
Comparative Analysis (1/4)
Figure: Comparison with fixed density = 20% (space savings plotted against no. of data = 64, 729, 4096, 15625, 46656 for Header, Bitmap, CRS, EACRS and Offset)
Comparative Analysis (2/4)
Figure: Comparison with fixed density = 25% (space savings plotted against no. of data = 64, 729, 4096, 15625, 46656 for Header, Bitmap, CRS, EACRS and Offset)
Comparative Analysis (3/4)
Figure: Comparison with fixed no. of data = 64 (y-axis labeled "compression ratio", plotted against density of data = 10, 20, 30, 40, 50 for Header, Bitmap, CRS, EACRS and Offset)
Comparative Analysis (4/4)
Figure: Comparison with fixed no. of data = 4096 (y-axis labeled "compression ratio", plotted against density of data = 10, 20, 30, 40, 50 for Header, Bitmap, CRS, EACRS and Offset)
Performance Measurement
Extendibility of arrays
Using multidimensional arrays
Extendibility toward any dimension
EXCS allows dynamic extension of arrays.
In the analysis, the data can be extended up to n dimensions
Performance is good for a large number of data elements
Conclusion
Our proposed compression scheme has been evaluated experimentally on data of up to 3 dimensions
In future, the experiments can be extended to compress n-dimensional data.
EXCS is effective for large multidimensional data warehouses