Multidimensional Databases
-
8/8/2019 Multidimensional Databases
1/57
Multidimensional Databases
Prof. Navneet Goyal Computer Science Department
BITS, Pilani
-
October 13, 2010 Dr. Navneet Goyal, BITS, Pilani
Database Evolution
Flat files
Hierarchical and Network
Relational
Distributed Relational
Multidimensional
-
Why Multi-Dimensional Databases?
No single "best" data structure for all applications within an enterprise
Organizations have abandoned the search for the HOLY GRAIL of a globally accepted database
Select the most appropriate data structure on a case-by-case basis from a palette of standard database structures
Multidimensional Databases for OLAP?
-
Why Multi-Dimensional Databases?
From econometric research conducted at MIT in the 1960s, the multidimensional database has matured into the database engine of choice for data analysis applications
Inherent ability to integrate and analyze large volumes of enterprise data
Offers a good conceptual fit with the way end-users visualize business data
  Most business people already think about their businesses in multidimensional terms
  Managers tend to ask questions about product sales in different markets over specific time periods
-
Multidimensional Database
Spreadsheets: a 2D database?
  Functionalities
What about a stack of similar spreadsheets for different times?
  Limitations?
  We cannot easily relate data in different sheets
-
Multidimensional Database
An MDDB is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that are
1. Intimately related
2. Stored, viewed and analyzed from different perspectives
These perspectives are called Dimensions
-
A Motivating Example
An automobile manufacturer wants to increase sales volumes by examining sales data collected throughout the organization.
The evaluation would require viewing historical sales volume figures from multiple dimensions such as
  Sales volume by model
  Sales volume by color
  Sales volume by dealer
  Sales volume over time
-
Relational Structure
-
Multidimensional Array Structure
Sales Volumes (MODEL x COLOR)

MODEL    | Blue | Red | White
Mini Van | 6    | 5   | 4
Coupe    | 3    | 5   | 5
Sedan    | 4    | 3   | 2
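As a rough illustration of the array structure, the sketch below stores the nine sales figures (taken from the Gleason dealership table later in the deck) in a nested Python list indexed by model and color position. The names MODELS, COLORS, volume and total_for_color are illustrative assumptions, not part of any MDDB product.

```python
# Dimension positions; the order is an assumption made for this sketch.
MODELS = ["Mini Van", "Coupe", "Sedan"]
COLORS = ["Blue", "Red", "White"]

# Sales volumes as a 2-D array: one row per model, one column per color.
sales = [
    [6, 5, 4],  # Mini Van
    [3, 5, 5],  # Coupe
    [4, 3, 2],  # Sedan
]


def volume(model, color):
    """Look up one cell by dimension position instead of scanning records."""
    return sales[MODELS.index(model)][COLORS.index(color)]


def total_for_color(color):
    """All cells for one color line up along a single column."""
    col = COLORS.index(color)
    return sum(row[col] for row in sales)
```

For example, volume("Sedan", "Blue") reads a single cell, and total_for_color("Blue") adds up the one column in which every BLUE figure sits.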
-
RDBMS vs. MDD
Multidimensional array structure represents a higher level of organization than the relational table
Perspectives are embedded directly into the structure in the multidimensional model
  All possible combinations of perspectives containing a specific attribute (the color BLUE, for example) line up along the dimension position for that attribute.
Perspectives are placed in fields in the relational model - tells us nothing about field contents.
-
RDBMS vs. MDD
MDD makes data browsing and manipulation intuitive to the end-user
Any data manipulation action possible with a MDD is also possible using relational technology
Substantial cognitive advantages in query formulation
Substantial computational performance advantages in query processing when using MDD
-
RDBMS vs. MDD
-
Multidimensional Representation
[Figure: Sales Volumes cube with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White) and DEALERSHIP (Clyde, Gleason, Carr)]
-
Viewing Data - An Example
[Figure: Sales Volumes cube with dimensions MODEL, COLOR and DEALERSHIP]
Assume that each dimension has 10 positions, as shown in the cube above
-
Viewing Data - An Example
How many records would there be in a relational table?
Implications for viewing data from an end-user standpoint?

SALES VOLUMES FOR ALL DEALERSHIPS
MODEL    | COLOR | DEALERSHIP | VOLUME
MINI VAN | BLUE  | CLYDE      | 2
MINI VAN | BLUE  | GLEASON    | 2
MINI VAN | BLUE  | CARR       | 2
MINI VAN | RED   | CLYDE      | 1
MINI VAN | WHITE | GLEASON    | 3
RECORD NUMBER.... 998
RECORD NUMBER.... 999
RECORD NUMBER.... 1000
-
Performance Advantages
Volume figure when car type = SEDAN, color = BLUE, & dealer = GLEASON?
RDBMS: all 1000 records might need to be searched to find the right record
MDB has more knowledge about where the data lies
  Max. of 30 position searches!!
Average case: 15 vs. 500
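The counts on this slide can be checked with a small Python sketch. It assumes an unindexed relational table that is scanned record by record, and a cube whose three dimensions are each searched linearly for the wanted position; the position names such as MODEL_3 are placeholders, not real data.

```python
from itertools import product

N = 10  # positions per dimension, as assumed on the earlier slide

MODELS = [f"MODEL_{i}" for i in range(N)]
COLORS = [f"COLOR_{i}" for i in range(N)]
DEALERS = [f"DEALER_{i}" for i in range(N)]

# One relational record per (model, color, dealer) combination: 1000 rows.
RECORDS = list(product(MODELS, COLORS, DEALERS))


def table_probes(target):
    """Records examined by a sequential scan before the target is found."""
    for probes, record in enumerate(RECORDS, start=1):
        if record == target:
            return probes
    raise KeyError(target)


def cube_probes(target):
    """Positions examined when each dimension is searched on its own."""
    model, color, dealer = target
    return (
        (MODELS.index(model) + 1)
        + (COLORS.index(color) + 1)
        + (DEALERS.index(dealer) + 1)
    )


worst = RECORDS[-1]  # last position along every dimension
avg_table = sum(table_probes(t) for t in RECORDS) / len(RECORDS)
avg_cube = sum(cube_probes(t) for t in RECORDS) / len(RECORDS)
```

Under these assumptions the worst case is 1000 record probes against 30 position probes. The exact averages come out at 500.5 and 16.5, which the slide rounds to 500 and 15 (half of each worst case).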
-
Performance Advantages
Total Sales across all colors and dealers when model = SEDAN?
RDBMS: all 1000 records must be searched to get the answer
MDB: sum the contents of one 10x10 slice
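A minimal Python sketch of the same comparison follows. The volumes are made-up placeholder numbers, SEDAN = 2 is a hypothetical position along the model dimension, and the relational table is assumed to be scanned without an index.

```python
from itertools import product

N = 10  # positions per dimension


def made_up_volume(m, c, d):
    """Placeholder sales figure; not real data."""
    return (m + 2 * c + 3 * d) % 7


# Cube indexed as cube[model][color][dealer].
cube = [
    [[made_up_volume(m, c, d) for d in range(N)] for c in range(N)]
    for m in range(N)
]

# The same data as 1000 relational rows: (model, color, dealer, volume).
rows = [(m, c, d, cube[m][c][d]) for m, c, d in product(range(N), repeat=3)]

SEDAN = 2  # hypothetical position of SEDAN on the model dimension


def total_from_table(model):
    """Scan every row, keeping those that match the model."""
    total, rows_read = 0, 0
    for m, _c, _d, volume in rows:
        rows_read += 1
        if m == model:
            total += volume
    return total, rows_read


def total_from_cube(model):
    """Sum the single color x dealer slice for the model."""
    plane = cube[model]
    cells_read = sum(len(line) for line in plane)
    return sum(sum(line) for line in plane), cells_read
```

Both functions return the same total, but the table scan touches 1000 rows while the cube version touches only the 100 cells of one slice.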
-
Performance Advantages
Data manipulation that requires a minute in RDBMS may require only a few seconds in MDB
  MDBs can be an order of magnitude faster than RDBMSs for such operations
Performance benefits are greater for queries that generate cross-tab views of data
The performance advantages offered by multidimensional technology facilitate the development of interactive decision support applications like OLAP that can be impractical in a relational environment.
-
RDBMS vs. MDB
Any data manipulation action possible with a multidimensional database is also possible using relational technology
MDBs however offer several advantages like:
  Ease of data presentation and navigation
  Ease of maintenance
  Performance
-
Ease of Data Presentation & Navigation
Intuitive spreadsheet-like data views are the natural output of MDBs
Obtaining the same views in a relational environment requires either complex SQL or a SQL generator against a RDB to convert the table outputs into a more intuitive format
Top N queries are awkward to express in basic SQL; they rely on vendor extensions such as TOP or LIMIT, or on the FETCH FIRST clause and window functions of later SQL standards
-
Ease of Maintenance
Ease of maintenance because data is stored as it is viewed
No additional overhead is required to translate user queries into requests for data
To provide the same intuitiveness, RDBs use indexes and sophisticated joins which require significant maintenance and storage
-
Performance
Performance of MDBs can be matched by RDBs through database tuning
Not possible to tune the database for all possible ad hoc queries
Tuning requires resources of an expensive DB specialist
Aggregate navigators are helping RDBs to catch up with MDBs as far as aggregation queries are concerned
-
Adding Dimension - An Example
[Figure: three Sales Volumes cubes with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White) and DEALERSHIP (Clyde, Gleason, Carr), one cube each for JANUARY, FEBRUARY and MARCH]
-
When is MDD (In)appropriate?
First, consider situation 1

PERSONNEL
LAST NAME | EMPLOYEE# | EMPLOYEE AGE
SMITH     | 01        | 21
REGAN     | 12        | 19
FOX       | 31        | 63
WELD      | 14        | 31
KELLY     | 54        | 27
LINK      | 03        | 56
KRANZ     | 41        | 45
LUCAS     | 33        | 41
WEISS     | 23        | 19
-
When is MDD (In)appropriate?
Now consider situation 2

SALES VOLUMES FOR GLEASON DEALERSHIP
MODEL        | COLOR | VOLUME
MINI VAN     | BLUE  | 6
MINI VAN     | RED   | 5
MINI VAN     | WHITE | 4
SPORTS COUPE | BLUE  | 3
SPORTS COUPE | RED   | 5
SPORTS COUPE | WHITE | 5
SEDAN        | BLUE  | 4
SEDAN        | RED   | 3
SEDAN        | WHITE | 2

1. Set up a MDD structure for situation 1, with LAST NAME and EMPLOYEE # as dimensions, and AGE as the measurement.
2. Set up a MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.
-
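When is MDD (In)appropriate?
MDD Structures for the Situations

Sales Volumes (MODEL x COLOR)
MODEL    | Blue | Red | White
Mini Van | 6    | 5   | 4
Coupe    | 3    | 5   | 5
Sedan    | 4    | 3   | 2

Employee Age (LAST NAME x EMPLOYEE #), "." marks an empty cell
LAST NAME | 01 | 12 | 31 | 14 | 54 | 03 | 41 | 33 | 23
Smith     | 21 | .  | .  | .  | .  | .  | .  | .  | .
Regan     | .  | 19 | .  | .  | .  | .  | .  | .  | .
Fox       | .  | .  | 63 | .  | .  | .  | .  | .  | .
Weld      | .  | .  | .  | 31 | .  | .  | .  | .  | .
Kelly     | .  | .  | .  | .  | 27 | .  | .  | .  | .
Link      | .  | .  | .  | .  | .  | 56 | .  | .  | .
Kranz     | .  | .  | .  | .  | .  | .  | 45 | .  | .
Lucas     | .  | .  | .  | .  | .  | .  | .  | 41 | .
Weiss     | .  | .  | .  | .  | .  | .  | .  | .  | 19

Note the difference in sparsity between the two MDD representations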
-
When is MDD (In)appropriate?
Our sales volume dataset has a great number of meaningful interrelationships
Interrelationships are more meaningful than the individual data elements themselves.
  The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis
-
When is MDD (In)appropriate?
No last name matches more than one emp # and no emp # matches more than one last name
In contrast, there is a sales figure associated with every combination of model and color, resulting in a completely filled 3x3 matrix
Performance suffers (RDB 9 vs. MDB 18)
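The difference between the two situations can be quantified as array density (filled cells divided by total cells). The Python sketch below uses the personnel and sales figures from the earlier slides; the helper name density is an illustrative assumption. One plausible reading of the slide's "9 vs. 18" (an assumption, since the slide does not spell it out) is nine records to scan in the table against 9 + 9 positions to search along the two dimensions of the array.

```python
# (last name, employee #, age) taken from the PERSONNEL table.
people = [
    ("Smith", "01", 21), ("Regan", "12", 19), ("Fox", "31", 63),
    ("Weld", "14", 31), ("Kelly", "54", 27), ("Link", "03", 56),
    ("Kranz", "41", 45), ("Lucas", "33", 41), ("Weiss", "23", 19),
]

names = [p[0] for p in people]
numbers = [p[1] for p in people]

# LAST NAME x EMPLOYEE # array; None marks an empty cell.
age = [[None] * len(numbers) for _ in names]
for name, number, years in people:
    age[names.index(name)][numbers.index(number)] = years

# MODEL x COLOR array for the Gleason dealership.
sales = [[6, 5, 4], [3, 5, 5], [4, 3, 2]]


def density(array):
    """Fraction of cells that actually hold a value."""
    cells = [cell for row in array for cell in row]
    return sum(cell is not None for cell in cells) / len(cells)


# Assumed reading of "RDB 9 vs. MDB 18": rows to scan vs. positions to search.
rdb_worst_case = len(people)
mdb_worst_case = len(names) + len(numbers)
```

Here density(sales) is 1.0, while density(age) is 9/81, so almost nine tenths of the employee array is empty space.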
-
When is MDD (In)appropriate?
The relative performance advantages of storing multidimensional data in a multidimensional array increase as the size of the dataset increases
The relative performance disadvantages of storing non-multidimensional data in a multidimensional array increase as the size of the dataset increases.
NO inherent value in storing non-multidimensional data (employee data) in multidimensional arrays
-
When is MDD Appropriate?
The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
Most companies have limited time and resources to devote to analyzing data
It therefore becomes critical that these highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis.
-
When is MDD Appropriate?
Examples of applications that are suited for multidimensional technology:
1. Financial Analysis and Reporting
2. Budgeting
3. Promotion Tracking
4. Quality Assurance and Quality Control
5. Product Profitability
-
MDD Features - Rotation
Sales Volumes

View #1: Model x Color
MODEL    | Blue | Red | White
Mini Van | 6    | 5   | 4
Coupe    | 3    | 5   | 5
Sedan    | 4    | 3   | 2

(ROTATE 90 degrees)

View #2: Color x Model
COLOR | Mini Van | Coupe | Sedan
Blue  | 6        | 3     | 4
Red   | 5        | 5     | 3
White | 4        | 5     | 2

Also referred to as "data slicing."
Each rotation yields a different slice or two dimensional table of data.
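For a two dimensional view, rotation amounts to swapping the row and column dimensions. A minimal Python sketch, with rotate as an illustrative helper name:

```python
# View #1: rows are models (Mini Van, Coupe, Sedan),
# columns are colors (Blue, Red, White).
view1 = [
    [6, 5, 4],
    [3, 5, 5],
    [4, 3, 2],
]


def rotate(view):
    """Swap the row and column dimensions of a 2-D view."""
    return [list(column) for column in zip(*view)]


# View #2: rows are colors, columns are models.
view2 = rotate(view1)
```

Rotating twice returns the original view, which reflects the point that no data is rearranged or lost, only presented along different axes.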
-
MDD Features - Rotation
All the six views can be obtained by simple rotation
In MDBs rotations are simple as no rearrangement of data is required
Rotation is also referred to as data slicing
No. of views
  2D: 2
  3D: 6
  4D: 24
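The counts 2, 6 and 24 match n! for n dimensions, if each view is taken to be one ordering of the dimensions (an interpretation consistent with the slide's numbers). A short Python check, with TIME used as a hypothetical fourth dimension:

```python
from itertools import permutations
from math import factorial


def views(dimensions):
    """Every ordering of the dimensions, treated as one view each."""
    return list(permutations(dimensions))


two_d = views(["MODEL", "COLOR"])
three_d = views(["MODEL", "COLOR", "DEALERSHIP"])
four_d = views(["MODEL", "COLOR", "DEALERSHIP", "TIME"])
```

The list lengths are 2, 6 and 24, which equal factorial(2), factorial(3) and factorial(4).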
-
MDD Features - Ranging
How does the sales volume of models painted with the new metallic blue compare with the sales of normal blue models?
The user knows that only Sports Coupe and Mini Van models have received the new paint treatment
Also the user knows that only 2 dealers, viz. Carr and Clyde, have an unconstrained supply of these models
-
MDD Features - Ranging
The end user selects the desired positions along each dimension.
Also referred to as "data dicing."
The data is scoped down to a subset grouping
[Figure: Sales Volumes cube reduced to MODEL (Mini Van, Coupe), COLOR (Normal Blue, Metal Blue) and DEALERSHIP (Clyde, Carr)]
-
MDD Features - Ranging
The reduced array can now be rotated and used in computations in the same way as the parent array
Referred to as Data Dicing as data is scoped down to a subset grouping
Complex SQL query is required in RDB
Performance is better in MDB as less resource-consuming searches are required
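Ranging can be sketched in Python as selecting a set of positions along each dimension. The cube below is a dictionary keyed by (model, color, dealer); its values are placeholder numbers, and the helper name dice is an illustrative assumption.

```python
from itertools import product

MODELS = ["Mini Van", "Coupe", "Sedan"]
COLORS = ["Normal Blue", "Metal Blue", "Red", "White"]
DEALERS = ["Clyde", "Gleason", "Carr"]

# Placeholder volumes: each cell simply holds its running number.
cube = {
    cell: n
    for n, cell in enumerate(product(MODELS, COLORS, DEALERS), start=1)
}


def dice(cube, models, colors, dealers):
    """Keep only the cells whose position is selected on every dimension."""
    return {
        (m, c, d): volume
        for (m, c, d), volume in cube.items()
        if m in models and c in colors and d in dealers
    }


sub_cube = dice(
    cube,
    models={"Mini Van", "Coupe"},
    colors={"Normal Blue", "Metal Blue"},
    dealers={"Clyde", "Carr"},
)
```

The result has the same shape of key as the parent cube, so it can be rotated or aggregated with the same code; here it holds 2 x 2 x 2 = 8 of the original 36 cells.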
-
MDD Features - Roll-Up & Drill-Down
Users want different views of the same data
  For e.g., Sales Volume by model vs. sales volume by dealership
Many times views are similar
  Sales volume by dealership vs. volume by district
Natural relationship between Sales Volumes at the DEALERSHIP level and Sales Volumes at the DISTRICT level
  Sales Volumes for all the dealerships in a district sum to the Sales Volumes for that district
-
MDD Features - Roll-Ups & Drill Downs
ORGANIZATION DIMENSION
  REGION: Midwest
  DISTRICT: Chicago, St. Louis, Gary
  DEALERSHIP: Clyde, Gleason, Carr, Levi, Lucas, Bolton
The figure presents a definition of a hierarchy within the organization dimension.
Aggregations perceived as being part of the same dimension.
Moving up and moving down levels in a hierarchy is referred to as roll-up and drill-down.
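A roll-up along this hierarchy can be sketched in Python as summing child values into their parent. The assignment of dealerships to districts and all sales figures below are hypothetical, since the slide text does not preserve them; roll_up and drill_down are illustrative helper names.

```python
# Hypothetical hierarchy: which district each dealership belongs to.
DISTRICT_OF = {
    "Clyde": "Chicago", "Gleason": "Chicago",
    "Carr": "St. Louis", "Levi": "St. Louis",
    "Lucas": "Gary", "Bolton": "Gary",
}
REGION_OF = {"Chicago": "Midwest", "St. Louis": "Midwest", "Gary": "Midwest"}

# Made-up sales volumes at the DEALERSHIP level.
dealer_sales = {
    "Clyde": 14, "Gleason": 19, "Carr": 26,
    "Levi": 10, "Lucas": 8, "Bolton": 5,
}


def roll_up(values, parent_of):
    """Aggregate child-level values into totals for the parent level."""
    totals = {}
    for child, value in values.items():
        parent = parent_of[child]
        totals[parent] = totals.get(parent, 0) + value
    return totals


def drill_down(parent, values, parent_of):
    """Return the child-level values that make up one parent total."""
    return {c: v for c, v in values.items() if parent_of[c] == parent}


district_sales = roll_up(dealer_sales, DISTRICT_OF)
region_sales = roll_up(district_sales, REGION_OF)
```

With these numbers the districts total 33, 36 and 13, and the Midwest region totals 82; drill_down("Chicago", dealer_sales, DISTRICT_OF) returns the two dealership figures behind the Chicago total.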
-
MDD Features - Roll-Ups & Drill Downs
-
MDD Features: Drill-Down Through a Dimension
[Figure: Sales Volumes cube with MODEL and COLOR dimensions and an organization hierarchy of REGION (Midwest), DISTRICT (Chicago, St. Louis, Gary) and DEALERSHIP (Clyde, Gleason, Carr, Levi, Lucas, Bolton)]
-
Queries
High degree of structure in MDB makes the query language very simple and efficient
Query language is intuitive
Output is immediately useful to end user
-
Queries: Example
Display sales volume by model for each dealership
PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP)

             | DEALERSHIP
MODEL        | CLYDE | GLEASON | CARR
MINI VAN     | 7     | 5       | 6
SPORTS COUPE | 4     | 6       | 8
SEDAN        | 3     | 8       | 12

Trends emerge and comparisons are easily made
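The slides do not define the query language, so the following Python sketch only assumes that KEEP retains the listed dimensions and totals over the others. It uses the five MINI VAN records shown in the earlier relational table, not the full dataset; total_keep is a hypothetical helper name.

```python
DIMENSIONS = ("MODEL", "COLOR", "DEALERSHIP")

# (model, color, dealership, volume): the five rows shown on the earlier slide.
rows = [
    ("MINI VAN", "BLUE", "CLYDE", 2),
    ("MINI VAN", "BLUE", "GLEASON", 2),
    ("MINI VAN", "BLUE", "CARR", 2),
    ("MINI VAN", "RED", "CLYDE", 1),
    ("MINI VAN", "WHITE", "GLEASON", 3),
]


def total_keep(rows, keep):
    """Total the volume over every dimension that is not listed in keep."""
    kept = [DIMENSIONS.index(name) for name in keep]
    totals = {}
    for row in rows:
        key = tuple(row[i] for i in kept)
        totals[key] = totals.get(key, 0) + row[-1]
    return totals


by_model_and_dealer = total_keep(rows, keep=("MODEL", "DEALERSHIP"))
```

COLOR is the dimension that gets summed away, leaving one total per (model, dealership) pair, which is the shape of the cross-tab on the slide.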
-
Queries: Example
Corresponding SQL
SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)
FROM SALES_VOLUME
GROUP BY MODEL, DEALERSHIP
ORDER BY MODEL, DEALERSHIP

MODEL        | DEALERSHIP | SUM(SALES_VOLUME)
MINI VAN     | CLYDE      | 7
MINI VAN     | GLEASON    | 5
MINI VAN     | CARR       | 6
SPORTS COUPE | CLYDE      | 4
SPORTS COUPE | GLEASON    | 6
SPORTS COUPE | CARR       | 8
SEDAN        | CLYDE      | 3
SEDAN        | GLEASON    | 8
SEDAN        | CARR       | 12
-
Queries: Example
Use report writer in addition to SQL and we get
MINI VAN
  CLYDE 7
  GLEASON 5
  CARR 6
SPORTS COUPE
  CLYDE 4
  GLEASON 6
  CARR 8
SEDAN
  CLYDE 3
  GLEASON 8
  CARR 12
-
MDD Features: Multidimensional Computations
Well equipped to handle demanding mathematical functions.
Can treat arrays like cells in spreadsheets.
  For example, in a budget analysis situation, one can divide the ACTUAL array by the BUDGET array to compute the VARIANCE array.
Applications based on multidimensional database technology typically have one dimension defined as a "business measurements" dimension.
Integrates computational tools very tightly with the database structure.
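The array-as-cell idea can be sketched in Python with an element-wise division. The figures are made up, and the variance is expressed as a ratio of actual to budget, following the slide's wording; divide is an illustrative helper name.

```python
# Made-up figures: rows are products, columns are quarters.
actual = [
    [120.0, 90.0],
    [200.0, 150.0],
]
budget = [
    [100.0, 100.0],
    [160.0, 150.0],
]


def divide(left, right):
    """Divide two equally shaped 2-D arrays cell by cell."""
    return [
        [a / b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(left, right)
    ]


# One expression over whole arrays, like a formula over spreadsheet cells.
variance = divide(actual, budget)
```

A value above 1 means the actual figure exceeded the budget for that cell, and a value below 1 means it fell short.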
-
The Time Dimension
TIME as a predefined hierarchy for rolling-up and drilling-down across days, weeks, months, years and special periods, such as fiscal years.
  Eliminates the effort required to build sophisticated hierarchies every time a database is set up.
  Extra performance advantages
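A predefined time hierarchy can be sketched in Python with the standard datetime module. The daily figures are made up, and the fiscal year is assumed to start on 1 April purely for illustration; LEVELS and roll_up_time are hypothetical names.

```python
from datetime import date

# Made-up daily sales volumes.
daily = {
    date(2010, 1, 4): 3,
    date(2010, 1, 5): 2,
    date(2010, 2, 1): 4,
    date(2010, 2, 2): 1,
    date(2010, 3, 1): 5,
}

# Each level maps a day to the period it belongs to.
LEVELS = {
    "month": lambda d: (d.year, d.month),
    "year": lambda d: d.year,
    # Assumed fiscal calendar: the fiscal year starts on 1 April.
    "fiscal_year": lambda d: d.year if d.month >= 4 else d.year - 1,
}


def roll_up_time(daily, level):
    """Total daily values into the periods of the requested level."""
    period_of = LEVELS[level]
    totals = {}
    for day, value in daily.items():
        period = period_of(day)
        totals[period] = totals.get(period, 0) + value
    return totals


monthly = roll_up_time(daily, "month")
yearly = roll_up_time(daily, "year")
fiscal = roll_up_time(daily, "fiscal_year")
```

Because the mapping from day to period ships with the hierarchy, the same daily data rolls up to months, calendar years or fiscal years without any extra modelling work.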
-
Contrasting Relational Model and MD Model

Criteria | Relational Model | Multidimensional Databases
Focus | Data integrity of each piece of data | Facilitate exploration of interrelationships between dimensions
Organization structure | One-dimensional array | Multi-dimensional arrays
Perspectives | Embedded in fields | Embedded directly in MDDB structure
Computational power for query processing | Joining tables often required; computationally expensive | Structure designed for OLAP; computationally cheap
Cognitive issues in querying data | Cumbersome | Intuitive
Query languages | SQL or SQL front-ends, such as QBE | Point-and-click emphasis; no standardized language
Management of time dimension | Not well suited | Well suited
-
RDBMS vs. MDDB
Do I still use RDBMS for my DW?
MDDBs store data in a hypercube, i.e., a multidimensional array
RDBMSs store data as tables with rows and columns that do not map directly to the multidimensional view that users have of data
EDW: RDBMS
Data Marts: MDDB
-
RDBMS vs. MDDB: Trade-Offs
SIZE
  MDDBs limited by size
  Mid 1990s: 10GB caused problems
  Today: 100GB is OK
  Large DWs are still better served by relational front-ends running against high performance and scalable RDBMS
VOLATILITY
  Highly volatile data are better handled by RDBMS
  MDDBs take long to load and update
-
RDBMS vs. MDDB: Trade-Offs
AGGREGATE STRATEGY
  MDDBs support aggregates better
  RDBMSs are catching up with the help of Aggregate Navigators
INVESTMENT PROTECTION
  Most organizations already have made significant investments in relational technology and skill sets
  Continued use for another purpose (DW) provides additional ROI and lowers technical risk of failure
  Adopting MDDBs means acquiring new software and training staff to use it
-
INTEGRATED ARCHITECTURE
DB vendors have integrated their multidimensional and relational database products
Multidimensional front-end tools
If queries require data that are not available in MDDB, the tools retrieve the data from the larger RDB
Known as DRILL-THROUGH
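Drill-through can be sketched in Python as a front end that answers from the cube when it can and falls back to the relational store otherwise. Both stores and all their contents are hypothetical stand-ins, not any vendor's API.

```python
# Hypothetical MDDB: pre-aggregated sales volume per model.
summary_cube = {"Mini Van": 15, "Coupe": 13, "Sedan": 9}

# Hypothetical RDB: detailed records that the cube does not hold.
detail_table = [
    {"model": "Sedan", "color": "Blue", "invoice": "INV-001", "units": 1},
    {"model": "Sedan", "color": "Blue", "invoice": "INV-002", "units": 1},
    {"model": "Sedan", "color": "Red", "invoice": "INV-003", "units": 1},
]


def answer(model, color=None):
    """Return (source, units), drilling through to the RDB when needed."""
    if color is None and model in summary_cube:
        # The cube holds this level of aggregation directly.
        return "MDDB", summary_cube[model]
    # Finer detail than the cube stores: query the relational rows instead.
    matches = [
        row for row in detail_table
        if row["model"] == model and (color is None or row["color"] == color)
    ]
    return "RDB", sum(row["units"] for row in matches)
```

A call such as answer("Sedan") is served from the cube, while answer("Sedan", "Blue") asks for detail the cube lacks and is served from the relational rows.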
-
Q & A