Multidimensional Databases


  • 8/8/2019 Multidimensional Databases

    Multidimensional Databases

    Prof. Navneet Goyal Computer Science Department

    BITS, Pilani

    October 13, 2010 Dr. Navneet Goyal, BITS, Pilani

    Database Evolution

    Flat files
    Hierarchical and Network
    Relational
    Distributed Relational
    Multidimensional

    Why Multi-Dimensional Databases?

    There is no single "best" data structure for all applications within an enterprise
    Organizations have abandoned the search for the HOLY GRAIL of a globally accepted database
    Select the most appropriate data structure on a case-by-case basis from a palette of standard database structures
    Multidimensional databases for OLAP?

    Why Multi-Dimensional Databases?

    From econometric research conducted at MIT in the 1960s, the multidimensional database has matured into the database engine of choice for data analysis applications
    Inherent ability to integrate and analyze large volumes of enterprise data
    Offers a good conceptual fit with the way end-users visualize business data
        Most business people already think about their businesses in multidimensional terms
        Managers tend to ask questions about product sales in different markets over specific time periods

    Multidimensional Database

    Spreadsheets: a 2D database?
        Functionalities
    What about a stack of similar spreadsheets for different times?
        Limitations?
        We cannot easily relate data in different sheets

    Multidimensional Database

    An MDDB is a computer software system designed to allow for the efficient and convenient storage and retrieval of large volumes of data that is
        1. Intimately related, and
        2. Stored, viewed and analyzed from different perspectives
    These perspectives are called Dimensions

    A Motivating Example

    An automobile manufacturer wants to increase sales volumes by examining sales data collected throughout the organization.
    The evaluation would require viewing historical sales volume figures from multiple dimensions such as
        Sales volume by model
        Sales volume by color
        Sales volume by dealer
        Sales volume over time

    Relational Structure

    Multidimensional Array Structure

    Sales Volumes (MODEL by COLOR)

    MODEL       Blue   Red   White
    Mini Van       6     5       4
    Coupe          3     5       5
    Sedan          4     3       2

    RDBMS vs. MDD

    The multidimensional array structure represents a higher level of organization than the relational table
    Perspectives are embedded directly into the structure in the multidimensional model
        All possible combinations of perspectives containing a specific attribute (the color BLUE, for example) line up along the dimension position for that attribute.
    Perspectives are placed in fields in the relational model, which tells us nothing about field contents.
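The contrast between the two structures can be sketched in a few lines of Python. This is only an illustration: it uses the Gleason dealership figures that appear elsewhere in the deck, and the names `rows`, `models`, `colors` and `cube` are assumptions introduced here, not part of the slides.

```python
# Relational form: one record per (model, color) pair. The field names alone
# say nothing about which values actually occur in them.
rows = [
    ("MINI VAN", "BLUE", 6), ("MINI VAN", "RED", 5), ("MINI VAN", "WHITE", 4),
    ("SPORTS COUPE", "BLUE", 3), ("SPORTS COUPE", "RED", 5), ("SPORTS COUPE", "WHITE", 5),
    ("SEDAN", "BLUE", 4), ("SEDAN", "RED", 3), ("SEDAN", "WHITE", 2),
]

# Multidimensional form: the perspectives become the axes of the array.
models = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
colors = ["BLUE", "RED", "WHITE"]
cube = [[0] * len(colors) for _ in models]
for model, color, volume in rows:
    cube[models.index(model)][colors.index(color)] = volume

# Every cell involving BLUE lines up along one position of the COLOR axis.
blue = colors.index("BLUE")
blue_volumes = [cube[m][blue] for m in range(len(models))]
```

In the array form, the set of models and colors is part of the structure itself, which is the sense in which perspectives are "embedded directly".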

    RDBMS vs. MDD

    MDD makes data browsing and manipulation intuitive to the end-user
    Any data manipulation action possible with an MDD is also possible using relational technology
    Substantial cognitive advantages in query formulation
    Substantial computational performance advantages in query processing when using MDD

    RDBMS vs. MDD

    Multidimensional Representation

    [Figure: Sales Volumes cube with dimensions MODEL (Mini Van, Coupe, Sedan), COLOR (Blue, Red, White) and DEALERSHIP (Clyde, Gleason, Carr)]

    Viewing Data - An Example

    [Figure: Sales Volumes cube with dimensions MODEL, COLOR and DEALERSHIP]

    Assume that each dimension has 10 positions, as shown in the cube above

    Viewing Data - An Example

    How many records would there be in a relational table?
    Implications for viewing data from an end-user standpoint?

    SALES VOLUMES FOR ALL DEALERSHIPS

    MODEL     COLOR  DEALERSHIP  VOLUME
    MINI VAN  BLUE   CLYDE       2
    MINI VAN  BLUE   GLEASON     2
    MINI VAN  BLUE   CARR        2
    MINI VAN  RED    CLYDE       1
    MINI VAN  WHITE  GLEASON     3
    RECORD NUMBER.... 998
    RECORD NUMBER.... 999
    RECORD NUMBER.... 1000

    Performance Advantages

    Volume figure when car type = SEDAN, color = BLUE, and dealer = GLEASON?
    RDBMS: all 1000 records might need to be searched to find the right record
    MDB has more knowledge about where the data lies
        Max. of 30 position searches!!
    Average case: 15 vs. 500
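The search counts can be reproduced with a small Python sketch. It assumes an unindexed relational table that is scanned sequentially and an MDB that searches each dimension's position list separately; the placeholder names (`model-0`, `color-0`, ...) and helper functions are illustrative only.

```python
from itertools import product

SIZE = 10  # positions per dimension, as assumed on the slide
models = [f"model-{i}" for i in range(SIZE)]
colors = [f"color-{i}" for i in range(SIZE)]
dealers = [f"dealer-{i}" for i in range(SIZE)]
axes = (models, colors, dealers)

# Relational table: 10 x 10 x 10 = 1000 unindexed records (volume is a dummy 1).
records = [(m, c, d, 1) for m, c, d in product(*axes)]

def records_examined(key):
    """Records read by a sequential scan up to and including the match."""
    for examined, (m, c, d, _) in enumerate(records, start=1):
        if (m, c, d) == key:
            return examined
    raise KeyError(key)

def positions_examined(key):
    """Positions read when each dimension is searched separately."""
    return sum(axis.index(value) + 1 for axis, value in zip(axes, key))

# Worst case: the last position on every dimension.
worst = (models[-1], colors[-1], dealers[-1])
rdb_worst, mdb_worst = records_examined(worst), positions_examined(worst)

# Average case over all 1000 possible lookups.
keys = list(product(*axes))
rdb_avg = sum(records_examined(k) for k in keys) / len(keys)
mdb_avg = sum(positions_examined(k) for k in keys) / len(keys)
```

Under these assumptions the worst cases are 1000 and 30, and the exact averages come out at 500.5 and 16.5; the slide's "15 vs. 500" reads as the half-of-maximum approximation of the same comparison.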

    Performance Advantages

    Total sales across all colors and dealers when model = SEDAN?
    RDBMS: all 1000 records must be searched to get the answer
    MDB: sum the contents of one 10x10 slice
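A rough Python sketch of the same aggregate, computed both ways. The volumes are made-up placeholder values and `SEDAN = 2` is a hypothetical axis position; only the 10x10x10 shape comes from the slides.

```python
SIZE = 10
SEDAN = 2  # hypothetical position of SEDAN on the MODEL axis

# cube[model][color][dealer]; the volumes are made-up placeholder values.
cube = [[[(m + 2 * c + 3 * d) % 7 for d in range(SIZE)]
         for c in range(SIZE)]
        for m in range(SIZE)]

# Relational route: every one of the 1000 records is tested against the filter.
records = [(m, c, d, cube[m][c][d])
           for m in range(SIZE) for c in range(SIZE) for d in range(SIZE)]
rdb_total = sum(v for m, _, _, v in records if m == SEDAN)
rdb_reads = len(records)

# Multidimensional route: only the 10 x 10 SEDAN slice is read.
sedan_slice = cube[SEDAN]
mdb_total = sum(sum(row) for row in sedan_slice)
mdb_reads = sum(len(row) for row in sedan_slice)
```

Both routes give the same total; the difference is that the slice touches 100 cells while the unindexed scan touches all 1000 records.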

    Performance Advantages

    Data manipulation that requires a minute in an RDBMS may require only a few seconds in an MDB
    MDBs can be an order of magnitude faster than RDBMSs
    Performance benefits are greater for queries that generate cross-tab views of data
    The performance advantages offered by multidimensional technology facilitate the development of interactive decision support applications like OLAP that can be impractical in a relational environment.

    RDBMS vs. MDB

    Any data manipulation action possible with a multidimensional database is also possible using relational technology
    MDBs, however, offer several advantages, like:
        Ease of data presentation and navigation
        Ease of maintenance
        Performance

    Ease of Data Presentation & Navigation

    Intuitive spreadsheet-like data views are the natural output of MDBs
    Obtaining the same views in a relational environment requires either complex SQL or a SQL generator against an RDB to convert the table outputs into a more intuitive format
    Top N queries were not possible in early standard SQL (later standards added window functions and FETCH FIRST)

    Ease of Maintenance

    Ease of maintenance because data is stored as it is viewed
    No additional overhead is required to translate user queries into requests for data
    To provide the same intuitiveness, RDBs use indexes and sophisticated joins, which require significant maintenance and storage

    Performance

    Performance of MDBs can be matched by RDBs through database tuning
    Not possible to tune the database for all possible ad hoc queries
    Tuning requires the resources of an expensive DB specialist
    Aggregate navigators are helping RDBs catch up with MDBs as far as aggregation queries are concerned

    Adding Dimension - An Example

    [Figure: three Sales Volumes cubes (MODEL by COLOR by DEALERSHIP), one each for JANUARY, FEBRUARY and MARCH]

    When is MDD (In)appropriate?

    First, consider situation 1

    PERSONNEL

    LAST NAME  EMPLOYEE#  EMPLOYEE AGE
    SMITH      01         21
    REGAN      12         19
    FOX        31         63
    WELD       14         31
    KELLY      54         27
    LINK       03         56
    KRANZ      41         45
    LUCAS      33         41
    WEISS      23         19

    When is MDD (In)appropriate?

    Now consider situation 2

    SALES VOLUMES FOR GLEASON DEALERSHIP

    MODEL         COLOR  VOLUME
    MINI VAN      BLUE   6
    MINI VAN      RED    5
    MINI VAN      WHITE  4
    SPORTS COUPE  BLUE   3
    SPORTS COUPE  RED    5
    SPORTS COUPE  WHITE  5
    SEDAN         BLUE   4
    SEDAN         RED    3
    SEDAN         WHITE  2

    1. Set up an MDD structure for situation 1, with LAST NAME and EMPLOYEE # as dimensions, and AGE as the measurement.
    2. Set up an MDD structure for situation 2, with MODEL and COLOR as dimensions, and SALES VOLUME as the measurement.

    When is MDD (In)appropriate?

    MDD Structures for the Situations

    Sales Volumes (MODEL by COLOR)

    MODEL       Blue   Red   White
    Mini Van       6     5       4
    Coupe          3     5       5
    Sedan          4     3       2

    Employee Age (LAST NAME by EMPLOYEE #); "." marks an empty cell

    LAST NAME   41  33  31  01  14  54  03  12  23
    Kranz       45   .   .   .   .   .   .   .   .
    Weiss        .   .   .   .   .   .   .   .  19
    Lucas        .  41   .   .   .   .   .   .   .
    Smith        .   .   .  21   .   .   .   .   .
    Regan        .   .   .   .   .   .   .  19   .
    Fox          .   .  63   .   .   .   .   .   .
    Weld         .   .   .   .  31   .   .   .   .
    Kelly        .   .   .   .   .  27   .   .   .
    Link         .   .   .   .   .   .  56   .   .

    Note the difference in sparsity between the two MDD representations

    When is MDD (In)appropriate?

    Our sales volume dataset has a great number of meaningful interrelationships
    Interrelationships are more meaningful than the individual data elements themselves.
        The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
    Highly interrelated dataset types should be placed in a multidimensional data structure for greatest ease of access and analysis

    When is MDD (In)appropriate?

    No last name matches more than one emp # and no emp # matches more than one last name
    In contrast, there is a sales figure associated with every combination of model and color, resulting in a completely filled 3x3 matrix
    Performance suffers for the sparse employee array (RDB 9 vs. MDB 18)
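The sparsity contrast can be quantified with a short Python sketch. The `profile` helper is hypothetical, and reading "RDB 9 vs. MDB 18" as worst-case search counts (9 records scanned vs. 9 + 9 positions searched) is an assumption about what the slide intends.

```python
# Situation 1: employee ages keyed by (last name, employee #).
employees = {
    ("SMITH", "01"): 21, ("REGAN", "12"): 19, ("FOX", "31"): 63,
    ("WELD", "14"): 31, ("KELLY", "54"): 27, ("LINK", "03"): 56,
    ("KRANZ", "41"): 45, ("LUCAS", "33"): 41, ("WEISS", "23"): 19,
}
# Situation 2: Gleason sales volumes keyed by (model, color).
sales = {
    ("MINI VAN", "BLUE"): 6, ("MINI VAN", "RED"): 5, ("MINI VAN", "WHITE"): 4,
    ("SPORTS COUPE", "BLUE"): 3, ("SPORTS COUPE", "RED"): 5, ("SPORTS COUPE", "WHITE"): 5,
    ("SEDAN", "BLUE"): 4, ("SEDAN", "RED"): 3, ("SEDAN", "WHITE"): 2,
}

def profile(cells):
    """Return (density, worst-case record scans, worst-case position searches)."""
    dim_a = {a for a, _ in cells}
    dim_b = {b for _, b in cells}
    density = len(cells) / (len(dim_a) * len(dim_b))
    # Assumed cost model: an RDB scans records; an MDB searches each axis in turn.
    return density, len(cells), len(dim_a) + len(dim_b)

emp_density, emp_rdb, emp_mdb = profile(employees)
sales_density, sales_rdb, sales_mdb = profile(sales)
```

Under this cost model the employee array fills only 9 of its 81 cells and needs up to 18 position searches against 9 record scans, while the fully populated sales array needs at most 6 position searches against 9 record scans.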

    When is MDD (In)appropriate?

    The relative performance advantages of storing multidimensional data in a multidimensional array increase as the size of the dataset increases
    The relative performance disadvantages of storing non-multidimensional data in a multidimensional array increase as the size of the dataset increases.
    There is NO inherent value in storing non-multidimensional data (employee data) in multidimensional arrays

    When is MDD Appropriate?

    The greater the number of inherent interrelationships between the elements of a dataset, the more likely it is that a study of those interrelationships will yield business information of value to the company.
    Most companies have limited time and resources to devote to analyzing data
    It therefore becomes critical that these highly interrelated dataset types be placed in a multidimensional data structure for greatest ease of access and analysis.

    When is MDD Appropriate?

    Examples of applications that are suited for multidimensional technology:
        1. Financial Analysis and Reporting
        2. Budgeting
        3. Promotion Tracking
        4. Quality Assurance and Quality Control
        5. Product Profitability

    MDD Features - Rotation

    Sales Volumes

    View #1: Model x Color

    MODEL       Blue   Red   White
    Mini Van       6     5       4
    Coupe          3     5       5
    Sedan          4     3       2

    (ROTATE 90°)

    View #2: Color x Model

    COLOR    Mini Van   Coupe   Sedan
    Blue            6       3       4
    Red             5       5       3
    White           4       5       2

    Also referred to as data slicing.
    Each rotation yields a different slice or two-dimensional table of data.
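For a two-dimensional view, the rotation amounts to swapping the row and column dimensions. A minimal Python sketch, with `rotate` as a hypothetical helper name:

```python
models = ["MINI VAN", "SPORTS COUPE", "SEDAN"]
colors = ["BLUE", "RED", "WHITE"]

# View #1: rows are models, columns are colors.
view1 = [
    [6, 5, 4],
    [3, 5, 5],
    [4, 3, 2],
]

def rotate(view):
    """Swap the row and column dimensions of a 2D view."""
    return [list(column) for column in zip(*view)]

# View #2: rows are colors, columns are models.
view2 = rotate(view1)
```

This sketch copies the cells to keep the example simple; the slides' point is that an MDB can present the rotated view without rearranging the stored data.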


    MDD Features - Rotation

    All six views can be obtained by simple rotation
    In MDBs rotations are simple, as no rearrangement of data is required
    Rotation is also referred to as data slicing
    No. of views
        2D: 2
        3D: 6
        4D: 24
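The view counts follow the factorial pattern if a "view" is taken to be one ordering of the dimensions; that interpretation is an assumption, though it matches the 2, 6 and 24 on the slide. A small Python check:

```python
from itertools import permutations
from math import factorial

DIMENSIONS = ["MODEL", "COLOR", "DEALERSHIP", "TIME"]

def views(dimensions):
    """Enumerate every ordering of the dimensions (one ordering per view)."""
    return list(permutations(dimensions))

# Number of views for 2, 3 and 4 dimensions.
counts = {n: len(views(DIMENSIONS[:n])) for n in (2, 3, 4)}
```

For n dimensions this gives n! orderings, so a fifth dimension would already yield 120 views.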

    MDD Features - Ranging

    How do sales volumes of models painted with the new metallic blue compare with sales of the normal blue models?
    The user knows that only the Sports Coupe and Mini Van models have received the new paint treatment
    The user also knows that only 2 dealers, viz. Carr and Clyde, have an unconstrained supply of these models

    MDD Features - Ranging

    The end user selects the desired positions along each dimension.
    Also referred to as "data dicing."
    The data is scoped down to a subset grouping

    [Figure: the Sales Volumes cube reduced to MODEL (Mini Van, Coupe), COLOR (Normal Blue, Metal Blue) and DEALERSHIP (Clyde, Carr)]

    MDD Features - Ranging

    The reduced array can now be rotated and used in computations in the same way as the parent array
    Referred to as Data Dicing, as data is scoped down to a subset grouping
    A complex SQL query is required in an RDB
    Performance is better in an MDB, as fewer resource-consuming searches are required
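Ranging can be sketched in Python as selecting positions along each dimension and keeping only the matching cells. The cube layout, the `dice` helper and all volumes below are illustrative assumptions; only the selected models, colors and dealers come from the slides.

```python
from itertools import product

axes = {
    "model": ["MINI VAN", "SPORTS COUPE", "SEDAN"],
    "color": ["NORMAL BLUE", "METAL BLUE", "RED", "WHITE"],
    "dealership": ["CLYDE", "GLEASON", "CARR"],
}
# Made-up volumes, one per cell, keyed by (model, color, dealership).
cube = {key: i % 5 for i, key in enumerate(product(*axes.values()))}

def dice(cube, axes, selection):
    """Return the sub-cube and axes restricted to the selected positions."""
    sub_axes = {
        name: [p for p in positions if p in selection.get(name, positions)]
        for name, positions in axes.items()
    }
    sub_cube = {key: cube[key] for key in product(*sub_axes.values())}
    return sub_cube, sub_axes

# The ranging request from the slides: two models, two blues, two dealers.
sub_cube, sub_axes = dice(cube, axes, {
    "model": ["MINI VAN", "SPORTS COUPE"],
    "color": ["NORMAL BLUE", "METAL BLUE"],
    "dealership": ["CLYDE", "CARR"],
})
```

The result has the same shape of structure as the parent (a cube plus its axes), so it can be rotated or aggregated with the same operations.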

    MDD Features - Roll-Up & Drill-Down

    Users want different views of the same data
        E.g., sales volume by model vs. sales volume by dealership
    Many times views are similar
        Sales volume by dealership vs. volume by district
    Natural relationship between Sales Volumes at the DEALERSHIP level and Sales Volumes at the DISTRICT level
        Sales Volumes for all the dealerships in a district sum to the Sales Volumes for that district


    MDD Features - Roll-Ups & Drill-Downs

    ORGANIZATION DIMENSION

    [Figure: hierarchy with levels REGION (Midwest), DISTRICT (Chicago, St. Louis, Gary) and DEALERSHIP (Clyde, Gleason, Carr, Levi, Lucas, Bolton)]

    The figure presents a definition of a hierarchy within the organization dimension.
    Aggregations are perceived as being part of the same dimension.
    Moving up and moving down levels in a hierarchy is referred to as roll-up and drill-down.
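A minimal Python sketch of roll-up and drill-down over such a hierarchy. The slide text does not say which dealership belongs to which district, so the `DISTRICT_OF` mapping is an assumption, as are the volumes for Levi, Lucas and Bolton; the Clyde, Gleason and Carr totals are the column sums of the query example elsewhere in the deck.

```python
from collections import defaultdict

# Assumed hierarchy: dealership -> district -> region.
DISTRICT_OF = {
    "CLYDE": "CHICAGO", "GLEASON": "CHICAGO",
    "CARR": "ST. LOUIS", "LEVI": "ST. LOUIS",
    "LUCAS": "GARY", "BOLTON": "GARY",
}
REGION_OF = {"CHICAGO": "MIDWEST", "ST. LOUIS": "MIDWEST", "GARY": "MIDWEST"}

# Sales volumes at the lowest level (LEVI, LUCAS, BOLTON are made up).
dealership_sales = {"CLYDE": 14, "GLEASON": 19, "CARR": 26,
                    "LEVI": 10, "LUCAS": 12, "BOLTON": 8}

def roll_up(sales, parent_of):
    """Sum each member's volume into its parent, one level up the hierarchy."""
    totals = defaultdict(int)
    for member, volume in sales.items():
        totals[parent_of[member]] += volume
    return dict(totals)

def drill_down(sales, parent_of, parent):
    """Return the child-level figures that make up one parent's total."""
    return {m: v for m, v in sales.items() if parent_of[m] == parent}

district_sales = roll_up(dealership_sales, DISTRICT_OF)
region_sales = roll_up(district_sales, REGION_OF)
chicago_detail = drill_down(dealership_sales, DISTRICT_OF, "CHICAGO")
```

Because the levels belong to one dimension, the same `roll_up` function works for dealership-to-district and district-to-region.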

    MDD Features - Roll-Ups & Drill-Downs

    MDD Features: Drill-Down Through a Dimension

    [Figure: Sales Volumes cube with MODEL and COLOR axes and an organization axis showing the REGION (Midwest), DISTRICT (Chicago, St. Louis, Gary) and DEALERSHIP (Clyde, Gleason, Carr, Levi, Lucas, Bolton) levels]

    Queries

    The high degree of structure in an MDB makes the query language very simple and efficient
    Query language is intuitive
    Output is immediately useful to the end user

    Queries: Example

    Display sales volume by model for each dealership
        PRINT TOTAL.(SALES_VOLUME KEEP MODEL DEALERSHIP)

                     DEALERSHIP
    MODEL         CLYDE  GLEASON  CARR
    MINI VAN          7        5     6
    SPORTS COUPE      4        6     8
    SEDAN             3        8    12

    Trends emerge and comparisons are easily made

    Queries: Example

    Corresponding SQL

    SELECT MODEL, DEALERSHIP, SUM(SALES_VOLUME)
    FROM SALES_VOLUME
    GROUP BY MODEL, DEALERSHIP
    ORDER BY MODEL, DEALERSHIP

    MODEL        | DEALERSHIP | SUM(SALES_VOLUME)
    MINI VAN     | CLYDE      | 7
    MINI VAN     | GLEASON    | 5
    MINI VAN     | CARR       | 6
    SPORTS COUPE | CLYDE      | 4
    SPORTS COUPE | GLEASON    | 6
    SPORTS COUPE | CARR       | 8
    SEDAN        | CLYDE      | 3
    SEDAN        | GLEASON    | 8
    SEDAN        | CARR       | 12

    Queries: Example

    Use a report writer in addition to SQL and we get

    MINI VAN
        CLYDE 7
        GLEASON 5
        CARR 6
    SPORTS COUPE
        CLYDE 4
        GLEASON 6
        CARR 8
    SEDAN
        CLYDE 3
        GLEASON 8
        CARR 12
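The gap between the flat SQL result and the cross-tab view that the MDB query prints directly can be bridged with a small pivot step. The Python sketch below is one way to do it; `crosstab` and `render` are hypothetical helper names, and the rows are the SQL result from the slides.

```python
# Flat (MODEL, DEALERSHIP, SUM) rows, as returned by the SQL query.
rows = [
    ("MINI VAN", "CLYDE", 7), ("MINI VAN", "GLEASON", 5), ("MINI VAN", "CARR", 6),
    ("SPORTS COUPE", "CLYDE", 4), ("SPORTS COUPE", "GLEASON", 6), ("SPORTS COUPE", "CARR", 8),
    ("SEDAN", "CLYDE", 3), ("SEDAN", "GLEASON", 8), ("SEDAN", "CARR", 12),
]

def crosstab(rows):
    """Pivot flat rows into {model: {dealership: volume}}, keeping first-seen order."""
    table = {}
    for model, dealership, volume in rows:
        table.setdefault(model, {})[dealership] = volume
    return table

def render(table):
    """Format the pivoted table as fixed-width text, one line per model."""
    dealerships = list(next(iter(table.values())))
    lines = ["MODEL".ljust(14) + "".join(d.rjust(9) for d in dealerships)]
    for model, cells in table.items():
        lines.append(model.ljust(14) + "".join(str(cells[d]).rjust(9) for d in dealerships))
    return "\n".join(lines)

table = crosstab(rows)
print(render(table))
```

This extra step is the kind of post-processing the slides attribute to a report writer or SQL generator in the relational setting.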

    MDD Features: Multidimensional Computations

    Well equipped to handle demanding mathematical functions.
    Can treat arrays like cells in spreadsheets.
        For example, in a budget analysis situation, one can divide the ACTUAL array by the BUDGET array to compute the VARIANCE array.
    Applications based on multidimensional database technology typically have one dimension defined as a "business measurements" dimension.
    Integrates computational tools very tightly with the database structure.
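The array-level computation can be sketched in Python as a cell-by-cell division. The figures and the 2 x 2 shape are hypothetical; only the ACTUAL / BUDGET = VARIANCE relationship comes from the slide.

```python
# Hypothetical 2 x 2 arrays: rows are products, columns are quarters.
actual = [[150.0, 50.0], [200.0, 150.0]]
budget = [[100.0, 100.0], [160.0, 150.0]]

def divide(a, b):
    """Cell-by-cell division of two arrays of identical shape."""
    return [[x / y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

# One expression over whole arrays, as with cells in a spreadsheet.
variance = divide(actual, budget)
```

A value above 1.0 means the actual figure exceeded the budget for that cell; a value below 1.0 means it fell short.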


    The Time Dimension

    TIME as a predefined hierarchy for rolling up and drilling down across days, weeks, months, years and special periods, such as fiscal years.
        Eliminates the effort required to build sophisticated hierarchies every time a database is set up.
        Extra performance advantages
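A predefined time hierarchy can be sketched in Python as a set of level functions that map a day to its month, year or fiscal year. The daily figures are made up, and the April fiscal-year start is an assumption chosen only for illustration.

```python
from collections import defaultdict
from datetime import date

# Made-up daily sales volumes.
daily_sales = {
    date(2010, 1, 4): 3, date(2010, 1, 18): 2, date(2010, 2, 1): 4,
    date(2010, 4, 5): 5, date(2011, 1, 10): 1,
}

def month_of(day):
    return (day.year, day.month)

def year_of(day):
    return day.year

def fiscal_year_of(day, start_month=4):
    """Fiscal year labelled by the calendar year in which it starts (assumed April)."""
    return day.year if day.month >= start_month else day.year - 1

def roll_up(sales, level):
    """Aggregate daily volumes to the level given by the `level` function."""
    totals = defaultdict(int)
    for day, volume in sales.items():
        totals[level(day)] += volume
    return dict(totals)

by_month = roll_up(daily_sales, month_of)
by_year = roll_up(daily_sales, year_of)
by_fiscal_year = roll_up(daily_sales, fiscal_year_of)
```

Shipping such level functions with the database is what spares each application from rebuilding the calendar hierarchy.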

    Contrasting Relational Model and MD Model

    Criteria | Relational Model | Multidimensional Databases
    Focus | Data integrity of each piece of data | Facilitate exploration of interrelationships between dimensions
    Organization structure | One-dimensional array | Multi-dimensional arrays
    Perspectives | Embedded in fields | Embedded directly in MDDB structure
    Computational power for query processing | Joining tables often required; computationally expensive | Structure designed for OLAP; computationally cheap
    Cognitive issues in querying data | Cumbersome | Intuitive
    Query languages | SQL or SQL front-ends, such as QBE | Point-and-click emphasis; no standardized language
    Management of time dimension | Not well suited | Well suited

    RDBMS vs. MDDB

    Do I still use an RDBMS for my DW?
    MDDBs store data in a hypercube, i.e., a multidimensional array
    RDBMSs store data as tables with rows and columns that do not map directly to the multidimensional view that users have of data
    EDW: RDBMS
    Data Marts: MDDB

    RDBMS vs. MDDB: Trade-Offs

    SIZE
        MDDBs are limited by size
        Mid 1990s: 10 GB caused problems
        Today: 100 GB is OK
        Large DWs are still better served by relational front-ends running against high-performance and scalable RDBMSs
    VOLATILITY
        Highly volatile data are better handled by RDBMSs
        MDDBs take long to load and update

    RDBMS vs. MDDB: Trade-Offs

    AGGREGATE STRATEGY
        MDDBs support aggregates better
        RDBMSs are catching up with the help of Aggregate Navigators
    INVESTMENT PROTECTION
        Most organizations have already made significant investments in relational technology and skill sets
        Continued use for another purpose (DW) provides additional ROI and lowers the technical risk of failure
        MDDBs require acquiring new software and training staff to use it


    INTEGRATED ARCHITECTURE

    DB vendors have integrated their multidimensional and relational database products
    Multidimensional front-end tools
    If queries require data that are not available in the MDDB, the tools retrieve the data from the larger RDB
        Known as DRILL-THROUGH
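Drill-through can be sketched in Python with an in-memory SQLite database standing in for the larger RDB and a dictionary standing in for the MDDB's summary cells. The table layout, the `vin` and `price` columns and the `query` function are all hypothetical; this is not the interface of any vendor's product.

```python
import sqlite3

# Stand-in for the larger RDB: one row per individual sale (made-up data).
rdb = sqlite3.connect(":memory:")
rdb.execute("CREATE TABLE sales (model TEXT, dealership TEXT, vin TEXT, price INTEGER)")
rdb.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("MINI VAN", "CLYDE", "V1", 18000),
    ("MINI VAN", "CARR", "V2", 18500),
    ("SEDAN", "CARR", "V3", 21000),
    ("SEDAN", "CARR", "V4", 22000),
])

# Stand-in for the MDDB: only summary cells (sales counts) are held here.
cube = {
    (model, dealership): count
    for model, dealership, count in rdb.execute(
        "SELECT model, dealership, COUNT(*) FROM sales GROUP BY model, dealership")
}

def query(model, dealership, detail=False):
    """Answer from the cube when possible; drill through to the RDB for detail."""
    if not detail:
        return cube.get((model, dealership), 0)
    cursor = rdb.execute(
        "SELECT vin, price FROM sales WHERE model = ? AND dealership = ? ORDER BY vin",
        (model, dealership))
    return cursor.fetchall()

summary = query("SEDAN", "CARR")
detail_rows = query("SEDAN", "CARR", detail=True)
```

The front-end sees one `query` entry point; whether the answer comes from the cube or from the relational store is decided behind it.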

    Q & A
