BI Dimensional Modeling


  • 5/28/2018 BI Dimensional Modeling


    Dimensional Modeling

    Chapter 2

  • The Dimensional Data Model

    An alternative to the normalized data model

    Present information as simply as possible (easier to understand)

    Return queries as quickly as possible (efficient for queries)

    Track the underlying business processes (process focused)

  • The Dimensional Data Model

    Contains the same information as the normalized model

    Has far fewer tables, grouped into coherent business categories

    Pre-joins hierarchies and lookup tables, resulting in fewer join paths and fewer intermediate tables

    Normalized fact table with denormalized dimension tables
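    As a minimal sketch of what "pre-joining" a hierarchy means, the Python below flattens three normalized lookup tables (product, category, department) into one denormalized dimension row per product. The table names, column names, and values are hypothetical and not part of the GB Video model.

```python
# Hypothetical normalized lookup tables: product -> category -> department.
departments = {1: "Entertainment"}
categories = {10: {"name": "Movies", "dept_id": 1},
              11: {"name": "Games", "dept_id": 1}}
products = {100: {"name": "Casablanca", "cat_id": 10},
            101: {"name": "Chess Master", "cat_id": 11}}


def flatten_product_dimension(products, categories, departments):
    """Pre-join the hierarchy so each dimension row carries every level."""
    rows = []
    for prod_id, prod in products.items():
        cat = categories[prod["cat_id"]]
        rows.append({
            "product_id": prod_id,
            "product_name": prod["name"],
            "category_name": cat["name"],
            "department_name": departments[cat["dept_id"]],
        })
    return rows


product_dim = flatten_product_dimension(products, categories, departments)
for row in product_dim:
    print(row)
```

    A query that groups facts by department now needs only the fact-to-product join instead of a chain of three joins, at the cost of repeating category and department names on every product row.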

  • GB Video E-R Diagram

    Customer: #Cust No, F Name, L Name, Ads1, Ads2, City, State, Zip, Tel No, CC No, Expire

    Rental: #Rental No, Date, Clerk No, Pay Type, CC No, Expire, CC Approval

    Line: #Line No, Due Date, Return Date, OD charge, Pay type

    Video: #Video No, One-day fee, Extra days, Weekend

    Title: #Title No, Name, Vendor No, Cost

    (# marks each entity's identifier. Relationship labels shown in the diagram: Requestor of, Owner of, Name for, Holder of.)

  • GB Video Data Mart

    Customer: CustID, Cust No, F Name, L Name

    Rental: RentalID, Rental No, Clerk No, Store, Pay Type

    Line (fact table): LineID, OD Charge, OneDayCharge, ExtraDaysCharge, WeekendCharge, DaysReserved, DaysOverdue, CustID, AddressID, RentalID, VideoID, TitleID, RentalDateID, DueDateID, ReturnDateID

    Video: VideoID, Video No

    Title: TitleID, TitleNo, Name, Cost, Vendor Name

    Rental Date: RentalDateID, SQLDate, Day, Week, Quarter, Holiday

    Due Date: DueDateID, SQLDate, Day, Week, Quarter, Holiday

    Return Date: ReturnDateID, SQLDate, Day, Week, Quarter, Holiday

    Address: AddressID, Address1, Address2, City, State, Zip, AreaCode, Phone
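    To make the star shape concrete, here is a minimal sketch using Python's built-in sqlite3 module with a cut-down version of the data mart: the Line fact table plus the Title and Return Date dimensions, each with only a few of the columns listed above. Table names drop the spaces (ReturnDate), and the sample rows and query are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables (a subset of the columns on the slide).
cur.execute("CREATE TABLE Title (TitleID INTEGER PRIMARY KEY, Name TEXT, Cost REAL)")
cur.execute("CREATE TABLE ReturnDate "
            "(ReturnDateID INTEGER PRIMARY KEY, SQLDate TEXT, Quarter INTEGER)")

# Fact table: one row per rental line; foreign keys point at the dimensions.
cur.execute("""
    CREATE TABLE Line (
        LineID INTEGER PRIMARY KEY,
        TitleID INTEGER REFERENCES Title(TitleID),
        ReturnDateID INTEGER REFERENCES ReturnDate(ReturnDateID),
        OneDayCharge REAL,
        DaysOverdue INTEGER
    )
""")

cur.executemany("INSERT INTO Title VALUES (?, ?, ?)",
                [(1, "Casablanca", 19.95), (2, "Vertigo", 24.95)])
cur.executemany("INSERT INTO ReturnDate VALUES (?, ?, ?)",
                [(20060110, "2006-01-10", 1), (20060415, "2006-04-15", 2)])
cur.executemany("INSERT INTO Line VALUES (?, ?, ?, ?, ?)",
                [(1, 1, 20060110, 3.00, 0),
                 (2, 2, 20060110, 3.00, 2),
                 (3, 1, 20060415, 3.00, 1)])

# A typical star query: join the fact to each dimension once, then group.
query = """
    SELECT t.Name, r.Quarter, SUM(l.DaysOverdue) AS TotalDaysOverdue
    FROM Line AS l
    JOIN Title AS t ON l.TitleID = t.TitleID
    JOIN ReturnDate AS r ON l.ReturnDateID = r.ReturnDateID
    GROUP BY t.Name, r.Quarter
    ORDER BY t.Name, r.Quarter
"""
results = cur.execute(query).fetchall()
for row in results:
    print(row)
```

    Every join path is exactly one hop from the fact table, which is what keeps star queries short and predictable.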

  • Fact Table

    Measurements associated with a specific business process

    Grain: the level of detail of the table (what one row represents)

    Process events produce fact records

    Facts (attributes) are usually numeric and additive

    Derived facts are included

    Foreign (surrogate) keys refer to dimension tables (entities)

    Classification values help define subsets
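    A small sketch of additive and derived facts, assuming a grain of one rental line and charge columns modeled loosely on the GB Video Line table; the rows and values are made up. TotalCharge is a derived fact computed from the stored facts, and because all of these facts are additive they can be summed along any dimension, here customer.

```python
from collections import defaultdict

# Hypothetical fact rows; grain = one rental line.
fact_rows = [
    {"CustID": 1, "OneDayCharge": 3.00, "ExtraDaysCharge": 1.50, "ODCharge": 0.00},
    {"CustID": 1, "OneDayCharge": 3.00, "ExtraDaysCharge": 0.00, "ODCharge": 2.00},
    {"CustID": 2, "OneDayCharge": 2.50, "ExtraDaysCharge": 0.00, "ODCharge": 0.00},
]

# Derived fact: stored with the base facts so queries need not recompute it.
for row in fact_rows:
    row["TotalCharge"] = row["OneDayCharge"] + row["ExtraDaysCharge"] + row["ODCharge"]

# Additive facts can be summed across any dimension (here, customer).
total_by_customer = defaultdict(float)
for row in fact_rows:
    total_by_customer[row["CustID"]] += row["TotalCharge"]

print(dict(total_by_customer))  # {1: 9.5, 2: 2.5}
```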

  • Dimension Tables

    Entities describing the objects of the process

    Conformed dimensions are shared across processes

    Attributes are descriptive: text, numeric, and surrogate keys

    Less volatile than facts (1:m with the fact table)

    Null entries

    Date dimensions

    Produced by "by" questions (e.g., rentals by title or by month)

  • Bus Architecture

    An architecture that permits aggregating data across multiple marts

    Conformed dimensions and attributes

    Drill down vs. drill across

    Bus matrix
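    Drilling across can be sketched as querying each fact table separately, grouping both by the same conformed attribute, and then merging the answers on that attribute. The two processes (Rentals and Purchases), their rows, and the bus-matrix dictionary below are hypothetical examples.

```python
from collections import defaultdict

# Hypothetical bus matrix: which conformed dimensions each process uses.
bus_matrix = {
    "Rentals":   {"Date", "Customer", "Title", "Store"},
    "Purchases": {"Date", "Title", "Vendor"},
}

# Two fact tables sharing the conformed Title dimension (same keys, same meaning).
rental_facts = [{"TitleID": 1, "Revenue": 3.0}, {"TitleID": 1, "Revenue": 3.0},
                {"TitleID": 2, "Revenue": 2.5}]
purchase_facts = [{"TitleID": 1, "Cost": 19.0}, {"TitleID": 2, "Cost": 24.0}]


def total_by(rows, key, measure):
    """Aggregate one fact table to the level of a conformed attribute."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[measure]
    return totals


# Drill across: aggregate each process separately, then align on the shared key.
revenue = total_by(rental_facts, "TitleID", "Revenue")
cost = total_by(purchase_facts, "TitleID", "Cost")
shared = bus_matrix["Rentals"] & bus_matrix["Purchases"]  # conformed dimensions
report = {tid: (revenue[tid], cost[tid]) for tid in sorted(set(revenue) | set(cost))}
print(shared)
print(report)  # {1: (6.0, 19.0), 2: (2.5, 24.0)}
```

    Aggregating each fact table first matters: joining the two fact tables directly on TitleID would repeat each purchase row once per rental row and inflate the totals.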

  • Keys and Surrogate Keys

    A surrogate key is a unique identifier for data warehouse records that replaces source primary keys (business/natural keys)

    Protect against changes in source systems

    Allow integration from multiple sources

    Enable rows that do not exist in source data

    Track changes over time (e.g., new customer instances when addresses change)

    Replace text keys with integers for efficiency
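    A minimal sketch of surrogate key assignment, assuming natural keys arrive as (source system, business key) pairs. The class, the source names, the text keys, and the reserved key 0 for an "Unknown" row are illustrative assumptions.

```python
class SurrogateKeyMap:
    """Assigns warehouse integer keys to (source, natural key) pairs."""

    UNKNOWN_KEY = 0  # hypothetical reserved row for missing source keys

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def key_for(self, source, natural_key):
        if natural_key is None:
            return self.UNKNOWN_KEY  # a row that exists in no source system
        pair = (source, natural_key)
        if pair not in self._keys:
            self._keys[pair] = self._next_key
            self._next_key += 1
        return self._keys[pair]


keys = SurrogateKeyMap()
a = keys.key_for("store_pos", "C-00017")  # text key from one source
b = keys.key_for("web_shop", "C-00017")   # same text, other source: no collision
c = keys.key_for("store_pos", "C-00017")  # repeat lookup returns the same key
print(a, b, c, keys.key_for("web_shop", None))  # 1 2 1 0
```

    Deciding that two source records describe the same real customer, and mapping both to one key, would be a separate matching step; this sketch only guarantees that keys from different sources never collide.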

  • Slowly Changing Dimensions

    Attributes in a dimension that change more slowly than the fact grain

    Type 1: Current only (overwrite the old value)

    Type 2: All history (add a new row with a new surrogate key)

    Type 3: Most recent few (add columns for prior values; rare)

    Note: rapidly changing dimensions usually indicate the presence of a business process that should be tracked as a separate dimension or as a fact table

  • Customer dimension

    CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?
    1552     31421     Jane Rider  3         F       N

    Fact Table

    Date       CustKey          ProdKey  Item Count  Amount
    1/7/2004   1552             95       1           1,798.00
    3/2/2004   1552             37       1           27.95
    5/7/2005   1552             87       2           320.26
    2/21/2006  2387 (was 1552)  42       1           19.95

    Dimension with a slowly changing attribute

    CustKey  BKCustID  CustName    CommDist  Gender  HomOwn?  Eff       End
    1552     31421     Jane Rider  3         F       N        1/7/2004  1/1/2006
    2387     31421     Jane Rider  31        F       N        1/2/2006  12/31/9999
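    The Type 2 change in the tables above can be sketched in Python: expire the current row for business key 31421 and append a new version with a new surrogate key, then pick the right key for each fact by date. Following the slide, the old row ends one day before the new row takes effect, and 12/31/9999 marks the current row. The function names, and passing the new key (2387) in explicitly rather than generating it, are assumptions.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # end date marking the current row

# Customer dimension; CustKey is the surrogate key, BKCustID the business key.
customer_dim = [
    {"CustKey": 1552, "BKCustID": 31421, "CustName": "Jane Rider",
     "CommDist": 3, "Eff": date(2004, 1, 7), "End": OPEN_END},
]


def apply_type2_change(dim, bk, changes, effective, new_key):
    """Expire the current row for business key `bk` and add a new version."""
    current = next(r for r in dim if r["BKCustID"] == bk and r["End"] == OPEN_END)
    current["End"] = effective - timedelta(days=1)  # old version ends the day before
    new_row = dict(current, **changes, CustKey=new_key, Eff=effective, End=OPEN_END)
    dim.append(new_row)
    return new_row


def key_as_of(dim, bk, when):
    """Surrogate key to store in a fact row dated `when`."""
    for r in dim:
        if r["BKCustID"] == bk and r["Eff"] <= when <= r["End"]:
            return r["CustKey"]
    raise KeyError((bk, when))


apply_type2_change(customer_dim, 31421, {"CommDist": 31}, date(2006, 1, 2), new_key=2387)
print(key_as_of(customer_dim, 31421, date(2005, 5, 7)))   # 1552
print(key_as_of(customer_dim, 31421, date(2006, 2, 21)))  # 2387
```

    Old facts keep pointing at 1552, so reports grouped by CommDist still show the district Jane Rider was in at the time of each purchase.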

  • Date Dimensions

    One row for every day for which you expect to have data in the fact table (perhaps generated in a spreadsheet and imported)

    Usually use a meaningful integer surrogate key (such as yyyymmdd: 20060926 for Sep. 26, 2006). Note: this format sorts in date order.

    Include rows for missing or future dates so that facts added later can reference them.
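    Instead of a spreadsheet, the rows can also be generated in code. This sketch builds one row per day with a yyyymmdd key and the Day, Week, Quarter, and Holiday columns from the GB Video date dimensions. The holiday list, the use of ISO week numbers, and the key 0 for a "missing date" row are assumptions.

```python
from datetime import date, timedelta

# Hypothetical holiday list; a real one would come from the business calendar.
HOLIDAYS = {date(2006, 1, 1), date(2006, 7, 4), date(2006, 12, 25)}
MISSING_DATE_KEY = 0  # assumed key for a "date not known" row


def date_key(d):
    """Integer surrogate key in yyyymmdd form, e.g. 20060926."""
    return d.year * 10000 + d.month * 100 + d.day


def build_date_dimension(start, end):
    rows = [{"DateKey": MISSING_DATE_KEY, "SQLDate": None, "Day": None,
             "Week": None, "Quarter": None, "Holiday": False}]
    d = start
    while d <= end:
        rows.append({
            "DateKey": date_key(d),
            "SQLDate": d.isoformat(),
            "Day": d.strftime("%A"),           # weekday name (locale-dependent)
            "Week": d.isocalendar()[1],        # ISO week number
            "Quarter": (d.month - 1) // 3 + 1,
            "Holiday": d in HOLIDAYS,
        })
        d += timedelta(days=1)
    return rows


dim = build_date_dimension(date(2006, 1, 1), date(2006, 12, 31))
print(len(dim))                     # 366: 365 days plus the missing-date row
print(date_key(date(2006, 9, 26)))  # 20060926
```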

  • Degenerate Dimensions

    Dimensions without attributes (such as a transaction number or order number)

    Put the attribute value into the fact table even though it is not an additive fact.
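    A short sketch of a degenerate dimension: the order number lives directly on each (hypothetical) order-line fact row, has no dimension table of its own, and is never summed, yet it still lets you group lines back into orders. The order numbers and rows are invented.

```python
from collections import defaultdict

# Hypothetical order-line fact rows; OrderNo is a degenerate dimension.
order_lines = [
    {"OrderNo": "A-1001", "ProductKey": 95, "Qty": 1, "Amount": 1798.00},
    {"OrderNo": "A-1001", "ProductKey": 37, "Qty": 1, "Amount": 27.95},
    {"OrderNo": "A-1002", "ProductKey": 87, "Qty": 2, "Amount": 320.26},
]

# Not additive, but useful for grouping: count lines per order.
lines_per_order = defaultdict(int)
for line in order_lines:
    lines_per_order[line["OrderNo"]] += 1

print(dict(lines_per_order))  # {'A-1001': 2, 'A-1002': 1}
```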

  • Snowflaking (Outrigger Dimensions or Reference Dimensions)

    Connects entities to dimension tables rather than to the fact table

    Complicates coding and requires additional processing for retrievals

    Makes type 2 slowly changing dimensions harder to maintain

    Useful for seldom-used lookups

  • M:N Multivalued Dimensions

    Fact to Dimension

    Dimension to Dimension

    Try to avoid these. Solutions can be very misleading (for example, totals that count a fact once for every related dimension row).

  • Multivalued Dimensions

    ORDERS (FACT): SalesRepKey, ProductKey, SalesRepGrpKey, CustomerKey, OrderQty

    SALESREP: SalesRepKey, Name, Address

    SALESREP-ORDER-BRIDGE: SalesRepKey, SalesRepGroupKey, Weight = 1/NumReps
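    The bridge's Weight = 1/NumReps column can be sketched as an allocation step: each order's quantity is multiplied by the weight of every rep in its group, so per-rep totals add back up to the true order total. Ignoring the weight double counts, which is the kind of misleading result an M:N relationship invites. The groups, reps, and quantities are hypothetical; Fraction keeps the arithmetic exact.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical orders, each credited to a group of sales reps.
orders = [{"SalesRepGrpKey": 7, "OrderQty": 30},
          {"SalesRepGrpKey": 8, "OrderQty": 10}]

# Bridge rows: one per (group, rep), Weight = 1 / NumReps in the group.
group_members = {7: [101, 102], 8: [101]}
bridge = [{"SalesRepGroupKey": g, "SalesRepKey": rep, "Weight": Fraction(1, len(reps))}
          for g, reps in group_members.items() for rep in reps]


def qty_by_rep(orders, bridge, weighted=True):
    """Join orders to reps through the bridge and total OrderQty per rep."""
    totals = defaultdict(Fraction)
    for order in orders:
        for b in bridge:
            if b["SalesRepGroupKey"] == order["SalesRepGrpKey"]:
                w = b["Weight"] if weighted else 1
                totals[b["SalesRepKey"]] += order["OrderQty"] * w
    return dict(totals)


weighted = qty_by_rep(orders, bridge)
unweighted = qty_by_rep(orders, bridge, weighted=False)
print(sum(weighted.values()), sum(unweighted.values()))  # 40 70
```

    The weighted total (40) matches the orders; the unweighted total (70) counts the two-rep order twice.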

  • Hierarchies

    Group data within dimensions, e.g., SalesRep / Region and State / County / Neighborhood

    Problem structures: variable depth; frequently changing
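    One way to handle a variable-depth hierarchy, sketched here under stated assumptions, is to store it as parent-child links and expand it into a bridge of (ancestor, descendant) pairs, so facts can be rolled up to any level with one join regardless of depth. The organization, node names, and sales figures are hypothetical.

```python
from collections import defaultdict

# Hypothetical ragged (variable-depth) sales organization: child -> parent.
parent = {
    "West": None,
    "CA": "West", "NV": "West",
    "Bay Area": "CA",
    "Rep Ann": "Bay Area",   # depth 4
    "Rep Bob": "NV",         # depth 3: the hierarchy is unbalanced
}
sales = {"Rep Ann": 120, "Rep Bob": 80}  # facts attached at the leaves


def ancestors_and_self(node):
    """Walk up the parent links; works for any depth."""
    while node is not None:
        yield node
        node = parent[node]


# Bridge table: one (ancestor, descendant) pair for every path, including self.
bridge = [(anc, node) for node in parent for anc in ancestors_and_self(node)]

# Roll facts up to every level by joining them through the bridge.
rollup = defaultdict(int)
for anc, desc in bridge:
    rollup[anc] += sales.get(desc, 0)

print(rollup["West"], rollup["CA"], rollup["NV"])  # 200 120 80
```

    When the hierarchy changes frequently, only the parent links and the derived bridge rows need rebuilding; the fact rows are untouched.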

  • Heterogeneous Products

    Several different kinds of entry, with different attributes for each (the sub-class problem)

  • Aggregate Dimensions

    Dimensions that represent data at different levels of granularity

    Remove a dimension

    Roll up the hierarchy (provide a new shrunken dimension, with a new surrogate key, that represents the rolled-up data)
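    Rolling up the date hierarchy can be sketched like this: derive a shrunken Month dimension with its own new surrogate key (MonthKey), then summarize daily facts into a month-grain table that references it. The fact rows and the MonthKey numbering scheme are assumptions.

```python
from collections import defaultdict

# Hypothetical daily fact rows keyed by a yyyymmdd date key.
daily_facts = [(20060105, 3.0), (20060120, 2.5), (20060203, 4.0)]


def month_of(date_key):
    """Extract (year, month) from a yyyymmdd key."""
    return date_key // 10000, (date_key // 100) % 100


# Shrunken Month dimension: one row per month, each with a new surrogate key.
months = sorted({month_of(k) for k, _ in daily_facts})
month_dim = {ym: {"MonthKey": i, "Year": ym[0], "Month": ym[1]}
             for i, ym in enumerate(months, start=1)}

# Rolled-up fact table at month grain, referencing the new MonthKey.
monthly = defaultdict(float)
for k, amount in daily_facts:
    monthly[month_dim[month_of(k)]["MonthKey"]] += amount

print(dict(monthly))  # {1: 5.5, 2: 4.0}
```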

  • Junk Dimensions

    Miscellaneous attributes that don't belong to another entity, usually representing processing levels: flags, categories, types
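    A junk dimension can be sketched as the cross product of a few low-cardinality flags, with one surrogate key per combination; each fact row then stores a single JunkKey instead of several flag columns. The flag names and values are hypothetical.

```python
from itertools import product

# Hypothetical low-cardinality attributes that belong to no other dimension.
pay_types = ["Cash", "Credit"]
rush_flags = [False, True]
channels = ["Store", "Web"]

# Junk dimension: one row per combination, each with a surrogate key.
junk_dim = {combo: key for key, combo in
            enumerate(product(pay_types, rush_flags, channels), start=1)}

# A fact row stores one JunkKey instead of three separate columns.
fact_row = {"Amount": 19.95, "JunkKey": junk_dim[("Credit", False, "Web")]}

print(len(junk_dim), fact_row["JunkKey"])  # 8 6
```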

  • Fact Tables

    Transaction: track processes at discrete points in time, when they occur

    Periodic snapshot: cumulative performance over specific time intervals

    Accumulating snapshot: constantly updated over time; may include multiple dates representing stages
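    A minimal sketch of an accumulating snapshot, modeled loosely on the GB Video rental line with its rental, due, and return dates as stages. The class and field names are hypothetical. Unlike a transaction fact, the same row is revisited and updated when a later stage completes.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RentalSnapshot:
    """One row per rental line, updated as the rental moves through stages."""
    line_id: int
    rental_date: date
    due_date: date
    return_date: Optional[date] = None   # unknown until the video comes back
    days_overdue: Optional[int] = None

    def record_return(self, returned: date) -> None:
        # Update the existing fact row in place instead of inserting a new one.
        self.return_date = returned
        self.days_overdue = max(0, (returned - self.due_date).days)


row = RentalSnapshot(line_id=1, rental_date=date(2006, 3, 1), due_date=date(2006, 3, 3))
row.record_return(date(2006, 3, 5))
print(row.days_overdue)  # 2
```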

  • Aggregates

    Precalculated summary tables

    Improve performance

    Record data at a coarser granularity