Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist...

54
Concepts: Data Warehouse Eric Tremblay Oracle Specialist [email protected] www.data-warehouse.ca

Transcript of Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist...

Page 2: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

ObjectiveDescribes the main steps in

the design of a data warehouse. Presents

techniques for it's use and challenges in it's

development.

Page 3: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Definition

Page 4: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Definition« A warehouse is a

subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision

making process »--- Bill Inmon

Page 5: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

The Goals of a Data Warehouse

Page 6: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

The Goals of a Data WarehouseAccess to company information

Coherent company information

A consistent, total and unified sight the company data

The data published is stored for fast consultation

Page 7: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Quality of the information in the data warehouse

The presentation tools to display the information is part of the data warehouse

Support business intelligence applications

The Goals of a Data Warehouse

Page 8: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Business IntelligenceConcept

Decisions

Analysis

Integration

Collection Data

Information

Knowledge

Action

Bu

sin

es

s V

alu

e

Page 9: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Dimensional

Modeling

Technical

Architecture

Design

End-User

Application

Specification

Physical

Design

Data Staging

Design

Maintenance

& Growth

Project

Planning &

Management

Deployment

Planning

Business

Requirements

Business Requirements

Page 10: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

How to build a Data Warehouse

Identify the problems and the business processes

Identify the grain

Identify the dimensions

Identify the facts

Page 11: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Data WarehouseLifecycle diagram

Project Planning

Business Requirements

Definition

Technical Architecture

Design

Dimensional Modeling

Product Selection & Installation

Physical Design

Data Staging &

DevelopmentDeployment

Maintenance and Growth

Analytic Application

Specification

Analytic Applicaiton

Development

Project Management

Page 12: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Elements of aData Warehouse

DATA

SOURCES

DATA QUERY

TOOLSDATA WAREHOUSE

Query Tools

Reporting

Analysis

Data Mining

Page 13: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Elements of aData Warehouse

Data

Staging

Area

Extract

Operational

Source

System

Data

Access

Tools

Load AccessData

Warehouse

Page 14: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Elements of aData Warehouse

Flow of information in a Data Warehouse

Data

Staging

Area

Extract

Operational

Source

System

Data

Access

Tools

Load AccessData

Warehouse

Page 15: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Extraction

StagingSource

Page 16: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Transformation

TransformationStaging Staging

Page 17: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Loading

Staging

Page 18: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Dimension & Fact Table

Page 19: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Dimension Table

Page 20: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Dimension TableSimple primary keyTextual attributes rich and adapted to the userHierarchical reportsFew codes; Few codes; codes should be decoded according to their descriptionsRelatively small

Page 21: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

DimensionClient

Location

Adresse

Région

Ville

Province

Clef d'entrepôt

Grain de dimension

3-Chiffres Code Postal

6-Chiffres Code Postal

Page 22: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

CalendarDimension

Day

CalendarWeek

CalendarQuarter

CalendarMonth

CalendarYear

FiscalQuarter

FiscalMonth

FiscalYear

Warehouse Key

Dimension Grain

Page 23: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Multidimensional ModelsStar & Snowflake Schema

Page 24: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Composed Primary key

The date/time is almost always a key

The facts are usually numerical

The facts are in general additive

Fact Table

Page 25: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Fact TableStar Schema

promotion_keycalendar_key (FK)product_key (FK)store_key (FK)customer_key (FK)dollars solddollars_costunits_sold

product_key (PK)SKUdescriptionbrandcategory

calendar_key (PK)day_of_weekmonthquarteryear

store_key (PK)store_IDstore_nameaddress

customer_key (PK)customer_IDcustomer_nameaddress

Sales Fact Table

Store Dimension Customer Dimension

Product DimensionCalendar Dimension

promotion_key (PK)promotion_name

Promotion Dimension

Page 26: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Fact TableStar Schema

promotion_keycalendar_key (FK)product_key (FK)store_key (FK)customer_key (FK)dollars solddollars_costunits_sold

product_key (PK)SKUdescriptionbrandcategory

calendar_key (PK)day_of_weekmonthquarteryear

store_key (PK)store_IDstore_nameaddress

customer_key (PK)customer_IDcustomer_nameaddress

Sales Fact Table

Store Dimension Customer Dimension

Product DimensionCalendar Dimension

promotion_key (PK)promotion_name

Promotion Dimension

Page 27: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

time_key (FK)student_key (FK)course_key (FK)teacher_key (FK)facility_key (FK)attendance = 1

student_key (PK)student_IDnameaddressmajorminorfirst_enrolledgraduation_class

time_key (PK)SQL_dateday_of_weekweek_numbermonth

facility_key (PK)typelocationdepartmentseatingsize

teacher_key (PK)employee_IDnameaddressdepartmenttitledegree

Student AttendanceFact Table

Facility Dimension Teacher Dimension

Student DimensionTime Dimension

course_key (PK)namedepartmentlevelcourse numberlaboratory_flag

Course Dimension

Fact TableStar Schema

Page 28: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

time_key (FK)student_key (FK)course_key (FK)teacher_key (FK)facility_key (FK)attendance = 1

student_key (PK)student_IDnameaddressmajorminorfirst_enrolledgraduation_class

time_key (PK)SQL_dateday_of_weekweek_numbermonth

facility_key (PK)typelocationdepartmentseatingsize

teacher_key (PK)employee_IDnameaddressdepartmenttitledegree

Student AttendanceFact Table

Facility Dimension Teacher Dimension

Student DimensionTime Dimension

course_key (PK)namedepartmentlevelcourse numberlaboratory_flag

Course Dimension

Fact TableStar Schema

Which classes were the most heavily attended?

Page 29: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

time_key (FK)student_key (FK)course_key (FK)teacher_key (FK)facility_key (FK)attendance = 1

student_key (PK)student_IDnameaddressmajorminorfirst_enrolledgraduation_class

time_key (PK)SQL_dateday_of_weekweek_numbermonth

facility_key (PK)typelocationdepartmentseatingsize

teacher_key (PK)employee_IDnameaddressdepartmenttitledegree

Student AttendanceFact Table

Facility Dimension Teacher Dimension

Student DimensionTime Dimension

course_key (PK)namedepartmentlevelcourse numberlaboratory_flag

Course Dimension

Fact TableStar Schema

Which classes were the most heavily attended?

Which classes were the most lightly used?

Page 30: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

time_key (FK)student_key (FK)course_key (FK)teacher_key (FK)facility_key (FK)attendance = 1

student_key (PK)student_IDnameaddressmajorminorfirst_enrolledgraduation_class

time_key (PK)SQL_dateday_of_weekweek_numbermonth

facility_key (PK)typelocationdepartmentseatingsize

teacher_key (PK)employee_IDnameaddressdepartmenttitledegree

Student AttendanceFact Table

Facility Dimension Teacher Dimension

Student DimensionTime Dimension

course_key (PK)namedepartmentlevelcourse numberlaboratory_flag

Course Dimension

Fact TableStar Schema

Which classes were the most heavily attended?

Which classes were the most lightly used?

Which teachers taught the most students?

Page 31: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

time_key (FK)student_key (FK)course_key (FK)teacher_key (FK)facility_key (FK)attendance = 1

student_key (PK)student_IDnameaddressmajorminorfirst_enrolledgraduation_class

time_key (PK)SQL_dateday_of_weekweek_numbermonth

facility_key (PK)typelocationdepartmentseatingsize

teacher_key (PK)employee_IDnameaddressdepartmenttitledegree

Student AttendanceFact Table

Facility Dimension Teacher Dimension

Student DimensionTime Dimension

course_key (PK)namedepartmentlevelcourse numberlaboratory_flag

Course Dimension

Fact TableStar Schema

Which classes were the most heavily attended?

Which classes were the most lightly used?

Which teachers taught the most students?

Which teachers taught classes in facilities belonging to other departments?

Page 32: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Fact TableSnowflake Schema

calendar_key (FK)product_key (FK)store_key (FK)customer_key (FK)dollars_solddollars_costunits_sold

product_key (PK)SKUdescription brand_key (FK) category

calendar_key (PK)month_key (FK)year

store_key (PK)store_IDstore_nomaddress

customer_key (PK)customer_IDcustomer_nameaddress

Sales Fact Table

Store Dimension Customer Dimension

Product DimensionCalendar Dimension

month_key (PK)year

year_key (PK)

brand_key (PK)brand_description

Page 33: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Slowly Changing Dimensions

Page 34: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Slowly Changing Dimensions

Type 1: Overwrite changed attribute.A fact is associated with only the current value of a dimension column.

Type 2: Add new dimension record.A fact is associated with only the original value of a dimension column.

Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value and with the current value of a dimension column.

Page 35: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Slowly Changing Dimensions

Type 1: Overwrite changed attribute.A fact is associated with only the current value of a dimension column.

Type 2: Add new dimension record.A fact is associated with only the original value of a dimension column.

Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value and with the current value of a dimension column.

Page 36: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Slowly Changing Dimensions

Type 1: Overwrite changed attribute.A fact is associated with only the current value of a dimension column.

Type 2: Add new dimension record.A fact is associated with only the original value of a dimension column.

Type 3: Use field for ʻoldʼ value.A fact is associated with both the original value and with the current value of a dimension column.

Page 37: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Scenario: Customer last name changes from Pharand to Smith: Update Cust.Lname

Slowly Changing Dimensions

Page 38: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Slowly Changing Dimensions

Page 39: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Slowly Changing Dimensions

Page 40: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Normalisation

Page 41: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

3NF

Normalisation

Page 42: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

3NF

Normalisation

Page 43: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Normalisation

Page 44: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Normalisation

Simplicity

Query Performance

Minor disk space saving

Slows down the users’ ability to browse

Page 45: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Normalisation

Simplicity

Query Performance

Minor disk space saving

Slows down the users’ ability to browse

Page 46: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Normalisation

Simplicity

Query Performance

Minor disk space saving

Slows down the users’ ability to browse

Page 47: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Normalisation

Simplicity

Query Performance

Minor disk space saving

Slows down the users’ ability to browse

Page 48: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

The DifferenceERD Star Schema

(Star Schema)

Page 49: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

The DifferenceData Base Data Warehouse

Actual Historical

Internal Internal and External

Isolated Integrated

Transactions Analysis

Normalised Dimensional

Dirty Clean and Consistent

Detailed Detailed and Summary

Page 50: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Enterprise reporting

Advanced analysis

Ad Hoc Query and Analysis

Page 51: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Elements of aData Warehouse

Extract

Filter

Cleanse

Transform

Load

Schedule

Portal

Department

Data

ERP Data

Transaction

Databases

Report

Display

Graph

Pivot

Drill

Distribute

Data

Mining

OLAP

DATA

SOURCES

DATA

PREPARATION

DATA ANALYSIS

TOOLS

DATA ACCESS

TOOLS

Page 52: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.

Ralph Kimball

Data Warehouse Guru

www.rkimball.com

Page 53: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.
Page 54: Concepts: Data Warehouse · Concepts: Data Warehouse Eric Tremblay Oracle Specialist eric.tremblay@data-warehouse.ca . Objective Describes the main steps in the design of a data warehouse.