Data Warehouse Implementation

Post on 22-Dec-2015

1

MIS 4346/5346 DATA WAREHOUSING

Data Warehouse Implementation

2

Agenda

Review Development Approach
Review Dimensional Modeling
Implementing the Data Warehouse with SQL Server Enterprise Edition
Implementing Data Mart Physical Structures
    Creating the data mart database
    Creating dimension tables
    Creating fact tables
    Using scripts

3

DW Development Approach: Kimball Methodology

DW Project Lifecycle

Business requirements
    Business Requirements Documentation
    Bus Matrix
Design, build and deliver in increments
    DW Architecture
    DW Design
    ETL system
    Cube, Reports, query tools, …

4

Review: Dimensional Modeling

5

Dimensional Model: Revisited

6

Data Warehouse Project Lifecycle

Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

7

IT Architecture/Infrastructure Physical Design*: SQL Server Enterprise Edition

SQL Server Database Engine

* Specifically Product Selection & Installation

8

Data Warehouse Project Lifecycle

Source: Mundy, Thornthwaite, and Kimball (2006). The Microsoft Data Warehouse Toolkit, Wiley Publishing Inc., Indianapolis, IN.

9

DW/DM Implementation: Building the Data Mart Database

Typically one database per data mart

Example:

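-- Create the database from the master database
USE master;
GO

-- One database per data mart
CREATE DATABASE ClassPerformanceDW;
GO

-- SIMPLE recovery limits transaction log growth during large loads
ALTER DATABASE ClassPerformanceDW SET RECOVERY SIMPLE;
GO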
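To confirm that the database exists and that the recovery model took effect, a quick check against the sys.databases catalog view can help. This is a minimal sketch that assumes the ClassPerformanceDW database from the example above has already been created.

```sql
-- Check the recovery model of the data mart database.
SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'ClassPerformanceDW';
```

If the ALTER DATABASE statement succeeded, recovery_model_desc reads SIMPLE.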

10

Creating Dimension Tables

Naming is typically DimTableName

Consider data compression

Example:

CREATE TABLE DimStudent (
    student_sk       int IDENTITY(1,1),   -- surrogate key
    student_id       varchar(9),          -- student identifier from the source system
    firstname        varchar(30),
    lastname         varchar(30),
    major            varchar(7),
    classification   varchar(25),
    gpa              numeric(2, 1),
    clubname         varchar(25),
    undergradschool  varchar(25),
    gmat             int,
    undergradORgrad  varchar(10),
    CONSTRAINT dimstudent_pk PRIMARY KEY (student_sk)
);
GO

-- Index the source identifier to speed up lookups
CREATE INDEX student_id_idx ON DimStudent (student_id);
GO

-- Apply page-level data compression to the table
ALTER TABLE DimStudent REBUILD WITH (DATA_COMPRESSION = PAGE);
GO

GRANT SELECT ON DimStudent TO PUBLIC;
GO

See http://blog.sqlauthority.com/2010/03/01/sql-server-data-and-page-compressions-data-storage-and-io-improvement/

OR http://sqlmag.com/database-performance-tuning/practical-data-compression-sql-server
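Before or after enabling compression, it can be useful to check what it is likely to save and what is currently applied. The sketch below assumes DimStudent lives in the dbo schema of ClassPerformanceDW; it uses the system procedure sp_estimate_data_compression_savings and the sys.partitions catalog view.

```sql
USE ClassPerformanceDW;
GO

-- Estimate how much space PAGE compression would save for DimStudent.
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo',          -- assumption: default schema
    @object_name      = 'DimStudent',
    @index_id         = NULL,           -- all indexes
    @partition_number = NULL,           -- all partitions
    @data_compression = 'PAGE';
GO

-- Show the compression setting currently applied to each index of the table.
SELECT i.name AS index_name,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id  = p.index_id
WHERE p.object_id = OBJECT_ID('dbo.DimStudent');
GO
```

The second query lists the setting per index, which makes it easy to see whether nonclustered indexes such as student_id_idx also need an explicit rebuild.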

11

Creating Fact Tables

Naming is typically FactTableName

Example:

CREATE TABLE fact_enrollment (
    -- surrogate keys pointing to the dimension tables
    student_sk    int,
    class_sk      int,
    date_sk       int,
    professor_sk  int,
    location_sk   int,
    termyear_sk   int,
    -- measure
    coursegrade   numeric(2, 1),
    CONSTRAINT fact_enrollment_pk PRIMARY KEY (student_sk, class_sk, date_sk, professor_sk),
    CONSTRAINT fact_enrollment_student_fk FOREIGN KEY (student_sk) REFERENCES dimstudent (student_sk),
    CONSTRAINT fact_enrollment_class_fk FOREIGN KEY (class_sk) REFERENCES dimclass (class_sk),
    CONSTRAINT fact_enrollment_date_fk FOREIGN KEY (date_sk) REFERENCES dimtime (date_sk),
    CONSTRAINT fact_enrollment_professor_fk FOREIGN KEY (professor_sk) REFERENCES dimprofessor (professor_sk),
    CONSTRAINT fact_enrollment_location_fk FOREIGN KEY (location_sk) REFERENCES dimlocation (location_sk),
    CONSTRAINT fact_enrollment_termyear_fk FOREIGN KEY (termyear_sk) REFERENCES dimtermyear (termyear_sk)
);
GO

-- The table name must match the one used in CREATE TABLE
GRANT SELECT ON fact_enrollment TO PUBLIC;
GO
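Once the fact table and its dimensions are in place, reporting queries join the fact table to the dimensions on the surrogate keys. The query below is an illustrative sketch, not part of the original scripts; it assumes both fact_enrollment and DimStudent have been created and loaded.

```sql
-- Average course grade and number of enrollments per major.
SELECT s.major,
       AVG(f.coursegrade) AS avg_coursegrade,
       COUNT(*)           AS enrollments
FROM fact_enrollment AS f
JOIN DimStudent AS s
    ON s.student_sk = f.student_sk   -- join on the surrogate key
GROUP BY s.major
ORDER BY s.major;
```

The same pattern extends to the other dimensions: add a join per dimension and group by whichever attributes the report needs.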

12

Using Scripts

A script contains all the statements needed to create the data mart tables

Advantages:
    Can easily create test environments
    Can easily create production tables
    Fewer files to manage
    Code reuse
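To reuse one script for both test and production environments, it helps if the script can be rerun safely. One common way to do that is to drop each table if it already exists before recreating it. The sketch below assumes the tables live in the dbo schema and shows only two of the tables from the earlier examples.

```sql
USE ClassPerformanceDW;
GO

-- Drop the fact table first because its foreign keys reference the dimensions.
IF OBJECT_ID('dbo.fact_enrollment', 'U') IS NOT NULL
    DROP TABLE dbo.fact_enrollment;
GO

IF OBJECT_ID('dbo.DimStudent', 'U') IS NOT NULL
    DROP TABLE dbo.DimStudent;
GO

-- The CREATE TABLE statements follow here: dimensions first, then the fact table.
```

Creating the dimensions before the fact table matters because the fact table's foreign keys can only reference tables that already exist.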

13

Example Script “Design”

CREATE Script
    Contains CREATEs for all tables

TRANSFORM/LOAD Script (next topic)
    Calls individual transform/load scripts
        One for each table
    Cleanup
        Clear and shrink the log file

Example: http://business.baylor.edu/gina_green/teaching/sqlserver/scripts/generate_class_performance_dw_tables/generate_class_performance_dw_tables.zip
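A driver script along these lines could be sketched as follows. This is not the content of the linked archive: the file names are hypothetical placeholders, and the log file name assumes the default logical name that SQL Server assigns when CREATE DATABASE is run without file specifications. The :r command is a sqlcmd command, so the script has to be run with sqlcmd or in SSMS with SQLCMD mode enabled.

```sql
-- Create all tables (hypothetical file name).
:r create_tables.sql
GO

-- Call one transform/load script per table (hypothetical file names).
:r load_dimstudent.sql
GO
:r load_fact_enrollment.sql
GO

-- Cleanup: under SIMPLE recovery a checkpoint allows log space to be reused,
-- after which the log file can be shrunk.
USE ClassPerformanceDW;
GO
CHECKPOINT;
GO
-- Assumption: default logical log file name; verify it in sys.database_files.
DBCC SHRINKFILE (ClassPerformanceDW_log, 1);   -- target size in MB
GO
```

Keeping each table's load logic in its own file lets a single table be reloaded during testing without running the whole sequence.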

14

Summary

Physical Design: Infrastructure and DW

Creating and Naming:
    Database
    Dimension tables
    Fact tables

Considerations when creating the above objects

Using scripts