Populating Data Warehouse Structures
Examining the Star Schema
[Diagram: Sales Star Schema. A central fact table is surrounded by dimension tables.]
Implementing the Star Schema
1. Extract Data From Multiple Sources
2. Integrate, Transform, and Restructure Data
3. Load Data Into Dimension Tables and Fact Tables
The Star Schema Data Load
[Diagram: DTS extracts data from heterogeneous data sources (the Northwind OLTP database, external files, and internal files), transforms it in the staging area, and loads the star schemas (Sales Star, Inventory Star, Financial) of the Polaris data warehouse.]
Verifying the Dimension Source Data
Verifying Accuracy of Source Data
Integrating data from multiple sources
Applying business rules
Checking structural requirements
Managing Invalid Data
Rejecting invalid data
Saving invalid data to a log
Correcting Invalid Data
Transforming data
Reassigning data values
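The reject-and-log approach above can be sketched in a few lines of Python. This is only an illustration, not DTS code: the function name `validate_buyers`, the `ValidationResult` container, and the two rules (non-empty `buyer_name`, positive integer `reg_id`) are assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (row, reason) pairs serve as the log


def validate_buyers(rows):
    """Apply two hypothetical rules and log every rejected row with a reason."""
    result = ValidationResult()
    for row in rows:
        name = (row.get("buyer_name") or "").strip()
        reg_id = row.get("reg_id")
        if not name:
            result.rejected.append((row, "missing buyer_name"))
        elif not isinstance(reg_id, int) or reg_id <= 0:
            result.rejected.append((row, "reg_id must be a positive integer"))
        else:
            result.accepted.append({"buyer_name": name, "reg_id": reg_id})
    return result


source_rows = [
    {"buyer_name": "Barr, Adam", "reg_id": 2},
    {"buyer_name": "", "reg_id": 4},
    {"buyer_name": "Chai, Sean", "reg_id": "IV"},
]
result = validate_buyers(source_rows)
```

Rejected rows are kept with their reasons, so they can be corrected and reloaded later instead of being lost.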
Dimension Data Load Examples:
Source data:
buyer_name      reg_id
Barr, Adam      2
Chai, Sean      4
O’Melia, Erin   6
...             ...

After the DTS transformation (buyer_name split into two columns):
buyer_first   buyer_last   reg_id
Adam          Barr         2
Sean          Chai         4
Erin          O’Melia      6
...           ...          ...
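A minimal Python sketch of the name-splitting transformation, assuming every source value has the form "Last, First". The helper name `split_buyer_name` is hypothetical, and the slides perform this step in DTS rather than in Python.

```python
def split_buyer_name(buyer_name):
    """Split 'Last, First' into (first, last); assumes a single comma separator."""
    last, _, first = buyer_name.partition(",")
    return first.strip(), last.strip()


source_rows = [("Barr, Adam", 2), ("Chai, Sean", 4), ("O’Melia, Erin", 6)]

transformed = []
for buyer_name, reg_id in source_rows:
    first, last = split_buyer_name(buyer_name)
    transformed.append({"buyer_first": first, "buyer_last": last, "reg_id": reg_id})
```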
Source data (the first buyer_code is blank):
buyer_code   buyer_last   reg_id
             Barr         2
A123         Chai         4
B456         O’Melia      6
...          ...          ...

After the transformation (the blank buyer_code is reassigned to U999):
buyer_code   buyer_last   reg_id
U999         Barr         2
A123         Chai         4
B456         O’Melia      6
...          ...          ...
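Reassigning a missing code can be sketched as follows. The value U999 comes from the example data. Treating both `None` and an empty string as "missing" is an assumption, and `fill_buyer_code` is a hypothetical helper name.

```python
UNKNOWN_BUYER_CODE = "U999"  # placeholder code taken from the example data


def fill_buyer_code(row):
    """Return a copy of the row with a blank or missing buyer_code replaced."""
    code = (row.get("buyer_code") or "").strip()
    return {**row, "buyer_code": code or UNKNOWN_BUYER_CODE}


source_rows = [
    {"buyer_code": None, "buyer_last": "Barr", "reg_id": 2},
    {"buyer_code": "A123", "buyer_last": "Chai", "reg_id": 4},
    {"buyer_code": "B456", "buyer_last": "O’Melia", "reg_id": 6},
]
cleaned = [fill_buyer_code(row) for row in source_rows]
```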
Source 1 (reg_id stored as Roman numerals):
buyer_name    reg_id
Barr, Adam    II
Chai, Sean    IV

Source 2 (reg_id stored as numbers):
buyer_name    reg_id
Smith, Jane   2
Paper, Anne   4

Integrated data after the DTS transformations:
buyer_name    reg_id
Barr, Adam    2
Chai, Sean    4
Smith, Jane   2
Paper, Anne   4
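Integrating sources that encode the same value differently usually comes down to a small mapping applied during the load. The sketch below assumes only the Roman numerals that appear in the example (II and IV). The lookup table and the function `normalize_reg_id` are illustrative names.

```python
ROMAN_REG_IDS = {"II": 2, "IV": 4}  # only the values seen in the example data


def normalize_reg_id(value):
    """Convert a reg_id given as an int, a digit string, or a known Roman numeral."""
    if isinstance(value, int):
        return value
    text = str(value).strip()
    if text.isdigit():
        return int(text)
    return ROMAN_REG_IDS[text.upper()]  # raises KeyError for unknown encodings


source_one = [("Barr, Adam", "II"), ("Chai, Sean", "IV")]
source_two = [("Smith, Jane", 2), ("Paper, Anne", 4)]

integrated = [
    (buyer_name, normalize_reg_id(reg_id))
    for buyer_name, reg_id in source_one + source_two
]
```

Letting an unknown encoding raise an error, rather than guessing a value, keeps bad data out of the dimension until a rule for it is defined.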
Maintaining Integrity of the Dimension
Assigning a Surrogate Key to Each Record
Defines the dimension’s primary key
Relates to the foreign key fields of the fact table
Loading One Record Per Application Key
Maintains uniqueness in the dimension
Depends on how you manage changing dimension data
Maintains integrity of the fact table
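A small sketch of surrogate key assignment using Python's built-in sqlite3 module in place of SQL Server. The table layout, the column names, and the `load_customer` helper are assumptions. The UNIQUE constraint on the application key fits a Type 1 design; a Type 2 design would have to relax it, because one application key can then have several records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer_dim (
        customer_key  INTEGER PRIMARY KEY,   -- surrogate key, assigned by the database
        customer_id   TEXT NOT NULL UNIQUE,  -- application key from the source system
        customer_name TEXT NOT NULL
    )
    """
)


def load_customer(conn, customer_id, customer_name):
    """Insert the record once per application key and return its surrogate key."""
    conn.execute(
        "INSERT OR IGNORE INTO customer_dim (customer_id, customer_name) VALUES (?, ?)",
        (customer_id, customer_name),
    )
    row = conn.execute(
        "SELECT customer_key FROM customer_dim WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()
    return row[0]


key_alfi = load_customer(conn, "ALFI", "Alfreds")
key_vinet = load_customer(conn, "VINET", "Vinet")  # customer name is hypothetical
key_alfi_again = load_customer(conn, "ALFI", "Alfreds")
row_count = conn.execute("SELECT COUNT(*) FROM customer_dim").fetchone()[0]
```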
Managing Changing Dimension Data
Dimensions with Changing Column Values
Inserts of new data
Updates of existing data
Slowly-Changing Dimension Design Solutions
Type 1: Overwrite the dimension record
Type 2: Write another dimension record
Type 3: Add attributes to the dimension record
Type 1: Overwriting the Dimension Record
Existing record is changed

Product Dimension
         product key   product name   product size   product package   product dept   product cat   product subcat   ...
Before   001           Rice Puffs     10 oz.         Bag               Grocery        Dry Goods     Snacks           ...
After    001           Rice Puffs     12 oz.         Bag               Grocery        Dry Goods     Snacks           ...
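A Type 1 change is a plain update in place, so the old value is lost. The sketch below uses sqlite3 with a reduced, assumed set of columns. `overwrite_size` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_dim ("
    "product_key INTEGER PRIMARY KEY, product_name TEXT, product_size TEXT)"
)
conn.execute("INSERT INTO product_dim VALUES (1, 'Rice Puffs', '10 oz.')")


def overwrite_size(conn, product_key, new_size):
    """Type 1: overwrite the attribute; no history of the old value is kept."""
    conn.execute(
        "UPDATE product_dim SET product_size = ? WHERE product_key = ?",
        (new_size, product_key),
    )


overwrite_size(conn, 1, "12 oz.")
rows = conn.execute(
    "SELECT product_key, product_name, product_size FROM product_dim"
).fetchall()
```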
Type 2: Writing Another Dimension Record
Adds a new record
Product Dimension

Before:
product key   product name   product size   product package   product dept   product cat   product subcat   effective_date   ...
001           Rice Puffs     10 oz.         Bag               Grocery        Dry Goods     Snacks           05-01-1995       ...

After:
product key   product name   product size   product package   product dept   product cat   product subcat   effective_date   ...
001           Rice Puffs     10 oz.         Bag               Grocery        Dry Goods     Snacks           05-01-1995       ...
731           Rice Puffs     12 oz.         Bag               Grocery        Dry Goods     Snacks           10-15-1998       ...
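A Type 2 change keeps the old record and adds a new one with its own surrogate key. In this sqlite3 sketch the `product_id` application key column, its value `RP-001`, the ISO date format, and the `add_product_version` helper are all assumptions. SQLite assigns the next free key here (2), whereas the example data uses 731.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product_dim (
        product_key    INTEGER PRIMARY KEY,  -- surrogate key
        product_id     TEXT NOT NULL,        -- application key (hypothetical column)
        product_name   TEXT,
        product_size   TEXT,
        effective_date TEXT                  -- ISO dates so text ordering is chronological
    )
    """
)
conn.execute(
    "INSERT INTO product_dim VALUES (1, 'RP-001', 'Rice Puffs', '10 oz.', '1995-05-01')"
)


def add_product_version(conn, product_id, new_size, effective_date):
    """Type 2: copy the latest record's name into a new row with the new size."""
    current = conn.execute(
        "SELECT product_name FROM product_dim WHERE product_id = ? "
        "ORDER BY effective_date DESC LIMIT 1",
        (product_id,),
    ).fetchone()
    cursor = conn.execute(
        "INSERT INTO product_dim (product_id, product_name, product_size, effective_date) "
        "VALUES (?, ?, ?, ?)",
        (product_id, current[0], new_size, effective_date),
    )
    return cursor.lastrowid  # surrogate key of the new version


new_key = add_product_version(conn, "RP-001", "12 oz.", "1998-10-15")
versions = conn.execute(
    "SELECT product_key, product_size, effective_date FROM product_dim ORDER BY product_key"
).fetchall()
```

Fact rows loaded after the change reference the new key, while older fact rows keep pointing at the original record.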
Type 3: Adding Attributes in the Dimension Record
Additional information is stored in an existing record

Product Dimension
Attribute                        Before       After
product key                      001          001
product name                     Rice Puffs   Rice Puffs
product size                     10 oz.       12 oz.
product package                  Bag          Bag
product dept                     Grocery      Grocery
product cat                      Dry Goods    Dry Goods
product subcat                   Snacks       Snacks
current product size date        05-01-1995   10-15-1998
previous product size            11 oz.       10 oz.
previous product size date       03-20-1994   05-01-1995
2nd previous product size        (null)       11 oz.
2nd previous product size date   (null)       03-20-1994
...                              ...          ...
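A Type 3 change shifts the current value into the "previous" columns of the same record. The sketch below models one record as a Python dictionary. The shortened key names and the `shift_product_size` helper are assumptions for illustration.

```python
def shift_product_size(record, new_size, new_date):
    """Type 3: push current -> previous -> 2nd previous, then store the new value."""
    updated = dict(record)  # leave the input record untouched
    updated["second_previous_size"] = record["previous_size"]
    updated["second_previous_size_date"] = record["previous_size_date"]
    updated["previous_size"] = record["product_size"]
    updated["previous_size_date"] = record["current_size_date"]
    updated["product_size"] = new_size
    updated["current_size_date"] = new_date
    return updated


before = {
    "product_key": 1,
    "product_name": "Rice Puffs",
    "product_size": "10 oz.",
    "current_size_date": "05-01-1995",
    "previous_size": "11 oz.",
    "previous_size_date": "03-20-1994",
    "second_previous_size": None,
    "second_previous_size_date": None,
}
after = shift_product_size(before, "12 oz.", "10-15-1998")
```

The history is limited to the number of "previous" columns in the table; a further change would push the oldest value (11 oz.) out of the record.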
Verifying the Fact Table Source Data
Verifying Accuracy of Source Data
Integrating data from multiple sources
Applying business rules
Checking structural requirements
Managing Invalid Data
Rejecting invalid data
Saving invalid data to a log
Correcting Invalid Data
Transforming data
Reassigning data values
Assigning Foreign Keys
Dimension Tables
customer_dim:   201   ALFI   Alfreds
product_dim:    25    123    Chai
time_dim:       134   1/1/2000

Source Data
customer id   product id   order date   quantity_sales   amount_sales
ALFI          123          1/1/2000     400              10,789

Sales Fact Data
cust_key   prod_key   time_key   quantity_sales   amount_sales
201        25         134        400              10,789
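The foreign key assignment in this example is a lookup from each application key to the matching surrogate key. The Python sketch below uses plain dictionaries in place of the dimension tables. The dictionary layout and the `assign_foreign_keys` helper are assumptions; the values come from the example.

```python
# Application key -> surrogate key, one lookup per dimension.
customer_dim = {"ALFI": 201}
product_dim = {123: 25}
time_dim = {"1/1/2000": 134}


def assign_foreign_keys(source_row):
    """Replace the application keys of a source row with dimension surrogate keys."""
    return {
        "cust_key": customer_dim[source_row["customer_id"]],
        "prod_key": product_dim[source_row["product_id"]],
        "time_key": time_dim[source_row["order_date"]],
        "quantity_sales": source_row["quantity_sales"],
        "amount_sales": source_row["amount_sales"],
    }


source_row = {
    "customer_id": "ALFI",
    "product_id": 123,
    "order_date": "1/1/2000",
    "quantity_sales": 400,
    "amount_sales": 10789,
}
fact_row = assign_foreign_keys(source_row)
```

A source row whose key is missing from a dimension raises a KeyError here, which is the point at which a real load would reject or log the row.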
Defining Measures
Loading Measures from the Source System
Calculating Additional Measures
Source System Data
customer_id   product_id   price   qty
VINET         9GZ          .55     32
ALFI          1KJ          1.10    48
HANAR         0ZA          .98     9
...           ...          ...     ...

Fact Table Data
customer_key   product_key   qty   total_sales
100            512           32    17.60
238            207           48    52.80
437            338           9     8.82
...            ...           ...   ...
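In this example qty is loaded as is, and total_sales is a calculated measure equal to price times qty. The Python sketch below reproduces the three rows. The key lookups and the `to_fact_row` helper are assumptions, and Decimal is used so the currency arithmetic is exact.

```python
from decimal import Decimal

# Application key -> surrogate key lookups, using the values from the example.
customer_keys = {"VINET": 100, "ALFI": 238, "HANAR": 437}
product_keys = {"9GZ": 512, "1KJ": 207, "0ZA": 338}

source_rows = [
    ("VINET", "9GZ", Decimal("0.55"), 32),
    ("ALFI", "1KJ", Decimal("1.10"), 48),
    ("HANAR", "0ZA", Decimal("0.98"), 9),
]


def to_fact_row(customer_id, product_id, price, qty):
    """Load qty unchanged and calculate total_sales = price * qty."""
    return {
        "customer_key": customer_keys[customer_id],
        "product_key": product_keys[product_id],
        "qty": qty,
        "total_sales": (price * qty).quantize(Decimal("0.01")),
    }


fact_rows = [to_fact_row(*row) for row in source_rows]
```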
Maintaining Data Integrity
Adhering to the Fact Table Grain
A fact table can only have one grain
You must load a fact table with data at the same level of detail as defined by the grain
Enforcing Column Constraints
NOT NULL constraints
FOREIGN KEY constraints
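The two column constraints can be demonstrated with sqlite3 standing in for SQL Server. The reduced table layout and the `try_insert` helper are assumptions. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.execute(
    "CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT NOT NULL)"
)
conn.execute(
    """
    CREATE TABLE sales_fact (
        prod_key       INTEGER NOT NULL REFERENCES product_dim (product_key),
        quantity_sales INTEGER NOT NULL
    )
    """
)
conn.execute("INSERT INTO product_dim VALUES (25, 'Chai')")


def try_insert(conn, prod_key, quantity_sales):
    """Return True if the fact row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO sales_fact VALUES (?, ?)", (prod_key, quantity_sales))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(conn, 25, 400),   # valid row
    try_insert(conn, 99, 10),    # no product_dim record with key 99
    try_insert(conn, 25, None),  # measure is NULL
]
```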
Implementing Staging Tables
Centralize and Integrate Source Data
Break Up Complex Data Transformations
Facilitate Error Recovery
[Diagram: the staging area contains the tables sales_stage, inventory_stage, market_stage, and shipments_stage.]
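A staging table splits the load into two simpler steps: copy the source rows in unchanged, then transform them with a join against the dimension. The sqlite3 sketch below assumes a reduced schema. The customer ZZZZ is a made-up unmatched value, used to show how failed rows stay in staging for recovery.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer_dim (cust_key INTEGER PRIMARY KEY, customer_id TEXT UNIQUE);
    CREATE TABLE sales_stage  (customer_id TEXT, quantity_sales INTEGER);
    CREATE TABLE sales_fact   (cust_key INTEGER NOT NULL, quantity_sales INTEGER NOT NULL);
    INSERT INTO customer_dim VALUES (201, 'ALFI');
    """
)

# Step 1: copy source rows into the staging table without transforming them.
conn.executemany(
    "INSERT INTO sales_stage VALUES (?, ?)",
    [("ALFI", 400), ("ZZZZ", 5)],
)

# Step 2: load the fact table from staging, resolving the surrogate key by join.
conn.execute(
    """
    INSERT INTO sales_fact (cust_key, quantity_sales)
    SELECT d.cust_key, s.quantity_sales
    FROM sales_stage AS s
    JOIN customer_dim AS d ON d.customer_id = s.customer_id
    """
)

# Rows with no matching dimension record remain in staging for inspection.
unmatched = conn.execute(
    """
    SELECT s.customer_id
    FROM sales_stage AS s
    LEFT JOIN customer_dim AS d ON d.customer_id = s.customer_id
    WHERE d.cust_key IS NULL
    """
).fetchall()
facts = conn.execute("SELECT cust_key, quantity_sales FROM sales_fact").fetchall()
```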
DTS Functionality
Accessing Heterogeneous Data Sources
Importing, Exporting, and Transforming Data
Creating Reusable Transformations and Functions
Automating Data Loads
Managing Metadata
Customizing and Extending Functionality
Defining DTS Packages
Identifies Data Sources and Destinations
Defines Tasks or Actions
Implements Transformation Logic
Defines Order of Operations
Identifying Package Components
Connections Access Data Sources and Destinations
Tasks Describe Data Transformations or Functions
Steps Define the Order of Task Operations or Workflow
Global Variables Store Data that Can Be Shared Across Tasks
Creating Packages
Using the DTS Import/Export Wizard
Perform ad hoc table and data transfers
Develop a prototype package
Using DTS Package Designer
Edit packages created with the DTS Import/Export Wizard
Create packages with a wide range of functionality
Programming DTS Applications
Directly access the functionality of the DTS Object Model
Requires Microsoft Visual Basic or Microsoft Visual C++
Using DTS to Populate the Sales Star
Populating the Sales Star Dimensions
Populating the Sales Star Fact Table
Populating the Sales Star Dimensions
[Diagram: DTS populates time_dim, customer_dim, and product_dim from tab-delimited product files, the Northwind OLTP database, and a SQL Server stored procedure.]
Populating the Sales Star Fact Table
[Diagram: DTS loads the sales data file into sales_stage, then populates sales_fact from sales_stage together with time_dim, customer_dim, and product_dim.]
Designing Modular Packages
Creating Modular Packages
Simplify complex workflows
Create more readable packages
Produce smaller packages that are easier to debug
Using Outer Packages
Execute multiple packages within a single package
Combine modular packages into logical workflows
Reuse modular packages in different workflows
Execute packages in parallel
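The outer-package idea can be illustrated with plain Python functions. This is an analogy only and does not use the DTS Object Model. Each function stands in for one modular package, and `run_outer_package` is a hypothetical outer workflow that runs the dimension loads in parallel before the fact load.

```python
from concurrent.futures import ThreadPoolExecutor


# Each function stands in for one small, separately testable package.
def load_product_dim():
    return "product_dim loaded"


def load_customer_dim():
    return "customer_dim loaded"


def load_time_dim():
    return "time_dim loaded"


def load_sales_fact():
    return "sales_fact loaded"


def run_outer_package():
    """Run the dimension loads in parallel, then the fact load that depends on them."""
    dimension_packages = [load_product_dim, load_customer_dim, load_time_dim]
    with ThreadPoolExecutor() as pool:
        # map() returns results in submission order, even if tasks finish out of order.
        dimension_results = list(pool.map(lambda package: package(), dimension_packages))
    return dimension_results + [load_sales_fact()]


log = run_outer_package()
```

Because each load is its own unit, the same functions can be recombined into a different workflow without changes.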