Introduction to Data Warehousing
December 20, 2012
Tameem Ahmad, M.Tech. (F), ZHCET, AMU, Aligarh
04/10/2023 Tameem Ahmad 2
References:
• W. H. Inmon, “Building the Data Warehouse”, Third Edition, New York: John Wiley & Sons, 2002.
• J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, 2000.
• http://www.data-warehouse-online.com/ [Accessed: November 4, 2012]
• Mary Breslin, “Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and Inmon Models”, http://www.bibestpractices.com/view-articles/4768
Plan for the Presentation
• Necessity of data warehousing (why is it needed?)
• What is data warehousing?
• Architecture
• Schema
• How to build a data warehouse (components)
• Data warehousing tools
Necessity is the mother of invention…
Why Data Warehouse?
Scenario
• ABC Pvt. Ltd. is a company with branches at Mumbai, Delhi, Chennai and Bangalore. The Sales Manager wants a quarterly sales report. Each branch has a separate operational system.
Scenario: ABC Pvt. Ltd.
• Branches, each with its own database: Mumbai, Delhi, Chennai, Bangalore
• The Sales Manager wants: sales per item type per branch for the first quarter.
Solution: ABC Pvt. Ltd.
Extract sales information from each database.
Store the information in a common repository at a single site.
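The two steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the in-memory SQLite databases stand in for the branch systems, and the table layout, column names and sample rows are made up, not taken from ABC Pvt. Ltd.

```python
import sqlite3

def make_branch_db(rows):
    """Stand-in for one branch's operational database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db

# Made-up sample rows, one small database per branch.
branch_dbs = {
    "Mumbai": make_branch_db([("Laptop", "2012-01-15", 900.0), ("Phone", "2012-02-03", 300.0)]),
    "Delhi": make_branch_db([("Laptop", "2012-03-20", 950.0)]),
    "Chennai": make_branch_db([("Phone", "2012-01-09", 280.0), ("Phone", "2012-05-01", 310.0)]),
    "Bangalore": make_branch_db([("Laptop", "2012-02-11", 920.0)]),
}

# The common repository at a single site.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
)

# Extract sales rows from each branch database and store them in the
# repository, tagging every row with the branch it came from.
for branch, db in branch_dbs.items():
    rows = db.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
    warehouse.executemany(
        "INSERT INTO sales VALUES (?, ?, ?, ?)", [(branch, *row) for row in rows]
    )

# Sales per item type per branch for the first quarter (January to March 2012).
report = warehouse.execute(
    """SELECT branch, item_type, SUM(amount)
       FROM sales
       WHERE sale_date BETWEEN '2012-01-01' AND '2012-03-31'
       GROUP BY branch, item_type
       ORDER BY branch, item_type"""
).fetchall()

for line in report:
    print(line)
```

Once the rows sit in one place, the Sales Manager's report is a single query instead of four separate ones.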
Solution: ABC Pvt. Ltd. (Contd.)
[Figure: data from the Mumbai, Delhi, Chennai and Bangalore branches is collected in the Data Warehouse; the Sales Manager gets the report through query and analysis tools.]
Data Warehousing…
• Definition: A data warehouse is a
  » subject-oriented,
  » integrated,
  » time-variant,
  » nonvolatile
collection of data in support of management’s decision-making process.
Subject-oriented
• A data warehouse is organized around subjects such as sales, product and customer.
• It focuses on modeling and analysis of data for decision makers.
• It excludes data that is not useful in the decision support process.
Integration
• A data warehouse is constructed by integrating multiple heterogeneous sources.
• Data preprocessing is applied to ensure consistency.
[Figure: RDBMS, legacy system and flat file sources pass through data processing and data transformation into the Data Warehouse.]
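A minimal sketch of integration, assuming three made-up sources that describe the same kind of record in different ways. The source formats, field names, units (rupees and paise) and values are all hypothetical; the point is only that each source is converted into one consistent form before it reaches the warehouse.

```python
import csv
import io
import sqlite3

# Source 1: an RDBMS (in-memory SQLite stand-in), amounts stored in rupees.
rdbms = sqlite3.connect(":memory:")
rdbms.execute("CREATE TABLE orders (product TEXT, amount_rupees REAL)")
rdbms.execute("INSERT INTO orders VALUES ('laptop', 45000.0)")

# Source 2: a flat file (CSV text), amounts stored in paise.
flat_file = io.StringIO("PRODUCT,AMOUNT_PAISE\nPhone,1500000\n")

# Source 3: a legacy system exporting fixed-width records:
# 10 characters of product name, then 7 digits of amount in rupees.
legacy_records = [f"{'TABLET':<10}{12000:07d}"]

def from_rdbms(conn):
    for product, amount in conn.execute("SELECT product, amount_rupees FROM orders"):
        yield {"product": product.strip().title(), "amount": float(amount)}

def from_flat_file(handle):
    for row in csv.DictReader(handle):
        # Convert paise to rupees so every source uses the same unit.
        yield {"product": row["PRODUCT"].strip().title(),
               "amount": int(row["AMOUNT_PAISE"]) / 100}

def from_legacy(records):
    for record in records:
        yield {"product": record[:10].strip().title(), "amount": float(record[10:])}

# One consistent list of records: same field names, same spelling style, same unit.
integrated = [
    *from_rdbms(rdbms),
    *from_flat_file(flat_file),
    *from_legacy(legacy_records),
]

for record in integrated:
    print(record)
```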
Time-variant
• Provides information from a historical perspective, e.g. the past 5-10 years.
Nonvolatile
• Data once recorded cannot be updated.
• A data warehouse requires two operations in data accessing:
  – Initial loading of data
  – Access of data
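One possible way to picture the nonvolatile property is a table that accepts loading and reading but refuses changes. The sketch below uses SQLite triggers for this; the table name, trigger names and rows are made up, and a real warehouse may enforce the rule differently (for example through permissions).

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_fact (store TEXT, units INTEGER)")

# Triggers that reject any change to rows already recorded.
warehouse.executescript("""
CREATE TRIGGER no_update BEFORE UPDATE ON sales_fact
BEGIN
    SELECT RAISE(ABORT, 'warehouse data is nonvolatile');
END;
CREATE TRIGGER no_delete BEFORE DELETE ON sales_fact
BEGIN
    SELECT RAISE(ABORT, 'warehouse data is nonvolatile');
END;
""")

# Operation 1: initial loading of data.
warehouse.executemany(
    "INSERT INTO sales_fact VALUES (?, ?)", [("Mumbai", 10), ("Delhi", 7)]
)

# Operation 2: access of data.
total_units = warehouse.execute("SELECT SUM(units) FROM sales_fact").fetchone()[0]

# An attempted update is refused, so the stored rows stay as loaded.
try:
    warehouse.execute("UPDATE sales_fact SET units = 0")
    update_rejected = False
except sqlite3.DatabaseError:
    update_rejected = True

print(total_units, update_rejected)
```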
Data Warehousing Architecture
• Data Warehouse server
  – almost always a relational DBMS, rarely flat files
• OLAP servers
  – to support and operate on multi-dimensional data structures
• Clients
  – Query and reporting tools
  – Analysis tools
  – Data mining tools
Data Warehousing Schema
• Star Schema
• Snowflake Schema
Measures & Dimensions
• Measures – Units sold, Amount.
• Dimensions – Product, Time, Region
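A small sketch of the idea: measures are the numbers that get summed, and dimensions are the attributes they are grouped by. The sample rows and the helper name `summarize` are made up for illustration.

```python
from collections import defaultdict

# Each record carries dimension values (product, quarter, region)
# and measures (units sold, amount). The rows are made up.
sales = [
    {"product": "Laptop", "quarter": "Q1", "region": "West", "units": 2, "amount": 1800.0},
    {"product": "Laptop", "quarter": "Q1", "region": "North", "units": 1, "amount": 950.0},
    {"product": "Phone", "quarter": "Q1", "region": "West", "units": 3, "amount": 900.0},
    {"product": "Phone", "quarter": "Q2", "region": "West", "units": 1, "amount": 310.0},
]

def summarize(rows, dimensions):
    """Sum the measures for each combination of the chosen dimensions."""
    totals = defaultdict(lambda: {"units": 0, "amount": 0.0})
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key]["units"] += row["units"]
        totals[key]["amount"] += row["amount"]
    return dict(totals)

# The same measures viewed along different dimensions.
by_product = summarize(sales, ["product"])
by_region_quarter = summarize(sales, ["region", "quarter"])

print(by_product)
print(by_region_quarter)
```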
Star Schema
• A single, large and central fact table and one table for each dimension.
• Every fact points to one tuple in each of the dimensions and has additional attributes.
• Does not capture hierarchies directly.
Star Schema (Contd.)
• Fact Table: Store Key, Product Key, Period Key, Units, Price
• Store Dimension: Store Key, Store Name, City, State, Region
• Time Dimension: Period Key, Year, Quarter, Month
• Product Dimension: Product Key, Product Desc
Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins.
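The star schema on this slide could be written out roughly as follows, using SQLite from Python. The table and column names follow the slide; the SQL table names (`sales_fact`, `store_dim` and so on), the data types and the sample rows are assumptions made for the example.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE store_dim   (store_key INTEGER PRIMARY KEY, store_name TEXT,
                          city TEXT, state TEXT, region TEXT);
CREATE TABLE time_dim    (period_key INTEGER PRIMARY KEY, year INTEGER,
                          quarter TEXT, month TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT);
CREATE TABLE sales_fact (
    store_key   INTEGER REFERENCES store_dim(store_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    period_key  INTEGER REFERENCES time_dim(period_key),
    units INTEGER,
    price REAL
);
""")

# Made-up sample rows.
db.executemany("INSERT INTO store_dim VALUES (?, ?, ?, ?, ?)", [
    (1, "Store A", "Mumbai", "Maharashtra", "West"),
    (2, "Store B", "Delhi", "Delhi", "North"),
])
db.executemany("INSERT INTO time_dim VALUES (?, ?, ?, ?)", [
    (1, 2012, "Q1", "January"),
    (2, 2012, "Q2", "April"),
])
db.executemany("INSERT INTO product_dim VALUES (?, ?)", [(1, "Laptop"), (2, "Phone")])
db.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 2, 900.0),
    (1, 2, 1, 3, 300.0),
    (2, 1, 1, 1, 950.0),
    (2, 2, 2, 4, 310.0),
])

# Units sold per region and product in Q1 2012: the fact table joins
# directly to each dimension it needs, one join per dimension.
q1_units = db.execute("""
    SELECT s.region, p.product_desc, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s   ON s.store_key = f.store_key
    JOIN product_dim p ON p.product_key = f.product_key
    JOIN time_dim t    ON t.period_key = f.period_key
    WHERE t.year = 2012 AND t.quarter = 'Q1'
    GROUP BY s.region, p.product_desc
    ORDER BY s.region, p.product_desc
""").fetchall()

for line in q1_units:
    print(line)
```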
Snowflake Schema
• Variant of the star schema model.
• A single, large and central fact table and one or more tables for each dimension.
• Dimension tables are normalized, i.e. dimension table data is split into additional tables.
Snowflake Schema (Contd.)
• Fact Table: Store Key, Product Key, Period Key, Units, Price
• Store Dimension: Store Key, Store Name, City Key
• City Dimension: City Key, City, State, Region
• Time Dimension: Period Key, Year, Quarter, Month
• Product Dimension: Product Key, Product Desc
Drawbacks: Time-consuming joins, slow report generation.
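A sketch of the normalized store dimension, again in SQLite from Python. Column names follow the slide; the SQL table names, data types and sample rows are assumptions, and the time and product dimension tables are left out to keep the example short. Reaching the region now takes two joins (fact to store, store to city), which is the extra join cost the slide lists as a drawback.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE city_dim  (city_key INTEGER PRIMARY KEY, city TEXT, state TEXT, region TEXT);
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY, store_name TEXT,
                        city_key INTEGER REFERENCES city_dim(city_key));
CREATE TABLE sales_fact (store_key INTEGER REFERENCES store_dim(store_key),
                         product_key INTEGER, period_key INTEGER,
                         units INTEGER, price REAL);
""")

# Made-up sample rows.
db.executemany("INSERT INTO city_dim VALUES (?, ?, ?, ?)", [
    (1, "Mumbai", "Maharashtra", "West"),
    (2, "Delhi", "Delhi", "North"),
])
# Two stores in the same city share one city_dim row instead of
# repeating city, state and region in every store row.
db.executemany("INSERT INTO store_dim VALUES (?, ?, ?)", [
    (1, "Store A", 1),
    (2, "Store B", 1),
    (3, "Store C", 2),
])
db.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, 2, 900.0),
    (2, 1, 1, 5, 900.0),
    (3, 1, 1, 1, 950.0),
])

# Units per region: the fact table must go through store_dim to reach city_dim.
units_by_region = db.execute("""
    SELECT c.region, SUM(f.units)
    FROM sales_fact f
    JOIN store_dim s ON s.store_key = f.store_key
    JOIN city_dim c  ON c.city_key = s.city_key
    GROUP BY c.region
    ORDER BY c.region
""").fetchall()

print(units_by_region)
```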
Building the Data Warehouse
• Data Selection
• Data Pre-processing
– Fill missing values
– Remove inconsistency
• Data Transformation & Integration
• Data Loading
Data in the warehouse is stored in the form of fact tables and dimension tables.
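The four steps above, sketched end to end on a handful of made-up records. Everything specific here is an assumption for illustration: the raw fields, the alias table mapping "Bombay" to "Mumbai", and the choice to fill a missing units value with 1.

```python
import sqlite3

# Raw operational records (made up). Note the missing units value and
# the inconsistent spellings of the same city.
raw = [
    {"city": "Mumbai", "product": "Laptop", "units": 2, "price": 900.0, "cashier": "A"},
    {"city": "mumbai ", "product": "Phone", "units": None, "price": 300.0, "cashier": "B"},
    {"city": "Bombay", "product": "Laptop", "units": 1, "price": 900.0, "cashier": "A"},
]

# 1. Data selection: keep only the fields needed for analysis.
selected = [{k: r[k] for k in ("city", "product", "units", "price")} for r in raw]

# 2. Data pre-processing.
CITY_ALIASES = {"bombay": "Mumbai"}  # hypothetical alias table

def preprocess(row):
    city = row["city"].strip()
    city = CITY_ALIASES.get(city.lower(), city.title())  # remove inconsistency
    units = 1 if row["units"] is None else row["units"]  # fill missing value (assumed default)
    return {**row, "city": city, "units": units}

clean = [preprocess(r) for r in selected]

# 3. Data transformation and integration: assign dimension keys.
city_key = {name: i for i, name in enumerate(sorted({r["city"] for r in clean}), start=1)}
product_key = {name: i for i, name in enumerate(sorted({r["product"] for r in clean}), start=1)}

# 4. Data loading: dimension tables first, then the fact table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, city TEXT)")
db.execute("CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_desc TEXT)")
db.execute("CREATE TABLE sales_fact (city_key INTEGER, product_key INTEGER, units INTEGER, price REAL)")
db.executemany("INSERT INTO city_dim VALUES (?, ?)", [(k, n) for n, k in city_key.items()])
db.executemany("INSERT INTO product_dim VALUES (?, ?)", [(k, n) for n, k in product_key.items()])
db.executemany(
    "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
    [(city_key[r["city"]], product_key[r["product"]], r["units"], r["price"]) for r in clean],
)

# Read the facts back in load order, joined to their dimensions.
loaded = db.execute("""
    SELECT c.city, p.product_desc, f.units
    FROM sales_fact f
    JOIN city_dim c    ON c.city_key = f.city_key
    JOIN product_dim p ON p.product_key = f.product_key
    ORDER BY f.rowid
""").fetchall()

print(loaded)
```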
Data Warehousing Tools
• Data Warehouse
  – SQL Server 2000 DTS
  – Oracle 8i Warehouse Builder
• ETL tools
  – Ab Initio
  – Informatica
• OLAP tools
  – SQL Server Analysis Services
  – Oracle Express Server
• Reporting tools
  – MS Excel Pivot Chart
  – VB Applications
  – Cognos
  – MicroStrategy
  – Hyperion
Thank You