Star Schema
04/08/23 Sudarshan 1
Review
Today
  Star Schema
    Fact table
    Dimensions
    Drilling down and rolling up
    Slicing and dicing
  Implementation techniques for OLAP
    Bitmap indexes
    Join indexes
    File organization
  Architecture
  Characteristics
  Relational OLAP
  Multidimensional OLAP
  ROLAP vs. MOLAP

What is Star Schema?
A star schema is a relational database schema for representing multidimensional data.
It is the simplest form of data warehouse schema and contains one or more dimension tables and fact tables.
It is called a star schema because the entity-relationship diagram between dimensions and fact tables resembles a star, with one fact table connected to multiple dimensions.
The center of the star is a large fact table, which points towards the dimension tables.
The advantages of a star schema are easy slicing of data, better query performance, and data that is easy to understand.

Steps in designing Star Schema
  1. Identify a business process for analysis (like sales).
  2. Identify measures or facts (sales dollar).
  3. Identify dimensions for facts (product dimension, location dimension, time dimension, organization dimension).
  4. List the columns that describe each dimension (region name, branch name).
  5. Determine the lowest level of summary in a fact table (sales dollar).

Important aspects of Star Schema & Snowflake Schema
  In a star schema, every dimension has a primary key.
  In a star schema, a dimension table does not have any parent table, whereas in a snowflake schema a dimension table has one or more parent tables.
  In a star schema, hierarchies for the dimensions are stored in the dimension table itself, whereas in a snowflake schema hierarchies are broken into separate tables.
  These hierarchies help to drill down the data from the topmost level to the lowermost level.

Fact
  Facts are numeric measurements (values) that represent a specific business activity.
  For example, sales figures are numeric measurements that represent product and/or service sales.
  Facts commonly used in business data analysis are units, costs, prices, and revenues.
  Facts are stored in a fact table, i.e., the center of the star schema.

Fact Table
  The central table in a star schema is called the fact table; it contains the facts and is connected to the dimensions.
  A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables.
  The primary key of a fact table is usually a composite key made up of all of its foreign keys.
  A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead).
  A fact table usually contains facts with the same level of aggregation.

Many OLAP applications are based on a fact table.
For example, a supermarket application might be based on a table
  Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
The table can be viewed as multidimensional:
  Market_Id, Product_Id, and Time_Id are the dimensions that represent specific supermarkets, products, and time intervals.
  Sales_Amt is a function of the other three.

Fact Table (Conclusion)
  Central table
  Mostly raw numeric items
  Narrow rows, a few columns at most
  Large number of rows (millions to a billion)
  Accessed via dimensions

Dimension
  Dimensions are qualifying characteristics that provide additional perspective to a given fact.
  For example, sales might be compared by product, from region to region, and from one time period to the next.
  Here sales have product, location, and time dimensions.
  Such dimensions are stored in dimension tables.

Dimension Tables
The dimensions of the fact table are further described with dimension tables.
Fact table:
  Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Dimension tables:
  Market (Market_Id, City, State, Region)
  Product (Product_Id, Name, Category, Price)
  Time (Time_Id, Week, Month, Quarter)
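The Sales, Market, Product, and Time tables above can be tried out with a small sketch like the following. It uses Python's built-in sqlite3 module; the column types, the sample rows, and the choice of a composite primary key on the fact table are assumptions made for illustration, not part of the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market  (Market_Id TEXT PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Product (Product_Id TEXT PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
CREATE TABLE "Time"  (Time_Id TEXT PRIMARY KEY, Week INTEGER, Month INTEGER, Quarter INTEGER);
-- Fact table: one foreign key per dimension; together they form the primary key.
CREATE TABLE Sales (
    Market_Id  TEXT REFERENCES Market(Market_Id),
    Product_Id TEXT REFERENCES Product(Product_Id),
    Time_Id    TEXT REFERENCES "Time"(Time_Id),
    Sales_Amt  REAL,
    PRIMARY KEY (Market_Id, Product_Id, Time_Id)
);
""")

# Hypothetical sample rows, only to make the sketch runnable.
conn.executemany("INSERT INTO Market VALUES (?, ?, ?, ?)",
                 [("M1", "Austin", "TX", "South"), ("M2", "Boston", "MA", "East")])
conn.executemany("INSERT INTO Product VALUES (?, ?, ?, ?)",
                 [("P1", "Milk", "Dairy", 2.0), ("P2", "Bread", "Bakery", 3.0)])
conn.executemany('INSERT INTO "Time" VALUES (?, ?, ?, ?)',
                 [("T1", 1, 1, 1), ("T2", 5, 2, 1)])
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)",
                 [("M1", "P1", "T1", 100), ("M1", "P2", "T1", 50),
                  ("M2", "P1", "T1", 70), ("M2", "P1", "T2", 30)])

# Star join: the fact table is joined to two of its dimensions
# and the measure is summed per (Region, Category).
rows = conn.execute("""
    SELECT M.Region, P.Category, SUM(S.Sales_Amt)
    FROM Sales S
    JOIN Market M  ON M.Market_Id = S.Market_Id
    JOIN Product P ON P.Product_Id = S.Product_Id
    GROUP BY M.Region, P.Category
    ORDER BY M.Region, P.Category
""").fetchall()

for region, category, total in rows:
    print(region, category, total)
```

Time is quoted in the SQL only as a precaution, since some database systems treat it as a type name.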

Attributes
  Each dimension table contains attributes.
  Attributes are used to search, filter, and classify facts.
  For example, for sales we can identify some attributes for each dimension:
    Product dimension: product ID, description, product type.
    Location dimension: region, state, city.
    Time dimension: year, quarter, month, week, and date.

Attribute hierarchy
  An attribute hierarchy (AH) provides a top-down data organization.
  It is used for aggregation and drill-down/roll-up data analysis.
  For example, location dimension attributes can be organized in a hierarchy by region, state, and city.
  The AH provides the capability to perform drill-down and roll-up searches.
  It allows the DW and OLAP systems to have a defined path.

A Concept Hierarchy: Dimension (location)
[Figure: tree of the location dimension, one level per row]
  all:     all
  region:  Europe, ..., North_America
  country: Germany, ..., Spain, Canada, ..., Mexico
  city:    Frankfurt, ..., Vancouver, ..., Toronto
  office:  L. Chan, ..., M. Wind

Multidimensional Data
Sales volume as a function of product, month, and region.
[Figure: cube with axes Product, Region, and Month]
Dimensions: Product, Location, Time
Hierarchical summarization paths:
  Product:  Industry, Category, Product
  Location: Region, Country, City, Office
  Time:     Year, Quarter, Month or Week, Day

A Sample Data Cube
[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); one marked cell is the total annual sales of TV in U.S.A.]
Star Schema A single fact table and for each dimension
one dimension table Does not capture hierarchies directly
T ime
prod
cust
city
fact
date, custno, prodno, cityname, ...
Example of Star Schema: Figure 1.6

In the example, the sales fact table is connected to the dimensions location, product, time, and organization. Data can be sliced across all dimensions, and it can also be aggregated across multiple dimensions. "Sales dollar" in the sales fact table can be calculated across all dimensions independently or in combination, for example:
  Sales dollar value for a particular product
  Sales dollar value for a product in a location
  Sales dollar value for a product in a year within a location
  Sales dollar value for a product in a year within a location, sold or serviced by an employee

Example of Star Schema
  Sales Fact Table
    Keys: time_key, item_key, branch_key, location_key
    Measures: units_sold, dollars_sold, avg_sales
  time (time_key, day, day_of_the_week, month, quarter, year)
  item (item_key, item_name, brand, type, supplier_type)
  branch (branch_key, branch_name, branch_type)
  location (location_key, street, city, province_or_street, country)

Aggregation
Many OLAP queries involve aggregation of the data in the fact table.
For example, to find the total sales (over time) of each product in each market, we might use
  SELECT S.Market_Id, S.Product_Id, SUM (S.Sales_Amt)
  FROM Sales S
  GROUP BY S.Market_Id, S.Product_Id
The aggregation is over the entire time dimension and thus produces a two-dimensional view of the data.
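A minimal sketch of this aggregation, run with Python's sqlite3 module, is shown below. The rows in Sales are hypothetical; the query is the one from the slide, and the result is rearranged into a product-by-market grid to make the two-dimensional view visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt INTEGER)"
)
# Hypothetical fact rows: several time periods per (market, product) pair.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("M1", "P1", "T1", 10), ("M1", "P1", "T2", 20),
    ("M2", "P1", "T1", 5),
    ("M1", "P2", "T1", 7),
    ("M2", "P2", "T2", 8),
])

# Aggregating over the whole time dimension leaves two dimensions.
rows = conn.execute("""
    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
""").fetchall()

# Rearrange into a grid keyed by (product, market).
view = {(product, market): total for market, product, total in rows}
markets = sorted({m for _, m in view})
products = sorted({p for p, _ in view})
for p in products:
    print(p, [view.get((p, m), 0) for m in markets])
```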

Aggregation Over Time
The output of the previous query, SUM(Sales_Amt), by Product_Id (rows) and Market_Id (columns):
  Product_Id   M1     M2     M3   M4
  P1           3003   1503   …
  P2           6003   2402   …
  P3           4503   3      …
  P4           7503   7000   …
  P5           …      …      …

Typical OLAP Operations
  Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
  Drill down (roll down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions.
  Slice and dice: project and select.
  Pivot (rotate): reorient the cube for visualization, or turn a 3D cube into a series of 2D planes.
  Other operations:
    Drill across: involves (goes across) more than one fact table.
    Drill through: goes through the bottom level of the cube to its back-end relational tables (using SQL).

Drilling Down and Rolling Up
  Some dimension tables form an aggregation hierarchy: Market_Id → City → State → Region.
  Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to aggregation over states) is called drilling down.
    It requires the use of the fact table or of information more specific than the requested aggregation (e.g., cities).
  Executing a series of queries that moves up the hierarchy (e.g., from states to regions) is called rolling up.

Drilling Down
Drilling down on market, from Region to State:
  Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
  Market (Market_Id, City, State, Region)
1. SELECT S.Product_Id, M.Region, SUM (S.Sales_Amt)
   FROM Sales S, Market M
   WHERE M.Market_Id = S.Market_Id
   GROUP BY S.Product_Id, M.Region
2. SELECT S.Product_Id, M.State, SUM (S.Sales_Amt)
   FROM Sales S, Market M
   WHERE M.Market_Id = S.Market_Id
   GROUP BY S.Product_Id, M.State
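The two drill-down queries can be run side by side with a sketch like this one. The market and sales rows are hypothetical, and the helper sales_by is an illustrative name, not something from the slides; it only swaps the hierarchy level used in the GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id TEXT, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt INTEGER);
-- Hypothetical rows: the South region contains two states.
INSERT INTO Market VALUES ('M1', 'Austin', 'TX', 'South'),
                          ('M2', 'Miami',  'FL', 'South'),
                          ('M3', 'Boston', 'MA', 'East');
INSERT INTO Sales VALUES ('M1', 'P1', 'T1', 10),
                         ('M2', 'P1', 'T1', 20),
                         ('M3', 'P1', 'T1', 5);
""")

def sales_by(level):
    """Total sales per product at one level of the Market hierarchy."""
    # The level name is spliced into the SQL text, so only known columns are allowed.
    if level not in ("City", "State", "Region"):
        raise ValueError(f"unknown hierarchy level: {level}")
    sql = f"""
        SELECT S.Product_Id, M.{level}, SUM(S.Sales_Amt)
        FROM Sales S, Market M
        WHERE M.Market_Id = S.Market_Id
        GROUP BY S.Product_Id, M.{level}
        ORDER BY M.{level}
    """
    return conn.execute(sql).fetchall()

by_region = sales_by("Region")  # coarse view
by_state = sales_by("State")    # drilled down one level
print(by_region)
print(by_state)
```

Both queries go back to the fact table, which matches the note that drilling down needs data more detailed than the current aggregation.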

Rolling Up
Rolling up on market, from State to Region:
If we have already created a table, State_Sales, using
1. SELECT S.Product_Id, M.State, SUM (S.Sales_Amt) AS Sales_Amt
   FROM Sales S, Market M
   WHERE M.Market_Id = S.Market_Id
   GROUP BY S.Product_Id, M.State
then we can roll up from there to:
2. SELECT T.Product_Id, R.Region, SUM (T.Sales_Amt)
   FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) R
   WHERE R.State = T.State
   GROUP BY T.Product_Id, R.Region
Market has one row per market, so query 2 joins on the distinct (State, Region) pairs; joining Market directly would add each state's total once for every market in that state.
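The roll-up from a precomputed State_Sales table can be checked against a direct aggregation of the fact table, as in the sketch below. The data is hypothetical. One assumption worth flagging: the roll-up here joins State_Sales to the distinct (State, Region) pairs of Market rather than to Market itself, because Market holds one row per market and a state with several markets would otherwise be counted several times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Market (Market_Id TEXT, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id TEXT, Product_Id TEXT, Time_Id TEXT, Sales_Amt INTEGER);
-- Hypothetical rows: TX has two markets.
INSERT INTO Market VALUES ('M1', 'Austin', 'TX', 'South'), ('M2', 'Dallas', 'TX', 'South'),
                          ('M3', 'Miami',  'FL', 'South'), ('M4', 'Boston', 'MA', 'East');
INSERT INTO Sales VALUES ('M1', 'P1', 'T1', 10), ('M2', 'P1', 'T1', 20),
                         ('M3', 'P1', 'T1', 5),  ('M4', 'P1', 'T1', 7);

-- Step 1: the state-level summary table.
CREATE TABLE State_Sales AS
    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State;
""")

# Step 2: roll up from states to regions without touching the fact table.
rolled_up = conn.execute("""
    SELECT T.Product_Id, R.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) R
    WHERE R.State = T.State
    GROUP BY T.Product_Id, R.Region
    ORDER BY R.Region
""").fetchall()

# Reference answer computed directly from the fact table.
direct = conn.execute("""
    SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M
    WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region
    ORDER BY M.Region
""").fetchall()

# Joining Market directly repeats the TX total once per TX market.
naive = conn.execute("""
    SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, Market M
    WHERE M.State = T.State
    GROUP BY T.Product_Id, M.Region
    ORDER BY M.Region
""").fetchall()

print(rolled_up, direct, naive)
```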

Roll-up and Drill Down
[Figure: hierarchy levels Sales Channel, Region, Country, State, Location Address, Sales Representative; rolling up moves toward a higher level of aggregation, drilling down moves toward low-level details]

“Slicing and Dicing”
[Figure: cube with axes Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special), and Regions (India, Far East, Europe); the Telecomm slice is highlighted]
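Slicing and dicing amount to selections on the dimension values, which the following sketch illustrates in sqlite3. The table layout and amounts are hypothetical; only the dimension values (products, sales channels, regions) are taken from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Product TEXT, Channel TEXT, Region TEXT, Amount INTEGER)")
# Hypothetical amounts for a few cells of the cube.
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("Telecomm", "Retail", "India", 10),
    ("Telecomm", "Direct", "Europe", 20),
    ("Video", "Retail", "India", 5),
    ("Audio", "Special", "Far East", 8),
    ("Audio", "Direct", "Europe", 3),
    ("Household", "Retail", "Europe", 4),
])

# Slice: fix one dimension to a single value (the Telecomm slice),
# leaving a two-dimensional Channel x Region view.
telecomm_slice = conn.execute("""
    SELECT Channel, Region, SUM(Amount)
    FROM Sales
    WHERE Product = 'Telecomm'
    GROUP BY Channel, Region
    ORDER BY Channel, Region
""").fetchall()

# Dice: restrict several dimensions to subsets of their values,
# which keeps a smaller sub-cube.
dice = conn.execute("""
    SELECT Product, Channel, Region, Amount
    FROM Sales
    WHERE Product IN ('Video', 'Audio') AND Region IN ('India', 'Europe')
    ORDER BY Product
""").fetchall()

print(telecomm_slice)
print(dice)
```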

Snowflake Schema
A snowflake schema is a star schema structure normalized through the use of outrigger tables, i.e., dimension table hierarchies are broken into simpler tables. The star schema example had four dimensions (location, product, time, organization) and a fact table (sales).

Snowflake schema
  Represents the dimensional hierarchy directly by normalizing tables.
  Easy to maintain and saves storage.
[Figure: fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, prod, cust, and city, plus a normalized region table]

Example of Snowflake Schema
  Sales Fact Table
    Keys: time_key, item_key, branch_key, location_key
    Measures: units_sold, dollars_sold, avg_sales
  time (time_key, day, day_of_the_week, month, quarter, year)
  item (item_key, item_name, brand, type, supplier_key)
  supplier (supplier_key, supplier_type)
  branch (branch_key, branch_name, branch_type)
  location (location_key, street, city_key)
  city (city_key, city, province_or_street, country)
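To see what the normalization costs at query time, the sketch below builds only the location and city tables of the snowflake example plus a cut-down sales table, then totals dollars_sold per country. The rows and the reduced fact table are assumptions for illustration; the column names follow the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   province_or_street TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
-- Cut-down fact table: only the location key and one measure.
CREATE TABLE sales (location_key INTEGER REFERENCES location(location_key),
                    dollars_sold INTEGER);

-- Hypothetical rows.
INSERT INTO city VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                        (2, 'Toronto',   'Ontario',          'Canada'),
                        (3, 'Frankfurt', 'Hesse',            'Germany');
INSERT INTO location VALUES (10, 'Main St', 1), (11, 'King St', 2), (12, 'Zeil', 3);
INSERT INTO sales VALUES (10, 100), (11, 50), (12, 70), (10, 25);
""")

# In the star schema, country is a column of location (one join).
# In the snowflake schema it sits in city, so the query needs two joins.
by_country = conn.execute("""
    SELECT c.country, SUM(s.dollars_sold)
    FROM sales s
    JOIN location l ON l.location_key = s.location_key
    JOIN city c     ON c.city_key = l.city_key
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

print(by_country)
```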

Indexing Techniques
  Exploiting indexes to reduce scanning of data is of crucial importance.
  Bitmap indexes
  Join indexes
  Other issues:
    Text indexing
    Parallelizing and sequencing of index builds and incremental updates

Indexing Techniques
Bitmap index:
  An index on a particular column.
  Each value in the column has a bit vector; bit operations are fast.
  The length of the bit vector is the number of records in the base table.
  The i-th bit is set if the i-th row of the base table has the value for the indexed column.
  Not suitable for high-cardinality domains.

Bitmap Indexes
  Example: the attribute sex has values M and F.
  A table of 100 million people needs 2 lists of 100 million bits.
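As a rough size check for this example, assuming uncompressed bit vectors and 1 MB = 10^6 bytes, the two lists together take:

```latex
\[
2 \times 10^{8}\ \text{bits}
  = \frac{2 \times 10^{8}}{8}\ \text{bytes}
  = 2.5 \times 10^{7}\ \text{bytes}
  = 25\ \text{MB}
\]
```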

Customer
Query: select * from customer where gender = 'F' and vote = 'Y'
[Figure: customer table with gender (M, F, F, F, F, M) and vote (Y, Y, Y, N, N, N) columns and the bitmap index used to answer the query]

Bitmap Index
Base table:
  Cust  Region  Rating
  C1    N       H
  C2    S       M
  C3    W       L
  C4    W       H
  C5    S       L
  C6    W       L
  C7    N       H
Region index:
  Row ID  N  S  E  W
  1       1  0  0  0
  2       0  1  0  0
  3       0  0  0  1
  4       0  0  0  1
  5       0  1  0  0
  6       0  0  0  1
  7       1  0  0  0
Rating index:
  Row ID  H  M  L
  1       1  0  0
  2       0  1  0
  3       0  0  1
  4       1  0  0
  5       0  0  1
  6       0  0  1
  7       1  0  0
Query: customers where Region = W and Rating = M
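The way such a query is answered can be sketched in a few lines of Python. The base table is the one from this slide; storing each bit vector as a Python integer and the helper names build_bitmap_index and matching_customers are illustrative choices, not how any particular product implements bitmaps.

```python
from collections import defaultdict

# Base table from the slide: (Cust, Region, Rating).
base_table = [
    ("C1", "N", "H"), ("C2", "S", "M"), ("C3", "W", "L"), ("C4", "W", "H"),
    ("C5", "S", "L"), ("C6", "W", "L"), ("C7", "N", "H"),
]

def build_bitmap_index(rows, col):
    """Map each distinct value of column `col` to a bit vector (an int).

    Bit i is set when row i holds that value.
    """
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[col]] |= 1 << i
    return index

def matching_customers(bits):
    """Translate a result bit vector back into customer ids."""
    return [row[0] for i, row in enumerate(base_table) if (bits >> i) & 1]

region_index = build_bitmap_index(base_table, 1)
rating_index = build_bitmap_index(base_table, 2)

# A conjunctive query becomes a single bitwise AND of two vectors.
w_and_m = matching_customers(region_index["W"] & rating_index["M"])
w_and_l = matching_customers(region_index["W"] & rating_index["L"])

print(w_and_m)  # no customer in this table is both W and M
print(w_and_l)
```

With this base table the Region = W and Rating = M query has no matches; the second query (Rating = L) is added only to show a non-empty result.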

Bitmap Indexes
  Comparison, join, and aggregation operations are reduced to bit arithmetic, with dramatic improvement in processing time.
  Significant reduction in space and I/O (30:1).
  Adapted for higher-cardinality domains as well.
  Compression (e.g., run-length encoding) is exploited.
  Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3.

Join Indexes
  Pre-computed joins.
  A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute.
    e.g., a join index on the city dimension of the calls fact table correlates, for each city, the calls (in the calls table) from that city.
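A join index of this kind can be pictured as a precomputed map from each dimension key to the positions of the matching fact rows. The sketch below assumes a tiny in-memory calls table and city dimension with made-up values; real systems store row identifiers on disk, but the idea is the same.

```python
from collections import defaultdict

# Hypothetical dimension and fact data.
city_dim = {"c1": "Toronto", "c2": "Vancouver"}   # city_id -> city name
calls = [("c1", 5), ("c2", 3), ("c1", 7)]         # (city_id, minutes)

# Build the join index once: city_id -> positions of that city's calls.
join_index = defaultdict(list)
for pos, (city_id, _minutes) in enumerate(calls):
    join_index[city_id].append(pos)

def calls_from(city_id):
    """Fetch a city's calls through the index instead of scanning `calls`."""
    return [calls[pos] for pos in join_index.get(city_id, [])]

# The join between city and calls is now a lookup per city.
minutes_by_city = {
    name: sum(minutes for _, minutes in calls_from(city_id))
    for city_id, name in city_dim.items()
}
print(minutes_by_city)
```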

Join Indexes
  Join indexes can also span multiple dimension tables.
    e.g., a join index on the city and time dimensions of the calls fact table.

Star Join Processing
  Use join indexes to join dimension and fact tables.
[Figure: the Calls fact table is joined with the Time, Location, and Plan dimensions in turn, giving C+T, C+T+L, and C+T+L+P]

Bitmapped Join Processing
[Figure: bitmaps for Calls against the Time, Location, and Plan dimensions (101, 001, 110) are combined with AND]

OLAP Is FASMI
  Fast Analysis of Shared Multidimensional Information
  Nigel Pendse, Richard Creath - The OLAP Report

Warehouse Products
  Computer Associates -- CA-Ingres
  Hewlett-Packard -- Allbase/SQL
  Informix -- Informix, Informix XPS
  Microsoft -- SQL Server
  Oracle -- Oracle7, Oracle Parallel Server
  Red Brick -- Red Brick Warehouse
  SAS Institute -- SAS
  Software AG -- ADABAS
  Sybase -- SQL Server, IQ, MPP

Warehouse Server Products
  Oracle 8
  Informix
    Online Dynamic Server
    XPS -- Extended Parallel Server
    Universal Server for object relational applications
  Sybase
    Adaptive Server 11.5
    Sybase MPP
    Sybase IQ

Warehouse Server Products
  Red Brick Warehouse
  Tandem Nonstop
  IBM
    DB2 MVS
    Universal Server
    DB2 400
  Teradata