® IBM Software Group © 2009 IBM Corporation A First Look at Materialized Query Table (MQT) in DB2...
-
Upload
maria-barrow -
Category
Documents
-
view
223 -
download
3
Transcript of ® IBM Software Group © 2009 IBM Corporation A First Look at Materialized Query Table (MQT) in DB2...
®
IBM Software Group
© 2009 IBM Corporation
A First Look at Materialized Query Table (MQT) in DB2 LUW
Petrus Chan, IBM Toronto Lab, [email protected]
IBM Software Group | DB2 information management software
2
Agenda
Motivation Sample scenario
How MQT can help
Different types of MQTs & Maintenance
MQT Matching – what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
3
Database Schema for Our Sample Scenario
Product_idStore_idDate_id
AmountQuantity…
SALESStore_id
Region_id…
STORE
Date_id
Month_idQuarter_idYear_id
TIME
Product_id
Class_idGroup_idFamily_idLine_idDivision_id…
PRODUCT
Star Schema
IBM Software Group | DB2 information management software
4
Dimensions Hierarchy
Store
Month
Product
Class
Group
Family
Line
Division
Quarter
Year
Region
Sales Fact
Product Dimension
Time Dimension
Store Dimension
Level 5
Level 1
Level 2
Level 3
Level 4
Level 0Day
IBM Software Group | DB2 information management software
5
Sample Query
Q9: Total sales of each product group at each store for December 2008.
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id AND
t.month_id = 200812
GROUP BY t.month_id, f.store_id, p.group_id
IBM Software Group | DB2 information management software
6
Benefits of MQTs
Fact
GB
JOIN
Dim2
Dim1
JOIN
Q9Month=200812
Fact
GB
JOIN
Dim2
Dim1
JOIN
Joe’s QueryGroup=Laser Printer
Fact
GB
JOIN
Dim2
Dim1
JOIN
Bob’s QueryStore=McLean Fact
GB
JOIN
Dim2
Dim1
JOIN
MQT
Q9
Bob’s QJoe’s Q
Compute Once & Reuse
Month=200812
Store=McLeanGroup=Laser Printer
IBM Software Group | DB2 information management software
7
What is an MQT?
MQT stands for Materialized Query Table.
Basic Idea: Query results caching - pre-compute and store (in database tables)
the results of frequently issued queries Query optimizer automatically and transparently replace portions of
future queries with direct references to the pre-computed (i.e., materialized) results. This is referred to as MQT routing or matching.
Standard technique in data warehousing, also known as aggregate tables and aggregate (re-)routing, or alternatively, as materialized views.
IBM Software Group | DB2 information management software
8
Usages of MQTs – At a High Level
Aggregate data over one or more dimensions
Joins and aggregate data over a group of tables
Data from a commonly accessed subset of data, that is, from a "hot" horizontal or vertical database partition
Repartitioned data from a table, or part of a table, in a partitioned database environment
IBM Software Group | DB2 information management software
9
How query can be routed to MQT?MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, f.store_id,
p.group_idQ9: Original Query
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
AND t.month_id = 200812
GROUP BY t.month_id, f.store_id, p.group_id
Q9: Rewritten Query
SELECT m.month_id,
m.store_id,
m.group_id,
m.total_sales
FROM MQT_SalesByMthStrGrp m
WHERE m.month_id = 200812
Route Query to MQT
IBM Software Group | DB2 information management software
10
MQT Creation
MQTs are created using the CREATE TABLE statement
Immediately after creation, the MQT is empty and not accessible. It is populated using the REFRESH TABLE statement
After running the REFRESH, the MQT becomes accessible
CREATE TABLE MQT_SalesByMthStrGrp AS(SELECT t.month_id, f.store_id, p.group_id, sum(amount) as total_sales FROM sales f, product p, time t WHERE f.date_id = t.date_id AND f.product_id = p.product_idGROUP BY t.month_id, f.store_id, p.group_id)DATA INITIALLY DEFERREDREFRESH DEFERRED;
REFRESH TABLE MQT_SalesByMthStrGrp;
RUNSTATS ON TABLE petrus.MQT_SalesByMthStrGrpWITH DISTRIBUTION AND DETAILED INDEXES ALL;
-- Confirm that MQT is being used by-- (a) setting CURRENT REFRESH AGE to ANY-- (b) explaining the query used as-- the MQT’s definition
IBM Software Group | DB2 information management software
11
Agenda
Motivation Sample scenario
How MQT can help
Types of MQTs & Maintenance
MQT Matching – what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
12
Types of MQTs
MQT
System Maintained
User Maintained
Refresh IMMEDIATE
Refresh DEFERRED
WITH staging tableWITHOUT staging table
IBM Software Group | DB2 information management software
13
MQT Maintenance
Process of synchronizing the MQT data with changes to the underlying tables Incremental Maintenance
Immediate Propagate Immediate Applied (IPIA)
Full Maintenance
Hybrid
Immediate Propagate Deferred Applied (IPDA)
IBM Software Group | DB2 information management software
14
Incremental Maintenance
MQT
basetablesbase
tablesbase
tablesBase
Tables
Delta select/join
DeltaAggregate
Immediate Propagate Immediate Apply
Pair Delta to MQT
Combine old and new values
Primary-key:
grouping columns
the ability to maintain the MQT without the need to access the modified base table, only using the delta information (newly inserted/updated/deleted data) and the MQT itself.
REFRESH IMMEDIDATE MQTs are maintained incrementally during: INSERT/UPDATE/DELETE of base
table processing REFRESH TABLE processing after
LOAD INSERT into the base table. Immediate Propagate, Immediate Apply For performance reasons, only
incrementally maintainable MQTs can be defined as REFRESH IMMEDIATE (hence more restrictions in MQT definition)
IBM Software Group | DB2 information management software
15
Full Maintenance
MQTs are fully rebuilt by re-evaluating the query in the MQT definition
REFRESH DEFERRED MQTs (without staging table):
not maintained during INSERT/UPDATE/DELETE operations on the base table.
synchronized with base table using the REFRESH TABLE statement
MQT
basetablesbase
tablesbase
tablesBase
Tables
select/join
Aggregate
IBM Software Group | DB2 information management software
16
Hybrid - Immediate Propagate, Deferred Applied
AST
basetablesbase
tablesbase
tables
BaseTables
Delta select/join
DeltaAggregate
Immediate Propagate Deferred Apply
Pair Delta to AST
Combine old and new values
Primary-key:
grouping columns
StagingStagingTables
Net-effect
REFRESH DEFERRED MQTs with staging table Delta changes to MQT are
computed upon INSERT/UPDATE/DELETE of base tables
Delta aggregate is recorded in a separate staging table
Subsequent REFRESH of MQT can be done incrementally
Immediate Propagate, Deferred Apply
One staging table per MQTCREATE TABLE SALES_BY_YEAR_STG
FOR SALES_BY_YEAR
PROPAGATE IMMEDIATE;
IBM Software Group | DB2 information management software
17
User Maintained MQT
User is responsible for maintaining MQT content
REFRESH TABLE disallowed
Insert/update/delete on MQT allowed
Must be REFRESH DEFERRED
Scenarios where user maintained MQTs are For efficiency reasons, when the user is convinced that (s)he can implement
MQT maintenance far more efficiently than the mechanisms used by DB2.
For leveraging existing “user maintained” summaries, where the user wants DB2 to automatically consider them for optimization for new ad hoc queries being executed against the base tables.
IBM Software Group | DB2 information management software
18
Refresh Considerations
Refresh Immediate Refresh Deferred (WITH staging
table)
Refresh Deferred (NO staging table)
Latency of Data Always in sync Up to last REFRESH
Up to last REFRESH
Transaction Overhread
upon base table modification
Yes Yes No
Incremental Refresh
No need, except after LOAD
INSERT
Yes No
Lock Contention during base table
modifications
Potential Less No
IBM Software Group | DB2 information management software
19
Agenda
Motivation Sample scenario
How MQT can help
Types of MQTs & Maintenance
MQT Matching – what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
20
MQT Matching Process to automatically use
the MQT to derive results for a query for improved performance
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_idQ9: Original Query
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
AND t.month_id = 200812
GROUP BY t.month_id, f.store_id, p.group_id
Q9: Rewritten Query
SELECT m.month_id,
m.store_id,
m.group_id,
m.total_sales
FROM MQT_SalesByMthStrGrp m
WHERE m.month_id = 200812
Route Query to MQT
IBM Software Group | DB2 information management software
21
What queries can benefit from an MQT?
Queries
that matches an MQT perfectly!
with predicates that are a subset of those in MQT
with more table joins than MQT (re-join)
with fewer table joins than MQT (need RI constraint)
with columns not in MQT definition (need functional dependencies)
at a higher aggregation level than an MQT
IBM Software Group | DB2 information management software
22
Query with predicates that are a subset of those in MQT
Fact
GB
JOIN
Dim2
Dim1
JOIN
Q9Month=200812
Fact
GB
JOIN
Dim2
Dim1
JOIN
Joe’s QueryGroup=Laser Printer
Fact
GB
JOIN
Dim2
Dim1
JOIN
Bob’s QueryStore=McLean Fact
GB
JOIN
Dim2
Dim1
JOIN
MQT
Q9
Bob’s QJoe’s Q
Compute Once & Reuse
Month=200812
Store=McLeanGroup=Laser Printer
IBM Software Group | DB2 information management software
23
Query with more table joins than MQT
MQT:
Original Query:
Sales Product
Time
GB
Sales Product
Time
Store
GB
If the extra table join is through a column
in group-by list, match and re-join with tables not in
MQT
RewrittenQuery:
MQT Store
GB
Re-join
IBM Software Group | DB2 information management software
24
Query with more table joins than MQT (Example)
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
Original Query
SELECT t.month_id,
s.region_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t,
store s
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
AND f.store_id = s.store_id
GROUP BY t.month_id, s.region_id,
p.group_id
Rewritten Query
SELECT m.month_id,
s.region_id,
m.group_id,
sum(m.total_sales)
FROM MQT_SalesByMthStrGrp m,
store s
WHERE m.stored_id = s.store_id
GROUP BY m.month_id, s.region_id,
m.group_id
Route Query to MQT
IBM Software Group | DB2 information management software
25
Queries with fewer joins than MQT In general, we can’t route queries to an MQT that has more table joins because
the extra joins affects the MQT content
However, if the extra joins in MQT are RI-joins, i.e. joins with RI parent table with join predicate of the form:
child.fk = parent.pk
They are “look-up” joins that do not add or delete rows
RI-join is common between Fact and Dimension tables.
Need to declare RI constraints (even just informational constraints) to tell DB2 about it
Query: MQT:
Sales Time
GB
Sales Time
Product
GB
RI-join
IBM Software Group | DB2 information management software
26
Queries with fewer joins than MQT (Example)
Original Query
SELECT t.month_id,
f.store_id,
sum(amount) as total_sales
FROM sales f, time t
WHERE f.date_id = t.date_id
GROUP BY t.month_id, f.store_id
Rewritten Query
SELECT m.month_id,
m.store_id,
sum(m.total_sales)
FROM MQT_SalesByMthStrGrp m
GROUP BY m.month_id, m.store_id
Route Query to MQT
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
alter table sales add constraint fk_product foreign key (product_id) references product;
Sales (product_id int NOT NULL)
IBM Software Group | DB2 information management software
27
Importance of NOT NULL Constraint on Foreign Key Columns
Declaring NOT NULL constraint on the foreign key column of the child table is important!
If a row of column SALES.product_id has null value: the row will not qualify the join predicate f.product_id = p.product_id.
The MQT would be missing some rows that are needed by the query, hence incorrect to route.
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
IBM Software Group | DB2 information management software
28
Queries at higher aggregation level than an MQT
Store
Month
Product
Class
Group
Family
Line
Division
Quarter
Year
Region
Sales Fact
Product Dimension
Time Dimension
Store Dimension
Level 5
Level 1
Level 2
Level 3
Level 4
Level 0Day
MQT for the Store-Month-Group slice
Queries that access data at or above that slice can be satisfied by the MQT with some additional aggregating.
Querying more detailed data below the slice, such as Day or Product, cannot be routed to this MQT.
IBM Software Group | DB2 information management software
29
MQT_SalesByMthStrGrp
SELECT t.month_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id
AND f.product_id = p.product_id
GROUP BY t.month_id, s.store_id,
p.group_id
Queries at a higher level aggregation - Example
Original Query
SELECT t.quarter_id,
f.store_id,
p.group_id,
sum(amount) as total_sales
FROM sales f, product p, time t
WHERE f.date_id = t.date_id AND
f.product_id = p.product_id
GROUP BY t.quarter_id, f.store_id,
p.group_id
Rewritten Query
SELECT t.quarer_id,
m.store_id,
m.group_id,
sum(m.total_sales)
FROM MQT_SalesByMthStrGrp m,
time t
WHERE m.month_id = t.month_id
GROUP BY t.quarter_id, m.store_id, m.group_id
Route Query to MQT
Functional Dependency
TIME (month_id→quarter_id)
IBM Software Group | DB2 information management software
30
Functional Dependencies
X functionally determines Y iff each X value is associated with precisely one Y value.
By definition, all columns of a relation are functionally dependent on the unique/primary key.
De-normalization introduces functional dependencies between non-key columns. Between ID and descriptive columns
Example: group_id → group_desc Between different levels of the hierarchy
Example: group_id → (family_id, line_id,
division_id)
Product
Class
Group
Family
Line
Division
Product Dimension
IBM Software Group | DB2 information management software
31
Functional Dependencies in DB2
User can help DB2 identifies the functional dependencies among non-key columns by declaring NOT ENFORCED FD constraints:
ALTER TABLE PRODUCT
ADD CHECK((group_desc) DETERMINED BY group_id)
NOT ENFORCED ENABLE QUERY OPTIMIZATION
During MQT routing, DB2 will re-join with the table to pick up the “dependents”
IBM Software Group | DB2 information management software
32
Functional Dependencies - ExampleQuery at year-region-division level:
Select t.year_id, s.region_desc, p.division_desc, sum(amount) as total_sales ...GROUP BY t.year_id, s.region_desc, p.division_desc;
MQT at the month-store- group aggregate level: SELECT t.month_id, t.year_id, f.store_id, s.region_id, s.region_desc, p.group_id, p.division_id, p.division_desc, sum(amount) as total_sales FROM sales f, product p, time t ,stored s WHERE f.date_id = t.date_id AND f.product_id = p.product_id AND f.store_id = s.store_id GROUP BY t.month_id, t.year_id, s.store_id, s.region_id, s.region_desc, p.group_id ,p.division_id, p.division_desc
FD: TIME: month_id -> year_id
UK: STORE store_id -> (region_id, region_desc)
FD: PRODUCT: group_id -> (division_id, division_desc)
MQT
PRODUCT
TIME
GB (year-region-division)
FD re-join
FD re-join
PRODUCT
FD re-join
(month_id, year_id)
(group_id, division_desc)
(store_id, region_desc)
(month-store-group)
IBM Software Group | DB2 information management software
33
Functional Dependencies in MQT
It is a good practice to declare functional dependencies to keep the minimum amount of dimensional information in the MQT. This will:
keep the size of the MQT small
Keep the MQT unaffected by dimension table updates - minimize the need for maintenance
IBM Software Group | DB2 information management software
34
MQT Matching – Insurance Policy
MQT matching performed during Query Rewrite
Rewrite transformations mostly rule/heuristic based
Insurance Policy:
Optimizer Cost based comparison between:
Access plan with MQT routing
Access plan WITHOUT MQT routing
Cheapest plan wins - against choosing a wrong heuristic
Parser
Query Graph Semantics
Query Rewrite
Optimizer
Code Generator
SQL Query
With and w
ithout Aggregate
MQ
T
IBM Software Group | DB2 information management software
35
Agenda
Motivation Sample scenario
How MQT can help
Types of MQTs & Maintenance
MQT Matching – what queries can benefit from MQTs
Design Considerations
Other Uses of MQT
IBM Software Group | DB2 information management software
36
Design Considerations On usage of MQT
Space requirement – MQTs are materialized tables
On MQT definition Level of aggregation vs. query processing time
Too high level – few queries can route to it
Too low level – costly compensation time
Workload-based vs. Model-based
Numbers of MQTs
Store
Month
Product
Class
Group
Family
Line
Division
Quarter
Year
Region
Sales Fact
Product Dim.
Time Dim.
Store Dim.
Level 5
Level 1
Level 2
Level 3
Level 4
Level 0Day
IBM Software Group | DB2 information management software
37
Refreshing with finer granularity MQTs
DB2 9 allows the use of existing MQTs to be used by higher level MQTs during REFRESH
Matching done during REFRESH
It is very important to aggregate in the right order
Example : MQT1 : GROUP BY DATE, MONTH, YEAR
MQT2 : GROUP BY MONTH, YEAR
REFRESH MQT1
REFRESH MQT2 uses MQT1
IBM Software Group | DB2 information management software
38
Another Use of Replicated MQTs
In partitioned database, colocation of rows involved in different tables => efficient join (colocated joins)
In a typical star schema setup, the fact table can be co-located with at most one dimension table.
The remaining (smaller) dimension tables are frequently replicated to all nodes of the fact table
Replicated MQTs are used to achieve co-location of data for efficient join processing in partitioned database environment
Product
ProdID1
Sales
ProdID1
Product
ProdID2
Sales
ProdID2
Product
ProdID3
Sales
ProdID3
Store Store Store Store
IBM Software Group | DB2 information management software
39
Summary
An MQT is a table that stores pre-computed query results.
Queries can be matched to an MQT to improve performance.
Different types of MQTs and their maintenance: immediate, deferred (with/without staging table), user maintained
replicated
Queries can be matched to an MQT that has: Predicates that are superset of those in queries
Finer grained aggregation level
More/few table joins
Not null constraint, foreign key constraints and functional dependencies can be defined to help MQT matching
IBM Software Group | DB2 information management software
40
References
Redbooks (http://www.redbooks.ibm.com) DB2 UDB's High Function Business Intelligence in e-Business
(SG24-6546-00)
Related Papers Wolfgang Lehner, Richard Sidle, Hamid Pirahesh, Roberta
Cochrane: Maintenance of Automatic Summary Tables. SIGMOD Conference 2000: 512-513
Markos Zaharioudakis, Roberta Cochrane, George Lapis, Hamid Pirahesh, Monica Urata: Answering Complex SQL Queries Using Automatic Summary Tables. SIGMOD Conference 2000: 105-116
IBM Software Group | DB2 information management software
41
Thank You