Academic Year 2014 Spring
MODULE CC3005NI: Advanced Database Systems
“QUERY OPTIMIZATION”
Query Optimization: Query optimization is the process of choosing the most efficient way to execute a SQL statement. When the cost-based optimizer was first offered with Oracle 7, Oracle supported only standard relational data.
Query optimization is an important component of a modern relational database system.
Relational database systems provide a system-managed optimization facility by making use of available tools.
Query Optimization: Description
A query optimizer is essentially a program for the efficient evaluation of relational queries, making use of relevant statistical information.
Query Optimization: Objective
To choose the most efficient strategy for implementing a given relational query, thereby improving the efficiency and performance of a relational database system.
Need of Query Optimization: 1. To perform automatic navigation:
A relational database system (based on the non-navigational relational model) allows users to simply state what data they require and leave the system to locate and process that data in the database.
Need of Query Optimization: 2. To achieve acceptable performance:
There may be several different plans (called query plans) for performing a single user query. The query optimizer aims to select and execute the most efficient query plan based on the information available to the system.
Need of Query Optimization: 3. To minimize existing differences:
Because I/O devices are much slower than the CPU, a query optimizer aims to minimize I/O activity by choosing the ‘cheapest’ query plan for a given query.
Effects of Optimization – Example: Consider the following Student, Lending and Book tables:
Student (student_no, student_name, gender, address)
Lending (lending_no, student_no, book_no)
Book (book_no, title, author, edition)
Effects of Optimization – Example: Assume that the database tables contain
100 students in the Student table
10,000 lendings in the Lending table, of which only 50 are for book ‘B1’
5000 books in the Book table
Further assume that only results (intermediate relations) of up to 50 tuples can be kept in memory during query processing.
Effects of Optimization – Example: Query
Retrieve the names of students who have borrowed book ‘B1’
SQL
SELECT DISTINCT student_name
FROM student, lending
WHERE student.student_no = lending.student_no
AND lending.book_no = 'B1'
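As a quick check of what the query returns, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table and column names follow the slides; the handful of rows (student names, addresses, lending numbers) are invented purely for illustration.

```python
import sqlite3

# Toy data only: the schema follows the slides, the rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_no TEXT PRIMARY KEY, student_name TEXT,
                          gender TEXT, address TEXT);
    CREATE TABLE lending (lending_no INTEGER PRIMARY KEY, student_no TEXT,
                          book_no TEXT);
    INSERT INTO student VALUES ('S1', 'Asha', 'F', 'Kathmandu'),
                               ('S2', 'Bikash', 'M', 'Pokhara'),
                               ('S3', 'Chandra', 'M', 'Lalitpur');
    INSERT INTO lending VALUES (1, 'S1', 'B1'), (2, 'S1', 'B1'),
                               (3, 'S2', 'B2'), (4, 'S3', 'B1');
""")

QUERY = """
    SELECT DISTINCT student_name
    FROM student, lending
    WHERE student.student_no = lending.student_no
      AND lending.book_no = 'B1'
"""

# DISTINCT collapses the two 'B1' lendings by student S1 into one name.
names = sorted(row[0] for row in conn.execute(QUERY))
print(names)  # ['Asha', 'Chandra']
```

The query only says what is wanted. Whether the join or the selection is evaluated first is left to the database system.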
Query Plan A – No Optimization: Operation Sequence – Join – Select – Project
Step 1 Join student and lending over student_no giving T1
Step 2 Select T1 where book_no = ‘B1’ giving T2
Step 3 Project T2 over student_name giving result
Query Plan A – No Optimization: We calculate the number of database accesses (tuple I/O operations) required for each step. The number of tuple I/O operations is the number of tuples (records) read and written during an operation.
Query Plan A – Calculation:
Step 1 – Join student and lending over student_no giving T1
Step 2 – Select T1 where book_no = ‘B1’ giving T2
Step 3 – Project T2 over student_name giving result
IR: Intermediate Relation
Step  Read          Write   IR      Subtotal
1     100 x 10,000  10,000  10,000  1,010,000
2     10,000        0       50      10,000
3     0             0       <= 50   0
Total tuple I/O: 1,020,000
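The arithmetic behind the Plan A table can be reproduced with a short sketch. It assumes the figures used in the table (100 students, 10,000 lending tuples, 50 of them for ‘B1’, at most 50 tuples held in memory). The function name and parameters are illustrative and not part of the slides.

```python
def plan_a_tuple_io(students=100, lendings=10_000, matching=50, memory_limit=50):
    """Tuple I/O for Join -> Select -> Project, following the Plan A table."""
    steps = []

    # Step 1: the join reads every lending tuple once per student tuple.
    # Each lending tuple matches exactly one student, so T1 has `lendings`
    # tuples, which is too large for memory and must be written out.
    t1_size = lendings
    t1_on_disk = t1_size > memory_limit
    steps.append(("join", students * lendings, t1_size if t1_on_disk else 0))

    # Step 2: the select reads T1 back; T2 (the matching tuples) is only
    # written if it does not fit in memory.
    t2_on_disk = matching > memory_limit
    steps.append(("select", t1_size if t1_on_disk else 0,
                  matching if t2_on_disk else 0))

    # Step 3: the project reads T2 only if it had to be written out.
    steps.append(("project", matching if t2_on_disk else 0, 0))
    return steps


plan_a = plan_a_tuple_io()
plan_a_total = sum(read + write for _, read, write in plan_a)
print(plan_a_total)  # 1020000
```

Almost all of the cost comes from Step 1, because the join is applied before anything has been filtered out.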
Query Plan B – with Optimization: Operation Sequence – Select – Join – Project
Step 1 Select lending where book_no = ‘B1’ giving T1
Step 2 Join T1 and student over student_no giving T2
Step 3 Project T2 over student_name giving result
Query Plan B – with Optimization: We again calculate number of tuple I/O operations
required for each step
Query Plan B – Calculation:
Step 1 – Select lending where book_no = ‘B1’ giving T1
Step 2 – Join T1 and student over student_no giving T2
Step 3 – Project T2 over student_name giving result
IR: Intermediate Relation
Step  Read    Write  IR     Subtotal
1     10,000  0      50     10,000
2     100     0      50     100
3     0       0      <= 50  0
Total tuple I/O: 10,100
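The Plan B figures can be checked the same way. This sketch assumes the numbers used in the table and that every intermediate relation (at most 50 tuples) fits in memory, so nothing is written. The constant names are illustrative.

```python
STUDENTS = 100
LENDINGS = 10_000
LENDINGS_FOR_B1 = 50
MEMORY_LIMIT = 50

# The sketch only holds while T1 and T2 fit in memory.
assert LENDINGS_FOR_B1 <= MEMORY_LIMIT

plan_b = [
    # (step, tuples read, tuples written)
    ("select", LENDINGS, 0),   # scan lending once, keep the 50 matches in memory
    ("join", STUDENTS, 0),     # read student once, join against in-memory T1
    ("project", 0, 0),         # T2 is already in memory
]
plan_b_total = sum(read + write for _, read, write in plan_b)
print(plan_b_total)  # 10100
```

Applying the selection first shrinks the join input from 10,000 tuples to 50, which is why no intermediate relation has to be written.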
Comparison Plan A vs. Plan B:
Ratio of tuple I/O (Plan A to Plan B): 1,020,000 / 10,100, or roughly 101 to 1
Intermediate relations in Plan B are much smaller than those in Plan A
Tuple I/O can be further reduced by using indexes
If there is an index on book_no in the lending table, the tuples to be read in Step 1 of Plan B will be just 50 instead of 10,000
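A rough sketch of the comparison, including the indexed variant. It assumes that with an index on book_no only the 50 matching lending tuples are read, and it ignores the cost of reading the index pages themselves, which the slides do not quantify.

```python
PLAN_A_TOTAL = 1_020_000   # join, then select, then project
PLAN_B_TOTAL = 10_100      # select, then join, then project

ratio = PLAN_A_TOTAL / PLAN_B_TOTAL
print(round(ratio))  # 101

# Plan B with an index on lending.book_no (assumption: index reads are free).
STUDENTS = 100
LENDINGS_FOR_B1 = 50
plan_b_indexed_total = LENDINGS_FOR_B1 + STUDENTS
print(plan_b_indexed_total)  # 150
```

Under these assumptions the index removes the full scan of the lending table, which was the dominant cost left in Plan B.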
Four Stages of Optimization: The query processing activity acts as an interface between the querying individual/process and the database. It relieves the querying individual/process of the burden of deciding the best execution strategy. So while the querying individual/process specifies what, the query processor determines how.
Four Stages of Optimization: Stage 1
Convert the query into some internal form more suitable for machine manipulation, e.g. Query Tree, Relational Algebra
Four Stages of Optimization: Stage 2
Further convert the internal form into some equivalent and more efficient canonical form, making use of well-defined transformation rules
Four Stages of Optimization: Example of Query Tree – Plan A (Join – Select – Project)
Student, Lending
-> Join (over student_no)
-> Restrict (where book_no = ‘B1’)
-> Project (over student_name)
-> Result
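To make Stages 1 and 2 concrete, the sketch below represents the Plan A query tree as nested Python objects and applies one transformation rule: a restriction that refers to columns from only one side of a join is pushed below the join. The class and function names are hypothetical, and a real optimizer applies many more rules than this one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    name: str
    columns: tuple


@dataclass(frozen=True)
class Join:
    left: object
    right: object
    on: str


@dataclass(frozen=True)
class Restrict:
    child: object
    column: str
    value: str


@dataclass(frozen=True)
class Project:
    child: object
    column: str


def columns_of(node):
    """Columns available in the output of a node (projection is ignored here)."""
    if isinstance(node, Table):
        return set(node.columns)
    if isinstance(node, Join):
        return columns_of(node.left) | columns_of(node.right)
    return columns_of(node.child)


def push_restrict_down(node):
    """Rewrite Restrict(Join(l, r)) so the restriction sits on the one
    join input that supplies the restricted column."""
    if isinstance(node, Project):
        return Project(push_restrict_down(node.child), node.column)
    if isinstance(node, Restrict) and isinstance(node.child, Join):
        join = node.child
        in_left = node.column in columns_of(join.left)
        in_right = node.column in columns_of(join.right)
        if in_right and not in_left:
            return Join(join.left,
                        Restrict(join.right, node.column, node.value), join.on)
        if in_left and not in_right:
            return Join(Restrict(join.left, node.column, node.value),
                        join.right, join.on)
    return node  # rule does not apply; leave the tree unchanged


student = Table("student", ("student_no", "student_name", "gender", "address"))
lending = Table("lending", ("lending_no", "student_no", "book_no"))

# Plan A: Join, then Restrict, then Project.
plan_a = Project(
    Restrict(Join(student, lending, "student_no"), "book_no", "B1"),
    "student_name",
)
# After the rewrite the restriction is applied to lending before the join.
plan_b = push_restrict_down(plan_a)
print(plan_b)
```

The rewritten tree corresponds to the Select – Join – Project sequence, and both trees describe the same result.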
Four Stages of Optimization: Stage 3
Choose a set of low-level procedures using statistics about the database
Low-level operations (e.g. join, select, project)
Implementation procedures (one for each low-level operation, based on varying conditions)
Cost formulae (one for each implementation procedure)
Four Stages of Optimization: Stage 4
Generate a set of candidate query plans and choose the best of those plans by evaluating the cost formulae
The process of selecting a query plan is also called ‘access path’ selection
The ‘cheapest’ query plan is normally considered to be the one that produces the fewest tuple I/O operations and the smallest set of intermediate relations
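Stages 3 and 4 can be sketched as a list of candidate plans, each with a cost formula and a condition under which it is applicable, from which the cheapest applicable plan is chosen. The plan names, cost functions and the index naming below are hypothetical; the costs count tuple I/O only, using the figures from the lending example, and index page reads are ignored.

```python
# Statistics the optimizer would read from the system catalogue.
stats = {"students": 100, "lendings": 10_000, "lendings_for_b1": 50}


def cost_join_select_project(s):
    # Nested join read, write T1, then read T1 back for the select.
    return s["students"] * s["lendings"] + s["lendings"] + s["lendings"]


def cost_select_join_project(s):
    # Scan lending, then read student once for the join.
    return s["lendings"] + s["students"]


def cost_indexed_select_join_project(s):
    # With an index on lending.book_no only the matching tuples are read.
    return s["lendings_for_b1"] + s["students"]


# (plan name, cost formula, applicability test on the set of existing indexes)
CANDIDATES = [
    ("join-select-project", cost_join_select_project, lambda idx: True),
    ("select-join-project", cost_select_join_project, lambda idx: True),
    ("index-select-join-project", cost_indexed_select_join_project,
     lambda idx: "lending.book_no" in idx),
]


def choose_plan(stats, indexes):
    """Return (cost, name) of the cheapest plan that is applicable."""
    usable = [(cost(stats), name)
              for name, cost, applicable in CANDIDATES if applicable(indexes)]
    return min(usable)


print(choose_plan(stats, indexes=set()))                 # (10100, 'select-join-project')
print(choose_plan(stats, indexes={"lending.book_no"}))   # (150, 'index-select-join-project')
```

The choice depends on both the statistics and the available access paths: the same query gets a different plan once the index exists.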
Database Statistics: Selection of ‘optimal’ query plans in the optimization process makes use of database statistics stored in the System Catalogue or Data Dictionary of the database system.
In other words, without this information (metadata) being available, the query optimizer will not be able to choose the most efficient query plan for implementing a given query.
Database Statistics: Typical Database Statistics include
For each base table
  Cardinality
  Number of pages for this table
For each column of each base table
  Number of distinct values
  Maximum, minimum and average value
  Actual values and their frequencies
Database Statistics: Typical Database Statistics include (continued)
For each index
  Number of levels
  Number of leaf pages
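One common use of these statistics is estimating how many tuples an equality condition will select. The sketch below is an assumption-laden illustration, not a description of any particular system: it uses the recorded frequency of a value when one is available and otherwise assumes values are spread uniformly (cardinality divided by the number of distinct values). The class names, the page count and the figure of 200 distinct book_no values are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnStats:
    distinct_values: int
    frequencies: dict = field(default_factory=dict)  # value -> tuple count, if kept


@dataclass
class TableStats:
    cardinality: int
    pages: int
    columns: dict  # column name -> ColumnStats


def estimate_equality_rows(table, column, value):
    """Estimated number of tuples satisfying `column = value`."""
    col = table.columns[column]
    if value in col.frequencies:
        return col.frequencies[value]              # exact frequency is known
    return table.cardinality / col.distinct_values  # uniform-spread assumption


# Hypothetical catalogue entry for the lending table.
lending_stats = TableStats(
    cardinality=10_000,
    pages=250,  # made-up figure
    columns={"book_no": ColumnStats(distinct_values=200)},
)
print(estimate_equality_rows(lending_stats, "book_no", "B1"))  # 50.0

# With actual values and frequencies recorded, the estimate is exact.
lending_stats.columns["book_no"].frequencies["B1"] = 50
print(estimate_equality_rows(lending_stats, "book_no", "B1"))  # 50
```

Estimates like this feed the cost formulae, which is why an optimizer without statistics cannot compare candidate plans reliably.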
Thank you!!!
Questions are WELCOME