Academic Year 2014 Spring. MODULE CC3005NI: Advanced Database Systems “QUERY OPTIMIZATION”...


Query Optimization: Query optimization is the process of choosing the most efficient way to execute a SQL statement. When the cost-based optimizer was first offered with Oracle 7, Oracle supported only standard relational data.

Query optimization is an important component of a modern relational database system.

Relational database systems provide a system-managed optimization facility by making use of available tools.


Query Optimization: Description

A query optimizer is essentially a program for the efficient evaluation of relational queries, making use of relevant statistical information.

Objective

To choose the most efficient strategy for implementing a given relational query, thereby improving the efficiency and performance of a relational database system.


Need for Query Optimization: 1. To perform automatic navigation

A relational database system (based on the non-navigational relational model) allows users to simply state what data they require and leaves it to the system to locate and process that data in the database.


Need for Query Optimization: 2. To achieve acceptable performance

There may be several different plans (called query plans) for performing a single user query. The query optimizer aims to select and execute the most efficient query plan based on the information available to the system.


Need for Query Optimization: 3. To minimize existing differences

Because I/O devices are much slower than the CPU, a query optimizer aims to minimize I/O activity by choosing the ‘cheapest’ query plan for a given query.


Effects of Optimization – Example: Consider the following Student, Lending and Book tables:

Student (student_no, student_name, gender, address)
Lending (lending_no, student_no, book_no)
Book (book_no, title, author, edition)


Effects of Optimization – Example: Assume that the database tables contain

100 students in the Student table
10,000 lendings in the Lending table, of which only 50 are for book ‘B1’
5000 books in the Book table

Further assume that only results (intermediate relations) of up to 50 tuples can be kept in memory during query processing.


Effects of Optimization – Example: Query

Retrieve names of students who have borrowed book ‘B1’

SQL

SELECT DISTINCT student_name
FROM student, lending
WHERE student.student_no = lending.student_no
AND lending.book_no = 'B1'
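
The query can be tried out with a minimal sketch like the one below, which runs it against an in-memory SQLite database through Python's sqlite3 module. The sample rows are invented for illustration and are far fewer than the table sizes assumed in the example. The Book table is left out because the query does not touch it. Note that the string literal must use straight quotes in real SQL.

```python
import sqlite3

# In-memory database with tiny, made-up sample data (not from the slides).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_no TEXT PRIMARY KEY, student_name TEXT,
                          gender TEXT, address TEXT);
    CREATE TABLE lending (lending_no INTEGER PRIMARY KEY, student_no TEXT,
                          book_no TEXT);
    INSERT INTO student VALUES ('S1', 'Asha', 'F', 'Kathmandu'),
                               ('S2', 'Bikash', 'M', 'Pokhara'),
                               ('S3', 'Chandra', 'M', 'Lalitpur');
    INSERT INTO lending VALUES (1, 'S1', 'B1'), (2, 'S1', 'B1'),
                               (3, 'S2', 'B2'), (4, 'S3', 'B1');
""")

# The query from the slide. DISTINCT removes the duplicate caused by
# student S1 borrowing book B1 twice.
rows = conn.execute("""
    SELECT DISTINCT student_name
    FROM student, lending
    WHERE student.student_no = lending.student_no
      AND lending.book_no = 'B1'
""").fetchall()

names = sorted(name for (name,) in rows)
print(names)
```

The user only states what is wanted; which of the plans discussed next is used to produce the answer is left to the database system.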


Query Plan A – No Optimization: Operation Sequence – Join – Select – Project

Step 1 Join student and lending over student_no giving T1
Step 2 Select T1 where book_no = ‘B1’ giving T2
Step 3 Project T2 over student_name giving result


Query Plan A – No Optimization: We calculate the number of database accesses (tuple I/O operations) required for each step. The number of tuple I/Os is the number of tuples (records) read and written during an operation.


Query Plan A – Calculation:

Step 1 – Join student and lending over student_no giving T1
Step 2 – Select T1 where book_no = ‘B1’ giving T2
Step 3 – Project T2 over student_name giving result

Step   Read           Write    IR       Subtotal
1      100 x 10,000   10,000   10,000   1,010,000
2      10,000         0        50       10,000
3      0              0        <= 50    0

IR: Intermediate Relation

Total tuple I/O: 1,020,000
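
The Plan A arithmetic can be reproduced with a short Python sketch. The function name and the way the memory limit is applied are assumptions made for illustration; the counting follows the table (the join reads every lending tuple once per student tuple, and an intermediate relation larger than 50 tuples is written out and read back).

```python
# Figures taken from the example; names are illustrative only.
STUDENTS = 100
LENDINGS = 10_000
MEMORY_LIMIT = 50  # larger intermediate relations must be written out


def plan_a_tuple_io(students, lendings, memory_limit):
    """Tuple I/O for Join -> Select -> Project, counted as in the table."""
    # Step 1: every lending tuple is read once per student tuple.
    join_reads = students * lendings
    # Each lending matches one student, so T1 has as many tuples as lending.
    t1_size = lendings
    # T1 exceeds the memory limit, so it is written out ...
    join_writes = t1_size if t1_size > memory_limit else 0
    # Step 2: ... and read back in for the selection.
    select_reads = t1_size if t1_size > memory_limit else 0
    # Step 3: T2 (50 tuples) fits in memory, so projection costs no I/O.
    return join_reads + join_writes + select_reads


total_a = plan_a_tuple_io(STUDENTS, LENDINGS, MEMORY_LIMIT)
print(f"Plan A total tuple I/O: {total_a:,}")
```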


Query Plan B – with Optimization: Operation Sequence – Select – Join – Project

Step 1 Select lending where book_no = ‘B1’ giving T1
Step 2 Join T1 and student over student_no giving T2
Step 3 Project T2 over student_name giving result


Query Plan B – with Optimization: We again calculate the number of tuple I/O operations required for each step.


Query Plan B – Calculation:

Step 1 – Select lending where book_no = ‘B1’ giving T1
Step 2 – Join T1 and student over student_no giving T2
Step 3 – Project T2 over student_name giving result

Step   Read     Write   IR      Subtotal
1      10,000   0       50      10,000
2      100      0       50      100
3      0        0       <= 50   0

IR: Intermediate Relation

Total tuple I/O: 10,100
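
The same counting applied to Plan B, as a minimal Python sketch. The function name and parameter names are assumptions for illustration. Because T1 holds only 50 tuples it stays in memory, so the join only needs to read the student table once.

```python
# Figures taken from the example; names are illustrative only.
STUDENTS = 100
LENDINGS = 10_000
B1_LENDINGS = 50   # lending tuples for book 'B1'
MEMORY_LIMIT = 50  # larger intermediate relations must be written out


def plan_b_tuple_io(students, lendings, matching, memory_limit):
    """Tuple I/O for Select -> Join -> Project, counted as in the table."""
    # Step 1: scan the whole lending table to find the matching tuples.
    select_reads = lendings
    # T1 is written out only if it exceeds the memory limit (here it fits).
    spilled = matching if matching > memory_limit else 0
    # Step 2: read student once; T1 is read back only if it was spilled.
    join_reads = students + spilled
    # Step 3: T2 has at most 50 tuples, so projection costs no I/O.
    return select_reads + spilled + join_reads


total_b = plan_b_tuple_io(STUDENTS, LENDINGS, B1_LENDINGS, MEMORY_LIMIT)
print(f"Plan B total tuple I/O: {total_b:,}")
```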


Comparison Plan A vs. Plan B: Ratio of tuple I/O (Plan A to Plan B): 1,020,000 / 10,100, or roughly 101 to 1.

Intermediate relations in Plan B are much smaller than those in Plan A.

Tuple I/O can be further reduced by using indexes. If there is an index on book_no in the lending table, the tuples to be read in the selection will be just 50 instead of 10,000.
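
A quick Python check of the comparison. The indexed total is derived here, not stated on the slide: it assumes the index lookup itself costs no tuple I/O and that the join still reads the 100 student tuples.

```python
# Totals from the two calculation tables.
PLAN_A_TOTAL = 1_020_000
PLAN_B_TOTAL = 10_100

ratio = PLAN_A_TOTAL / PLAN_B_TOTAL
print(f"Plan A / Plan B = {ratio:.1f}")

# Assumed variant: with an index on lending.book_no, step 1 of Plan B
# reads only the 50 matching tuples instead of scanning all 10,000.
INDEXED_SELECT_READS = 50
STUDENT_READS = 100
plan_b_indexed = INDEXED_SELECT_READS + STUDENT_READS
print(f"Plan B with index: {plan_b_indexed} tuple I/Os")
```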


Four Stages of Optimization: The query processing activity acts as an interface between the querying individual or process and the database. It relieves the querying individual or process of the burden of deciding the best execution strategy. So while the querying individual or process specifies what, the query processor determines how.


Four Stages of Optimization: Stage 1

Convert the query into some internal form more suitable for machine manipulation, e.g. a query tree or relational algebra.

Four Stages of Optimization: Stage 2

Further convert the internal form into some equivalent and more efficient canonical form, making use of well-defined transformation rules.


Four Stages of Optimization: Example of Query Tree – Plan A (Join – Select – Project)

Student, Lending -> Join (over student_no) -> Restrict (where book_no = ‘B1’) -> Project (over student_name) -> Result
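
One way to picture stages 1 and 2 is the Python sketch below, which holds the Plan A tree as nested objects and applies a single transformation rule that moves a restriction below a join. The class names and the function push_restrict_down are hypothetical and chosen for illustration; a real optimizer has many such rules.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Table:
    name: str
    columns: frozenset


@dataclass(frozen=True)
class Join:
    left: "Node"
    right: "Node"
    over: str


@dataclass(frozen=True)
class Restrict:
    child: "Node"
    column: str
    value: str


@dataclass(frozen=True)
class Project:
    child: "Node"
    column: str


Node = Union[Table, Join, Restrict, Project]


def columns_of(node):
    """Columns available in the output of a tree node."""
    if isinstance(node, Table):
        return node.columns
    if isinstance(node, Join):
        return columns_of(node.left) | columns_of(node.right)
    if isinstance(node, Project):
        return frozenset({node.column})
    return columns_of(node.child)  # Restrict keeps all columns


def push_restrict_down(node):
    """Move a Restrict sitting on a Join onto the one input that owns
    the restricted column; leave every other shape unchanged."""
    if isinstance(node, Project):
        return Project(push_restrict_down(node.child), node.column)
    if isinstance(node, Restrict) and isinstance(node.child, Join):
        join = node.child
        in_left = node.column in columns_of(join.left)
        in_right = node.column in columns_of(join.right)
        if in_right and not in_left:
            pushed = Restrict(join.right, node.column, node.value)
            return Join(join.left, pushed, join.over)
        if in_left and not in_right:
            pushed = Restrict(join.left, node.column, node.value)
            return Join(pushed, join.right, join.over)
    return node


student = Table("student", frozenset(
    {"student_no", "student_name", "gender", "address"}))
lending = Table("lending", frozenset({"lending_no", "student_no", "book_no"}))

# Plan A: Join, then Restrict, then Project.
plan_a = Project(
    Restrict(Join(student, lending, "student_no"), "book_no", "B1"),
    "student_name")

# After the rewrite the restriction applies to lending before the join,
# which is the Select, Join, Project shape of Plan B.
plan_b = push_restrict_down(plan_a)
print(plan_b)
```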


Four Stages of Optimization: Stage 3

Choose a set of low-level procedures using statistics about the database. This involves:

Low-level operations (e.g. join, select, project)
Implementation procedures (one for each low-level operation, based on varying conditions)
Cost formulae (one for each implementation procedure)
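
As a rough illustration of stage 3, the Python sketch below pairs each low-level operation with a few implementation procedures, each carrying its own cost formula. The procedure names and formulae are simplified assumptions in the tuple I/O style of the earlier example, not the formulae of any real optimizer.

```python
# Hypothetical cost formulae, measured in tuple I/Os.

def nested_loop_join_cost(outer, inner):
    """Re-read the inner relation once for every outer tuple."""
    return outer * inner


def memory_resident_join_cost(scanned):
    """One input is already in memory; scan the other input once."""
    return scanned


def full_scan_select_cost(table_size):
    """Read every tuple of the table and test the predicate."""
    return table_size


def index_select_cost(matching):
    """Read only the tuples the index points to."""
    return matching


# One entry per low-level operation, one cost formula per procedure.
IMPLEMENTATIONS = {
    "join": {
        "nested_loop": nested_loop_join_cost,
        "memory_resident": memory_resident_join_cost,
    },
    "select": {
        "full_scan": full_scan_select_cost,
        "index": index_select_cost,
    },
}

# Figures from the example: 100 students, 10,000 lendings, 50 for 'B1'.
join_costs = {
    "nested_loop": IMPLEMENTATIONS["join"]["nested_loop"](100, 10_000),
    "memory_resident": IMPLEMENTATIONS["join"]["memory_resident"](100),
}
select_costs = {
    "full_scan": IMPLEMENTATIONS["select"]["full_scan"](10_000),
    "index": IMPLEMENTATIONS["select"]["index"](50),
}
print(join_costs)
print(select_costs)
```

Which procedure applies depends on the conditions: the memory-resident join is only possible when one input fits in memory, and the index selection only when a suitable index exists.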


Four Stages of Optimization: Stage 4

Generate a set of candidate query plans and choose the best of those plans by evaluating the cost formulae.

The process of selecting a query plan is also called ‘access path’ selection.

The ‘cheapest’ query plan is normally considered to be the one that produces the fewest tuple I/O operations and the smallest set of intermediate relations.
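
Stage 4 can be sketched in a few lines of Python: total the per-step costs of each candidate plan and keep the minimum. The per-step figures are the subtotals from the Plan A and Plan B tables; the function name cheapest_plan is an assumption for illustration.

```python
# Per-step tuple I/O subtotals taken from the two calculation tables.
CANDIDATE_PLANS = {
    "Plan A (join, select, project)": [1_010_000, 10_000, 0],
    "Plan B (select, join, project)": [10_000, 100, 0],
}


def cheapest_plan(plans):
    """Return the name and total cost of the plan with the least I/O."""
    totals = {name: sum(steps) for name, steps in plans.items()}
    best = min(totals, key=totals.get)
    return best, totals[best]


best_name, best_cost = cheapest_plan(CANDIDATE_PLANS)
print(f"{best_name}: {best_cost:,} tuple I/Os")
```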


Database Statistics: Selection of ‘optimal’ query plans in the optimization process makes use of database statistics stored in the System Catalogue or Data Dictionary of the database system.

In other words, without this information (metadata) being available, the query optimizer will not be able to choose the most efficient query plan for implementing a given query.


Database Statistics: Typical Database Statistics include

For each base table
Cardinality
Number of pages for this table

For each column of each base table
Number of distinct values
Maximum, minimum and average value
Actual values and their frequencies


Database Statistics: Typical Database Statistics include (continued)

For each index
Number of levels
Number of leaf pages
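
To show how such statistics feed a cost estimate, here is a minimal Python sketch of a common textbook estimate: for an equality predicate, expected matching rows = cardinality / number of distinct values, assuming values are evenly distributed. The class names are hypothetical, and the figure of 200 distinct book_no values is an assumption chosen so that the estimate matches the 50 lendings for ‘B1’ in the earlier example.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnStats:
    distinct_values: int


@dataclass
class TableStats:
    cardinality: int
    columns: dict = field(default_factory=dict)


def estimate_equality_rows(table, column):
    """Estimated rows matching `column = constant`, assuming the
    column's values are uniformly distributed."""
    return table.cardinality / table.columns[column].distinct_values


# Cardinality from the example; the distinct-value count is assumed.
lending_stats = TableStats(
    cardinality=10_000,
    columns={"book_no": ColumnStats(distinct_values=200)},
)

estimated = estimate_equality_rows(lending_stats, "book_no")
print(f"Estimated lendings per book: {estimated:.0f}")
```

When actual values and their frequencies are stored, the optimizer can replace the uniform assumption with the recorded frequency of the specific value.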


Thank you!!!

Questions are WELCOME
