CS263 Lecture 19 Query Optimisation. Motivation for Query Optimisation Phases of Query Processing ...

Date posted: 19-Dec-2015

Page 1: CS263 Lecture 19 Query Optimisation.  Motivation for Query Optimisation  Phases of Query Processing  Query Trees  RA Transformation Rules  Heuristic.

CS263 Lecture 19

Query Optimisation

LECTURE PLAN

• Motivation for Query Optimisation
• Phases of Query Processing
• Query Trees
• RA Transformation Rules
• Heuristic Processing Strategies
• Cost Estimation for RA Operations

Motivation for Query Optimisation

List all the managers that work in the sales department.

SELECT *
FROM emp, dept
WHERE emp.deptno = dept.deptno
AND emp.job = 'Manager'
AND dept.name = 'Sales';

$\sigma_{(job='Manager') \wedge (name='Sales') \wedge (emp.deptno = dept.deptno)}(EMP \times DEPT)$

$\sigma_{(job='Manager') \wedge (name='Sales')}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$

$(\sigma_{job='Manager'}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\sigma_{name='Sales'}(DEPT))$

There are at least three alternative ways of representing this query as a Relational Algebra expression.

Motivation for Query Optimisation

$\sigma_{(job='Manager') \wedge (name='Sales') \wedge (emp.deptno = dept.deptno)}(EMP \times DEPT)$

Metrics:
• 1000 tuples in the EMP relation
• 50 tuples in the DEPT relation
• 50 employees are Managers (one per department)
• 5 separate Sales departments (across the country)

Cost of processing this alternative:

Cartesian product of EMP and DEPT: (1000 + 50) record I/Os to read the relations
+ (1000 * 50) record I/Os to create an intermediate relation to store the result

Selection on the result of the Cartesian product: (1000 * 50) record I/Os to read the tuples and compare them against the predicate

Total cost of the query: (1000 + 50) + 2*(1000 * 50) = 101,050 record I/Os.

Motivation for Query Optimisation

$\sigma_{(job='Manager') \wedge (name='Sales')}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$

Cost of processing this alternative (same metrics as above):

Join of EMP and DEPT over deptno: (1000 + 50) record I/Os to read the relations
+ (1000) record I/Os to create an intermediate relation to store the join result

Selection on the result of the join: (1000) record I/Os to read each tuple and compare it against the predicate

Total cost of the query: (1000 + 50) + 2*(1000) = 3,050 record I/Os.

Motivation for Query Optimisation

Cost of processing this alternative:

$(\sigma_{job='Manager'}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\sigma_{name='Sales'}(DEPT))$

Select 'Managers' in EMP: (1000) record I/Os to read the relation
+ (50) record I/Os to create an intermediate relation to store the select result

Select 'Sales' in DEPT: (50) record I/Os to read the relation
+ (5) record I/Os to create an intermediate relation to store the select result

Join of the previous two selections over deptno: (50 + 5) record I/Os to read the intermediate relations

Total cost of the query: 1000 + 2*(50) + 5 + (50 + 5) = 1,160 record I/Os.
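The three totals can be reproduced with a few lines of Python. This is a minimal sketch that simply restates the slides' own cost model (one record I/O per tuple read or written, every intermediate result written out and then read back); the function names are hypothetical, and the join size of 1000 tuples is the figure used in the second alternative.

```python
# Metrics from the example above (counted in records, one I/O per record)
EMP_TUPLES = 1000
DEPT_TUPLES = 50
MANAGERS = 50       # EMP tuples with job = 'Manager'
SALES_DEPTS = 5     # DEPT tuples with name = 'Sales'
JOIN_TUPLES = 1000  # every employee matches exactly one department


def product_then_select() -> int:
    read_inputs = EMP_TUPLES + DEPT_TUPLES
    product = EMP_TUPLES * DEPT_TUPLES
    # write the Cartesian product, then read it back to apply the selection
    return read_inputs + 2 * product


def join_then_select() -> int:
    read_inputs = EMP_TUPLES + DEPT_TUPLES
    # write the join result, then read it back to apply the selection
    return read_inputs + 2 * JOIN_TUPLES


def select_then_join() -> int:
    emp_side = EMP_TUPLES + MANAGERS       # scan EMP, write the managers
    dept_side = DEPT_TUPLES + SALES_DEPTS  # scan DEPT, write the Sales departments
    join = MANAGERS + SALES_DEPTS          # read both selection results to join them
    return emp_side + dept_side + join


costs = {
    "select over EMP x DEPT": product_then_select(),
    "select over EMP join DEPT": join_then_select(),
    "join of two selections": select_then_join(),
}
for plan, cost in costs.items():
    print(f"{plan}: {cost:,} record I/Os")
print("cheapest:", min(costs, key=costs.get))
```

The cheapest plan is the one that applies both selections before the join, which is exactly what the heuristic processing strategies later in the lecture aim for.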

Phases of Query Processing

Query Processing Stage - 1

Cast the query into internal form

This involves the conversion of the original (SQL) query into some internal representation more suitable for machine manipulation.

The internal representation typically chosen is either some kind of ‘abstract syntax tree’, or a relational algebra ‘query tree’.

Relational Algebra Query Trees

A Relational Algebra query can be represented as a ‘query tree’. For example the query to list all the managers that work in the sales department could be described as one of the following:

$\sigma_{(job='Manager') \wedge (name='Sales') \wedge (emp.deptno = dept.deptno)}(EMP \times DEPT)$

[Figure: query tree. Root: $\sigma_{(job='Manager') \wedge (name='Sales') \wedge (emp.deptno = dept.deptno)}$; intermediate operation: $\times$; leaves: EMP, DEPT.]

Relational Algebra Query Trees

$\sigma_{(job='Manager') \wedge (name='Sales') \wedge (emp.deptno = dept.deptno)}(EMP \times DEPT)$

[Figure: the same query tree with the selection split into two steps. Root: $\sigma_{(job='Manager') \wedge (name='Sales')}$; intermediate operations: $\sigma_{emp.deptno = dept.deptno}$, then $\times$; leaves: EMP, DEPT.]

Relational Algebra Query Trees

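Alternative 'query tree' for the query to list all the managers that work in the sales department:

$\sigma_{(job='Manager') \wedge (name='Sales')}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$

[Figure: query tree. Root: $\sigma_{(job='Manager') \wedge (name='Sales')}$; intermediate operation: $\bowtie_{emp.deptno = dept.deptno}$; leaves: EMP, DEPT.]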

Relational Algebra Query Trees

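Alternative 'query tree' for the query to list all the managers that work in the sales department:

$(\sigma_{job='Manager'}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\sigma_{name='Sales'}(DEPT))$

[Figure: query tree. Root: $\bowtie_{emp.deptno = dept.deptno}$; intermediate operations: $\sigma_{job='Manager'}$ over EMP and $\sigma_{name='Sales'}$ over DEPT; leaves: EMP, DEPT.]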

Query Processing Stage - 2

Convert to canonical form

Find a more ‘efficient’ representation of the query by converting the internal representation into some equivalent (canonical) form through the application of a set of well-defined ‘transformation rules’.

The set of transformation rules to apply will generally be the result of the application of specific heuristic processing strategies associated with particular DBMSs.

Transformation Rules for RA Operations

1. Conjunctive selection operations can cascade into individual selection operations (and vice versa).

Sometimes referred to as cascade of selection.

$\sigma_{p \wedge q \wedge r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$

Example:

$\sigma_{deptno=10 \wedge sal>1000}(Emp) = \sigma_{deptno=10}(\sigma_{sal>1000}(Emp))$

2. Commutativity of selection

$\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$

Example:

$\sigma_{sal>1000}(\sigma_{deptno=10}(Emp)) = \sigma_{deptno=10}(\sigma_{sal>1000}(Emp))$
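The selection rules can be checked on a toy relation. The sketch below is illustrative only: `select` is a hypothetical helper that models σ as filtering a list of rows, and the Emp rows are invented.

```python
# Hypothetical Emp rows; only the attributes used by the examples are shown.
EMP = [
    {"empno": 1, "deptno": 10, "sal": 800},
    {"empno": 2, "deptno": 10, "sal": 1500},
    {"empno": 3, "deptno": 20, "sal": 2000},
    {"empno": 4, "deptno": 10, "sal": 3000},
]


def select(pred, rel):
    """Relational selection: keep the rows of rel that satisfy pred."""
    return [row for row in rel if pred(row)]


def in_dept_10(row):
    return row["deptno"] == 10


def earns_over_1000(row):
    return row["sal"] > 1000


conjunctive = select(lambda r: in_dept_10(r) and earns_over_1000(r), EMP)
cascaded = select(in_dept_10, select(earns_over_1000, EMP))  # rule 1
swapped = select(earns_over_1000, select(in_dept_10, EMP))   # rule 2

assert conjunctive == cascaded == swapped
print([row["empno"] for row in conjunctive])  # [2, 4]
```

All three forms return the same rows; an optimiser is therefore free to split a conjunction and apply its most selective part first.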

3. In a sequence of projection operations, only the last (outermost) projection in the sequence is required.

$\Pi_L \Pi_M \cdots \Pi_N(R) = \Pi_L(R)$

Example:

$\Pi_{deptno}(\Pi_{deptno, name}(Dept)) = \Pi_{deptno}(Dept)$

4. Commutativity of selection and projection.

$\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R))$

where $p$ involves only the attributes $\{A_1, A_2, \ldots, A_m\}$: the selection predicate is made up only of projected attributes.

Example:

$\Pi_{name, job}(\sigma_{name='Smith'}(Emp)) = \sigma_{name='Smith'}(\Pi_{name, job}(Emp))$

5. Commutativity of theta-join (and Cartesian product).

$R \bowtie_p S = S \bowtie_p R$

$R \times S = S \times R$

Example:

$EMP \bowtie_{emp.deptno = dept.deptno} DEPT = DEPT \bowtie_{emp.deptno = dept.deptno} EMP$

NOTE: Theta-join is a generalisation of both the equi-join and the natural join.

6. Commutativity of selection and theta-join (or Cartesian product).

$(\sigma_p(R)) \bowtie_r S = \sigma_p(R \bowtie_r S)$

$(\sigma_p(R)) \times S = \sigma_p(R \times S)$

where $p$ involves only the attributes $\{A_1, A_2, \ldots, A_n\}$ of $R$: the selection predicate may refer only to attributes of the relation it is applied to.

Example:

$(\sigma_{emp.deptno=10}(EMP)) \bowtie_{emp.deptno = dept.deptno} DEPT = \sigma_{emp.deptno=10}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT)$
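A small Python sketch of this rule, assuming hypothetical helpers `select` and `theta_join` (a naive nested loop over lists of rows) and invented EMP/DEPT rows. Attribute names are qualified with the relation name so that joined rows can be built by merging dictionaries.

```python
# Hypothetical rows with relation-qualified attribute names, so that
# joined rows can be formed by merging dictionaries without name clashes.
EMP = [
    {"emp.empno": 1, "emp.deptno": 10},
    {"emp.empno": 2, "emp.deptno": 20},
    {"emp.empno": 3, "emp.deptno": 10},
]
DEPT = [
    {"dept.deptno": 10, "dept.name": "Accounts"},
    {"dept.deptno": 20, "dept.name": "Sales"},
]


def select(pred, rel):
    return [row for row in rel if pred(row)]


def theta_join(r, s, cond):
    """Naive theta-join: combine every pair of rows that satisfies cond."""
    return [{**a, **b} for a in r for b in s if cond(a, b)]


def same_dept(e, d):
    return e["emp.deptno"] == d["dept.deptno"]


def in_dept_10(row):
    return row["emp.deptno"] == 10


late = select(in_dept_10, theta_join(EMP, DEPT, same_dept))   # right-hand side
early = theta_join(select(in_dept_10, EMP), DEPT, same_dept)  # left-hand side

assert early == late
print(len(theta_join(EMP, DEPT, same_dept)), "rows are joined before the late selection")
print(len(select(in_dept_10, EMP)), "EMP rows enter the join when the selection is pushed down")
```

Both sides return the same rows, but applying the selection first means fewer EMP rows take part in the join; this saving is what motivates performing selections as early as possible.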

7. Commutativity of projection and theta-join (or Cartesian product).

$\Pi_L(R \bowtie_r S) = (\Pi_{L_1}(R)) \bowtie_r (\Pi_{L_2}(S))$

Projection attributes $L = L_1 \cup L_2$, where $L_1$ are attributes of $R$ and $L_2$ are attributes of $S$. $L$ must also contain the join attributes.

Example:

$\Pi_{job, location, deptno}(EMP \bowtie_{emp.deptno = dept.deptno} DEPT) = (\Pi_{job, deptno}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\Pi_{location, deptno}(DEPT))$
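This rule can be sketched the same way, with a hypothetical `project` helper that uses set semantics (duplicates removed, as in relational algebra). Both projection lists keep the join attribute deptno; without it the join on the right-hand side could not be evaluated. The rows and helper names are invented for illustration.

```python
EMP = [
    {"emp.ename": "Smith", "emp.job": "Clerk", "emp.deptno": 10},
    {"emp.ename": "Jones", "emp.job": "Manager", "emp.deptno": 20},
    {"emp.ename": "Ford", "emp.job": "Clerk", "emp.deptno": 10},
]
DEPT = [
    {"dept.deptno": 10, "dept.name": "Accounts", "dept.location": "London"},
    {"dept.deptno": 20, "dept.name": "Sales", "dept.location": "Leeds"},
]


def project(attrs, rel):
    """Projection with set semantics: keep the listed attributes, drop duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple((a, row[a]) for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(key))
    return out


def theta_join(r, s, cond):
    return [{**a, **b} for a in r for b in s if cond(a, b)]


def same_dept(e, d):
    return e["emp.deptno"] == d["dept.deptno"]


def as_set(rel):
    """Compare relations as sets of rows, ignoring row and column order."""
    return {tuple(sorted(row.items())) for row in rel}


L1 = ["emp.job", "emp.deptno"]         # attributes of EMP, incl. the join attribute
L2 = ["dept.location", "dept.deptno"]  # attributes of DEPT, incl. the join attribute

late = project(L1 + L2, theta_join(EMP, DEPT, same_dept))
early = theta_join(project(L1, EMP), project(L2, DEPT), same_dept)

assert as_set(early) == as_set(late)
print(len(as_set(early)), "distinct result rows")
```

Projecting first shrinks the inputs to the join (here the duplicate ('Clerk', 10) row from EMP is dropped before joining), while the final result is unchanged.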

8. Commutativity of union and intersection (but not set difference).

$R \cup S = S \cup R$

$R \cap S = S \cap R$

9. Commutativity of selection and set operations (union, intersection, and set difference).

Union

$\sigma_p(R \cup S) = \sigma_p(S) \cup \sigma_p(R)$

Intersection

$\sigma_p(R \cap S) = \sigma_p(S) \cap \sigma_p(R)$

Set Difference

$\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$
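Rule 9 can be checked with Python sets, which have the same set semantics as the relational operators. This is a sketch with invented, union-compatible relations R and S and a hypothetical predicate p (deptno = 10); the last assertion shows that, for set difference, the operands must stay in order: $\sigma_p(R - S) = \sigma_p(R) - \sigma_p(S)$.

```python
# Hypothetical union-compatible relations R and S as sets of (ename, deptno) tuples.
R = {("Smith", 10), ("Jones", 20), ("Ford", 10)}
S = {("Ford", 10), ("Blake", 30)}


def select(pred, rel):
    return {t for t in rel if pred(t)}


def p(t):
    return t[1] == 10  # deptno = 10


assert select(p, R | S) == select(p, S) | select(p, R)  # union
assert select(p, R & S) == select(p, S) & select(p, R)  # intersection
assert select(p, R - S) == select(p, R) - select(p, S)  # set difference
# Swapping the operands of the difference gives a different (here empty) result:
assert select(p, R - S) != select(p, S) - select(p, R)
print(select(p, R - S))  # {('Smith', 10)}
```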

10. Commutativity of projection and union

$\Pi_L(R \cup S) = \Pi_L(S) \cup \Pi_L(R)$

11. Associativity of natural join (and Cartesian product)

Natural Join

$(R \bowtie S) \bowtie T = R \bowtie (S \bowtie T)$

Cartesian Product

$(R \times S) \times T = R \times (S \times T)$

12. Associativity of union and intersection (but not set difference)

Union

$(R \cup S) \cup T = R \cup (S \cup T)$

Intersection

$(R \cap S) \cap T = R \cap (S \cap T)$

Heuristic Processing Strategies

Perform selection operations as early as possible

Translate a Cartesian product and subsequent selection (whose predicate represents a join condition) into a join operation.

Use associativity of binary operations to ensure that the most restrictive selection operations are executed first

Perform projections as early as possible.

Compute common expressions once

Heuristic Processing - Example

[Figure: successive query trees for the query listing all managers in the sales department. Starting from the canonical tree $\sigma_{(job='Manager') \wedge (name='Sales') \wedge (emp.deptno = dept.deptno)}(EMP \times DEPT)$, the conjunctive selection is split, the Cartesian product and the join-condition selection are combined into $\bowtie_{emp.deptno = dept.deptno}$, and the remaining selections are pushed down to the leaves, giving the optimised canonical query $(\sigma_{job='Manager'}(EMP)) \bowtie_{emp.deptno = dept.deptno} (\sigma_{name='Sales'}(DEPT))$.]
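The rewrite in this example can be sketched in Python for the single pattern it uses: a conjunctive selection sitting directly on a Cartesian product. The classes `Pred`, `Rel`, `Select`, `Product` and `Join` and the functions `push_down` and `show` are hypothetical, a minimal sketch rather than how any particular DBMS represents query trees. Each predicate records which attributes it refers to; conjuncts that mention only one relation are moved onto that relation, and the rest become the join condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pred:
    text: str          # e.g. "job = 'Manager'"
    attrs: frozenset   # qualified attributes the predicate refers to


@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset


@dataclass(frozen=True)
class Select:
    preds: tuple
    child: object


@dataclass(frozen=True)
class Product:
    left: object
    right: object


@dataclass(frozen=True)
class Join:
    preds: tuple
    left: object
    right: object


def attrs_of(node):
    if isinstance(node, Rel):
        return node.attrs
    if isinstance(node, Select):
        return attrs_of(node.child)
    return attrs_of(node.left) | attrs_of(node.right)


def push_down(node):
    """Rewrite a selection sitting directly on a Cartesian product:
    single-relation conjuncts move onto that relation (rules 1 and 6),
    and the remaining conjuncts turn the product into a join."""
    if not (isinstance(node, Select) and isinstance(node.child, Product)):
        return node
    left, right = node.child.left, node.child.right
    left_attrs, right_attrs = attrs_of(left), attrs_of(right)
    to_left = tuple(p for p in node.preds if p.attrs <= left_attrs)
    to_right = tuple(p for p in node.preds
                     if p.attrs <= right_attrs and p not in to_left)
    join_preds = tuple(p for p in node.preds
                       if p not in to_left and p not in to_right)
    if to_left:
        left = Select(to_left, left)
    if to_right:
        right = Select(to_right, right)
    return Join(join_preds, left, right) if join_preds else Product(left, right)


def show(node):
    if isinstance(node, Rel):
        return node.name
    if isinstance(node, Select):
        return f"σ[{' ∧ '.join(p.text for p in node.preds)}]({show(node.child)})"
    if isinstance(node, Product):
        return f"({show(node.left)} × {show(node.right)})"
    conds = " ∧ ".join(p.text for p in node.preds)
    return f"({show(node.left)} ⋈[{conds}] {show(node.right)})"


EMP = Rel("EMP", frozenset({"emp.empno", "emp.job", "emp.deptno"}))
DEPT = Rel("DEPT", frozenset({"dept.deptno", "dept.name"}))
query = Select(
    (Pred("job = 'Manager'", frozenset({"emp.job"})),
     Pred("name = 'Sales'", frozenset({"dept.name"})),
     Pred("emp.deptno = dept.deptno", frozenset({"emp.deptno", "dept.deptno"}))),
    Product(EMP, DEPT),
)
print(show(query))
print(show(push_down(query)))
```

A real optimiser applies such rewrites recursively over the whole tree and also handles projections and join ordering; this sketch handles one level only.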

Query Processing Stage - 3

Choose candidate low-level procedures

Consider the (optimised canonical) query as a series of low-level operations (join, restrict, etc…).

For each of these operations generate alternative execution strategies and calculate the cost of such strategies on the basis of statistical information held about the database tables (files).

Query Processing Stage - 4

Generate query plans and choose the cheapest

Construct a set of ‘candidate’ Query Execution Plans (QEPs).

Each QEP is constructed by selecting a candidate implementation procedure for each operation in the canonical query and then combining them to form a string of associated operations.

Each QEP will have an (estimated) cost associated with it – the sum of the cost of each of its operations.

Choose the QEP with the least cost.
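Stage 4 can be sketched as an exhaustive search that picks one candidate procedure per operation. The operation names, procedures and costs below are made up for illustration, and the total cost is taken as a simple sum, as described above; `enumerate_qeps` and `cheapest_qep` are hypothetical names.

```python
from itertools import product

# Hypothetical candidate procedures and estimated costs (block accesses)
# for each operation of the optimised canonical query; the numbers are made up.
CANDIDATES = {
    "select job='Manager' on EMP": {"linear search": 100, "secondary index": 12},
    "select name='Sales' on DEPT": {"linear search": 5, "hash on name": 1},
    "join over deptno": {"block nested loop": 30, "sort-merge": 20, "hash join": 15},
}


def enumerate_qeps(candidates):
    """Yield (total cost, plan) for every combination of one procedure per operation."""
    ops = list(candidates)
    for choice in product(*(candidates[op] for op in ops)):
        plan = dict(zip(ops, choice))
        yield sum(candidates[op][proc] for op, proc in plan.items()), plan


def cheapest_qep(candidates):
    return min(enumerate_qeps(candidates), key=lambda cost_plan: cost_plan[0])


cost, plan = cheapest_qep(CANDIDATES)
print(cost, plan)
```

Exhaustive enumeration grows multiplicatively with the number of operations, and in practice the cost of one operation can depend on the procedure chosen for another (for example, sorted output from one step can make a sort-merge join cheaper), so real optimisers prune the search rather than treating costs as independent.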

Cost Based Optimisation

Cost Based Optimisation (stages 3 & 4)

A good declarative query optimiser does not rely solely on heuristic processing strategies.

It chooses the QEP with the lowest estimated cost.

After heuristic rules are applied to a query, there still remain a number of alternative ways to execute it.

The Query Optimiser estimates the cost of executing each one (or at least a number) of these alternatives, and selects the cheapest one.

Costs associated with query execution

• Secondary storage access costs: searching for data blocks on disk, reading data blocks from disk, writing data blocks to disk
• Storage costs: cost of storing intermediate (temporary) files
• Computation costs: cost of CPU usage
• Main memory usage costs: cost of buffering data
• Communication costs: cost of moving data across the network

Database statistics used in cost estimation

Information held on each relation:

• number of tuples
• number of blocks
• blocking factor
• primary access method
• primary access attributes
• secondary indexes
• secondary indexing attributes
• number of levels for each index
• number of distinct values of each attribute
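Two of the simplest uses of these statistics are sketched below: the number of blocks follows from the number of tuples and the blocking factor, and the number of tuples matching an equality predicate can be estimated as number of tuples / number of distinct values, assuming the values are uniformly distributed. The class `RelationStats` is hypothetical; the 1000 EMP tuples come from the motivation example, while the blocking factor and distinct-value counts are invented (20 distinct jobs gives the 50 managers used earlier).

```python
import math
from dataclasses import dataclass, field


@dataclass
class RelationStats:
    n_tuples: int
    blocking_factor: int                            # tuples that fit in one block
    n_distinct: dict = field(default_factory=dict)  # attribute -> distinct values

    @property
    def n_blocks(self) -> int:
        return math.ceil(self.n_tuples / self.blocking_factor)

    def equality_cardinality(self, attr: str) -> float:
        """Estimated tuples satisfying attr = constant, assuming uniformly distributed values."""
        return self.n_tuples / self.n_distinct[attr]


# EMP has 1000 tuples as in the motivation example; the blocking factor and
# distinct-value counts are hypothetical.
emp = RelationStats(n_tuples=1000, blocking_factor=10, n_distinct={"job": 20, "deptno": 50})
print(emp.n_blocks)                        # 100
print(emp.equality_cardinality("job"))     # 50.0
print(emp.equality_cardinality("deptno"))  # 20.0
```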

Physical Data Structures – File Types

• Heap (Sequential, Unordered): no key columns; queries, other than appends, scan every page; rows are appended at the end; duplicate rows are allowed
• Ordered: physically sorted data file with no index
• Hash (Random, Direct): data is located based on the (calculated) value of a hash field (key)
• Indexed Sequential (ISAM): sorted data file with a primary index
• B+Tree: dynamic multilevel index; reuses deleted space on associated data pages

Strategies for implementing the RESTRICT operation

There are different access strategies, depending on the structure of the file in which the relation is stored and on whether the predicate attribute(s) have been indexed or hashed. Each uses a different cost formula (which refers to specific database statistics):

• Linear search (heap)
• Binary search (ordered)
• Equality on hash key
• Equality condition on primary key
• Inequality condition on primary key
• Equality condition on secondary index
• Inequality condition on secondary B+Tree index

If the selection predicate is a composite (AND & OR), there are additional cost considerations!
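As an illustration, the cost (in block accesses) of three of these strategies is sketched below using textbook-style approximations; the exact formulas are an assumption here and vary between textbooks and DBMSs. The function names and the file statistics are hypothetical.

```python
import math


def linear_search_cost(n_blocks: int, equality_on_key: bool) -> int:
    """Heap file: a unique key match is found after scanning half the file on
    average; any other predicate requires reading every block."""
    return math.ceil(n_blocks / 2) if equality_on_key else n_blocks


def binary_search_cost(n_blocks: int, matching_tuples: int, blocking_factor: int) -> int:
    """Ordered file, equality on the ordering attribute: binary search for the
    first match, then read the remaining blocks holding matching tuples."""
    return (math.ceil(math.log2(n_blocks))
            + math.ceil(matching_tuples / blocking_factor) - 1)


def hash_key_cost() -> int:
    """Equality on the hash key: one block access, assuming no overflow chains."""
    return 1


# Hypothetical 100-block file with 10 tuples per block and 50 matching tuples.
print(linear_search_cost(100, equality_on_key=True))   # 50
print(linear_search_cost(100, equality_on_key=False))  # 100
print(binary_search_cost(100, 50, 10))                 # 11
print(hash_key_cost())                                 # 1
```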

Strategies for implementing the JOIN operation

There are different join strategies, depending on the structure of the files in which the relations to be joined are stored and on whether the join attributes have been indexed or hashed. Each uses its own cost formula (which refers to specific database statistics):

• Block nested loop join
• Indexed nested loop join
• Sort-merge join
• Hash join
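Two of these join algorithms are sketched below on in-memory lists of tuples. This is illustrative only: real implementations work on disk blocks and buffers, the rows are invented, and indexed nested loop and sort-merge joins are not shown. Both functions produce the same join result; they differ in that the block nested loop join rescans the inner relation once per outer block, whereas the hash join makes a single build pass and a single probe pass.

```python
from collections import defaultdict

# Hypothetical rows: EMP(empno, deptno) and DEPT(deptno, name)
EMP = [(1, 10), (2, 20), (3, 10), (4, 30), (5, 20)]
DEPT = [(10, "Accounts"), (20, "Sales"), (30, "Research")]


def block_nested_loop_join(outer, inner, outer_col, inner_col, block_size=2):
    """Take the outer relation one block at a time and scan the whole inner
    relation once per outer block (block_size = tuples per block)."""
    result = []
    for start in range(0, len(outer), block_size):
        block = outer[start:start + block_size]
        for s in inner:  # one pass over the inner relation per outer block
            for r in block:
                if r[outer_col] == s[inner_col]:
                    result.append(r + s)
    return result


def hash_join(build, probe, build_col, probe_col):
    """Build an in-memory hash table on one relation (ideally the smaller),
    then probe it once with each tuple of the other."""
    table = defaultdict(list)
    for b in build:
        table[b[build_col]].append(b)
    return [p + b for p in probe for b in table.get(p[probe_col], [])]


bnl = block_nested_loop_join(EMP, DEPT, outer_col=1, inner_col=0)
hj = hash_join(build=DEPT, probe=EMP, build_col=0, probe_col=1)
assert sorted(bnl) == sorted(hj)
print(sorted(hj))
```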

Query Optimisation Summary

The aims of query processing are to transform a query written in a high-level language (SQL) into a correct and efficient execution strategy expressed in a low-level language (Relational Algebra), and to execute the strategy to retrieve the required data.

There are many equivalent transformations of the same high-level query; the DBMS has to choose the one that minimises resource usage.

There are two main techniques for query optimisation. The first uses heuristic rules that order the operations in a query. The second compares different execution strategies for those operations, based on their relative costs, and selects the least resource intensive (cheapest) ones.