
Completing the Physical-Query-Plan and Chapter 16 Summary (16.7-16.8)

CS257 Spring 2009
Professor Tsau Lin
Students: Suntorn Sae-Eung, Donavon Norwood

Outline

16.7 Completing the Physical-Query-Plan
I. Choosing a Selection Method
II. Choosing a Join Method
III. Pipelining Versus Materialization
IV. Pipelining Unary Operations
V. Pipelining Binary Operations
VI. Notation for Physical Query Plans
VII. Ordering the Physical Operations

16.8 Summary of Chapter 16

Before Completing the Physical-Query-Plan

A query has previously been:
- Parsed and preprocessed (16.1)
- Converted to logical query plans (16.3)
- Given cost estimates for its operations (16.4)
- Costed by cost-based plan selection (16.5)
- Given a join order, chosen by weighing the costs of the join operations

16.7 Completing the Physical-Query-Plan

Three topics relate to turning a logical plan (LP) into a complete physical plan:
1. Choosing physical implementations, such as selection and join methods
2. Deciding whether intermediate results are materialized or pipelined
3. Notation for physical-query-plan operators

I. Choosing a Selection Method (A)

Algorithm for each selection operator:
1. Can we use an existing index on an attribute? If yes, use an index-scan; otherwise, use a table-scan.
2. After retrieving all tuples that satisfy the condition in (1), filter them with the remaining selection conditions.

Choosing a Selection Method (A) (cont.)

Recall: cost of a query = number of disk I/O's.
How the costs of various plans for a σ_C(R) operation are estimated:
1. Cost of the table-scan algorithm
   a) B(R) if R is clustered
   b) T(R) if R is not clustered
2. Cost of a plan that picks an equality term (e.g. a = 10) with an index-scan
   a) B(R) / V(R, a) with a clustering index
   b) T(R) / V(R, a) with a nonclustering index
3. Cost of a plan that picks an inequality term (e.g. b < 20) with an index-scan
   a) B(R) / 3 with a clustering index
   b) T(R) / 3 with a nonclustering index
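The cost rules above can be written as three small functions. This is a minimal Python sketch, not part of the original slides; the function names and the sample numbers in the usage lines are assumptions chosen only for illustration.

```python
def table_scan_cost(B, T, clustered):
    """Disk I/O's to read all of R: B(R) if R is clustered, else T(R)."""
    return B if clustered else T


def index_scan_equality_cost(B, T, V, clustering_index):
    """Estimated disk I/O's for an index-scan on an equality term a = c.

    V is V(R, a), the number of distinct values of attribute a.
    """
    return (B if clustering_index else T) / V


def index_scan_inequality_cost(B, T, clustering_index):
    """Estimated disk I/O's for an index-scan on an inequality term such as b < c."""
    return (B if clustering_index else T) / 3


# Hypothetical relation: 1,000 blocks, 20,000 tuples, 50 distinct values of a.
print(table_scan_cost(B=1000, T=20000, clustered=True))
print(index_scan_equality_cost(B=1000, T=20000, V=50, clustering_index=False))
print(index_scan_inequality_cost(B=900, T=20000, clustering_index=True))
```

The estimates are returned as fractions of blocks or tuples; rounding up to whole disk I/O's is left to the caller.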

Example

Selection: σ_{x=1 AND y=2 AND z<5}(R)
- The parameters of R(x, y, z) are: T(R) = 5000, B(R) = 200, V(R, x) = 100, and V(R, y) = 500.
- Relation R is clustered.
- x and y have nonclustering indexes; only the index on z is clustering.

Example (cont.)

Selection options:
1. Table-scan, then filter on x, y, z. Cost is B(R) = 200, since R is clustered.
2. Use the index on x = 1, then filter on y, z. Cost is 50, since T(R) / V(R, x) = 5000/100 = 50 tuples and the index is not clustering.
3. Use the index on y = 2, then filter on x, z. Cost is 10, since T(R) / V(R, y) = 5000/500 = 10 tuples using a nonclustering index.
4. Index-scan on the clustering index with z < 5, then filter on x, y. Cost is about B(R)/3 = 67.

Example (cont.)

Costs:
- option 1 = 200
- option 2 = 50
- option 3 = 10
- option 4 = 67

The lowest cost is option 3. Therefore, the preferred physical plan
1. retrieves all tuples with y = 2,
2. then filters them on the remaining two conditions (on x and z).
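The comparison of the four options can be reproduced in a few lines. This is a sketch in Python using the parameters of the example; the dictionary layout and the option labels are illustrative assumptions, and option 4 is rounded up to whole blocks as on the slide.

```python
import math

# Parameters of R(x, y, z) from the example.
T_R, B_R = 5000, 200
V = {"x": 100, "y": 500}

options = {
    "1: table-scan, filter x, y, z": B_R,                            # R is clustered
    "2: index on x = 1, filter y, z": T_R / V["x"],                  # nonclustering index
    "3: index on y = 2, filter x, z": T_R / V["y"],                  # nonclustering index
    "4: clustering index on z < 5, filter x, y": math.ceil(B_R / 3),
}

best = min(options, key=options.get)
for name, cost in options.items():
    print(f"{name}: {cost}")
print("preferred:", best)
```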

II. Choosing a Join Method

Determine the costs associated with each join algorithm:
1. One-pass join and nested-loop join are chosen in the hope that enough buffers can be devoted to the join.
2. Sort-join is preferred when the arguments are already sorted on the join attribute, or when two or more joins are on the same attribute, such as
   (R(a, b) ⋈ S(a, c)) ⋈ T(a, d)
   where sorting R and S on a produces a result of R ⋈ S that is sorted on a and can be used directly in the next join.

Choosing a Join Method (cont.)

3. Index-join suits a join that has a good chance of using an index created on the join attribute, such as R(a, b) ⋈ S(b, c).
4. Hash-join is usually the best choice for unsorted or unindexed relations that need a multipass join.
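One way to read the four guidelines is as a decision procedure. The sketch below is a hypothetical Python encoding: the flag names and the order in which the checks are applied are assumptions, and a real optimizer would compare estimated costs rather than follow fixed rules.

```python
def choose_join_method(enough_buffers_for_one_pass,
                       args_sorted_on_join_attr,
                       later_join_on_same_attr,
                       index_on_join_attr):
    """Return the name of a join method suggested by the guidelines.

    All four parameters are booleans describing the situation at hand.
    """
    if enough_buffers_for_one_pass:
        return "one-pass join"
    if args_sorted_on_join_attr or later_join_on_same_attr:
        return "sort-join"
    if index_on_join_attr:
        return "index-join"
    # No sorted inputs and no usable index, and a multipass join is needed.
    return "hash-join"


print(choose_join_method(False, False, True, False))
print(choose_join_method(False, False, False, False))
```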

III. Pipelining Versus Materialization

Materialization (the naive way)
- Store the (intermediate) result of each operation on disk.

Pipelining (the more efficient way)
- Interleave the execution of several operations; the tuples produced by one operation are passed directly to the operation that uses them.
- Hold the (intermediate) result of each operation in a buffer, which resides in main memory.

IV. Pipelining Unary Operations

- Unary operations work either a tuple at a time or on a full relation.
- Selection and projection are the best candidates for pipelining.

[Figure: R -> In buf -> Unary operation -> Out buf -> In buf -> Unary operation -> Out buf; label: M-1 buffers]

Pipelining Unary Operations (cont.)

- Pipelined unary operations are implemented by iterators.
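As an illustration of iterator-based pipelining, Python generators can stand in for the iterator interface: each operator pulls one tuple at a time from its child, so no intermediate relation is stored. The operator names and the sample relation are assumptions made for this sketch.

```python
def table_scan(relation):
    """Leaf operator: yield the tuples of a stored relation one at a time."""
    for t in relation:
        yield t


def select(predicate, child):
    """Pipelined selection: pass along only the tuples that satisfy the predicate."""
    for t in child:
        if predicate(t):
            yield t


def project(attrs, child):
    """Pipelined projection: keep only the listed attributes of each tuple."""
    for t in child:
        yield {a: t[a] for a in attrs}


# Hypothetical relation R(x, y, z) held as a list of dictionaries.
R = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 5, "z": 9},
    {"x": 4, "y": 2, "z": 1},
]

# Nothing runs until the consumer asks for tuples; each tuple then flows
# through selection and projection without an intermediate relation.
plan = project(["x", "z"], select(lambda t: t["y"] == 2, table_scan(R)))
result = list(plan)
print(result)
```

Generators are only one way to model this; an explicit class with open, get-next, and close methods would express the same idea.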

V. Pipelining Binary Operations

- Binary operations: ∪, ∩, −, ⋈, ×
- The results of binary operations can also be pipelined.
- Use one buffer to pass the result to its consumer, one block at a time.
- The extended example shows the tradeoffs and opportunities.

Example

Consider a physical query plan for the expression
(R(w, x) ⋈ S(x, y)) ⋈ U(y, z)

Assumptions:
- R occupies 5,000 blocks; S and U each occupy 10,000 blocks.
- The intermediate result R ⋈ S occupies k blocks for some k.
- Both joins are implemented as hash-joins, either one-pass or two-pass depending on k.
- There are 101 buffers available.

Example (cont.)

- First consider the join R ⋈ S. Neither relation fits in the buffers.
- A two-pass hash-join is needed: partition R into 100 buckets (the maximum possible), so each bucket has 50 blocks.
- The second pass of the hash-join uses 51 buffers, leaving the remaining 50 buffers for joining the result of R ⋈ S with U.
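The buffer arithmetic behind these numbers is short enough to check directly. This is a sketch under the usual assumption that one buffer is reserved for reading input during partitioning, and that the second pass needs one extra buffer to read the matching bucket of S a block at a time; the variable names are illustrative.

```python
M = 101      # buffers available, from the example
B_R = 5000   # blocks of R, the smaller relation

buckets = M - 1                                # maximum number of buckets
blocks_per_bucket = B_R // buckets             # size of each bucket of R
second_pass_buffers = blocks_per_bucket + 1    # one bucket of R plus one block of S
left_for_second_join = M - second_pass_buffers

print(buckets, blocks_per_bucket, second_pass_buffers, left_for_second_join)
```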

Example (cont.)

Case 1: suppose k ≤ 49, so the result of R ⋈ S occupies at most 49 blocks.
Steps:
1. Pipeline the result of R ⋈ S into 49 buffers.
2. Organize those tuples as a hash table for lookup.
3. Use the one remaining buffer to read each block of U in turn.
4. Execute the second join as a one-pass join.

Example (cont.)

The total number of disk I/O's is 55,000:
- 45,000 for the two-pass hash-join of R and S
- 10,000 to read U for the one-pass hash-join of (R ⋈ S) ⋈ U

Example (cont.)

Case 2: suppose k > 49 but k ≤ 5,000. We can still pipeline, but need another strategy, in which the intermediate result is joined with U in a 50-bucket, two-pass hash-join. Steps:
1. Before starting on R ⋈ S, hash U into 50 buckets of 200 blocks each.
2. Perform the two-pass hash-join of R and S using 51 buffers as in case 1, placing the results in the 50 remaining buffers to form 50 buckets for the join of R ⋈ S with U.
3. Finally, join R ⋈ S with U bucket by bucket.

Example (cont.)

The number of disk I/O's is:
- 20,000 to read U and write its tuples into buckets
- 45,000 for the two-pass hash-join R ⋈ S
- k to write out the buckets of R ⋈ S
- k + 10,000 to read the buckets of R ⋈ S and U in the final join

The total cost is 75,000 + 2k.

Example (cont.)

Compare the growth in I/O's between case 1 and case 2:
- k ≤ 49 (case 1): disk I/O's is 55,000
- 50 ≤ k ≤ 5,000 (case 2):
  - k = 50: I/O's is 75,000 + (2*50) = 75,100
  - k = 51: I/O's is 75,000 + (2*51) = 75,102
  - k = 52: I/O's is 75,000 + (2*52) = 75,104

Notice: the I/O count jumps discontinuously, from 55,000 to 75,100, as k increases from 49 to 50.

Example (cont.)

Case 3: k > 5,000. We cannot perform a two-pass join in the 50 buffers available if the result of R ⋈ S is pipelined. Steps:
1. Compute R ⋈ S using a two-pass join and store the result on disk.
2. Join the result of (1) with U, using a two-pass join.

Example (cont.)

The number of disk I/O's is:
- 45,000 for the two-pass hash-join of R and S
- k to store R ⋈ S on disk
- 30,000 + 3k for the two-pass hash-join of U with R ⋈ S

The total cost is 75,000 + 4k.

Example (cont.)

In summary, the costs of the physical plan as a function of the size of R ⋈ S:

Range of k        Pipeline or materialize   Algorithm for final join   Total disk I/O's
k ≤ 49            Pipeline                  One-pass                   55,000
50 ≤ k ≤ 5,000    Pipeline                  50-bucket, two-pass        75,000 + 2k
k > 5,000         Materialize               Two-pass                   75,000 + 4k
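The three cases of the example can be combined into one cost function of k. This is a sketch in Python; the function name is an assumption, and the thresholds 49 and 5,000 are hard-coded because they follow from the 101 buffers of this particular example.

```python
def total_disk_io(k, B_R=5000, B_S=10000, B_U=10000):
    """Total disk I/O's for (R join S) join U when R join S occupies k blocks."""
    first_join = 3 * (B_R + B_S)        # two-pass hash-join of R and S: 45,000
    if k <= 49:
        # Case 1: pipeline R join S into memory, read U once (one-pass join).
        return first_join + B_U
    if k <= 5000:
        # Case 2: pre-partition U (read + write), write k blocks of buckets,
        # then read the buckets of R join S and of U in the final join.
        return 2 * B_U + first_join + k + (k + B_U)
    # Case 3: materialize R join S (k writes), then a two-pass hash-join with U.
    return first_join + k + 3 * (B_U + k)


for k in (49, 50, 51, 52, 5000, 5001):
    print(k, total_disk_io(k))
```

Running the loop makes the jump between k = 49 and k = 50 visible, as well as the change of slope from 2k to 4k past k = 5,000.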

VI. Notation for Physical Query Plans

Several types of operators:
1. Operators for leaves
2. (Physical) operators for selection
3. (Physical) sort operators
4. Other relational-algebra operations

In practice, each DBMS uses its own internal notation for physical query plans.

Notation for Physical Query Plans (cont.)

1. Operators for leaves
Each leaf operand of the LQP tree is replaced by a scan operator:
- TableScan(R): read all blocks of R.
- SortScan(R, L): read R in the order given by the attribute list L.
- IndexScan(R, C): use an index on attribute A to retrieve the tuples that satisfy condition C of the form Aθc.
- IndexScan(R, A): retrieve all of R through the index on attribute R.A. This behaves like TableScan but can be more efficient if R is not clustered.

Notation for Physical Query Plans (cont.)

2. (Physical) operators for selection
The logical operator σ_C(R) is often combined with access methods.
- If there is no index on R, or none on an attribute in condition C, replace σ_C(R) by Filter(C) and use TableScan(R) or SortScan(R, L) to access R.
- If condition C has the form Aθc AND D for some condition D, and there is an index on R.A, then we may use the operator IndexScan(R, Aθc) to access R and use Filter(D) in place of the selection σ_C(R).
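The two rules for replacing σ_C(R) can be sketched as a small function that emits operator names bottom-up. This is a hypothetical Python illustration: the representation of conditions as (attribute, operator, constant) triples is an assumption, and the function simply takes the first indexed term rather than the cheapest one.

```python
def plan_selection(relation, conditions, indexed_attrs):
    """Return the physical operators for a selection, listed bottom-up.

    conditions    -- list of (attribute, operator, constant) terms joined by AND
    indexed_attrs -- set of attributes of the relation that have an index
    """
    def text(terms):
        return " AND ".join(f"{a} {op} {c}" for a, op, c in terms)

    for i, (attr, op, const) in enumerate(conditions):
        if attr in indexed_attrs:
            rest = conditions[:i] + conditions[i + 1:]
            plan = [f"IndexScan({relation}, {attr} {op} {const})"]
            if rest:
                plan.append(f"Filter({text(rest)})")
            return plan
    # No usable index: scan the whole relation and filter on the full condition.
    return [f"TableScan({relation})", f"Filter({text(conditions)})"]


conds = [("x", "=", 1), ("y", "=", 2), ("z", "<", 5)]
print(plan_selection("R", conds, indexed_attrs={"y"}))
print(plan_selection("R", conds, indexed_attrs=set()))
```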

Notation for Physical Query Plans (cont.)

3. (Physical) sort operators
- Sorting can occur at any point in the physical plan. A stored relation can be sorted as it is read with SortScan(R, L).
- It is common to use an explicit operator Sort(L) to sort a relation that is not stored.
- Sort(L) can also be applied at the top of the physical-query-plan tree if the result needs to be sorted because of an ORDER BY clause (τ).

Notation for Physical Query Plans (cont.)

4. Other relational-algebra operations
Each operator is labeled with descriptive text and symbols that convey:
- The operation performed, e.g. join or grouping.
- The necessary parameters, e.g. the condition of a theta-join or the list of elements in a grouping.
- A general strategy for the algorithm, e.g. sort-based, hash-based, or index-based.
- A decision about the number of passes to be used, e.g. one-pass, two-pass, or multipass.
- The anticipated number of buffers the operation will require.

Notation for Physical Query Plans (cont.)

Example of a physical-query-plan

A physical-query-plan in Example 16.36 for the case k > 5000:
- TableScan
- Two-pass hash join
- Materialize (double line)
- Store operator

Notation for Physical Query Plans (cont.)

Another example

A physical-query-plan in Example 16.36 for the case k ≤ 49:
- TableScan
- (2) Two-pass hash join
- Pipelining
- Different buffer needs
- Store operator

Notation for Physical Query Plans (cont.)

A physical-query-plan in Example 16.35:
- Use the index on the condition y = 2 first.
- Filter on the remaining conditions afterwards.

VII. Ordering of Physical Operations

- The PQP is represented as a tree structure, which implies an order of operations.
- Still, the order of evaluation of interior nodes may not always be clear:
  - Iterators are used in a pipelined manner.
  - The execution times of the various nodes overlap, so a strict "ordering" makes no sense.

Ordering of Physical Operations (cont.)

Three rules summarize the ordering of events in a PQP tree:
1. Break the tree into subtrees at each edge that represents materialization. Execute one subtree at a time.
2. Order the execution of the subtrees
   - bottom-up
   - left-to-right
3. All nodes of each subtree are executed simultaneously.
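The three rules can be sketched as a traversal that cuts the plan tree at materialized edges and lists the resulting subtrees in execution order. This is a minimal Python illustration; the `PlanNode` class, its `materialized` flag, and the node labels are assumptions invented for the sketch.

```python
class PlanNode:
    def __init__(self, name, children=(), materialized=False):
        self.name = name
        self.children = list(children)
        # True when the edge from this node up to its parent is materialized.
        self.materialized = materialized


def execution_groups(root):
    """Return lists of node names; each list is one subtree run as a unit."""
    groups = []

    def collect(node, group):
        group.append(node.name)
        for child in node.children:            # rule 2: left-to-right
            if child.materialized:
                run_subtree(child)             # rule 1: cut at the materialized edge
            else:
                collect(child, group)          # rule 3: same pipelined group

    def run_subtree(subroot):
        group = []
        collect(subroot, group)
        groups.append(group)                   # rule 2: lower subtrees come first

    run_subtree(root)
    return groups


# Plan shaped like the k > 5,000 case: the first join is materialized.
first_join = PlanNode("two-pass hash-join of R and S",
                      [PlanNode("TableScan(R)"), PlanNode("TableScan(S)")],
                      materialized=True)
final_join = PlanNode("two-pass hash-join with U",
                      [first_join, PlanNode("TableScan(U)")])

for step, group in enumerate(execution_groups(final_join), start=1):
    print(step, group)
```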

Summary of Chapter 16

In this part of the presentation I will talk about the main topics of Chapter 16.

COMPILATION OF QUERIES

- Compilation means turning a query into a physical query plan, which can be implemented by the query engine.
- Steps of query compilation:
  - Parsing
  - Semantic checking
  - Selection of the preferred logical query plan
  - Generating the best physical plan

THE PARSER

- The first step of SQL query processing.
- Generates a parse tree.
- Nodes in the parse tree correspond to the SQL constructs.
- Similar to the parser in a compiler for a programming language.

VIEW EXPANSION

- A critical part of query compilation.
- Expands the view references in the query tree into the actual view definitions.
- Provides opportunities for query optimization.

SEMANTIC CHECKING

- Checks the semantics of a SQL query.
- Examines a parse tree.
- Checks:
  - Attributes
  - Relation names
  - Types
- Resolves attribute references.

CONVERSION TO A LOGICAL QUERY PLAN

- Converts a semantically checked parse tree to an algebraic expression.
- Conversion is straightforward, but subqueries need to be optimized.
- The two-argument selection approach can be used.

ALGEBRAIC TRANSFORMATION

- There are many ways to transform a logical query plan into a better plan using algebraic transformations.
- The laws used for this transformation:
  - Commutative and associative laws
  - Laws involving selection
  - Pushing selection
  - Laws involving projection
  - Laws about joins and products
  - Laws involving duplicate elimination
  - Laws involving grouping and aggregation

ESTIMATING SIZES OF RELATIONS

- True running time is taken into consideration when selecting the best logical plan.
- The two factors that most affect the estimate of the sizes of relations:
  - Size of the relations (number of tuples)
  - Number of distinct values for each attribute of each relation
- Histograms are used by some systems.
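To show how the two statistics are used, here is a sketch of two commonly used size estimates. The formulas are not stated on the slide; they are the standard textbook estimates and are assumed here: an equality selection keeps about T(R) / V(R, a) tuples, and a natural join on attribute y yields about T(R) T(S) / max(V(R, y), V(S, y)) tuples. The function names and sample numbers are illustrative.

```python
def estimate_equality_selection(T_R, V_R_a):
    """Estimated tuples in the result of selecting a = c from R."""
    return T_R / V_R_a


def estimate_natural_join(T_R, T_S, V_R_y, V_S_y):
    """Estimated tuples in R join S on a single shared attribute y."""
    return T_R * T_S / max(V_R_y, V_S_y)


# Hypothetical statistics.
print(estimate_equality_selection(T_R=5000, V_R_a=500))
print(estimate_natural_join(T_R=1000, T_S=2000, V_R_y=20, V_S_y=50))
```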

COST-BASED OPTIMIZING

- The best physical query plan is the least costly plan.
- Factors that decide the cost of a query plan:
  - Order and grouping of operations like joins, unions, and intersections.
  - The join algorithm used, such as nested-loop or hash join.
  - Scanning and sorting operations.
  - Storing intermediate results.


PLAN ENUMERATION STRATEGIES

Common approaches for searching the space of physical plans for the best one:
  Dynamic programming: tabulating the best plan for each subexpression.
  Selinger-style dynamic programming: including the sort order of the results as part of the table.
  Greedy approaches: making a series of locally optimal decisions.
  Branch-and-bound: enumerating only plans that are not immediately known to be worse than the best plan found so far.
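The dynamic-programming idea can be sketched as a table keyed by sets of relations. The sketch below is illustrative only: it assumes a simplified cost model (the sum of the estimated sizes of intermediate results), considers only plans that add one relation at a time, and uses hypothetical relation sizes and join selectivities.

```python
from itertools import combinations


def result_size(rels, sizes, selectivity):
    """Estimated size of joining `rels`: product of sizes times pair selectivities."""
    size = 1.0
    for r in sorted(rels):
        size *= sizes[r]
    for a, b in combinations(sorted(rels), 2):
        # Pairs without a join predicate default to 1.0 (a cross product).
        size *= selectivity.get(frozenset((a, b)), 1.0)
    return size


def best_join_order(sizes, selectivity):
    """Return (cost, order), where cost sums the sizes of intermediate results."""
    names = sorted(sizes)
    # table[subset] = (best cost, join order) for that subset of relations
    table = {frozenset([r]): (0.0, (r,)) for r in names}
    for k in range(2, len(names) + 1):
        for subset in map(frozenset, combinations(names, k)):
            best = None
            for last in sorted(subset):
                rest = subset - {last}
                cost, order = table[rest]
                if len(rest) > 1:  # a base relation is not an intermediate result
                    cost += result_size(rest, sizes, selectivity)
                if best is None or cost < best[0]:
                    best = (cost, order + (last,))
            table[subset] = best
    return table[frozenset(names)]


# Hypothetical statistics, not taken from the slides.
sizes = {"R": 1000, "S": 100, "T": 10}
selectivity = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
cost, order = best_join_order(sizes, selectivity)
print(order, cost)
```

With these made-up numbers the table prefers joining S and T first, because that pair produces the smallest intermediate result. A Selinger-style variant would key the table on (subset, sort order) rather than on the subset alone.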


LEFT-DEEP JOIN TREES

Left-deep join trees are binary trees with a single spine down the left edge and with leaves as right children.

Restricting the search to left-deep join trees reduces the number of plans to be considered for the best physical plan.

This restriction is applied when picking a grouping and order for the join of several relations.
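To see how much the restriction shrinks the search space, the following sketch counts plans for n relations. It relies on two standard counting facts: there are n! left-deep orderings, and the number of binary tree shapes with n leaves is the Catalan number C(n-1), each of which can have its leaves labelled in n! ways. The function names are illustrative.

```python
from math import comb, factorial


def tree_shapes(n):
    """Number of binary tree shapes with n leaves (Catalan number C(n-1))."""
    return comb(2 * (n - 1), n - 1) // n


def left_deep_plans(n):
    """One left-deep tree shape, with n! ways to order the relations."""
    return factorial(n)


def all_join_trees(n):
    """Every tree shape combined with every assignment of relations to leaves."""
    return tree_shapes(n) * factorial(n)


for n in range(2, 7):
    print(n, left_deep_plans(n), all_join_trees(n))
```

For six relations this gives 720 left-deep trees against 30,240 trees of any shape.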


PHYSICAL PLANS FOR SELECTION

A selection can be broken into an index-scan of the relation, followed by a filter operation.

The filter examines the tuples retrieved by the index-scan.

It allows only those tuples to pass that meet the portions of the selection condition not already handled by the index-scan.
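The index-scan-then-filter pattern can be sketched for a selection such as a = 1 AND b < 5, assuming an index exists on attribute a. Here a Python dictionary stands in for the index, and the relation, attribute names, and helper functions are all hypothetical.

```python
from collections import defaultdict


def build_index(rows, attr):
    """Toy in-memory index: attribute value -> list of matching rows."""
    index = defaultdict(list)
    for row in rows:
        index[row[attr]].append(row)
    return index


def index_scan(index, value):
    """Yield only the rows whose indexed attribute equals `value`."""
    yield from index.get(value, [])


def filter_rows(rows, predicate):
    """Pass along only the rows that satisfy the rest of the condition."""
    return (row for row in rows if predicate(row))


# Hypothetical relation R(a, b).
R = [{"a": 1, "b": 3}, {"a": 1, "b": 7}, {"a": 2, "b": 4}]
index_on_a = build_index(R, "a")

# The index-scan handles a = 1; the filter handles the remaining part, b < 5.
result = list(filter_rows(index_scan(index_on_a, 1), lambda r: r["b"] < 5))
print(result)
```

The filter never sees the row with a = 2, which is the point of letting the index handle one part of the condition.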


PIPELINING VERSUS MATERIALIZING

Ideally, an operator consumes the result of another operator, with the result passed between them in main memory.

This flow of data between the operators can be controlled (for example with iterators) to implement "pipelining".

Sometimes intermediate results should instead be stored and removed from main memory to save space for other operators; this technique is called "materialization".

Both pipelining and materialization should be considered by the physical-query-plan generator.
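A rough way to see the difference is with Python generators, which behave like iterators that hand over one tuple at a time. In this sketch the operators, the relation, and the predicate are illustrative, and a Python list stands in for an intermediate result that a real system would write to disk.

```python
def scan(rows):
    """Table-scan: produce the stored tuples one at a time."""
    for row in rows:
        yield row


def select(rows, predicate):
    """Selection: pass along the tuples that satisfy the predicate."""
    for row in rows:
        if predicate(row):
            yield row


def project(rows, attrs):
    """Projection: keep only the listed attributes of each tuple."""
    for row in rows:
        yield {a: row[a] for a in attrs}


# Hypothetical relation R(a, b).
R = [{"a": 1, "b": 3}, {"a": 2, "b": 7}, {"a": 3, "b": 9}]

# Pipelined: each tuple flows through select and project before the next is read.
pipelined_result = list(project(select(scan(R), lambda r: r["b"] > 3), ["a"]))

# Materialized: the whole intermediate result is stored before project starts.
selected = list(select(scan(R), lambda r: r["b"] > 3))
materialized_result = list(project(selected, ["a"]))

print(pipelined_result == materialized_result)
```

Both plans compute the same answer; they differ in how much intermediate data has to be held at once and in where it is held.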


Questions & Answers


For your attention


Reference

[1] H. Garcia-Molina, J. Ullman, and J. Widom, "Database Systems: The Complete Book," second edition, pp. 897-913, Prentice Hall, New Jersey, 2008.