Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example:...
Transcript of Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example:...
![Page 1: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/1.jpg)
Query Execution 2 and Query Optimization
Instructor: Matei Zahariacs245.stanford.edu
![Page 2: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/2.jpg)
Query Execution Overview
Query representation (e.g. SQL)
Logical query plan(e.g. relational algebra)
Optimized logical plan
Physical plan(code/operators to run)
CS 245 2
Query optim
ization
![Page 3: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/3.jpg)
Example SQL QuerySELECT titleFROM StarsInWHERE starName IN (
SELECT nameFROM MovieStarWHERE birthdate LIKE ‘%1960’
);
(Find the movies with stars born in 1960)
CS 245 3
![Page 4: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/4.jpg)
Parse Tree <Query>
<SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Tuple> IN <Query>
title StarsIn <Attribute> ( <Query> )
starName <SFW>
SELECT <SelList> FROM <FromList> WHERE <Condition>
<Attribute> <RelName> <Attribute> LIKE <Pattern>
name MovieStar birthDate ‘%1960’
CS 245 4
![Page 5: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/5.jpg)
Ptitle
sstarName=name
StarsIn Pname
sbirthdate LIKE ‘%1960’
MovieStar
´
Logical Query Plan
CS 245 5
![Page 6: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/6.jpg)
Improved Logical Query Plan
Ptitle
starName=name
StarsIn Pname
sbirthdate LIKE ‘%1960’
MovieStarCS 245 6
![Page 7: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/7.jpg)
Need expected size
StarsIn
MovieStar
P
s
Estimate Result Sizes
CS 245 7
![Page 8: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/8.jpg)
Parameters: join order,memory size, project attributes, ...Hash join
Seq scan Index scan Parameters:select condition, ...
StarsIn MovieStar
One Physical Plan
H
CS 245 8
![Page 9: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/9.jpg)
Parameters: join order,memory size, project attributes, ...Hash join
Index scan Seq scan Parameters:select condition, ...
StarsIn MovieStar
Another Physical Plan
H
CS 245 9
![Page 10: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/10.jpg)
Sort-merge join
Seq scan Seq scan
StarsIn MovieStar
Another Physical Plan
CS 245 10
![Page 11: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/11.jpg)
Logical plan
P1 P2 … Pn
C1 C2 … Cn
Pick best!
Estimating Plan Costs
Physical plancandidates
CS 245 11
![Page 12: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/12.jpg)
Execution Methods: Once We Have a Plan, How to Run it?
Several options that trade between complexity, performance and startup time
CS 245 12
![Page 13: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/13.jpg)
Example: Simple Query
SELECT quantity * priceFROM ordersWHERE productId = 75
Pquanity*price (σproductId=75 (orders))
CS 245 13
![Page 14: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/14.jpg)
Method 1: Interpretationinterface Operator {Tuple next();
}
class TableScan: Operator {String tableName;
}
class Select: Operator {Operator parent;Expression condition;
}
class Project: Operator {Operator parent;Expression[] exprs;
}
CS 245 14
interface Expression {Value compute(Tuple in);
}
class Attribute: Expression {String name;
}
class Times: Expression {Expression left, right;
}
class Equals: Expression {Expression left, right;
}
![Page 15: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/15.jpg)
Example Expression Classes
CS 245 15
class Attribute: Expression {String name;
Value compute(Tuple in) {return in.getField(name);
}}
class Times: Expression {Expression left, right;
Value compute(Tuple in) {return left.compute(in) * right.compute(in);
}}
probably better to use anumeric field ID instead
![Page 16: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/16.jpg)
Example Operator Classes
CS 245 16
class TableScan: Operator {String tableName;
Tuple next() {// read & return next record from file
}}
class Project: Operator {Operator parent;Expression[] exprs;
Tuple next() {tuple = parent.next();fields = [expr.compute(tuple) for expr in exprs];return new Tuple(fields);
}}
![Page 17: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/17.jpg)
Running Our Query with Interpretation
CS 245 17
ops = Project(expr = Times(Attr(“quantity”), Attr(“price”)),parent = Select(expr = Equals(Attr(“productId”), Literal(75)),parent = TableScan(“orders”)
));
while(true) {Tuple t = ops.next();if (t != null) {out.write(t);
} else {break;
}}
Pros & cons of this approach?
recursively calls Operator.next()and Expression.compute()
![Page 18: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/18.jpg)
Method 2: Vectorization
Interpreting query plans one record at a time is simple, but it’s too slow» Lots of virtual function calls and branches for
each record (recall Jeff Dean’s numbers)
Keep recursive interpretation, but make Operators and Expressions run on batches
CS 245 18
![Page 19: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/19.jpg)
Implementing Vectorization
CS 245 19
class TupleBatch {// Efficient storage, e.g.// schema + column arrays
}
interface Operator {TupleBatch next();
}
class Select: Operator {Operator parent;Expression condition;
}
...
class ValueBatch {// Efficient storage
}
interface Expression {ValueBatch compute(TupleBatch in);
}
class Times: Expression {Expression left, right;
}
...
![Page 20: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/20.jpg)
Typical Implementation
Values stored in columnar arrays (e.g. int[]) with a separate bit array to mark nulls
Tuple batches fit in L1 or L2 cache
Operators use SIMD instructions to update both values and null fields without branching
CS 245 20
![Page 21: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/21.jpg)
Pros & Cons of Vectorization
+ Faster than record-at-a-time if the queryprocesses many records
+ Relatively simple to implement
– Lots of nulls in batches if query is selective
– Data travels between CPU & cache a lot
CS 245 21
![Page 22: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/22.jpg)
Method 3: Compilation
Turn the query into executable code
CS 245 22
![Page 23: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/23.jpg)
Compilation Example
Pquanity*price (σproductId=75 (orders))
class MyQuery {void run() {
Iterator<OrdersTuple> in = openTable(“orders”);for(OrdersTuple t: in) {
if (t.productId == 75) {out.write(Tuple(t.quantity * t.price));
}}
}}
CS 245 23
generated class with the rightfield types for orders table
Can also theoretically generate vectorized code
![Page 24: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/24.jpg)
Pros & Cons of Compilation
+ Potential to get fastest possible execution
+ Leverage existing work in compilers
– Complex to implement
– Compilation takes time
– Generated code may not match hand-written
CS 245 24
![Page 25: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/25.jpg)
What’s Used Today?
Depends on context & other bottlenecks
Transactional databases (e.g. MySQL):mostly record-at-a-time interpretation
Analytical systems (Vertica, Spark SQL):vectorization, sometimes compilation
ML libs (TensorFlow): mostly vectorization (the records are vectors!), some compilation
CS 245 25
![Page 26: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/26.jpg)
Query Optimization
CS 245 26
![Page 27: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/27.jpg)
Outline
What can we optimize?
Rule-based optimization
Data statistics
Cost models
Cost-based plan selection
CS 245 27
![Page 28: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/28.jpg)
Outline
What can we optimize?
Rule-based optimization
Data statistics
Cost models
Cost-based plan selection
CS 245 28
![Page 29: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/29.jpg)
What Can We Optimize?
Operator graph: what operators do we run, and in what order?
Operator implementation: for operators with several impls (e.g. join), which one to use?
Access paths: how to read each table?» Index scan, table scan, C-store projections, …
CS 245 29
![Page 30: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/30.jpg)
Typical Challenge
There is an exponentially large set of possible query plans
Result: we’ll need techniques to prune the search space and complexity involved
Access pathsfor table 1
Access pathsfor table 2
Algorithmsfor join 1
Algorithmsfor join 2⨯ ⨯ ⨯ ⨯ …
CS 245 30
![Page 31: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/31.jpg)
Outline
What can we optimize?
Rule-based optimization
Data statistics
Cost models
Cost-based plan selection
CS 245 31
![Page 32: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/32.jpg)
What is a Rule?
Procedure to replace part of the query plan based on a pattern seen in the plan
Example: When I see expr OR TRUE for an expression expr, replace this with TRUE
CS 245 32
![Page 33: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/33.jpg)
Implementing Rules
Each rule is typically a function that walks through query plan to search for its pattern
void replaceOrTrue(Plan plan) {for (node in plan.nodes) {if (node instanceof Or) {if (node.right == Literal(true)) {plan.replace(node, Literal(true));break;
}// Similar code if node.left == Literal(true)
}}
}
Or
expr TRUE
node
node.left node.right
CS 245 33
![Page 34: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/34.jpg)
Implementing Rules
Rules are often grouped into phases» E.g. simplify Boolean expressions, pushdown
selects, choose join algorithms, etc
Each phase runs rules till they no longer applyplan = originalPlan;while (true) {for (rule in rules) {rule.apply(plan);
}if (plan was not changed by any rule) break;
}
CS 245 34
![Page 35: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/35.jpg)
Result
Simple rules can work together to optimize complex query plans (if designed well):
SELECT * FROM users WHERE(age>=16 && loc==CA) || (age>=16 && loc==NY) || age>=18
(age>=16) && (loc==CA || loc==NY) || age>=18
(age>=16 && (loc IN (CA, NY)) || age>=18
age>=18 || (age>=16 && (loc IN (CA, NY))
CS 245 35
![Page 36: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/36.jpg)
Example Extensible Optimizer
For Thursday, you’ll read about Spark SQL’s Catalyst optimizer» Written in Scala using its pattern matching
features to simplify writing rules» >500 contributors worldwide, >1000 types of
expressions, and hundreds of rules
We’ll also use Spark SQL in assignment 2
CS 245 36
![Page 37: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/37.jpg)
CS 245 37
![Page 38: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/38.jpg)
CS 245 38
![Page 39: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/39.jpg)
Common Rule-Based OptimizationsSimplifying expressions in select, project, etc» Boolean algebra, numeric expressions, string
expressions, etc» Many redundancies because queries are
optimized for readability or generated by code
Simplifying relational operator graphs» Select, project, join, etc
These relational optimizations have the most impact
CS 245 39
![Page 40: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/40.jpg)
Common Rule-Based OptimizationsSelecting access paths and operator implementations in simple cases» Index column predicate ⇒ use index» Small table ⇒ use hash join against it» Aggregation on field with few values ⇒ use
in-memory hash table
Rules also often used to do type checking and analysis (easy to write recursively)
Also veryhigh impact
CS 245 40
![Page 41: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/41.jpg)
Common Relational Rules
Push selects as far down the plan as possible
Recall:
σp(R ⨝ S) = σp(R) ⨝ S if p only references R
σq(R ⨝ S) = R ⨝ σq(S) if q only references S
σp∧q(R ⨝ S) = σp(R) ⨝ σq(S) if p on R, q on S
CS 245 41
Idea: reduce # of records early to minimize work in later ops; enable index access paths
![Page 42: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/42.jpg)
Common Relational Rules
Push projects as far down as possible
Recall:
Px(σp(R)) = Px(σp(Px∪z(R))) z = the fields in p
Px∪y(R ⨝ S) = Px∪y ((Px∪z (R)) ⨝ (Py∪z (S)))
x = fields in R, y = in S, z = in both
CS 245 42Idea: don’t process fields you’ll just throw away
![Page 43: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/43.jpg)
Project Rules Can Backfire!
Example: R has fields A, B, C, D, Ep: A=3 ∧ B=“cat”x: {E}
Px(σp(R)) vs Px(σp(P{A,B,E}(R)))
CS 245 43
![Page 44: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/44.jpg)
What if R has Indexes?
A = 3 B = “cat”
Intersect buckets to getpointers to matching tuples
CS 245 44
In this case, should do σp(R) first!
![Page 45: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/45.jpg)
Bottom Line
Many possible transformations aren’t always good for performance
Need more info to make good decisions» Data statistics: properties about our input or
intermediate data to be used in planning» Cost models: how much time will an operator
take given certain input data statistics?
CS 245 45
![Page 46: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/46.jpg)
Outline
What can we optimize?
Rule-based optimization
Data statistics
Cost models
Cost-based plan selection
CS 245 46
![Page 47: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/47.jpg)
What Are Data Statistics?
Information about the tuples in a relation that can be used to estimate size & cost» Example: # of tuples, average size of tuples,
# distinct values for each attribute, % of null values for each attribute
Typically maintained by the storage engine as tuples are added & removed in a relation» File formats like Parquet can also have them
CS 245 47
![Page 48: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/48.jpg)
Some Statistics We’ll Use
For a relation R,
T(R) = # of tuples in R
S(R) = average size of R’s tuples in bytes
B(R) = # of blocks to hold all of R’s tuples
V(R, A) = # distinct values of attribute A in R
CS 245 48
![Page 49: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/49.jpg)
Example
CS 245 49
R: A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d
![Page 50: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/50.jpg)
Example
CS 245 50
T(R) = 5 S(R) = 37V(R, A) = 3 V(R, C) = 5V(R, B) = 1 V(R, D) = 4
R: A: 20 byte string
B: 4 byte integer
C: 8 byte date
D: 5 byte string
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d
![Page 51: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/51.jpg)
Challenge: Intermediate Tables
Keeping stats for tables on disk is easy, but what about intermediate tables that appear during a query plan?
Examples:
σp(R)
R ⨝ S
CS 245 51
We already have T(R), S(R), V(R, a), etc,but how to get these for tuples that pass p?
How many and what types of tuple passthe join condition?
Should we do (R ⨝ S) ⨝ T or R ⨝ (S ⨝ T) or (R ⨝ T) ⨝ S?
![Page 52: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/52.jpg)
Stat Estimation Methods
Algorithms to estimate subplan stats
An ideal algorithm would have:1) Accurate estimates of stats2) Low cost3) Consistent estimates (e.g. different plans
for a subtree give same estimated stats)
Can’t always get all this!
CS 245 52
![Page 53: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/53.jpg)
Size Estimates for W = R1⨯R2
S(W) =
T(W) =
CS 245 53
![Page 54: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/54.jpg)
Size Estimates for W = R1⨯R2
S(W) =
T(W) =
CS 245 54
S(R1) + S(R2)
T(R1) ´ T(R2)
![Page 55: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/55.jpg)
Size Estimate for W = σA=a(R)
S(W) =
T(W) =
CS 245 55
![Page 56: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/56.jpg)
Size Estimate for W = σA=a(R)
S(W) = S(R)
T(W) =
CS 245 56
Not true if some variable-length fieldsare correlated with value of A
![Page 57: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/57.jpg)
Example
CS 245 57
R V(R,A)=3V(R,B)=1V(R,C)=5V(R,D)=4
W = σZ=val(R) T(W) =
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d
![Page 58: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/58.jpg)
Example
CS 245 58
R V(R,A)=3V(R,B)=1V(R,C)=5V(R,D)=4
W = σZ=val(R) T(W) =
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d what is probability this
tuple will be in answer?
![Page 59: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/59.jpg)
Example
CS 245 59
R V(R,A)=3V(R,B)=1V(R,C)=5V(R,D)=4
W = σZ=val(R) T(W) =
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d
T(R)V(R,Z)
![Page 60: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/60.jpg)
Assumption:
Values in select expression Z=val are uniformly distributed over all V(R, Z) values
CS 245 60
![Page 61: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/61.jpg)
Alternate Assumption:
Values in select expression Z=val are uniformly distributed over a domain with DOM(R, Z) values
CS 245 61
![Page 62: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/62.jpg)
Example
CS 245 62
R V(R,A)=3, DOM(R,A)=10V(R,B)=1, DOM(R,B)=10V(R,C)=5, DOM(R,C)=10V(R,D)=4, DOM(R,D)=10
W = σZ=val(R) T(W) =
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d
Alternate assumption
![Page 63: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/63.jpg)
Example
CS 245 63
R V(R,A)=3, DOM(R,A)=10V(R,B)=1, DOM(R,B)=10V(R,C)=5, DOM(R,C)=10V(R,D)=4, DOM(R,D)=10
W = σZ=val(R) T(W) =
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d
Alternate assumption
what is probability thistuple will be in answer?
![Page 64: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/64.jpg)
Example
CS 245 64
R V(R,A)=3, DOM(R,A)=10V(R,B)=1, DOM(R,B)=10V(R,C)=5, DOM(R,C)=10V(R,D)=4, DOM(R,D)=10
W = σZ=val(R) T(W) =
A B C Dcat 1 10 acat 1 20 bdog 1 30 adog 1 40 cbat 1 50 d
T(R)DOM(R,Z)
Alternate assumption
![Page 65: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/65.jpg)
SC(R, A) = average # records that satisfyequality condition on R.A
T(R)
V(R,A)
SC(R,A) =
T(R)
DOM(R,A)CS 245 65
Selection Cardinality
![Page 66: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/66.jpg)
What About W = σz ³ val(R)?
T(W) = ?
CS 245 66
![Page 67: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/67.jpg)
What About W = σz ³ val(R)?
T(W) = ?
Solution 1: T(W) = T(R) / 2
CS 245 67
![Page 68: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/68.jpg)
What About W = σz ³ val(R)?
T(W) = ?
Solution 1: T(W) = T(R) / 2
Solution 2: T(W) = T(R) / 3
CS 245 68
![Page 69: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/69.jpg)
Solution 3: Estimate Fraction of Values in Range
Example: R
CS 245 69
ZMin=1 V(R,Z)=10
W = σz ³ 15(R)
Max=20
f = 20-15+1 = 6 (fraction of range)20-1+1 20
T(W) = f ´ T(R)
![Page 70: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/70.jpg)
Equivalently, if we know values in column:
f = fraction of distinct values ≥ val
T(W) = f ´ T(R)
CS 245 70
Solution 3: Estimate Fraction of Values in Range
![Page 71: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/71.jpg)
What About More Complex Expressions?E.g. estimate selectivity for
SELECT * FROM RWHERE user_defined_func(a) > 10
CS 245 71
![Page 72: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/72.jpg)
CS 245 72
![Page 73: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/73.jpg)
Size Estimate for W = R1 ⨝ R2
Let X = attributes of R1
Y = attributes of R2
CS 245 73
Case 1: X ∩ Y = ∅:
Same as R1 x R2
![Page 74: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/74.jpg)
R1 A B C R2 A D
CS 245 74
Case 2: W = R1 ⨝ R2, X ∩ Y = A
![Page 75: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/75.jpg)
R1 A B C R2 A D
CS 245 75
Case 2: W = R1 ⨝ R2, X ∩ Y = A
Assumption (“containment of value sets”):V(R1, A) £ V(R2, A) Þ Every A value in R1 is in R2
V(R2, A) £ V(R1, A) Þ Every A value in R2 is in R1
![Page 76: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/76.jpg)
R1 A B C R2 A D
Take 1 tuple Match
Computing T(W) whenV(R1, A) £ V(R2, A)
CS 245 76
1 tuple matches with T(R2) tuples...V(R2, A)
so T(W) = T(R1) ´ T(R2)V(R2, A)
![Page 77: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/77.jpg)
CS 245 77
V(R1, A) £ V(R2, A) ⇒ T(W) = T(R1) ´ T(R2)V(R2, A)
V(R2, A) £ V(R1, A) ⇒ T(W) = T(R1) ´ T(R2)V(R1, A)
![Page 78: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/78.jpg)
T(W) = T(R1) ⨯ T(R2)
max(V(R1, A), V(R2, A))
CS 245 78
In General for W = R1 ⨝ R2
Where A is the common attribute set
![Page 79: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/79.jpg)
Values uniformly distributed over domain
R1 A B C R2 A D
This tuple matches T(R2) / DOM(R2, A), so
T(W) = T(R1) T(R2) = T(R1) T(R2) DOM(R2, A) DOM(R1, A)
Assume these are the sameCS 245 79
Case 2 with Alternate Assumption
![Page 80: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/80.jpg)
Tuple Size after Join
In all cases:
S(W) = S(R1) + S(R2) – S(A)
size of attribute A
CS 245 80
![Page 81: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/81.jpg)
PAB(R)
σA=aÙB=b(R)
R ⨝ S with common attributes A, B, C
Set union, intersection, difference, …
CS 245 81
Using Similar Ideas, Can Estimate Sizes of:
![Page 82: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/82.jpg)
E.g. W = σA=a(R1) ⨝ R2
Treat as relation U
T(U) = T(R1) / V(R1, A) S(U) = S(R1)
Also need V(U, *) !!
CS 245 82
For Complex Expressions, Need Intermediate T, S, V Results
![Page 83: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/83.jpg)
To Estimate V
E.g., U = σA=a(R1)
Say R1 has attributes A, B, C, D
V(U, A) =
V(U, B) =
V(U, C) =
V(U, D) =
CS 245 83
![Page 84: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/84.jpg)
R1 V(R1, A)=3V(R1, B)=1V(R1, C)=5V(R1, D)=3
U = σA=a(R1)
A B C Dcat 1 10 10cat 1 20 20dog 1 30 10dog 1 40 30bat 1 50 10
CS 245 84
Example
![Page 85: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/85.jpg)
R1 V(R1, A)=3V(R1, B)=1V(R1, C)=5V(R1, D)=3
U = σA=a(R1)
A B C Dcat 1 10 10cat 1 20 20dog 1 30 10dog 1 40 30bat 1 50 10
CS 245 85
Example
V(U, A) = 1 V(U, B) = 1 V(U, C) = T(R1)V(R1,A)
V(U, D) = somewhere in between…
![Page 86: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/86.jpg)
V(U, A) = V(R, A) / 2
V(U, B) = V(R, B)
CS 245 86
Possible Guess in U = σA≥a(R)
![Page 87: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/87.jpg)
For Joins: U = R1(A,B) ⨝ R2(A,C)
We’ll use the following estimates:
V(U, A) = min(V(R1, A), V(R2, A))
V(U, B) = V(R1, B)
V(U, C) = V(R2, C)
Called “preservation of value sets”CS 245 87
![Page 88: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/88.jpg)
Example:
Z = R1(A,B) ⨝ R2(B,C) ⨝ R3(C,D)
T(R1) = 1000 V(R1,A)=50 V(R1,B)=100
T(R2) = 2000 V(R2,B)=200 V(R2,C)=300
T(R3) = 3000 V(R3,C)=90 V(R3,D)=500
R1
R2
R3
CS 245 88
![Page 89: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/89.jpg)
T(U) = 1000´2000 V(U,A) = 50200 V(U,B) = 100
V(U,C) = 300
Partial Result: U = R1 ⨝ R2
CS 245 89
![Page 90: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/90.jpg)
End Result: Z = U ⨝ R3
T(Z) = 1000´2000´3000 V(Z,A) = 50200´300 V(Z,B) = 100
V(Z,C) = 90V(Z,D) = 500
CS 245 90
![Page 91: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/91.jpg)
Another Statistic: Histograms
CS 245 91
10 20 30 40
5
10
1512
number of tuplesin R with A valuein a given range
σA=a(R) = ?
Requires some care to set bucket boundaries
σA≥a(R) = ?
![Page 92: Query Execution 2 and Query Optimizationcomplexity, performance and startup time CS 245 12 Example: Simple Query SELECT quantity * price FROM orders WHERE productId = 75 P quanity*price](https://reader034.fdocuments.net/reader034/viewer/2022050212/5f5e2bd6b79a9504b66dd507/html5/thumbnails/92.jpg)
Outline
What can we optimize?
Rule-based optimization
Data statistics
Cost models
Cost-based plan selection
CS 245 92