Shaping Optimizer's Search Space

Shaping the Optimizer’s Search-Space @MarkusWinand

The Optimizer’s Search-Space is Limited

“the query optimizer determinesthe most efficient execution plan*”

...most efficient? Out of what?

*http://docs.oracle.com/cd/E16655_01/server.121/e15857/pfgrf_perf_overview.htm#TGDBA94082

http://docs.oracle.com/cd/E16655_01/server.121/e15857/pfgrf_perf_overview.htm#

The Optimizer’s Search-Space is Limited

The Optimizer...

‣Considers existing indexes only ➡ Other indexes might give even better performance

‣Doesn’t de-obfuscate queries very well ➡ Writing it in simpler terms might improve performance

‣Has built-in limitations ➡ Some theoretically possible plans are never considered

Bring the Best Plan in the Search-Space

... it determines the most efficient execution plan out of the remaining ones.

Before the optimizer can find theabsolutely best plan we must first

make sure it is within these boundaries.

Two steps to get the absolutely best access path:

1. Maximize data-locality‣ Plain old B-tree index is the #1 tool for that‣ Partitions are greatly overrated‣ Table clusters are slightly underrated

It’s All About Matching Queries to Indexes

2. Write the query to exploit it‣ Use explicit range conditions‣ Use top-n based termination‣ Exploit index order

Thinking in

OrderedSets

Visualizing Indexes as Pyramids

Visualize Simplify

The Order of Multi-Column Indexes

Using Indexes:

Column Order Defines Row-Locality

Example: WHEREA>:aANDB=:b

Using Indexes:


Simple-man’s guidelines (best in ~97% of the cases):

‣Conjunctive equality conditions are king Column order doesn’t affect data-locality ➡ Put them first into the index and choose the column

order so that other queries can use the index too.

‣Conjunctive range conditions are tricky Column order affects data-locality ➡ Put them after the equality columns. If there are

multiple range conditions, put the most-selective first.

Using Indexes:


Common mistakes:‣Arbitrary column order ☠ (bad)

“Just put all columns from the where-clause in the index” ➡ Works only for all-conjunctive all-equality searches ➡ Doesn’t make the index useful for other queries

‣Most-selective first ☠ (bad) “Order the columns according to the selectivity” ➡ Only valid to prioritize among multiple range conditions

Using Indexes:

Finding Bad Index Row-Locality

------------------------------------|Id|Operation|------------------------------------|0|SELECTSTATEMENT||1|TABLEACCESSBYINDEXROWID||*2|INDEXSKIPSCAN|------------------------------------PredicateInformation:------------------------------------2-access("B"=20AND"A">25)filter("B"=20)

Index on (A, B)------------------------------------|Id|Operation|------------------------------------|0|SELECTSTATEMENT||1|TABLEACCESSBYINDEXROWID||*2|INDEXRANGESCAN|------------------------------------PredicateInformation:------------------------------------2-access("B"=20AND"A">25)

Index on (B, A)

Mostefficient s

olution Most efficient

workaround

‣ Index filter predicates are a “bad smell”‣ Index Skip Scan is a “bad smell”‣ Index Fast Full Scan is a “bad smell”

Using Indexes:

Trailing-Columns to Avoid Table-Access

Example: SELECTCFROMXWHEREA>:aANDB=:b

Using Indexes:


Add all needed columns to the index to avoid table access.The so-called index-only scan.

‣Useful to nullify a bad clustering factor Consequently, not very useful if ➡ Clustering factor close to the number of table blocks or ➡ Selecting only a few rows

‣A single non-indexed column breaks it No matter where it is mentioned (SELECT, ORDERBY,...) ➡ All or nothing: no benefit from adding some SELECT

columns to the index.

Using Indexes:


Common mistakes:

‣Selecting unneeded columns* ☠ (bad) SELECT * anybody? ORM-tools in use? Hooray! ➡ Adding many columns to many indexes is a no-no.

‣Pushing too hard ☠ (bad) ➡ Index gets bigger, clustering factor (CF) gets worse ➡ Small benefit for low CF or if selecting a few rows only ➡ You’ll hit the hard limits (32 columns, 6398 bytes@8k)

* http://use-the-index-luke.com/blog/2013-08/its-not-about-the-star-stupid



1. Maximize data-locality ‣ Plain old B-tree index is the #1 tool for that ‣ Partitions are greatly overrated ‣ Table clusters are slightly underrated

2. Write the query to exploit it ‣ Use explicit range conditions ‣ Use top-n based termination ‣ Exploit index order

Thinking in

OrderedSets

✓ ✓

Example:

List yesterday’s orders

CREATETABLEorders(...,order_dtDATENOTNULL,...);

INSERTINTOorders(...,order_dt,...)VALUES(...,sysdate,...);

100k rows Evenly distributed

over 4 weeks.

Example:


1. Maximize data-locality

Example:


1. Lower bound:ORDER_DT>=TRUNC(sysdate-1)

2. Upper bound:ORDER_DT<TRUNC(sysdate)

2. Write query using explicit range conditions

----------------------------------------------|Id|Operation|----------------------------------------------|0|SELECTSTATEMENT||*1|FILTER||2|TABLEACCESSBYINDEXROWIDBATCHED||*3|INDEXRANGESCAN|----------------------------------------------PredicateInformation(identifiedbyoperationid):---------------------------------------------------1-filter(TRUNC(SYSDATE@!)>TRUNC(SYSDATE@!-1))3-access("ORDER_DT">=TRUNC(SYSDATE@!-1)AND"ORDER_DT"<TRUNC(SYSDATE@!))

Example:


Common anti-pattern:

‣TRUNC(order_dt)=:yesterday☠ (bad)This is an “obfuscation” of the actual intention➡Requires function-based indexCREATEINDEX…(TRUNC(order_dt));

➡Doesn’t support ordering by order_dtWHERETRUNC(order_dt)=:yesterdayORDERBYorder_dtDESC;

Indexnot ordered b

y that

--------------------------------------|Id|Operation|--------------------------------------|0|SELECTSTATEMENT||*1|FILTER||2|TABLEACCESSBYINDEXROWID||*3|INDEXRANGESCANDESCENDING|--------------------------------------PredicateInformation(identifiedbyoperationid):---------------------------------------------------1-filter(TRUNC(SYSDATE@!)>TRUNC(SYSDATE@!-1))3-access("ORDER_DT"<TRUNC(SYSDATE@!)AND"ORDER_DT">=TRUNC(SYSDATE@!-1))

Example:

List yesterday’s orders reverse chronologically

1. Lower & upper bounds:ORDER_DT>=TRUNC(sysdate-1)ORDER_DT<TRUNC(sysdate)

2. OrderORDERBYORDER_DTDESC

2. Write query - exploit index order

--------------------------------------|Id|Operation|--------------------------------------|0|SELECTSTATEMENT||*1|FILTER||2|TABLEACCESSBYINDEXROWID||*3|INDEXRANGESCANDESCENDING|--------------------------------------PredicateInformation(identifiedbyoperationid):---------------------------------------------------1-filter(TRUNC(SYSDATE@!)>TRUNC(SYSDATE@!-1))3-access("ORDER_DT"<TRUNC(SYSDATE@!)AND"ORDER_DT">=TRUNC(SYSDATE@!-1))

Example:


2. Write query - exploit index order1. Lower & upper bounds:

TRUNC(ORDER_DT) =TRUNC(sysdate)-1


Example:


2. Write query - exploit index order

----------------------------------------------|Id|Operation|----------------------------------------------|0|SELECTSTATEMENT||1|SORTORDERBY||2|TABLEACCESSBYINDEXROWIDBATCHED||*3|INDEXRANGESCAN|----------------------------------------------PredicateInformation(identifiedbyoperationid):---------------------------------------------------3-access("ORDERS"."SYS_NC00004$"=TRUNC(SYSDATE@!-1))

Tradeoff:

CPU MemoryIO

1. Lower & upper bounds:TRUNC(ORDER_DT) =TRUNC(sysdate)-1


Example:

List orders from last 24 hours

1. Data-locality for the TRUNC variant

* http://www.sqlfail.com/2014/05/05/oracle-can-now-use-function-based-indexes-in-queries-without-functions/


Example:


-------------------------------------------------|Id|Operation|-------------------------------------------------|0|SELECTSTATEMENT||*1|TABLEACCESSBYINDEXROWIDBATCHED||*2|INDEXRANGESCANonTRUNC(ORDER_DT)|-------------------------------------------------PredicateInformation(identifiedbyoperationid):---------------------------------------------------1-filter("ORDER_DT">SYSDATE@!-1)2-access("ORDERS"."SYS_NC00004$">=TRUNC(SYSDATE@!-1))

2. Upper bound: none (unbounded)

1. Lower bound:ORDER_DT>sysdate-1To use FBI Oracle adds (since 11.2.0.2*) TRUNC(ORDER_DT)>=TRUNC(sysdate-1)

* http://www.sqlfail.com/2014/05/05/oracle-can-now-use-function-based-indexes-in-queries-without-functions/

Example:


1. Maximize data-locality using straight index

Example:


2. Write query using explicit range conditions1. Lower bound:

ORDER_DT>sysdate-12. Upper bound: none (unbounded)

--------------------------------------------|Id|Operation|--------------------------------------------|0|SELECTSTATEMENT||1|TABLEACCESSBYINDEXROWIDBATCHED||*2|INDEXRANGESCAN|--------------------------------------------PredicateInformation:----------------------2-access("ORDER_DT">SYSDATE@!-1)

Example:


--------------------------------------------|Id|Operation|--------------------------------------------|0|SELECTSTATEMENT||1|TABLEACCESSBYINDEXROWIDBATCHED||*2|INDEXRANGESCAN|--------------------------------------------PredicateInformation:----------------------2-access("ORDER_DT">SYSDATE@!-1)

--------------------------------------------|Id|Operation|--------------------------------------------|0|SELECTSTATEMENT||*1|TABLEACCESSBYINDEXROWIDBATCHED||*2|INDEXRANGESCAN|--------------------------------------------PredicateInformation:----------------------1-filter("ORDER_DT">SYSDATE@!-1)2-access("ORDERS"."SYS_NC00004$">=TRUNC(SYSDATE@!-1))

Most efficient solution

Most efficient

workaroun

d





Thinking in

OrderedSets

✓

✓

✓

✓

Example:

List 10 Most Recent Orders


Example:



1. Lower bound...? After 10 rows...???2. Upper bound? sysdate? Unbounded!

Example:


1. Lower bound...? After 10 rows...???2. Upper bound? sysdate? Unbounded!

2. Write query using top-n based termination

3. Start with: most recentORDERBYORDER_DTDESC

4. Stop after: 10 rowsFETCHFIRST10ROWSONLY (since 12c)

Example:


2. Write query using top-n based termination3. Start with: most recent

ORDERBYORDER_DTDESC4. Stop after: 10 rows

FETCHFIRST10ROWSONLY (since 12c)----------------------------------------------------------|Id|Operation|A-Rows|Buffers|----------------------------------------------------------|0|SELECTSTATEMENT|10|8||*1|VIEW|10|8||*2|WINDOWNOSORTSTOPKEY|10|8||3|TABLEACCESSBYINDEXROWID|11|8||4|INDEXFULLSCANDESCENDING|11|3|----------------------------------------------------------PredicateInformation(identifiedbyoperationid):---------------------------------------------------1-filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=10)2-filter(ROW_NUMBER()OVER(ORDERBYORDER_DTDESC)<=10)ROW_NUMBER()OVER(ORDERBYORDER_DTDESC)<=10

SELECTorders.*,ROW_NUMBER()OVER(ORDERBYorder_dtDESC)rnFROMorders

Window-Functions for Top-N Termination

SELECT*FROM(SELECTorders.*,ROW_NUMBER()OVER(ORDERBYorder_dtDESC)rnFROMorders)WHERErn<=10ORDERBYorder_dtDESC;


Select 10 rows


SELECT*FROM(SELECTorders.*,DENSE_RANK()OVER(ORDERBYTRUNC(order_dt)DESC)rnFROMorders)WHERErn<=1ORDERBYorder_dtDESC;

Select 1 group



Useful to

abort on edges



---------------------------------------------------------------------------|Id|Operation|E-Rows|A-Rows|Buffers|Reads|---------------------------------------------------------------------------|0|SELECTSTATEMENT||2057|695|695||1|SORTORDERBY|100K|2057|695|695||*2|VIEW|100K|2057|695|695||*3|WINDOWNOSORTSTOPKEY|100K|2057|695|695||4|TABLEACCESSBYINDEXROWID|100K|2058|695|695||5|INDEXFULLSCANDESCENDING|100K|2058|8|8|---------------------------------------------------------------------------DE

NSE_RANK


SELECT*FROMordersWHERETRUNC(order_dt)=(SELECTTRUNC(MAX(order_dt))FROMorders)ORDERBYorder_dt;


NSE_RANK


---------------------------------------------------------------------------------|Id|Operation|E-Rows|A-Rows|Buffers|Reads|---------------------------------------------------------------------------------|0|SELECTSTATEMENT||2057|1038|694||1|SORTORDERBY|3448|2057|1038|694||2|TABLEACCESSBYINDEXROWIDBATCHED|3448|2057|1038|694||*3|INDEXRANGESCAN|3448|2057|10|8||4|SORTAGGREGATE|1|1|2|2||5|INDEXFULLSCAN(MIN/MAX)|1|1|2|2|---------------------------------------------------------------------------------


NSE_RANK

SUB-SELECT

Top-N vs. Max()-Subquery

Common mistakes:‣Breaking ties with sub-queries ☠ (bad) WHERE(a,b)=(selectmax(a),max(b)...)

➡ max() values coming from different rows... ➡ No rows selected.

‣Selecting Nth largest ☠ (bad) WHEREX<(SELECTMAX()...WHEREX<(SELECTMAX()...))

WHERE(N-1)=(SELECTCOUNT(DISTINCT(DT))...


Two steps to get the absolutely best execution plan:



Thinking in

OrderedSets

✓

✓

✓

✓ ✓


Example:

List next 10 orders

Example:

List next 10 orders

2. Use explicit range condition & top-n abort 1. Lower bound: unbounded (top-n)2. Upper bound: where we stopped

WHEREORDER_DT<:prev_dt3. ORDERBYORDER_DTDESC

4. FETCHFIRST10ROWSONLY

What about ties?

Explicit range conditions: the general case

Example:

List next 10 orders


Example:

List next 10 orders

1. Use definite sort order 2. Use Row-Value filter to

remove what we have seen before (SQL:92)

3. Hit Enter


Example:

List next 10 orders


Example:

List next 10 orders

(x,y)=(a,b)

(x,y)IN((a,b),(c,d))

(x,y)<(a,b)

(x,y)>(a,b)

✓✓

✗

✗ Oracle

limitation


Example:

List next 10 orders

Oracle

limitation

Two semantically equivalent workarounds:

X<=AANDNOT(X=AANDY>=B)

(X<A)OR(X=AANDY<B)

* http://use-the-index-luke.com/sql/partial-results/fetch-next-page#sb-equivalent-logic

☠No proper index use*

Using OFFSET to fetch next rows

‣After adding FETCHFIRST...ROWSONLY, with SQL:2008, SQL:2011 introduced OFFSET to skip rows.‣Rows can be skipped with the ROWNUM pseudo column too (ROWNUM>:x)‣ROW_NUMBER() can do the trick too.

It doesn’t matter how to write it, ...

Using OFFSET to fetch next rows

OFFSET = SLEEP The bigger the number,

the slower the execution. Even worse: it eats up resources

and yields drifting results.





Thinking in

OrderedSets

✓

✓

✓

✓ ✓✓

Index Smart, Not Hard

About @MarkusWinand‣Training for Developers ‣ SQL Performance (Indexing) ‣ Modern SQL ‣ On-Site or Online

‣SQL Tuning ‣ Index-Redesign ‣ Query Improvements ‣ On-Site or Online

http://winand.at/

About @MarkusWinand@ModernSQL

http://modern-sql.com@SQLPerfTips

http://use-the-index-luke.com

Shaping Optimizer's Search Space

Software

Transcript of Shaping Optimizer's Search Space