Lecture 8 handout (web.cecs.pdx.edu/~lmd/cs386-586/Handouts/lecture8.pdf)

CS386/586 Introduction to Databases, © Lois Delcambre 1998-2005
Some slides adapted from R. Ramakrishnan, with permission

Lecture 8

• Storage
  – Disk Architectures
• Indexes
  – Definition
  – Classification
    • Tree (B+) vs. Hash
    • Clustered vs. Non-Clustered
    • Sparse vs. Dense
  – Composite Search Keys
• Join Algorithms
  – Nested Loop
  – Sort-Merge
  – Hash
• Query Optimization
  – RA equivalences
  – Generating Plans
  – Costing Plans
    • Statistics
  – Enumerating Plans
  – Nested Queries

Lecture 8 cont.

• Physical Database Design and Tuning
  – Workload
  – Decisions to be made
  – Heuristics
  – Tuning
  – Horizontal Decomposition


We’ll just introduce these ideas, and we’ll start from the bottom.

Layers of a DBMS, top to bottom (the input at the top is a relational algebra query):

• Query Optimization: search for a cheap plan
• Relational Operator Algs.: join algorithms, …
• Files and Access Methods: heap, index, …
• Buffer Management and Disk Space Management: operating-system-level issues (may be handled by the DBMS or by the O/S)
• DB: how a disk works


Components of a Disk

• Platters are always spinning (say, 120 rps).
• One head reads/writes at any one time.
• To read a record:
  – position arm (seek)
  – engage head
  – wait for data to spin by
  – read (transfer data)

[Figure: disk diagram labeled with platters, spindle, disk head, arm assembly, arm movement, tracks, sector]


More terminology

• Each track is made up of fixed-size sectors.
• Page size is a multiple of sector size.
• All the tracks that you can reach from one position of the arm are called a cylinder (imaginary!).

[Figure: disk diagram labeled with platters, spindle, disk head, arm assembly, arm movement, tracks, sector]


Cost of Accessing Data on Disk

• Time to access (read/write) a disk block:
  – seek time (moving arms to position disk head on track)
  – rotational delay (waiting for block to rotate under head)
  – transfer time (actually moving data to/from disk surface)
• Key to lower I/O cost: reduce seek/rotation delays! (You have to wait for the transfer time, no matter what.)
• Query cost is often measured in the number of page I/Os, often simplified to assume each page I/O costs the same.
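The access-time components above can be added up directly. The following is a minimal sketch, not a model of any real drive: the function names and the timing figures (8 ms seek, 6 ms rotational delay, 0.2 ms transfer per page) are assumptions chosen for illustration, and the sequential case is idealized as one seek and one rotational delay followed by pure transfer.

```python
def access_time_ms(seek_ms: float, rotational_ms: float, transfer_ms: float) -> float:
    """Time to access one disk block: seek + rotational delay + transfer."""
    return seek_ms + rotational_ms + transfer_ms


def read_time_ms(pages: int, seek_ms: float, rotational_ms: float,
                 transfer_ms: float, sequential: bool) -> float:
    """Total time to read `pages` pages.

    Random reads pay seek and rotational delay for every page. The sequential
    case is idealized: one seek and one rotational delay, then transfer only.
    """
    if pages == 0:
        return 0.0
    if sequential:
        return seek_ms + rotational_ms + pages * transfer_ms
    return pages * access_time_ms(seek_ms, rotational_ms, transfer_ms)


# Illustrative (assumed) figures: 8 ms seek, 6 ms rotational delay, 0.2 ms per page.
random_ms = read_time_ms(1000, 8.0, 6.0, 0.2, sequential=False)
sequential_ms = read_time_ms(1000, 8.0, 6.0, 0.2, sequential=True)
print(f"random: {random_ms:.0f} ms, sequential: {sequential_ms:.0f} ms")
```

Under these assumed figures the transfer time is a small part of a random read, which is why reducing seeks and rotational delays matters most.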



Disk Drive Statistics

• Sector size: 512 bytes
• Seek time
  – Average: 4-10 ms
  – Track to track: 0.6-1.0 ms
• Average rotational delay: 6 to 10 ms (rotational speed 10,000 RPM to 5,400 RPM)
• Transfer time (sustained data rate): 0.3-0.1 msec per 8K page, or 25-75 Meg/second
• Density: 30 GB/square inch


Time to access a disk page

[Figure: storage latency in clock ticks, with a travel analogy]

  Storage level          Clock ticks   Analogy        Travel time
  Registers              1             My Head        1 min
  On-Chip Cache          2             This Room
  On-Board Cache         10            This Campus    10 min
  Memory                 100           Sacramento     1.5 hr
  Disk                   10^6          Pluto          2 Years
  Tape / Optical Robot   10^9          Andromeda      2,000 Years

Figure from AlphaSort paper – see research.microsoft.com/~Gray


10,000 times slower than memory

• How much time is 10,000 seconds? (About 2.8 hours.)
• Contrast 1 second (to pick up a piece of paper) vs. driving to Seattle to get it.


Block (page) size vs. record size

• Page: smallest unit of transfer supported by the OS.
• Block: multiple of a page, smallest unit of transfer supported by an application or a disk volume.
• Block and page are often used interchangeably.
• “Typical” record size … maybe a few hundred up to 1,000 bytes.
• “Typical” page size: 4K, 8K.
• When would we choose block size to be larger?
• When would we choose block size to be smaller?


How to minimize the cost of Disk I/Os

• If possible, store a file to be read sequentially as follows:
  – Consecutive pages on same track, followed by
  – Consecutive tracks on same cylinder, followed by
  – Consecutive cylinders adjacent to each other
  – First two incur no seek time or rotational delay; seek for third is only one-track.
• Remember: disk access time = seek time + rotational latency + transfer time. What is saved with this storage pattern?
• In the book, they assume that all I/O operations take the same amount of time. This is a simplification! Real query optimizers would consider sequential vs. random disk reads, because sequential reads are much faster.


Index for a File

• An Index is a data structure that speeds up selections on the search key field(s)

• An index transforms a search key k into a data entry k*.

• Given k*, you can get to the record(s) with the search key k in one I/O.



Real-life Indexes

• What is the search key? What is the data entry?
  – Library Catalog
  – Clerk in a video store
  – Terminal in a book store


Database Indexes

• Given Emp(ID, name, age, address)
• What are the possible search keys?
• What data structure might be used for the index?
• What could be the format of the data entry k*?

Nota Bene:
• You can build an index on any subset of the fields of a table.
• You can build more than one index for the same table.
• “Search key” is not the same as a key for the table. Values of a “search key” need not be unique.


Most Indexes are Tree Structured

• Tree-structured indexes support range searches and equality searches.

– ISAM: static structure (old technology)…index is built just once, when the file is loaded. Uses overflow areas, so the tree can become very unbalanced.

– B+ tree: dynamic – index is adjusted as records are inserted and deleted in the file. Index remains balanced.


B+ Tree Indexes

• Leaf pages contain data entries, and are chained (prev & next).
• Non-leaf pages have index entries; only used to direct searches:

  P0 | K1 | P1 | K2 | P2 | … | Km | Pm     (each pair Ki, Pi is an index entry)

[Figure: non-leaf pages on top of leaf pages; the leaf pages are sorted by search key]


Example B+ Tree

• Find 28*? 29*? All > 15* and < 30*?
• Insert/delete: find the data entry in the leaf, then change it. Need to adjust the parent sometimes.
  – And the change sometimes bubbles up the tree.
  – This keeps the tree balanced: each data retrieval takes the same number of I/Os and each page is always at least half full.

[Figure: example B+ tree
  Root: 17 (entries <= 17 to the left, entries > 17 to the right)
  Non-leaf pages: (5, 13) and (27, 30)
  Leaf pages: 2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*
  Note how data entries in the leaf level are sorted.]


Hash-Based Indexes

• Good for equality selections.
• Index is a collection of buckets.
  – Bucket = primary page plus zero or more overflow pages.
  – Buckets contain data entries.
• Hashing function h: h(r) = bucket in which (data entry for) record r belongs. h looks at the search key fields of r.
  – No need for “index entries” in this scheme.
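To make the bucket idea concrete, here is a minimal in-memory sketch. It is an illustration only: the class name `HashIndex`, the use of Python's built-in `hash`, and the choice of four buckets are assumptions, and primary and overflow pages are not modeled (each bucket is just a list).

```python
class HashIndex:
    """Toy hash index: a fixed number of buckets holding data entries."""

    def __init__(self, num_buckets, key_of):
        self.num_buckets = num_buckets
        self.key_of = key_of  # extracts the search key from a record
        self.buckets = [[] for _ in range(num_buckets)]

    def h(self, key):
        """Hashing function: maps a search key value to a bucket number."""
        return hash(key) % self.num_buckets

    def insert(self, record):
        self.buckets[self.h(self.key_of(record))].append(record)

    def lookup(self, key):
        """Equality selection: inspect only the one bucket the key hashes to."""
        return [r for r in self.buckets[self.h(key)] if self.key_of(r) == key]


# Records are (name, age, sal); the search key is age.
records = [("Smith", 44, 3000), ("Jones", 40, 6003), ("Tracy", 44, 5004),
           ("Ashby", 25, 3000), ("Basu", 33, 4003)]
by_age = HashIndex(4, key_of=lambda r: r[1])
for rec in records:
    by_age.insert(rec)
print(by_age.lookup(44))
```

A range query such as age > 30 gets no help from this structure, since neighboring key values land in unrelated buckets; that is why hash indexes are described as good for equality selections.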



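Hash-based Index Examples

[Figure: a file of records (name, age, sal) with two hash functions
  Records: Smith,44,3000; Jones,40,6003; Tracy,44,5004; Ashby,25,3000; Basu,33,4003; Sagar,29,2007; Cass,50,5004; Kery,22,6003
  h1 is applied to age; buckets shown: h1(age)=00, h1(age)=01, h1(age)=10
  h2 is applied to sal; buckets shown: h2(sal)=00, h2(sal)=11; entries for sal values 3000, 3000, 5004, 5004, 4003, 2007, 6003, 6003]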


Costs of an Index

• If you define an index in your database, you will incur three costs:
  – Space to store the index
  – Updates to the search key will be slower
  – The optimizer will take longer because it has more choices
• There is one advantage to having an index:
  – Some queries run faster (better be sure about this)


Clustered Index: Records are sorted based on search key for the index

• Search key is “Name”.
• Records are sorted by “Name” in the file.
• Each page contains 3 records.

[Figure: index and data file
  Index entries: Ashby, Basu, Bristow, Cass, Daniels, Jones, Smith, Tracy
  Data file records: Ashby, 25, 3000; Basu, 33, 4003; Bristow, 30, 2007; Cass, 50, 5004; Daniels, 22, 6003; Jones, 40, 6003; Smith, 44, 3000; Tracy, 44, 5004]


Another kind of Clustered Index (Sparse)

• Search key is “Name”.
• Records are sorted by “Name” in the file.
• Each page contains 3 records.

[Figure: index and data file
  Index entries: Ashby, Cass, Smith
  Data file records shown: Smith, 44, 3000; Ashby, 25, 3000; Bristow, 30, 2007; Basu, 33, 4003; Tracy, 44, 5004]


Unclustered Index: Records NOT Sorted on Search Key

• Search key is “Age”.

[Figure: index and data file
  Index entries: 22, 25, 30, 33, 40, 44, 44, 50
  Data file records shown: Smith, 44, 3000; Ashby, 25, 3000; Bristow, 30, 2007; Basu, 44, 4003; Tracy, 33, 5004]


Index Classification

Clustered, sparse indexes are smaller; they work well for range searches and sorting.

But…some useful optimizations are based on dense indexes.

Note: one file can have at most one single-attribute clustered index; all of the additional single-attribute indices must be unclustered.

                 sparse   dense
  clustered      YES      YES
  unclustered    NO!      YES


Clustered vs. Non-clustered Index

• Consider a telephone book as an index to telephone numbers
  – What is the primary search key?
  – Is it a clustered or unclustered index?
  – Is it a dense or sparse index?
  – Can you find a range of entries using this index?
• Imagine we have an unclustered index for a phone book based on street address
  – Can you efficiently find a range of entries, that is, a range of addresses?


Using Composite Search Keys

Which indexes can you use for each of these queries? (use = the answer is one contiguous block of data in the data entry level)

• age = 12
• age = 12 and sal = 20
• age = 12 and sal > 10
• age > 12 and sal > 30

Data records, sorted by name:

  name  age  sal
  bob   12   10
  cal   11   80
  joe   12   20
  sue   13   75

Data entries of each index, in sorted order:

  <age, sal>:  11,80   12,10   12,20   13,75
  <sal, age>:  10,12   20,12   75,13   80,11
  <age>:       11   12   12   13
  <sal>:       10   20   75   80


Important refinement for unclustered indexes

1. Find qualifying data entries.
2. Sort the rid’s of the data records to be retrieved.
3. Fetch rids in order.

This ensures that each data page is looked at just once (though the number of such pages is likely to be higher than with clustering).

Challenges: What order do we sort the rids into? Will the file system allow this?
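The effect of step 2 can be seen by counting page reads with and without the sort. This is a hedged sketch: it assumes a rid is a (page id, slot) pair and that only one data page is held in memory at a time, so revisiting a page costs another read. The function name `pages_read` and the sample rids are made up for illustration.

```python
def pages_read(rids, sort_first):
    """Count data-page reads when fetching records by rid.

    Assumes each rid is (page_id, slot) and a one-page buffer, so a page
    that is revisited later must be read again.
    """
    order = sorted(rids) if sort_first else list(rids)
    reads = 0
    current_page = None
    for page_id, _slot in order:
        if page_id != current_page:
            reads += 1
            current_page = page_id
    return reads


# rids in the order the index happens to return them
rids = [(3, 1), (1, 0), (3, 0), (2, 4), (1, 2)]
print(pages_read(rids, sort_first=False), pages_read(rids, sort_first=True))
```

With the sort, each of the three distinct pages is read once; without it, pages 1 and 3 are each read twice.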


Join Algorithms – an Introduction

R ⋈ S is very common! And R × S followed by a selection is inefficient. So we process joins (rather than cross product) whenever possible. Lots of effort invested in join algorithms.

Assume: M pages in R, pR tuples per page, N pages in S, pS tuples per page.

In our examples, R is Reserves and S is Sailors. Our algorithms work for any equijoins.

  SELECT *
  FROM Reserves R1, Sailors S1
  WHERE R1.sid=S1.sid


Simple Nested Loops Join (very naive)

• For each tuple in the outer relation R, we scan the entire inner relation S, tuple by tuple.
  – Cost: M + (pR * M) * N = 1000 + 100*1000*500 I/Os
  – 50,001,000 I/Os ≈ 500,010 seconds ≈ 6 days

Join on ith column of R and jth column of S:

  foreach tuple r in R do
    foreach tuple s in S do
      if ri == sj then add <r, s> to result

We assume approximately 100 I/Os per second, M = 1000 pages in R, pR = 100 tuples per page, N = 500 pages in S, pS = 80 tuples per page.
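The pseudocode and the cost formula above can be written out as a small runnable sketch. Tuples are modeled as Python tuples held in memory, so the I/O cost is computed from the formula rather than measured; the function names are assumptions for illustration.

```python
def simple_nested_loops_join(R, S, i, j):
    """Join R and S on the i-th column of R and the j-th column of S."""
    result = []
    for r in R:          # outer relation
        for s in S:      # the entire inner relation is scanned for every r
            if r[i] == s[j]:
                result.append(r + s)
    return result


def simple_nested_loops_cost(M, p_R, N):
    """I/O cost: read R once (M pages), then scan S (N pages) once per R tuple."""
    return M + (p_R * M) * N


cost = simple_nested_loops_cost(M=1000, p_R=100, N=500)
seconds = cost / 100            # at roughly 100 I/Os per second
days = seconds / (24 * 60 * 60)
print(cost, seconds, round(days, 1))

R = [(2, "a"), (12, "b"), (6, "c")]
S = [(2, "x"), (13, "y"), (12, "z")]
print(simple_nested_loops_join(R, S, 0, 0))
```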


Simple Nested Loops Join (yes, this is dumb)

[Animation, several frames: Table 1 on disk has pages (2, 12, 6) and (1, 5, 27); Table 2 on disk has pages (2, 13), (12, 27), and (1, 5). One tuple of Table 1 at a time is compared against the Table 2 pages as each is brought into the memory buffers. The pair 2, 2 matches and is added to the query answer. Pages of Table 2 with no matching tuple are marked “No match: Discard!”. Later the pair 12, 12 matches (“Match!”) and is added to the query answer. And so forth …]


Page-oriented Nested Loops Join

For each page of R, get each page of S, write out matching pairs of tuples <r, s>.

  for each page of tuples r in R do
    for each page of tuples s in S do
      (match all combinations in memory)
      if ri == sj then add <r, s> to result

• Cost: M + M*N = 1000 + 1000*500 = 501,000 (R outer)
• Cost: N + N*M = 500 + 500*1000 = 500,500 (S outer)
• Therefore, typically use the smaller relation as the outer relation.
• 500,000 I/Os ≈ 1.4 hours


Page-Oriented Nested Loops Join

[Animation, two frames: Table 1 on disk has pages (2, 12, 6) and (1, 5, 27); Table 2 on disk has pages (2, 13), (12, 27), and (1, 5). With the first Table 1 page and the Table 2 page (2, 13) in memory: once we’ve got these two pages in memory, check every combination from one page to the other page! Then, with the Table 2 page (12, 27) in memory: do the same thing and compare all combinations in memory between these two pages!]


The best loops-based join algorithm: Block Nested-Loops Join

• Algorithm:
  – One page is assigned to be the output buffer
  – One page assigned to input from S, B-2 pages assigned to input from R

  Until all of R has been read {
    Read in B-2 pages of R
    For each page in S {
      Read in the single S page
      Check pairs of tuples in memory and output if they match
    }
  }

• Cost: M + (M/(B-2))*N. For B=35, cost is 1000 + 1000*500/33 ≈ 16,000 I/Os ≈ 3 minutes
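The algorithm above can be sketched in a few lines. This is an in-memory illustration under assumptions: a relation is a list of pages, a page is a list of tuples, and the function names are made up. The cost function rounds the number of R blocks up to a whole number, which gives 16,500 for the running example; the figure of about 16,000 on the slide comes from dividing without rounding up.

```python
import math


def block_nested_loops_join(R_pages, S_pages, B, i, j):
    """Join using B buffer pages: B-2 for a block of R, 1 for S, 1 for output."""
    if B < 3:
        raise ValueError("need at least 3 buffer pages")
    block_size = B - 2
    result = []
    for start in range(0, len(R_pages), block_size):
        # "Read in B-2 pages of R"
        block = [r for page in R_pages[start:start + block_size] for r in page]
        for s_page in S_pages:  # "Read in the single S page"
            for s in s_page:
                for r in block:
                    if r[i] == s[j]:
                        result.append(r + s)
    return result


def block_nested_loops_cost(M, N, B):
    """Read R once, and scan S once per block of B-2 pages of R."""
    return M + math.ceil(M / (B - 2)) * N


print(block_nested_loops_cost(M=1000, N=500, B=35))

# The two small tables from the slides, one join column per tuple.
R_pages = [[(2,), (12,), (6,)], [(1,), (5,), (27,)]]
S_pages = [[(2,), (13,)], [(12,), (27,)], [(1,), (5,)]]
print(sorted(block_nested_loops_join(R_pages, S_pages, B=3, i=0, j=0)))
```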

[Figure: R on disk has pages (2, 12, 6) and (1, 5, 27); S on disk has pages (2, 13), (12, 27), and (1, 5). The B pages of memory buffer hold both R pages and the S page (2, 13).]


Index Nested Loops Join

If there is an index on the search key sj, then we can use the index on the inner table to get the matching tuples!

  foreach tuple r in R do
    foreach tuple s in S where ri == sj do
      add <r, s> to result

• Cost: M + ((M*pR) * cost of finding matching S tuples), where M and pR are the page count and tuples per page of whichever relation is the outer one
  – = 500 + (500*80*4) = 160,500 ≈ 1/2 hour (Reserves as inner)
  – = 1000 + (1000*100*3) = 301,000 ≈ 1 hour (Sailors as inner)
• For each R tuple, cost of probing S index is about 2-4 for B+ tree.
• These could be smaller if the top levels of the B+ tree are in memory.


External Sorting

• Various relational operator algorithms require sorting a table
• Issue: table won’t fit in memory
• Approach: Use merge-sort where sorted runs can be read sequentially into memory

[Figure: sorted runs being merged, for example 78 72 68 55 54 54 40 and 92 88 66 51 43 and 23 21 20 18 9 …]


N-Way External Sorting

• On the initial pass, read and write runs a memory full at a time.
• Do an n-way merge rather than a 2-way merge.
• Each pass does 2*M I/Os (where M is the number of pages in the table).
• Number of passes depends on how many pages of memory (B) are devoted to sorting:
  – #Passes = 1 + Ceiling(log base (B-1) of (M/B)), where the 1 is the initial run-forming pass
  – Can sort 100 million pages in 4 passes with 129 pages of memory space
• Can sort M pages using B memory pages in 2 passes if sqrt(M) <= B (this condition is satisfied often).
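The pass count can be checked by simulating the run counts instead of evaluating a logarithm. A minimal sketch, assuming the usual scheme in which the initial pass produces runs of B pages and every later pass merges B-1 runs at a time; the function names are assumptions.

```python
def num_passes(M, B):
    """Passes needed to sort M pages with B buffer pages.

    The initial pass writes ceil(M / B) sorted runs; each later pass
    merges B - 1 runs at a time until one run remains.
    """
    if B < 3:
        raise ValueError("need at least 3 buffer pages")
    runs = (M + B - 1) // B          # ceiling division
    passes = 1
    while runs > 1:
        runs = (runs + B - 2) // (B - 1)
        passes += 1
    return passes


def sort_cost(M, B):
    """Each pass reads and writes every page: 2 * M I/Os per pass."""
    return 2 * M * num_passes(M, B)


print(num_passes(100_000_000, 129))
print(num_passes(1000, 35), sort_cost(1000, 35))
```

With 35 buffer pages, both a 1000-page and a 500-page table need two passes, which is where the factor of 4 per page in the sort-merge costs comes from.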



Sort-Merge Join

1. Sort R on join attribute
2. Sort S on join attribute
3. Merge R and S
   – Advance scan of R until current R-tuple >= current S tuple, then advance scan of S until current S-tuple >= current R tuple; do this until current R tuple = current S tuple.
   – At this point, all R tuples with same value in Ri (current R group) and all S tuples with same value in Sj (current S group) match; output <r, s> for all pairs of such tuples.
   – Then resume scanning R and S.

R is scanned once; each S group is scanned once per matching R tuple. The cost depends on the size of the group! If the matching group is small, matching is done in memory.

Best case: cost is Cost to sort R + Cost to sort S + (M+N), assuming all matches fit in memory.

Worst case: all tuples of R and S have the same join value, so the matching group is the entire relation, for R and for S. Cost is: Cost to sort R + Cost to sort S + (M*N)
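The merge step can be sketched as follows. This is an in-memory illustration only: Python's `sorted` stands in for the external sort, the function name is an assumption, and the sample data are the sid columns of the Sailors and Reserves tables used in the lecture's example.

```python
def sort_merge_join(R, S, i, j):
    """Join R and S on R's i-th column and S's j-th column."""
    R = sorted(R, key=lambda r: r[i])   # stands in for an external sort
    S = sorted(S, key=lambda s: s[j])
    result = []
    a = b = 0
    while a < len(R) and b < len(S):
        if R[a][i] < S[b][j]:
            a += 1                       # advance scan of R
        elif R[a][i] > S[b][j]:
            b += 1                       # advance scan of S
        else:
            key = R[a][i]
            a_end = a
            while a_end < len(R) and R[a_end][i] == key:
                a_end += 1               # current R group
            b_end = b
            while b_end < len(S) and S[b_end][j] == key:
                b_end += 1               # current S group
            for r in R[a:a_end]:
                for s in S[b:b_end]:
                    result.append(r + s)
            a, b = a_end, b_end          # resume scanning after both groups
    return result


sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber"), (44, "guppy"), (58, "rusty")]
reserves = [(28, 103), (28, 103), (31, 101), (31, 102), (31, 101), (58, 103)]
joined = sort_merge_join(reserves, sailors, 0, 0)
print(len(joined))
```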


Example of Sort-Merge Join

Sailors:

  sid  sname   rating  age
  22   dustin  7       45.0
  28   yuppy   9       35.0
  31   lubber  8       55.5
  44   guppy   5       35.0
  58   rusty   10      35.0

Reserves:

  sid  bid  day       rname
  28   103  12/4/96   guppy
  28   103  11/3/96   yuppy
  31   101  10/10/96  dustin
  31   102  10/12/96  lubber
  31   101  10/11/96  lubber
  58   103  11/12/96  dustin

• Cost: (cost to sort R) + (cost to sort S) + (M+N) (in-memory matches)
• With 35 buffers, Reserves and Sailors can each be sorted in 2 passes
• Cost is: 4 * 1000 + 4 * 500 + 1000 + 500 = 7500 (we multiply by 4 because there are 2 passes, and we read and write each page, each pass)


Cost of Sort-Merge

(cost to sort R) + (cost to sort S) + (cost to merge)

• Cost to sort M pages in 2 passes = 4*M. Why?
• Cost to merge is typically M+N. Why?

If both R and S can be sorted in 2 passes, then cost is: 4M + 4N + (M+N) = 5*(M+N)

There is an optimization (page 462 in our text) that improves this to 3*(M+N)

Thus the cost of joining Sailors and Reserves, assuming there are enough buffers to sort each table in two passes, is

5*(M+N) = 7500 page I/Os


Hash Join

• Simple case: S fits in main memory
  – Build an in-memory hash index for S
  – Proceed as for index nested-loops join
• Harder case: neither R nor S fits in memory
  – Divide them both in the same way (1 pass) so that each partition of S fits in memory
  – Do the simple case on each pair of matching partitions
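Both cases can be sketched in memory. The names `simple_hash_join` and `partitioned_hash_join`, the partition count `k`, and the use of Python's built-in `hash` as the partitioning function are assumptions for illustration; a real system would write the partitions to disk and choose k so that each S partition fits in the buffer pool.

```python
from collections import defaultdict


def simple_hash_join(R, S, i, j):
    """Simple case: build an in-memory hash table on S, then probe it with R."""
    table = defaultdict(list)
    for s in S:
        table[s[j]].append(s)
    return [r + s for r in R for s in table.get(r[i], [])]


def partitioned_hash_join(R, S, i, j, k):
    """Harder case: split R and S the same way, then join matching partitions."""
    R_parts = [[] for _ in range(k)]
    S_parts = [[] for _ in range(k)]
    for r in R:
        R_parts[hash(r[i]) % k].append(r)
    for s in S:
        S_parts[hash(s[j]) % k].append(s)
    result = []
    # Tuples with equal join values always land in the same partition number,
    # so only matching pairs of partitions need to be joined.
    for r_part, s_part in zip(R_parts, S_parts):
        result.extend(simple_hash_join(r_part, s_part, i, j))
    return result


table1 = [(v,) for v in (2, 4, 7, 11, 13, 19, 24, 24, 27)]
table2 = [(v,) for v in (3, 4, 8, 10, 13, 13, 27, 29, 29)]
print(sorted(partitioned_hash_join(table1, table2, 0, 0, k=3)))
```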



Partitioning

  Partition             Table 1       Table 2
  Partition 1: 1-10     2, 4, 7       3, 4, 8, 10
  Partition 2: 11-20    11, 13, 19    13, 13
  Partition 3: 21-30    24, 24, 27    27, 29, 29


Use Hash Function Instead of Ranges

• No guarantee we can find ranges of values that will divide Table 2 into roughly equal-sized partitions
• Apply hash function h to join value
  – Partition 1: h(val) = 1
  – Partition 2: h(val) = 2
  – Partition 3: h(val) = 3


Hash Join Cost

• Cost to partition R: 2M
• Cost to partition S: 2N
  – Can do this in one pass if sqrt(M) <= B
• Cost to join partitions: M+N
• Total: 3*(M+N), same as sort-merge with the optimization.


Comparison of Approximate Costs of Joining R and S, assuming 100 I/Os/second

  Algorithm             I/Os         Time
  Simple Nested Loops   50,000,000   6 days
  Page Nested Loops     500,000      1.4 hours
  Block Nested Loops*   16,000       3 minutes
  Index Nested Loops    160,500      ½ hour
  Sort-Merge**          4,500        1 minute
  Hash join**           4,500        1 minute

*Assuming 35 buffer pages

**Assuming appropriate files, M, satisfy sqrt(M) < pages of buffer
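The table can be reproduced from the cost formulas given for each algorithm. A sketch under the lecture's assumptions (M = 1000, pR = 100, N = 500, pS = 80, B = 35, 100 I/Os per second); the variable names are made up, the index nested loops row uses Sailors as the outer relation with 4 I/Os per probe, and the exact results differ slightly from the rounded figures in the table.

```python
import math

M, P_R = 1000, 100      # Reserves: pages, tuples per page
N, P_S = 500, 80        # Sailors: pages, tuples per page
B = 35                  # buffer pages
IOS_PER_SECOND = 100

costs = {
    "Simple Nested Loops": M + (P_R * M) * N,
    "Page Nested Loops": N + N * M,                 # smaller relation as outer
    "Block Nested Loops": M + math.ceil(M / (B - 2)) * N,
    "Index Nested Loops": N + (N * P_S) * 4,        # Reserves as inner
    "Sort-Merge": 3 * (M + N),                      # with the optimization
    "Hash Join": 3 * (M + N),
}

for name, ios in costs.items():
    minutes = ios / IOS_PER_SECOND / 60
    print(f"{name:<20} {ios:>12,} I/Os  {minutes:>10.1f} minutes")
```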



Summary – Algorithms for Relational Algebra Operators

• A virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).

• Many alternative implementation techniques for each operator; no universally superior technique for most operators.

• Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several ops.


Query Optimization

1. Translate the SQL query into a query tree (operators: relational algebra plus a few other ones).
2. Generate other, equivalent query trees (e.g., using relational algebra equivalences).
3. For each possible query tree:
   – select an algorithm for each operator (producing a query plan)
   – estimate the cost of the plan
4. Choose the plan with lowest cost, of the plans considered (which is not necessarily all possible plans).


Initial Query Tree - Equivalent to SQL (without any algorithms selected)

SQL Query:

  SELECT S.sname
  FROM Reserves R, Sailors S
  WHERE R.sid = S.sid AND
        R.bid = 100 AND
        S.rating > 5;

Relational Algebra Tree (root first):

  π sname
    σ bid=100 ∧ rating > 5
      ⋈ sid=sid
        Reserves
        Sailors


Relational Algebra Equivalences

• σc1∧…∧cn(R) ≡ σc1(…σcn(R))
• The symbol ≡ means equivalence.
• So you can replace σc1(…σcn(R)) with σc1∧…∧cn(R)
• And you can replace σc1∧…∧cn(R) with σc1(…σcn(R))
• If you have several conditions connected by “AND” in a select operator, then you can apply them one at a time.


Example: σc1∧…∧cn(R) ≡ σc1(…σcn(R))

[Figure: two equivalent query trees
  π sname(σ bid=100 ∧ rating > 5(Reserves ⋈ sid=sid Sailors))
  ≡ π sname(σ bid=100(σ rating > 5(Reserves ⋈ sid=sid Sailors)))]


Relational Algebra Equivalence

σc(R⋈S) ≡ σc(R)⋈S

Given a select operation following a join, if the select condition applies ONLY to one of the tables (R, in this example), then you can introduce a new select operator (before the join operator) with that condition.

This is called pushing down a SELECT.



Example: σc(R⋈S) ≡ σc(R)⋈S

  SELECT S.sname
  FROM Reserves R, Sailors S
  WHERE R.sid = S.sid AND …

[Figure: two equivalent query trees. The condition rating > 5 applies only to the Sailors table, so it can be moved below the join:
  π sname(σ bid=100(σ rating > 5(Reserves ⋈ sid=sid Sailors)))
  ≡ π sname(σ bid=100(Reserves ⋈ sid=sid σ rating > 5(Sailors)))]


Example: σc(R⋈S) ≡ σc(R)⋈S (cont.)

[Figure: query tree over Reserves and Sailors with the operators sid=sid, rating > 5, sname, and bid=100]

• What are the advantages of “pushing” a select past a join operator?
• What are the disadvantages of “pushing” a select past a join operator?


Relational Algebra Equivalences

• Selections:
  – σc1∧…∧cn(R) ≡ σc1(…σcn(R))   (Selects Cascade)
  – σc1(σc2(R)) ≡ σc2(σc1(R))   (Selects Commute)
• Projections:
  – πa(R) ≡ πa(πa1(…πan(R))), if each ai contains a   (Only last project matters)
• Joins:
  – R⋈(S⋈T) ≡ (R⋈S)⋈T   (Joins are Associative)
  – R⋈S ≡ S⋈R   (Joins Commute)

Try to prove that: R⋈(S⋈T) ≡ (T⋈R)⋈S


Some Rel. Algebra Equivalences have Constraints

σc1∧…∧cn(R) ≡ σc1(…σcn(R))

Is this always true? No matter what the conditions are … no matter what table R we use?

What about this one?

πa(σc(R)) ≡ σc(πa(R))

Is this always true? No matter what the condition is? No matter what project list is used?


Equivalences with 2 or More Operations

• Projection commutes with a selection PROVIDED that the selection uses attributes that are retained by the projection (c’s attrs ⊆ a):
  πa(σc(R)) ≡ σc(πa(R))
• A cross-product can be converted to a join PROVIDED that the selection condition involves attributes of the tables involved in a cross-product:
  σc(R × S) ≡ R⋈cS
• A selection can be pushed past a cross product (or join) PROVIDED the select uses just attributes of R:
  σc(R⋈S) ≡ σc(R)⋈S


Equivalences with 2 or More Operations

• Join distributes over the various set operators (union, intersection, difference). For example, with union:
  (Q⋈S) ∪ (R⋈S) ≡ (Q ∪ R)⋈S
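An equivalence such as σc(R⋈S) ≡ σc(R)⋈S can be spot-checked on sample data, which is not a proof but makes the rewrite concrete. In this sketch relations are lists of tuples, and `select` and `join` are small helper functions written for the illustration; the data are the sid and bid columns of Reserves and three columns of Sailors from the sort-merge example, and bid = 103 is used as the condition because it appears in that data.

```python
def select(pred, rel):
    """σ: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]


def join(R, S, cond):
    """⋈: concatenate every pair of tuples that satisfies the join condition."""
    return [r + s for r in R for s in S if cond(r, s)]


reserves = [(28, 103), (28, 103), (31, 101), (31, 102), (31, 101), (58, 103)]
sailors = [(22, "dustin", 7), (28, "yuppy", 9), (31, "lubber", 8),
           (44, "guppy", 5), (58, "rusty", 10)]

same_sid = lambda r, s: r[0] == s[0]
# bid is column 1 of Reserves, and stays column 1 after the join because
# the Reserves columns come first in the concatenated tuple.
bid_is_103 = lambda t: t[1] == 103

select_after_join = select(bid_is_103, join(reserves, sailors, same_sid))
select_before_join = join(select(bid_is_103, reserves), sailors, same_sid)

print(select_after_join == select_before_join)
print(len(join(reserves, sailors, same_sid)), len(select(bid_is_103, reserves)))
```

The last line shows the practical difference: the first form materializes the full join before filtering, while the pushed-down form joins only the Reserves tuples that survive the selection.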



Query Tree - Equivalent to SQL (without any algorithms selected)

[Figure: the SQL query and its RA tree, π sname(σ bid=100 ∧ rating > 5(Reserves ⋈ sid=sid Sailors))]


Choosing algorithms for each operator (algorithms shown in red on the slide, in parentheses here)

[Figure: the RA tree and one possible plan, root first:
  π sname (On-the-fly)
    σ bid=100 ∧ rating > 5 (On-the-fly)
      ⋈ sid=sid (Page-Oriented Nested Loops join)
        Reserves
        Sailors]


“On the fly”

On the fly means that we evaluate the operator in memory, while we have the tuple available.

“On the fly” induces no I/O cost!

Relation on left is assumed to be the outer relation, for any algorithm that uses nested loops.

[Figure: plan, root first:
  π sname (On-the-fly)
    σ bid=100 ∧ rating > 5 (On-the-fly)
      ⋈ sid=sid (Page-Oriented Nested Loops Join)
        Reserves
        Sailors]


Limitations of “On the fly”

• Can only happen if:
  – Computation can be done entirely on tuples in memory
  – Results do not need to be materialized
• Cannot apply to all operations!


Cost of plan 1: no index (Sailors: inner loop)

M = # of pages in outer table
N = # of pages in inner table

Cost of page-oriented nested loops join is:

  M + M * N = 1000 + 1000 * 500 = 501,000

And the “on-the-fly” operations have no I/O, so plan cost is 501,000.

[Figure: plan, root first:
  π sname (On-the-fly)
    σ bid=100 ∧ rating > 5 (On-the-fly)
      ⋈ sid=sid (Page-Oriented Nested Loops Join)
        Reserves
        Sailors]


How can we create other plans?

• One answer: use relational algebra equivalences to produce new query trees.
  – Advantage: you are sure that the new tree is equivalent to the original, because equivalences have proofs
• Or, assign different algorithms to operators
• Example: apply commutativity of join to the current RA tree
  – What is the result?
  – Let’s use the same algorithms for Plan 2


Cost of plan 2: no index (Reserves: inner loop)

Sailors as the outer relation rather than Reserves.

M = # of pages in outer table
N = # of pages in inner table

Cost of page-oriented nested loops join is:

  500 + 500 * 1000 = 500,500

And the “on-the-fly” operations have no I/O, so plan cost is 500,500.

[Figure: plan, root first:
  π sname (On-the-fly)
    σ bid=100 ∧ rating > 5 (On-the-fly)
      ⋈ sid=sid (Page-Oriented Nested Loops Join)
        Sailors
        Reserves]


Cost of plan 3: push down selects

Apply this equivalence: σc(R⋈S) ≡ σc(R)⋈S to the previous query tree to get an equivalent query tree.

[Figure: plan, root first:
  π sname (On-the-fly)
    ⋈ sid=sid (Page-Oriented Nested Loops Join)
      σ rating > 5 (On-the-fly) over Sailors
      σ bid=100 (On-the-fly) over Reserves]

• What is the cost of the plan shown?
  – Scanning Sailors and Reserves costs M+N I/Os.
  – What about the cost of the join? It depends on how many reservations are for boat 100 and how many sailors have a rating > 5.
• How would you find this information? Statistics will help.


To estimate cost, we need table sizes

• For all operators beyond the leaf level of the query plan, the input tables are the result of some earlier query.

• Thus, we need to estimate the size of intermediate results!(This can be difficult. This is one reason why the cost estimates may not be very good. Estimation errors tend to compound.)

• For example, what information would you need to estimate how many reservations are for bid 100 and how many sailors have a rating >5?


Estimating Sizes

• Size of table = size of tuple * number of rows returned
  – Size of tuple = sum of size of columns
• Index entries also have a size, just like tuples
• If table is M bytes and a page is N bytes, it takes M/N I/O operations to perform a full scan of the table


DBMS Usually Maintains Some Statistics in the DB Catalog

• Catalogs typically contain at least:
  – # tuples and # pages for each table.
  – # distinct key values and # pages for each index.
  – Index height, low/high key values for each tree index.
• Catalogs are updated periodically, say, once a week or once a month. Perhaps they’re updated during the backup.
• Simplest case: assume that all attribute values are uniformly distributed. Thus if gender was an attribute, the optimizer would assume that half of the rows have the male value and other half have the female value. (This might be grossly inaccurate.)


Calculating Selectivities

• Assume that rating values range from 1 to 10, and that bid values range from 1 to 100.
• What percentage of the incoming tuples, to the operator σ bid=100, will be output?
• What about σ rating > 5?
• σ bid=100 ∧ rating > 5?


Doing better than a uniform distribution

• The DBMS might gather more detailed information about how the values of attributes are distributed (e.g., histograms of the values in a field) and store it in the catalog.

Suppose there was an attribute degree-program with three possible values: “BS CS”, “MS CS”, “PhD CS”.
  – Then the DBMS might count the values and know that there are 428 “BS CS” values, 98 “MS CS” values and 25 “PhD CS” values.
  – This allows a much better estimate of the reduction factor.


Independence of Reduction Factors

• So far, in our quest to compute costs of plans, we have gathered statistics and shown how to use them, assuming uniform distributions. But what about a select operator with terms separated by AND?
  – Example: σ bid=100 ∧ rating > 5
• We assume that all terms are independent!
• Thus, if one attribute is class and the other is number-of-hours, the query optimizer might assume that class is uniformly distributed over {Fresh, Soph, Jun, Sen} and that number-of-hours is uniformly distributed over {0, 1, …, 205}.
  – But we know that class correlates with number-of-hours! It might even be that number-of-hours → class.
• What percentage of the incoming tuples, to the operator σ bid=100 ∧ rating > 5, will be output?
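Under the two simplifying assumptions discussed here (uniformly distributed values and independent terms), the reduction factors can be computed directly. A sketch, assuming integer-valued attributes with bid ranging over 1 to 100 and rating over 1 to 10 as stated earlier; the function names are made up for illustration.

```python
from fractions import Fraction


def rf_equals(low, high):
    """Reduction factor of `attr = constant` for uniform integers in [low, high]."""
    return Fraction(1, high - low + 1)


def rf_greater_than(value, low, high):
    """Reduction factor of `attr > value` for uniform integers in [low, high]."""
    qualifying = max(0, high - max(value, low - 1))
    return Fraction(qualifying, high - low + 1)


def rf_and(*factors):
    """Terms joined by AND: multiply, assuming the terms are independent."""
    product = Fraction(1)
    for f in factors:
        product *= f
    return product


bid_100 = rf_equals(1, 100)
rating_gt_5 = rf_greater_than(5, 1, 10)
both = rf_and(bid_100, rating_gt_5)
print(bid_100, rating_gt_5, both)
```

If the terms were correlated, as with class and number-of-hours, the product would misestimate the true fraction, which is one way estimation errors enter the plan cost.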



Enumerating Plans for Multiple Joins

• Back to the problem of generating plans.
• Are we trying to generate as many plans as possible?
  – No, best to generate few, as long as cheap plans are among them.
• In System R: only left-deep join trees are considered.

[Figure: three join trees over the relations A, B, C, D. One is left-deep; the other two are not.]


Queries Over Multiple Relations (Joins)

Left-deep trees allow us to generate all fully pipelined plans.

• Intermediate results not written to temporary files.
• Not all left-deep trees are fully pipelined (e.g., SM join).
• Using only left-deep plans (obviously) restricts the search space. (So optimizer may not find the optimal plan.)

[Figure: a left-deep join tree over A, B, C, D]


Enumeration of Left-Deep Plans

• Need to consider all possible left-deep plans.
  – For SPJ queries, these are enumerated by orderings of the tables
• For each ordering, consider the access method for each relation and the join method for each join.
• Enumerated using N passes (if N relations joined):
  – Pass 1: Find best 1-relation plan for each relation.
  – Pass 2: Find best way to join result of each 1-relation plan (as outer) to another relation. (All 2-relation plans.)
  – Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N’th relation. (All N-relation plans.)
• At the end of each pass, retain only:
  – Cheapest plan overall, plus
  – Cheapest plan for each interesting order of the tuples.
• Interesting order: corresponding to a join, ORDER BY or GROUP BY
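The pass-by-pass enumeration can be sketched as a small dynamic program over sets of relations. This is a heavily simplified illustration, not System R's algorithm: it ignores access methods, join methods, and interesting orders, it assumes a single reduction factor for every join, and its cost model (the sum of intermediate result sizes) is an assumption chosen to keep the example short. The function name and the relation sizes are hypothetical.

```python
from itertools import combinations


def best_left_deep_order(sizes, selectivity):
    """Return (cost, join_order) of the cheapest left-deep plan.

    sizes: dict mapping relation name -> number of tuples.
    selectivity: one assumed reduction factor applied to every join.
    Cost model (an assumption): sum of the sizes of all intermediate results.
    """
    # Pass 1: the best 1-relation "plan" is just the relation itself.
    best = {frozenset([r]): (0.0, (r,), float(n)) for r, n in sizes.items()}
    for k in range(2, len(sizes) + 1):
        # Pass k: extend each retained (k-1)-relation plan (as outer) by one relation.
        for subset in map(frozenset, combinations(sizes, k)):
            for inner in subset:
                cost, order, rows = best[subset - {inner}]
                out_rows = rows * sizes[inner] * selectivity
                candidate = (cost + out_rows, order + (inner,), out_rows)
                # Retain only the cheapest plan for this set of relations.
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate
    cost, order, _rows = best[frozenset(sizes)]
    return cost, order


sizes = {"A": 1000, "B": 10, "C": 100}
cost, order = best_left_deep_order(sizes, selectivity=0.01)
print(cost, order)
```

Because only the cheapest plan per set of relations is kept at the end of each pass, far fewer plans are examined than the N! orderings of the tables.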


Nested Queries

• Nested block is optimized independently, with the outer tuple considered as providing a selection condition.
• Outer block is optimized with the cost of ‘calling’ the nested block computation taken into account.
• Implicit ordering of these blocks means that some good strategies are not considered.
• The non-nested version of the query is typically optimized better. The optimizer might not find it from the nested version, so you may need to explicitly unnest the query.

  SELECT S.sname
  FROM Sailors S
  WHERE EXISTS
    (SELECT *
     FROM Reserves R
     WHERE R.bid=103 AND R.sid=S.sid)

Nested block to optimize:

  SELECT *
  FROM Reserves R
  WHERE R.bid=103
    AND S.sid= outer value

Equivalent non-nested query:

  SELECT S.sname
  FROM Sailors S, Reserves R
  WHERE S.sid=R.sid
    AND R.bid=103


Query Optimizers Don’t (Always) Find the Best Plan

• There are usually more plans than you can consider, even if only left deep plans are considered.

• The optimizer might not even try to generate all possible plans (it won’t be able to consider all of them anyway).

• Sometimes the optimizer will compare the optimization cost to the estimated execution cost and quit early.


Physical DB Design

Now that we know how query optimizers work…

How do we choose...
• the file organizations and
• the indices
for our database?

This is called “physical design” of a database.

This is sort of like query optimization, backwards.


We Need to Understand the Workload

• For each query in the workload:
  – Which relations does it access?
  – Which attributes are retrieved?
  – Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
• For each update in the workload:
  – Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
  – The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.


Physical Database Design Issues

• Choosing the right indexes
• Mapping tables to physical storage
• Partitioning tables


Choice of Indexes

• Consider the queries, one by one, in order of importance.
  – Consider the best plan for this query using the current indexes, and see if a better plan is possible with an additional index.
  – If so, create it.
• But...before creating an index, we must consider the impact on updates!

It’s a trade-off: indexes make queries faster and updates slower.


Issues to Consider in Index Selection

• Attributes mentioned in a WHERE clause are candidates for indices

• Clustering is useful for range queries, equality queries with duplicates, and sorting

• Hash or Tree index?

• Try to choose indexes that benefit as many queries as possible.

• Remember only one index can be clustered per relation!



Example 1

  SELECT E.ename, D.mgr
  FROM Emp E, Dept D
  WHERE D.dname='Toy' AND E.dno=D.dno

• Hash index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple.
  – What plans could use this index?
• Hash index on D.dname supports the 'Toy' selection.
  – Given this, index on D.dno is not needed. Why?
  – What plans could use this hash index?
• What if WHERE included: “... AND E.age=25”?
  – Could retrieve Emp tuples using index on E.age, then join with Dept tuples satisfying dname selection.
  – So, if E.age index is already created, is it worth adding an index on E.dno?


Example 2

SELECT E.ename, D.mgr
FROM Emp E, Dept D
WHERE E.sal BETWEEN 10000 AND 20000
  AND E.hobby='Stamps' AND E.dno=D.dno

• What if we build a hash index on D.dno?
– Which plans can use that index?

• What index should we build on Emp?
– B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions.

• As a rule of thumb, equality selections are more selective than range selections.

• As both examples indicate, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query. Have to understand optimizers!
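To make "depends upon the selectivity" concrete, the sketch below measures what fraction of a synthetic Emp table satisfies each condition and prefers the index on the more selective one. The data distribution is invented purely for illustration; with a different distribution the range condition could win.

```python
# Hypothetical Emp rows: salaries from 5000 to 34990, one row in twenty likes stamps.
emp = [
    {"sal": 5000 + 10 * i, "hobby": "Stamps" if i % 20 == 0 else "Other"}
    for i in range(3000)
]

def selectivity(rows, predicate):
    """Fraction of rows satisfying the predicate (smaller = more selective)."""
    return sum(1 for r in rows if predicate(r)) / len(rows)

sal_sel = selectivity(emp, lambda r: 10000 <= r["sal"] <= 20000)
hobby_sel = selectivity(emp, lambda r: r["hobby"] == "Stamps")

# Prefer the index on the condition that keeps fewer tuples.
better_index = "E.hobby" if hobby_sel < sal_sel else "E.sal"
print(f"sal: {sal_sel:.3f}  hobby: {hobby_sel:.3f}  build index on {better_index}")
```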



Examples of Clustering

• B+ tree index on E.age can be used to get qualifying tuples.
– How selective is the condition?
– Is the index clustered?

SELECT E.dno
FROM Emp E
WHERE E.age>40

• Consider the GROUP BY query.
– If many tuples have E.age > 10, using the E.age index and sorting the retrieved tuples may be costly.
– Clustered E.dno index may be better!

SELECT E.dno, COUNT (*)
FROM Emp E
WHERE E.age>10
GROUP BY E.dno

• Equality queries and duplicates:
– Clustering on E.hobby helps!

SELECT E.dno
FROM Emp E
WHERE E.hobby='Stamps'
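Why "Is the index clustered?" matters for E.age>40 can be seen by counting how many data pages hold at least one qualifying tuple. The simulation below is a rough sketch: the page size, table size, and age distribution are all assumptions, and it counts distinct pages only (an unclustered index may fetch the same page more than once).

```python
import random

PAGE_SIZE = 20  # tuples per page (assumed)

random.seed(0)
ages = [random.randint(18, 65) for _ in range(2000)]  # hypothetical E.age values

def pages_touched(layout, predicate):
    """Number of distinct pages holding at least one qualifying tuple."""
    return len(
        {pos // PAGE_SIZE for pos, age in enumerate(layout) if predicate(age)}
    )

def over_40(age):
    return age > 40

unclustered_pages = pages_touched(ages, over_40)        # file order unrelated to age
clustered_pages = pages_touched(sorted(ages), over_40)  # file sorted on age

print("unclustered:", unclustered_pages, "pages")
print("clustered:  ", clustered_pages, "pages")
```

With the file sorted on age, the qualifying tuples sit on adjacent pages; in the unsorted file they are spread over nearly every page.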


Index-Only Plans

• A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.

Index <E.dno>:

SELECT D.mgr
FROM Dept D, Emp E
WHERE D.dno=E.dno

Index <E.dno>:

SELECT E.dno, COUNT(*)
FROM Emp E
GROUP BY E.dno

Tree index <E.dno, E.sal>:

SELECT E.dno, MIN(E.sal)
FROM Emp E
GROUP BY E.dno
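SQLite reports an index-only plan as a "COVERING INDEX" in its plan output, which gives a quick way to try the idea. The sketch below assumes a hypothetical Emp table and a single composite index named emp_dno_sal on (dno, sal). Other systems use different plan vocabulary, so treat the exact text as SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Emp (eid INTEGER PRIMARY KEY, ename TEXT, dno INTEGER, sal REAL)"
)
conn.executemany(
    "INSERT INTO Emp (ename, dno, sal) VALUES (?, ?, ?)",
    [(f"e{i}", i % 4, 1000.0 + i) for i in range(100)],
)
# Composite tree index <dno, sal>; the name is hypothetical.
conn.execute("CREATE INDEX emp_dno_sal ON Emp(dno, sal)")

def plan(sql):
    """Return SQLite's plan description for a query as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

count_sql = "SELECT dno, COUNT(*) FROM Emp GROUP BY dno"
min_sql = "SELECT dno, MIN(sal) FROM Emp GROUP BY dno"

count_plan = plan(count_sql)
min_plan = plan(min_sql)
min_rows = conn.execute(min_sql).fetchall()

print(count_plan)
print(min_plan)
print(min_rows)
```

Both queries mention only columns stored in the index, so the Emp tuples themselves never need to be fetched.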



Index rules of thumb

• Don’t use indexes on small tables (<200 rows)

• Don’t use indexes on columns with few values (T/F, state)

• For most systems, indexes on primary keys and foreign keys are sufficient

• Don’t forget to add indexes when the schema changes!


Storage structure

[Figure: storage hierarchy. Labels: dbspace (tablespace, database) spanning Disk 1 ... Disk i; chunk (segment, container); extent 1 ... extent n; page (block) ... page N; tables. Annotations: disk sharing, disk allocation, logical grouping.]



The order in which you create chunks matters

• The system does maintenance tasks (for example, page cleaning during checkpoints) that cause it to visit each chunk in turn

• If you create chunks in a round-robin fashion across disks, the system can parallelize these tasks

[Figure: two alternative placements of Chunk 1 through Chunk 4 on Disk 1 and Disk 2.]


Where you put your tables matters

• If you have multiple disks (and you should), then you should distribute your data to take advantage of them

• Keep your system data and your application data separate



How much space you give your tables matters

• Most major RDBMS systems allow you to specify a default extent size for tables

• A single extent is contiguous space

• If you can fit your whole table into one extent, then your clustering will be a lot more effective


Tuning the Conceptual Schema

The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:
– We might denormalize or we might add fields to a relation.
– We must take care to avoid the problems caused by redundancy!
– We’ll cover these next lecture.
– We might also consider decompositions (partitioning).
– Workloads may change over time.



Partitioning of one table into several

• Relation is replaced by a collection of relations that are projections. (Vertical partitioning)

• Sometimes, might want to replace relation by a collection of relations that are selections. (Horizontal partitioning)
– Each new relation has same schema as the original, but a subset of the rows.
– Collectively, new relations contain all rows of the original. Typically, the new relations are disjoint.
– Many database vendors provide specific syntax for horizontal partitioning


Tuning the Conceptual Schema: Vertical decomposition using the project operator

Course(c#, cname, instructor, room, days)

is replaced by the projections:

Course1(c#, room, days)
Course2(c#, cname)
Course3(c#, instructor)
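A vertical decomposition like this one loses no information as long as every projection keeps the key, because joining the pieces on c# rebuilds the original rows. The sketch below checks that with sqlite3. The column name cno (standing in for c#) and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Course (cno INTEGER PRIMARY KEY, cname TEXT, "
    "instructor TEXT, room TEXT, days TEXT)"
)
conn.executemany(
    "INSERT INTO Course VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical rows
        (101, "Course A", "Instructor X", "Room 1", "MW"),
        (102, "Course B", "Instructor Y", "Room 2", "TR"),
        (103, "Course C", "Instructor X", "Room 1", "F"),
    ],
)

# The three projections from the slide; each keeps the key cno.
conn.execute("CREATE TABLE Course1 AS SELECT cno, room, days FROM Course")
conn.execute("CREATE TABLE Course2 AS SELECT cno, cname FROM Course")
conn.execute("CREATE TABLE Course3 AS SELECT cno, instructor FROM Course")

original = conn.execute(
    "SELECT cno, cname, instructor, room, days FROM Course ORDER BY cno"
).fetchall()

# Joining the projections on the key rebuilds the original relation.
rebuilt = conn.execute(
    """
    SELECT c1.cno, c2.cname, c3.instructor, c1.room, c1.days
    FROM Course1 c1
    JOIN Course2 c2 ON c2.cno = c1.cno
    JOIN Course3 c3 ON c3.cno = c1.cno
    ORDER BY c1.cno
    """
).fetchall()

print(rebuilt == original)
```

The cost is visible in the query: anything needing cname together with room now pays for a join, which is why the workload should drive the split.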



Tuning the Conceptual Schema: Horizontal partition/decomposition using the select operator

Course(c#, cname, instructor, room, days)

is replaced by the selections:

Undergraduate-Course(c#, cname, instructor, room, days)
Graduate-Course(c#, cname, instructor, room, days)
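The slide does not say which selection condition separates the two relations. The sketch below assumes, purely for illustration, that course numbers below 500 are undergraduate and the rest are graduate. It then checks the two properties of a horizontal partition: the fragments are disjoint, and together they contain every original row. Table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Course (cno INTEGER PRIMARY KEY, cname TEXT, "
    "instructor TEXT, room TEXT, days TEXT)"
)
conn.executemany(
    "INSERT INTO Course VALUES (?, ?, ?, ?, ?)",
    [  # hypothetical rows
        (386, "Course A", "Instructor X", "Room 1", "MW"),
        (410, "Course B", "Instructor Y", "Room 2", "TR"),
        (586, "Course C", "Instructor X", "Room 1", "MW"),
    ],
)

# Assumed split condition: cno < 500 is undergraduate.
conn.execute("CREATE TABLE UndergraduateCourse AS SELECT * FROM Course WHERE cno < 500")
conn.execute("CREATE TABLE GraduateCourse AS SELECT * FROM Course WHERE cno >= 500")

original = conn.execute("SELECT * FROM Course ORDER BY cno").fetchall()
recombined = conn.execute(
    "SELECT * FROM UndergraduateCourse "
    "UNION ALL SELECT * FROM GraduateCourse ORDER BY cno"
).fetchall()
overlap = conn.execute(
    "SELECT COUNT(*) FROM UndergraduateCourse u "
    "JOIN GraduateCourse g ON u.cno = g.cno"
).fetchone()[0]

print(recombined == original, overlap)
```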


How you partition your data matters

• Most vendors allow you to partition
– In a round-robin fashion
– By expression

• Round-robin is useful if:
– The majority of the rows will be examined during queries
– You expect the values to be evenly distributed

[Figure: Rows 1 through 6 distributed round-robin across fragments.]
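Round-robin placement can be sketched in a few lines: row i goes to fragment i mod n, regardless of its contents. The function name and the choice of three fragments are assumptions for the example.

```python
def round_robin(rows, n_fragments):
    """Assign rows to fragments in turn, ignoring row contents."""
    fragments = [[] for _ in range(n_fragments)]
    for i, row in enumerate(rows):
        fragments[i % n_fragments].append(row)
    return fragments

rows = [f"Row {i}" for i in range(1, 7)]
fragments = round_robin(rows, 3)
for number, fragment in enumerate(fragments, start=1):
    print(f"fragment {number}: {fragment}")
```

Every fragment ends up the same size (within one row), which is why this scheme suits queries that examine most rows: the scan can be spread evenly over all fragments.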



Partitioning by expression

• Partitioning by expression is useful if:
– Non-overlapping fragments can be created on a single column
  • The database will eliminate entire fragments during a query
– Data access is not evenly distributed
  • Regularly accessed data can be divided among multiple databases
  • Rarely accessed data can be held in a single fragment
– The order of your expression matters!


Partitioning by expression (cont.)

col1 > 3000 in dbspace1
col1 > 2000 and col1 <= 3000 in dbspace2
col1 <= 2000 in dbspace3

• Search for col1>2888
• Search for col1=3345
• What if this were round-robin?

[Figure: sample col1 values stored in the three dbspaces.]
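The two searches on this slide can be worked through by treating each fragmentation expression as a half-open range (low, high] and keeping only the fragments whose range can contain a qualifying value. This is a minimal sketch of fragment elimination under that assumption; the function names are hypothetical, and a real DBMS reasons over arbitrary expressions.

```python
import math

# The slide's expressions as half-open ranges (low, high].
FRAGMENTS = {
    "dbspace1": (3000, math.inf),   # col1 > 3000
    "dbspace2": (2000, 3000),       # col1 > 2000 and col1 <= 3000
    "dbspace3": (-math.inf, 2000),  # col1 <= 2000
}

def fragments_for_equal(value):
    """Fragments that could hold a row with col1 = value."""
    return [name for name, (low, high) in FRAGMENTS.items() if low < value <= high]

def fragments_for_greater_than(bound):
    """Fragments that could hold a row with col1 > bound."""
    return [name for name, (_low, high) in FRAGMENTS.items() if high > bound]

print("col1 > 2888:", fragments_for_greater_than(2888))
print("col1 = 3345:", fragments_for_equal(3345))
print("round-robin:", list(FRAGMENTS))  # no elimination possible
```

Under round-robin placement a row's location says nothing about its col1 value, so both searches would have to visit all three dbspaces.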



Masking Conceptual Schema Changes

• The replacement of Contracts by LargeContracts and SmallContracts can be masked by the view.

• However, queries with the condition val>10000 must be asked against LargeContracts for efficient execution, so users concerned with performance have to be aware of the change.

CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val)
AS SELECT *
FROM LargeContracts
UNION
SELECT *
FROM SmallContracts
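The masking view can be tried with sqlite3. The sketch assumes that LargeContracts holds the contracts with val > 10000 and SmallContracts the rest, which the slide implies but does not state; the sample rows are invented. A column list in CREATE VIEW needs SQLite 3.9 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
columns = (
    "cid INTEGER, sid INTEGER, jid INTEGER, did INTEGER, "
    "pid INTEGER, qty INTEGER, val REAL"
)
conn.execute(f"CREATE TABLE LargeContracts ({columns})")
conn.execute(f"CREATE TABLE SmallContracts ({columns})")

# Hypothetical rows, split on val > 10000 (assumed partitioning condition).
conn.executemany(
    "INSERT INTO LargeContracts VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(1, 10, 20, 30, 40, 500, 25000.0), (2, 11, 21, 31, 41, 900, 18000.0)],
)
conn.executemany(
    "INSERT INTO SmallContracts VALUES (?, ?, ?, ?, ?, ?, ?)",
    [(3, 12, 22, 32, 42, 5, 750.0)],
)

conn.execute(
    """
    CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val)
    AS SELECT * FROM LargeContracts
    UNION
    SELECT * FROM SmallContracts
    """
)

# Old queries keep working against the view ...
via_view = conn.execute(
    "SELECT cid FROM Contracts WHERE val > 10000 ORDER BY cid"
).fetchall()
# ... but only the direct form is guaranteed to skip SmallContracts.
direct = conn.execute(
    "SELECT cid FROM LargeContracts WHERE val > 10000 ORDER BY cid"
).fetchall()

print(via_view, direct)
```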


Tuning Queries and Views

• If a query runs slower than expected, check if an index needs to be re-built, or if statistics are too old.

• Sometimes, the DBMS may not be executing the plan you had in mind. Common areas of weakness:

– Selections involving null values.
– Selections involving arithmetic or string expressions.
– Selections involving OR conditions.
– Lack of evaluation features like index-only strategies or certain join methods or poor size estimation.

• Check the plan that is being used! Then adjust the choice of indexes or rewrite the query/view.



Rewriting SQL Queries

• Complicated by interaction of:
– NULLs, duplicates, aggregation, subqueries.

• Guideline: Use only one “query block”, if possible.

SELECT DISTINCT *
FROM Sailors S
WHERE S.sname IN
    (SELECT Y.sname
     FROM YoungSailors Y)

=

SELECT DISTINCT S.*
FROM Sailors S, YoungSailors Y
WHERE S.sname = Y.sname
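The equivalence of the nested and single-block forms can be checked on a small instance with sqlite3. The table contents below are hypothetical. They include a name that appears twice in YoungSailors, to show why the flattened query needs DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT, age REAL)")
conn.execute("CREATE TABLE YoungSailors (sid INTEGER, sname TEXT, age REAL)")
conn.executemany(
    "INSERT INTO Sailors VALUES (?, ?, ?)",
    [(1, "Dustin", 45.0), (2, "Lubber", 55.0), (3, "Rusty", 17.0), (4, "Zorba", 16.0)],
)
conn.executemany(
    "INSERT INTO YoungSailors VALUES (?, ?, ?)",
    [(3, "Rusty", 17.0), (4, "Zorba", 16.0), (5, "Rusty", 15.0)],  # 'Rusty' twice
)

nested = conn.execute(
    "SELECT DISTINCT * FROM Sailors S "
    "WHERE S.sname IN (SELECT Y.sname FROM YoungSailors Y)"
).fetchall()

flattened = conn.execute(
    "SELECT DISTINCT S.* FROM Sailors S, YoungSailors Y "
    "WHERE S.sname = Y.sname"
).fetchall()

# Without DISTINCT the join repeats a sailor once per matching young sailor.
without_distinct = conn.execute(
    "SELECT S.* FROM Sailors S, YoungSailors Y WHERE S.sname = Y.sname"
).fetchall()

print(sorted(nested) == sorted(flattened), len(without_distinct))
```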


More Guidelines for Query Tuning

• Minimize the use of DISTINCT: don’t need it if duplicates are acceptable, or if answer contains a key.

• Minimize the use of GROUP BY and HAVING:

SELECT MIN (E.age)
FROM Employee E
GROUP BY E.dno
HAVING E.dno=102

can be written as

SELECT MIN (E.age)
FROM Employee E
WHERE E.dno=102

• Consider DBMS use of index when writing arithmetic expressions: E.age=2*D.age will benefit from index on E.age, but might not benefit from index on D.age!
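The GROUP BY/HAVING rewrite can be compared side by side with sqlite3. The Employee rows are hypothetical. The last two queries show a corner case to keep in mind when applying the rewrite: for a department with no employees, the HAVING form returns no row while the WHERE form returns a single NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (eid INTEGER PRIMARY KEY, dno INTEGER, age INTEGER)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1, 101, 30), (2, 102, 41), (3, 102, 28), (4, 103, 52)],  # hypothetical rows
)

HAVING_SQL = "SELECT MIN(E.age) FROM Employee E GROUP BY E.dno HAVING E.dno = ?"
WHERE_SQL = "SELECT MIN(E.age) FROM Employee E WHERE E.dno = ?"

# Department 102 exists: both forms give the same answer.
with_having = conn.execute(HAVING_SQL, (102,)).fetchall()
with_where = conn.execute(WHERE_SQL, (102,)).fetchall()

# Department 999 has no employees: the two forms differ.
empty_having = conn.execute(HAVING_SQL, (999,)).fetchall()
empty_where = conn.execute(WHERE_SQL, (999,)).fetchall()

print(with_having, with_where)
print(empty_having, empty_where)
```

The WHERE form filters before any grouping work is done and can use an index on dno, which is the point of the guideline.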