Lecture 8 (source: web.cecs.pdx.edu/~lmd/cs386-586/Handouts/lecture8.pdf)


CS386/586 Introduction to Databases, © Lois Delcambre 1998-2005. Some slides adapted from R. Ramakrishnan, with permission.

Lecture 8

• Storage
  – Disk Architectures
• Indexes
  – Definition
  – Classification
    • Tree (B+) vs. Hash
    • Clustered vs. Non-Clustered
    • Sparse vs. Dense
  – Composite Search Keys
• Join Algorithms
  – Nested Loop
  – Sort-Merge
  – Hash
• Query Optimization
  – RA equivalences
  – Generating Plans
  – Costing Plans
    • Statistics
  – Enumerating Plans
  – Nested Queries

Lecture 8 cont.

• Physical Database Design and Tuning
  – Workload
  – Decisions to be made
  – Heuristics
  – Tuning
  – Horizontal Decomposition


We’ll just introduce these ideas, and we’ll start from the bottom.

[Figure: layers of a DBMS, top to bottom, with the slide’s annotations. A relational algebra query enters at the top.]

• Query Optimization: search for a cheap plan
• Relational Operator Algs.: join algorithms, …
• Files and Access Methods: heap, index, …
• Buffer Management
• Disk Space Management
• DB: how a disk works

Buffer management and disk space management are operating-system-level issues (may be handled by the DBMS or by the O/S).


Components of a Disk

• platters are always spinning (say, 120 rps).
• one head reads/writes at any one time.
• to read a record:
  – position arm (seek)
  – engage head
  – wait for data to spin by
  – read (transfer data)

[Figure: disk assembly, with labels: platters, spindle, disk head, arm assembly, arm movement, tracks, sector]


More terminology

• Each track is made up of fixed-size sectors.
• Page size is a multiple of sector size.
• All the tracks that you can reach from one position of the arm are called a cylinder (imaginary!).

[Figure: the same disk assembly, with labels: platters, spindle, disk head, arm assembly, arm movement, tracks, sector]


Cost of Accessing Data on Disk

• Time to access (read/write) a disk block:
  – seek time (moving arms to position disk head on track)
  – rotational delay (waiting for block to rotate under head)
  – transfer time (actually moving data to/from disk surface)
• Key to lower I/O cost: reduce seek/rotation delays! (you have to wait for the transfer time, no matter what)
• Query cost is often measured in the number of page I/Os, often simplified to assume each page I/O costs the same


Disk Drive Statistics

• Sector size: 512 bytes
• Seek time
  – Average: 4-10 ms
  – Track to track: 0.6-1.0 ms
• Rotation: about 6 to 11 ms per revolution (rotational speed 10,000 RPM to 5,400 RPM), so the average rotational delay, half a revolution, is about 3 to 5.6 ms
• Transfer time (sustained data rate): 0.3-0.1 ms per 8K page, or 25-75 MB/second
• Density: 30 GB/square inch
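The three cost components can be combined into a rough per-page estimate. The sketch below is only illustrative: it assumes the average rotational delay is half of one revolution and treats 1 MB as 10**6 bytes, and the function names are made up for this example.

```python
def avg_rotational_delay_ms(rpm):
    """Average rotational delay in ms, assumed to be half of one revolution."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


def transfer_time_ms(page_bytes, mb_per_second):
    """Time in ms to move one page at a sustained rate (1 MB taken as 10**6 bytes)."""
    return page_bytes / (mb_per_second * 1_000_000.0) * 1000.0


def access_time_ms(seek_ms, rpm, page_bytes, mb_per_second):
    """Seek time + average rotational delay + transfer time for one random page read."""
    return (seek_ms
            + avg_rotational_delay_ms(rpm)
            + transfer_time_ms(page_bytes, mb_per_second))


if __name__ == "__main__":
    print(avg_rotational_delay_ms(10_000))       # half a revolution at 10,000 RPM
    print(transfer_time_ms(8192, 25))            # one 8K page at 25 MB/second
    print(access_time_ms(7.0, 7_200, 8192, 50))  # hypothetical mid-range drive
```

With numbers in these ranges, seek and rotation dominate the transfer time by more than an order of magnitude, which is why the slides stress reducing seek and rotational delays.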


Time to access a disk page

  Storage level          Clock ticks   Analogy: where the data is   Analogy: time to get it
  Registers              1             My head                      1 min
  On-chip cache          2             This room
  On-board cache         10            This campus                  10 min
  Memory                 100           Sacramento                   1.5 hr
  Disk                   10^6          Pluto                        2 years
  Tape / optical robot   10^9          Andromeda                    2,000 years

Figure from AlphaSort paper; see research.microsoft.com/~Gray


10,000 times slower than memory

• how much time is 10,000 seconds?
• contrast 1 second (to pick up a piece of paper) vs. driving to Seattle to get it


Block (page) size vs. record size

• Page: smallest unit of transfer supported by OS
• Block: multiple of page, smallest unit of transfer supported by an application or a disk volume.
• Block and page are often used interchangeably.
• “typical” record size … maybe a few hundred up to 1,000 bytes
• “typical” page size 4K, 8K
• When would we choose block size to be larger?
• When would we choose block size to be smaller?


How to minimize the cost of Disk I/Os

• If possible, store a file to be read sequentially as follows:
  – Consecutive pages on same track, followed by
  – Consecutive tracks on same cylinder, followed by
  – Consecutive cylinders adjacent to each other
  – The first two incur no seek time or rotational delay; the seek for the third is only one track.
• Remember: disk access time = seek time + rotational latency + transfer time. What is saved with this storage pattern?
• In the book, they assume that all I/O operations take the same amount of time. This is a simplification! Real query optimizers would consider sequential vs. random disk reads, because sequential reads are much faster.


Index for a File

• An Index is a data structure that speeds up selections on the search key field(s)

• An index transforms a search key k into a data entry k*.

• Given k*, you can get to the record(s) with the search key k in one I/O.


Real-life Indexes

• What is the search key? What is the data entry?
  – Library Catalog
  – Clerk in a video store
  – Terminal in a book store


Database Indexes

• Given Emp(ID, name, age, address)
• What are the possible search keys?
• What data structure might be used for the index?
• What could be the format of the data entry k*?

Nota Bene:
• You can build an index on any subset of the fields of a table.
• You can build more than one index for the same table.
• “Search key” is not the same as a key for the table. Values of a “search key” need not be unique.
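One possible answer to these questions, as a minimal sketch: the index is a dictionary from search-key value to a list of record ids (rids), so each data entry k* has the form <k, list of rids>. The Emp rows, the use of list positions as rids, and the name `build_index` are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical Emp(ID, name, age, address) records.
# The position in the list plays the role of a record id (rid).
emp = [
    (1, "Smith", 44, "12 Oak St"),
    (2, "Jones", 40, "9 Elm St"),
    (3, "Tracy", 44, "3 Pine St"),
    (4, "Ashby", 25, "7 Ash St"),
]


def build_index(records, field_positions):
    """Map each search-key value to the list of rids that hold it."""
    index = defaultdict(list)
    for rid, record in enumerate(records):
        key = tuple(record[i] for i in field_positions)
        index[key].append(rid)
    return index


# Two indexes on the same table, on different subsets of its fields.
age_index = build_index(emp, [2])          # search key <age>: values are not unique
name_age_index = build_index(emp, [1, 2])  # composite search key <name, age>

print(age_index[(44,)])                          # rids of employees aged 44
print([emp[rid] for rid in age_index[(44,)]])    # the records themselves
```

Note that `age` is a search key here without being a key of the table: two records share the value 44.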


Most Indexes are Tree Structured

• Tree-structured indexes support range searches and equality searches.

– ISAM: static structure (old technology)…index is built just once, when the file is loaded. Uses overflow areas, so the tree can become very unbalanced.

– B+ tree: dynamic – index is adjusted as records are inserted and deleted in the file. Index remains balanced.


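B+ Tree Indexes

• Leaf pages contain data entries, and are chained (prev & next).
• Non-leaf pages have index entries; only used to direct searches.

Layout of a non-leaf page (an index entry is a key K_i together with a pointer P_i):

  P_0 | K_1 | P_1 | K_2 | P_2 | … | K_m | P_m

[Figure: non-leaf pages at the top; leaf pages at the bottom, sorted by search key]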


Example B+ Tree

• Find 28*? 29*? All > 15* and < 30*
• Insert/delete: Find data entry in leaf, then change it. Need to adjust parent sometimes.
  – And change sometimes bubbles up the tree
  – This keeps the tree balanced: each data retrieval takes the same number of I/Os and each page is always at least half full.

[Figure: example B+ tree]

  Root:            17
  Non-leaf pages:  5, 13  (entries <= 17)        27, 30  (entries > 17)
  Leaf pages:      2* 3* | 5* 7* 8* | 14* 16* | 22* 24* | 27* 29* | 33* 34* 38* 39*

Note how data entries in leaf level are sorted.
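The searches asked about on this slide can be traced with a small sketch. It hard-codes the tree from the figure (root key 17; non-leaf pages with keys 5, 13 and 27, 30) and assumes that a search key equal to an index key goes to the right child, which matches where 5* and 27* sit in the leaves. Inserts, deletes and page I/O are not modelled.

```python
from bisect import bisect_right

# Leaf pages hold data entries (just the key values here), in sorted order.
leaves = [[2, 3], [5, 7, 8], [14, 16], [22, 24], [27, 29], [33, 34, 38, 39]]

# Non-leaf pages are (keys, children); a child is another page or a leaf number.
left = ([5, 13], [0, 1, 2])
right = ([27, 30], [3, 4, 5])
root = ([17], [left, right])


def find_leaf(node, key):
    """Follow index entries from the root down to the leaf that could hold key."""
    while not isinstance(node, int):
        keys, children = node
        node = children[bisect_right(keys, key)]
    return node


def lookup(key):
    """Equality search: one root-to-leaf path, then a look inside that leaf."""
    return key in leaves[find_leaf(root, key)]


def range_search(low, high):
    """All data entries k with low < k < high, walking the leaf chain to the right."""
    result = []
    for leaf in leaves[find_leaf(root, low):]:
        for k in leaf:
            if k >= high:
                return result
            if k > low:
                result.append(k)
    return result


print(lookup(28), lookup(29))
print(range_search(15, 30))
```

Every lookup visits the same number of levels, which is the balance property the slide describes; the range search only descends once and then follows the leaf chain.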


Hash-Based Indexes

• Good for equality selections.
• Index is a collection of buckets.
  – Bucket = primary page plus zero or more overflow pages.
  – Buckets contain data entries.
• Hashing function h: h(r) = bucket in which (data entry for) record r belongs. h looks at the search key fields of r.
  – No need for “index entries” in this scheme.
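As a rough sketch of the bucket idea: the hash function below (key modulo a made-up bucket count) and the sample records are assumptions for illustration, and overflow pages are not modelled.

```python
NUM_BUCKETS = 4  # hypothetical bucket count


def h(search_key):
    """Toy hash function on the search key: the key modulo the bucket count."""
    return search_key % NUM_BUCKETS


def build_hash_index(records, key_position):
    """Each bucket holds data entries of the form (search key, rid)."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for rid, record in enumerate(records):
        key = record[key_position]
        buckets[h(key)].append((key, rid))
    return buckets


def equality_search(buckets, key):
    """Only one bucket is inspected, which is why hashing suits equality selections."""
    return [rid for k, rid in buckets[h(key)] if k == key]


records = [("Smith", 44, 3000), ("Jones", 40, 6003),
           ("Tracy", 44, 5004), ("Ashby", 25, 3000)]
age_index = build_hash_index(records, 1)
print(equality_search(age_index, 44))
```

A range query such as age > 30 gets no help from this structure, because neighbouring key values land in unrelated buckets.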


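Hash-based Index Examples

[Figure: a file of employee records (name, age, sal) hashed on age by h1, plus a second index hashed on sal by h2]

File hashed on age (h1):
  h1(age)=00: Smith,44,3000 | Jones,40,6003 | Tracy,44,5004
  h1(age)=01: Ashby,25,3000 | Basu,33,4003 | Sagar,29,2007
  h1(age)=10: Cass,50,5004 | Kery,22,6003

Index on sal (h2), whose data entries point to records in the file:
  h2(sal)=00: 3000, 3000, 5004, 5004
  h2(sal)=11: 4003, 2007, 6003, 6003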


Costs of an Index

• If you define an index in your database, you will incur three costs
  – Space to store the index
  – Updates to the search key will be slower
  – The optimizer will take longer because it has more choices
• There is one advantage to having an index
  – Some queries run faster (better be sure about this)


Clustered Index: Records are sorted based on search key for the index

Search key is “Name”. Records are sorted by “Name” in the file. Each page contains 3 records.

Index (one entry per record): Ashby, Basu, Bristow, Cass, Daniels, Jones, Smith, Tracy

Data File:
  Page 1: Ashby, 25, 3000 | Basu, 33, 4003 | Bristow, 30, 2007
  Page 2: Cass, 50, 5004 | Daniels, 22, 6003 | Jones, 40, 6003
  Page 3: Smith, 44, 3000 | Tracy, 44, 5004


Another kind of Clustered Index (Sparse)

Search key is “Name”. Records are sorted by “Name” in the file. Each page contains 3 records.

Index (one entry per page): Ashby, Cass, Smith

Data File:
  Page 1: Ashby, 25, 3000 | Basu, 33, 4003 | Bristow, 30, 2007
  Page 2: Cass, 50, 5004 | Daniels, 22, 6003 | Jones, 40, 6003
  Page 3: Smith, 44, 3000 | Tracy, 44, 5004


Unclustered Index: Records NOT Sorted on Search Key

Search key is “Age”.

Index: 22, 25, 30, 33, 40, 44, 44, 50

Data File:
  Page 1: Ashby, 25, 3000 | Basu, 44, 4003 | Bristow, 30, 2007
  Page 2: Cass, 50, 5004 | Daniels, 22, 6003 | Jones, 40, 6003
  Page 3: Smith, 44, 3000 | Tracy, 33, 5004


Index Classification

• Clustered, sparse indexes are smaller; they work well for range searches and sorting.
• But… some useful optimizations are based on dense indexes.
• Note: one file can have at most one single-attribute clustered index; all of the additional single-attribute indices must be unclustered.

                sparse   dense
  clustered     YES      YES
  unclustered   NO!      YES


Clustered vs. Non-clustered Index

• Consider a telephone book as an index to telephone numbers
  – What is the primary search key?
  – Is it a clustered or unclustered index?
  – Is it a dense or sparse index?
  – Can you find a range of entries using this index?
• Imagine we have an unclustered index for a phone book based on street address
  – Can you efficiently find a range of entries (a range of addresses)?


Using Composite Search Keys

Which indexes can you use for each of these queries?
(use = the answer is one contiguous block of data in the data entry level)

• age = 12
• age = 12 and sal = 20
• age = 12 and sal > 10
• age > 12 and sal > 30

Data records, sorted by name:

  name  age  sal
  bob   12   10
  cal   11   80
  joe   12   20
  sue   13   75

Data entries in each index, sorted by search key:

  <age, sal>:  11,80  12,10  12,20  13,75
  <sal, age>:  10,12  20,12  75,13  80,11
  <age>:       11  12  12  13
  <sal>:       10  20  75  80


Important refinement for unclustered indexes

1. Find qualifying data entries.
2. Sort the rid’s of the data records to be retrieved.
3. Fetch rids in order.

This ensures that each data page is looked at just once (though # of such pages likely to be higher than with clustering).

Challenges: What order do we sort the rids into? Will the file system allow this?
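The three steps can be sketched as follows, assuming a rid is a (page number, slot) pair so that sorting rids groups them by page. The data file, the rids, and the `read_page` callback are hypothetical stand-ins.

```python
from itertools import groupby


def fetch_in_rid_order(rids, read_page):
    """Sort rids (page number, slot) so that each data page is read exactly once."""
    records, pages_read = [], 0
    for page_no, group in groupby(sorted(rids), key=lambda rid: rid[0]):
        page = read_page(page_no)      # one I/O per distinct page
        pages_read += 1
        records.extend(page[slot] for _, slot in group)
    return records, pages_read


# Hypothetical data file: page number -> list of records on that page.
data_file = {0: ["a0", "a1", "a2"], 1: ["b0", "b1", "b2"], 2: ["c0", "c1", "c2"]}

# Rids in the order the index returned them (index order, not page order).
rids = [(2, 1), (0, 0), (2, 0), (0, 2)]

records, pages_read = fetch_in_rid_order(rids, data_file.__getitem__)
print(records, pages_read)
```

Fetching in the original rid order would touch page 2, page 0, page 2, page 0: four page reads instead of two.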


Join Algorithms – an Introduction

R ⋈ S is very common! And R × S followed by a selection is inefficient. So we process joins (rather than cross product) whenever possible. Lots of effort invested in join algorithms.

Assume: M pages in R, pR tuples per page, N pages in S, pS tuples per page.

In our examples, R is Reserves and S is Sailors. Our algorithms work for any equijoins.

  SELECT *
  FROM Reserves R1, Sailors S1
  WHERE R1.sid = S1.sid


Simple Nested Loops Join (very naive)

• For each tuple in the outer relation R, we scan the entire inner relation S, tuple by tuple.
  – Cost: M + (pR * M) * N = 1000 + 100*1000*500 I/Os
  – 50,001,000 I/Os ≈ 500,010 seconds ≈ 6 days

Join on ith column of R and jth column of S:

  foreach tuple r in R do
    foreach tuple s in S do
      if ri == sj then add <r, s> to result

We assume approximately 100 I/Os per second. M = 1000 pages in R, pR = 100 tuples per page, N = 500 pages in S, pS = 80 tuples per page.
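The pseudocode and the cost arithmetic can be checked with a short sketch. The sample rows are made up, tuples are plain Python tuples, and the cost function simply encodes the formula M + (pR * M) * N from the slide.

```python
def simple_nested_loops_join(r, s, i, j):
    """Compare every tuple of R with every tuple of S on r[i] == s[j]."""
    result = []
    for r_tuple in r:
        for s_tuple in s:
            if r_tuple[i] == s_tuple[j]:
                result.append(r_tuple + s_tuple)
    return result


def simple_nlj_cost(m, p_r, n):
    """Page I/Os: read R once, then scan all N pages of S once per R tuple."""
    return m + (p_r * m) * n


reserves = [(28, 103), (31, 101), (58, 103)]                # (sid, bid), hypothetical rows
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber")]   # (sid, sname)
print(simple_nested_loops_join(reserves, sailors, 0, 0))

ios = simple_nlj_cost(1000, 100, 500)
seconds = ios / 100                       # at roughly 100 I/Os per second
print(ios, seconds, seconds / 86_400)     # I/Os, seconds, days
```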

Simple Nested Loops Join (yes, this is dumb)

[Animation, slides 30-39. Table 1 on disk has two pages, with join values (2, 12, 6) and (1, 5, 27). Table 2 on disk has three pages, with join values (2, 13), (12, 27) and (1, 5). The memory buffers hold one page of each table at a time. For Table 1 tuple 2, the Table 2 pages are read in turn: page (2, 13) gives a match, so <2 …, … 2> goes to the query answer; pages (12, 27) and (1, 5) give no match and are discarded. The scan of Table 2 then starts over for tuple 12: page (2, 13) gives no match, page (12, 27) gives a match, so <12 …, … 12> joins the query answer. And so forth …]

Page-oriented Nested Loops Join

For each page of R, get each page of S, write out matching pairs of tuples <r, s>.

• Cost: M + M*N = 1000 + 1000*500 = 501,000 (R outer)
• Cost: N + N*M = 500 + 500*1000 = 500,500 (S outer)
• Therefore, typically use the smaller relation as the outer relation.
• 500,000 I/Os ≈ 1.4 hours

  for each page of tuples r in R do
    for each page of tuples s in S do
      (match all combinations in memory)
      if ri == sj then add <r, s> to result
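A small sketch that counts page reads while joining page by page. Each relation is modelled as a list of pages, each page a list of tuples; the join values are the ones shown for Table 1 and Table 2 in the slides, with payload columns left out.

```python
def page_nested_loops_join(r_pages, s_pages, i, j):
    """Join page by page and count page reads; each list element is one page of tuples."""
    result, page_reads = [], 0
    for r_page in r_pages:
        page_reads += 1                      # read one page of the outer relation
        for s_page in s_pages:
            page_reads += 1                  # read one page of the inner relation
            for r_tuple in r_page:           # match all combinations in memory
                for s_tuple in s_page:
                    if r_tuple[i] == s_tuple[j]:
                        result.append((r_tuple, s_tuple))
    return result, page_reads


table1 = [[(2,), (12,), (6,)], [(1,), (5,), (27,)]]        # M = 2 pages
table2 = [[(2,), (13,)], [(12,), (27,)], [(1,), (5,)]]     # N = 3 pages

matches, page_reads = page_nested_loops_join(table1, table2, 0, 0)
print(sorted(r[0] for r, _ in matches), page_reads)
```

The page-read count equals M + M*N (here 2 + 2*3), independent of how many tuples each page holds; that is the whole gain over the tuple-at-a-time version.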

[Figure, slides 41-42: Page-Oriented Nested Loops Join. Once a page of Table 1 (2, 12, 6) and a page of Table 2 (2, 13) are in the memory buffers, check every combination from one page to the other page. Then read the next page of Table 2 (12, 27) and do the same thing: compare all combinations in memory between these two pages.]

The best loops-based join algorithm: Block Nested-Loops Join

• Algorithm:
  – One page is assigned to be the output buffer
  – One page assigned to input from S, B-2 pages assigned to input from R

  Until all of R has been read {
    Read in B-2 pages of R
    For each page in S {
      Read in the single S page
      Check pairs of tuples in memory and output if they match
    }
  }

• Cost: M + (M/(B-2))*N. For B=35, cost is 1000 + 1000*500/33 ≈ 16,000 I/Os ≈ 3 minutes

[Figure: R on disk with pages (2, 12, 6) and (1, 5, 27); S on disk with pages (2, 13), (12, 27) and (1, 5). The B pages of memory buffer hold both pages of R and one page of S, (2, 13).]
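A sketch of the same loop structure with page-read counting. Relations are lists of pages as in the figure; the output buffer is not modelled. The cost function rounds M/(B-2) up to a whole number of blocks, an assumption that gives 16,500 for the slide's parameters rather than the rounded 16,000.

```python
from math import ceil


def block_nested_loops_join(r_pages, s_pages, b, i, j):
    """Block nested loops with b buffer pages: b-2 for R, 1 for S, 1 for output."""
    block_size = b - 2
    result, page_reads = [], 0
    for start in range(0, len(r_pages), block_size):
        block = r_pages[start:start + block_size]      # read in B-2 pages of R
        page_reads += len(block)
        r_tuples = [t for page in block for t in page]
        for s_page in s_pages:                         # one pass over S per block
            page_reads += 1
            for s_tuple in s_page:
                for r_tuple in r_tuples:
                    if r_tuple[i] == s_tuple[j]:
                        result.append((r_tuple, s_tuple))
    return result, page_reads


def block_nlj_cost(m, n, b):
    """M + ceil(M / (B-2)) * N page I/Os."""
    return m + ceil(m / (b - 2)) * n


r_pages = [[(2,), (12,), (6,)], [(1,), (5,), (27,)]]
s_pages = [[(2,), (13,)], [(12,), (27,)], [(1,), (5,)]]
pairs, page_reads = block_nested_loops_join(r_pages, s_pages, 4, 0, 0)
print(sorted(r[0] for r, _ in pairs), page_reads)
print(block_nlj_cost(1000, 500, 35))
```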


Index Nested Loops Join

If there is an index on the search key sj, then we can use the index on the inner table to get the matching tuples!

  foreach tuple r in R do
    foreach tuple s in S where ri == sj do
      add <r, s> to result

Cost: M + ((M*pR) * cost of finding matching S tuples), where M and pR describe the outer relation:
• Sailors as outer, Reserves as inner: 500 + (500*80*4) = 160,500 ≈ 1/2 hour
• Reserves as outer, Sailors as inner: 1000 + (1000*100*3) = 301,000 ≈ 1 hour

For each R tuple, cost of probing S index is about 2-4 for B+ tree. These could be smaller if top levels of the B+ tree are in memory.
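A minimal sketch of the idea, with a Python dictionary standing in for the B+ tree or hash index on the inner table (an assumption; a real index lives on disk and each probe costs a few I/Os). The cost function encodes the formula above with the probe cost as a parameter.

```python
from collections import defaultdict


def index_nested_loops_join(outer, inner, i, j):
    """Probe an index on the inner relation's join column once per outer tuple."""
    index = defaultdict(list)              # stand-in for an index on inner[j]
    for s_tuple in inner:
        index[s_tuple[j]].append(s_tuple)
    return [(r_tuple, s_tuple)
            for r_tuple in outer
            for s_tuple in index.get(r_tuple[i], [])]


def index_nlj_cost(outer_pages, outer_tuples_per_page, probe_cost):
    """Read the outer once, plus one index probe (probe_cost I/Os) per outer tuple."""
    return outer_pages + outer_pages * outer_tuples_per_page * probe_cost


reserves = [(28, 103), (58, 103)]            # (sid, bid), hypothetical rows
sailors = [(22, "dustin"), (28, "yuppy")]    # (sid, sname)
print(index_nested_loops_join(reserves, sailors, 0, 0))

print(index_nlj_cost(500, 80, 4))     # Sailors outer, Reserves inner
print(index_nlj_cost(1000, 100, 3))   # Reserves outer, Sailors inner
```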


External Sorting

• Various relational operator algorithms require sorting a table
• Issue: table won’t fit in memory
• Approach: Use merge-sort where sorted runs can be read sequentially into memory

[Figure: sorted runs being merged. Runs shown: (78 72 68 55 54 54 40), (92 88 66 51 43) and (23 21 20 18 9 7), with the values 36 and 29 shown separately.]


N-Way External Sorting

• On the initial pass, read and write runs a memory full at a time
• Do an n-way merge rather than a 2-way merge
• Each pass does 2*M I/Os (where M is number of pages in the table)
• Number of passes depends on how many pages of memory are devoted to sorting
  – #Passes = 1 + Ceiling(log_(B-1)(M/B)): the initial pass, plus the merge passes
  – Can sort 100 million pages in 4 passes with 129 pages of memory space
• Can sort M pages using B memory pages in 2 passes if sqrt(M) <= B (this condition is satisfied often)
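The pass count can be computed by simulating the run counts instead of evaluating the logarithm. This sketch assumes the usual scheme: the initial pass produces ceil(M/B) sorted runs, and each later pass merges up to B-1 runs at a time (one buffer page is kept for output).

```python
def ceil_div(a, b):
    """Integer ceiling of a / b."""
    return -(-a // b)


def external_sort_passes(m, b):
    """Passes needed to sort m pages with b buffer pages."""
    runs = ceil_div(m, b)        # initial pass: runs of b pages each
    passes = 1
    while runs > 1:
        runs = ceil_div(runs, b - 1)   # each merge pass is a (b-1)-way merge
        passes += 1
    return passes


def external_sort_cost(m, b):
    """Each pass reads and writes every page: 2*M I/Os per pass."""
    return 2 * m * external_sort_passes(m, b)


print(external_sort_passes(100_000_000, 129))
print(external_sort_passes(1000, 35), external_sort_cost(1000, 35))
```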


Sort-Merge Join

1. Sort R on join attribute
2. Sort S on join attribute
3. Merge R and S
  – Advance scan of R until current R-tuple >= current S tuple, then advance scan of S until current S-tuple >= current R tuple; do this until current R tuple = current S tuple.
  – At this point, all R tuples with same value in Ri (current R group) and all S tuples with same value in Sj (current S group) match; output <r, s> for all pairs of such tuples.
  – Then resume scanning R and S.

R is scanned once; each S group is scanned once per matching R tuple. Depends on the size of the group! If the matching group is small, matching is in memory.

Best case: cost is: Cost to sort R + Cost to sort S + (M+N), assuming all matches fit in memory

Worst case: R and S all have the same value, thus the matching group is the entire relation, for R and for S. Cost is: Cost to sort R + Cost to sort S + (M*N)
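The merge step can be sketched in memory as follows. Both inputs are sorted with Python's built-in sort rather than an external sort, and the rows are (sid, bid) and (sid, sname) projections of the Reserves and Sailors tuples used in the lecture's example; both simplifications are assumptions of this sketch.

```python
def sort_merge_join(r, s, i, j):
    """Sort both inputs on the join column, then merge, pairing up equal-valued groups."""
    r = sorted(r, key=lambda t: t[i])
    s = sorted(s, key=lambda t: t[j])
    result, a, b = [], 0, 0
    while a < len(r) and b < len(s):
        if r[a][i] < s[b][j]:
            a += 1                        # advance scan of R
        elif r[a][i] > s[b][j]:
            b += 1                        # advance scan of S
        else:
            value = r[a][i]
            a_end = a
            while a_end < len(r) and r[a_end][i] == value:
                a_end += 1                # end of current R group
            b_end = b
            while b_end < len(s) and s[b_end][j] == value:
                b_end += 1                # end of current S group
            for r_tuple in r[a:a_end]:    # output all pairs from the two groups
                for s_tuple in s[b:b_end]:
                    result.append((r_tuple, s_tuple))
            a, b = a_end, b_end           # resume scanning after both groups
    return result


reserves = [(28, 103), (28, 103), (31, 101), (31, 102), (31, 101), (58, 103)]
sailors = [(22, "dustin"), (28, "yuppy"), (31, "lubber"), (44, "guppy"), (58, "rusty")]
joined = sort_merge_join(reserves, sailors, 0, 0)
print([r[0] for r, _ in joined])
```

If every tuple carried the same join value, the two inner loops would pair every R tuple with every S tuple, which is the worst case described above.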


Example of Sort-Merge Join

Sailors:

  sid  sname   rating  age
  22   dustin  7       45.0
  28   yuppy   9       35.0
  31   lubber  8       55.5
  44   guppy   5       35.0
  58   rusty   10      35.0

Reserves:

  sid  bid  day       rname
  28   103  12/4/96   guppy
  28   103  11/3/96   yuppy
  31   101  10/10/96  dustin
  31   102  10/12/96  lubber
  31   101  10/11/96  lubber
  58   103  11/12/96  dustin

Cost: (cost to sort R) + (cost to sort S) + (M+N) (in memory matches)
With 35 buffers, Reserves and Sailors can each be sorted in 2 passes.
Cost is: 4 * 1000 + 4 * 500 + 1000 + 500 = 7500
(we multiply by 4 because there are 2 passes, and we read and write each page, each pass)


Cost of Sort-Merge

(cost to sort R) + (cost to sort S) + (cost to merge)
• Cost to sort M pages in 2 passes = 4*M. Why?
• Cost to merge is typically M+N. Why?

If both R and S can be sorted in 2 passes, then cost is: 4M + 4N + (M+N) = 5*(M+N)

There is an optimization (page 462 in our text) that improves this to 3*(M+N)

Thus the cost of joining Sailors and Reserves, assuming there are enough buffers to sort each table in two passes, is 5*(M+N) = 7500 page I/Os


Hash Join

• Simple case: S fits in main memory
  – Build an in-memory hash index for S
  – Proceed as for index nested-loops join
• Harder case: neither R nor S fits in memory
  – Divide them both in the same way (1 pass) so that each partition of S fits in memory
  – Do the simple case on each pair of matching partitions


Partitioning

Both tables are divided by the same ranges of the join value:

                        Table 1       Table 2
  Partition 1: 1-10     2, 4, 7       3, 4, 8, 10
  Partition 2: 11-20    11, 13, 19    13, 13
  Partition 3: 21-30    24, 24, 27    27, 29, 29


Use Hash Function Instead of Ranges

• No guarantee we can find ranges of values that will divide Table 2 into roughly equal-sized partitions
• Apply hash function h to join value
  – Partition 1: h(val) = 1
  – Partition 2: h(val) = 2
  – Partition 3: h(val) = 3
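Both cases of the hash join can be sketched in a few lines. Partitions are in-memory lists here rather than files on disk, and the hash function (Python's built-in `hash` modulo the partition count) is an assumption made for illustration; the join values are those of Table 1 and Table 2 on the partitioning slide.

```python
from collections import defaultdict


def in_memory_hash_join(r, s, i, j):
    """Simple case: build a hash table on S, then probe it with each R tuple."""
    table = defaultdict(list)
    for s_tuple in s:
        table[s_tuple[j]].append(s_tuple)
    return [(r_tuple, s_tuple)
            for r_tuple in r
            for s_tuple in table.get(r_tuple[i], [])]


def partition(tuples, column, k):
    """One pass: route each tuple to partition h(value), with h = hash(value) % k."""
    parts = [[] for _ in range(k)]
    for t in tuples:
        parts[hash(t[column]) % k].append(t)
    return parts


def partitioned_hash_join(r, s, i, j, k=3):
    """Harder case: partition both inputs the same way, then join matching partitions."""
    result = []
    for r_part, s_part in zip(partition(r, i, k), partition(s, j, k)):
        result.extend(in_memory_hash_join(r_part, s_part, i, j))
    return result


table1 = [(v,) for v in (2, 4, 7, 11, 13, 19, 24, 24, 27)]
table2 = [(v,) for v in (3, 4, 8, 10, 13, 13, 27, 29, 29)]
joined = partitioned_hash_join(table1, table2, 0, 0)
print(sorted(r[0] for r, _ in joined))
```

Because both tables are partitioned by the same function, tuples with equal join values always land in partitions with the same number, so only matching partition pairs need to be joined.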


Hash Join Cost

• Cost to partition R: 2M
• Cost to partition S: 2N
  – Can do this in one pass if sqrt(M) <= B
• Cost to join partitions: M+N
• Total: 3*(M+N), same as sort-merge with the optimization.


Comparison of Approximate Costs of Joining R and S, assuming 100 I/Os/second

  Algorithm              I/Os         Time
  Simple Nested Loops    50,000,000   6 days
  Page Nested Loops      500,000      1.4 hours
  Block Nested Loops*    16,000       3 minutes
  Index Nested Loops     160,500      ½ hour
  Sort-Merge**           4,500        1 minute
  Hash join**            4,500        1 minute

*Assuming 35 buffer pages

**Assuming appropriate files, M, satisfy sqrt(M) < pages of buffer
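The table can be reproduced from the cost formulas in the lecture. This sketch uses the lecture's parameters and unrounded formulas, so some values differ slightly from the rounded table entries; in particular, rounding M/(B-2) up to whole blocks (an assumption here) gives 16,500 for block nested loops.

```python
from math import ceil

M, P_R = 1000, 100      # Reserves: pages, tuples per page
N, P_S = 500, 80        # Sailors: pages, tuples per page
B = 35                  # buffer pages
IOS_PER_SECOND = 100

costs = {
    "Simple Nested Loops": M + P_R * M * N,
    "Page Nested Loops": M + M * N,
    "Block Nested Loops": M + ceil(M / (B - 2)) * N,
    "Index Nested Loops": N + N * P_S * 4,   # Sailors outer, 4 I/Os per probe
    "Sort-Merge": 3 * (M + N),               # with the optimization
    "Hash Join": 3 * (M + N),
}

for name, ios in costs.items():
    print(f"{name:20s} {ios:>12,d} I/Os  {ios / IOS_PER_SECOND:>12,.0f} s")
```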


Summary – Algorithms for Relational Algebra Operators

• A virtue of relational DBMSs: queries are composed of a few basic operators; the implementation of these operators can be carefully tuned (and it is important to do this!).

• Many alternative implementation techniques for each operator; no universally superior technique for most operators.

• Must consider available alternatives for each operation in a query and choose best one based on system statistics, etc. This is part of the broader task of optimizing a query composed of several ops.


Query Optimization

Translate SQL query into a query tree
(operators: relational algebra plus a few other ones)

Generate other, equivalent query trees
(e.g., using relational algebra equivalences)

For each possible query tree:
  select an algorithm for each operator (producing a query plan)
  estimate the cost of the plan

Choose the plan with lowest cost - of the plans considered (which is not necessarily all possible plans)


Initial Query Tree - Equivalent to SQL
(without any algorithms selected)

SQL Query:

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

Relational Algebra Tree (root first, inputs indented):

    π sname
      σ bid=100 ∧ rating > 5
        ⋈ sid=sid
          Reserves
          Sailors


Relational Algebra Equivalences

• σc1∧… ∧ cn(R) ≡ σc1( … σcn(R))
• This symbol (≡) means equivalence.
• So you can replace σc1( … σcn(R)) with σc1∧… ∧ cn(R)
• And you can replace σc1∧… ∧ cn(R) with σc1( … σcn(R))
• If you have several conditions connected by “AND” in a select operator, then you can apply them one at a time.


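Example: σc1∧… ∧ cn(R) ≡ σc1( … σcn(R))

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

[Figure: two equivalent query trees over Reserves and Sailors joined on sid=sid, each topped by a projection on sname. One applies the single selection bid=100 ∧ rating > 5 above the join; the other applies the selections bid=100 and rating > 5 one at a time.]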


Relational Algebra Equivalence

σc(R⋈S) ≡ σc(R)⋈S

Given a select operation following a join, if the select condition applies ONLY to one of the tables (R, in this example), then you can introduce a new select operator (before the join operator) with that condition.

This is called pushing down a SELECT.
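A small check of this equivalence on toy data, written as a sketch: the rows, the helper names `select` and `join`, and the condition `bid_is_100` are all made up for illustration. The condition uses only an attribute of Reserves, which plays the role of R.

```python
# Toy relations; every row and value here is made up.
reserves = [
    {"sid": 1, "bid": 100},
    {"sid": 2, "bid": 101},
    {"sid": 3, "bid": 100},
]
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 7},
    {"sid": 2, "sname": "Bob", "rating": 9},
    {"sid": 3, "sname": "Cy", "rating": 3},
]


def select(rows, cond):
    """Relational selection: keep the rows that satisfy cond."""
    return [r for r in rows if cond(r)]


def join(left, right, key):
    """Nested-loops equi-join on a shared column."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


def bid_is_100(row):
    """Condition c: refers only to an attribute of Reserves (R)."""
    return row["bid"] == 100


select_after_join = select(join(reserves, sailors, "sid"), bid_is_100)
select_before_join = join(select(reserves, bid_is_100), sailors, "sid")
```

Both expressions produce the same rows, but the pushed-down version feeds only two Reserves rows into the join instead of three.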


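Example: σc(R⋈S) ≡ σc(R)⋈S

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

[Figure: the query tree before and after the rewrite. Before: the selections bid=100 and rating > 5 are applied above the join (sid=sid) of Reserves and Sailors, under the projection on sname; a callout on rating > 5 reads “This applies only to the Sailors table!” After: rating > 5 has been pushed below the join, onto Sailors.]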


Example: σc(R⋈S) ≡ σc(R)⋈S (cont.)

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

[Figure: query tree with the selection rating > 5 applied to Sailors below the join (sid=sid), with bid=100 and the projection on sname also shown.]

What are the advantages of “pushing” a select past a join operator?

What are the disadvantages of “pushing” a select past a join operator?


Relational Algebra Equivalences

• Selections:
  σc1∧… ∧ cn(R) ≡ σc1( … σcn(R))   (Selects Cascade)
  σc1(σc2(R)) ≡ σc2(σc1(R))   (Selects Commute)

• Projections:
  πa (R) ≡ πa (πa1 (… πan(R)))   (If each ai contains a. Only last project matters.)

• Joins:
  R⋈(S⋈T) ≡ (R⋈S)⋈T   (Joins are Associative)
  R⋈S ≡ S⋈R   (Joins Commute)

Try to prove that: R⋈(S⋈T) ≡ (T⋈R)⋈S


Some Rel. Algebra Equivalences have Constraints

σc1∧… ∧ cn(R) ≡ σc1( … σcn(R))
Is this always true? No matter what the conditions are … no matter what table R we use?

What about this one?
πa(σc(R)) ≡ σc(πa(R))
Is this always true? No matter what the condition is? No matter what project list is used?


Equivalences with 2 or More Operations

• Projection commutes with a selection
  PROVIDED that the selection uses attributes that are retained by the projection (c’s attrs ⊆ a):
  πa(σc(R)) ≡ σc(πa(R))

• A cross-product can be converted to a join
  PROVIDED that the selection condition involves attributes of the tables involved in the cross-product:
  σc(R × S) ≡ R⋈cS

• A selection can be pushed past a cross product (or join)
  PROVIDED the select uses just attributes of R:
  σc(R⋈S) ≡ σc(R)⋈S
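The first proviso can be seen on a toy example. The sketch below uses made-up Sailors rows and hypothetical helpers `project` and `select`; it shows that projection and selection commute when the projection keeps the attribute the condition needs, and that the selection cannot even be evaluated when the projection drops it.

```python
# Made-up Sailors rows for illustration.
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 7},
    {"sid": 2, "sname": "Bob", "rating": 3},
]


def project(rows, attrs):
    """Relational projection onto the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]


def select(rows, attr, predicate):
    """Keep rows whose attr satisfies predicate; raises KeyError if attr is gone."""
    return [r for r in rows if predicate(r[attr])]


def high(rating):
    return rating > 5


# Case 1: the condition's attribute (rating) is retained by the projection.
keep = ["sname", "rating"]
lhs = project(select(sailors, "rating", high), keep)
rhs = select(project(sailors, keep), "rating", high)

# Case 2: the projection drops rating, so the selection cannot run after it.
try:
    select(project(sailors, ["sname"]), "rating", high)
    selection_still_possible = True
except KeyError:
    selection_still_possible = False
```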


Equivalences with 2 or More Operations

• Join distributes over the various set operators (union, intersection, difference). For example, with union:
  (Q⋈S) ∪ (R⋈S) ≡ (Q ∪ R)⋈S


Query Tree - Equivalent to SQL
(without any algorithms selected)

SQL Query:

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

RA Tree (root first, inputs indented):

    π sname
      σ bid=100 ∧ rating > 5
        ⋈ sid=sid
          Reserves
          Sailors


Choosing algorithms for each operator
(algorithms shown in parentheses)

RA Tree:

    π sname
      σ bid=100 ∧ rating > 5
        ⋈ sid=sid
          Reserves
          Sailors

One Possible Plan:

    π sname                        (On-the-fly)
      σ bid=100 ∧ rating > 5       (On-the-fly)
        ⋈ sid=sid                  (Page-Oriented Nested Loops join)
          Reserves
          Sailors


“On the fly”

On the fly means that we evaluate the operator in memory - while we have the tuple available.

“On the fly” induces no I/O cost!

Relation on left is assumed to be the outer relation - for any algorithm that uses nested loops.

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

Plan:

    π sname                        (On-the-fly)
      σ bid=100 ∧ rating > 5       (On-the-fly)
        ⋈ sid=sid                  (Page-Oriented Nested Loops Join)
          Reserves
          Sailors
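One way to picture on-the-fly evaluation is a pipeline of Python generators: each joined tuple is filtered and projected as soon as it is produced, and no intermediate result is stored. This is a sketch only; the rows and function names are made up, and in-memory lists stand in for the pages a real join would read.

```python
def nested_loops_join(outer, inner, key):
    """Yield joined tuples one at a time instead of building a result list."""
    for o in outer:
        for i in inner:
            if o[key] == i[key]:
                yield {**o, **i}


def select(rows, cond):
    """Pass along each tuple that satisfies cond, as soon as it arrives."""
    for row in rows:
        if cond(row):
            yield row


def project(rows, attrs):
    """Trim each arriving tuple to the listed attributes."""
    for row in rows:
        yield {a: row[a] for a in attrs}


# Made-up rows for illustration.
reserves = [{"sid": 1, "bid": 100}, {"sid": 2, "bid": 101}, {"sid": 3, "bid": 100}]
sailors = [
    {"sid": 1, "sname": "Ann", "rating": 7},
    {"sid": 2, "sname": "Bob", "rating": 9},
    {"sid": 3, "sname": "Cy", "rating": 3},
]

plan = project(
    select(
        nested_loops_join(reserves, sailors, "sid"),  # Reserves is the outer
        lambda t: t["bid"] == 100 and t["rating"] > 5,
    ),
    ["sname"],
)
names = [t["sname"] for t in plan]
```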


Limitations of “On the fly”

• Can only happen if:
  – Computation can be done entirely on tuples in memory
  – Results do not need to be materialized
• Cannot apply to all operations!


Cost of plan 1
no index (Sailors – inner loop)

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

Plan:

    π sname                        (On-the-fly)
      σ bid=100 ∧ rating > 5       (On-the-fly)
        ⋈ sid=sid                  (Page-Oriented Nested Loops Join)
          Reserves
          Sailors

M = # of pages in outer table
N = # of pages in inner table

Cost of page-oriented nested loops join is:

    M + M * N
    1000 + 1000 * 500 = 501,000

And the “on-the-fly” operations have no I/O - so plan cost is 501,000


How can we create other plans?

• One Answer: use relational algebra equivalences to produce new query trees.
  – Advantage: you are sure that the new tree is equivalent to the original, because equivalences have proofs
• Or, assign different algorithms to operators
• Example: Apply commutativity of join to the current RA tree
  – What is the result?
  – Let’s use the same algorithms for Plan 2


Cost of plan 2
no index (Reserves: inner loop)

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

Sailors as the outer relation rather than Reserves.

Plan:

    π sname                        (On-the-fly)
      σ bid=100 ∧ rating > 5       (On-the-fly)
        ⋈ sid=sid                  (Page-Oriented Nested Loops Join)
          Sailors
          Reserves

M = # of pages in outer table
N = # of pages in inner table

Cost of page-oriented nested loops join is:

    500 + 500 * 1000 = 500,500

And the “on-the-fly” operations have no I/O - so plan cost is 500,500
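The two plan costs can be reproduced with a one-line cost function. This is a sketch; the function and constant names are assumptions, while the page counts (1000 for Reserves, 500 for Sailors) are the ones used in the cost calculations for plans 1 and 2.

```python
def page_nested_loops_cost(outer_pages, inner_pages):
    """M + M * N: read the outer once, scan the whole inner once per outer page."""
    return outer_pages + outer_pages * inner_pages


RESERVES_PAGES = 1000
SAILORS_PAGES = 500

plan1_cost = page_nested_loops_cost(RESERVES_PAGES, SAILORS_PAGES)  # Reserves outer
plan2_cost = page_nested_loops_cost(SAILORS_PAGES, RESERVES_PAGES)  # Sailors outer
```

Swapping the join inputs saves only 500 I/Os here, because the M * N term dominates either way.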


Cost of plan 3
Push down selects

    SELECT S.sname
    FROM Reserves R, Sailors S
    WHERE R.sid = S.sid AND
          R.bid = 100 AND
          S.rating > 5;

Apply this equivalence:
    σc(R⋈S) ≡ σc(R)⋈S
to the previous query tree to get an equivalent query tree.

Plan:

    π sname                        (On-the-fly)
      ⋈ sid=sid                    (Page-Oriented Nested Loops Join)
        σ rating > 5               (On-the-fly)
          Sailors
        σ bid=100                  (On-the-fly)
          Reserves

What is the cost of the plan shown?
Scanning sailors and reserves cost M+N I/Os.
What about the cost of the join? It depends on how many reservations are for boat 100 and how many sailors have a rating >5.

How would you find this information?
Statistics will help.


To estimate cost, we need table sizes

• For all operators beyond the leaf level of the query plan, the input tables are the result of some earlier query.

• Thus, we need to estimate the size of intermediate results! (This can be difficult. This is one reason why the cost estimates may not be very good. Estimation errors tend to compound.)

• For example, what information would you need to estimate how many reservations are for bid 100 and how many sailors have a rating >5?


Estimating Sizes

• Size of table = size of tuple * number of rows returned
  – Size of tuple = sum of size of columns

• Index entries also have a size, just like tuples

• If table is M bytes and a page is N bytes, it takes M/N I/O operations to perform a full scan of the table
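The formulas above translate directly into code. In this sketch the column sizes, row count and 4 KB page size are hypothetical, and rounding up to a whole page is an added assumption (a partly filled page still costs one I/O).

```python
import math


def tuple_size(column_sizes):
    """Size of a tuple = sum of the sizes of its columns (in bytes)."""
    return sum(column_sizes)


def table_size(column_sizes, num_rows):
    """Size of a table = size of tuple * number of rows."""
    return tuple_size(column_sizes) * num_rows


def full_scan_ios(table_bytes, page_bytes):
    """Pages read by a full scan, rounded up to a whole page."""
    return math.ceil(table_bytes / page_bytes)


# Hypothetical figures: columns of 4, 20 and 4 bytes, 10,000 rows, 4096-byte pages.
size_in_bytes = table_size([4, 20, 4], 10_000)
scan_ios = full_scan_ios(size_in_bytes, 4096)
```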


DBMS Usually Maintains Some Statistics in the DB Catalog

• Catalogs typically contain at least:
  – # tuples and # pages for each table.
  – # distinct key values and # pages for each index.
  – Index height, low/high key values for each tree index.

• Catalogs are updated periodically - say, once a week or once a month. Perhaps they’re updated during the backup.

• Simplest case: assume that all attribute values are uniformly distributed. Thus if gender was an attribute, the optimizer would assume that half of the rows have the male value and other half have the female value. (This might be grossly inaccurate.)


Calculating Selectivities

• Assume that rating values range from 1 to 10, and that bid values range from 1 to 100.

• What percentage of the incoming tuples, to the operator σ bid=100 , will be output?

• What about σ rating > 5 ?
• σ bid=100 ∧ rating > 5 ?
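Under the uniform-distribution assumption, the single-term selectivities can be worked out as below. This is a sketch with hypothetical function names; it assumes integer-valued attributes spread evenly over the stated ranges.

```python
from fractions import Fraction


def equality_selectivity(low, high):
    """Fraction of tuples with attr = v, for integers uniform on [low, high]."""
    return Fraction(1, high - low + 1)


def greater_than_selectivity(low, high, v):
    """Fraction of tuples with attr > v, for integers uniform on [low, high]."""
    matching = max(0, high - max(v, low - 1))
    return Fraction(matching, high - low + 1)


bid_sel = equality_selectivity(1, 100)            # σ bid=100
rating_sel = greater_than_selectivity(1, 10, 5)   # σ rating > 5
```

So about 1% of incoming tuples pass bid=100, and about half pass rating > 5.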


Doing better than a uniform distribution

• The DBMS might gather more detailed information about how the values of attributes are distributed (e.g., histograms of the values in a field) and store it in the catalog.

Suppose there was an attribute degree-program with three possible values: “BS CS” “MS CS” “PhD CS”
  – Then the DBMS might count the values and know that there are 428 “BS CS” values, 98 “MS CS” values and 25 “PhD CS” values.
  – This allows a much better estimate of the reduction factor.
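With the counts from the example, the reduction factor for an equality selection is the value's count divided by the total. A minimal sketch (the function name is an assumption):

```python
from fractions import Fraction

# Value counts quoted in the example for the degree-program attribute.
counts = {"BS CS": 428, "MS CS": 98, "PhD CS": 25}
total = sum(counts.values())


def reduction_factor(value):
    """Estimated fraction of tuples with degree-program = value."""
    return Fraction(counts.get(value, 0), total)


uniform_guess = Fraction(1, len(counts))   # what a uniform assumption would give
phd_factor = reduction_factor("PhD CS")
```

The uniform assumption would estimate one third of the rows for “PhD CS”; the counts give 25/551, under 5%.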


Independence of Reduction Factors

• So far, in our quest to compute costs of plans, we have gathered statistics and shown how to use them, assuming uniform distributions. But what about a select operator with terms separated by AND?
  – Example: σ bid=100 ∧ rating > 5
• We assume that all terms are independent!
• Thus, if one attribute is class and the other is number-of-hours - the query optimizer might assume that class is uniformly distributed over {Fresh, Soph, Jun, Sen} and that number-of-hours is uniformly distributed over {0, 1, …, 205}
  – But, we know that only class correlates with number-of-hours! Might even be that number-of-hours → class.
• What percentage of the incoming tuples, to the operator σ bid=100 ∧ rating > 5 , will be output?
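Under the independence assumption, the selectivity of a conjunction is the product of the per-term selectivities. The sketch below applies that to the example; the per-term values assume bid uniform on 1..100 and rating uniform on 1..10, and the function name is hypothetical.

```python
from fractions import Fraction

bid_sel = Fraction(1, 100)     # bid = 100, bid uniform on 1..100
rating_sel = Fraction(5, 10)   # rating > 5, rating uniform on 1..10


def conjunction_selectivity(*term_selectivities):
    """Multiply per-term selectivities; only valid if the terms are independent."""
    result = Fraction(1)
    for s in term_selectivities:
        result *= s
    return result


combined = conjunction_selectivity(bid_sel, rating_sel)
percent_output = float(combined * 100)
```

That gives 1/200, i.e. 0.5% of incoming tuples. If the attributes are correlated, as class and number-of-hours are, the product can be far from the true fraction.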


Enumerating Plans for Multiple Joins

• Back to the problem of generating plans.
• Are we trying to generate as many plans as possible?
  – No, best to generate few, as long as cheap plans are among them.
• In System R: only left-deep join trees are considered.

[Figure: three join trees over relations A, B, C and D. One is left-deep - the other two are not.]


Queries Over Multiple Relations (Joins)

Left-deep trees allow us to generate all fully pipelined plans.

• Intermediate results not written to temporary files.
• Not all left-deep trees are fully pipelined (e.g., SM join).
• Using only left-deep plans (obviously) restricts the search space. (So optimizer may not find the optimal plan.)

[Figure: a left-deep join tree over relations A, B, C and D.]


Enumeration of Left-Deep Plans

• Need to consider all possible left-deep plans.
  – For SPJ queries, these are enumerated by orderings of the tables
• For each ordering, consider the access method for each relation and the join method for each join.
• Enumerated using N passes (if N relations joined):
  – Pass 1: Find best 1-relation plan for each relation.
  – Pass 2: Find best way to join result of each 1-relation plan (as outer) to another relation. (All 2-relation plans.)
  – Pass N: Find best way to join result of a (N-1)-relation plan (as outer) to the N’th relation. (All N-relation plans.)
• At the end of each pass, retain only:
  – Cheapest plan overall, plus
  – Cheapest plan for each interesting order of the tuples.
• Interesting order: corresponding to a join, ORDER BY or GROUP BY
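The pass structure can be sketched as a small dynamic program over sets of relations. Everything numeric here is an assumption made for illustration: the page counts, the toy join cost (scan the inner once per page of the outer) and the crude size estimate. The sketch keeps only the cheapest plan per set of relations and ignores interesting orders, access methods and alternative join methods.

```python
import itertools

PAGES = {"A": 1000, "B": 500, "C": 50, "D": 200}  # hypothetical page counts


def estimated_pages(relations, pages):
    """Made-up size estimate that depends only on which relations were joined."""
    if len(relations) == 1:
        return pages[next(iter(relations))]
    return max(1, min(pages[r] for r in relations) // 2)


def best_left_deep_plan(pages):
    """Return (cost, join order) of the cheapest left-deep plan under the toy model."""
    # Pass 1: best 1-relation plan for each relation (a full scan).
    best = {frozenset([r]): (pages[r], (r,)) for r in pages}
    # Pass k: join the best (k-1)-relation plan, as outer, to one more relation.
    for k in range(2, len(pages) + 1):
        for subset in itertools.combinations(sorted(pages), k):
            key = frozenset(subset)
            for inner in subset:
                outer = key - {inner}
                outer_cost, outer_order = best[outer]
                # Toy join cost: scan the inner once per page of the outer.
                total = outer_cost + estimated_pages(outer, pages) * pages[inner]
                if key not in best or total < best[key][0]:
                    best[key] = (total, outer_order + (inner,))
    return best[frozenset(pages)]


cost, order = best_left_deep_plan(PAGES)
```

Because only one plan per set of relations survives each pass, the work grows with the number of subsets rather than with the number of orderings.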


Nested Queries

• Nested block is optimized independently, with the outer tuple considered as providing a selection condition.
• Outer block is optimized with the cost of ‘calling’ the nested block computation taken into account.
• Implicit ordering of these blocks means that some good strategies are not considered.
• The non-nested version of the query is typically optimized better. The optimizer might not find it from the nested version, so you may need to explicitly unnest the query.

    SELECT S.sname
    FROM Sailors S
    WHERE EXISTS
        (SELECT *
         FROM Reserves R
         WHERE R.bid=103 AND R.sid=S.sid)

Nested block to optimize:

    SELECT *
    FROM Reserves R
    WHERE R.bid=103
      AND R.sid= outer value

Equivalent non-nested query:

    SELECT S.sname
    FROM Sailors S, Reserves R
    WHERE S.sid=R.sid
      AND R.bid=103
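The nested and non-nested forms can be compared on a toy database using Python's built-in sqlite3 module. The table contents and the `day` column are made up for illustration; SQLite is used only because it ships with Python, not because the lecture refers to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER, day TEXT);
    INSERT INTO Sailors VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Reserves VALUES (1, 103, '2005-01-01'),
                                (2, 101, '2005-01-02'),
                                (3, 103, '2005-01-03');
""")

# Nested form: the inner block is evaluated per outer Sailors tuple.
nested = conn.execute("""
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R
                  WHERE R.bid = 103 AND R.sid = S.sid)
""").fetchall()

# Non-nested form: a single query block with a join.
unnested = conn.execute("""
    SELECT S.sname FROM Sailors S, Reserves R
    WHERE S.sid = R.sid AND R.bid = 103
""").fetchall()
```

The two results agree on this data because each sailor reserves boat 103 at most once. If a sailor had several reservations for boat 103, the join form would return that name once per reservation.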


Query Optimizers Don’t (Always) Find the Best Plan

• There are usually more plans than you can consider, even if only left deep plans are considered.

• The optimizer might not even try to generate all possible plans (it won’t be able to consider all of them anyway).

• Sometimes the optimizer will compare the optimization cost to the estimated execution cost and quit early.


Physical DB Design

Now that we know how query optimizers work…

How do we choose...
  the file organizations
  and
  the indices
for our database?

This is called “physical design” of a database.

This is sort of like query optimization - backwards.


We Need to Understand the Workload

• For each query in the workload:
  – Which relations does it access?
  – Which attributes are retrieved?
  – Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?

• For each update in the workload:
  – Which attributes are involved in selection/join conditions? How selective are these conditions likely to be?
  – The type of update (INSERT/DELETE/UPDATE), and the attributes that are affected.


Physical Database Design Issues

• Choosing the right indexes
• Mapping tables to physical storage
• Partitioning tables


Choice of Indexes

• Consider the queries, one by one, in order of importance.
  – Consider the best plan for this query using the current indexes, and see if a better plan is possible with an additional index.
  – If so, create it.
• But...before creating an index, we must consider the impact on updates!

It’s a trade-off: indexes make queries faster and updates slower.


Issues to Consider in Index Selection

• Attributes mentioned in a WHERE clause are candidates for indices

• Clustering is useful for range queries, equality queries with duplicates, and sorting

• Hash or Tree index?

• Try to choose indexes that benefit as many queries as possible.

• Remember only one index can be clustered per relation!


Example 1

• Hash index on E.dno allows us to get matching (inner) Emp tuples for each selected (outer) Dept tuple.
  – What plans could use this index?
• Hash index on D.dname supports 'Toy' selection.
  – Given this, index on D.dno is not needed – why?
  – What plans could use this hash index?
• What if WHERE included: “... AND E.age=25” ?
  – Could retrieve Emp tuples using index on E.age, then join with Dept tuples satisfying dname selection.
  – So, if E.age index is already created, is it worth adding an index on E.dno?

    SELECT E.ename, D.mgr
    FROM Emp E, Dept D
    WHERE D.dname='Toy' AND E.dno=D.dno
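The first bullet describes an index nested loops join with Dept as the outer relation. In the sketch below a Python dict of lists stands in for the hash index on E.dno; the rows and the name `emp_by_dno` are made up.

```python
from collections import defaultdict

# Made-up rows for illustration.
dept = [
    {"dno": 1, "dname": "Toy", "mgr": "Mia"},
    {"dno": 2, "dname": "Shoe", "mgr": "Ned"},
]
emp = [
    {"ename": "Al", "dno": 1},
    {"ename": "Bea", "dno": 2},
    {"ename": "Cal", "dno": 1},
]

# A dict of lists stands in for a hash index on E.dno.
emp_by_dno = defaultdict(list)
for e in emp:
    emp_by_dno[e["dno"]].append(e)

result = []
for d in dept:                                   # outer: Dept
    if d["dname"] == "Toy":                      # selection on dname
        for e in emp_by_dno.get(d["dno"], []):   # index probe, no scan of Emp
            result.append((e["ename"], d["mgr"]))
```

Only the Emp tuples for the selected department are touched; without the index, every selected Dept tuple would require a scan of Emp.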


Example 2

• What if we build a hash index on D.dno?
  – Which plans can use that index?
• What index should we build on Emp?
  – B+ tree on E.sal could be used, OR an index on E.hobby could be used. Only one of these is needed, and which is better depends upon the selectivity of the conditions.
• As a rule of thumb, equality selections are more selective than range selections.
• As both examples indicate, our choice of indexes is guided by the plan(s) that we expect an optimizer to consider for a query. Have to understand optimizers!

    SELECT E.ename, D.mgr
    FROM Emp E, Dept D
    WHERE E.sal BETWEEN 10000 AND 20000
      AND E.hobby='Stamps' AND E.dno=D.dno


Examples of Clustering

• B+ tree index on E.age can be used to get qualifying tuples.
  – How selective is the condition?
  – Is the index clustered?

    SELECT E.dno
    FROM Emp E
    WHERE E.age>40

• Consider the GROUP BY query.
  – If many tuples have E.age > 10, using E.age index and sorting the retrieved tuples may be costly.
  – Clustered E.dno index may be better!

    SELECT E.dno, COUNT (*)
    FROM Emp E
    WHERE E.age>10
    GROUP BY E.dno

• Equality queries and duplicates:
  – Clustering on E.hobby helps!

    SELECT E.dno
    FROM Emp E
    WHERE E.hobby='Stamps'


Index-Only Plans

• A number of queries can be answered without retrieving any tuples from one or more of the relations involved if a suitable index is available.

Index on <E.dno>:

    SELECT D.mgr
    FROM Dept D, Emp E
    WHERE D.dno=E.dno

Index on <E.dno>:

    SELECT E.dno, COUNT(*)
    FROM Emp E
    GROUP BY E.dno

Index on <E.dno, E.sal> (Tree index!):

    SELECT E.dno, MIN(E.sal)
    FROM Emp E
    GROUP BY E.dno
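The last two queries can be answered from index entries alone. In this sketch a sorted list of (dno, sal) pairs stands in for the leaf level of a tree index on <E.dno, E.sal>; the values are made up, and no Emp tuple is ever fetched.

```python
from itertools import groupby

# Stand-in for the leaf entries of a tree index on <E.dno, E.sal>,
# kept in key order. The values are made up.
index_entries = [(1, 30000), (1, 45000), (2, 20000), (2, 25000), (2, 60000)]

count_per_dno = {}
min_sal_per_dno = {}
for dno, group in groupby(index_entries, key=lambda entry: entry[0]):
    entries = list(group)
    count_per_dno[dno] = len(entries)
    # Entries are sorted by (dno, sal), so the first one has the minimum salary.
    min_sal_per_dno[dno] = entries[0][1]
```

The MIN query relies on the entries being ordered by salary within each department, which is why a tree index is called for rather than a hash index.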


Index rules of thumb

• Don’t use indexes on small tables (<200 rows)

• Don’t use indexes on columns with few values (T/F, state)

• For most systems, indexes on primary keys and foreign keys are sufficient

• Don’t forget to add indexes when the schema changes!


Storage structure

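[Figure: storage structure diagram with the labels: dbspace (tablespace, database); Disk 1 … Disk i; chunk (segment, container); extent 1 … extent n; page (block) … page N; Tables. The layers are annotated “Logical grouping”, “Disk allocation” and “Disk sharing”.]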


The order in which you create chunks matters

• The system does maintenance tasks (for example, page cleaning during checkpoints) that cause it to visit each chunk in turn

• If you create chunks in a round-robin fashion across disks, can parallelize these tasks

[Figure: two layouts of Chunks 1-4 across Disk 1 and Disk 2, one with the chunks created round-robin across the disks and one without.]


Where you put your tables matters

• If you have multiple disks (and you should have), then you should distribute your data to take advantage of it

• Keep your system data and your application data separate


How much space you give your tables matters

• Most major RDBMS systems allow you to specify a default extent size for tables

• A single extent is contiguous space

• If you can fit your whole table into one extent, then your clustering will be a lot more effective


Tuning the Conceptual Schema

The choice of conceptual schema should be guided by the workload, in addition to redundancy issues:
  – We might denormalize or we might add fields to a relation.
  – We must take care to avoid the problems caused by redundancy!
  – We’ll cover these next lecture
  – We might also consider decompositions (partitioning)
  – Workloads may change over time


Partitioning of one table into several

• Relation is replaced by a collection of relations that are projections. (Vertical partitioning)

• Sometimes, might want to replace relation by a collection of relations that are selections. (Horizontal partitioning)
  – Each new relation has same schema as the original, but a subset of the rows.
  – Collectively, new relations contain all rows of the original. Typically, the new relations are disjoint.
  – Many database vendors provide specific syntax for horizontal partitioning


Tuning the Conceptual Schema: Vertical decomposition using the project operator

Course(c#, cname, instructor, room, days)

Course1(c#, room, days)
Course2(c#, cname)
Course3(c#, instructor)
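The decomposition above can be mimicked in Python to see why keeping the key c# in every fragment matters. This is a sketch: the rows are made up, and `project` and `join_on_key` are hypothetical helpers.

```python
# Made-up Course rows for illustration.
course = [
    {"c#": "CS386", "cname": "Intro to Databases", "instructor": "Lee",
     "room": "101", "days": "MW"},
    {"c#": "CS410", "cname": "Compilers", "instructor": "Kim",
     "room": "202", "days": "TR"},
]


def project(rows, attrs):
    """Relational projection onto the listed attributes."""
    return [{a: r[a] for a in attrs} for r in rows]


# Vertical decomposition: every fragment keeps the key c#.
course1 = project(course, ["c#", "room", "days"])
course2 = project(course, ["c#", "cname"])
course3 = project(course, ["c#", "instructor"])


def join_on_key(left, right, key="c#"):
    """Join two fragments on the key, which is unique within each fragment."""
    by_key = {r[key]: r for r in right}
    return [{**l, **by_key[l[key]]} for l in left]


rebuilt = join_on_key(join_on_key(course1, course2), course3)
```

Joining the three fragments on c# gives back the original rows, so a query that needs all the columns pays for two joins, while a query that only needs room and days reads a narrower table.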


Tuning the Conceptual Schema:
Horizontal partition/decomposition using the select operator

Course(c#, cname, instructor, room, days)

Undergraduate-Course(c#, cname, instructor, room, days)
Graduate-Course(c#, cname, instructor, room, days)


How you partition your data matters

• Most vendors allow you to partition
  – In a round-robin fashion
  – By expression
• Round-robin is useful if:
  – The majority of the rows will be examined during queries
  – You expect the values to be evenly distributed

[Figure: Rows 1-6 assigned to fragments in round-robin order.]


Partitioning by expression

• Partitioning by expression is useful if:
  – Non-overlapping fragments can be created on a single column
    • The database will eliminate entire fragments during query
  – Data access is not evenly distributed
    • Regularly accessed data can be divided among multiple databases
    • Rarely accessed data can be held in a single fragment
  – The order of your expression matters!


Partitioning by expression (cont.)

col1 > 3000 in dbspace1
col1 > 2000 and col1 <= 3000 in dbspace2
col1 <= 2000 in dbspace3

• Search for col1>2888
• Search for col1=3345
• What if this were round-robin?

[Figure: sample col1 values stored across the fragments: 563, 541, 2988, 3111, 3345, 2931.]
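Fragment elimination for the two searches can be sketched as follows. The fragment boundaries are the three expressions given for dbspace1 through dbspace3; the function names and the representation of each fragment as a (low, high] range are assumptions made for illustration.

```python
import math

# Each fragment holds the values in the range (low, high].
fragments = {
    "dbspace1": (3000, math.inf),    # col1 > 3000
    "dbspace2": (2000, 3000),        # col1 > 2000 and col1 <= 3000
    "dbspace3": (-math.inf, 2000),   # col1 <= 2000
}


def fragments_for_equal(value):
    """Fragments that could hold a row with col1 = value."""
    return sorted(name for name, (low, high) in fragments.items()
                  if low < value <= high)


def fragments_for_greater_than(value):
    """Fragments that could hold a row with col1 > value."""
    return sorted(name for name, (low, high) in fragments.items()
                  if high > value)


gt_2888 = fragments_for_greater_than(2888)
eq_3345 = fragments_for_equal(3345)
```

The search for col1>2888 skips dbspace3, and the search for col1=3345 touches only dbspace1. With round-robin partitioning a row's location does not depend on col1, so neither search could skip a fragment.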


Masking Conceptual Schema Changes

• The replacement of Contracts by LargeContracts and SmallContracts can be masked by the view.

• However, queries with the condition val>10000 must be asked with respect to LargeContracts for efficient execution, so users concerned with performance have to be aware of the change.

CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val)
AS SELECT *
   FROM LargeContracts
   UNION
   SELECT *
   FROM SmallContracts
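The view can be tried out with Python's sqlite3 module. This is a sketch under assumptions: the INTEGER column types and the sample rows are invented, and only the view definition comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the slide gives only the column names.
cols = ("cid INTEGER, sid INTEGER, jid INTEGER, did INTEGER, "
        "pid INTEGER, qty INTEGER, val INTEGER")
conn.executescript(f"""
CREATE TABLE LargeContracts ({cols});
CREATE TABLE SmallContracts ({cols});
-- Hypothetical rows: val > 10000 goes to LargeContracts.
INSERT INTO LargeContracts VALUES (1, 10, 100, 5, 7, 500, 25000);
INSERT INTO SmallContracts VALUES (2, 11, 101, 5, 8, 20, 800),
                                  (3, 12, 102, 6, 9, 40, 9000);
CREATE VIEW Contracts(cid, sid, jid, did, pid, qty, val) AS
    SELECT * FROM LargeContracts
    UNION
    SELECT * FROM SmallContracts;
""")

# Old queries against Contracts keep working through the view.
all_rows = conn.execute("SELECT COUNT(*) FROM Contracts").fetchone()[0]
via_view = conn.execute(
    "SELECT cid FROM Contracts WHERE val > 10000").fetchall()

# A performance-minded user asks the fragment directly instead.
direct = conn.execute(
    "SELECT cid FROM LargeContracts WHERE val > 10000").fetchall()
print(all_rows, via_view, direct)
```

Both queries return the same answer. The difference is cost: the query through the view may have to evaluate the union of both tables, while the direct query reads only LargeContracts.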


Tuning Queries and Views

• If a query runs slower than expected, check if an index needs to be re-built, or if statistics are too old.

• Sometimes, the DBMS may not be executing the plan you had in mind. Common areas of weakness:

– Selections involving null values.
– Selections involving arithmetic or string expressions.
– Selections involving OR conditions.
– Lack of evaluation features like index-only strategies or certain join methods or poor size estimation.

• Check the plan that is being used! Then adjust the choice of indexes or rewrite the query/view.
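How to check a plan depends on the DBMS. As one concrete illustration, SQLite exposes its plan through EXPLAIN QUERY PLAN, which Python's sqlite3 module can run. The Employee schema, the index name idx_emp_age, and the helper plan are assumptions for this sketch, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable detail of each step in SQLite's plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (
    eid INTEGER PRIMARY KEY, ename TEXT, dno INTEGER, age INTEGER
);
CREATE INDEX idx_emp_age ON Employee(age);
""")

# A selection on the bare column can use the index on age.
plain = plan(conn, "SELECT * FROM Employee WHERE age = 30")

# Wrapping the column in arithmetic hides it from the index.
arith = plan(conn, "SELECT * FROM Employee WHERE age + 1 = 31")

print(plain)
print(arith)
```

In this sketch the first plan mentions idx_emp_age and the second falls back to scanning the table, which matches the weakness listed above for selections involving arithmetic expressions. Rewriting the condition as age = 30 restores the indexed plan.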


Rewriting SQL Queries

• Complicated by interaction of:
– NULLs, duplicates, aggregation, subqueries.
• Guideline: Use only one “query block”, if possible.

SELECT DISTINCT *
FROM Sailors S
WHERE S.sname IN
    (SELECT Y.sname
     FROM YoungSailors Y)

=

SELECT DISTINCT S.*
FROM Sailors S, YoungSailors Y
WHERE S.sname = Y.sname
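The equivalence of the two queries can be checked on sample data with Python's sqlite3 module. The schema (sid, sname, rating, age) and the rows are assumptions for this sketch, and YoungSailors is modeled as a plain table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Sailors (
    sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL
);
CREATE TABLE YoungSailors (
    sid INTEGER PRIMARY KEY, sname TEXT, rating INTEGER, age REAL
);
-- Hypothetical rows; the name 'Ann' repeats on purpose.
INSERT INTO Sailors VALUES
    (1, 'Ann', 7, 19.0), (2, 'Bob', 8, 45.0), (3, 'Ann', 3, 52.0);
INSERT INTO YoungSailors VALUES
    (1, 'Ann', 7, 19.0), (4, 'Ann', 5, 17.5);
""")

# Two query blocks: outer query plus IN subquery.
nested = conn.execute("""
    SELECT DISTINCT *
    FROM Sailors S
    WHERE S.sname IN (SELECT Y.sname FROM YoungSailors Y)
""").fetchall()

# One query block: the subquery becomes a join.
flat = conn.execute("""
    SELECT DISTINCT S.*
    FROM Sailors S, YoungSailors Y
    WHERE S.sname = Y.sname
""").fetchall()

print(sorted(nested))
print(sorted(flat))
```

The join produces each matching sailor once per matching YoungSailors row, so the DISTINCT in the single-block form is what keeps the two answers identical here.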


More Guidelines for Query Tuning

• Minimize the use of DISTINCT: don’t need it if duplicates are acceptable, or if answer contains a key.
• Minimize the use of GROUP BY and HAVING:

SELECT MIN (E.age)
FROM Employee E
GROUP BY E.dno
HAVING E.dno=102

SELECT MIN (E.age)
FROM Employee E
WHERE E.dno=102
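The two forms can be compared on sample data with Python's sqlite3 module. The Employee columns other than dno and age, the rows, and the extra department number 999 are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee (eid INTEGER PRIMARY KEY, dno INTEGER, age INTEGER);
-- Hypothetical rows.
INSERT INTO Employee VALUES (1, 102, 41), (2, 102, 29), (3, 103, 25);
""")

GROUPED = ("SELECT MIN(E.age) FROM Employee E "
           "GROUP BY E.dno HAVING E.dno = ?")
FILTERED = "SELECT MIN(E.age) FROM Employee E WHERE E.dno = ?"

# Department 102 exists: both forms give the same answer.
grouped_102 = conn.execute(GROUPED, (102,)).fetchall()
filtered_102 = conn.execute(FILTERED, (102,)).fetchall()

# Department 999 has no employees: the results differ in shape.
grouped_none = conn.execute(GROUPED, (999,)).fetchall()
filtered_none = conn.execute(FILTERED, (999,)).fetchall()

print(grouped_102, filtered_102)
print(grouped_none, filtered_none)
```

The WHERE form filters rows before any grouping work is done, which is why it is usually the cheaper choice. One caveat to keep in mind when rewriting: if no row matches, the GROUP BY form returns no rows, while the WHERE form returns a single row holding NULL.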

Consider DBMS use of index when writing arithmetic expressions: E.age=2*D.age will benefit from index on E.age, but might not benefit from index on D.age!
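The index remark can be observed in SQLite through Python's sqlite3 module. This is only a sketch of one system's behavior: the tables Employee and Dependent, the index names, and the helper plan are hypothetical, and other optimizers may plan the join differently.

```python
import sqlite3


def plan(conn, sql):
    """Return the human-readable detail of each step in SQLite's plan."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employee  (eid INTEGER PRIMARY KEY, ename TEXT, age INTEGER);
CREATE TABLE Dependent (did INTEGER PRIMARY KEY, dname TEXT, age INTEGER);
""")
query = ("SELECT E.ename, D.dname FROM Employee E, Dependent D "
         "WHERE E.age = 2 * D.age")

# Index on E.age: for each D row, 2 * D.age is computed and probed in E.
conn.execute("CREATE INDEX idx_e_age ON Employee(age)")
with_e_index = plan(conn, query)

# Index on D.age only: D.age sits inside an expression, so no probe on D.
conn.execute("DROP INDEX idx_e_age")
conn.execute("CREATE INDEX idx_d_age ON Dependent(age)")
with_d_index = plan(conn, query)

print(with_e_index)
print(with_d_index)
```

With the index on E.age the plan searches Employee through idx_e_age. With only the index on D.age, idx_d_age does not appear in the plan, because the condition never compares the bare column D.age with a value.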