09 Rewrites

Relational Algebra Equivalencies

Feb 19, 2014


Static Hashing


[Figure: a key k is hashed to one of N primary bucket pages (0, 1, 2, ..., N-1) using h(k) % N. Primary bucket pages are stored contiguously; each bucket's overflow pages form a linked list.]
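The static scheme can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's implementation: the class name `StaticHashTable`, the page capacity of 4, and the use of Python's built-in `hash` (which maps small non-negative integers to themselves) are all assumptions made for the example.

```python
class StaticHashTable:
    PAGE_CAPACITY = 4  # assumed number of keys that fit on one page

    def __init__(self, n_buckets):
        self.n = n_buckets
        # Each bucket is a list of pages: page 0 is the primary page,
        # any later pages play the role of the overflow chain.
        self.buckets = [[[]] for _ in range(n_buckets)]

    def insert(self, key):
        pages = self.buckets[hash(key) % self.n]
        if len(pages[-1]) == self.PAGE_CAPACITY:
            pages.append([])  # allocate a new overflow page
        pages[-1].append(key)

    def lookup(self, key):
        """Return (found, pages_read) to make the cost of overflow visible."""
        pages_read = 0
        for page in self.buckets[hash(key) % self.n]:
            pages_read += 1
            if key in page:
                return True, pages_read
        return False, pages_read


table = StaticHashTable(n_buckets=2)
for k in [0, 2, 4, 6, 8, 10]:  # all even, so all land in bucket 0
    table.insert(k)
```

With these inserts, finding key 10 requires reading the primary page and one overflow page, which is the cost the next slide refers to.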

Problem


Overflow Pages are Expensive

Resizing a Hash Table is Slow

What do we do about it?

Extendible Hashing


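[Figure: a directory with global depth gd = 2 pointing to data pages, each with local depth ld = 2.]

Directory entry   Data page   Contents
00                A (ld=2)    4, 12, 32, 16
01                B (ld=2)    1, 5, 21, 13
10                C (ld=2)    10
11                D (ld=2)    15, 7, 19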

The directory and data pages have an associated “depth” (global/local)

To look up a value, use the last gd bits of the key’s hash value as an index into the directory
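As a small illustration of the lookup rule, the last gd bits can be extracted with a bit mask. The function name `directory_index` and the dictionary layout are hypothetical, and the example assumes the values shown on the slide are themselves the hash values.

```python
def directory_index(hash_value, gd):
    """Keep only the last gd bits of the hash value."""
    return hash_value & ((1 << gd) - 1)


# Directory from the slide: gd = 2, one entry per 2-bit suffix.
directory = {0b00: "A", 0b01: "B", 0b10: "C", 0b11: "D"}

bucket_of_21 = directory[directory_index(21, 2)]  # 21 = 10101, suffix 01
bucket_of_10 = directory[directory_index(10, 2)]  # 10 = 1010,  suffix 10
```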

Extendible Hashing


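Insert 20 (h(20) = 1100)

(Need to Split Bucket A)

[Figure: bucket A (4, 12, 32, 16) is full, so the directory doubles from gd = 2 (entries 00, 01, 10, 11) to gd = 3 (entries 000 through 111). Bucket A splits into A (ld=3), holding 32, 16, and A2 (ld=3), holding 4, 12, 20. Buckets B (1, 5, 21, 13), C (10), and D (15, 7, 19) keep ld=2.]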

Dir entries not being split point to the same bucket

Extendible Hashing


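Insert 31 (h(31) = 1001)

(Need to Split Bucket B)

Don’t need to double dir when splitting bucket w/ ld < gd

[Figure: the directory stays at gd = 3 (entries 000 through 111), with A (ld=3) holding 32, 16 and A2 (ld=3) holding 4, 12, 20. Bucket B (1, 5, 21, 13) has ld = 2 < gd, so it splits into B (ld=3) and B2 (ld=3) without doubling the directory; the slide shows the resulting pages as 1, 21 and 5, 13, 31. Buckets C (10) and D (15, 7, 19) keep ld=2.]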
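The two split cases (double the directory when ld = gd, split in place when ld < gd) can be sketched as follows. This is an illustrative sketch under stated assumptions: the class names `Bucket` and `ExtendibleHash` are made up, keys are distinct integers, h(k) = k, and each page holds four keys.

```python
class Bucket:
    def __init__(self, depth):
        self.depth = depth  # local depth (ld)
        self.keys = []


class ExtendibleHash:
    def __init__(self, capacity=4):
        self.capacity = capacity      # assumed keys per data page
        self.gd = 0                   # global depth
        self.directory = [Bucket(0)]  # always 2**gd entries

    def _index(self, key):
        return key & ((1 << self.gd) - 1)  # last gd bits, assuming h(k) = k

    def lookup(self, key):
        return key in self.directory[self._index(key)].keys

    def insert(self, key):
        bucket = self.directory[self._index(key)]
        if len(bucket.keys) < self.capacity:
            bucket.keys.append(key)
            return
        if bucket.depth == self.gd:
            # Double the directory; each new entry points to the same
            # bucket as the entry that shares its low-order bits.
            self.directory = self.directory + self.directory
            self.gd += 1
        # Split the full bucket on the next bit of the hash value.
        bit = 1 << bucket.depth
        bucket.depth += 1
        image = Bucket(bucket.depth)
        old_keys, bucket.keys = bucket.keys, []
        for k in old_keys:
            (image if k & bit else bucket).keys.append(k)
        for i, b in enumerate(self.directory):
            if b is bucket and i & bit:
                self.directory[i] = image
        self.insert(key)  # retry; may trigger a further split


table = ExtendibleHash(capacity=4)
for key in [4, 12, 32, 16, 1, 5, 21, 13, 10, 15, 7, 19]:
    table.insert(key)
depth_before = table.gd  # state of the first Extendible Hashing slide
table.insert(20)         # bucket A is full and ld == gd: directory doubles
```

After inserting 20, entries 001 and 101 still share bucket B. Inserting a key whose hash ends in 01 then splits B without touching the directory size.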

Linear Hashing


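[Figure: a linear hash table with Level = 3 (2^3 = 8 entries) and Next = 2. Buckets 000 through 111 existed at the start of this round; buckets 000 and 001 have already been split this round; bucket 010 (Next) is the bucket to be split; buckets 1000 and 1001 are the ‘split image’ buckets created this round.]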

Linear Hashing:Lookups


Lookup K

h(k) % 2^Level = 4 (100)

Next ≤ 4

Use entry 4

[Figure: the table with buckets 000 through 111 plus split images 1000 and 1001, several buckets with overflow pages, and Next = 2.]

Linear Hashing:Lookups


Lookup K

h(k) % 2^Level = 1 (001)

1 < Next

Use entry h(k) % 2^(Level+1)

1 (0001) or 9 (1001)

[Figure: the same table with Next = 2; bucket 001 was already split this round, so its keys are divided between entries 0001 and 1001.]
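The two lookup cases can be combined into one small function. The name `bucket_for` and its parameter names are hypothetical; the logic follows the rule on the slides.

```python
def bucket_for(hash_value, level, next_ptr):
    """Return the bucket number for a hash value in a linear hash table."""
    b = hash_value % (2 ** level)
    if b < next_ptr:
        # This bucket was already split this round, so one more bit
        # decides between the original bucket and its split image.
        b = hash_value % (2 ** (level + 1))
    return b


# Level = 3 and Next = 2, as on the slides.
entry_for_12 = bucket_for(12, level=3, next_ptr=2)  # 12 % 8 = 4, not yet split
entry_for_9 = bucket_for(9, level=3, next_ptr=2)    # 9 % 8 = 1 < Next, use 9 % 16
entry_for_17 = bucket_for(17, level=3, next_ptr=2)  # 17 % 8 = 1 < Next, use 17 % 16
```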

Linear Hashing:Splitting


[Figure: the table with buckets 000 through 111 plus split images 1000 and 1001, several buckets with overflow pages, and Next = 2.]

Split Next

Increment Next

Partition on bit Level
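A compact sketch of the split step is shown below. Several details are assumptions rather than content from the slides: the class name `LinearHash`, h(k) = k, a split being triggered whenever any bucket exceeds its capacity, and the end-of-round rule (reset Next to 0 and increment Level once every bucket from the start of the round has been split), which is the usual convention.

```python
class LinearHash:
    def __init__(self, level=1, capacity=2):
        self.level = level
        self.next = 0
        self.capacity = capacity  # assumed keys per primary page
        self.buckets = [[] for _ in range(2 ** level)]

    def _address(self, h):
        b = h % (2 ** self.level)
        if b < self.next:
            b = h % (2 ** (self.level + 1))
        return b

    def insert(self, key):
        bucket = self.buckets[self._address(key)]
        bucket.append(key)  # keys beyond capacity stand in for overflow pages
        if len(bucket) > self.capacity:
            self.split()    # splits bucket Next, not necessarily this bucket

    def split(self):
        bit = 2 ** self.level  # partition on bit Level
        old = self.buckets[self.next]
        self.buckets[self.next] = [k for k in old if not k & bit]
        # The split image lands at index Next + 2**Level.
        self.buckets.append([k for k in old if k & bit])
        self.next += 1
        if self.next == 2 ** self.level:  # assumed end-of-round rule
            self.level += 1
            self.next = 0


table = LinearHash(level=1, capacity=2)
for key in [0, 2, 4]:
    table.insert(key)  # third insert overflows bucket 0 and splits it
state_after_first_split = ([list(b) for b in table.buckets], table.next)
for key in [1, 3, 5]:
    table.insert(key)  # overflows bucket 1, which is now Next
```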

Hashing Algorithms

• Indirection: Use pointers to avoid bulk copies of the entire hash table

• Extendible Hashing

• Partial: Incrementally grow the hash table.

• Linear Hashing

• Continuous: Allow bucket boundaries to move.

• Consistent Hashing


Consistent Hashing

• Insight: Make split/merge faster by making bin boundaries deterministic, but random.

• Used mostly in distributed data-stores

• (Amazon, Facebook, …)

• Minimal applications to file-based storage.


(‘Chord: A Scalable Peer-to-peer…’, Stoica et al.)

Consistent Hashing


[Figure: the hash range drawn as a line from 0 to 2^32 - 1.]

Consistent Hashing


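Modular Arithmetic (mod 2^32)

(2^32 - 1) + 1 = 0

Numbers form a ‘Ring’

[Figure: the hash range bent into a circle, with 0 and 2^32 - 1 adjacent and 2^29 - 1, 2^30 - 1, and 2^31 - 1 marked around it.]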

Consistent Hashing


Assign each bucket a random point on the ring

Each bucket contains values that hash to ring positions between its point and its predecessor

[Figure: buckets A and B placed on the ring, with arcs labeled "Hash Range of A" and "Hash Range of B".]
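The ring assignment can be sketched with a sorted list of bucket points and a binary search. Everything named here (`Ring`, `ring_position`, the use of SHA-256 to pick a pseudo-random point) is an assumption for illustration; the slides do not prescribe a hash function or data structure.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32


def ring_position(name):
    """Deterministic but random-looking point in [0, 2**32) for a bucket name."""
    digest = hashlib.sha256(name.encode()).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    def __init__(self):
        self.points = []  # sorted ring positions of all buckets
        self.owner = {}   # ring position -> bucket name

    def add_bucket(self, name, position=None):
        pos = ring_position(name) if position is None else position % RING_SIZE
        bisect.insort(self.points, pos)
        self.owner[pos] = name

    def bucket_for(self, hash_value):
        # A bucket owns positions after its predecessor's point, up to and
        # including its own point; the modulo wraps around the ring.
        i = bisect.bisect_left(self.points, hash_value % RING_SIZE)
        return self.owner[self.points[i % len(self.points)]]


ring = Ring()
ring.add_bucket("A", position=100)  # explicit points keep the example readable
ring.add_bucket("B", position=200)
```

A hash value of 150 falls between A's point and B's point and so belongs to B, while 250 wraps past the end of the ring and belongs to A.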

Consistent Hashing


[Figure: the ring with points labeled A, B, C, B, C, A.]

Consistent Hashing


• Splits/Merges are cheap.

• At most 2 buckets are affected.

• No need for page duplication.

• Mapping hash value to bucket is expensive.

• Need to have a lookup mechanism/directory.

• Chord: Decentralized lookup mechanism.
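The claim that splits are cheap can be checked on a toy ring: adding a bucket should only move keys out of the one bucket whose range it cuts. The helper `owner` and the sample points are hypothetical, and hash values are used directly as keys.

```python
import bisect


def owner(points, h):
    """points is a sorted list of (position, bucket) pairs."""
    # (h,) sorts before any (h, name), so this finds the first point >= h.
    i = bisect.bisect_left(points, (h,))
    return points[i % len(points)][1]


points_before = [(100, "A"), (200, "B")]
points_after = sorted(points_before + [(150, "C")])  # split: add bucket C

hashes = range(0, 300, 10)
moved = [h for h in hashes
         if owner(points_before, h) != owner(points_after, h)]
old_owners = {owner(points_before, h) for h in moved}
```

Only hash values in the range just before C's point change buckets, and all of them come from B; bucket A is untouched.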

Group Work


Static Hashing vs

Extendible Hashing vs

Linear Hashing vs

Consistent Hashing

What advantages does each algorithm have? What is a situation in which you might want to use each?

Recap: Using Indexes


Using Indexes 1: Selection Predicate + File Scan = Potential Index Scan

$\sigma_{(R.A>1) \wedge (R.B<R.C)}(\text{FileScan}(R))$ becomes $\sigma_{R.B<R.C}(\text{IndexScan}(R, R.A, \text{Range}(1, \infty)))$

Recap: Using Indexes


Using Indexes 2: Join + 2x File Scan = Potential Index Nested Loop

$\text{FileScan}(R) \bowtie_{R.A<S.B} \text{FileScan}(S)$ becomes $\text{IndexNLJ}(S, S.B, \text{Range}(R.A, \infty))$ over $\text{FileScan}(R)$

Index-Nested Loop


Plan: IndexNLJ(S, S.B, Range(R.A, ∞)) over FileScan(R)

foreach ⟨A, ...⟩ ∈ R
    foreach ⟨B, ...⟩ ∈ IndexScan(S, S.B, Range(A, ∞))
        emit ⟨A, ..., B, ...⟩
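The loop above can be made runnable with a sorted list standing in for the index on S.B. This is only a sketch: `SortedIndex`, `range_scan`, and `index_nlj` are hypothetical names, tuples are plain Python tuples, and the range is taken as strictly greater than the probe value to match the join condition R.A < S.B.

```python
import bisect


class SortedIndex:
    """Stand-in for an ordered index: rows kept sorted by the index key."""

    def __init__(self, rows, key):
        self.rows = sorted(rows, key=key)
        self.keys = [key(r) for r in self.rows]

    def range_scan(self, low):
        """Yield rows whose key is strictly greater than low."""
        start = bisect.bisect_right(self.keys, low)
        yield from self.rows[start:]


def index_nlj(outer_rows, inner_index, outer_key):
    for r in outer_rows:                                   # FileScan(R)
        for s in inner_index.range_scan(outer_key(r)):     # IndexScan(S, ...)
            yield r + s                                    # emit <A, ..., B, ...>


R = [(1, "r1"), (5, "r2")]              # tuples of the form (A, ...)
S = [(7, "s2"), (0, "s0"), (3, "s1")]   # tuples of the form (B, ...)
index_on_S_B = SortedIndex(S, key=lambda s: s[0])
result = list(index_nlj(R, index_on_S_B, outer_key=lambda r: r[0]))
```

Each probe skips directly to the first qualifying entry instead of scanning all of S, which is where the plan gains over a plain nested loop.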

Equivalencies


(No Beard) ≠ (Beard)    (good vs evil)

(Leonard Nimoy) = (Zachary Quinto)    (different actor, same character)

Optimization


• Rewriting Queries into Equivalent Forms

• Some rewrites are always good ideas

• Pushing down selections

• Pushing down projections

• Some rewrites might be good ideas

• Join reordering

• How do we get to these equivalent forms?

RA Equivalencies


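Selection

$\sigma_{c_1 \wedge \ldots \wedge c_n}(R) \equiv \sigma_{c_1}(\ldots(\sigma_{c_n}(R)))$ (Combinable)

$\sigma_{c_1}(\sigma_{c_2}(R)) \equiv \sigma_{c_2}(\sigma_{c_1}(R))$ (Commutative)

Projection

$\pi_{a_1}(R) \equiv \pi_{a_1}(\ldots(\pi_{a_n}(R)))$ (Idempotent)

Join (and Cross Product)

$R \bowtie (S \bowtie T) \equiv (R \bowtie S) \bowtie T$ (Associative)

$(R \bowtie S) \equiv (S \bowtie R)$ (Commutative)

Try It: Show that $R \bowtie (S \bowtie T) \equiv (T \bowtie R) \bowtie S$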
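One possible derivation for the Try It exercise uses only the commutative and associative rules for join listed on this slide; other orderings of the steps work as well.

```latex
\begin{align*}
R \bowtie (S \bowtie T)
  &\equiv R \bowtie (T \bowtie S) && \text{(commutativity, inner join)} \\
  &\equiv (R \bowtie T) \bowtie S && \text{(associativity)} \\
  &\equiv (T \bowtie R) \bowtie S && \text{(commutativity, inner join)}
\end{align*}
```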

Intermission

RA Equivalencies


Selection commutes with Projection (but only if attribute set a and condition c are compatible)

a must include all columns referenced by c

$\pi_a(\sigma_c(R)) \equiv \sigma_c(\pi_a(R))$

Show that $\pi_a(\sigma_c(R)) \equiv \pi_a(\sigma_c(\pi_{a \cup cols(c)}(R)))$
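A small check of the "Show that" identity on concrete data may help. The relation, the attribute names, and the helpers `select` and `project` are made up for this sketch; relations are lists of dictionaries with set semantics applied on projection.

```python
def select(rows, cond):
    return [row for row in rows if cond(row)]


def project(rows, attrs):
    """Projection with set semantics: duplicates are dropped."""
    seen, out = set(), []
    for row in rows:
        t = tuple((a, row[a]) for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(t))
    return out


R = [
    {"A": 1, "B": 2, "C": 3},
    {"A": 4, "B": 5, "C": 1},
    {"A": 5, "B": 0, "C": 9},
]
a = ["A"]                          # attributes to keep
c = lambda row: row["B"] < row["C"]  # condition; cols(c) = {B, C}

lhs = project(select(R, c), a)
# sigma_c(pi_a(R)) is not valid here because a lacks B and C, but
# projecting onto a ∪ cols(c) first keeps the condition evaluable.
rhs = project(select(project(R, ["A", "B", "C"]), c), a)
```

Here the inner projection keeps every column because R has only three; on a wider relation it would discard the columns that neither a nor c needs.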

RA Equivalencies


Selection combines with Cross Product to form a Join as per the definition of Join

(Note: This only helps if we have a join algorithm for conditions like c)

$\sigma_c(R \times S) \equiv R \bowtie_c S$

Show that $\sigma_{R.B=S.B \wedge R.A>3}(R \times S) \equiv \sigma_{R.A>3}(R \bowtie_{R.B=S.B} S)$

RA Equivalencies


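Selection commutes with Cross Product (but only if condition c references attributes of R exclusively)

$\sigma_c(R \times S) \equiv (\sigma_c(R)) \times S$

Show that $\sigma_c(R \bowtie_{c'} S) \equiv (\sigma_c(R)) \bowtie_{c'} S$

$\sigma_{R.B=S.B \wedge R.A>3}(R \times S) \equiv (\sigma_{R.A>3}(R)) \bowtie_{R.B=S.B} S$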
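The selection and cross product rewrites can be checked on a toy instance by evaluating three equivalent plans. The sample relations and the helpers `cross`, `select`, and `join` are hypothetical; join is implemented literally as a selection over a cross product, following its definition.

```python
from itertools import product

R = [{"R.A": 1, "R.B": 10}, {"R.A": 4, "R.B": 10}, {"R.A": 7, "R.B": 20}]
S = [{"S.B": 10, "S.C": "x"}, {"S.B": 20, "S.C": "y"}]


def cross(left, right):
    return [{**l, **r} for l, r in product(left, right)]


def select(rows, cond):
    return [row for row in rows if cond(row)]


def join(left, right, cond):
    return select(cross(left, right), cond)  # definition of join


same_b = lambda t: t["R.B"] == t["S.B"]
a_gt_3 = lambda t: t["R.A"] > 3

# Plan 1: one selection over the full cross product.
plan1 = select(cross(R, S), lambda t: same_b(t) and a_gt_3(t))
# Plan 2: part of the selection folded into a join.
plan2 = select(join(R, S, same_b), a_gt_3)
# Plan 3: the R-only condition pushed below the join.
plan3 = join(select(R, a_gt_3), S, same_b)
```

All three plans return the same tuples, but plan 3 feeds only two of the three R tuples into the join.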

RA Equivalencies


Projection commutes (distributes) over Cross Product (where $a_1$ and $a_2$ are the attributes in a from R and S respectively)

$\pi_a(R \times S) \equiv (\pi_{a_1}(R)) \times (\pi_{a_2}(S))$

Show that $\pi_a(R \bowtie_c S) \equiv (\pi_{a_1}(R)) \bowtie_c (\pi_{a_2}(S))$ (under what condition?)

How can we work around this limitation?

$\pi_a((\pi_{a_1 \cup (cols(c) \cap cols(R))}(R)) \bowtie_c (\pi_{a_2 \cup (cols(c) \cap cols(S))}(S)))$
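The workaround can be illustrated on sample data: keep the join columns through the inner projections, then project them away at the end. The relations, the extra columns R.D and S.E, and the helpers `project` and `join` are invented for this sketch.

```python
from itertools import product


def project(rows, attrs):
    """Projection with set semantics: duplicates are dropped."""
    seen, out = set(), []
    for row in rows:
        t = tuple((a, row[a]) for a in attrs)
        if t not in seen:
            seen.add(t)
            out.append(dict(t))
    return out


def join(left, right, cond):
    return [{**l, **r} for l, r in product(left, right) if cond({**l, **r})]


R = [{"R.A": 1, "R.B": 10, "R.D": "p"}, {"R.A": 2, "R.B": 20, "R.D": "q"}]
S = [{"S.B": 10, "S.C": "x", "S.E": 0}, {"S.B": 30, "S.C": "y", "S.E": 1}]
c = lambda t: t["R.B"] == t["S.B"]  # cols(c) = {R.B, S.B}
a = ["R.A", "S.C"]                  # a1 = {R.A}, a2 = {S.C}

direct = project(join(R, S, c), a)
# Inner projections keep a1 ∪ {R.B} and a2 ∪ {S.B} so c can still be
# evaluated; R.D and S.E are dropped before the join.
pushed = project(
    join(project(R, ["R.A", "R.B"]), project(S, ["S.C", "S.B"]), c),
    a,
)
```

Projecting R onto a1 alone would remove R.B, and the join condition could no longer be evaluated, which is the limitation the slide asks about.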

RA Equivalencies


Union and Intersection are Commutative and Associative

Selection commutes with both Union and Intersection; Projection commutes with Union, but not in general with Intersection

Enumerating Possible Plans


• There are many algorithms suitable for processing a specific data set:

1. How do we identify feasible algorithms?

2. How do we choose between them?
