09 Rewrites

31
Relational Algebra Equivalencies Feb 19, 2014 1

Transcript of 09 Rewrites

Page 1: 09 Rewrites

Relational Algebra Equivalencies

Feb 19, 2014

���1

Page 2: 09 Rewrites

Static Hashing

���2

012

...

N-1

h(k) % N

k

...

......

...

...

Primary Bucket Pages Overflow Pages(Contiguous) (Linked List)

Page 3: 09 Rewrites

Problem

���3

Overflow Pages are Expensive

Resizing a Hash Table is Slow

What do we do about it?

Page 4: 09 Rewrites

Extendible Hashing

���4

4,12,32,16

1,5,21,13

10

15,7,19

00011011

gd = 2A (ld=2)

B (ld=2)

C (ld=2)

D (ld=2)

Directory Data Pages

The directory and data pages have an associated “depth” (global/local)

To look up a value use the last gd bits of the key’s hash value as an index into the dir

Page 5: 09 Rewrites

Extendible Hashing

���5

4,12,32,16

1,5,21,13

10

15,7,19

00011011

A (ld=2)

B (ld=2)

C (ld=2)

D (ld=2)

Insert 20 (h(20) = 1100)

gd = 3

(Need to Split Bucket A)

0

100101110111

000

4,12,20

32,16

A2 (ld=3)

A (ld=3)

Dir entries not being split point to the same bucket

Page 6: 09 Rewrites

Extendible Hashing

���6

1,5,21,13

10

15,7,19

00011011

A (ld=2)

B (ld=2)

C (ld=2)

D (ld=2)

Insert 31 (h(31) = 1001)

gd = 3

(Need to Split Bucket B)

0

100101110111

000

4,12,20

32,16

A2 (ld=3)

A (ld=3)

Don’t need to double dir when splitting bucket w/ ld < gd

1,21

5,13,31

B2 (ld=3)

B (ld=3)

Page 7: 09 Rewrites

Linear Hashing

���7

0000000101001110010111011110001001

Next = 2Buckets Existing

at the start of this round

bucket to be split

Buckets already split this round

‘Split image’ buckets created this round

Level = 3 (23 = 8 Entries)

Page 8: 09 Rewrites

Linear Hashing:Lookups

���8

Next = 2

Level = 3 (23 = 8 Entries)

overflow overflow

overflowh(k) % 2Level = 4 (100)

0000000101001110010111011110001001

Next ≤ 4

Lookup K

Use entry 4

Page 9: 09 Rewrites

Linear Hashing:Lookups

���9

Next = 2

Level = 3 (23 = 8 Entries)

overflow overflow

overflowh(k) % 2Level = 1 (001)

0000000101001110010111011110001001

1 < Next

Lookup K

Use entry h(k) % 2(Level+1)

1 (1001) or 9 (1001)

Page 10: 09 Rewrites

10100010

Linear Hashing:Splitting

���10

Next = 2

Level = 3 (23 = 8 Entries)

overflow overflow

overflow

0000000101001110010111011110001001

Split Next

Increment Next

Partition on bit Level

Page 11: 09 Rewrites

Hashing Algorithms

• Indirection: Use pointers to avoid bulk copies of the entire hash table

• Extendible Hashing

• Partial: Incrementally grow the hash table.

• Linear Hashing

• Continuous: Allow bucket boundaries to move.

• Consistent Hashing

���11

Page 12: 09 Rewrites

Consistent Hashing

• Insight: Make split/merge faster by making bin boundaries deterministic, but random.

• Used mostly in distributed data-stores

• (Amazon, Facebook, …)

• Minimal applications to file-based storage.

���12

(‘Chord: A Scalable Peer-to-peer…’, Stoica et al.)

Page 13: 09 Rewrites

Consistent Hashing

���13

0 232-1

Page 14: 09 Rewrites

Consistent Hashing

���14

0232-1

231-1

230-1

229-1Modular Arithmetic (mod 232)

!

(232-1)+1 = 0

Numbers form a ‘Ring’

Page 15: 09 Rewrites

Consistent Hashing

���15

Assign each bucket a random point on the ring

A

B

Each bucket contains values that hash to ring

positions between its point and its predecessor

Hash Range of A

Hash Range of B

Page 16: 09 Rewrites

Consistent Hashing

���16

A

B

C

B

C

A

Page 17: 09 Rewrites

Consistent Hashing

���17

• Splits/Merges are cheap.

• At most 2 buckets are affected.

• No need for page duplication.

• Mapping hash value to bucket is expensive.

• Need to have a lookup mechanism/directory.

• Chord: Decentralized lookup mechanism.

Page 18: 09 Rewrites

Group Work

���18

Static Hashing vs

Extendible Hashing vs

Linear Hashing vs

Consistent Hashing

What advantages does each algorithm have What is a situation in which you might want to use each?

Page 19: 09 Rewrites

Recap: Using Indexes

���19

�(R.A>1)^(R.B<R.C)

Using Indexes 1 Selection Predicate + File Scan = Potential Index Scan

�(R.B<R.C)

IndexScan(R,R.A,Range(1,1))FileScan(R)

Page 20: 09 Rewrites

Recap: Using Indexes

���20

Using Indexes 2 Join + 2xFile Scan = Potential Index Nested Loop

FileScan(S)FileScan(R)

./R.A<S.B IndexNLJ(S, S.B,Range(R.A,1))

FileScan(R)

Page 21: 09 Rewrites

Index-Nested Loop

���21

foreach hA, . . .i 2 R

IndexNLJ(S, S.B,Range(R.A,1))

FileScan(R)

foreach hB, . . .i 2 IndexScan(S, S.B,Range(A,1))

emit hA, . . . , B, . . .i

Page 22: 09 Rewrites

Equivalencies

���22

≠(No Beard) (Beard)

=(Leonard Nimoy) (Zachary Quinto)

(good vs evil)

(different actor, same character)

Page 23: 09 Rewrites

Optimization

���23

• Rewriting Queries into Equivalent Forms

• Some rewrites are always good ideas

• Pushing down selections

• Pushing down projections

• Some rewrites might be good ideas

• Join reordering

• How do we get to these equivalent forms?

Page 24: 09 Rewrites

RA Equivalencies

���24

�c1(�c2(R)) ⌘ �c2(�c1(R))

�c1^...^cn(R) ⌘ �c1(. . . (�cn(R)))

R ./ (S ./ T ) ⌘ (R ./ S) ./ T

(R ./ S) ⌘ (S ./ R)

(Combinable)

(Commutative)

(Idempotent)

(Associative)

(Commutative)

Try It: Show that R ./ (S ./ T ) ⌘ (T ./ R) ./ S

Selection

Projection

Join (and Cross Product)

⇡a1(R) ⌘ ⇡a1(. . . (⇡an(R)))

Page 25: 09 Rewrites

Intermission���25

Page 26: 09 Rewrites

RA Equivalencies

���26

Selection commutes with Projection (but only if attribute set a and condition c are compatible)

a must include all columns referenced by c

⇡a(�c(R)) ⌘ �c(⇡a(R))

⇡a(�c(R)) ⌘ ⇡a(�c(⇡a[cols(c)(R)))Show that

Page 27: 09 Rewrites

RA Equivalencies

���27

Selection combines with Cross Product to form a Join as per the definition of Join

(Note: This only helps if we have a join algorithm for conditions like c)

�c(R⇥ S) ⌘ R ./c S

Show that

�R.B=S.B^R.A>3(R⇥ S) ⌘ �R.A>3(R ./R.B=S.B S)

Page 28: 09 Rewrites

RA Equivalencies

���28

Selection commutes with Cross Product (but only if condition c references attributes of R exclusively)

�c(R⇥ S) ⌘ (�c(R))⇥ S

Show that�c(R ./c0 S) ⌘ (�c(R)) ./c0 S

�R.B=S.B^R.A>3(R⇥ S) ⌘ (�R.A>3(R)) ./R.B=S.B S

Page 29: 09 Rewrites

RA Equivalencies

���29

Projection commutes (distributes) over Cross Product (where a1 and a2 are the attributes in a from R and S respectively)

⇡a(R⇥ S) ⌘ (⇡a1(R))⇥ (⇡a2(S))

Show that⇡a(R ./c S) ⌘ (⇡a1(R)) ./c (⇡a2(S))

(under what condition)How can we work around this limitation?

⇡a((⇡a1[(cols(c)\cols(R))(R)) ./c (⇡a2[(cols(c)\cols(S))(S)))

Page 30: 09 Rewrites

RA Equivalencies

���30

Union and Intersections are Commutative and Associative !

Selection and Projection both commute with both Union and Intersection

Page 31: 09 Rewrites

Enumerating Possible Plans

���31

• There are many algorithms suitable for processing a specific data set:

1.How do we identify feasible algorithms.

2.How do we choose between them