Space-time tradeoffs

21
Design and Analysis of Algorithms - Chapter 7 1 Space-time tradeoffs Space-time tradeoffs For many problems some extra space really For many problems some extra space really pays off (extra space in tables - breathing pays off (extra space in tables - breathing room) room) input enhancement input enhancement non comparison-based sorting non comparison-based sorting auxiliary tables (shift tables for pattern auxiliary tables (shift tables for pattern matching) matching) prestructuring prestructuring hashing hashing indexing schemes (eg, B-trees) indexing schemes (eg, B-trees) tables of information that do all the work tables of information that do all the work dynamic programming dynamic programming

description

Space-time tradeoffs. For many problems some extra space really pays off (extra space in tables - breathing room) input enhancement non comparison-based sorting auxiliary tables (shift tables for pattern matching) prestructuring hashing indexing schemes (eg, B-trees) - PowerPoint PPT Presentation

Transcript of Space-time tradeoffs

Page 1: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 1

Space-time tradeoffsSpace-time tradeoffsFor many problems some extra space really pays off For many problems some extra space really pays off

(extra space in tables - breathing room)(extra space in tables - breathing room) input enhancementinput enhancement

• non comparison-based sortingnon comparison-based sorting• auxiliary tables (shift tables for pattern matching)auxiliary tables (shift tables for pattern matching)

prestructuringprestructuring• hashinghashing• indexing schemes (eg, B-trees)indexing schemes (eg, B-trees)

tables of information that do all the worktables of information that do all the work• dynamic programming dynamic programming

Page 2: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 2

Sorting by CountingSorting by CountingAlgorithm ComparisonCountingSort(A[0..n-1])Algorithm ComparisonCountingSort(A[0..n-1])for i for i 0 to n-1 do Count[i] 0 to n-1 do Count[i]00for i for i 0 to n-2 do 0 to n-2 do

for j for j i+1 to n-1 do i+1 to n-1 do if A[i] <A[j] then Count[j] if A[i] <A[j] then Count[j] Count[j]+1 Count[j]+1else Count[i] else Count[i] Count[i]+1 Count[i]+1

for i for i 0 to n-1 do S[Count[i]] 0 to n-1 do S[Count[i]] A[i] A[i] Example: 62 31 84 96 19 47 Example: 62 31 84 96 19 47 EfficiencyEfficiency

Page 3: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 3

Sorting by Counting (2)Sorting by Counting (2)Algorithm DistributionCountingSort(A[0..n-1])Algorithm DistributionCountingSort(A[0..n-1])for j for j 0 to u-l do D[j] 0 to u-l do D[j]00for i for i 0 to n-1 do D[A[i]-l] 0 to n-1 do D[A[i]-l] D[A[i]-l] + 1 D[A[i]-l] + 1for j for j 1 to u-l do D[j] 1 to u-l do D[j] D[j-1]+D[j] D[j-1]+D[j] for i for i n-1 down to 0 do n-1 down to 0 do

j j A[i]-l; S[D[j]-1] A[i]-l; S[D[j]-1] A[i]; D[j] A[i]; D[j] D[j]-1 D[j]-1 Example: 13 11 12 13 12 12 Example: 13 11 12 13 12 12 EfficiencyEfficiency

Page 4: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 4

String matchingString matching patternpattern: a string of : a string of mm characters to search for characters to search for texttext: a (long) string of : a (long) string of nn characters to search in characters to search in

Brute force algorithm:Brute force algorithm:1.1. Align pattern at beginning of textAlign pattern at beginning of text2.2. moving from left to right, compare each character of moving from left to right, compare each character of

pattern to the corresponding character in text untilpattern to the corresponding character in text until– all characters are found to match (successful search); orall characters are found to match (successful search); or– a mismatch is detecteda mismatch is detected

3.3. while pattern is not found and the text is not yet exhausted, while pattern is not found and the text is not yet exhausted, realign pattern one position to the right and repeat step 2.realign pattern one position to the right and repeat step 2.

Page 5: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 5

String searching - HistoryString searching - History 1970: Cook shows (using finite-state machines) that problem 1970: Cook shows (using finite-state machines) that problem

can be solved in time proportional to can be solved in time proportional to n n++mm 1976 Knuth and Pratt find algorithm based on Cook’s idea; 1976 Knuth and Pratt find algorithm based on Cook’s idea;

Morris independently discovers same algorithm in attempt Morris independently discovers same algorithm in attempt to avoid “backing up” over textto avoid “backing up” over text

At about the same time Boyer and Moore find an algorithm At about the same time Boyer and Moore find an algorithm that examines only a fraction of the text in most cases (by that examines only a fraction of the text in most cases (by comparing characters in pattern and text from right to left, comparing characters in pattern and text from right to left, instead of left to right)instead of left to right)

1980 Another algorithm proposed by Rabin and Karp 1980 Another algorithm proposed by Rabin and Karp virtually always runs in time proportional to virtually always runs in time proportional to nn++mm and has and has the advantage of extending easily to two-dimensional the advantage of extending easily to two-dimensional pattern matching and being almost as simple as the brute-pattern matching and being almost as simple as the brute-force method.force method.

Page 6: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 6

Horspool’s AlgorithmHorspool’s Algorithm A simplified version of Boyer-Moore A simplified version of Boyer-Moore

algorithm that retains key insights:algorithm that retains key insights:• compare pattern characters to text from compare pattern characters to text from

right to leftright to left• given a pattern, create a shift table that given a pattern, create a shift table that

determines how much to shift the pattern determines how much to shift the pattern when a mismatch occurs (when a mismatch occurs (input input enhancementenhancement))

Page 7: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 7

How far to shift?How far to shift?Look at first (rightmost) character in text that was compared. Three cases:Look at first (rightmost) character in text that was compared. Three cases:

• The character is not in the patternThe character is not in the pattern .....c...................... .....c...................... ((cc not in pattern) not in pattern) BAOBABBAOBAB

• The character is in the pattern (but not at rightmost position)The character is in the pattern (but not at rightmost position) .....O...................... .....O...................... ((OO occurs once in pattern) occurs once in pattern) BAOBABBAOBAB

.....A...................... .....A...................... ((AA occurs twice in pattern) occurs twice in pattern) BAOBABBAOBAB

• The rightmost characters produced a matchThe rightmost characters produced a match .....B...................... .....B...................... BAOBABBAOBAB Shift Table:Shift Table: Stores number of characters to shift by depending on first Stores number of characters to shift by depending on first

character comparedcharacter compared

Page 8: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 8

Shift tableShift table Constructed by scanning pattern before search beginsConstructed by scanning pattern before search begins

All entries are initialized to length of pattern.All entries are initialized to length of pattern. For For cc occurring in pattern, update table entry to distance occurring in pattern, update table entry to distance

of rightmost occurrence of of rightmost occurrence of cc from end of pattern from end of pattern

Algorithm ShiftTable(P[0..m-1])Algorithm ShiftTable(P[0..m-1])for ifor i0 to size-1 do Table[i]0 to size-1 do Table[i]mmfor jfor j0 to m-2 do Table[P[j]]0 to m-2 do Table[P[j]]m-1-jm-1-jreturn Table return Table

Page 9: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 9

Shift tableShift table Example for pattern Example for pattern BAOBAB:BAOBAB:

Then:Then:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Page 10: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 10

The AlgorithmThe Algorithm Horspool Matching(P[0..m-1,T[0..n-1]])Horspool Matching(P[0..m-1,T[0..n-1]])

ShiftTable(P[0..m-1])ShiftTable(P[0..m-1])i i m-1 m-1while i<=n-1 dowhile i<=n-1 dokk00while k<=m-1 and P[m-1-k]=T[i-k] dowhile k<=m-1 and P[m-1-k]=T[i-k] dok k k+1 k+1if k=m return i-m+1if k=m return i-m+1else i else i i+Table[T[i]] i+Table[T[i]]return -1return -1

Page 11: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 11

Boyer-Moore algorithmBoyer-Moore algorithm Based on same two ideas:Based on same two ideas:

• compare pattern characters to text from right to compare pattern characters to text from right to leftleft

• given a pattern, create a shift table that given a pattern, create a shift table that determines how much to shift the pattern when determines how much to shift the pattern when a mismatch occurs (a mismatch occurs (input enhancementinput enhancement))

Uses additional shift table with same idea applied Uses additional shift table with same idea applied to the number of matched charactersto the number of matched characters

Page 12: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 12

The bad-symbol shiftThe bad-symbol shift Based on the Horspool idea of using the extra tableBased on the Horspool idea of using the extra table However, this table is computed differentlyHowever, this table is computed differently

• If c, the text character corresponding to the last pattern If c, the text character corresponding to the last pattern character, is not in the pattern, then shift in the same character, is not in the pattern, then shift in the same way by m characters (actually c is a bad symbol)way by m characters (actually c is a bad symbol)

• If the mismatching character (the bad symbol) does not If the mismatching character (the bad symbol) does not appear in the pattern, then shift to overpass itappear in the pattern, then shift to overpass it

• If the mismathcing character (the bad symbol) appears If the mismathcing character (the bad symbol) appears in the pattern, then shift to align the bad symbol to the in the pattern, then shift to align the bad symbol to the same text character (lying to the left of the mismatching same text character (lying to the left of the mismatching position).position).

Page 13: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 13

The bad-symbol shift - exampleThe bad-symbol shift - exampleThe bad symbol IS NOT in the patternThe bad symbol IS NOT in the pattern......SSER...................... ER...................... BARBARBBERER BARBERBARBER shift 4 positionsshift 4 positions

The bad symbol IS in the patternThe bad symbol IS in the pattern......AAER...................... ER...................... BBAARRBBERER BARBERBARBER shift 2 positions, shift 2 positions,

This shift is given by: d=max[t1(c)-k,1], where t1 is the This shift is given by: d=max[t1(c)-k,1], where t1 is the Horspool table, k the distance between the bad Horspool table, k the distance between the bad symbol from the end of the patternsymbol from the end of the pattern

Page 14: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 14

The good-suffix shift - exampleThe good-suffix shift - example What happens if a matched suffix appears What happens if a matched suffix appears

again in the pattern (eg. ABagain in the pattern (eg. ABRARACADABCADABRARA)) Important to find another suffix with a Important to find another suffix with a

different previous character. Calculate the different previous character. Calculate the shift as the distance between two shift as the distance between two occurrences of the suffix.occurrences of the suffix.

Also, important to find the longest prefix of Also, important to find the longest prefix of size l<k that matches the suffix of size l. size l<k that matches the suffix of size l. Calculate the shift as the distance between Calculate the shift as the distance between the suffix and the prefix.the suffix and the prefix.

Page 15: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 15

The good-suffix shift – example The good-suffix shift – example (2)(2)

KK patternpattern d2d2

11 ABCBAABCBABB 22

22 ABCBABCBABAB 44

33 ABCABCBABBAB 44

44 ABABCBABCBAB 44

55 AABCBABBCBAB 44

KK patternpattern d2d2

11 BAOBABAOBABB 22

22 BAOBBAOBABAB 55

33 BAOBAOBABBAB 55

44 BABAOBABOBAB 55

55 BBAOBABAOBAB 55

Page 16: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 16

Final rule for Boyer-MooreFinal rule for Boyer-Moore Calculate shift as Calculate shift as

d1d1 if k=0if k=0 d = d =

max(d1,d2)max(d1,d2) if k>0if k>0where d1=max(t1(c)-k)where d1=max(t1(c)-k)

ExampleExampleBESS_KNEW_ABOUT_BAOBABSBESS_KNEW_ABOUT_BAOBABSBAOBABBAOBAB

{

Page 17: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 17

HashingHashing A very efficient method for implementing a A very efficient method for implementing a

dictionary, i.e., dictionary, i.e., a set with the operations: a set with the operations:– insert insert – find find – deletedelete

Applications:Applications:– databasesdatabases– symbol tablessymbol tables

Page 18: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 18

Hash tables and hash Hash tables and hash functionsfunctions

Hash table:Hash table: an array with indices that correspond to an array with indices that correspond to bucketsbuckets

Hash function:Hash function: determines the bucket for each record determines the bucket for each record Example:Example: student records, key=SSN. Hash function: student records, key=SSN. Hash function:

hh((kk) = ) = kk mod mod mm((k k is a key andis a key and m m is the number of buckets) is the number of buckets)

• if if mm=1000, where is record with SSN= 315-17-4251 stored?=1000, where is record with SSN= 315-17-4251 stored? Hash function must:Hash function must:

• be easy to computebe easy to compute• distribute keys evenly throughout the tabledistribute keys evenly throughout the table

Page 19: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 19

CollisionsCollisions If If hh((k1k1)) = h = h((kk2) then there is a 2) then there is a collision. collision. Good hash functions result in fewer collisions.Good hash functions result in fewer collisions. Collisions can never be completely eliminated.Collisions can never be completely eliminated. Two types handle collisions differently: Two types handle collisions differently:

• Open hashing - bucket points to linked list of all keys hashing to it.

• Closed hashing – one key per bucket, in case of collision, find another bucket for one of the keys

– linear probing: use next bucket – double hashing: use second hash function to compute increment

Page 20: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 20

Open hashingOpen hashing If hash function distributes keys uniformly, If hash function distributes keys uniformly,

average length of linked list will be average length of linked list will be n/mn/m Average number of probes S = 1+Average number of probes S = 1+αα/2, U = /2, U = αα Worst-case is still linear!Worst-case is still linear! Open hashing still works if Open hashing still works if n>mn>m..

Page 21: Space-time tradeoffs

Design and Analysis of Algorithms - Chapter 7 21

Closed hashingClosed hashing Does not work if Does not work if n>m.n>m. Avoids pointers.Avoids pointers. Deletions are Deletions are notnot straightforward. straightforward. Number of probes to insert/find/delete a key depends Number of probes to insert/find/delete a key depends

on load factor on load factor αα = = nn//m m (hash table density) (hash table density) – successful search: (½) (1+ 1/(1- successful search: (½) (1+ 1/(1- αα))))– unsuccessful search: (½) (1+ 1/(1- unsuccessful search: (½) (1+ 1/(1- αα)²))²)

As the table gets filled (As the table gets filled (αα approaches 1), number of approaches 1), number of probes increases dramatically: probes increases dramatically: