Search Trees: BSTs and B-Trees David Kauchak cs302 Spring 2013.
String Algorithms David Kauchak cs302 Spring 2012.
-
Upload
joleen-sutton -
Category
Documents
-
view
228 -
download
1
description
Transcript of String Algorithms David Kauchak cs302 Spring 2012.
![Page 1: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/1.jpg)
String AlgorithmsDavid Kauchak
cs302Spring 2012
![Page 2: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/2.jpg)
Strings Let Σ be an alphabet, e.g. Σ = ( , a, b, c, …,
z) A string is any member of Σ*, i.e. any
sequence of 0 or more members of Σ ‘this is a string’ Σ* ‘this is also a string’ Σ* ‘1234’ Σ*
![Page 3: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/3.jpg)
String operations Given strings s1 of length n and s2 of length m Equality: is s1 = s2? (case sensitive or insensitive)
Running time O(n) where n is length of shortest string
‘this is a string’ = ‘this is a string’
‘this is a string’ ≠ ‘this is another string’
‘this is a string’ =? ‘THIS IS A STRING’
![Page 4: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/4.jpg)
String operations Concatenate (append): create string s1s2
Running time (assuming we generate a new string) Θ(n+m)
‘this is a’ . ‘ string’ → ‘this is a string’
![Page 5: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/5.jpg)
String operations Substitute: Exchange all occurrences of a
particular character with another character
Running time Θ(n)
Substitute(‘this is a string’, ‘i’, ‘x’) → ‘thxs xs a strxng’
Substitute(‘banana’, ‘a’, ‘o’) → ‘bonono’
![Page 6: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/6.jpg)
String operations Length: return the number of
characters/symbols in the string
Running time O(1) or Θ(n) depending on implementation
Length(‘this is a string’) → 16
Length(‘this is another string’) → 24
![Page 7: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/7.jpg)
String operations Prefix: Get the first j characters in the string
Running time Θ(j)
Suffix: Get the last j characters in the string
Running time Θ(j)
Prefix(‘this is a string’, 4) → ‘this’
Suffix(‘this is a string’, 6) → ‘string’
![Page 8: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/8.jpg)
String operations Substring – Get the characters between i and j inclusive
Running time Θ(j – i + 1)
Prefix: Prefix(S, i) = Substring(S, 1, i) Suffix: Suffix(S, i) = Substring(S, i+1, length(n))
Substring(‘this is a string’, 4, 8) → ‘s is ’
![Page 9: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/9.jpg)
Edit distance (aka Levenshtein distance)Edit distance between two strings is the minimum number of insertions, deletions and substitutions required to transform string s1 into string s2
Insertion:
ABACED ABACCED DABACCED
Insert ‘C’ Insert ‘D’
![Page 10: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/10.jpg)
Edit distance (aka Levenshtein distance)
Deletion:
ABACED
Edit distance between two strings is the minimum number of insertions, deletions and substitutions required to transform string s1 into string s2
![Page 11: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/11.jpg)
Edit distance (aka Levenshtein distance)
Deletion:
ABACED BACED
Delete ‘A’
Edit distance between two strings is the minimum number of insertions, deletions and substitutions required to transform string s1 into string s2
![Page 12: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/12.jpg)
Edit distance (aka Levenshtein distance)
Deletion:
ABACED BACED BACE
Delete ‘A’ Delete ‘D’
Edit distance between two strings is the minimum number of insertions, deletions and substitutions required to transform string s1 into string s2
![Page 13: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/13.jpg)
Edit distance (aka Levenshtein distance)
Substitution:
ABACED ABADED ABADES
Sub ‘D’ for ‘C’ Sub ‘S’ for ‘D’
Edit distance between two strings is the minimum number of insertions, deletions and substitutions required to transform string s1 into string s2
![Page 14: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/14.jpg)
Edit distance examples
Edit(Kitten, Mitten) = 1
Operations:
Sub ‘M’ for ‘K’ Mitten
![Page 15: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/15.jpg)
Edit distance examples
Edit(Happy, Hilly) = 3
Operations:
Sub ‘a’ for ‘i’ Hippy
Sub ‘l’ for ‘p’ Hilpy
Sub ‘l’ for ‘p’ Hilly
![Page 16: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/16.jpg)
Edit distance examples
Edit(Banana, Car) = 5
Operations:
Delete ‘B’ anana
Delete ‘a’ nana
Delete ‘n’ naa
Sub ‘C’ for ‘n’ Caa
Sub ‘a’ for ‘r’ Car
![Page 17: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/17.jpg)
Edit distance examples
Edit(Simple, Apple) = 3
Operations:
Delete ‘S’ imple
Sub ‘A’ for ‘i’ Ample
Sub ‘m’ for ‘p’ Apple
![Page 18: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/18.jpg)
Edit distance
Why might this be useful?
![Page 19: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/19.jpg)
Is edit distance symmetric? that is, is Edit(s1, s2) = Edit(s2, s1)?
Why? sub ‘i’ for ‘j’ → sub ‘j’ for ‘i’ delete ‘i’ → insert ‘i’ insert ‘i’ → delete ‘i’
Edit(Simple, Apple) =? Edit(Apple, Simple)
![Page 20: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/20.jpg)
Calculating edit distance
X = A B C B D A B
Y = B D C A B A
Ideas?
![Page 21: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/21.jpg)
Calculating edit distance
X = A B C B D A ?
Y = B D C A B ?
After all of the operations, X needs to equal Y
![Page 22: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/22.jpg)
Calculating edit distance
X = A B C B D A ?
Y = B D C A B ?
Operations: Insert
Delete
Substitute
![Page 23: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/23.jpg)
Insert
X = A B C B D A ?
Y = B D C A B ?
![Page 24: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/24.jpg)
Insert
X = A B C B D A ?
Y = B D C A B ?
Edit
),(1),( 1...1...1 mn YXEditYXEdit
![Page 25: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/25.jpg)
Delete
X = A B C B D A ?
Y = B D C A B ?
![Page 26: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/26.jpg)
Delete
X = A B C B D A ?
Y = B D C A B ?
),(1),( ...11...1 mn YXEditYXEdit
Edit
![Page 27: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/27.jpg)
Substition
X = A B C B D A ?
Y = B D C A B ?
![Page 28: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/28.jpg)
Substition
X = A B C B D A ?
Y = B D C A B ?
Edit
),(1),( 1...11...1 mn YXEditYXEdit
![Page 29: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/29.jpg)
Anything else?
X = A B C B D A ?
Y = B D C A B ?
![Page 30: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/30.jpg)
Equal
X = A B C B D A ?
Y = B D C A B ?
![Page 31: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/31.jpg)
Equal
X = A B C B D A ?
Y = B D C A B ?
Edit
),(),( 1...11...1 mn YXEditYXEdit
![Page 32: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/32.jpg)
Combining results
),(),( 1...11...1 mn YXEditYXEdit
),(1),( 1...11...1 mn YXEditYXEdit
),(1),( ...11...1 mn YXEditYXEdit
),(1),( 1...1...1 mn YXEditYXEditInsert:
Delete:
Substitute:
Equal:
![Page 33: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/33.jpg)
Combining results
titutionequal/subs),(),(deletion),(1insertion)1
min),(
1...11...1
...11...1
1...11
mnmn
mn
m...n
YXEdityxDiffYXEdit
,YEdit(XYXEdit
![Page 34: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/34.jpg)
Running time
Θ(nm)
![Page 35: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/35.jpg)
Variants Only include insertions and deletions
What does this do to substitutions?
Include swaps, i.e. swapping two adjacent characters counts as one edit
Weight insertion, deletion and substitution differently
Weight specific character insertion, deletion and substitutions differently
Length normalize the edit distance
![Page 36: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/36.jpg)
String matchingGiven a pattern string P of length m and a string S of length n, find all locations where P occurs in S
P = ABA
S = DCABABBABABA
![Page 37: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/37.jpg)
String matching
P = ABA
S = DCABABBABABA
Given a pattern string P of length m and a string S of length n, find all locations where P occurs in S
![Page 38: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/38.jpg)
Uses grep/egrep search find java.lang.String.contains()
![Page 39: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/39.jpg)
Naive implementation
![Page 40: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/40.jpg)
Is it correct?
![Page 41: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/41.jpg)
Running time?
What is the cost of the equality check? Best case: O(1) Worst case: O(m)
![Page 42: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/42.jpg)
Running time?
Best case Θ(n) – when the first character of the pattern does
not occur in the string Worst case
O((n-m+1)m)
![Page 43: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/43.jpg)
Worst case
P = AAAA
S = AAAAAAAAAAAAA
![Page 44: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/44.jpg)
Worst case
P = AAAA
S = AAAAAAAAAAAAA
![Page 45: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/45.jpg)
Worst case
P = AAAA
S = AAAAAAAAAAAAA
![Page 46: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/46.jpg)
Worst case
P = AAAA
S = AAAAAAAAAAAAA
repeated work!
![Page 47: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/47.jpg)
Worst case
P = AAAA
S = AAAAAAAAAAAAA
Ideally, after the first match, we’d know to just check the next character to see if it is an ‘A’
![Page 48: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/48.jpg)
Patterns
Which of these patterns will have that problem?
P = ABAB
P = ABDC
P = BAA
P = ABBCDDCAABB
![Page 49: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/49.jpg)
Patterns
Which of these patterns will have that problem?
P = ABAB
P = ABDC
P = BAA
P = ABBCDDCAABB
If the pattern has a suffix that is also a prefix then we will have this problem
![Page 50: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/50.jpg)
Finite State Automata (FSA) An FSA is defined by 5 components
Q is the set of states
q0 q1 q2 qn…
![Page 51: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/51.jpg)
Finite State Automata (FSA) An FSA is defined by 5 components
Q is the set of states
q0 is the start state A Q, is the set of accepting states where |A| > 0 Σ is the alphabet (e.g. {A, B} is the transition function from Q x Σ to Q
q0 q1 q2 qn…
Q Σ Q
q0 A q1
q0 B q2
q1 A q1…q0 q1 q2
A
B
A …
q7
![Page 52: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/52.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
An FSA starts at state q0 and reads the characters of the input string one at a time.
If the automaton is in state q and reads character a, then it transitions to state (q,a).
If the FSA reaches an accepting state (q A), then the FSA has found a match.
![Page 53: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/53.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
What pattern does this represent?
P = ABA
![Page 54: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/54.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 55: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/55.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 56: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/56.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 57: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/57.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 58: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/58.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 59: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/59.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 60: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/60.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 61: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/61.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 62: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/62.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 63: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/63.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 64: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/64.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 65: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/65.jpg)
FSA operation
q0 q3q1A Bq2 A
B A
BB
A
P = ABA
S = BABABBABABA
![Page 66: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/66.jpg)
Suffix function The suffix function σ(x,y) is the length of the
longest suffix of x that is a prefix of y
)(max),( ...1...1 imimiyxyx
σ(abcdab, ababcd) = ?
![Page 67: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/67.jpg)
Suffix function The suffix function σ(x,y) is the length of the
longest suffix of x that is a prefix of y
)(max),( ...1...1 imimiyxyx
σ(abcdab, ababcd) = 2
![Page 68: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/68.jpg)
Suffix function The suffix function σ(x,y) is the index of the
longest suffix of x that is a prefix of y
σ(daabac, abacac) = ?
)(max),( ...1...1 imimiyxyx
![Page 69: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/69.jpg)
Suffix function
σ(daabac, abacac) = 4
The suffix function σ(x,y) is the length of the longest suffix of x that is a prefix of y
)(max),( ...1...1 imimiyxyx
![Page 70: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/70.jpg)
Suffix function
σ(dabb, abacd) = ?
The suffix function σ(x,y) is the length of the longest suffix of x that is a prefix of y
)(max),( ...1...1 imimiyxyx
![Page 71: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/71.jpg)
Suffix function
σ(dabb, abacd) = 0
The suffix function σ(x,y) is the length of the longest suffix of x that is a prefix of y
)(max),( ...1...1 imimiyxyx
![Page 72: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/72.jpg)
Building a string matching automata Given a pattern P = p1, p2, …, pm, we’d like to build
an FSA that recognizes P in strings
P = ababaca
Ideas?
![Page 73: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/73.jpg)
Building a string matching automata
Q = q1, q2, …, qm corresponding to each symbol, plus a q0 starting state
the set of accepting states, A = {qm} vocab Σ all symbols in P, plus one more
representing all symbols not in P The transition function for q Q and a Σ is
defined as: (q, a) = σ(p1…qa, P)
P = ababaca
![Page 74: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/74.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 ? a
q1 b
q2 a
q3 b
q4 a
q5 c
q6 a
q7
σ(a, ababaca)
![Page 75: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/75.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 ? a
q1 b
q2 a
q3 b
q4 a
q5 c
q6 a
q7
σ(b, ababaca)
![Page 76: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/76.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 ? a
q1 b
q2 a
q3 b
q4 a
q5 c
q6 a
q7
σ(b, ababaca)
![Page 77: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/77.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 b
q2 a
q3 b
q4 a
q5 c
q6 a
q7
σ(b, ababaca)
![Page 78: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/78.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 b
q2 a
q3 b
q4 a
q5 c
q6 a
q7
q0 q1A
B,C
![Page 79: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/79.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 1 2 0 b
q2 3 0 0 a
q3 ? b
q4 a
q5 c
q6 a
q7
We’ve seen ‘aba’ so far
σ(abaa, ababaca)
![Page 80: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/80.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 1 2 0 b
q2 3 0 0 a
q3 1 b
q4 a
q5 c
q6 a
q7
We’ve seen ‘aba’ so far
σ(abaa, ababaca)
![Page 81: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/81.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 1 2 0 b
q2 3 0 0 a
q3 1 4 0 b
q4 5 0 0 a
q5 1 ? c
q6 a
q7
We’ve seen ‘ababa’ so far
![Page 82: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/82.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 1 2 0 b
q2 3 0 0 a
q3 1 4 0 b
q4 5 0 0 a
q5 1 ? c
q6 a
q7
We’ve seen ‘ababa’ so far
σ(ababab, ababaca)
![Page 83: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/83.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 1 2 0 b
q2 3 0 0 a
q3 1 4 0 b
q4 5 0 0 a
q5 1 4 c
q6 a
q7
We’ve seen ‘ababa’ so far
σ(ababab, ababaca)
![Page 84: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/84.jpg)
Transition function
(q, a) = σ(p1…qa, P)P = ababaca
state a b c Pq0 1 0 0 a
q1 1 2 0 b
q2 3 0 0 a
q3 1 4 0 b
q4 5 0 0 a
q5 1 4 6 c
q6 7 0 0 a
q7 1 2 0
![Page 85: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/85.jpg)
Matching runtime
Once we’ve built the FSA, what is the runtime? Θ(n) - Each symbol causes a state transition and we
only visit each character once What is the cost to build the FSA?
How many entries in the table? Ω(m|Σ|)
How long does it take to calculate the suffix function at each entry? Naïve: O(m)
Overall naïve: O(m2|Σ|) Overall fast implementation O(m|Σ|)
![Page 86: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/86.jpg)
Rabin-Karp algorithm
P = ABA
S = BABABBABABA
- Use a function T to that computes a numerical representation of P- Calculate T for all m symbol sequences of S and compare
![Page 87: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/87.jpg)
P = ABA
S = BABABBABABA
Hash P
T(P)
Rabin-Karp algorithm- Use a function T to that computes a numerical representation of P- Calculate T for all m symbol sequences of S and compare
![Page 88: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/88.jpg)
P = ABA
S = BABABBABABAHash m symbol sequences and compare
T(P)
Rabin-Karp algorithm- Use a function T to that computes a numerical representation of P- Calculate T for all m symbol sequences of S and compare
T(BAB)=
![Page 89: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/89.jpg)
P = ABA
S = BABABBABABAHash m symbol sequences and compare
T(P)
match
Rabin-Karp algorithm- Use a function T to that computes a numerical representation of P- Calculate T for all m symbol sequences of S and compare
T(ABA)=
![Page 90: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/90.jpg)
P = ABA
S = BABABBABABAHash m symbol sequences and compare
T(P)
Rabin-Karp algorithm- Use a function T to that computes a numerical representation of P- Calculate T for all m symbol sequences of S and compare
T(BAB)=
![Page 91: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/91.jpg)
P = ABA
S = BABABBABABAHash m symbol sequences and compare
T(P)
…
Rabin-Karp algorithm- Use a function T to that computes a numerical representation of P- Calculate T for all m symbol sequences of S and compare
T(BAB)=
![Page 92: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/92.jpg)
Rabin-Karp algorithm
P = ABA
S = BABABBABABA
For this to be useful/efficient, what needs to be true about T?
T(P)
…T(BAB)
=
![Page 93: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/93.jpg)
Rabin-Karp algorithm
P = ABA
S = BABABBABABA
For this to be useful/efficient, what needs to be true about T?
T(P)
…
Given T(si…i+m-1) we must be able to efficiently calculate T(si+1…i+m)
T(BAB)=
![Page 94: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/94.jpg)
Calculating the hash function For simplicity, assume Σ = (0, 1, 2, …, 9). (in general we
can use a base larger than 10). A string can then be viewed as a decimal number
How do we efficiently calculate the numerical representation of a string?
T(‘9847261’) = ?
![Page 95: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/95.jpg)
Horner’s rule)))10(10...(10(10)( 1221...1 ppppppT mmmm
9847261
9 * 10 = 90
(90 + 8)*10 = 980
(980 + 4)*10 = 9840
(9840 + 7)*10 = 98470
… = 9847621
![Page 96: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/96.jpg)
Horner’s rule)))10(10...(10(10)( 1221...1 ppppppT mmmm
9847261
9 * 10 = 90
(90 + 8)*10 = 980
(980 + 4)*10 = 9840
(9840 + 7)*10 = 98470
… = 9847621
Running time?
Θ(m)
![Page 97: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/97.jpg)
Calculating the hash on the string
963801572348267m = 4
Given T(si…i+m-1) how can we efficiently calculate T(si+1…i+m)?
T(si…i+m-1)
miim
miimii sssTsT
)10)((10)( 11......1
![Page 98: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/98.jpg)
Calculating the hash on the string
963801572348267m = 4
Given T(si…i+m-1) how can we efficiently calculate T(si+1…i+m)?
T(si…i+m-1)
miim
miimii sssTsT
)10)((10)( 11......1
subtract highest order digit
801
![Page 99: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/99.jpg)
Calculating the hash on the string
963801572348267m = 4
Given T(si…i+m-1) how can we efficiently calculate T(si+1…i+m)?
T(si…i+m-1)shift digits up
8010
miim
miimii sssTsT
)10)((10)( 11......1
![Page 100: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/100.jpg)
Calculating the hash on the string
963801572348267m = 4
T(si…i+m-1)add in the lowest digit
8015
Given T(si…i+m-1) how can we efficiently calculate T(si+1…i+m)?
miim
miimii sssTsT
)10)((10)( 11......1
![Page 101: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/101.jpg)
Calculating the hash on the string
963801572348267m = 4
T(si…i+m-1)
Running time?
- Θ(m) for s1…m
- O(1) for the rest
miim
miimii sssTsT
)10)((10)( 11......1
Given T(si…i+m-1) how can we efficiently calculate T(si+1…i+m)?
![Page 102: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/102.jpg)
Algorithm so far… Is it correct?
Each string has a unique numerical value and we compare that with each value in the string
Running time Preprocessing:
Θ(m) Matching
Θ(n-m+1)
Is there any problem with this analysis?
![Page 103: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/103.jpg)
Algorithm so far… Is it correct?
Each string has a unique numerical value and we compare that with each value in the string
Running time Preprocessing:
Θ(m) Matching
Θ(n-m+1)
How long does the check T(P) = T(si…i+m-1) take?
![Page 104: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/104.jpg)
Modular arithmetics The run time assumptions we made were
assuming arithmetic operations were constant time, which is not true for large numbers
To keep the numbers small, we’ll use modular arithmetics, i.e. all operations are performed mod q a+b = (a+b) mod q a*b = (a*b) mod q …
![Page 105: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/105.jpg)
Modular arithmetics If T(A) = T(B), then T(A) mod q = T(B) mod q
In general, we can apply mods as many times as we want and we will not effect the result
What is the downside to this modular approach? Spurious hits: if T(A) mod q = T(B) mod q that
does not necessarily mean that T(A) = T(B) If we find a hit, we must check that the actual
string matches the pattern
![Page 106: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/106.jpg)
Runtime Preprocessing
Θ(m) Running time
Best case: Θ(n-m+1) – No matches and no spurious hits
Worst case Θ((n-m+1)m)
![Page 107: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/107.jpg)
Average case running time Assume v valid matches in the string What is the probability of a spurious hit?
As with hashing, assume a uniform mapping onto values of q:
What is the probability under this assumption?
1…qΣ*
![Page 108: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/108.jpg)
Average case running time Assume v valid matches in the string What is the probability of a spurious hit?
As with hashing, assume a uniform mapping onto values of q:
What is the probability under this assumption? 1/q
1…qΣ*
![Page 109: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/109.jpg)
Average case running time How many spurious hits?
n/q Average case running time:
O(n-m+1) + O(m(v+n/q))
iterate over the positions
checking matches and spurious hits
![Page 110: String Algorithms David Kauchak cs302 Spring 2012.](https://reader034.fdocuments.net/reader034/viewer/2022052418/5a4d1aec7f8b9ab05997b7c2/html5/thumbnails/110.jpg)
Matching running times
Algorithm Preprocessing time Matching time
Naïve 0 O((n-m+1)m)FSA Θ(m|Σ|) Θ(n)Rabin-Karp Θ(m) O(n)+O(m(v+n/q))Knuth-Morris-Pratt Θ(m) Θ(n)