1 The wide window string matching algorithm Longtao He, Binxing Fang, Jie Sui Theoretical Computer...
Date posted: 20-Dec-2015
1
The wide window string matching algorithm
Longtao He, Binxing Fang, Jie Sui
Theoretical Computer Science, Volume 332, Issues 1-3, February 28, 2005, pp. 391-404
Professor R.C.T. Lee
Speaker: K.W. Liu
Department of Computer Science, National Chi Nan University
2
String Matching Problem
Given: a text string T = t1t2t3…tn and a pattern string P = p1p2p3…pm, where |P| ≤ |T|.
Output: all occurrence(s) of the pattern string within the text string.
Example: T = ababababaabbaabbabababa, P = aabbaabb
T = a b a b a b a b a a b b a a a b b a b a b a b
P = a a b b a a a b
3
Traditional Method
Example: T = ababababaabbaabbabababa, P = aabbaabb
T = a b a b a b a b a a b b a a a b b a b a b a b
P = a a b b a a a b
      a a b b a a a b
        a a b b a a a b
          a a b b a a a b
(The pattern is slid along the text one position at a time.)
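The traditional method can be sketched in a few lines of Python. This is only an illustrative baseline, not code from the paper; the function name `naive_search` and the 0-based indexing are assumptions made for this sketch.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    hits = []
    # Try every alignment of the pattern against the text, one shift at a time.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            hits.append(start)
    return hits


print(naive_search("ababababaabbaabbabababa", "aabbaabb"))  # [8]
```

Every alignment is compared from scratch, which is the cost the wide-window approach tries to avoid.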
4
In this talk, we shall provide three ideas:
1. The wide-window method
2. The convolution method
3. The bit pattern (modified convolution) method
5
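Basic Idea of the Wide Window Approach
▫ Open a window with size 2|P|-1.
▫ Divide it into two parts:
  • The first one, denoted as T1, has size |P|-1.
  • The second one, denoted as T2, has size |P|.
[Figure: the text T with a window of size 2|P|-1, split into T1 (size |P|-1) followed by T2 (size |P|).]
Since |T1| < |P|, any occurrence of P inside the window must extend into T2, so some suffix of P must be a prefix of T2.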
6
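▫ Find all prefixes of T2 which are also suffixes of P.
▫ Let r denote the length of the longest such prefix.
▫ We can be sure that the part of T2 after its first r characters can be ignored, as shown.
[Figure: window of size 2|P|-1 over T, split into T1 (size |P|-1) and T2 (size |P|); the first r characters of T2 are marked and the rest of T2 is labelled "Can be ignored."]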
7
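▫ For every prefix of T2 which is a suffix of P, we should find whether there exists a suffix of T1 which is also a prefix of P.
[Figure: window of size 2|P|-1 over T, split into T1 (size |P|-1) and T2 (size |P|), with the prefix of length r marked in T2.]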
8
Definition:
n-suffix: Given a string S, the n-suffix of S is the suffix of S whose length is n, where 0 ≤ n ≤ |S|.
Example: S = abcde
0-suffix of S = ε
1-suffix of S = e
2-suffix of S = de
3-suffix of S = cde
n-prefix: Given a string S, the n-prefix of S is the prefix of S whose length is n, where 0 ≤ n ≤ |S|.
Example: S = abcde
0-prefix of S = ε
1-prefix of S = a
2-prefix of S = ab
3-prefix of S = abc
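The two definitions translate directly into string slicing. The helper names `n_prefix` and `n_suffix` below are hypothetical and only mirror the terminology of this slide.

```python
def n_prefix(s: str, n: int) -> str:
    """Return the n-prefix of s (the empty string for n = 0)."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    return s[:n]


def n_suffix(s: str, n: int) -> str:
    """Return the n-suffix of s (the empty string for n = 0)."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    # s[len(s) - n:] is used instead of s[-n:], which would return all of s for n = 0.
    return s[len(s) - n:]


print(n_suffix("abcde", 3), n_prefix("abcde", 3))  # cde abc
```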
9
An Example of the Wide Window Approach
Given: T = aababcbdcea, P = abcbd
Let us produce a wide window whose length is |P| - 1 + |P| = 2|P| - 1.
In this case, |P| = 5 and 2|P| - 1 = 9.
T = a a b a b c b d c e a
The window covers the first 9 characters: T1 = aaba (length |P|-1 = 4) and T2 = bcbdc (length |P| = 5).
10
We first find all prefixes of T2 which are equal to some suffixes of P. In this case, we obtain bcbd, whose length is 4.
|P| - 4 = 5 - 4 = 1
If the 1-suffix of T1 is the 1-prefix of P, we have found a matching.
1-suffix of T1 = a
1-prefix of P = a
∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found.
T = a a b a b c b d c e a
P =       a b c b d
(T1 = aaba, T2 = bcbdc)
11
Another Example
Given: T = ababa, P = aba
Let us produce a wide window whose length is |P| - 1 + |P| = 2|P| - 1.
In this case, |P| = 3 and 2|P| - 1 = 5.
T = a b a b a
The window covers the whole text: T1 = ab (length |P|-1 = 2) and T2 = aba (length |P| = 3).
12
We first find all prefixes of T2 which are equal to some suffixes of P. In our case, we obtain aba and a, whose lengths are 3 and 1.
|P| - 3 = 3 - 3 = 0 (∵ the whole P is equal to T2, ∴ one matching is found)
|P| - 1 = 3 - 1 = 2
T = a b a b a   (T1 = ab, T2 = aba)
P =     a b a   (aligned with T2)
P = a b a       (aligned with the first three characters of T)
If the 2-suffix of T1 is the 2-prefix of P, we have found a matching.
2-suffix of T1 = ab
2-prefix of P = ab
∴ 2-suffix of T1 = 2-prefix of P.
Thus we conclude that two matchings are found.
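The reasoning used in the two examples above can be checked with a small brute-force sketch. It compares substrings directly rather than using the convolution or bit-pattern techniques introduced later, and the function name `window_matches` is an assumption made for illustration.

```python
def window_matches(t1: str, t2: str, p: str) -> list[int]:
    """Return every r such that the r-prefix of T2 equals the r-suffix of P
    and the (|P|-r)-suffix of T1 equals the (|P|-r)-prefix of P."""
    m = len(p)
    found = []
    for r in range(1, min(m, len(t2)) + 1):
        if t2[:r] != p[m - r:]:
            continue                      # this prefix of T2 is not a suffix of P
        rest = m - r                      # length still to be confirmed in T1
        if rest <= len(t1) and t1[len(t1) - rest:] == p[:rest]:
            found.append(r)
    return found


print(window_matches("aaba", "bcbdc", "abcbd"))  # [4]
print(window_matches("ab", "aba", "aba"))        # [1, 3]
```

Each returned r corresponds to one occurrence of P that starts |P| - r characters before the first character of T2.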
13
Question: How can we find a suffix of a string S1 which is also a prefix of a string S2?
Answer: We use the convolution method.
14
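Convolution Method
T = aabc, P = ab, reversed P = ba
Multiply T by reversed P as in long multiplication, writing 1 where the two characters are equal and 0 otherwise, then add the columns:
        a a b c      (T)
  ×         b a      (reversed P)
  -------------
        1 1 0 0      (row for a)
      0 0 1 0        (row for b)
  -------------
      0 1 2 0 0
Each column of the sum corresponds to one alignment of P against T:
only b of P over T[1]:   0     (sum 0)
P over T[1]T[2] = aa:    1 0   (sum 1)
P over T[2]T[3] = ab:    1 1   (sum 2 = |P|, a matching)
P over T[3]T[4] = bc:    0 0   (sum 0)
only a of P over T[4]:   0     (sum 0)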
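A minimal sketch of this character-matching convolution is given below. The name `match_convolution` is hypothetical, and the quadratic double loop is chosen for clarity only; it is not claimed to be how the paper computes it.

```python
def match_convolution(a: str, b: str) -> list[int]:
    """Multiply a by reversed b as in long multiplication, where the product
    of two characters is 1 if they are equal and 0 otherwise."""
    rb = b[::-1]
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(rb):
            if x == y:
                out[i + j] += 1       # column i + j collects this product
    return out


sums = match_convolution("aabc", "ab")
print(sums)  # [0, 1, 2, 0, 0]
```

A column whose sum equals the number of overlapping characters marks an alignment in which every compared pair agrees.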
15
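We may use the convolution method to find all prefixes of T2 which are equal to some suffixes of P.
T2 = bcbdc, P = abcbd, reversed T2 = cdbcb
              a b c b d      (P)
        ×     c d b c b      (reversed T2)
        ---------------
              0 1 0 1 0      (b = T2[1])
            0 0 1 0 0        (c = T2[2])
          0 1 0 1 0          (b = T2[3])
        0 0 0 0 1            (d = T2[4])
      0 0 1 0 0              (c = T2[5])
      -----------------
      0 0 1 1 0 4 0 1 0
The column whose sum is 4 shows that the 4-suffix of P is equal to a prefix of T2.
The four leftmost columns are the unused region: they are not needed to find a matching.
If any zero appears in a column, we cannot get a matching from that column.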
16
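[Figure: the convolution table of P = abcbd and reversed T2 = cdbcb, with column sums 0 0 1 1 0 4 0 1 0; the left part of the sum is labelled "May be ignored."]
The right part of the sum corresponds to sliding T2 under P, one alignment per column:
1-prefix of T2 (b) against 1-suffix of P (d):          0
2-prefix of T2 (bc) against 2-suffix of P (bd):        1 0
3-prefix of T2 (bcb) against 3-suffix of P (cbd):      0 0 0
4-prefix of T2 (bcbd) against 4-suffix of P (bcbd):    1 1 1 1
5-prefix of T2 (bcbdc) against 5-suffix of P (abcbd):  0 0 0 0 0
No further sliding to the left is needed.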
17
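We may also use the logic operator AND (&) to find all prefixes of T2 which are equal to some suffixes of P. T2 = bcbdc, P = abcbd.
[Figure: the convolution table of P = abcbd and reversed T2 = cdbcb, with column sums 0 0 1 1 0 4 0 1 0; the left part of the sum is the unused region.]
The truncated bit patterns of P for the characters of T2, left-aligned and ANDed column by column:
    0 1 0 1 0      (b)
    0 1 0 0        (c)
    0 1 0          (b)
    0 1            (d)
  & 0              (c)
  -----------
    0 1 0 0 0
The single 1 is in the 4th position counted from the right: the 4-suffix of P is equal to a prefix of T2.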
18
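We may use the convolution method to find all suffixes of T1 which are equal to some prefixes of P.
T1 = aaba, P = abcbd, reversed P = dbcba
          d b c b a      (reversed P)
    ×       a a b a      (T1)
    ---------------
          0 0 0 0 1      (a = T1[4])
        0 1 0 1 0        (b = T1[3])
      0 0 0 0 1          (a = T1[2])
    0 0 0 0 1            (a = T1[1])
    ---------------
    0 0 0 1 1 2 0 1
The rightmost column shows that the 1-prefix of P is equal to a suffix of T1.
The four leftmost columns are the unused region: they are not needed to find a matching.
If any zero appears in a column, we cannot get a matching from that column.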
19
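[Figure: the convolution table of reversed P = dbcba and T1 = aaba, with column sums 0 0 0 1 1 2 0 1; the left part of the sum is labelled "May be ignored."]
The right part of the sum corresponds to sliding T1 over P, one alignment per column:
1-suffix of T1 (a) against 1-prefix of P (a):        1
2-suffix of T1 (ba) against 2-prefix of P (ab):      0 0
3-suffix of T1 (aba) against 3-prefix of P (abc):    1 1 0
4-suffix of T1 (aaba) against 4-prefix of P (abcb):  1 0 0 0
No further sliding to the right is needed.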
20
We may use the logic operator AND (&) to find all suffixes of T1 which are equal to some prefixes of P. T1 = aaba, P = abcbd.
[Figure: the convolution table of reversed P = dbcba and T1 = aaba, with column sums 0 0 0 1 1 2 0 1; the left part of the sum is the unused region.]
The truncated bit patterns of reversed P for the characters of T1 (read from right to left), left-aligned and ANDed column by column:
    0 0 0 1      (a)
    0 1 0        (b)
    0 1          (a)
  & 1            (a)
  ---------
    0 0 0 1
A 1-prefix of P is equal to a suffix of T1.
∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found.
21
The Bit Pattern Approach
Let us consider the following case:
T = bcbdc
P = abcbd
Our job is to determine whether there is a prefix of T which is a suffix of P. Indeed, in this case, the 4-prefix of T (bcbd) is also the 4-suffix of P.
As indicated before, we may use convolution.
22
Convolution
              a b c b d      (P)
        ×     c d b c b      (reversed T)
        ---------------
              0 1 0 1 0      V1
            0 0 1 0 0        V2
          0 1 0 1 0          V3
        0 0 0 0 1            V4
      0 0 1 0 0              V5
      -----------------
      0 0 0 0 0 1 0 0 0      (AND OPERATION on each column)
The 4-prefix of T is the 4-suffix of P.
What are the vectors V1, V2, …, V5?
23
Given a string S = s1s2…sn and a character α, the α-bit pattern of S is defined as b1b2…bn, where bi = 1 if si = α and bi = 0 otherwise.
Example:
S = abcbd
a-bit pattern of S = 1 0 0 0 0
b-bit pattern of S = 0 1 0 1 0
c-bit pattern of S = 0 0 1 0 0
d-bit pattern of S = 0 0 0 0 1
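The definition can be sketched as a one-line list comprehension. The function name `bit_pattern` and the use of a list of 0/1 integers (rather than a machine word) are assumptions made for readability.

```python
def bit_pattern(s: str, alpha: str) -> list[int]:
    """Return the alpha-bit pattern of s: 1 where s has alpha, 0 elsewhere."""
    return [1 if ch == alpha else 0 for ch in s]


for alpha in "abcd":
    print(alpha, bit_pattern("abcbd", alpha))
```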
24
T = b c b d c, P = a b c b d, reversed T = c d b c b
[Figure: the convolution table of P and reversed T, with rows V1 to V5 and the AND OPERATION result 0 0 0 0 0 1 0 0 0.]
V1 = 0 1 0 1 0
V2 = 0 0 1 0 0
V3 = 0 1 0 1 0
V4 = 0 0 0 0 1
V5 = 0 0 1 0 0
We can now observe that
1. V1 = b-bit pattern of P, as we are comparing T[1] = b with P,
2. V2 = c-bit pattern of P, as we are comparing T[2] = c with P,
3. V3 = b-bit pattern of P, as we are comparing T[3] = b with P,
4. V4 = d-bit pattern of P, as we are comparing T[4] = d with P, and
5. V5 = c-bit pattern of P, as we are comparing T[5] = c with P.
25
T = bcbdc
P = abcbd
(1) T[1] = b. We want to decide whether P[5] = T[1] = b.
b-bit pattern of P = 0 1 0 1 0
The last bit is 0 ≠ 1.
∴ T[1] ≠ P[5]
Besides, we know that T[1] = P[2] = P[4].
26
T = b c b d c
P = a b c b d
(2) T[2] = c. We want to decide whether T[1]T[2] = bc = P[4]P[5].
c-bit pattern of P = 0 0 1 0 0
We perform the AND operation on the T[1]-bit pattern of P and the T[2]-bit pattern of P in the following way:
      0 1 0 1 0      (T[1]-bit pattern; its last bit is ignored)
  & 0 0 1 0 0        (T[2]-bit pattern, shifted by one; its first bit is ignored)
  -----------
      0 1 0 0
The last bit is 0. The 2-prefix of T ≠ the 2-suffix of P.
What does 0 1 0 0 mean?
It means that T[1] = P[2] = b and T[2] = P[3] = c.
We keep this result 0 1 0 0.
The result of comparing T[1] and P[5] can be ignored from now on.
27
T = b c b d c
P = a b c b d
Resulting vector = 0 1 0 0
(3) T[3] = b. We want to decide whether T[1]T[2]T[3] = P[3]P[4]P[5].
b-bit pattern of P = 0 1 0 1 0
We only take the last 3 bits, namely 0 1 0, because we are interested in P[3]P[4]P[5].
AND operation:
    0 1 0 0      (previous result; its last bit is ignored)
  & 0 1 0        (last 3 bits of the b-bit pattern)
  ---------
    0 1 0        (result)
The last bit is 0. The 3-prefix of T ≠ the 3-suffix of P.
0 1 0 means that T[1] = P[2] = b, T[2] = P[3] (previously obtained) and T[3] = P[4].
We keep the resulting vector 0 1 0.
28
T = b c b d c
P = a b c b d
Resulting vector = 0 1 0
(4) T[4] = d. We want to decide whether T[1]T[2]T[3]T[4] = P[2]P[3]P[4]P[5].
d-bit pattern of P = 0 0 0 0 1
We only take the last 2 bits, namely 0 1.
AND operation:
    0 1 0      (previous result; its last bit is ignored)
  & 0 1        (last 2 bits of the d-bit pattern)
  -------
    0 1        (result)
The last bit is 1. The 4-prefix of T = the 4-suffix of P.
0 1 means that T[1] = P[2] = b, T[2] = P[3] = c, T[3] = P[4] = b (previously obtained) and T[4] = P[5] = d.
29
T = b c b d c
P = a b c b d
Resulting vector = 0 1
(5) T[5] = c. We want to decide whether T[1]T[2]T[3]T[4]T[5] = P[1]P[2]P[3]P[4]P[5].
c-bit pattern of P = 0 0 1 0 0
We only take the last bit, namely 0.
AND operation:
    0 1      (previous result; its last bit is ignored)
  & 0        (last bit of the c-bit pattern)
  -----
    0        (result)
The last bit is 0. The 5-prefix of T ≠ the 5-suffix of P.
The result 0 means that T is not equal to P as a whole. The facts obtained previously, T[1] = P[2] = b, T[2] = P[3], T[3] = P[4] and T[4] = P[5], still hold.
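The five steps above can be replayed with a short sketch that keeps the resulting vector as a list of bits and shortens it by one bit per step. The function name `prefix_suffix_lengths` and the list representation are assumptions for illustration; the talk itself works with bit vectors.

```python
def prefix_suffix_lengths(t: str, p: str) -> list[int]:
    """Return every x such that the x-prefix of t equals the x-suffix of p."""
    m = len(p)
    vec = [1] * m                      # resulting vector, initially all ones
    lengths = []
    for x, ch in enumerate(t[:m], start=1):
        pattern = [1 if c == ch else 0 for c in p]   # ch-bit pattern of P
        # Drop the old last bit of vec and AND the remaining m-x+1 bits
        # with the last m-x+1 bits of the bit pattern.
        vec = [v & b for v, b in zip(vec[:m - x + 1], pattern[x - 1:])]
        print(f"step {x}: T[{x}] = {ch}, resulting vector = {vec}")
        if vec[-1] == 1:               # last bit 1: x-prefix of T = x-suffix of P
            lengths.append(x)
    return lengths


print(prefix_suffix_lengths("bcbdc", "abcbd"))  # [4]
```

For T = bcbdc and P = abcbd the printed vectors are 0 1 0 1 0, then 0 1 0 0, 0 1 0, 0 1 and 0, as in steps (1) to (5).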
30
Definition:
The Logic Operator (AND &)
1 & 1 = 1
1 & 0 = 0
0 & 1 = 0
0 & 0 = 0
Bit Pattern Of String (BPS)
Given a string S which is composed of n characters, for example S = abcabcabc.
S is composed of 3 distinct characters, which are a, b and c.
BPS means to make one bit pattern per character, where each pattern marks the positions at which that character appears in the string.
a-bit pattern = 1 0 0 1 0 0 1 0 0
b-bit pattern = 0 1 0 0 1 0 0 1 0
c-bit pattern = 0 0 1 0 0 1 0 0 1
31
The Algorithm
ww(T = t1t2…tn, P = p1p2…pm)
Preprocessing
    Find the character set of P
    Build the character_bit pattern of P
    Build the character_rbit pattern of reversed P
Search
    For k = 1 … ⌊n/m⌋ do
        Open a wide window whose length is 2m-1 and whose center point is at km
        Let the window be denoted as a1a2…a2m-1
        Let a1a2…am-1 be denoted as T1
        Let amam+1…a2m-1 be denoted as T2
        /* we use the modified convolution method to find the matchings */
        Find out all prefixes of T2 which are suffixes of P. (page 33-34)
        state 1:
        Find out the corresponding prefixes of P which are suffixes of T1. (page 35-36)
        /* each time we can move the wide window forward by |P| */
        state 2:
    End For
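The overall search loop can be sketched as follows. This is a simplified illustration, not the paper's implementation: the two "find out" steps are done by plain substring comparison instead of bit patterns, positions are 0-based, and the function name `wide_window_search` is an assumption.

```python
def wide_window_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start index of every occurrence of pattern in text,
    examining one wide window per centre km (k = 1, 2, ...)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    hits = []
    # 1-based centres m, 2m, 3m, ... are the 0-based indices m-1, 2m-1, ...
    for c in range(m - 1, n, m):
        t1 = text[c - m + 1:c]        # the |P|-1 characters before the centre
        t2 = text[c:c + m]            # up to |P| characters from the centre on
        for r in range(1, len(t2) + 1):
            rest = m - r              # part of P that must be a suffix of T1
            if t2[:r] == pattern[rest:] and t1[len(t1) - rest:] == pattern[:rest]:
                hits.append(c - rest)
    return hits


print(wide_window_search("aababcbdcea", "abcbd"))  # [3]
print(wide_window_search("ababa", "aba"))          # [0, 2]
```

Every occurrence of P covers exactly one window centre, so moving the window by |P| each time does not miss or repeat an occurrence.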
32
Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T1 is equal to the prefix of P.
Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T2 is equal to the suffix of P.
33
Find out all prefixes of T2 which are suffixes of P.
Suf_bit = 1 … 1      // temporary space for storing the result
|Suf_bit| = |T2|     // the length of the temporary space is equal to the length of T2
x = 1                // x is the index for reading T2
Read T2 from left to right
    if the character being read belongs to the character set of P
        // We use the AND operation to simulate the convolution method.
        // After each simulation, we store the result into the temporary space.
        Suf_bit[|P|…x] = Suf_bit[|P|…x] & character_bit[(|P|-x+1)…1]
        /* If the |P|th to xth bits of Suf_bit are all zero, no longer prefix of T2 can be
           equal to a suffix of P. Therefore we can skip the remaining characters of T2. */
        if the |P|th to xth bits of Suf_bit are all zero
            goto state 1
        end if
    else    // the character being read is not one of the characters of P
        Set the |P|th to xth bits of Suf_bit to zero.
        goto state 1    // finish the reading from T2; prefixes found so far are still checked against T1
    end if
34
    // if the xth bit of Suf_bit is 1, the x-prefix of T2 is equal to the x-suffix of P
    if Suf_bit[x] == 1
        if x == |P|    // the prefix of T2 has the same length as P: we found a matching
            we found a matching at km
        else
            we found an x-prefix of T2 which is a suffix of P
        end if
    end if
    x++    // increase the index for reading the next character
    Read next character
Fig.: Find out all prefixes of T2 which are suffixes of P.
35
Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T1 is equal to the prefix of P.
36
Find out the corresponding prefixes of P which are suffixes of T1.
state 1:
// If the previous processing did not find any prefix of T2 which is equal to a suffix of P,
// we do not need to look for a suffix of T1 which is equal to a prefix of P.
if the |P|th to 1st bits of Suf_bit are all zero
    goto state 2
else
    Pre_bit = 1 … 1          // temporary space for storing the result
    y = |Pre_bit| = |T1|     // the length of the temporary space is equal to the length of T1
    z = 1                    // z is the index for reading T1
    Read T1 from right to left
        if the character being read belongs to the character set of P
            // We use the AND operation to simulate the convolution method.
            Pre_bit[y…z] = Pre_bit[y…z] & character_rbit[(y-z+1)…1]
            /* If the yth to zth bits of Pre_bit are all zero, no longer suffix of T1 can be
               equal to a prefix of P. Therefore we can skip the remaining characters of T1. */
37
            if the yth to zth bits of Pre_bit are all zero
                goto state 2
            end if
        else    // the character being read is not one of the characters of P
            goto state 2    // finish reading characters from T1
        end if
        // if the zth bit of Pre_bit is 1, the z-suffix of T1 is equal to the z-prefix of P
        if (Pre_bit[z] == 1) then
            /* If we found a suffix of T1 which is equal to a prefix of P, we need to check
               whether the corresponding prefix of T2 appeared in the Suf_bit pattern. */
            if (Pre_bit[z] & Suf_bit[|P|-z])
                Found a matching at km - z
            end if
        end if
        z++
        Read next character
end if
state 2:
Fig.: Find out the corresponding prefixes of P which are suffixes of T1.
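The two procedures above can be sketched with Python integers used as bit vectors. This is an illustrative reading of the pseudocode under stated assumptions, not the authors' code: bit i of a mask stands for position i of the pattern (0-based), the T1 side is handled by running the same routine on the reversed strings, and the names `build_masks`, `border_lengths` and `window_match_starts` are hypothetical.

```python
def build_masks(p: str) -> dict[str, int]:
    """Bit i (value 1 << i) of masks[ch] is set when p[i] == ch."""
    masks: dict[str, int] = {}
    for i, ch in enumerate(p):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def border_lengths(t: str, p: str) -> set[int]:
    """All x >= 1 such that the x-prefix of t equals the x-suffix of p."""
    m = len(p)
    masks = build_masks(p)
    state = (1 << m) - 1                       # all ones, like Suf_bit = 1 ... 1
    found = set()
    for x, ch in enumerate(t[:m], start=1):
        state &= masks.get(ch, 0) >> (x - 1)   # a character not in P gives 0
        if state == 0:                         # no longer prefix can match
            break
        if state & (1 << (m - x)):             # "last bit" of step x is 1
            found.add(x)
    return found


def window_match_starts(t1: str, t2: str, p: str) -> list[int]:
    """Start offsets of P relative to the first character of T2."""
    m = len(p)
    suf = border_lengths(t2, p)                # prefixes of T2 = suffixes of P
    if not suf:
        return []                              # state 2: nothing to confirm in T1
    pre = border_lengths(t1[::-1], p[::-1])    # suffixes of T1 = prefixes of P
    starts = []
    for r in sorted(suf):
        z = m - r                              # required suffix length in T1
        if z == 0 or z in pre:
            starts.append(-z)
    return starts


print(window_match_starts("aaba", "bcbdc", "abcbd"))  # [-1]
print(window_match_starts("ab", "aba", "aba"))        # [-2, 0]
```

An offset of -z means that the occurrence begins z characters before the window centre, which is position km - z in the notation of the pseudocode.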
38
Example: T = aababcbdc, P = abcbd
Let us produce a wide window whose length is |P| - 1 + |P| = 2|P| - 1.
In this case, |P| = 5 and 2|P| - 1 = 9.
T = a a b a b c b d c
The window covers the whole text: T1 = aaba (length |P|-1 = 4) and T2 = bcbdc (length |P| = 5).
39
Preprocessing
Build character bit pattern of P
P = abcbd
Find all bit patterns of P. P is composed of a, b, c, b, d.
The character set of P = {a, b, c, d}
a_bit = 1 0 0 0 0
b_bit = 0 1 0 1 0
c_bit = 0 0 1 0 0
d_bit = 0 0 0 0 1
Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T2 is equal to the suffix of P.
40
Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T1 is equal to the prefix of P.
Build character bit pattern of reversed P
P = abcbd
Find all character bit patterns of reversed P. P is composed of a, b, c, b, d.
The character set of P = {a, b, c, d}
a_rbit = 0 0 0 0 1
b_rbit = 0 1 0 1 0
c_rbit = 0 0 1 0 0
d_rbit = 1 0 0 0 0
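The preprocessing for this example can be sketched as below. The dictionary layout and the helper name `build_bit_patterns` are assumptions; only the names `character_bit` and `character_rbit` come from the slides.

```python
def build_bit_patterns(p: str) -> dict[str, list[int]]:
    """Map each distinct character of p to its bit pattern over p."""
    return {ch: [1 if c == ch else 0 for c in p] for ch in sorted(set(p))}


P = "abcbd"
character_bit = build_bit_patterns(P)          # patterns of P
character_rbit = build_bit_patterns(P[::-1])   # patterns of reversed P

for ch in character_bit:
    print(ch, character_bit[ch], character_rbit[ch])
```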
41
Step 1: T2 = b c b d c, reading T2[1]
The character is ‘b’ ∴ Suf_bit[5…1] = Suf_bit[5…1] & b_bit[5…1]
= 1 1 1 1 1 & 0 1 0 1 0 = 0 1 0 1 0
Suf_bit[5…1] = 0 1 0 1 0
Comparing the 1-prefix of T2 (b) with the 1-suffix of P (d) gives 0.
∵ the last bit is ‘0’, the 1-prefix of T2 is not equal to the 1-suffix of P.
[Figure: the convolution table of P = abcbd and reversed T2 = cdbcb, with AND result 0 0 0 0 0 1 0 0 0.]
42
Step 2: T2 = b c b d c, reading T2[2]
The character is ‘c’ ∴ Suf_bit[5…2] = Suf_bit[5…2] & c_bit[4…1]
= 0 1 0 1 & 0 1 0 0 = 0 1 0 0
Suf_bit[5…1] = 0 1 0 0 0
Comparing the 2-prefix of T2 (bc) with the 2-suffix of P (bd) gives 1 0.
∵ the last bit is ‘0’, the 2-prefix of T2 is not equal to the 2-suffix of P.
[Figure: the convolution table of P = abcbd and reversed T2 = cdbcb, with AND result 0 0 0 0 0 1 0 0 0.]
43
Step 3: T2 = b c b d c, reading T2[3]
The character is ‘b’ ∴ Suf_bit[5…3] = Suf_bit[5…3] & b_bit[3…1]
= 0 1 0 & 0 1 0 = 0 1 0
Suf_bit[5…1] = 0 1 0 0 0
Comparing the 3-prefix of T2 (bcb) with the 3-suffix of P (cbd) gives 0 0 0.
∵ the last bit is ‘0’, the 3-prefix of T2 is not equal to the 3-suffix of P.
[Figure: the convolution table of P = abcbd and reversed T2 = cdbcb, with AND result 0 0 0 0 0 1 0 0 0.]
44
Step 4: T2 = b c b d c, reading T2[4]
The character is ‘d’ ∴ Suf_bit[5…4] = Suf_bit[5…4] & d_bit[2…1]
= 0 1 & 0 1 = 0 1
Suf_bit[5…1] = 0 1 0 0 0
Comparing the 4-prefix of T2 (bcbd) with the 4-suffix of P (bcbd) gives 1 1 1 1.
∵ the last bit is ‘1’, the 4-prefix of T2 is equal to the 4-suffix of P.
[Figure: the convolution table of P = abcbd and reversed T2 = cdbcb, with AND result 0 0 0 0 0 1 0 0 0.]
45
We have found one suffix of P, the 4-suffix, which is a prefix of T2. The corresponding prefix which we need to find is the (|P|-4)-prefix of P. If we find it as a suffix of T1, we have a matching.
Step 5: T2 = b c b d c, reading T2[5]
The character is ‘c’ ∴ Suf_bit[5…5] = Suf_bit[5…5] & c_bit[1…1]
= 0 & 0 = 0
∴ Suf_bit[5…1] = 0 1 0 0 0
Comparing the 5-prefix of T2 (bcbdc) with the 5-suffix of P (abcbd) gives 0 0 0 0 0.
∵ the last bit is ‘0’, the 5-prefix of T2 is not equal to the 5-suffix of P.
[Figure: the convolution table of P = abcbd and reversed T2 = cdbcb, with AND result 0 0 0 0 0 1 0 0 0.]
46
Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T1 is equal to the prefix of P.
47
Step 1: T1 = a a b a, reading T1[4] = a
Pre_bit[4…1] = Pre_bit[4…1] & a_rbit[4…1]
= 1 1 1 1 & 0 0 0 1 = 0 0 0 1
Comparing the 1-suffix of T1 (a) with the 1-prefix of P (a) gives 1.
∵ the last bit is ‘1’, the 1-suffix of T1 is equal to the 1-prefix of P.
if (Pre_bit[1] == 1) then
    if (Pre_bit[1] & Suf_bit[5-1])
        Found a matching
∴ Suf_bit[5…1] = 0 1 0 0 0
∴ Pre_bit[4…1] = 0 0 0 1
Check |prefix| + |suffix| = |P|: 1 + 4 = 5.
[Figure: the convolution table of reversed P = dbcba and T1 = aaba, with AND result 0 0 0 0 0 0 0 1.]
48
Step 2: T1 = a a b a, reading T1[3] = b
Pre_bit[4…2] = Pre_bit[4…2] & b_rbit[3…1]
= 0 0 0 & 0 1 0 = 0 0 0
Pre_bit[4…1] = 0 0 0 1
Comparing the 2-suffix of T1 (ba) with the 2-prefix of P (ab) gives 0 0.
∵ the last bit is ‘0’, the 2-suffix of T1 is not equal to the 2-prefix of P.
if (Pre_bit[2] == 1) then if (Pre_bit[2] & Suf_bit[5-2]): no need to check
[Figure: the convolution table of reversed P = dbcba and T1 = aaba, with AND result 0 0 0 0 0 0 0 1.]
49
Step 3: T1 = a a b a, reading T1[2] = a
Pre_bit[4…3] = Pre_bit[4…3] & a_rbit[2…1]
= 0 0 & 0 1 = 0 0
Pre_bit[4…1] = 0 0 0 1
Comparing the 3-suffix of T1 (aba) with the 3-prefix of P (abc) gives 1 1 0.
∵ the last bit is ‘0’, the 3-suffix of T1 is not equal to the 3-prefix of P.
if (Pre_bit[3] == 1) then if (Pre_bit[3] & Suf_bit[5-3]): no need to check
[Figure: the convolution table of reversed P = dbcba and T1 = aaba, with AND result 0 0 0 0 0 0 0 1.]
50
Step 4: T1 = a a b a, reading T1[1] = a
Pre_bit[4…4] = Pre_bit[4…4] & a_rbit[1…1]
= 0 & 1 = 0
Pre_bit[4…1] = 0 0 0 1
Comparing the 4-suffix of T1 (aaba) with the 4-prefix of P (abcb) gives 1 0 0 0.
∵ the last bit is ‘0’, the 4-suffix of T1 is not equal to the 4-prefix of P.
if (Pre_bit[4] == 1) then if (Pre_bit[4] & Suf_bit[5-4]): no need to check
[Figure: the convolution table of reversed P = dbcba and T1 = aaba, with AND result 0 0 0 0 0 0 0 1.]