
1

The wide window string matching algorithm
Longtao He, Binxing Fang, Jie Sui

Theoretical Computer Science, Volume: 332, Issue: 1-3,
February 28, 2005, pp. 391-404

Professor: R.C.T. Lee
Speaker: K.W. Liu

Department of Computer Science
National Chi Nan University

2

String Matching Problem

Given: a text string T = t1t2t3…tn and
       a pattern string P = p1p2p3…pm, where |P| ≤ |T|.
Output: all occurrences of the pattern string within the text string.

T = a b a b a b a b a a b b a a a b b a b a b a b
P = a a b b a a a b

Example: T = ababababaabbaabbabababa, P = aabbaabb

3

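Traditional Method

Example: T = ababababaabbaabbabababa, P = aabbaabb

The pattern is slid along the text one position at a time and compared with the text at each position:

T = a b a b a b a b a a b b a a a b b a b a b a b
P = a a b b a a a b
      a a b b a a a b
        a a b b a a a b
          a a b b a a a b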
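The traditional method can be sketched in a few lines of Python. This is only an illustration of the one-position-at-a-time comparison; the function name naive_search and the 1-based positions it returns are choices made here, not taken from the slides.

```python
def naive_search(text, pattern):
    """Return the 1-based start positions of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    hits = []
    # Try every possible alignment of the pattern against the text.
    for start in range(n - m + 1):
        if text[start:start + m] == pattern:
            hits.append(start + 1)
    return hits


# Text and pattern from the spaced-out example on the problem slide.
print(naive_search("ababababaabbaaabbababab", "aabbaaab"))  # [9]
```

Each alignment costs up to |P| comparisons, which is the work the wide window approach tries to reduce.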

4

In this talk, we shall present three ideas:
1. The wide-window method
2. The convolution method
3. The bit pattern (modified convolution) method

5

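Basic Idea of the Wide Window Approach

▫ Open a window of size 2|P|-1 on T.
▫ Divide it into two parts:
  • The first part, denoted T1, has size |P|-1.
  • The second part, denoted T2, has size |P|.

[Figure: the text T with a window of length 2|P|-1, split into T1 (length |P|-1) on the left and T2 (length |P|) on the right.]

Since |T1| < |P|, an occurrence of P inside the window cannot lie entirely within T1. If P occurs in the window, some suffix of P must therefore be a prefix of T2.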

6

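▫ Find all prefixes of T2 which are also suffixes of P.
▫ Let r denote the length of the longest such prefix.
▫ We can be sure that the part of T2 after its first r characters can be ignored.

[Figure: the window of length 2|P|-1 over T, with T1 (length |P|-1) and T2 (length |P|); the first r characters of T2 are marked and the rest of T2 is labelled "Can be ignored."]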

7

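▫ For every prefix of T2 which is a suffix of P, we should check whether the corresponding suffix of T1 is a prefix of P: if the prefix of T2 has length r, the (|P|-r)-suffix of T1 must be equal to the (|P|-r)-prefix of P.

[Figure: the window of length 2|P|-1 over T, with T1 (length |P|-1), T2 (length |P|) and the prefix of length r marked in T2.]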

8

Definition:

n-suffix: Given a string S, the n-suffix of S is the suffix of S whose length is n, where 0 ≤ n ≤ |S|.
Example: S = abcde
  0-suffix of S = ε
  1-suffix of S = e
  2-suffix of S = de
  3-suffix of S = cde

n-prefix: Given a string S, the n-prefix of S is the prefix of S whose length is n, where 0 ≤ n ≤ |S|.
Example: S = abcde
  0-prefix of S = ε
  1-prefix of S = a
  2-prefix of S = ab
  3-prefix of S = abc
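As a quick check of these definitions, the two notions can be written as tiny Python helpers. The names n_prefix and n_suffix are introduced here only for illustration.

```python
def n_prefix(s, n):
    """Return the n-prefix of s (the prefix of length n), for 0 <= n <= len(s)."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    return s[:n]


def n_suffix(s, n):
    """Return the n-suffix of s (the suffix of length n), for 0 <= n <= len(s)."""
    if not 0 <= n <= len(s):
        raise ValueError("n must satisfy 0 <= n <= |S|")
    # s[len(s) - n:] is used instead of s[-n:] so that n = 0 yields the empty string.
    return s[len(s) - n:]


print(n_suffix("abcde", 3))  # cde
print(n_prefix("abcde", 2))  # ab
```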

9

An Example of the Wide Window Approach

Given: T = aababcbdcea, P = abcbd

Let us produce a wide window whose length is (|P| - 1) + |P| = 2|P| - 1.
In this case, |P| = 5 and 2|P| - 1 = 9.

T  = a a b a b c b d c e a
T1 = a a b a                   (|P|-1 = 4 characters)
T2 =         b c b d c         (|P| = 5 characters)

10

We first find all prefixes of T2 which are equal to some suffixes of P. In this case, we obtain bcbd, whose length is 4.

|P| - 4 = 5 - 4 = 1

If the 1-suffix of T1 is the 1-prefix of P, we have found a matching.
1-suffix of T1 = a
1-prefix of P = a
∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found.

T = a a b a b c b d c e a
P =       a b c b d

11

Another Example

Given: T = ababa, P = aba

Let us produce a wide window whose length is (|P| - 1) + |P| = 2|P| - 1.
In this case, |P| = 3 and 2|P| - 1 = 5.

T  = a b a b a
T1 = a b           (|P|-1 = 2 characters)
T2 =     a b a     (|P| = 3 characters)

12

We first find all prefixes of T2 which are equal to some suffixes of P. In our case, we obtain aba and a, whose lengths are 3 and 1.

|P| - 3 = 3 - 3 = 0 (∵ the whole of P is equal to T2, ∴ one matching is found)

T = a b a b a
P =     a b a

|P| - 1 = 3 - 1 = 2

If the 2-suffix of T1 is the 2-prefix of P, we have found a matching.
2-suffix of T1 = ab
2-prefix of P = ab
∴ 2-suffix of T1 = 2-prefix of P.

T = a b a b a
P = a b a

Thus we conclude that two matchings are found.
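The two worked examples can be re-checked with a short sketch of the single-window test. It uses direct string comparison rather than the convolution or bit pattern methods discussed later, and the name window_matches and its return convention (the number z of characters of the occurrence that lie in T1) are assumptions made here for illustration.

```python
def window_matches(t1, t2, p):
    """Return every z such that the z-suffix of t1 followed by the
    (|p| - z)-prefix of t2 is an occurrence of p."""
    m = len(p)
    found = []
    # r is the length of a prefix of T2 that is also a suffix of P.
    for r in range(1, min(m, len(t2)) + 1):
        if t2[:r] != p[m - r:]:
            continue
        z = m - r  # the remaining z characters of P must end T1
        if z <= len(t1) and t1[len(t1) - z:] == p[:z]:
            found.append(z)
    return found


print(window_matches("aaba", "bcbdc", "abcbd"))  # [1]
print(window_matches("ab", "aba", "aba"))        # [2, 0]
```

In the second call, z = 0 corresponds to the occurrence that equals T2 itself and z = 2 to the occurrence that starts two characters earlier.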

13

Question: How can we find a suffix of a string S1 which is also a prefix of a string S2?

Answer: We use the convolution method.

14

Convolution Method

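T = aabc, P = ab, reversed P = ba

Write T on top and the reversed P below it, as in a long multiplication. Each row holds a 1 wherever a character of T is equal to the character of the reversed P for that row, and the rows are added column by column:

        a a b c      (T)
  ×         b a      (reversed P)
  -------------
        1 1 0 0      (row for a)
  +   0 0 1 0        (row for b)
  -------------
  =   0 1 2 0 0

Each column sum is the number of matching characters for one alignment of P against T:

  Alignment            Comparisons   Sum
  b over T[1]          0             0
  ab over T[1]T[2]     1 0           1
  ab over T[2]T[3]     1 1           2
  ab over T[3]T[4]     0 0           0
  a over T[4]          0             0

A sum equal to |P| = 2 indicates an occurrence of P.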
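The column sums of this slide can be reproduced with a direct double loop. The sketch below is not an efficient convolution, only a literal count of matching character pairs per alignment; the name match_counts and the 0-based indexing are assumptions introduced for this illustration.

```python
def match_counts(text, pattern):
    """counts[k] = number of equal character pairs when the last character
    of pattern is placed over position k of text (0-based, k may run past
    the end of text so that every partial overlap is covered)."""
    n, m = len(text), len(pattern)
    counts = [0] * (n + m - 1)
    for i, tc in enumerate(text):
        for j, pc in enumerate(pattern):
            if tc == pc:
                # text[i] meets pattern[j] when the pattern ends at i + (m - 1 - j).
                counts[i + (m - 1 - j)] += 1
    return counts


print(match_counts("aabc", "ab"))  # [0, 1, 2, 0, 0]
```

A full occurrence of the pattern corresponds to an entry equal to the pattern length, here the 2 in the middle.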

15

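We may use the convolution method to find all prefixes of T2 which are equal to some suffixes of P.

T2 = bcbdc, P = abcbd, reversed T2 = cdbcb

            a b c b d      (P)
  ×         c d b c b      (reversed T2)
  -----------------------
            0 1 0 1 0      (row for b)
          0 0 1 0 0        (row for c)
        0 1 0 1 0          (row for b)
      0 0 0 0 1            (row for d)
  + 0 0 1 0 0              (row for c)
  -----------------------
  = 0 0 1 1 0 4 0 1 0

The entry 4 shows that the 4-suffix of P is equal to a prefix of T2. If any zero appears among the comparisons of a column, we cannot get a matching for that column. The four leftmost columns are an unused region: they are not needed to find a matching.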

16

(Convolution table of P = abcbd against reversed T2 = cdbcb, with column sums 0 0 1 1 0 4 0 1 0.)

The same result, shown as P sliding to the left over T2:

  Overlap   Suffix of P   Prefix of T2   Comparisons   Sum
  1         d             b              0             0
  2         bd            bc             1 0           1
  3         cbd           bcb            0 0 0         0
  4         bcbd          bcbd           1 1 1 1       4
  5         abcbd         bcbdc          0 0 0 0 0     0

The unused region of the table may be ignored. Once P and T2 overlap completely, no further sliding to the left is needed.

17

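We may also use the logic operator AND (&) to find all prefixes of T2 which are equal to some suffixes of P.

T2 = bcbdc, P = abcbd

(Convolution table of P = abcbd against reversed T2 = cdbcb, with column sums 0 0 1 1 0 4 0 1 0. The four leftmost columns are the unused region.)

The rows to be ANDed, one per character of T2, each one bit shorter than the previous one:

      0 1 0 1 0      (b)
      0 1 0 0        (c)
      0 1 0          (b)
      0 1            (d)
  &   0              (c)
  -------------
      0 1 0 0 0

The 1 in the result shows that the 4-suffix of P is equal to a prefix of T2.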

18

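We may use the convolution method to find all suffixes of T1 which are equal to some prefixes of P.

T1 = aaba, P = abcbd, reversed P = dbcba

          d b c b a      (reversed P)
  ×         a a b a      (T1)
  ---------------------
          0 0 0 0 1      (row for a)
        0 1 0 1 0        (row for b)
      0 0 0 0 1          (row for a)
  + 0 0 0 0 1            (row for a)
  ---------------------
  = 0 0 0 1 1 2 0 1

The rightmost entry 1 shows that the 1-prefix of P is equal to a suffix of T1. If any zero appears among the comparisons of a column, we cannot get a matching for that column. The leftmost columns are an unused region: they are not needed to find a matching.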

19

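(Convolution table of reversed P = dbcba against T1 = aaba, with column sums 0 0 0 1 1 2 0 1.)

The same result, shown as P sliding to the right over T1:

  Overlap   Prefix of P   Suffix of T1   Comparisons   Sum
  1         a             a              1             1
  2         ab            ba             0 0           0
  3         abc           aba            1 1 0         2
  4         abcb          aaba           1 0 0 0       1

The unused region of the table may be ignored. Once T1 is completely covered by P, no further sliding to the right is needed.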

P

20

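We may use the logic operator AND (&) to find all suffixes of T1 which are equal to some prefixes of P.

T1 = aaba, P = abcbd

(Convolution table of reversed P = dbcba against T1 = aaba, with column sums 0 0 0 1 1 2 0 1. The leftmost columns are the unused region.)

The rows to be ANDed, one per character of T1 read from right to left, each one bit shorter than the previous one:

      0 0 0 1      (a)
      0 1 0        (b)
      0 1          (a)
  &   1            (a)
  -----------
      0 0 0 1

The 1 in the result shows that the 1-prefix of P is equal to a suffix of T1.

∴ 1-suffix of T1 = 1-prefix of P. Thus we conclude that a matching is found.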

21

The Bit Pattern Approach

Let us consider the following case:

T = bcbdc

P = abcbd

Our job is to determine whether there is a prefix of T which is a suffix of P. Indeed, in this case, the 4-prefix of T (bcbd) is also the 4-suffix of P.

As indicated before, we may use convolution.

22

Convolution

P = abcbd, reversed T = cdbcb

              a b c b d      (P)
              c d b c b      (reversed T)
  -----------------------
  V1          0 1 0 1 0
  V2        0 0 1 0 0
  V3      0 1 0 1 0
  V4    0 0 0 0 1
  V5  0 0 1 0 0
  -----------------------
      0 0 0 0 0 1 0 0 0      (AND operation)

The 4-prefix of T is the 4-suffix of P.

What are the vectors V1, V2, …, V5?

23

Given a string S = s1s2…sn and a character α, the α-bit pattern of S is defined as b1b2…bn, where bi = 1 if si = α and bi = 0 otherwise.

Example:

S = abcbd

a-bit pattern of S = 1 0 0 0 0

b-bit pattern of S = 0 1 0 1 0

c-bit pattern of S = 0 0 1 0 0

d-bit pattern of S = 0 0 0 0 1
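The definition translates directly into code. The following is a minimal sketch; the name bit_patterns and the choice of a dictionary of 0/1 lists are assumptions made here for readability, whereas a real implementation would more likely pack each pattern into a machine word.

```python
def bit_patterns(s):
    """Map each distinct character of s to its bit pattern,
    a list b with b[i] = 1 if s[i] is that character and 0 otherwise."""
    return {ch: [1 if c == ch else 0 for c in s] for ch in sorted(set(s))}


for ch, bits in bit_patterns("abcbd").items():
    print(ch, bits)
```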

24

T = b c b d c, P = a b c b d, reversed T = c d b c b

              a b c b d      (P)
              c d b c b      (reversed T)
  -----------------------
  V1          0 1 0 1 0
  V2        0 0 1 0 0
  V3      0 1 0 1 0
  V4    0 0 0 0 1
  V5  0 0 1 0 0
  -----------------------
      0 0 0 0 0 1 0 0 0      (AND operation)

We can now observe that
1. V1 = b-bit pattern of P, as we are comparing T[1] = b with P,
2. V2 = c-bit pattern of P, as we are comparing T[2] = c with P,
3. V3 = b-bit pattern of P, as we are comparing T[3] = b with P,
4. V4 = d-bit pattern of P, as we are comparing T[4] = d with P, and
5. V5 = c-bit pattern of P, as we are comparing T[5] = c with P.

25

T = bcbdc

P = abcbd

(1) T[1] = b. We want to decide whether P[5] = T[1] = b.

b-bit pattern of P = 0 1 0 1 0

The last bit is 0 ≠ 1.

∴ T[1] ≠ P[5]

Besides, the 1s in positions 2 and 4 tell us that T[1] = P[2] = P[4].

26

T = b c b d c

P = a b c b d

(2) T[2] = c. We want to decide whether T[1]T[2] = bc = P[4]P[5].

c-bit pattern of P = 0 0 1 0 0

We AND the T[1]-bit pattern of P with the T[2]-bit pattern of P in the following way:

        0 1 0 1 0      b-bit pattern of P (the last bit is ignored)
  &   0 0 1 0 0        c-bit pattern of P, shifted left by one (the first bit is ignored)
  ---------------
        0 1 0 0

The last bit is 0. ∴ The 2-prefix of T ≠ the 2-suffix of P.

What does 0 1 0 0 mean?

It means that T[1] = P[2] = b and T[2] = P[3] = c.

We keep this result 0 1 0 0.

The result of comparing T[1] and P[5] can be ignored from now on.

27

T = b c b d c

P = a b c b d

Resulting vector = 0 1 0 0

(3) T[3] = b. We want to decide whether T[1]T[2]T[3] = P[3]P[4]P[5].

b-bit pattern of P = 0 1 0 1 0

We only take the last 3 bits, namely 0 1 0, because we are interested in P[3]P[4]P[5].

      0 1 0 0      resulting vector so far (the last bit is ignored)
  &   0 1 0        last 3 bits of the b-bit pattern of P
  -----------
      0 1 0

The last bit is 0. ∴ The 3-prefix of T ≠ the 3-suffix of P.

0 1 0 means that T[1] = P[2] = b, T[2] = P[3] (previously obtained) and T[3] = P[4].

We keep the resulting vector 0 1 0.

28

T = b c b d c

P = a b c b d

Resulting vector = 0 1 0

(4) T[4] = d. We want to decide whether T[1]T[2]T[3]T[4] = P[2]P[3]P[4]P[5].

d-bit pattern of P = 0 0 0 0 1

We only take the last 2 bits, namely 0 1.

      0 1 0      resulting vector so far (the last bit is ignored)
  &   0 1        last 2 bits of the d-bit pattern of P
  ---------
      0 1

The last bit is 1. ∴ The 4-prefix of T = the 4-suffix of P.

0 1 means that T[1] = P[2] = b, T[2] = P[3] = c, T[3] = P[4] = b (previously obtained) and T[4] = P[5] = d.

29

T = b c b d c

P = a b c b d

Resulting vector = 0 1

(5) T[5] = c. We want to decide whether T[1]T[2]T[3]T[4]T[5] = P[1]P[2]P[3]P[4]P[5].

c-bit pattern of P = 0 0 1 0 0

We only take the last bit, namely 0.

      0 1      resulting vector so far (the last bit is ignored)
  &   0        last bit of the c-bit pattern of P
  -------
      0

The last bit is 0. ∴ The 5-prefix of T ≠ the 5-suffix of P.

The result 0 means that T[1]…T[5] ≠ P[1]…P[5]. The equalities T[1] = P[2] = b, T[2] = P[3], T[3] = P[4] and T[4] = P[5] were obtained previously.
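Steps (1) to (5) can be traced with a small sketch that keeps the resulting vector as a list of 0/1 values. It is an illustration under assumptions: the name prefix_suffix_lengths is hypothetical, lists stand in for the bit vectors, and the loop stops as soon as the vector is all zeros.

```python
def prefix_suffix_lengths(t, p):
    """Return every x such that the x-prefix of t equals the x-suffix of p,
    found by ANDing bit patterns of p as in steps (1)-(5)."""
    m = len(p)
    patterns = {ch: [int(c == ch) for c in p] for ch in set(p)}
    state = [1] * m  # state[j] == 1: the characters read so far match p from index j
    lengths = []
    for x, ch in enumerate(t[:m], start=1):
        bits = patterns.get(ch)
        if bits is None:  # a character that never occurs in p ends the scan
            break
        # Keep the first m-x+1 bits of the state and AND them with
        # the last m-x+1 bits of the character's bit pattern.
        state = [s & b for s, b in zip(state[:m - x + 1], bits[x - 1:])]
        if not any(state):  # no longer prefix of t can match
            break
        if state[-1]:  # last bit set: the x-prefix of t is the x-suffix of p
            lengths.append(x)
    return lengths


print(prefix_suffix_lengths("bcbdc", "abcbd"))  # [4]
```

For T = bcbdc and P = abcbd the successive states are 0 1 0 1 0, then 0 1 0 0, 0 1 0, 0 1 and finally 0, matching the resulting vectors on the slides.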

30

Definition:

The Logic Operator (AND, &)
  1 & 1 = 1    1 & 0 = 0
  0 & 1 = 0    0 & 0 = 0

Bit Pattern Of String (BPS)
Given a string S composed of n characters, the BPS consists of one bit pattern for each distinct character, where each pattern marks the positions at which that character appears in the string.
Example: S = abcabcabc is composed of 3 distinct characters, which are a, b and c.
  a-bit pattern = 1 0 0 1 0 0 1 0 0
  b-bit pattern = 0 1 0 0 1 0 0 1 0
  c-bit pattern = 0 0 1 0 0 1 0 0 1

31

The Algorithm

ww(T = t1t2…tn, P = p1p2…pm)

Preprocessing
    Find the character set of P
    Build the character_bit pattern of P and
          the character_rbit pattern of reversed P

Search
    For k = 1 … n/m do
        Open a wide window whose length is 2m-1 and whose center point is at km
        Let the window be denoted as a1a2…a2m-1
        Let a1a2…am-1 be denoted as T1
        Let amam+1…a2m-1 be denoted as T2
        /* we use the modified convolution method to find the matchings */
        Find all prefixes of T2 which are suffixes of P. (page 33-34)
    state 1:
        Find the corresponding prefixes of P which are suffixes of T1. (page 35-36)
        /* each time, we can move the wide window forward by |P| */
    state 2:
    End For

32

Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T1 is equal to the prefix of P.

Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T2 is equal to the suffix of P.

33

Find all prefixes of T2 which are suffixes of P.

Suf_bit = 1 … 1          // temporary space for storing the result
|Suf_bit| = |T2|         // the length of the temporary space is equal to the length of T2
x = 1                    // x is the index for reading T2

Read T2 from left to right
    if the character read belongs to the character set of P
        // We use the AND operation to simulate the convolution method.
        // After each step, we store the result in the temporary space.
        Suf_bit[|P|…x] = Suf_bit[|P|…x] & character_bit[(|P|-x+1)…1]

        /* If the |P|th to xth bits of Suf_bit are all zero, no longer prefix of T2
           can be equal to a suffix of P, so we can skip the remaining characters of T2. */
        if the |P|th to xth bits of Suf_bit are all zero
            goto state 1
        end if
    else    // the character read is not one of the characters of P
        Set the |P|th to xth bits of Suf_bit to zero.
        goto state 1    // finish reading T2; prefixes already found must still be checked against T1
    end if

34

    // if the xth bit of Suf_bit is 1, the x-prefix of T2 is equal to the x-suffix of P
    if Suf_bit[x] == 1
        if x == |P|     // the whole of T2 is equal to P
            we found a matching at km
        else
            we found an x-prefix of T2 which is a suffix of P
        end if
    end if
    x++                 // increase the index for reading the next character
    Read next character

Fig.: Find all prefixes of T2 which are suffixes of P.

35

Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T1 is equal to the prefix of P.

36

Find the corresponding prefixes of P which are suffixes of T1.

state 1:
// If the previous step did not find any prefix of T2 equal to a suffix of P,
// there is no need to look for a suffix of T1 equal to a prefix of P.
if the |P|th to 1st bits of Suf_bit are all zero
    goto state 2
else
    Pre_bit = 1 … 1             // temporary space for storing the result
    y = |Pre_bit| = |T1|        // the length of the temporary space is equal to the length of T1
    z = 1                       // z is the index for reading T1

    Read T1 from right to left
        if the character read belongs to the character set of P
            // We use the AND operation to simulate the convolution method.
            Pre_bit[y…z] = Pre_bit[y…z] & character_rbit[(y-z+1)…1]

            /* If the yth to zth bits of Pre_bit are all zero, no longer suffix of T1
               can be equal to a prefix of P, so we can skip the remaining characters of T1. */

37

            if the yth to zth bits of Pre_bit are all zero
                goto state 2
            end if
        else    // the character read is not one of the characters of P
            goto state 2    // finish reading T1
        end if

        // if the zth bit of Pre_bit is 1, the z-suffix of T1 is equal to the z-prefix of P
        if (Pre_bit[z] == 1) then
            /* If a suffix of T1 is equal to a prefix of P, we need to check whether
               the corresponding prefix of T2 was recorded in Suf_bit. */
            if (Pre_bit[z] & Suf_bit[|P|-z])
                Found a matching at km - z
            end if
        end if
        z++
        Read next character
end if

state 2:
Fig.: Find the corresponding prefixes of P which are suffixes of T1.
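The search loop and the two scans can be put together in a compact Python sketch. It follows the structure of the pseudocode but is not the authors' implementation: lists of 0/1 values replace machine-word bit vectors, the names ww_search and overlap_lengths are hypothetical, and the same helper is reused for T1 by reading it reversed against the bit patterns of reversed P.

```python
def overlap_lengths(s, bits, m):
    """Lengths x such that the x-prefix of s equals the x-suffix of the
    pattern whose per-character bit patterns are given in bits."""
    state, found = [1] * m, set()
    for x, ch in enumerate(s[:m], start=1):
        b = bits.get(ch)
        if b is None:  # character not in the pattern: stop reading
            break
        state = [u & v for u, v in zip(state[:m - x + 1], b[x - 1:])]
        if not any(state):  # all zero: no longer overlap is possible
            break
        if state[-1]:
            found.add(x)
    return found


def ww_search(text, pattern):
    """Return the 1-based start positions of pattern in text, using
    windows of length 2m-1 centred at positions m, 2m, 3m, ..."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    bits = {ch: [int(c == ch) for c in pattern] for ch in set(pattern)}
    rbits = {ch: v[::-1] for ch, v in bits.items()}  # patterns of reversed P
    hits = []
    for centre in range(m, n + 1, m):  # centre = km (1-based)
        t1 = text[centre - m:centre - 1]      # the m-1 characters before km
        t2 = text[centre - 1:centre - 1 + m]  # up to m characters from km
        suf = overlap_lengths(t2, bits, m)    # prefixes of T2 = suffixes of P
        if not suf:
            continue                          # corresponds to "goto state 2"
        if m in suf:
            hits.append(centre)               # T2 itself is an occurrence
        # Suffixes of T1 that are prefixes of P, via the reversed strings.
        for z in overlap_lengths(t1[::-1], rbits, m):
            if m - z in suf:
                hits.append(centre - z)       # matching at km - z
    return sorted(hits)


print(ww_search("aababcbdcea", "abcbd"))  # [4]
print(ww_search("ababa", "aba"))          # [1, 3]
```

Every occurrence contains exactly one position that is a multiple of m, so each occurrence is reported once by the window centred there.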

38

Example: T = aababcbdc, P = abcbd

Let us produce a wide window whose length is (|P| - 1) + |P| = 2|P| - 1.
In this case, |P| = 5 and 2|P| - 1 = 9.

T  = a a b a b c b d c
T1 = a a b a               (|P|-1 = 4 characters)
T2 =         b c b d c     (|P| = 5 characters)

39

Preprocessing

Build the character bit patterns of P.
P = abcbd
P is composed of a, b, c, b, d, so the character set of P = {a, b, c, d}.
  a_bit = 1 0 0 0 0
  b_bit = 0 1 0 1 0
  c_bit = 0 0 1 0 0
  d_bit = 0 0 0 0 1

Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T2 is equal to the suffix of P.

40

Having constructed the character bit pattern of P, we may use the character bit pattern of P to find whether the prefix of T2 is equal to the suffix of P.

Build the character bit patterns of reversed P.
P = abcbd
P is composed of a, b, c, b, d, so the character set of P = {a, b, c, d}.
  a_rbit = 0 0 0 0 1
  b_rbit = 0 1 0 1 0
  c_rbit = 0 0 1 0 0
  d_rbit = 1 0 0 0 0

41

Step 1: T2 = b c b d c

The character is ‘b’.
∴ Suf_bit[5…1] = Suf_bit[5…1] & b_bit[5…1]
               = 1 1 1 1 1 & 0 1 0 1 0
               = 0 1 0 1 0

(AND table of P = abcbd against reversed T2 = cdbcb; result 0 0 0 0 0 1 0 0 0.)

P  = a b c b d
T2 =         b c b d c
Comparisons: 0

∵ the last bit is ‘0’, the 1-prefix of T2 is not equal to the 1-suffix of P.

Suf_bit[5…1] = 0 1 0 1 0

42

Step 2: T2 = b c b d c

The character is ‘c’.
∴ Suf_bit[5…2] = Suf_bit[5…2] & c_bit[4…1]
               = 0 1 0 1 & 0 1 0 0
               = 0 1 0 0

(AND table of P = abcbd against reversed T2 = cdbcb; result 0 0 0 0 0 1 0 0 0.)

P  = a b c b d
T2 =       b c b d c
Comparisons: 1 0

∵ the last bit is ‘0’, the 2-prefix of T2 is not equal to the 2-suffix of P.

Suf_bit[5…1] = 0 1 0 0 0

43

Step 3: T2 = b c b d c

The character is ‘b’.
∴ Suf_bit[5…3] = Suf_bit[5…3] & b_bit[3…1]
               = 0 1 0 & 0 1 0
               = 0 1 0

(AND table of P = abcbd against reversed T2 = cdbcb; result 0 0 0 0 0 1 0 0 0.)

P  = a b c b d
T2 =     b c b d c
Comparisons: 0 0 0

∵ the last bit is ‘0’, the 3-prefix of T2 is not equal to the 3-suffix of P.

Suf_bit[5…1] = 0 1 0 0 0

44

Step 4: T2 = b c b d c

The character is ‘d’.
∴ Suf_bit[5…4] = Suf_bit[5…4] & d_bit[2…1]
               = 0 1 & 0 1
               = 0 1

(AND table of P = abcbd against reversed T2 = cdbcb; result 0 0 0 0 0 1 0 0 0.)

P  = a b c b d
T2 =   b c b d c
Comparisons: 1 1 1 1

∵ the last bit is ‘1’, the 4-prefix of T2 is equal to the 4-suffix of P.

Suf_bit[5…1] = 0 1 0 0 0

45

We have found one suffix of P, the 4-suffix, which is a prefix of T2. The corresponding prefix of P which we need to find in T1 is the (|P|-4)-prefix. If we find it, we have a matching.

Step 5: T2 = b c b d c

The character is ‘c’.
∴ Suf_bit[5…5] = Suf_bit[5…5] & c_bit[1…1]
               = 0 & 0
               = 0

(AND table of P = abcbd against reversed T2 = cdbcb; result 0 0 0 0 0 1 0 0 0.)

P  = a b c b d
T2 = b c b d c
Comparisons: 0 0 0 0 0

∵ the last bit is ‘0’, the 5-prefix of T2 is not equal to the 5-suffix of P.

∴ Suf_bit[5…1] = 0 1 0 0 0

46

Having constructed the character bit pattern of reversed P, we may use the character bit pattern of reversed P to find whether the suffix of T1 is equal to the prefix of P.

47

Step 1: T1 = a a b a, read from right to left

The character is ‘a’.
Pre_bit[4…1] = Pre_bit[4…1] & a_rbit[4…1]
             = 1 1 1 1 & 0 0 0 1
             = 0 0 0 1

(AND table of reversed P = dbcba against T1 = aaba; result 0 0 0 0 0 0 0 1.)

P  =       a b c b d
T1 = a a b a
Comparisons: 1

∵ the last bit is ‘1’, the 1-suffix of T1 is equal to the 1-prefix of P.

if (Pre_bit[1] == 1) then
    if (Pre_bit[1] & Suf_bit[5-1])
        Found a matching

∴ Suf_bit[5…1] = 0 1 0 0 0
∴ Pre_bit[4…1] = 0 0 0 1

Check: |prefix| + |suffix| = 1 + 4 = 5 = |P|.

48

Step 2: T1 = a a b a, read from right to left

The character is ‘b’.
Pre_bit[4…2] = Pre_bit[4…2] & b_rbit[3…1]
             = 0 0 0 & 0 1 0
             = 0 0 0

(AND table of reversed P = dbcba against T1 = aaba; result 0 0 0 0 0 0 0 1.)

P  =     a b c b d
T1 = a a b a
Comparisons: 0 0

∵ the last bit is ‘0’, the 2-suffix of T1 is not equal to the 2-prefix of P.

if (Pre_bit[2] == 1) then if (Pre_bit[2] & Suf_bit[5-2]): no need to check.

Pre_bit[4…1] = 0 0 0 1

49

Step 3: T1 = a a b a, read from right to left

The character is ‘a’.
Pre_bit[4…3] = Pre_bit[4…3] & a_rbit[2…1]
             = 0 0 & 0 1
             = 0 0

(AND table of reversed P = dbcba against T1 = aaba; result 0 0 0 0 0 0 0 1.)

P  =   a b c b d
T1 = a a b a
Comparisons: 1 1 0

∵ the last bit is ‘0’, the 3-suffix of T1 is not equal to the 3-prefix of P.

if (Pre_bit[3] == 1) then if (Pre_bit[3] & Suf_bit[5-3]): no need to check.

Pre_bit[4…1] = 0 0 0 1

50

Step 4: T1 = a a b a, read from right to left

The character is ‘a’.
Pre_bit[4…4] = Pre_bit[4…4] & a_rbit[1…1]
             = 0 & 1
             = 0

(AND table of reversed P = dbcba against T1 = aaba; result 0 0 0 0 0 0 0 1.)

P  = a b c b d
T1 = a a b a
Comparisons: 1 0 0 0

∵ the last bit is ‘0’, the 4-suffix of T1 is not equal to the 4-prefix of P.

if (Pre_bit[4] == 1) then if (Pre_bit[4] & Suf_bit[5-4]): no need to check.

Pre_bit[4…1] = 0 0 0 1


56

Thank you