Less Than Matching

Orgad Keller Modified by Ariel Rosenfeld Less Than Matching


Transcript of Less Than Matching

Page 1: Less Than Matching

Orgad Keller

Modified by Ariel Rosenfeld

Less Than Matching

Page 2: Less Than Matching

Algorithms 2 2

Less Than Matching

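Input: A text $T = t_0 \dots t_{n-1}$ and a pattern $P = p_0 \dots p_{m-1}$ over an alphabet $\Sigma$ with order relation $\le$.

Output: All locations $i$ where $\forall j,\ 0 \le j \le m-1:\ p_j \le t_{i+j}$.

Can we use the regular methods?

[Figure: $P$ aligned under $T$ at location $i$, comparing $p_j$ with $t_{i+j}$.]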
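As a baseline, the definition on this slide can be checked directly. The following is a minimal sketch that is not part of the original slides; the function name `less_than_matches_naive` is an illustrative assumption, and the check takes $O(nm)$ time.

```python
def less_than_matches_naive(text, pattern):
    """Return every location i with pattern[j] <= text[i + j] for all j."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


# The binary example used later in the slides.
print(less_than_matches_naive("0101001110", "0101"))  # [0, 5]
```

Any sequence of comparable symbols works here (strings, lists of integers), since only `<=` is used.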

Page 3: Less Than Matching

Transitivity

Less Than Matching is in fact transitive, but that is not enough for us: $a \le c,\ b \le c$ does not imply anything about the relation between $a$ and $b$.

Page 4: Less Than Matching

Approach

A good approach for solving Pattern Matching problems is sometimes solving:

1. The problem for a binary alphabet $\Sigma = \{0,1\}$.
2. The problem for a bounded alphabet $\Sigma$.
3. The problem for an unbounded alphabet $\Sigma$.

In that order.

Page 5: Less Than Matching

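Binary Alphabet

The only case that prevents a match at location $i$ is the case where:

$\exists j,\ 0 \le j \le m-1:\ p_j = 1,\ t_{i+j} = 0$

This is equivalent to:

$\exists j,\ 0 \le j \le m-1:\ p_j = 1,\ \bar{t}_{i+j} = 1$

where $\bar{t}$ denotes the complement of the bit $t$.

So how can we solve this case?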

Page 6: Less Than Matching

Binary Alphabet

So if $\sum_{j=0}^{m-1} p_j \bar{t}_{i+j} > 0$, there is no match at $i$.

We can calculate $\bar{T} = \bar{t}_0 \dots \bar{t}_{n-1}$. Then we’ll calculate $\bar{T} \otimes P^R$ ($P$ reverse) using FFT. We’ll return all locations $i$ where $(\bar{T} \otimes P^R)[i+m-1] = 0$.
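The binary-alphabet procedure can be sketched as follows. This is an illustration under stated assumptions rather than the slides' own code: the names `fft`, `convolve` and `binary_less_than_matches` are hypothetical, the FFT is a plain recursive radix-2 version written with the standard `cmath` module, and inputs are lists of 0/1 integers with $1 \le m \le n$.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def convolve(a, b):
    """Integer convolution: result[k] = sum of a[x] * b[y] over x + y = k."""
    length = len(a) + len(b) - 1
    size = 1
    while size < length:
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The unnormalised inverse transform must be divided by the size.
    return [round(v.real / size) for v in prod[:length]]


def binary_less_than_matches(text_bits, pattern_bits):
    """Locations i where no j has pattern bit 1 facing text bit 0."""
    n, m = len(text_bits), len(pattern_bits)
    if m == 0 or m > n:
        return []  # degenerate inputs are not handled in this sketch
    t_bar = [1 - t for t in text_bits]   # complemented text
    p_rev = pattern_bits[::-1]           # reversed pattern
    conv = convolve(t_bar, p_rev)
    # conv[i + m - 1] equals the sum over j of p_j * t_bar_{i+j}.
    return [i for i in range(n - m + 1) if conv[i + m - 1] == 0]
```

As written, the sketch transforms the whole text at once, which costs $O(n \log n)$; reaching $O(n \log m)$ would additionally require processing the text in pieces of length proportional to $m$.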

Page 7: Less Than Matching

Example

$P = 0101$, $T = 0101001110$

$P^R = 1010$, $\bar{T} = 1010110001$
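To make the example concrete, the sums can be worked out by hand. Since only $p_1$ and $p_3$ equal 1, the sum $\sum_j p_j \bar{t}_{i+j}$ reduces to $\bar{t}_{i+1} + \bar{t}_{i+3}$. The table below is a worked illustration added here, not taken from the slides.

```latex
\begin{array}{c|ccccccc}
i & 0 & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline
\bar{t}_{i+1} + \bar{t}_{i+3} & 0 & 2 & 1 & 1 & 1 & 0 & 1
\end{array}
```

The sum is zero exactly at $i = 0$ and $i = 5$, so those are the two locations where $P$ (less than) matches $T$.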

Page 8: Less Than Matching

[Figure: computing $\bar{T} \otimes P^R$.]

Page 9: Less Than Matching

[Figure: $P = 0101$, $T = 0101001110$.]

Page 10: Less Than Matching

What just happened?

[Figure: $\bar{T}$ and $P^R$.]

Page 11: Less Than Matching

Complexity

Time: $O(n \log m)$

Page 12: Less Than Matching

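Bounded Alphabet

We need reductions to binary alphabet. For each $\sigma \in \Sigma$ we’ll define:

$$T_\sigma = t^\sigma_0 \dots t^\sigma_{n-1}, \qquad t^\sigma_i = \begin{cases} 1 & t_i \ge \sigma \\ 0 & t_i < \sigma \end{cases}$$

$$P_\sigma = p^\sigma_0 \dots p^\sigma_{m-1}, \qquad p^\sigma_i = \begin{cases} 1 & p_i \ge \sigma \\ 0 & p_i < \sigma \end{cases}$$

We notice $T_\sigma, P_\sigma$ are binary.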

Page 13: Less Than Matching

Bounded Alphabet

Theorem: $P$ (less than) matches $T$ at location $i$ if and only if $\forall \sigma$, $P_\sigma$ (less than) matches $T_\sigma$ at location $i$.

Proof: $P$ does not match $T$ at $i$ iff $\exists j,\ p_j > t_{i+j}$.

That is true iff $\exists \sigma$ such that $p^\sigma_j = 1,\ t^\sigma_{i+j} = 0$, meaning that $P_\sigma$ does not (less than) match $T_\sigma$ at location $i$.

Page 14: Less Than Matching

Bounded Alphabet

So for each $\sigma \in \Sigma$, we’ll run the binary alphabet algorithm on $T_\sigma, P_\sigma$.

We’ll return only the locations that matched in all iterations.

Time: $O(\min\{m, |\Sigma|\} \cdot n \log m)$
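The reduction can be sketched as below. Several things here are assumptions made for illustration: the binary strings use the threshold rule "bit is 1 when the symbol is at least $\sigma$", only symbols that occur in the pattern are used as thresholds (which is where a factor of at most $m$ iterations comes from), and `binary_matches` is a naive stand-in for the FFT-based binary algorithm, kept short so the block is self-contained.

```python
def binary_matches(text_bits, pattern_bits):
    """Naive stand-in for the binary-alphabet algorithm (not FFT-based)."""
    n, m = len(text_bits), len(pattern_bits)
    return {
        i
        for i in range(n - m + 1)
        if all(p <= t for p, t in zip(pattern_bits, text_bits[i:i + m]))
    }


def bounded_less_than_matches(text, pattern):
    """Intersect the binary matches over every threshold symbol sigma."""
    n, m = len(text), len(pattern)
    surviving = set(range(n - m + 1))
    for sigma in set(pattern):  # at most min(m, |alphabet|) thresholds
        t_sigma = [1 if t >= sigma else 0 for t in text]
        p_sigma = [1 if p >= sigma else 0 for p in pattern]
        surviving &= binary_matches(t_sigma, p_sigma)
    return sorted(surviving)


print(bounded_less_than_matches([3, 1, 4, 1, 5], [2, 1]))  # [0, 2]
```

Restricting the thresholds to pattern symbols is safe under this rule: whenever $p_j > t_{i+j}$, the threshold $\sigma = p_j$ already exposes the mismatch.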

Page 15: Less Than Matching

Problem

The $O(\min\{m, |\Sigma|\} \cdot n \log m)$ bound can be worse than the naïve algorithm. What about unbounded alphabet? We present an improvement on the next slides.
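A short calculation shows why the bound can lose to the naïve algorithm. It assumes the bound reads $O(\min\{m, |\Sigma|\} \cdot n \log m)$ and that the naïve algorithm costs $O(nm)$; the comparison below is an added illustration.

```latex
|\Sigma| \ge m
\;\Longrightarrow\;
\min\{m, |\Sigma|\} \cdot n \log m \;=\; m \cdot n \log m \;=\; (nm) \cdot \log m
```

So once the alphabet is at least as large as the pattern, the reduction is slower than the naïve $O(nm)$ scan by a factor of $\log m$.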

Page 16: Less Than Matching

The Trick

We’ll split the text into overlapping segments of size $2m$ like this:

[Figure: a text of length $n$ and a pattern of length $m$; two staggered rows of segments, each of size $2m$.]

So every match in the text must appear in whole in one of the segments.
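One way to realise the splitting is sketched below. It assumes consecutive segments start $m$ positions apart, so that neighbours overlap by $m$; the slides only show this in a figure, so treat the step size, the function names, and the naive per-segment matcher as assumptions made for illustration. The pattern is assumed to be non-empty.

```python
def naive_matches(text, pattern):
    """Plain O(nm) less-than matcher used on each segment in this sketch."""
    m = len(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if all(p <= t for p, t in zip(pattern, text[i:i + m]))
    ]


def split_into_segments(n, m):
    """Half-open (start, end) segments of length at most 2m, stepping by m."""
    last_start = max(n - m, 0)
    return [(s, min(s + 2 * m, n)) for s in range(0, last_start + 1, m)]


def matches_by_segments(text, pattern, matcher=naive_matches):
    """Run the matcher on every segment and shift results back to the text."""
    found = set()
    for start, end in split_into_segments(len(text), len(pattern)):
        for i in matcher(text[start:end], pattern):
            found.add(start + i)
    return sorted(found)


print(split_into_segments(10, 4))                 # [(0, 8), (4, 10)]
print(matches_by_segments("0101001110", "0101"))  # [0, 5]
```

A window starting at $i$ lies entirely inside the segment that starts at $\lfloor i/m \rfloor \cdot m$, which is why no match is lost.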

Page 17: Less Than Matching

Abrahamson-Kosaraju Method

First, use the segment splitting trick. Therefore we can assume $|T| = 2m$.

For each location $i$ in text, we’ll produce a triplet: $(a, \text{'T'}, i)$, where $t_i = a$.

For each location $i$ in pattern, we’ll produce a triplet: $(b, \text{'P'}, i)$, where $p_i = b$.

We now have $3m$ triplets all together.
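The triplets can be represented as plain tuples. A minimal sketch, where the function name `build_triplets` and the use of the strings `"T"` and `"P"` as source tags are illustrative choices:

```python
def build_triplets(text, pattern):
    """One (symbol, source, index) triplet per text and pattern position."""
    triplets = [(a, "T", i) for i, a in enumerate(text)]
    triplets += [(b, "P", i) for i, b in enumerate(pattern)]
    return triplets


print(build_triplets("ab", "b"))
# [('a', 'T', 0), ('b', 'T', 1), ('b', 'P', 0)]
```

With a text of length $2m$ and a pattern of length $m$ this yields $2m + m = 3m$ triplets.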

Page 18: Less Than Matching

Abrahamson-Kosaraju Method

We’ll hold all triplets together. Sort all triplets according to symbol. We’ll define a symbol that has more than $\sqrt{m}$ triplets as a “frequent symbol”. There are $O(\sqrt{m})$ frequent symbols, since $3m$ triplets can hold at most $3m / \sqrt{m} = 3\sqrt{m}$ symbols with more than $\sqrt{m}$ triplets each. Put all frequent symbols’ triplets aside.
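Sorting and separating the frequent symbols might look like this. It is a sketch that assumes the threshold is $\sqrt{m}$ triplets per symbol; `split_by_frequency` is a hypothetical helper name, and triplets are `(symbol, source, index)` tuples.

```python
import math
from collections import Counter


def split_by_frequency(triplets, m):
    """Sort triplets by symbol and separate frequent from non-frequent ones."""
    threshold = math.sqrt(m)
    counts = Counter(symbol for symbol, _, _ in triplets)
    ordered = sorted(triplets)  # tuples sort by symbol first
    frequent = [t for t in ordered if counts[t[0]] > threshold]
    rare = [t for t in ordered if counts[t[0]] <= threshold]
    return frequent, rare
```

Both returned lists stay sorted by symbol, which the later grouping step relies on.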

Page 19: Less Than Matching

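Abrahamson-Kosaraju Method

Split non-frequent symbols’ triplets to groups of size $S$, $\sqrt{m} \le S \le 2\sqrt{m}$, in the following manner:

Group 1: $(a, \text{'T'}, 4), (a, \text{'T'}, 7), \dots, (a, \text{'P'}, 300)$ [$\tfrac{2}{3}\sqrt{m}$ triplets], $(b, \text{'T'}, 3), \dots, (b, \text{'T'}, 200)$ [$\tfrac{1}{2}\sqrt{m}$ triplets]

Group 2: $(d, \text{'P'}, 5), \dots, (d, \text{'T'}, 1000)$ [$\tfrac{1}{2}\sqrt{m}$ triplets], $(g, \text{'P'}, 5), \dots, (g, \text{'T'}, 150)$ [$\tfrac{3}{4}\sqrt{m}$ triplets]

...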

Page 20: Less Than Matching

Abrahamson-Kosaraju Method

The rule is that there can’t be two triplets of the same symbol in different groups.

Group 1: $(a, \text{'T'}, 4), (a, \text{'T'}, 7), \dots, (a, \text{'P'}, 300)$ [$\tfrac{2}{3}\sqrt{m}$ triplets], $(b, \text{'T'}, 3), \dots, (b, \text{'T'}, 200)$ [$\tfrac{1}{2}\sqrt{m}$ triplets]

Group 2: $(d, \text{'P'}, 5), \dots, (d, \text{'T'}, 1000)$ [$\tfrac{1}{2}\sqrt{m}$ triplets], $(g, \text{'P'}, 5), \dots, (g, \text{'T'}, 150)$ [$\tfrac{3}{4}\sqrt{m}$ triplets]

...

Page 21: Less Than Matching

Abrahamson-Kosaraju Method

For each such group, choose the symbol of the first triplet in group as the group’s representative.

For instance, on previous example, group 1’s representative is $a$ and group 2’s representative is $d$.

There are $O(\sqrt{m})$ representatives all together.
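The grouping and the choice of representatives can be sketched with a greedy pass over the sorted non-frequent triplets. This is one possible realisation, not necessarily the one the slides intend: it appends whole symbols to the current group until the group holds at least $\sqrt{m}$ triplets, so a symbol is never split across groups. The names `group_rare_symbols` and `representatives` are hypothetical.

```python
import itertools
import math


def group_rare_symbols(rare_triplets, m):
    """Group triplets (sorted by symbol) without splitting any symbol."""
    target = math.sqrt(m)
    groups, current = [], []
    for _, run in itertools.groupby(rare_triplets, key=lambda t: t[0]):
        current.extend(run)  # all triplets of one symbol go in together
        if len(current) >= target:
            groups.append(current)
            current = []
    if current:
        groups.append(current)  # the last group may be smaller than sqrt(m)
    return groups


def representatives(groups):
    """The symbol of the first triplet of each group."""
    return [group[0][0] for group in groups]
```

Because every non-frequent symbol has at most $\sqrt{m}$ triplets, a group that was still below $\sqrt{m}$ before a symbol was added stays below $2\sqrt{m}$ afterwards.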

Page 22: Less Than Matching

Abrahamson-Kosaraju Method

To sum up:

$O(\sqrt{m})$ frequent symbols.

$O(\sqrt{m})$ representatives of non-frequent symbols.

We’ll swap each non-frequent symbol in pattern and text with its representative. Now our text and pattern are over an $O(\sqrt{m})$-sized alphabet.

Page 23: Less Than Matching

Abrahamson-Kosaraju Method

We want to run our algorithm over the new text and pattern to count the mismatches between symbols of different groups.

But we have a problem: Let’s say $f$ is a frequent symbol, but:

Group 2: $\dots, (d, \text{'P'}, 5), \dots, (d, \text{'T'}, 1000)$ [$\tfrac{1}{2}\sqrt{m}$ triplets], $(g, \text{'P'}, 5), \dots, (g, \text{'T'}, 150)$ [$\tfrac{3}{4}\sqrt{m}$ triplets], $\dots$

Page 24: Less Than Matching

Abrahamson-Kosaraju Method

The representative of group 2 is $d$, which is smaller than $f$, but the group also contains $g$ which is greater than $f$.

Group 2: $\dots, (d, \text{'P'}, 5), \dots, (d, \text{'T'}, 1000)$ [$\tfrac{1}{2}\sqrt{m}$ triplets], $(g, \text{'P'}, 5), \dots, (g, \text{'T'}, 150)$ [$\tfrac{3}{4}\sqrt{m}$ triplets], $\dots$

Page 25: Less Than Matching

Abrahamson-Kosaraju Method

In that case we’ll split group 2 to two groups with their own representatives.

Since we performed at most $O(\sqrt{m})$ such splits, we still have $O(\sqrt{m})$ representatives.

Group 2.1: $\dots, (d, \text{'P'}, 5), \dots, (d, \text{'T'}, 1000)$ [$\tfrac{1}{2}\sqrt{m}$ triplets]

Group 2.2: $(g, \text{'P'}, 5), \dots, (g, \text{'T'}, 150)$ [$\tfrac{3}{4}\sqrt{m}$ triplets], $\dots$
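The splitting step can be sketched as follows, assuming each group is a non-empty list of `(symbol, source, index)` triplets sorted by symbol and that no frequent symbol appears inside a group. The helper name `split_groups_at_frequent` is hypothetical; the standard `bisect` module is used to count how many frequent symbols lie below a given symbol.

```python
import bisect


def split_groups_at_frequent(groups, frequent_symbols):
    """Cut a group wherever a frequent symbol falls between two neighbours."""
    cuts = sorted(frequent_symbols)
    result = []
    for group in groups:
        current = [group[0]]
        for prev, nxt in zip(group, group[1:]):
            # Different counts mean some frequent symbol separates the two.
            below_prev = bisect.bisect_left(cuts, prev[0])
            below_next = bisect.bisect_left(cuts, nxt[0])
            if below_prev != below_next:
                result.append(current)
                current = []
            current.append(nxt)
        result.append(current)
    return result


group2 = [("d", "P", 5), ("d", "T", 9), ("g", "P", 5), ("g", "T", 1)]
print(split_groups_at_frequent([group2], {"f"}))
# [[('d', 'P', 5), ('d', 'T', 9)], [('g', 'P', 5), ('g', 'T', 1)]]
```

Each frequent symbol can cause at most one cut, so the number of groups grows by at most the number of frequent symbols.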

Page 26: Less Than Matching

Abrahamson-Kosaraju Method

We can now run our algorithm over the new text and pattern in $O(m\sqrt{m} \log m)$.

But we still haven’t handled comparisons between two non-frequent symbols that are in the same group.

Page 27: Less Than Matching

Abrahamson-Kosaraju Method

We’ll do so naively in each group:

For each triplet $(p_j, \text{'P'}, j)$ in the group:

For each triplet of the form $(t_k, \text{'T'}, k)$ in the group, if $p_j > t_k$, then add an error at location $i = k - j$.

Time: $O(m\sqrt{m})$

[Figure: $P$ aligned under $T$ at location $i$, so that $p_j$ faces $t_k$ with $k = i + j$.]
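The in-group correction is a double loop over each group. A minimal sketch, assuming groups are lists of `(symbol, source, index)` triplets and that `num_locations` (a hypothetical parameter) is the number of candidate locations, that is the text length minus $m$ plus 1:

```python
def in_group_errors(groups, num_locations):
    """Count mismatches p_j > t_k between symbols of the same group."""
    errors = [0] * num_locations
    for group in groups:
        pattern_side = [(s, j) for s, src, j in group if src == "P"]
        text_side = [(s, k) for s, src, k in group if src == "T"]
        for p_sym, j in pattern_side:
            for t_sym, k in text_side:
                i = k - j  # the alignment at which p_j faces t_k
                if p_sym > t_sym and 0 <= i < num_locations:
                    errors[i] += 1
    return errors


print(in_group_errors([[("d", "T", 3), ("g", "P", 1)]], 5))  # [0, 0, 1, 0, 0]
```

Each group has $O(\sqrt{m})$ triplets, so it contributes $O(m)$ pairs, and $O(\sqrt{m})$ groups give the $O(m\sqrt{m})$ total.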

Page 28: Less Than Matching

Running Time

For one segment:

Sorting the triplets and representatives: $O(m \log m)$.

Running the algorithm: $O(m\sqrt{m} \log m)$.

Correcting results (adding in-group errors): $O(m\sqrt{m})$.

Overall for one segment: $O(m\sqrt{m} \log m)$.

Overall for all segments: $O(n\sqrt{m} \log m)$.
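The totals follow by adding the per-segment costs and multiplying by the number of segments. The derivation below is an added worked step; it assumes $O(n/m)$ segments, each holding a text of length $2m$.

```latex
\underbrace{O(m \log m)}_{\text{sorting}}
+ \underbrace{O(m\sqrt{m}\,\log m)}_{\text{running the algorithm}}
+ \underbrace{O(m\sqrt{m})}_{\text{in-group errors}}
= O(m\sqrt{m}\,\log m)

O\!\left(\frac{n}{m}\right) \cdot O(m\sqrt{m}\,\log m) = O(n\sqrt{m}\,\log m)
```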

Page 29: Less Than Matching

Running Time

We can improve to $O(n\sqrt{m \log m})$. Left as an exercise.