Speaker: L. C. Chen Advisor: R. C. T. Lee
Approximate string matching using factor automata
J. Holub and B. Melichar, Theoretical Computer Science, vol. 249, pp. 305-311
Speaker: L. C. Chen
Advisor: R. C. T. Lee
Problem
• The edit distance DL(P, X) between strings P and X is the minimum number of edit operations (substitution, insertion and deletion) needed to convert P into X.
• Given a text T of length n, a pattern P of length m, and an integer k with k ≤ m ≤ n, approximate string matching asks whether T contains a substring X whose edit distance DL(P, X) from P is at most k.
An example of Edit Distance
To convert P into T: P = abcde, T = bcfeg
• Delete a: P = abcde → P1 = bcde
• Substitute d with f: P1 = bcde → P2 = bcfe
• Insert g: P2 = bcfe → T = bcfeg
These three operations give DL(P, T) = 3.
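The distance in this example can be checked with the standard dynamic-programming recurrence for edit distance. This is a minimal sketch, not something shown on the slides; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(p: str, x: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning p into x."""
    # prev[j] is the distance between the prefix of p processed so far and x[:j].
    prev = list(range(len(x) + 1))
    for i, pc in enumerate(p, start=1):
        curr = [i]  # turning p[:i] into the empty string takes i deletions
        for j, xc in enumerate(x, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pc
                curr[j - 1] + 1,           # insert xc
                prev[j - 1] + (pc != xc),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("abcde", "bcfeg"))  # 3
```

Only two rows of the table are kept, so memory is proportional to the length of x.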
Basic definition
• Fac(T): the set of all substrings (factors) of text T.
• A nondeterministic finite automaton (NFA) is a five-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of states, Σ is a finite input alphabet, δ is a mapping from Q × (Σ ∪ {ε}) into the set of subsets of Q, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states.
• M(Fac(T)): a factor automaton, that is, an automaton that accepts Fac(T).
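As a concrete reading of Fac(T), the sketch below enumerates every non-empty substring of a text directly. The helper name `factors` is an assumption made for illustration, and the empty string is left out to match the way the slides list Fac(T).

```python
def factors(text):
    """Return Fac(text): the set of all non-empty substrings of text."""
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}


fac = factors("aabbabd")
# Print the factors ordered by length, then alphabetically.
print(sorted(fac, key=lambda s: (len(s), s)))
print(len(fac))  # 23
```

This brute-force set takes quadratic space in the worst case; a factor automaton represents the same language far more compactly.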
Factor automaton
Factor automaton M(Fac(T)): a deterministic finite automaton (DFA) that accepts all substrings of the given text T.
T = aabbabd
Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}
[Figure: M(Fac(T)) for T = aabbabd, with states 0 to 9 and transitions labeled a, b and d.]
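One way to obtain a DFA for Fac(T) is the standard online suffix automaton construction with every state treated as final. The sketch below uses that technique as a stand-in: it is not necessarily the construction used in the paper, the result may not be state-minimal, and its state numbering will not match the figure. The names `State`, `build_suffix_automaton` and `accepts_factor` are illustrative.

```python
class State:
    def __init__(self, length=0, link=-1):
        self.next = {}      # outgoing transitions: symbol -> state index
        self.link = link    # suffix link
        self.len = length   # length of the longest string reaching this state


def build_suffix_automaton(text):
    """Standard online suffix automaton construction; returns the list of states."""
    states = [State()]
    last = 0
    for ch in text:
        cur = len(states)
        states.append(State(states[last].len + 1))
        p = last
        while p != -1 and ch not in states[p].next:
            states[p].next[ch] = cur
            p = states[p].link
        if p == -1:
            states[cur].link = 0
        else:
            q = states[p].next[ch]
            if states[p].len + 1 == states[q].len:
                states[cur].link = q
            else:
                # Split q: the clone takes over the shorter strings.
                clone = len(states)
                copy = State(states[p].len + 1, states[q].link)
                copy.next = dict(states[q].next)
                states.append(copy)
                while p != -1 and states[p].next.get(ch) == q:
                    states[p].next[ch] = clone
                    p = states[p].link
                states[q].link = clone
                states[cur].link = clone
        last = cur
    return states


def accepts_factor(states, word):
    """Every state is treated as final, so any path from state 0 spells a factor."""
    s = 0
    for ch in word:
        if ch not in states[s].next:
            return False
        s = states[s].next[ch]
    return True


automaton = build_suffix_automaton("aabbabd")
print(accepts_factor(automaton, "bbab"))  # True
print(accepts_factor(automaton, "abab"))  # False
```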
A suffix tree can also be used to recognize all substrings of T = aabbabd.
Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}
[Figure: suffix tree of T = aabbabd, with edges labeled a, b and d.]
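The idea can be illustrated with a suffix trie, the uncompressed form of a suffix tree. This is a simplification: a real suffix tree merges chains of single-child nodes and can be built in linear time, while the nested-dictionary trie below takes quadratic space. The function names are illustrative.

```python
def build_suffix_trie(text):
    """Nested-dict trie holding every suffix of text."""
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie, word):
    """word is a factor of the text iff it spells a path starting at the root."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return True


trie = build_suffix_trie("aabbabd")
print(is_substring(trie, "abba"))  # True
print(is_substring(trie, "bbb"))   # False
```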
P = bab, k = 1.
$L_k(P) = \{Y \mid Y \in \Sigma^*,\ D_L(P, Y) \le k\}$
The finite automaton M(Lk(P)) accepts Lk(P).
Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.
[Figure: M(Lk(P)) with states 0,0; 1,0; 2,0; 3,0 and 1,1; 2,1; 3,1, and transitions labeled a and b.]
State labels: 1,0 means one matched, 0 errors; 1,1 means one matched, one error; 3,0 means three matched, 0 errors.
[Figure, repeated on three slides: M(Lk(P)) for P = bab, k = 1, tracing one input string per slide.]
Recognize ab
Recognize aab
Recognize bbab
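The automaton M(Lk(P)) can be simulated by tracking the set of active states (i, e), meaning i pattern symbols matched with e errors. The transition rules below are an assumption inferred from the listed Lk(P) and the state labels, not taken from the paper: a match advances i, a substitution advances both i and e, a deletion is an ε-move that advances both, and an insertion advances only e and is allowed only strictly inside the pattern. Under that assumption a string such as babd, which has edit distance 1 from bab, is not accepted, which agrees with the listed set.

```python
from itertools import product


def close(states, m, k):
    """Add states reached by deletions, modeled as epsilon moves (i, e) -> (i + 1, e + 1)."""
    out = set(states)
    for _ in range(k):  # a chain of deletions has length at most k
        out |= {(i + 1, e + 1) for i, e in out if i < m and e < k}
    return out


def step(states, ch, pattern, k):
    """Advance the active state set over one input symbol."""
    m = len(pattern)
    out = set()
    for i, e in states:
        if i == m:
            continue                     # nothing leaves the last column
        if ch == pattern[i]:
            out.add((i + 1, e))          # match
        elif e < k:
            out.add((i + 1, e + 1))      # substitution
            if i > 0:
                out.add((i, e + 1))      # insertion (assumed: only inside the pattern)
    return close(out, m, k)


def accepts(word, pattern, k):
    m = len(pattern)
    states = close({(0, 0)}, m, k)
    for ch in word:
        states = step(states, ch, pattern, k)
    return any(i == m for i, _ in states)


def language(pattern, k, alphabet):
    """All accepted strings; accepted lengths lie between m - k and m + k."""
    m = len(pattern)
    return {
        "".join(w)
        for n in range(max(m - k, 0), m + k + 1)
        for w in product(alphabet, repeat=n)
        if accepts("".join(w), pattern, k)
    }


for word in ("ab", "aab", "bbab"):
    print(word, accepts(word, "bab", 1))
print(sorted(language("bab", 1, "abd"), key=lambda w: (len(w), w)))
```

Over the alphabet {a, b, d} the generated language has 14 strings, the same ones listed for Lk(P).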
Definition
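• Let $M_1 = (Q_1, \Sigma, \delta_1, q_{01}, F_1)$ and $M_2 = (Q_2, \Sigma, \delta_2, q_{02}, F_2)$. An automaton for the intersection of $M_1$ and $M_2$ is the automaton
$$M = (Q_1 \times Q_2, \Sigma, \delta, [q_{01}, q_{02}], F_1 \times F_2),$$
where $\delta([q_1, q_2], a) = [\delta_1(q_1, a), \delta_2(q_2, a)]$, $q_1 \in Q_1$, $q_2 \in Q_2$ and $a \in \Sigma$.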
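The definition is the usual product construction. The sketch below applies it to two deterministic automata given as (delta, start, finals), where delta maps (state, symbol) to a state. As a simplification it builds only the pairs reachable from [q01, q02] rather than all of Q1 × Q2, and the two toy automata at the end are hypothetical examples, not automata from the slides.

```python
def intersect(dfa1, dfa2):
    """Product automaton accepting exactly the strings accepted by both inputs."""
    delta1, start1, finals1 = dfa1
    delta2, start2, finals2 = dfa2
    symbols = {a for (_, a) in delta1} & {a for (_, a) in delta2}
    start = (start1, start2)
    delta, finals = {}, set()
    seen, stack = {start}, [start]
    while stack:
        state = stack.pop()
        q1, q2 = state
        if q1 in finals1 and q2 in finals2:
            finals.add(state)            # final iff both components are final
        for a in symbols:
            if (q1, a) in delta1 and (q2, a) in delta2:
                target = (delta1[(q1, a)], delta2[(q2, a)])
                delta[(state, a)] = target
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
    return delta, start, finals


def run(dfa, word):
    delta, state, finals = dfa
    for a in word:
        if (state, a) not in delta:
            return False
        state = delta[(state, a)]
    return state in finals


# Hypothetical example: even number of a's, intersected with "ends in b".
even_a = ({("e", "a"): "o", ("e", "b"): "e", ("o", "a"): "e", ("o", "b"): "o"}, "e", {"e"})
ends_b = ({(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}, 0, {1})
both = intersect(even_a, ends_b)
print(run(both, "aab"))  # True
print(run(both, "ab"))   # False
```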
Intersection of M(Lk(P)) and M(Fac(T)).
T = aabbabd, P = bab, k = 1
[Figure: the intersection automaton, shown together with M(Fac(T)) and M(Lk(P)). Each state is labeled {states of M(Lk(P))};state of M(Fac(T)). Labels in the figure: {0,0;1,1};0, {1,0;2,1};9, {1,1};7, {1,1;2,1};1, {2,0;3,1};5, {1,1;2,1;3,1};4, {1,1;2,1};7, {2,1};2, {3,1};8, {3,1};3, {3,0};6, {2,1};2, {3,1};6.]
Solutions: {ba, bab, bb, bbab, aab, ab} (All end with {3,0} or {3,1}.)
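The solutions can be reproduced by walking both automata in lockstep. This is a sketch under stated assumptions: M(Lk(P)) is simulated as a set of (matched, errors) pairs with insertions allowed only strictly inside the pattern (a rule inferred from the state labels, not quoted from the paper), and each state of M(Fac(T)) is represented by the set of positions in T where the string read so far can end, which is a stand-in for the numbered states 0 to 9 in the figure.

```python
def close(states, m, k):
    """Add states reached by deletions, modeled as epsilon moves (i, e) -> (i + 1, e + 1)."""
    out = set(states)
    for _ in range(k):
        out |= {(i + 1, e + 1) for i, e in out if i < m and e < k}
    return frozenset(out)


def nfa_step(states, a, pattern, k):
    m = len(pattern)
    out = set()
    for i, e in states:
        if i == m:
            continue
        if a == pattern[i]:
            out.add((i + 1, e))          # match
        elif e < k:
            out.add((i + 1, e + 1))      # substitution
            if i > 0:
                out.add((i, e + 1))      # insertion (assumed: only inside the pattern)
    return close(out, m, k)


def solutions(text, pattern, k):
    """Strings accepted by both M(Lk(P)) and M(Fac(T)), found by a joint traversal."""
    m, n = len(pattern), len(text)
    found = set()
    # Product state: (active NFA states, end positions of the current factor in text).
    stack = [("", close({(0, 0)}, m, k), frozenset(range(n + 1)))]
    while stack:
        word, nfa, ends = stack.pop()
        if word and any(i == m for i, _ in nfa):
            found.add(word)              # the NFA part contains a final state (m, e)
        for a in sorted(set(text)):
            new_ends = frozenset(p + 1 for p in ends if p < n and text[p] == a)
            new_nfa = nfa_step(nfa, a, pattern, k)
            if new_ends and new_nfa:     # both components must stay alive
                stack.append((word + a, new_nfa, new_ends))
    return found


print(sorted(solutions("aabbabd", "bab", 1), key=lambda w: (len(w), w)))
```

The traversal terminates because every step extends a factor of T, and T has finitely many factors.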
Intersection
[Figure, repeated on six slides: the intersection automaton together with T = a a b b a b d and P = bab, one solution per slide.]
• DL(P, ba) = 1
• DL(P, bab) = 0
• DL(P, bb) = 1
• DL(P, bbab) = 1
• DL(P, aab) = 1
• DL(P, ab) = 1
Lemma
• The number of states of the automaton is always lower than $(k+2)^{m-k}(k+1)!$.
[Figure: suffix tree of T = aabbabd, with edges labeled a, b and d.]
T = aabbabd, P = bab, k = 1.
The finite automaton M(Lk(P)) accepts Lk(P).
Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.
Thank you!