Special lecture on Information Knowledge Network
Information retrieval and pattern matching
The 4th: Approximate string matching
Takuya Kida, IKN Laboratory,
Division of Computer Science and Information Technology
2018/11/22
Today’s contents
What is approximate pattern matching?
Dynamic programming approach
NFA-based approach: bit-parallel simulation (BPR, BPD)
Filtering approach: pattern division method (PEX), NFA method (ABNDM)
Let’s talk with an intelligent computer!
User: Who was that painter in the Baroque era? He drew a famous picture called "The Night Watch"… Er…, certainly, was he… Rembbright?
Computer: It's Rembrandt.
User: Yes! Rembrandt! I remember! … And who was that painter? He drew many genre paintings in the same era. Very popular in Japan. Er…, not Wermer…
Computer: …Perhaps, Vermeer?
User: That's right! Vermeer! Both of them were from the Netherlands, weren't they? By the way, did you know Tera-sgees, who was in the same era?
Computer: Velázquez! ヽ(`Д´)ノ
What is approximate pattern matching?
It is the problem of finding the positions of substrings in a given text whose edit distance to a given pattern is at most $k$.
The edit distance $\mathrm{ed}(x, y)$ is defined as the minimum cost $d$ of transforming string $x$ into string $y$ with the character edit operations insertion, deletion, and substitution.
Example: pattern P = MARRIAGE, k = 2 (0 < k < m)
CARRIAGE: d = 1 ≤ k → OK
MASSAGE: d = 3 > k → Bad, since ed(MARRIAGE, MASSAGE) = 3:
  MARRIAGE
  MASS AGE   (substitute R→S twice, delete I)
Edit distance: how much do two strings look alike?
similarity ⇔ edit distance between strings (dissimilarity)
Variations of edit distance (from Heikki Hyyrö [SOFSEM2005]):
Levenshtein distance: the costs of all operations are equal to 1.
Hamming distance: only substitution is allowed.
Weighted-cost edit distance: the cost of each operation may differ.
Unrestricted-cost edit distance: the cost differs for each character pair.
Damerau distance: character transposition is also permitted.
Indel distance: substitution is not allowed (insertion + deletion = indel).
Hereafter, we mainly deal with the Levenshtein distance.
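To make the difference between the variants concrete, here is a minimal Python sketch of two of them (not from the lecture; the function names are hypothetical). The indel distance is computed as |x| + |y| − 2·LCS(x, y), a standard identity when only insertions and deletions are allowed. For comparison, the Levenshtein distance ed(MARRIAGE, MASSAGE) is 3, as in the example above.

```python
def hamming(x: str, y: str) -> int:
    """Hamming distance: only substitutions, so the strings must have equal length."""
    if len(x) != len(y):
        raise ValueError("Hamming distance is defined only for equal-length strings")
    return sum(a != b for a, b in zip(x, y))


def indel(x: str, y: str) -> int:
    """Indel distance: insertions and deletions only.

    Every character outside a longest common subsequence (LCS) must be
    deleted from x or inserted from y, so the distance is |x| + |y| - 2*LCS.
    """
    prev = [0] * (len(y) + 1)          # LCS lengths for the previous prefix of x
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return len(x) + len(y) - 2 * prev[-1]


print(hamming("MARRIAGE", "CARRIAGE"))  # 1
print(indel("MARRIAGE", "CARRIAGE"))    # 2 (delete M, insert C)
print(indel("MARRIAGE", "MASSAGE"))     # 5 (Levenshtein gives 3)
```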
Application examples
Calculating the similarity between DNA sequences
Spell checkers / searching with ambiguity: orthographic variation (Carpaccio ⇔ Caravaggio)
Retrieval of similar sentences: advanced retrieval can be realized by combining with natural language processing. A sentence = a sequence of morphemes ≒ a string, so consider each morpheme as a meta character (searching with a thesaurus is another related topic).
Similar music retrieval: finding similar phrases in MIDI data; retrieval with humming recognition, using Dynamic Time Warping (DTW)
Search on OCR data: OCRed data often contains mistakes
Applications to real data mining: Web mining using approximate string matching algorithms (T. Nakato, Kyushu Univ.)
Dynamic programming approach
A method for calculating the edit distance by dynamic programming (DP) has been known since the 1960s. However, the well-known DP algorithm for pattern matching was given by Sellers in 1980.
P. H. Sellers. The theory and computation of evolutionary distances: Pattern recognition. Journal of Algorithms, 1(4):359-373, 1980.
H. Sakoe and S. Chiba. A Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Trans. on Acoustics, Speech and Signal Processing, Vol. ASSP-26, No. 1, pp. 43-49, 1978.
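How to calculate $\mathrm{ed}(x, y)$: let $M_{i,j} = \mathrm{ed}(x[1:i], y[1:j])$. Then
$$M_{0,0} \leftarrow 0, \qquad M_{i,j} \leftarrow \min\{\, M_{i-1,j-1} + \delta(x[i], y[j]),\ M_{i-1,j} + 1,\ M_{i,j-1} + 1 \,\},$$
where $\delta(a, b)$ is defined as 0 if $a = b$, and 1 otherwise (terms with a negative index are ignored). Efficient recursive formulas for the same calculation are:
$$M_{i,0} \leftarrow i, \qquad M_{0,j} \leftarrow j, \qquad M_{i,j} \leftarrow \begin{cases} M_{i-1,j-1} & \text{if } x[i] = y[j], \\ 1 + \min\{M_{i-1,j-1},\ M_{i-1,j},\ M_{i,j-1}\} & \text{otherwise}. \end{cases}$$
Then $M_{|x|,|y|} = \mathrm{ed}(x, y)$.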
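The second recurrence translates directly into a table-filling loop. The following is a minimal Python sketch (the name edit_distance is hypothetical); Python strings are 0-indexed, so $x[i]$ in the formulas is x[i - 1] in the code.

```python
def edit_distance(x: str, y: str) -> int:
    """Levenshtein distance ed(x, y) via M[i][j] = ed(x[1:i], y[1:j])."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i                      # delete all i characters of x[1:i]
    for j in range(n + 1):
        M[0][j] = j                      # insert all j characters of y[1:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:     # x[i] = y[j] in 1-based notation
                M[i][j] = M[i - 1][j - 1]
            else:
                M[i][j] = 1 + min(M[i - 1][j - 1],  # substitution
                                  M[i - 1][j],      # deletion of x[i]
                                  M[i][j - 1])      # insertion of y[j]
    return M[m][n]


print(edit_distance("MARRIAGE", "MASSAGE"))  # 3
print(edit_distance("annual", "annealing"))  # 4
```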
Why is this calculation correct?
Proof by induction. Let $M_{0,0} = 0$ for two empty strings. We want to obtain $\mathrm{ed}(x[1:i], y[1:j]) = M_{i,j}$. Assume that we already have $\mathrm{ed}(x[1:i'], y[1:j'])$ for all $i' \le i$ and $j' \le j$ with $(i', j') \ne (i, j)$. Now consider the cost of transforming $x[1:i]$ into $y[1:j]$. If $x[i] = y[j]$, we can simply transform $x[1:i-1]$ into $y[1:j-1]$ with the minimum cost $M_{i-1,j-1}$; in this case, $M_{i,j} = M_{i-1,j-1}$ holds. If $x[i] \ne y[j]$, there are three cases:
(a) substitute $x[i]$ with $y[j]$, and transform $x[1:i-1]$ into $y[1:j-1]$ with cost $M_{i-1,j-1}$ (total $M_{i-1,j-1} + 1$);
(b) delete $x[i]$, and transform $x[1:i-1]$ into $y[1:j]$ with cost $M_{i-1,j}$ (total $M_{i-1,j} + 1$);
(c) insert $y[j]$ at the end of $x[1:i]$, and transform $x[1:i]$ into $y[1:j-1]$ with cost $M_{i,j-1}$ (total $M_{i,j-1} + 1$).
We choose the minimum among them.
[Figure: the three cases for $M_{i,j}$. Deletion: $x[1:i-1] \to y[1:j]$, then delete $x[i]$ ($M_{i-1,j} + 1$). Substitution: $x[1:i-1] \to y[1:j-1]$, then substitute $x[i]$ by $y[j]$ ($M_{i-1,j-1} + 1$). Insertion: $x[1:i] \to y[1:j-1]$, then insert $y[j]$ ($M_{i,j-1} + 1$). Cell $M_{i,j}$ is computed from $M_{i-1,j-1} + \delta(x[i], y[j])$, $M_{i-1,j} + 1$, and $M_{i,j-1} + 1$.]
How to detect the pattern occurrences
$M_{i,j}$ for ed(annual, annealing):

        a  n  n  e  a  l  i  n  g
     0  1  2  3  4  5  6  7  8  9
  a  1  0  1  2  3  4  5  6  7  8
  n  2  1  0  1  2  3  4  5  6  7
  n  3  2  1  0  1  2  3  4  5  6
  u  4  3  2  1  1  2  3  4  5  6
  a  5  4  3  2  2  1  2  3  4  5
  l  6  5  4  3  3  2  1  2  3  4

$M_{|x|,|y|}$ = ed(annual, annealing) = 4
Approximate string matching for $P$ = annual, $T$ = annealing, $k = 2$:

        a  n  n  e  a  l  i  n  g
     0  0  0  0  0  0  0  0  0  0
  a  1  0  1  1  1  0  1  1  1  1
  n  2  1  0  1  2  1  1  2  1  2
  n  3  2  1  0  1  2  2  2  2  2
  u  4  3  2  1  1  2  3  3  3  3
  a  5  4  3  2  2  1  2  3  4  4
  l  6  5  4  3  3  2  1  2  3  4

For every $j = 0, \ldots, n$, all we have to do is set $M_{0,j} = 0$. This means that the empty string $\varepsilon$ matches anywhere in the text with 0 errors. An occurrence ends at text position $j$ whenever $M_{m,j} \le k$; here the bottom row gives $j$ = 5, 6, 7.
$O(mn)$ time and $O(m)$ space (only one column needs to be kept)
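A minimal Python sketch of this search (Sellers' scheme as described above; the name sellers_search is hypothetical). It keeps a single column, which is where the $O(m)$ space comes from, and returns 1-based end positions.

```python
def sellers_search(pattern: str, text: str, k: int) -> list[int]:
    """Return the 1-based end positions j in text where M[m][j] <= k."""
    m = len(pattern)
    col = list(range(m + 1))            # column j = 0: M[i][0] = i
    ends = []
    for j, c in enumerate(text, 1):
        diag = col[0]                   # M[0][j-1], the diagonal for row 1
        col[0] = 0                      # M[0][j] = 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]                # M[i][j-1]
            if pattern[i - 1] == c:
                col[i] = diag           # M[i-1][j-1]
            else:
                col[i] = 1 + min(diag, old, col[i - 1])
            diag = old                  # M[i][j-1] is the diagonal for row i+1
        if col[m] <= k:
            ends.append(j)
    return ends


print(sellers_search("annual", "annealing", 2))  # [5, 6, 7]
```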
Improving the average time complexity
The given pattern seldom occurs in the text! During the calculation of each column, the values usually exceed $k$ well before reaching the bottom (that is, a mismatch occurs at the current position). Once a cell's value exceeds $k$, its exact value does not affect the final results. We call a cell active if its value is less than or equal to $k$. By calculating only the cells down to the last active one, the average time complexity can be reduced to $O(kn)$. (This improved algorithm is called DP.)
E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, 6(1-3):132-137, 1985.
[Figure: DP calculation for $P$ = annual, $T$ = annealing, $k = 2$; the same matrix as in the previous table, where in each column only the cells down to the last active one are computed.]
$O(mn)$ time in the worst case; $O(kn)$ time on average
Pseudo code of DP algorithm
DP (P = p1p2…pm, T = t1t2…tn, k)
  Preprocessing:
    For i ∈ 0…m Do C_i ← i
    lact ← k + 1   /* last active cell */
  Searching:
    For pos ∈ 1…n Do
      pC ← 0, nC ← 0
      For i ∈ 1…lact Do
        If p_i = t_pos Then nC ← pC
        Else
          If pC < nC Then nC ← pC
          If C_i < nC Then nC ← C_i
          nC ← nC + 1
        End of if
        pC ← C_i, C_i ← nC
      End of for
      While C_lact > k Do lact ← lact − 1
      If lact = m Then report an occurrence at pos
      Else lact ← lact + 1
    End of for
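A minimal Python transcription of the DP pseudocode above (the name dp_cutoff is hypothetical), assuming 0 ≤ k < m so that the initial last active cell k + 1 lies inside the column. Cells below lact are never updated in a column, which is the source of the $O(kn)$ average time.

```python
def dp_cutoff(pattern: str, text: str, k: int) -> list[int]:
    """Ukkonen's cut-off DP: compute each column only down to the last active cell."""
    m = len(pattern)
    assert 0 <= k < m
    C = list(range(m + 1))     # C[i] = M[i][pos]; entries below lact may be stale
    lact = k + 1               # last active cell
    ends = []
    for pos, t in enumerate(text, 1):
        pC, nC = 0, 0          # pC: old C[i-1] (diagonal), nC: new C[i-1]
        for i in range(1, lact + 1):
            if pattern[i - 1] == t:
                nC = pC
            else:
                nC = min(pC, nC, C[i]) + 1
            pC, C[i] = C[i], nC
        while C[lact] > k:     # shrink to the last cell with value <= k
            lact -= 1
        if lact == m:
            ends.append(pos)   # the whole pattern matches with <= k errors
        else:
            lact += 1          # the next cell may become active in the next column
    return ends


print(dp_cutoff("annual", "annealing", 2))  # [5, 6, 7]
```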
NFA-based approach
Pattern matching is done by translating this NFA into a DFA and simulating it. This was originally proposed by Ukkonen [1985], and several improvements have been proposed since. Translating the NFA into a corresponding DFA increases the number of states to $O(\min(3^m,\ m(2m|\Sigma|)^k))$. Therefore, it is not practical when $m$ is large.
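[Figure: an NFA that accepts $P$ = annual allowing 2 errors, with one row of states per error level (no error, 1 error, 2 errors) and a self-loop on any $a \in \Sigma$ at the initial state. Horizontal arrows labeled with pattern characters are matches; vertical $\Sigma$ arrows are insertions; diagonal $\Sigma$ arrows are substitutions; diagonal $\varepsilon$ arrows are deletions. The active states after reading $T$ = anneal are highlighted.]
E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, 6(1-3):132-137, 1985.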
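The NFA can also be simulated directly, without building the DFA, by keeping the set of active states. The following Python sketch (the name nfa_search is hypothetical; a state (i, e) stands for the column i and error row e of the figure) is the plain, non-bit-parallel simulation. It costs $O(km)$ per text character, which is what the bit-parallel methods below speed up.

```python
def nfa_search(pattern: str, text: str, k: int) -> list[int]:
    """Simulate the approximate-matching NFA with a set of active states.

    A state (i, e) means: i pattern characters consumed with e errors.
    """
    m = len(pattern)

    def close(states: set[tuple[int, int]]) -> set[tuple[int, int]]:
        # epsilon edges = deletion of a pattern character: (i, e) -> (i+1, e+1)
        result, stack = set(states), list(states)
        while stack:
            i, e = stack.pop()
            nxt = (i + 1, e + 1)
            if i < m and e < k and nxt not in result:
                result.add(nxt)
                stack.append(nxt)
        return result

    active = close({(0, 0)})
    ends = []
    for pos, c in enumerate(text, 1):
        moved = {(0, 0)}                       # self-loop on any character
        for i, e in active:
            if i < m and pattern[i] == c:
                moved.add((i + 1, e))          # match
            if e < k:
                moved.add((i, e + 1))          # insertion of text character c
                if i < m:
                    moved.add((i + 1, e + 1))  # substitution
        active = close(moved)
        if any(i == m for i, _ in active):     # some final state is active
            ends.append(pos)
    return ends


print(nfa_search("annual", "annealing", 2))  # [5, 6, 7]
```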
Row-wise bit-parallel for the NFA (BPR)
Pack the states of each row into one bit-vector (1 = active, 0 = inactive) and simulate the moves of the whole NFA with a bit-parallel technique. This needs $k + 1$ bit-vectors of $m$ bits each. The formulas for updating the $i$-th row $R_i$ into the new $R'_i$ when reading $t_j$ are:
$$R'_0 \leftarrow \big((R_0 \ll 1) \mid 0^{m-1}1\big) \,\&\, B[t_j]$$
$$R'_i \leftarrow \big((R_i \ll 1) \,\&\, B[t_j]\big) \mid R_{i-1} \mid (R_{i-1} \ll 1) \mid (R'_{i-1} \ll 1) \mid 0^{m-1}1 \qquad (1 \le i \le k)$$
[Figure: the NFA that accepts $P$ = annual allowing 2 errors; the active states after reading $T$ = anneal, packed per row, are $R_0$ = 000000 (no error), $R_1$ = 100011 (1 error), $R_2$ = 110111 (2 errors).]
$O(k\lceil m/w \rceil n)$ time; $O(kn)$ time if $m \le w$
Note that the bit order is reversed.
S. Wu and U. Manber. Fast text searching allowing errors. Communications of the ACM, 35(10): 83-91,1992.
Multiple rows can be packed into one vector!
Pseudo code of BPR
BPR (P = p1p2…pm, T = t1t2…tn, k)   /* b^{n}: bit b repeated n times */
  Preprocessing:
    For c ∈ Σ Do B[c] ← 0^{m}
    For j ∈ 1…m Do B[p_j] ← B[p_j] | 0^{m−j} 1 0^{j−1}
  Searching:
    For i ∈ 0…k Do R_i ← 0^{m−i} 1^{i}
    For pos ∈ 1…n Do
      oldR ← R_0
      newR ← ((oldR << 1) | 0^{m−1} 1) & B[t_pos]
      R_0 ← newR
      For i ∈ 1…k Do
        newR ← ((R_i << 1) & B[t_pos]) | oldR | ((oldR | newR) << 1) | 0^{m−1} 1
        oldR ← R_i, R_i ← newR
      End of for
      If newR & 1 0^{m−1} ≠ 0^{m} Then report an occurrence at pos
    End of for
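A Python transcription of the BPR pseudocode (the name bpr_search is hypothetical). Python integers are unbounded, so the sketch masks every new row to $m$ bits. Bit $j-1$ of R[i] corresponds to the $j$-th state of row $i$, so printing a row in binary shows it in the reversed order noted above. The final rows are returned as well, so they can be compared with the figure.

```python
def bpr_search(pattern: str, text: str, k: int) -> tuple[list[int], list[int]]:
    """Row-wise bit-parallel NFA simulation; returns (end positions, final rows)."""
    m = len(pattern)
    assert 0 <= k < m
    full = (1 << m) - 1                  # keep every row within m bits
    top = 1 << (m - 1)                   # 1 0^{m-1}: the final state
    B: dict[str, int] = {}
    for j, ch in enumerate(pattern):
        B[ch] = B.get(ch, 0) | (1 << j)  # B[p_j] |= 0^{m-j} 1 0^{j-1}
    R = [(1 << i) - 1 for i in range(k + 1)]   # R_i = 0^{m-i} 1^i
    ends = []
    for pos, c in enumerate(text, 1):
        mask = B.get(c, 0)
        old = R[0]                       # old R_{i-1}
        new = ((old << 1) | 1) & mask    # new R_{i-1}
        R[0] = new
        for i in range(1, k + 1):
            new = (((R[i] << 1) & mask) | old | ((old | new) << 1) | 1) & full
            old, R[i] = R[i], new
        if new & top:                    # final state of row k is active
            ends.append(pos)
    return ends, R


ends, R = bpr_search("annual", "anneal", 2)
print(ends, [format(r, "06b") for r in R])  # [5, 6] ['000000', '100011', '110111']
```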
Diagonal-wise bit-parallel for the NFA (BPD)
Pack the states diagonally: the depth of the first active state in each diagonal is represented in unary (needing $k + 1$ bits), and these fields are combined into one bit-vector. A '0' bit is needed as a separator between the fields, so the total length of the vector is $(m - k)(k + 2)$ bits. The update formulas when reading the $j$-th text character $t_j$ are:
$$D'_i \leftarrow \min\big(D_i + 1,\ D_{i+1} + 1,\ g(i-1, t_j)\big), \qquad g(i, c) = \min\big(\{k+1\} \cup \{\, r \mid r \ge D_i \text{ and } p_{i+1+r} = c \,\}\big)$$
[Figure: the NFA for $P$ = annual with 2 errors, partitioned into diagonals $D_0, \ldots, D_4$ (rows: no error, 1 error, 2 errors). The bit-vector is $D = 0\,D_1\,0\,D_2\,0\,D_3\,0\,D_4$, where each $D_i$ occupies $k + 1$ bits in unary.]
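The masks B[c] are bit-masks like those of Shift-Or (0 marks a match).
In the formula for $D'_i$, the 1st term is for substitution, the 2nd for insertion, and the 3rd for match.
$D_i = k + 1 = 3$ = [111] if diagonal $i$ has no active state (here $k = 2$).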
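The recurrence for $D'_i$ can be tried out without the bit packing. The Python sketch below (the name diagonal_search is hypothetical) keeps each $D_d$ as a plain integer, with $k + 1$ meaning "no active state", and keeps all diagonals $d = 0, \ldots, m$ (diagonal $d$ holds the states whose column minus error level equals $d$) instead of only the $m - k$ fields packed by BPD. It illustrates the min-of-three-terms update, not BPD's word-level tricks. An occurrence is reported when the state with $k$ errors in the last column, which lies on diagonal $m - k$, is active.

```python
def diagonal_search(pattern: str, text: str, k: int) -> list[int]:
    """Diagonal-wise NFA simulation with integer D[d] (not bit-parallel).

    D[d] = smallest error level r with state (row r, column d + r) active;
    every deeper level on the same diagonal is then active too (deletion edges).
    """
    m = len(pattern)
    assert 0 <= k < m
    NONE = k + 1
    D = [0] + [NONE] * m                 # before reading: only diagonal 0 is active
    ends = []
    for pos, c in enumerate(text, 1):
        new = [0] * (m + 1)              # diagonal 0 stays active (initial self-loop)
        for d in range(1, m + 1):
            limit = min(k, m - d)        # deepest error level still inside the NFA
            best = D[d] + 1              # substitution: same diagonal, one level down
            if d < m:
                best = min(best, D[d + 1] + 1)    # insertion: from diagonal d + 1
            for r in range(D[d - 1], limit + 1):  # match g(d - 1, c): from diagonal d - 1
                if pattern[d + r - 1] == c:
                    best = min(best, r)
                    break
            new[d] = best if best <= limit else NONE
        D = new
        if D[m - k] <= k:                # state (k errors, column m) is active
            ends.append(pos)
    return ends


print(diagonal_search("annual", "annealing", 2))  # [5, 6, 7]
```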
Pseudo code of BPD
BPD (P = p1p2…pm, T = t1t2…tn, k)   /* b^{n}: bit b repeated n times; ⊕: bitwise XOR */
  Preprocessing:
    For c ∈ Σ Do B[c] ← 1^{m}
    For j ∈ 1…m Do B[p_j] ← B[p_j] & 1^{m−j} 0 1^{j−1}
    For c ∈ Σ Do
      BB[c] ← 0 s_{k+1}(B[c], 0) 0 s_{k+1}(B[c], 1) … 0 s_{k+1}(B[c], m−k−1)
    End of for
  Searching:
    D ← (0 1^{k+1})^{m−k}
    For pos ∈ 1…n Do
      x ← (D >> (k+2)) | BB[t_pos]
      D ← ((D << 1) | (0^{k+1} 1)^{m−k})
          & ((D << (k+3)) | (0^{k+1} 1)^{m−k−1} 0 1^{k+1})
          & (((x + (0^{k+1} 1)^{m−k}) ⊕ x) >> 1) & (0 1^{k+1})^{m−k}
      If D & 0^{(m−k−1)(k+2)} 0 1 0^{k} = 0^{(m−k)(k+2)} Then
        Report an occurrence at pos
        D ← D | 0^{(m−k−1)(k+2)} 0 1^{k+1}
      End of if
    End of for
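In the update of $D$ in the pseudocode, the three masked terms correspond to $D_i + 1$, $D_{i+1} + 1$, and $g(i-1, t_{pos})$, respectively; the OR executed after reporting an occurrence cleans up the last diagonal.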
Filtering approach: Pattern division method
Idea of the filtering approach: it is easier to say "here is not an occurrence" than "here is an occurrence" → find candidates rapidly, then examine them in detail! This improves the average complexity. In practice, it works well when the error rate $\alpha = k/m$ is small.
Pattern division method: divide the pattern into $k + 1$ pieces. If $P$ occurs with at most $k$ errors, at least one piece appears in the occurrence without any error, since each edit operation can affect at most one piece. Then, find the pieces with a fast multiple pattern matching algorithm. When a piece is found, run an ordinary approximate string matching algorithm (such as DP) over the neighborhood of the occurrence to check whether the whole pattern matches.
Example (k = 2):
Pattern: TAAATCACGGCATACT
k + 1 pieces: TAAAT, CACGG, CATACT
Text: ACCCTGTTTAGATCACGGCACTACTGTAAAC
S. Wu and U. Manber. Fast text searching allowing errors. Communications of the ACM, 35(10): 83-91, 1992.
(Multiple pattern matching: e.g., multiple Shift-And or Set Horspool)
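A minimal Python sketch of the pattern division filter, under stated simplifications: str.find stands in for a multiple pattern matching algorithm such as multiple Shift-And or Set Horspool, Sellers' DP is used for verification, and the last piece absorbs the remainder when $m$ is not divisible by $k + 1$ (as in the TAAAT / CACGG / CATACT split). Both function names are hypothetical. The verification window comes from the piece's offset off in the pattern: an occurrence that contains the piece unchanged at text position pos starts at most $k$ positions away from pos − off.

```python
def approx_ends(pattern: str, window: str, k: int) -> list[int]:
    """Sellers' DP over a text window; returns 1-based end positions with <= k errors."""
    col = list(range(len(pattern) + 1))
    ends = []
    for j, c in enumerate(window, 1):
        diag, col[0] = col[0], 0
        for i in range(1, len(pattern) + 1):
            old = col[i]
            col[i] = diag if pattern[i - 1] == c else 1 + min(diag, old, col[i - 1])
            diag = old
        if col[-1] <= k:
            ends.append(j)
    return ends


def pattern_division_search(pattern: str, text: str, k: int) -> list[int]:
    m, n = len(pattern), len(text)
    assert 0 <= k < m, "need at least one character per piece"
    plen = m // (k + 1)
    found = set()
    for idx in range(k + 1):
        off = idx * plen
        # the last piece takes the remainder
        piece = pattern[off:] if idx == k else pattern[off:off + plen]
        pos = text.find(piece)           # exact search (stand-in for a multipattern matcher)
        while pos != -1:
            lo = max(0, pos - off - k)   # neighborhood that can hold a whole occurrence
            hi = min(n, pos - off + m + k)
            found.update(lo + e for e in approx_ends(pattern, text[lo:hi], k))
            pos = text.find(piece, pos + 1)
    return sorted(found)


print(pattern_division_search("annual", "annealing", 2))  # [5, 6, 7]
```

On the example above, the candidate CACGG at text position 14 leads to an occurrence ending at position 25 (TAGATCACGGCACTACT, one substitution and one insertion).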
Speeding-up by hierarchical verification (PEX)
Verifying candidates hierarchically can reduce the processing time. Assume that the number of pieces is $j = k + 1 = 2^r$. Halve the pattern, allowing $\lfloor k/2 \rfloor$ errors for each half, and repeat the division recursively until each piece allows 0 errors. Find the pieces with a multiple pattern matching algorithm, and then verify the candidates hierarchically, from the smallest enclosing piece up to the whole pattern.
CreateTree (p = p_i…p_j, k, myParent, idx, plen)
  Create new node
  from(node) ← i
  to(node) ← j
  left ← ⌈(k+1)/2⌉
  parent(node) ← myParent
  err(node) ← k
  If k = 0 Then leaf_idx ← node
  Else
    CreateTree(p_i…p_{i+left·plen−1}, ⌊left·k/(k+1)⌋, node, idx, plen)
    CreateTree(p_{i+left·plen}…p_j, ⌊(k+1−left)·k/(k+1)⌋, node, idx+left, plen)
  End of if
G. Navarro and R. Baeza-Yates. Very fast and simple approximate string matching. Information Processing Letters, 72:65-70, 1999.
[Figure: the pattern aaabbbcccddd with $k = 3$ errors is halved into aaabbb and cccddd with $k = 1$ error each, and then into aaa, bbb, ccc, ddd with $k = 0$ errors.]
Make padding when $k + 1$ does not match $2^r$.
Pseudo code of PEX
PEX (P = p1p2…pm, T = t1t2…tn, k)
  Preprocessing:
    CreateTree(P, k, θ, 0, ⌊m/(k+1)⌋)
    Preprocess multipattern search for
      {p_from(node)…p_to(node) | node = leaf_i, i ∈ {0…k}}
  Searching:
    For (pos, i) ∈ output of multipattern search Do
      node ← leaf_i
      in ← from(node)
      node ← parent(node)
      cand ← TRUE
      While cand = TRUE and node ≠ θ Do
        p1 ← pos − (in − from(node)) − err(node)
        p2 ← pos + (to(node) − in + 1) + err(node)
        Verify text area T_{p1…p2} for pattern piece p_from(node)…p_to(node)
          allowing err(node) errors
        If pattern piece was not found Then cand ← FALSE
        Else node ← parent(node)
      End of while
      If cand = TRUE Then
        Report the positions where the whole P was found
      End of if
    End of for
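A compact Python sketch combining CreateTree with the hierarchical verification loop. Assumptions: Node, create_tree, approx_ends and pex_search are hypothetical names; pattern indices are 0-based and half-open instead of the 1-based from/to above; str.find again replaces the multipattern search; and Sellers' DP verifies each text area. The window bounds lo/hi play the role of p1/p2 in the pseudocode, and only the verification at the root produces the reported end positions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    lo: int                  # first pattern index of the piece (0-based)
    hi: int                  # one past the last pattern index
    err: int                 # errors allowed for this piece
    parent: Optional["Node"]


def create_tree(lo, hi, k, parent, idx, plen, leaves):
    """Mirror of CreateTree: split p[lo:hi] until every piece allows 0 errors."""
    node = Node(lo, hi, k, parent)
    if k == 0:
        leaves[idx] = node
    else:
        left = (k + 2) // 2                   # ceil((k + 1) / 2) leaves go left
        mid = lo + left * plen
        create_tree(lo, mid, left * k // (k + 1), node, idx, plen, leaves)
        create_tree(mid, hi, (k + 1 - left) * k // (k + 1), node, idx + left, plen, leaves)
    return node


def approx_ends(piece, window, k):
    """Sellers' DP: 1-based end positions in window where piece matches with <= k errors."""
    col = list(range(len(piece) + 1))
    ends = []
    for j, c in enumerate(window, 1):
        diag, col[0] = col[0], 0
        for i in range(1, len(piece) + 1):
            old = col[i]
            col[i] = diag if piece[i - 1] == c else 1 + min(diag, old, col[i - 1])
            diag = old
        if col[-1] <= k:
            ends.append(j)
    return ends


def pex_search(pattern, text, k):
    m, n = len(pattern), len(text)
    assert 0 <= k < m
    leaves = [None] * (k + 1)
    create_tree(0, m, k, None, 0, m // (k + 1), leaves)
    found = set()
    for leaf in leaves:
        piece = pattern[leaf.lo:leaf.hi]
        pos = text.find(piece)                # stand-in for the multipattern search
        while pos != -1:
            ends = [pos + m]                  # k = 0: the leaf is the whole pattern
            node, cand = leaf.parent, True
            while cand and node is not None:  # verify ancestors from the bottom up
                lo = max(0, pos - (leaf.lo - node.lo) - node.err)
                hi = min(n, pos + (node.hi - leaf.lo) + node.err)
                local = approx_ends(pattern[node.lo:node.hi], text[lo:hi], node.err)
                ends = [lo + e for e in local]
                cand = bool(ends)
                node = node.parent
            if cand:                          # the root (whole pattern) was verified
                found.update(ends)
            pos = text.find(piece, pos + 1)
    return sorted(found)


print(pex_search("annual", "annealing", 2))  # [5, 6, 7]
```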
Filtering approach: BNDM method (ABNDM)
Construct an NFA that accepts any factor of $P^R$ (the reverse of $P$) with up to $k$ errors → an extension of BNDM.
The NFA can tell whether the input read so far is a prefix of $P^R$ with $k$ errors. BNDM runs faster than BM when the alphabet is small enough. We can quickly extract candidate positions with this NFA, and it can skip several text positions, like BNDM.
For texts over a small alphabet, such as DNA sequences, ABNDM runs faster than PEX.
[Figure: the NFA used by ABNDM for $P$ = annual with 2 errors (rows: no error, 1 error, 2 errors), with $\varepsilon$-transitions from the initial state into every column so that any factor can be recognized.]
G. Navarro and R. Baeza-Yates. Very fast and simple approximate string matching. Information Processing Letters, 72:65-70, 1999.
Pseudo code of ABNDM
ABNDM (P = p1p2…pm, T = t1t2…tn, k)
  Preprocessing:
    For c ∈ Σ Do B[c] ← 0^{m}
    For j ∈ 1…m Do B[p_j] ← B[p_j] | 0^{m−j} 1 0^{j−1}
  Searching:
    pos ← 0
    While pos ≤ n − (m − k) Do
      j ← m − k − 1, last ← m − k − 1
      R_0 ← B[t_{pos+m−k}]
      newR ← 1^{m}
      For i ∈ 1…k Do R_i ← newR
      While newR ≠ 0^{m} and j ≠ 0 Do
        oldR ← R_0
        newR ← (oldR << 1) & B[t_{pos+j}]
        R_0 ← newR
        For i ∈ 1…k Do
          newR ← ((R_i << 1) & B[t_{pos+j}]) | oldR | ((oldR | newR) << 1)
          oldR ← R_i, R_i ← newR
        End of for
        j ← j − 1
        If newR & 1 0^{m−1} ≠ 0^{m} Then   /* prefix recognized */
          If j > 0 Then last ← j
          Else check a possible occurrence starting at pos + 1
        End of if
      End of while
      pos ← pos + last
    End of while
Summary
What is approximate string matching?
It is the problem of finding substrings that match $P$ within edit distance $k$.
Dynamic programming approach:
$O(mn)$ time and $O(m)$ space → can be improved to $O(kn)$ on average (DP)
NFA approach:
Construct an NFA that accepts $P$ with $k$ errors → translate it into a corresponding DFA and then simulate it
Bit-parallel simulation of the NFA:
Row-wise (BPR): $O(k\lceil m/w \rceil n)$ time
Diagonal-wise (BPD): $O(\lceil k(m-k)/w \rceil n)$ time
Filtering approach:
Finds occurrences without examining most of the text in detail
Pattern division method (PEX), BNDM method (ABNDM)
The next theme:
Regular expression matching: for flexible and convenient keyword searching