Data Structures and Algorithms (Part II)
2007/05/28 Range Minima Query
呂學一 (Hsueh-I Lu)
http://www.csie.ntu.edu.tw/~hil/

Today: adding wings to a tiger
A fundamental query that significantly strengthens suffix trees: Range Minima Query (RMQ).
– Front wing: RMQ for ±sequences.
– Rear wing: RMQ for general sequences.
Road map
– ±RMQ → LCA
– LCA → LCE and RMQ for general sequences
– RMQ → document listing
– LCE → wildcard matching and fuzzy matching

RMQ: Range Minima Query
S: a sequence of numbers.
小 (S, i, j) = k (小 means "small") if
– i ≤ k ≤ j, and
– S[k] = min(S[i], S[i+1], …, S[j]).
Example:
i = 1 2 3 4 5 6 7 8 9
S = 3 4 0 1 4 1 9 3 2
小 (S, 2, 6) = 3. 小 (S, 4, 9) = 4 (or 6).

The RMQ challenge
Input: a sequence S of numbers.
Output: a data structure D for S.
Time complexity:
– Constant query time: each query 小 (S, i, j) for S can be answered from D and S in O(1) time.
– Linear preprocessing time: D can be computed in O(|S|) time.

Naïve approach
Storing the answer of 小 (S, i, j) in a table for all index pairs i and j with 1 ≤ i ≤ j ≤ |S|.
Query time = O(1). Preprocessing time = Ω(|S|²).
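
The naïve table can be sketched as follows. This is only an illustration under assumptions: the function name `build_naive_table` is hypothetical, indices are 1-based as on the slides, and ties are broken toward the leftmost position.

```python
def build_naive_table(S):
    """Return table[i][j] = 1-based index of a minimum of S[i..j], for 1 <= i <= j <= n."""
    n = len(S)
    # (n + 1) x (n + 1) so that 1-based indices can be used directly; row/column 0 is unused.
    table = [[0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best = i
        for j in range(i, n + 1):
            # Extend the range by one position and keep the leftmost minimum.
            if S[j - 1] < S[best - 1]:
                best = j
            table[i][j] = best
    return table


S = [3, 4, 0, 1, 4, 1, 9, 3, 2]
table = build_naive_table(S)
print(table[2][6])  # 3
print(table[4][9])  # 4
```

Every query is then a single table lookup, but the table alone has Θ(|S|²) entries, which is why the preprocessing cannot be linear.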

Faster Preprocessing
Assumption (without loss of generality)
– |S| = 2^k for some positive integer k.
Idea:
– Precomputing the values of 小 (S, i, j) only for those indices i and j with j – i + 1 = 1, 2, 4, 8, …, 2^k = |S|.
Preprocessing time
– O(|S| log |S|).

小 (S, i, j) still in O(1) time
Let k be the (unique) integer that satisfies 2^k ≤ j – i + 1 < 2^(k+1).
Then, 小 (S, i, j) is
– x = 小 (S, i, i + 2^k – 1)
or
– y = 小 (S, j – 2^k + 1, j),
whichever has the smaller value in S.
[Figure: the two length-2^k windows [i, i + 2^k – 1] and [j – 2^k + 1, j] overlap and together cover [i, j].]
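
The power-of-two table and the two-window query can be sketched as follows. The names `build_sparse_table` and `range_min_index` are hypothetical, the table is stored 0-based internally while queries use the slides' 1-based indices, and `int.bit_length` is assumed to count as a constant-time operation.

```python
def build_sparse_table(S):
    """table[k][i] = 0-based index of a minimum of S[i .. i + 2**k - 1]."""
    n = len(S)
    table = [list(range(n))]              # windows of length 1
    k = 1
    while (1 << k) <= n:
        prev, half = table[k - 1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            # A window of length 2**k is the union of two windows of length 2**(k-1).
            a, b = prev[i], prev[i + half]
            row.append(a if S[a] <= S[b] else b)
        table.append(row)
        k += 1
    return table


def range_min_index(S, table, i, j):
    """1-based index of a minimum of S[i..j], for 1 <= i <= j <= len(S)."""
    k = (j - i + 1).bit_length() - 1      # 2**k <= j - i + 1 < 2**(k + 1)
    x = table[k][i - 1]                   # window starting at i
    y = table[k][j - (1 << k)]            # window ending at j
    return (x if S[x] <= S[y] else y) + 1


S = [3, 4, 0, 1, 4, 1, 9, 3, 2]
table = build_sparse_table(S)
print(range_min_index(S, table, 2, 6))  # 3
```

This sketch does not need |S| to be a power of two; rows simply stop once the window length exceeds |S|.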

As a result
RMQ
– Input: O(n) numbers
– Preprocessing: O(n log n) time
– Query: O(1) time
RMQ
– Input: O(n/log n) numbers
– Preprocessing: O(n) time, since O((n/log n) · log n) = O(n)
– Query: O(1) time

Front wing
The RMQ challenge for ±sequences

±sequences
S is a ±sequence if S[i] – S[i – 1] = ±1 for each index i with 2 ≤ i ≤ |S|.
For example,
– S = 5 6 5 4 3 2 3 2 3 4 5 6 5 6 7, with signs + - - - - + - + + + + - + +
– S = 3 4 3 2 1 0 -1 -2 -1 0 1 2 1, with signs + - - - - - - + + + + -

Front wing: The RMQ challenge for ±sequences
Input: a ±sequence S of numbers.
Output: a data structure D for S.
Time complexity:
– Constant query time: each query 小 (S, i, j) for S can be answered from D and S in O(1) time.
– Linear preprocessing time: D can be computed in O(|S|) time under the unit-cost RAM model.

Unit-cost RAM model
Operations such as addition, subtraction, and comparison on O(log n) consecutive bits can be performed in O(1) time.

Idea: compression
Breaking S into blocks of length L = ½ log |S|. (Any constant c < 1 in place of ½ is OK.)
– There are B = 2|S|/log |S| blocks.
Let 縮 [t] (縮 means "compressed") be the minimum of the t-th block of S.
– 縮 [t] = min {S[j] | (t – 1) L < j ≤ tL} for t = 1, 2, …, B.
– Computable in O(|S|) time.
RMQ on 縮 : 小 ( 縮 , x, y)
– O(1) query time.
– O(|S|) preprocessing time. (Why?)

小 (S, i, j) via 小 ( 縮 , s, t)
Suppose S[i] is in the α-th block of S.
– (α–1) L < i ≤ αL.
Suppose S[j] is in the γ-th block of S.
– (γ–1) L < j ≤ γL.
Let β = 小 ( 縮 , α+1, γ–1), which is defined when α+1 ≤ γ–1.
If α < γ, then 小 (S, i, j) is one of
– 小 (S, i, αL)
– 小 (S, (γ–1)L + 1, j)
– 小 (S, (β–1)L + 1, βL)
Note that each of these three is a query within a length-L block. If α = γ, then 小 (S, i, j) is itself a query within a block.
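
The three-candidate combination can be sketched as follows. This is an illustration only: `build_blocks`, `scan_min`, and `query` are hypothetical names, and `scan_min` is a linear scan standing in for both the O(1) in-block query and the O(1) RMQ on 縮, so the sketch shows the decomposition, not the final running time.

```python
import math


def build_blocks(S, L=None):
    """Split S into blocks of length L (default about (1/2) log2 |S|) and record each block minimum."""
    n = len(S)
    if L is None:
        L = max(1, int(0.5 * math.log2(n)))
    B = (n + L - 1) // L
    # block_min[t - 1] plays the role of the slides' compressed array entry for block t.
    block_min = [min(S[t * L:(t + 1) * L]) for t in range(B)]
    return L, block_min


def scan_min(X, i, j):
    """Leftmost 1-based index of a minimum of X[i..j]; a stand-in for an O(1) structure."""
    return min(range(i, j + 1), key=lambda t: X[t - 1])


def query(S, L, block_min, i, j):
    alpha = (i - 1) // L + 1              # (alpha - 1) L < i <= alpha L
    gamma = (j - 1) // L + 1              # (gamma - 1) L < j <= gamma L
    if alpha == gamma:
        return scan_min(S, i, j)          # the whole query lies inside one block
    candidates = [scan_min(S, i, alpha * L), scan_min(S, (gamma - 1) * L + 1, j)]
    if alpha + 1 <= gamma - 1:
        beta = scan_min(block_min, alpha + 1, gamma - 1)
        candidates.append(scan_min(S, (beta - 1) * L + 1, beta * L))
    return min(candidates, key=lambda t: S[t - 1])


S = [5, 6, 5, 4, 3, 2, 3, 2, 3, 4, 5, 6, 5, 6, 7]
L, block_min = build_blocks(S, L=4)
print(block_min)                       # [4, 2, 3, 5]
print(query(S, L, block_min, 2, 14))   # 6
```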

Illustration
[Figure: S split into length-L blocks, with i in block α, j in block γ, and β a block strictly between them, shown above the array 縮.]

小 (S, i, j) within a block
It remains to show how, with the help of some linear-time preprocessing, to answer 小 (S, i, j) in O(1) time for any indices i and j such that (t–1)L < i ≤ j ≤ tL for some positive integer t.

Difference sequence
The difference sequence 差 (差 means "difference") of S is defined as follows: 差 [i] = S[i+1] – S[i].
– Since S is a ±sequence, each 差 [i] = ±1.
– 小 (S, i, j) can be determined from 差 [i…j–1].
– The number of distinct patterns of a length-L difference sequence is exactly 2^L = |S|^½.

Preprocessing all patterns
Each row is a difference pattern 差 [1], 差 [2], …; each column is an index pair i,j within a block; each entry is the position 小 of the minimum (leftmost on ties).
pattern    1,1   …   2,2   2,3   2,4   2,5   …   L,L
+ + + +     1    …    2     2     2     2    …    L
+ + + –     1    …    2     2     2     2    …    L
+ + – +     1    …    2     2     2     2    …    L
+ + – –     1    …    2     2     2     5    …    L
+ – + +     1    …    2     3     3     3    …    L
+ – + –     1    …    2     3     3     3    …    L
+ – – +     1    …    2     3     4     4    …    L
+ – – –     1    …    2     3     4     5    …    L
…           1    …    2     3     …     …    …    L
– – – –     1    …    2     3     4     5    …    L
o(|S|) time.
– #rows = |S|^½
– #columns = ¼ log² |S|
– Each entry is computable in O(log |S|) time.
Answering each 小 (S, i, j) takes O(1) time.
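
The pattern table can be sketched as follows. The names `build_pattern_table`, `block_patterns`, and `in_block_min` are hypothetical. As an assumption of this sketch, a block of L elements is keyed by its L – 1 internal differences (so there are 2^(L–1) rows), patterns are stored as tuples rather than packed into machine words, and only full blocks are handled.

```python
from itertools import product


def build_pattern_table(L):
    """For every ± pattern, tabulate the offset of the leftmost minimum for each in-block pair."""
    table = {}
    for pattern in product((+1, -1), repeat=L - 1):
        # Rebuild the block up to an additive constant, starting at height 0.
        heights = [0]
        for d in pattern:
            heights.append(heights[-1] + d)
        entry = {}
        for i in range(L):
            best = i
            for j in range(i, L):
                if heights[j] < heights[best]:
                    best = j
                entry[(i, j)] = best      # 0-based offsets inside the block
        table[pattern] = entry
    return table


def block_patterns(S, L):
    """Difference pattern of every full block of S, computed once in O(|S|) time."""
    return [tuple(S[s + t + 1] - S[s + t] for t in range(L - 1))
            for s in range(0, len(S) - L + 1, L)]


def in_block_min(L, table, patterns, i, j):
    """1-based index of the leftmost minimum of S[i..j], with i and j in the same full block."""
    b = (i - 1) // L                      # 0-based block number
    start = b * L
    return start + table[patterns[b]][(i - 1 - start, j - 1 - start)] + 1


S = [5, 6, 5, 4, 3, 2, 3, 2, 3, 4, 5, 6, 5, 6, 7]
L = 5
table = build_pattern_table(L)
patterns = block_patterns(S, L)
print(len(table))                                # 16
print(in_block_min(L, table, patterns, 7, 10))   # 8
```

Because the table depends only on the pattern and not on the actual values, all blocks with the same pattern share one row.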
LCA: Lowest Common Ancestor
An application of RMQ for ±sequences
Lowest Common Ancestor
T is a rooted tree.
祖 (x, y) (祖 means "ancestor") is the lowest (i.e., deepest) node of T that is an ancestor of both node x and node y.

For example, …
[Figure: a rooted tree T on nodes 1 to 7. Node 1 is the root with children 2 and 7; node 2 has children 3 and 4; node 4 has children 5 and 6. In this tree, 祖 (3,6) = 2 and 祖 (5,7) = 1.]

The challenge for 祖 (x, y)
Input: an n-node rooted tree T.
Output: a data structure D for T.
Requirement:
– D can be computed in O(n) time.
– Each query 祖 (x, y) for T can be answered from D in O(1) time.

Idea: depth-first traversal
i = 1 2 3 4 5 6 7 8 9 10 11 12 13
V = 1 2 3 2 4 5 4 6 4 2 1 7 1 (the nodes in the order the traversal visits them)
L = 1 2 3 2 3 4 3 4 3 2 1 2 1 (the levels of those nodes)
Note that L is a ±sequence.
If V[i] = x and V[j] = y with i ≤ j, then 祖 (x, y) = V[ 小 (L, i, j)].

Idea: depth-first traversal (continued)
x = 1 2 3 4 5 6 7
I = 1 2 3 5 6 8 12
祖 (x, y) = V[ 小 (L, I[x], I[y])], assuming I[x] ≤ I[y].
O(n)-time preprocessing
– Computing V and L.
– Preprocessing L for queries 小 (L, i, j).
– Precomputing an array I such that V[I[x]] = x for each node x.
Query time is clearly O(1).

Example
– 祖 (3,6) = V[ 小 (L, I[3], I[6])] = V[ 小 (L, 3, 8)] = V[4] = 2.
– 祖 (5,7) = V[ 小 (L, I[5], I[7])] = V[ 小 (L, 6, 12)] = V[11] = 1.
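
The traversal and the resulting query can be sketched as follows. The names `euler_tour` and `lca` and the child-list dictionary are assumptions of this sketch, the arrays carry an unused slot 0 so that indices match the slides, and the range minimum on L is a linear scan standing in for the O(1) ±RMQ structure.

```python
def euler_tour(children, root):
    """Return V (visited nodes), L (their levels) and I (first visit index of each node)."""
    V, L, I = [None], [None], {}          # slot 0 is unused so that indices are 1-based

    def visit(node, level):
        I.setdefault(node, len(V))        # index at which node is about to be recorded
        V.append(node)
        L.append(level)
        for child in children.get(node, []):
            visit(child, level + 1)
            V.append(node)                # the traversal returns to node after each child
            L.append(level)

    visit(root, 1)
    return V, L, I


def lca(V, L, I, x, y):
    i, j = sorted((I[x], I[y]))
    # Stand-in for the O(1) range-minimum structure on the ±sequence L.
    k = min(range(i, j + 1), key=lambda t: L[t])
    return V[k]


children = {1: [2, 7], 2: [3, 4], 4: [5, 6]}
V, L, I = euler_tour(children, 1)
print(V[1:])  # [1, 2, 3, 2, 4, 5, 4, 6, 4, 2, 1, 7, 1]
print(L[1:])  # [1, 2, 3, 2, 3, 4, 3, 4, 3, 2, 1, 2, 1]
print(lca(V, L, I, 3, 6), lca(V, L, I, 5, 7))  # 2 1
```

The recursion is kept for brevity; a deep tree would need an explicit stack instead.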
LCE: Longest Common Extension
An application of LCA queries 祖 (i, j).
Longest Common Extension
Suppose A and B are two strings.
Let 延 (i, j) (延 means "extend") be the largest number d + 1 such that A[i…i+d] = B[j…j+d], or 0 if A[i] ≠ B[j].
Example
– A = a b a b b a
– B = b b a a b b b
– 延 (1,1) = 0, 延 (2,1) = 1,
– 延 (2,2) = 2, 延 (3,4) = 3.
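
As a reference point for the definition, a brute-force version can be sketched as follows. The name `lce` is hypothetical and the character-by-character scan is not the O(1) method the lecture is heading toward; it only pins down what the query returns.

```python
def lce(A, B, i, j):
    """Length of the longest common prefix of A[i..] and B[j..], with 1-based i and j."""
    d = 0
    while i - 1 + d < len(A) and j - 1 + d < len(B) and A[i - 1 + d] == B[j - 1 + d]:
        d += 1
    return d


A, B = "ababba", "bbaabbb"
print(lce(A, B, 1, 1), lce(A, B, 2, 1), lce(A, B, 2, 2), lce(A, B, 3, 4))  # 0 1 2 3
```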

The challenge for 延 (i, j)
Input: two strings A and B.
Objective: output a data structure D for A and B in O(|A|+|B|) time such that each query 延 (i, j) can be answered from D in O(1) time.

Idea: Suffix Tree for A#B$
x is the i-th leaf, i.e., the leaf for the suffix of A#B$ that starts at position i.
y is the (j+|A|+1)-st leaf.
The depth of 祖 (x, y), measured in characters along the path from the root, is exactly 延 (i, j).
[Figure: the string A#B$ with an A-suffix and a B-suffix marked, and the corresponding leaves x and y below 祖 (x, y).]
Wildcard Matching
An application of longest common extension 延 (i, j)
Wildcard Matching
Input: two strings P and S,
– where P has k wildcard characters ‘?’, each of which can match any character of S.
Output: all occurrences of P in S.

Naïve algorithm
Suppose S has t distinct characters.
Naïve algorithm:
Construct the suffix tree of S;
For each of the t^k possibilities of P do
    Output the occurrences of P in S;
Time complexity = Ω(|S| + t^k |P|).

Wildcard Matching via longest common extension
Suppose j_1 < j_2 < … < j_k are the indices such that
– P[j_1] = P[j_2] = … = P[j_k] = ‘?’.
Here 延 (a, b) compares S[a…] with P[b…]. P matches S[i…i+|P|–1] if and only if
– 延 (i, 1) ≥ j_1 – 1;
– 延 (i + j_1, j_1 + 1) ≥ j_2 – j_1 – 1;
– 延 (i + j_2, j_2 + 1) ≥ j_3 – j_2 – 1;
– …
– 延 (i + j_{k-1}, j_{k-1} + 1) ≥ j_k – j_{k-1} – 1; and
– 延 (i + j_k, j_k + 1) ≥ |P| – j_k.
[Figure: P, with its wildcard positions j_1, j_2, …, j_k between 1 and |P|, aligned against S starting at position i.]
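
The segment-by-segment test can be sketched as follows. The names `lce` and `wildcard_occurrences` are hypothetical, and `lce` is a brute-force stand-in for the O(1) query, so the sketch illustrates the conditions rather than the O(k|S|) bound.

```python
def lce(S, P, a, b):
    """Longest common extension of S[a..] and P[b..] (1-based); brute-force stand-in."""
    d = 0
    while a - 1 + d < len(S) and b - 1 + d < len(P) and S[a - 1 + d] == P[b - 1 + d]:
        d += 1
    return d


def wildcard_occurrences(S, P):
    """All 1-based positions i such that P, with '?' as a wildcard, matches S[i .. i+|P|-1]."""
    wild = [t for t in range(1, len(P) + 1) if P[t - 1] == "?"]   # j_1 < ... < j_k
    starts = [1] + [t + 1 for t in wild]      # first position of each wildcard-free segment
    ends = [t - 1 for t in wild] + [len(P)]   # last position of each segment
    result = []
    for i in range(1, len(S) - len(P) + 2):
        # P[s] is aligned with S[i + s - 1]; each segment must match in full.
        if all(lce(S, P, i + s - 1, s) >= e - s + 1 for s, e in zip(starts, ends)):
            result.append(i)
    return result


print(wildcard_occurrences("abcabcabd", "b?a"))  # [2, 5]
```

Each candidate position i costs k + 1 extension queries, one per wildcard-free segment of P.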

O(k|S|) time
– O(|P|+|S|) = O(|S|) time: preprocessing for supporting each 延 (i, j) query in O(1) time.
– O(|S|) iterations, each of which takes O(k) time.
Fuzzy Matching
Another application of longest common extension 延 (i, j).

Fuzzy Matching
Input: an integer k and two strings P and S.
Output: all “fuzzy occurrences” of P in S, where each “fuzzy occurrence” allows at most k mismatched characters.

Fuzzy occurrences
Whether P occurs in S[i…i+|P|–1] with k or fewer errors can be determined by…
– j = 延 (i, 1); error = 0;
– while (j < |P|)
    if (++error > k) then return “no”;
    j += 1 + 延 (i + j + 1, j + 2);
– return “yes”.
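
The loop can be sketched as follows. The names `lce`, `occurs_with_mismatches`, and `fuzzy_occurrences` are hypothetical, and `lce` is again a brute-force stand-in for the O(1) extension query between S and P.

```python
def lce(S, P, a, b):
    """Longest common extension of S[a..] and P[b..] (1-based); brute-force stand-in."""
    d = 0
    while a - 1 + d < len(S) and b - 1 + d < len(P) and S[a - 1 + d] == P[b - 1 + d]:
        d += 1
    return d


def occurs_with_mismatches(S, P, i, k):
    """True if P matches S[i .. i+|P|-1] with at most k mismatched characters."""
    j = lce(S, P, i, 1)        # number of characters of P consumed so far
    errors = 0
    while j < len(P):
        errors += 1            # P[j + 1] differs from S[i + j]
        if errors > k:
            return False
        # Skip the mismatched character, then jump over the next matching stretch.
        j += 1 + lce(S, P, i + j + 1, j + 2)
    return True


def fuzzy_occurrences(S, P, k):
    return [i for i in range(1, len(S) - len(P) + 2) if occurs_with_mismatches(S, P, i, k)]


print(fuzzy_occurrences("abcabdabc", "abc", 1))  # [1, 4, 7]
print(fuzzy_occurrences("abcabdabc", "abc", 0))  # [1, 7]
```

Each position performs at most k + 1 extension queries before it either accepts or gives up.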

O(k|S|) time
– O(|P|+|S|) = O(|S|) time: preprocessing for supporting each 延 (i, j) query in O(1) time.
– O(|S|) iterations, each of which takes O(k) time.

Rear wing: The RMQ (i.e., 小 (S, i, j)) challenge for general sequences
Another application of lowest common ancestor
The RMQ challenge
Input: a sequence S of numbers.
Output: a data structure D for S.
Time complexity:
– Constant query time: each query 小 (S, i, j) for S can be answered from D and S in O(1) time.
– Linear preprocessing time: D can be computed in O(|S|) time.
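
Idea: Minima Tree
i = 1 2 3 4 5 6 7 8 9
S = 4 3 2 4 1 7 3 6 3
小 (S, i, j) = 祖 (i, j), where 祖 is taken in the minima tree of S.
[Figure: the minima tree of S. The root is node 5, since S[5] = 1 is the minimum of S; its children are nodes 3 and 7, the roots of the minima trees of S[1…4] and S[6…9].]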

Exercise
Show how to construct a minima tree for any sequence S of numbers in O(|S|) time.
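
One possible approach to the exercise, not necessarily the intended one, is a left-to-right scan with a stack. The sketch below rests on assumptions: the names `build_minima_tree` and `lca_naive` are hypothetical, ties go to the leftmost position, and the ancestor walk is only a slow check that 祖 in this tree agrees with 小.

```python
def build_minima_tree(S):
    """Return (root, parent) with 1-based node indices; parent[root] = 0."""
    n = len(S)
    parent = [0] * (n + 1)
    stack = []                         # indices whose values are non-decreasing bottom to top
    for i in range(1, n + 1):
        last = 0
        while stack and S[stack[-1] - 1] > S[i - 1]:
            last = stack.pop()
        if last:
            parent[last] = i           # the popped chain becomes the left subtree of i
        if stack:
            parent[i] = stack[-1]      # i becomes the right child of the stack top
        stack.append(i)
    return stack[0], parent


def lca_naive(parent, x, y):
    """Lowest common ancestor by walking parent pointers (for checking only)."""
    ancestors = set()
    while x:
        ancestors.add(x)
        x = parent[x]
    while y not in ancestors:
        y = parent[y]
    return y


S = [4, 3, 2, 4, 1, 7, 3, 6, 3]
root, parent = build_minima_tree(S)
print(root)                                               # 5
print(lca_naive(parent, 2, 4), lca_naive(parent, 6, 9))   # 3 7
```

Every index is pushed once and popped at most once, so the scan takes O(|S|) time in total.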

Listing source strings that contain a pattern string [Muthukrishnan, SODA’02]
An application of RMQ for general sequences
The problem
Input:
– Strings S_1, S_2, …, S_m, which can be preprocessed in linear time.
– A string P.
Output:
– The index j of each S_j that contains P.

Preliminary attempts
Obtaining the suffix tree for S_1#S_2#…#S_m$.
– Find all occurrences of P, i.e., exact string matching for S_1#S_2#…#S_m$ and P.
– Time = O(|P| + total number of occurrences of P).
Obtaining the suffix tree for each S_i.
– Determine whether P occurs in S_i, i.e., the substring problem for each pair S_i and P.
– Time = O(|P|m).

The challenge
Input:
– Strings S_1, S_2, …, S_m, which can be preprocessed in linear time.
– A string P.
Output:
– The index j of each S_j that contains P.
Objective
– O(|P| + 現 (P)) time, where 現 (P) (現 means "appear") is the number of output indices.

The second attempt
Constructing the suffix tree for S_1#S_2#…#S_m$.
Keeping the distinct descendant leaf colors for each internal node.
Query time? Preprocessing time?

The second attempt (continued)
Each query takes O(|P| + 現 (P)) time. (Why?)
The preprocessing may need Ω(m |S_1#S_2#…#S_m$|) time. (Why?)
Q: Any suggestions for resolving this problem?

Compact representation
Keeping the list 彩 of leaf colors from left to right.
Each internal node keeps the indices of its leftmost and rightmost descendant leaves.
[Figure: a tree with leaves 1 to 8, whose colors form the list 彩; the internal nodes are labeled with their leaf ranges 1,8; 1,4; 5,8; 2,4; 3,4; 6,7; and 6,8.]

The challenge of listing distinct colors
Input: a sequence 彩 of colors (彩 means "color").
Output: a data structure D for 彩 such that
– D is computable in O(| 彩 |) time.
– Each query 顏 (i, j) = { 彩 [i], …, 彩 [j]} can be answered from D in O(| 顏 (i, j)|) time.

An auxiliary index array
i  = 1 2 3 4 5 6 7 8
彩 = (the colors are shown graphically on the slide)
前 = 0 0 0 2 3 1 5 6
Let 前 [i] = 0 (前 means "previous") if 彩 [j] ≠ 彩 [i] for all j < i.
Otherwise, let 前 [i] be the largest index j with j < i such that 彩 [i] = 彩 [j].
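
The array 前 can be computed in one pass, as sketched below. The name `previous_occurrence` is hypothetical, and since the slide's colors did not survive as text, the letters used here are made-up colors chosen only so that the result reproduces the 前 row shown above.

```python
def previous_occurrence(colors):
    """prev[i] = largest j < i with colors[j] == colors[i], or 0 if none (1-based; slot 0 unused)."""
    prev = [0] * (len(colors) + 1)
    last_seen = {}                         # color -> most recent 1-based position
    for i, c in enumerate(colors, start=1):
        prev[i] = last_seen.get(c, 0)
        last_seen[c] = i
    return prev


colors = list("abcbcaca")                  # hypothetical colors, not taken from the slide
print(previous_occurrence(colors)[1:])     # [0, 0, 0, 2, 3, 1, 5, 6]
```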

An observation
i  = 1 2 3 4 5 6 7 8
前 = 0 0 0 2 3 1 5 6
A color c is in 顏 (i, j) if and only if there is an index k in [i, j] such that
– 彩 [k] = c and 前 [k] < i.

The algorithm 解 (i, j)
Just call the recursive subroutine 破 (i, j, i);
Subroutine 破 (p, q, 左界 ), where 左界 means "left bound":
– If (p > q) then return;
– Let k = 小 ( 前 , p, q);
– If ( 前 [k] ≥ 左界 ) then return;
– Output 彩 [k];
– Call 破 (p, k – 1, 左界 );
– Call 破 (k + 1, q, 左界 );

解 (3,7) = 破 (3, 7, 3)…
i  = 1 2 3 4 5 6 7 8
前 = 0 0 0 2 3 1 5 6
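
The recursion for 解 (3,7) can be traced with the sketch below. It makes several assumptions: the names `list_colors`, `range_min`, and `split` are hypothetical, the letters are made-up colors consistent with the 前 row above, the range minimum is a linear scan standing in for the O(1) RMQ on 前, and the stopping test compares 前 [k] with the left bound.

```python
def list_colors(colors, prev, i, j):
    """Distinct colors among positions i..j (1-based), each reported exactly once."""
    out = []

    def range_min(p, q):
        # Stand-in for the O(1) range-minimum structure on prev.
        return min(range(p, q + 1), key=lambda t: prev[t])

    def split(p, q, left_bound):
        if p > q:
            return
        k = range_min(p, q)
        if prev[k] >= left_bound:
            return                     # every color in p..q already occurs earlier in i..j
        out.append(colors[k - 1])      # first occurrence of this color within i..j
        split(p, k - 1, left_bound)
        split(k + 1, q, left_bound)

    split(i, j, i)
    return out


colors = list("abcbcaca")              # hypothetical colors, not taken from the slide
prev = [0, 0, 0, 0, 2, 3, 1, 5, 6]     # slot 0 unused; entries 1..8 are the slide's row
print(list_colors(colors, prev, 3, 7))  # ['c', 'a', 'b']
```

In this trace the outputs come from positions 3, 6, and 4, which are exactly the positions k in [3, 7] with 前 [k] < 3.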

Time = O(| 顏 (i, j)|)
Why?