Data Structures and Algorithms (Part II)


description

Data Structures and Algorithms (Part II). 呂學一 (Hsueh-I Lu), http://www.csie.ntu.edu.tw/~hil/. Today: giving the tiger wings (如虎添翼). A fundamental query that significantly strengthens suffix trees: Range Minima Query (RMQ). Front wing (前翼): RMQ for ±sequences. Rear wing (後翼): RMQ for general sequences. Road map. Document listing. Wildcard matching.

Transcript of Data Structures and Algorithms (Part II)

Page 1: Data Structures and Algorithms (Part II)

2007/05/28 Range Minima Query 1

Data Structures and Algorithms (Part II)

呂學一 (Hsueh-I Lu)

http://www.csie.ntu.edu.tw/~hil/

Page 2: Data Structures and Algorithms (Part II)

Today: giving the tiger wings (如虎添翼)

A fundamental query that significantly strengthens suffix trees: Range Minima Query (RMQ)

– Front wing (前翼): RMQ for ±sequences.
– Rear wing (後翼): RMQ for general sequences.

Page 3: Data Structures and Algorithms (Part II)

Road map

±RMQ → LCA
LCA → RMQ (general sequences) → Document listing
LCA → LCE → Wildcard matching, Fuzzy matching

Page 4: Data Structures and Algorithms (Part II)

RMQ: Range Minima Query

S: a sequence of numbers.
小 (S, i, j) = k, where 小 means “minimum”, if
– i ≤ k ≤ j, and
– S[k] = min(S[i], S[i+1], …, S[j]).

Example:
index:  1 2 3 4 5 6 7 8 9
S    =  3 4 0 1 4 1 9 3 2

小 (S, 2, 6) = 3
小 (S, 4, 9) = 4 (or 6).

Page 5: Data Structures and Algorithms (Part II)

The RMQ challenge

Input: a sequence S of numbers.
Output: a data structure D for S.
Time complexity:
– Constant query time: each query 小 (S, i, j) for S can be answered from D and S in O(1) time.
– Linear preprocessing time: D can be computed in O(|S|) time.

Page 6: Data Structures and Algorithms (Part II)

Naïve approach

Storing the answer of 小 (S, i, j) in a table for all index pairs i and j with 1 ≤ i ≤ j ≤ |S|.

– Query time = O(1).
– Preprocessing time = Ω(|S|^2).

Page 7: Data Structures and Algorithms (Part II)

Faster preprocessing

Assumption (without loss of generality):
– |S| = 2^k for some positive integer k.

Idea:
– Precomputing the values of 小 (S, i, j) only for those indices i and j with j – i + 1 = 1, 2, 4, 8, …, 2^k = |S|.

Preprocessing time:
– O(|S| log |S|).

Page 8: Data Structures and Algorithms (Part II)

小 (S, i, j) still in O(1) time

Let k be the (unique) integer that satisfies 2^k ≤ j – i + 1 < 2^(k+1).

Then 小 (S, i, j) is either
– x = 小 (S, i, i + 2^k – 1)
or
– y = 小 (S, j – 2^k + 1, j),
whichever of S[x] and S[y] is smaller.

[Figure: the two length-2^k windows [i, i + 2^k – 1] and [j – 2^k + 1, j] overlap and together cover [i, j].]
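The two-window idea can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code: the function names `build_sparse_table` and `range_min_index` are made up here, indices are 0-based (the slides use 1-based), and the power-of-two assumption on |S| is not needed in this form.

```python
def build_sparse_table(S):
    """table[k][i] = index of a minimum of S[i .. i + 2**k - 1] (0-based)."""
    n = len(S)
    table = [list(range(n))]          # windows of length 1
    k = 1
    while (1 << k) <= n:
        prev, half = table[-1], 1 << (k - 1)
        row = []
        for i in range(n - (1 << k) + 1):
            # A window of length 2**k is two adjacent windows of length 2**(k-1).
            a, b = prev[i], prev[i + half]
            row.append(a if S[a] <= S[b] else b)
        table.append(row)
        k += 1
    return table                      # O(n log n) entries in total


def range_min_index(S, table, i, j):
    """Index of a minimum of S[i..j] (inclusive, 0-based) with two lookups."""
    k = (j - i + 1).bit_length() - 1  # 2**k <= j - i + 1 < 2**(k + 1)
    x = table[k][i]                   # window starting at i
    y = table[k][j - (1 << k) + 1]    # window ending at j
    return x if S[x] <= S[y] else y


S = [3, 4, 0, 1, 4, 1, 9, 3, 2]       # the example sequence from Page 4
table = build_sparse_table(S)
print(range_min_index(S, table, 1, 5) + 1)   # 1-based answer for the query (2, 6)
```

The two windows may overlap, which is harmless for minima: an element counted twice cannot change the minimum.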

Page 9: Data Structures and Algorithms (Part II)

As a result

RMQ
– Input: O(n) numbers
– Preprocessing: O(n log n) time
– Query: O(1) time

RMQ
– Input: O(n / log n) numbers
– Preprocessing: O(n) time
– Query: O(1) time

Page 10: Data Structures and Algorithms (Part II)

Front wing (前翼)

The RMQ challenge for ±sequences

Page 11: Data Structures and Algorithms (Part II)

±sequences

S is a ±sequence if S[i] – S[i – 1] = ±1 for each index i with 2 ≤ i ≤ |S|.

For example,
– S = 5 6 5 4 3 2 3 2 3 4 5 6 5 6 7
  with differences + – – – – + – + + + + – + +
– S = 3 4 3 2 1 0 -1 -2 -1 0 1 2 1
  with differences + – – – – – – + + + + –

Page 12: Data Structures and Algorithms (Part II)

Front wing (前翼): The RMQ challenge for ±sequences

Input: a ±sequence S of numbers.
Output: a data structure D for S.
Time complexity:
– Constant query time: each query 小 (S, i, j) for S can be answered from D and S in O(1) time.
– Linear preprocessing time: D can be computed in O(|S|) time under the unit-cost RAM model.

Page 13: Data Structures and Algorithms (Part II)

Unit-cost RAM model

Operations such as addition, subtraction, and comparison on O(log n) consecutive bits can be performed in O(1) time.

Page 14: Data Structures and Algorithms (Part II)

Idea: compression

Breaking S into blocks of length L = ½ log |S|. (Any constant c < 1 in place of ½ is OK.)
– There are B = 2|S| / log |S| blocks.

Let 縮 [t], where 縮 means “compressed”, be the minimum of the t-th block of S.
– 縮 [t] = min {S[j] | (t – 1)L < j ≤ tL} for t = 1, 2, …, B.
– Computable in O(|S|) time.

RMQ on 縮 : 小 ( 縮 , x, y)
– O(1) query time.
– O(|S|) preprocessing time. (Why?)

Page 15: Data Structures and Algorithms (Part II)

小 (S, i, j) via 小 ( 縮 , s, t)

Suppose S[i] is in the α-th block of S.
– (α – 1)L < i ≤ αL.
Suppose S[j] is in the γ-th block of S.
– (γ – 1)L < j ≤ γL.
Let β = 小 ( 縮 , α + 1, γ – 1).

小 (S, i, j) is one of
– 小 (S, i, αL)
– 小 (S, (γ – 1)L + 1, j)
– 小 (S, (β – 1)L + 1, βL)

Note that each of these three is a query within a length-L block. If γ ≤ α + 1, no block lies strictly between α and γ and the third candidate is dropped; if α = γ, the query itself lies within one block.
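The three-candidate decomposition can be sketched as below. This is only an illustration of how the pieces combine, under stated assumptions: indices are 0-based, the helper names `argmin`, `build_block_minima`, and `query` are hypothetical, and both the in-block queries and the query on 縮 are answered by plain scanning here, where the lecture uses O(1) lookups.

```python
def argmin(S, lo, hi):
    """Index of a minimum of S[lo..hi] by plain scanning (0-based, inclusive)."""
    return min(range(lo, hi + 1), key=S.__getitem__)


def build_block_minima(S, L):
    """The array 縮, stored as positions: entry t is where block t attains its minimum."""
    return [argmin(S, b, min(b + L, len(S)) - 1) for b in range(0, len(S), L)]


def query(S, L, block_min, i, j):
    """Index of a minimum of S[i..j], combining at most three in-block answers."""
    alpha, gamma = i // L, j // L          # blocks containing i and j
    if alpha == gamma:
        return argmin(S, i, j)             # the query lies within one block
    candidates = [
        argmin(S, i, (alpha + 1) * L - 1), # tail of block alpha
        argmin(S, gamma * L, j),           # head of block gamma
    ]
    if alpha + 1 <= gamma - 1:
        # Stand-in for the query on 縮 over blocks alpha+1 .. gamma-1.
        beta = min(range(alpha + 1, gamma), key=lambda t: S[block_min[t]])
        candidates.append(block_min[beta])
    return min(candidates, key=S.__getitem__)


S = [5, 6, 5, 4, 3, 2, 3, 2, 3, 4, 5, 6, 5, 6, 7]   # ±sequence from Page 11
L = 4
block_min = build_block_minima(S, L)
print(query(S, L, block_min, 1, 13))
```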

Page 16: Data Structures and Algorithms (Part II)

Illustration

[Figure: positions i and j in S, with the blocks α, β, and γ marked.]

Page 17: Data Structures and Algorithms (Part II)

小 (S, i, j) within a block

It remains to show how, with the help of some linear-time preprocessing, to answer 小 (S, i, j) in O(1) time for any indices i and j such that (t – 1)L < i ≤ j ≤ tL for some positive integer t.

Page 18: Data Structures and Algorithms (Part II)

Difference sequence

The difference sequence 差 of S, where 差 means “difference”, is defined as follows: 差 [i] = S[i+1] – S[i].
– Since S is a ±sequence, each 差 [i] = ±1.
– 小 (S, i, j) can be determined from 差 [i…j–1].
– The number of distinct patterns of a length-L difference sequence is exactly 2^L = |S|^½.

Page 19: Data Structures and Algorithms (Part II)

Preprocessing all patterns

pattern    (1,1)  …  (2,2)  (2,3)  (2,4)  (2,5)  …  (L,L)
+ + + +      1    …    2      2      2      2    …    L
+ + + –      1    …    2      2      2      2    …    L
+ + – +      1    …    2      2      2      2    …    L
+ + – –      1    …    2      2      2      5    …    L
+ – + +      1    …    2      3      3      3    …    L
+ – + –      1    …    2      3      3      3    …    L
+ – – +      1    …    2      3      4      4    …    L
+ – – –      1    …    2      3      4      5    …    L
…            …    …    …      …      …      …    …    …
– – – –      1    …    2      3      4      5    …    L

Each row is a difference pattern, each column is an in-block query (i, j), and each entry is the position of a minimum for that query.

o(|S|) time.
– #rows = |S|^½
– #columns = ¼ log^2 |S|
– Each entry is computable in O(log |S|) time.

Answering each 小 (S, i, j) takes O(1) time.
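A small sketch of the pattern table follows. The names `build_pattern_tables`, `block_mask`, and `in_block_min` are hypothetical, indices are 0-based, and a pattern is encoded as a bit mask, which is one possible encoding rather than necessarily the lecture's. A block of L elements has L – 1 differences, so this sketch tabulates 2^(L–1) patterns. For brevity the mask is recomputed at query time; a real implementation would store one mask per block during preprocessing so that a query is a single table lookup.

```python
def build_pattern_tables(L):
    """tables[mask][i][j] = offset of a minimum of positions i..j inside any
    block whose differences match mask (bit t set means S[t+1] - S[t] = +1)."""
    tables = []
    for mask in range(1 << (L - 1)):
        # Rebuild a representative block that starts at height 0.
        heights = [0]
        for t in range(L - 1):
            heights.append(heights[-1] + (1 if (mask >> t) & 1 else -1))
        table = [[0] * L for _ in range(L)]
        for i in range(L):
            best = i
            for j in range(i, L):
                if heights[j] < heights[best]:
                    best = j
                table[i][j] = best
        tables.append(table)
    return tables


def block_mask(S, start, L):
    """Bit mask of the L - 1 differences of the block S[start .. start + L - 1]."""
    mask = 0
    for t in range(L - 1):
        if S[start + t + 1] - S[start + t] == 1:
            mask |= 1 << t
    return mask


def in_block_min(S, tables, L, i, j):
    """Index of a minimum of S[i..j], where i and j lie in the same full block."""
    start = (i // L) * L
    table = tables[block_mask(S, start, L)]
    return start + table[i - start][j - start]


S = [5, 6, 5, 4, 3, 2, 3, 2, 3, 4, 5, 6, 5, 6, 7]   # ±sequence from Page 11
L = 5
tables = build_pattern_tables(L)
print(in_block_min(S, tables, L, 5, 9))
```

Two blocks with the same difference pattern differ only by a constant offset, so they share one table; that is why the number of tables depends on L and not on |S|.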

Page 20: Data Structures and Algorithms (Part II)

LCA: Lowest Common Ancestor

An application of RMQ for ±sequences

Page 21: Data Structures and Algorithms (Part II)

Road map (repeated from Page 3)

Page 22: Data Structures and Algorithms (Part II)

Lowest Common Ancestor

T is a rooted tree.
祖 (x, y), where 祖 means “ancestor”, is the lowest (i.e., deepest) node of T that is an ancestor of both node x and node y.

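Page 23: Data Structures and Algorithms (Part II)

For example, …

[Figure: a rooted tree on nodes 1 to 7. Node 1 is the root with children 2 and 7; node 2 has children 3 and 4; node 4 has children 5 and 6. The nodes 祖 (3, 6) = 2 and 祖 (5, 7) = 1 are marked.]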

Page 24: Data Structures and Algorithms (Part II)

The challenge for 祖 (x, y)

Input: an n-node rooted tree T.
Output: a data structure D for T.
Requirement:
– D can be computed in O(n) time.
– Each query 祖 (x, y) for T can be answered from D in O(1) time.

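Page 25: Data Structures and Algorithms (Part II)

Idea: depth-first traversal

index:  1  2  3  4  5  6  7  8  9 10 11 12 13
V    =  1  2  3  2  4  5  4  6  4  2  1  7  1
L    =  1  2  3  2  3  4  3  4  3  2  1  2  1

V lists the nodes in the order in which the traversal visits them, and L lists their depths.

If V[i] = x and V[j] = y, then 祖 (x, y) = V[ 小 (L, i, j)].

[Figure: the example tree with root 1; node 1 has children 2 and 7, node 2 has children 3 and 4, and node 4 has children 5 and 6.]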

Page 26: Data Structures and Algorithms (Part II)

Idea: depth-first traversal

index:  1  2  3  4  5  6  7  8  9 10 11 12 13
V    =  1  2  3  2  4  5  4  6  4  2  1  7  1
L    =  1  2  3  2  3  4  3  4  3  2  1  2  1

node:   1  2  3  4  5  6  7
I    =  1  2  3  5  6  8 12

祖 (x, y) = V[ 小 (L, I[x], I[y])], assuming I[x] ≤ I[y] (otherwise swap x and y).

O(n)-time preprocessing:
– Computing V and L.
– Preprocessing L for queries 小 (L, i, j). Note that L is a ±sequence.
– Precomputing an array I such that V[I[x]] = x for each node x.
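The preprocessing steps above can be sketched as follows. This is an illustrative version only: `euler_tour` and `lca` are hypothetical names, positions are 0-based (so each entry of I is one less than on the slide), the tree is given as a child-list dictionary, and the range-minimum step is a plain scan standing in for the O(1) ±RMQ structure.

```python
def euler_tour(children, root):
    """Return (V, L, I): visited nodes, their depths, and first positions."""
    V, L = [], []

    def visit(node, depth):
        V.append(node)
        L.append(depth)
        for child in children.get(node, []):
            visit(child, depth + 1)
            V.append(node)        # coming back to node after each child
            L.append(depth)

    visit(root, 1)
    I = {}
    for pos, node in enumerate(V):
        I.setdefault(node, pos)   # keep the first position of each node
    return V, L, I


def lca(V, L, I, x, y):
    """Lowest common ancestor of x and y via a range minimum on L."""
    i, j = sorted((I[x], I[y]))
    k = min(range(i, j + 1), key=L.__getitem__)   # stand-in for 小 (L, i, j)
    return V[k]


# The example tree: 1 has children 2 and 7, 2 has children 3 and 4,
# and 4 has children 5 and 6.
children = {1: [2, 7], 2: [3, 4], 4: [5, 6]}
V, L, I = euler_tour(children, 1)
print(V)
print(L)
print(lca(V, L, I, 3, 6), lca(V, L, I, 5, 7))
```

The recursion is fine for small trees; for deep trees an explicit stack would avoid Python's recursion limit.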

Page 27: Data Structures and Algorithms (Part II)

Idea: depth-first traversal

index:  1  2  3  4  5  6  7  8  9 10 11 12 13
V    =  1  2  3  2  4  5  4  6  4  2  1  7  1
L    =  1  2  3  2  3  4  3  4  3  2  1  2  1

node:   1  2  3  4  5  6  7
I    =  1  2  3  5  6  8 12

祖 (x, y) = V[ 小 (L, I[x], I[y])].

Query time is clearly O(1).

Page 28: Data Structures and Algorithms (Part II)

Example

index:  1  2  3  4  5  6  7  8  9 10 11 12 13
V    =  1  2  3  2  4  5  4  6  4  2  1  7  1
L    =  1  2  3  2  3  4  3  4  3  2  1  2  1

node:   1  2  3  4  5  6  7
I    =  1  2  3  5  6  8 12

祖 (x, y) = V[ 小 (L, I[x], I[y])].

– 祖 (3, 6) = V[ 小 (L, 3, 8)] = V[4] = 2.
– 祖 (5, 7) = V[ 小 (L, 6, 12)] = V[11] = 1.

Page 29: Data Structures and Algorithms (Part II)

LCE: Longest Common Extension

An application of LCA queries 祖 (i, j).

Page 30: Data Structures and Algorithms (Part II)

Road map (repeated from Page 3)

Page 31: Data Structures and Algorithms (Part II)

Longest Common Extension

Suppose A and B are two strings.
Let 延 (i, j), where 延 means “extend”, be the largest number d + 1 such that A[i…i+d] = B[j…j+d], or 0 if A[i] ≠ B[j].

Example:
– A = a b a b b a
– B = b b a a b b b
– 延 (1, 1) = 0, 延 (2, 1) = 1,
– 延 (2, 2) = 2, 延 (3, 4) = 3.
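To make the definition concrete, here is a direct character-by-character version of 延 that reproduces the four example values. It is only a reference implementation of the definition: the name `lce_naive` is made up for this sketch, it keeps the slides' 1-based positions, and it takes time proportional to the answer rather than O(1).

```python
def lce_naive(A, B, i, j):
    """Length of the longest common prefix of A[i...] and B[j...] (1-based)."""
    d = 0
    while i + d <= len(A) and j + d <= len(B) and A[i + d - 1] == B[j + d - 1]:
        d += 1
    return d


A = "ababba"
B = "bbaabbb"
print(lce_naive(A, B, 1, 1), lce_naive(A, B, 2, 1),
      lce_naive(A, B, 2, 2), lce_naive(A, B, 3, 4))
```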

Page 32: Data Structures and Algorithms (Part II)

The challenge for 延 (i, j)

Input: two strings A and B.
Objective: output a data structure D for A and B in O(|A| + |B|) time such that each query 延 (i, j) can be answered from D in O(1) time.

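Page 33: Data Structures and Algorithms (Part II)

Idea: Suffix tree for A#B$

x is the i-th leaf, i.e., the leaf for the suffix of A#B$ that starts at position i.
y is the (j + |A| + 1)-st leaf.
The depth of 祖 (x, y), measured as the length of the string spelled on the path from the root, is exactly 延 (i, j).

[Figure: the string A#B$ with an A-suffix ending at leaf x, a B-suffix ending at leaf y, and their lowest common ancestor 祖 (x, y).]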

Page 34: Data Structures and Algorithms (Part II)

Wildcard Matching

An application of longest common extension 延 (i, j)

Page 35: Data Structures and Algorithms (Part II)

Road map (repeated from Page 3)

Page 36: Data Structures and Algorithms (Part II)

Wildcard Matching

Input: two strings P and S,
– where P has k wildcard characters ‘?’, each of which can match any character of S.
Output: all occurrences of P in S.

Page 37: Data Structures and Algorithms (Part II)

Naïve algorithm

Suppose S has t distinct characters.
Naïve algorithm:
  Construct the suffix tree of S;
  For each of the t^k possibilities of P do
    Output the occurrences of P in S;

Time complexity = Ω(|S| + t^k |P|).

Page 38: Data Structures and Algorithms (Part II)

Wildcard matching via longest common extension

Suppose j_1 < j_2 < … < j_k are the indices such that
– P[j_1] = P[j_2] = … = P[j_k] = ‘?’.

P matches S[i…i+|P|–1] if and only if
– 延 (i, 1) ≥ j_1 – 1;
– 延 (i + j_1, j_1 + 1) ≥ j_2 – j_1 – 1;
– 延 (i + j_2, j_2 + 1) ≥ j_3 – j_2 – 1;
– …
– 延 (i + j_{k–1}, j_{k–1} + 1) ≥ j_k – j_{k–1} – 1; and
– 延 (i + j_k, j_k + 1) ≥ |P| – j_k.

Here 延 (a, b) compares S[a…] with P[b…].

[Figure: P, with the positions 1, j_1, j_2, …, j_k, and |P| marked, aligned under S starting at position i.]
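The conditions above amount to one longest-common-extension test per wildcard-free segment of P. The sketch below spells that out with 1-based positions as on the slides. The names `lce` and `wildcard_occurrences` are hypothetical, and `lce` is a naive scan standing in for the O(1) suffix-tree query, so this version illustrates the logic rather than the O(k|S|) running time.

```python
def lce(S, P, i, j):
    """Stand-in for the extension query: common prefix length of S[i...] and P[j...]."""
    d = 0
    while i + d <= len(S) and j + d <= len(P) and S[i + d - 1] == P[j + d - 1]:
        d += 1
    return d


def wildcard_occurrences(P, S, wildcard="?"):
    """All 1-based positions i such that P matches S[i .. i + |P| - 1]."""
    holes = [t for t, ch in enumerate(P, start=1) if ch == wildcard]
    # Wildcard-free segments of P, as (first, last) positions; a segment with
    # first > last is empty and is satisfied trivially.
    starts = [1] + [h + 1 for h in holes]
    ends = [h - 1 for h in holes] + [len(P)]
    result = []
    for i in range(1, len(S) - len(P) + 2):
        # Segment P[a..b] is aligned with S starting at position i + a - 1.
        if all(lce(S, P, i + a - 1, a) >= b - a + 1 for a, b in zip(starts, ends)):
            result.append(i)
    return result


print(wildcard_occurrences("a?b", "aabacbab"))
```

Each candidate position i costs k + 1 extension queries, which is where the O(k) per-position bound comes from once each query takes O(1) time.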

Page 39: Data Structures and Algorithms (Part II)

O(k|S|) time

– O(|P| + |S|) = O(|S|) time (assuming |P| ≤ |S|): preprocessing for supporting each 延 (i, j) query in O(1) time.
– O(|S|) iterations, each of which takes O(k) time.

Page 40: Data Structures and Algorithms (Part II)

Fuzzy Matching

Another application of longest common extension 延 (i, j).

Page 41: Data Structures and Algorithms (Part II)

Fuzzy Matching

Input: an integer k and two strings P and S.
Output: all “fuzzy occurrences” of P in S, where each “fuzzy occurrence” allows at most k mismatched characters.

Page 42: Data Structures and Algorithms (Part II)

Fuzzy occurrences

Whether P occurs in S[i…i+|P|–1] with k or fewer errors can be determined by…
– j = 延 (i, 1); error = 0;
– while (j < |P|)
    if (++error > k) then return “no”;
    j += 1 + 延 (i + j + 1, j + 2);
– return “yes”.
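The loop above translates almost line for line into Python. In this sketch the names `lce`, `occurs_with_mismatches`, and `fuzzy_occurrences` are hypothetical, positions are 1-based as on the slides, and `lce` is again a naive scan standing in for the O(1) query, so the code shows the jumping logic but not the O(k|S|) bound.

```python
def lce(S, P, i, j):
    """Stand-in for the extension query: common prefix length of S[i...] and P[j...]."""
    d = 0
    while i + d <= len(S) and j + d <= len(P) and S[i + d - 1] == P[j + d - 1]:
        d += 1
    return d


def occurs_with_mismatches(P, S, i, k):
    """True if P matches S[i .. i + |P| - 1] with at most k mismatched characters."""
    j = lce(S, P, i, 1)          # j = number of characters of P handled so far
    errors = 0
    while j < len(P):
        errors += 1              # P[j + 1] disagrees with S[i + j]
        if errors > k:
            return False
        # Skip the mismatch, then jump over the next run of matches.
        j += 1 + lce(S, P, i + j + 1, j + 2)
    return True


def fuzzy_occurrences(P, S, k):
    """All 1-based start positions of fuzzy occurrences of P in S."""
    return [i for i in range(1, len(S) - len(P) + 2)
            if occurs_with_mismatches(P, S, i, k)]


print(fuzzy_occurrences("abc", "abdabcaxc", 1))
```

Each pass through the loop consumes one mismatch, so the loop body runs at most k + 1 times per start position.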

Page 43: Data Structures and Algorithms (Part II)

O(k|S|) time

– O(|P| + |S|) = O(|S|) time (assuming |P| ≤ |S|): preprocessing for supporting each 延 (i, j) query in O(1) time.
– O(|S|) iterations, each of which takes O(k) time.

Page 44: Data Structures and Algorithms (Part II)

Rear wing (後翼): The RMQ (i.e., 小 (S, i, j)) challenge for general sequences

Another application of lowest common ancestor

Page 45: Data Structures and Algorithms (Part II)

Road map (repeated from Page 3)

Page 46: Data Structures and Algorithms (Part II)

The RMQ challenge

Input: a sequence S of numbers.
Output: a data structure D for S.
Time complexity:
– Constant query time: each query 小 (S, i, j) for S can be answered from D and S in O(1) time.
– Linear preprocessing time: D can be computed in O(|S|) time.

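Page 47: Data Structures and Algorithms (Part II)

Idea: Minima tree

index:  1 2 3 4 5 6 7 8 9
S    =  4 3 2 4 1 7 3 6 3

The root of the minima tree is the position of a minimum of S; its left and right subtrees are the minima trees of the parts of S to the left and to the right of that position.

小 (S, i, j) = 祖 (i, j).

[Figure: the minima tree of S on nodes 1 to 9, with root 5, whose children are 3 and 7.]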
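The identity between range minima and lowest common ancestors can be checked on the example with a deliberately naive sketch. The names `build_minima_tree`, `ancestors`, and `lca` are hypothetical, positions are 0-based (so the slide's root 5 is position 4), ties are broken toward the leftmost minimum, and the construction below is quadratic in the worst case; it is not the linear-time construction the lecture asks for.

```python
def build_minima_tree(S):
    """Parent map of the minima tree of S; the root maps to None. Not linear time."""
    parent = {}

    def build(lo, hi, up):
        if lo > hi:
            return
        root = min(range(lo, hi + 1), key=S.__getitem__)   # leftmost minimum
        parent[root] = up
        build(lo, root - 1, root)      # left part becomes the left subtree
        build(root + 1, hi, root)      # right part becomes the right subtree

    build(0, len(S) - 1, None)
    return parent


def ancestors(parent, x):
    """x, its parent, its grandparent, ..., up to the root."""
    path = []
    while x is not None:
        path.append(x)
        x = parent[x]
    return path


def lca(parent, x, y):
    """Lowest common ancestor by walking up from both nodes (naive)."""
    seen = set(ancestors(parent, x))
    return next(a for a in ancestors(parent, y) if a in seen)


S = [4, 3, 2, 4, 1, 7, 3, 6, 3]
parent = build_minima_tree(S)
print(lca(parent, 0, 3), lca(parent, 5, 8))
```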

Page 48: Data Structures and Algorithms (Part II)

Exercise

Show how to construct a minima tree for any sequence S of numbers in O(|S|) time.

Page 49: Data Structures and Algorithms (Part II)

Listing source strings that contain a pattern string [Muthukrishnan, SODA’02]

An application of RMQ for general sequences

Page 50: Data Structures and Algorithms (Part II)

Road map (repeated from Page 3)

Page 51: Data Structures and Algorithms (Part II)

The problem

Input:
– Strings S1, S2, …, Sm, which can be preprocessed in linear time.
– A string P.

Output:
– The index j of each Sj that contains P.

Page 52: Data Structures and Algorithms (Part II)

Preliminary attempts

Obtaining the suffix tree for S1#S2#…#Sm$.
– Find all occurrences of P, i.e., exact string matching for S1#S2#…#Sm$ and P.
– Time = O(|P| + total number of occurrences of P).

Obtaining the suffix tree for each Si.
– Determining whether P occurs in Si, i.e., the substring problem for each pair Si and P.
– Time = O(|P| m).

Page 53: Data Structures and Algorithms (Part II)

The challenge

Input:
– Strings S1, S2, …, Sm, which can be preprocessed in linear time.
– A string P.

Output:
– The index j of each Sj that contains P.

Objective:
– O(|P| + 現 (P)) time, where 現 (P) is the number of output indices ( 現 means “appear”).

Page 54: Data Structures and Algorithms (Part II)

The second attempt

Constructing the suffix tree for S1#S2#…#Sm$.

Keeping the distinct descendant leaf colors for each internal node, where the color of a leaf is the index j of the string Sj in which its suffix starts.

Query time? Preprocessing time?

Page 55: Data Structures and Algorithms (Part II)

The second attempt

Each query takes O(|P| + 現 (P)) time. (Why?)

The preprocessing may need Ω(m |S1#S2#…#Sm$|) time. (Why?)

Q: Any suggestions for resolving this problem?

Page 56: Data Structures and Algorithms (Part II)

Compact representation

Keeping the list 彩 of leaf colors from left to right ( 彩 means “color”).

Each internal node keeps the indices of its leftmost and rightmost descendant leaves.

[Figure: a tree whose leaves are numbered 1 to 8 from left to right; the internal nodes are labeled with the leaf ranges 1,8; 1,4; 5,8; 2,4; 3,4; 6,8; and 6,7.]

Page 57: Data Structures and Algorithms (Part II)

The challenge of listing distinct colors

Input: a sequence 彩 of colors.
Output: a data structure D for 彩 such that
– D is computable in O(| 彩 |) time.
– Each query 顏 (i, j) = { 彩 [i], …, 彩 [j]} can be answered from D in O(| 顏 (i, j)|) time.

Page 58: Data Structures and Algorithms (Part II)

An auxiliary index array

i       1 2 3 4 5 6 7 8
前 [i]  0 0 0 2 3 1 5 6

Let 前 [i] = 0 if 彩 [j] ≠ 彩 [i] for all j < i ( 前 means “previous”).
Otherwise, let 前 [i] be the largest index j with j < i such that 彩 [i] = 彩 [j].
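The array 前 can be filled in one left-to-right pass, as in the sketch below. The function name `previous_occurrence` is made up, and the color sequence is a hypothetical one chosen to be consistent with the 前 row shown on the slide, since the slide's own color row did not survive in this transcript. Positions are 1-based; the returned list stores 前 [t] at list index t – 1.

```python
def previous_occurrence(colors):
    """前 as a list: entry t - 1 holds the previous position of colors[t - 1], or 0."""
    last_seen = {}                    # color -> latest 1-based position seen
    prev = []
    for index, color in enumerate(colors, start=1):
        prev.append(last_seen.get(color, 0))
        last_seen[color] = index
    return prev


# Hypothetical colors (an assumption, not from the slide) matching the row above.
colors = ["b", "g", "r", "g", "r", "b", "r", "b"]
print(previous_occurrence(colors))
```

With a hash map this takes expected O(| 彩 |) time; when colors are small integers, an array indexed by color gives the same bound deterministically.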

Page 59: Data Structures and Algorithms (Part II)

An observation

i       1 2 3 4 5 6 7 8
前 [i]  0 0 0 2 3 1 5 6

A color c is in 顏 (i, j) if and only if there is an index k in [i, j] such that
– 彩 [k] = c and 前 [k] < i.

Such an index k is unique for each color c: it is the first occurrence of c in [i, j].

Page 60: Data Structures and Algorithms (Part II)

The algorithm 解 (i, j)

Just call the recursive subroutine 破 (i, j, i), where 解 means “solve”, 破 means “break”, and 左界 means “left bound”.

Subroutine 破 (p, q, 左界 ):
– If (p > q) then return;
– Let k = 小 ( 前 , p, q);
– If ( 前 [k] ≥ 左界 ) then return;
– Output 彩 [k];
– Call 破 (p, k – 1, 左界 );
– Call 破 (k + 1, q, 左界 );
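The subroutine can be sketched in Python as follows. The names `list_colors` and `crack` are hypothetical, the color sequence is an assumed example consistent with the 前 row on Page 58 (the slide's color row is not in this transcript), positions are 1-based, and the range minimum on 前 is a plain scan standing in for the O(1) RMQ structure. The stopping test compares the value 前 [k], not the index k, against the left bound.

```python
def list_colors(colors, prev, i, j):
    """Distinct colors among colors[i..j] (1-based, inclusive), each reported once."""
    out = []

    def crack(p, q, left_bound):
        if p > q:
            return
        # Stand-in for the range-minimum query on 前 over [p, q].
        k = min(range(p, q + 1), key=lambda t: prev[t - 1])
        if prev[k - 1] >= left_bound:
            return                    # every color in [p, q] was reported already
        out.append(colors[k - 1])     # position k is a first occurrence within [i, j]
        crack(p, k - 1, left_bound)
        crack(k + 1, q, left_bound)

    crack(i, j, i)
    return out


# Hypothetical colors (an assumption) whose 前 array is 0 0 0 2 3 1 5 6.
colors = ["b", "g", "r", "g", "r", "b", "r", "b"]
prev = [0, 0, 0, 2, 3, 1, 5, 6]
print(list_colors(colors, prev, 3, 7))
```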

Page 61: Data Structures and Algorithms (Part II)

解 (3, 7) = 破 (3, 7, 3) …

i       1 2 3 4 5 6 7 8
前 [i]  0 0 0 2 3 1 5 6

Page 62: Data Structures and Algorithms (Part II)

Time = O(| 顏 (i, j)|)

Why?