Tree Pattern Matching to Subset Matching in Linear Time


Page 1: Tree Pattern Matching to Subset Matching in Linear Time

Tree Pattern Matching to Subset Matching in Linear Time

R. Cole and R. Hariharan

Page 2: Tree Pattern Matching to Subset Matching in Linear Time

Tree Pattern Matching

Input: An ordered binary tree T, |T| = n.

An ordered binary tree P, |P| = m.

Output: All nodes in T where P matches.

[Figure: an example text tree t and pattern tree p.]

Page 3: Tree Pattern Matching to Subset Matching in Linear Time

Subset Matching

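Input: A set-string T and a set-string P (strings in which every position holds a set of characters).

Output: All occurrences of P in T, that is, all positions i such that P[j] ⊆ T[i+j] for every position j of P.

T = {a,b,c} {a,c} {b,c} {a,c} {c,e,f} {b,f} {b}

P = {a,c} {c} {b}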
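The definition of subset matching can be sketched as a naive O(nm) check. This is only an illustration of the problem statement, not the fast algorithm the talk builds on. The function name `subset_match` is hypothetical, and the example data assumes the slide's sets read left to right as abc, ac, bc, ac, cef, bf, b for T and ac, c, b for P.

```python
from typing import List, Set


def subset_match(text: List[Set[str]], pattern: List[Set[str]]) -> List[int]:
    """Return every 0-based position i with pattern[j] <= text[i + j] for all j."""
    n, m = len(text), len(pattern)
    return [
        i
        for i in range(n - m + 1)
        # P occurs at i when each pattern set is contained in the aligned text set.
        if all(pattern[j] <= text[i + j] for j in range(m))
    ]


# Assumed reading of the example on this slide.
T = [set("abc"), set("ac"), set("bc"), set("ac"), set("cef"), set("bf"), set("b")]
P = [set("ac"), set("c"), set("b")]
print(subset_match(T, P))
```

Under that reading of the example, P occurs at the first and fourth positions of T.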

Page 4: Tree Pattern Matching to Subset Matching in Linear Time

History

Hoffmann and O'Donnell, 1982, O(nm).
Kosaraju, 1989, O(n m^0.75 log m).
Dubiner, Galil, and Magen, 1994, O(n m^0.5 log m).
Cole and Hariharan, 1997, randomized O(n log^3 m).
Indyk, 1998, randomized O(n log n).
Cole, Hariharan, and Indyk, 1999, O(n log^3 m).
Cole and Hariharan, 2002, O(n log^2 m).

Page 5: Tree Pattern Matching to Subset Matching in Linear Time

Period

Def: The period of a string s is the smallest positive integer j such that s[i] = s[i+j] for every i for which both positions exist.

Ex: S = 0 0 1 0 0 1 0 0 1 0 0 1 has period j = 3.

Page 6: Tree Pattern Matching to Subset Matching in Linear Time

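Informal terminology (1)

In the following slides, saying that a string has period θ means that the string begins with θ and that its period is |θ|.

|θ| is sometimes abbreviated to θ.

Let θ = 0 0 1, |θ| = 3.

S = 0 0 1 0 0 1 0 0 1 0 0 1: Yes (begins with θ, period 3).

S = 0 1 0 0 1 0 0 1 0 0 1 0: No (period 3, but does not begin with θ).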
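The informal convention "s has period θ" (s begins with θ and its smallest period is |θ|) can be written as a small check. This is a minimal sketch with hypothetical function names; it uses a simple quadratic period computation for clarity rather than a linear-time one.

```python
def smallest_period(s: str) -> int:
    """Smallest j >= 1 such that s[i] == s[i + j] wherever both indices exist."""
    n = len(s)
    for j in range(1, n + 1):
        if all(s[i] == s[i + j] for i in range(n - j)):
            return j
    return 0  # only reached for the empty string


def has_period_theta(s: str, theta: str) -> bool:
    """Informal usage from the slides: s begins with theta and has period |theta|."""
    return s.startswith(theta) and smallest_period(s) == len(theta)


print(has_period_theta("001001001001", "001"))
print(has_period_theta("010010010010", "001"))
```

The second string also has smallest period 3, but it begins with 0 1 0 rather than θ = 0 0 1, so the check rejects it.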

Page 7: Tree Pattern Matching to Subset Matching in Linear Time

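Classical Lemma (1)

s: a string with period θ.

Cut s into two parts. If the distance from the start of s to the cut is not a multiple of |θ|, then the second part does not begin with θ (assuming the second part has length at least |θ|).

Ex: s = 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1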
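The lemma can be checked by brute force on an example of this kind. The sketch below assumes θ = 0 0 1 and a string made of nine copies of θ; the function name is hypothetical. It lists every cut position after which the remainder begins with θ.

```python
def cuts_starting_with_theta(s: str, theta: str) -> list:
    """All cut positions d such that the suffix s[d:] begins with theta."""
    return [d for d in range(len(s) - len(theta) + 1) if s.startswith(theta, d)]


theta = "001"
s = theta * 9
cuts = cuts_starting_with_theta(s, theta)
# Every surviving cut position is a multiple of |theta|, as the lemma predicts.
print(all(d % len(theta) == 0 for d in cuts))
```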

Page 8: Tree Pattern Matching to Subset Matching in Linear Time

Period in linear time

Exercise 1: Design an algorithm to compute the period in linear time.
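One standard way to approach this exercise (not necessarily the solution the speaker had in mind) is the Knuth-Morris-Pratt failure function: the period of s equals |s| minus the length of the longest proper border of s. A minimal sketch, with a hypothetical function name:

```python
def period(s: str) -> int:
    """Smallest period of s, computed in O(|s|) time via the KMP border table."""
    n = len(s)
    if n == 0:
        return 0
    # border[i] = length of the longest proper prefix of s[:i+1]
    # that is also a suffix of s[:i+1].
    border = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and s[i] != s[k]:
            k = border[k - 1]
        if s[i] == s[k]:
            k += 1
        border[i] = k
    return n - border[n - 1]


print(period("001001001001"))
```

The inner while loop only ever decreases k, and k increases by at most one per character, so the total work is linear.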

Page 9: Tree Pattern Matching to Subset Matching in Linear Time

θ-Path

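Def: A path p is a θ-path if its string representation has period θ. The string representation is the 0/1 string that records, edge by edge, whether the path continues to the left or to the right child.

Let θ = 0 0 1.

Ex: The path p with string representation s = 0 0 1 0 0 1 is a θ-path.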

Page 10: Tree Pattern Matching to Subset Matching in Linear Time

Maximal θ-Path

Def: A θ-path p is maximal if it cannot be extended to a longer θ-path.

θ = 0 0 1.

[Figure: example paths; two of them are labeled "not" (not maximal).]

Page 11: Tree Pattern Matching to Subset Matching in Linear Time

Maximal θ-Paths in linear time

Exercise 2: Design a linear-time algorithm to find all maximal θ-paths in a tree.

Page 12: Tree Pattern Matching to Subset Matching in Linear Time

Informal terminology (2)

The size of a node = the size of the subtree rooted at that node.

[Figure: an example tree in which every node is labeled with its size.]

Page 13: Tree Pattern Matching to Subset Matching in Linear Time

Centroid of a tree

m = 19

m/2 = 9 (rounded down)

Page 14: Tree Pattern Matching to Subset Matching in Linear Time

Spine of a Tree

Spine = an extended version of the centroid path

Page 15: Tree Pattern Matching to Subset Matching in Linear Time

Spine of a Tree

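Link node = the last node of the centroid path

= the last node on the spine whose size is ≥ m/2

[Figure: a spine with string 0 1 0 1 0 1 0; nodes up to the link node have size ≥ m/2, nodes after it have size < m/2.]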
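Node sizes, the centroid path, and the link node can be computed in one pass over the tree. The sketch below assumes that the centroid path starts at the root and repeatedly moves to the child whose size is at least m/2, and that a left edge is written 0 and a right edge 1. The `Node` class and the function names are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Node:
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    size: int = 0  # filled in by compute_sizes


def compute_sizes(node: Optional[Node]) -> int:
    """Size of a node = number of nodes in the subtree rooted at it."""
    if node is None:
        return 0
    node.size = 1 + compute_sizes(node.left) + compute_sizes(node.right)
    return node.size


def centroid_path(root: Node) -> Tuple[List[Node], List[int]]:
    """Follow children of size >= m/2 from the root; return the nodes and the 0/1 edge labels."""
    m = compute_sizes(root)
    path, bits, node = [root], [], root
    while True:
        step = None
        for bit, child in ((0, node.left), (1, node.right)):
            # At most one child can hold at least half of all m nodes.
            if child is not None and 2 * child.size >= m:
                step = (bit, child)
        if step is None:
            return path, bits  # path[-1] is the link node
        bits.append(step[0])
        node = step[1]
        path.append(node)


# A left-going chain of 5 nodes: sizes 5, 4, 3, 2, 1 from the root down.
root = None
for _ in range(5):
    root = Node(left=root)
path, bits = centroid_path(root)
print(len(path), bits, path[-1].size)
```

For the chain above, m/2 = 2.5, so the centroid path stops at the node of size 3, which plays the role of the link node.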

Page 16: Tree Pattern Matching to Subset Matching in Linear Time

A Special Case of Tree Pattern Matching

Input: An ordered binary tree T, |T| = n.

An ordered binary tree P, |P| = m. Output: All nodes in T where P matches

Additional constraint: T has only one maximal θ-path, where θ is the period of the spine of P.

Page 17: Tree Pattern Matching to Subset Matching in Linear Time

A Special Case of Tree Pattern Matching

[Figure: the text tree T and the pattern tree P for the special case.]

Page 18: Tree Pattern Matching to Subset Matching in Linear Time

Reduce to Subset Matching (1)

[Figure: the pattern P with spine 0 0 1 0 0 1 0; the subtrees B2 and B6 hang off the spine.]

Page 19: Tree Pattern Matching to Subset Matching in Linear Time

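Reduce to Subset Matching (2)

[Figure: the subtrees B2 and B6, with nodes labeled a to f, are encoded as sets attached to the corresponding spine positions. The spine positions carry 0 or 1; the two positions with subtrees carry the sets 0abce and 1abdef.]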

Page 20: Tree Pattern Matching to Subset Matching in Linear Time

Reduce to Subset Matching (3)

[Figure: the text T with its maximal θ-path 0 0 1 0 0 1 0 0 1 0 0; ribs 2, 5, and 6 (the subtrees hanging off the path) are marked.]

Page 21: Tree Pattern Matching to Subset Matching in Linear Time

Reduce to Subset Matching (4)

[Figure: rib 2 of T, with nodes labeled a to f, is encoded as the set 0abcef at its position on the θ-path.]

Time: O( min{ m, |rib 2| } )

Page 22: Tree Pattern Matching to Subset Matching in Linear Time

Reduce to Subset Matching (5)

Total time: ∑i O( min{ m, |rib i| } ) = O(n), since the ribs are disjoint subtrees of T and their sizes therefore sum to at most n.

Page 23: Tree Pattern Matching to Subset Matching in Linear Time

What about the general case?

If more than one maximal θ-path is found, how should the reduction be done?

Brute force: reduce every maximal θ-path to a subset matching problem using the method just described.

Time?

Page 24: Tree Pattern Matching to Subset Matching in Linear Time

Where does the intuition come from?

Truncation lemma: If the first |θ| edges of each maximal θ-path are removed, then the truncated paths are disjoint.

Page 25: Tree Pattern Matching to Subset Matching in Linear Time

Truncation Lemma (1)

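The distance between u and the point where the overlap begins cannot be a multiple of |θ|.

[Figure: two maximal θ-paths p and p', each divided into blocks of θ, with a node labeled u.]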

Page 26: Tree Pattern Matching to Subset Matching in Linear Time

Truncation Lemma (2)

By the Classical Lemma:

[Figure: the paths p and p', with a segment marked < |θ|.]

Page 27: Tree Pattern Matching to Subset Matching in Linear Time

Warm up is over!

What comes next:

Part 1: Show that the truncated maximal θ-paths can be reduced to subset matching in linear time.

Part 2: Work out how to handle the parts that were cut off.

Page 28: Tree Pattern Matching to Subset Matching in Linear Time

Step 1: Find all maximal θ-paths

[Figure: T and P, with θ and the link node of P marked.]

Page 29: Tree Pattern Matching to Subset Matching in Linear Time

Step 2: Filtering (1)

Filter out the maximal θ-paths that do not satisfy the following three properties.

Property 1:

[Figure: a part of T marked ≥ m.]

Page 30: Tree Pattern Matching to Subset Matching in Linear Time

Step 2: Filtering (2)

Property 2:

[Figure: a part of T marked ≥ m/2, because (∵) the corresponding part of P is ≥ m/2.]

Page 31: Tree Pattern Matching to Subset Matching in Linear Time

Step 2: Filtering (3)

Property 3:

[Figure: in both T and P, one part is marked ≥ m/2 and one segment is marked ≥ |θ|.]

Page 32: Tree Pattern Matching to Subset Matching in Linear Time

Step 3: Truncation

Remove the first |θ| edges of every maximal θ-path that survived the filtering.

Page 33: Tree Pattern Matching to Subset Matching in Linear Time

Step 4: Filtering again

Filter the truncated maximal θ-paths once more. The paths that remain are simply called truncated paths from here on.

Page 34: Tree Pattern Matching to Subset Matching in Linear Time

Step 5: Reduce to subset matching, one path at a time

Time: ∑(truncated paths) ∑i O( min{ m, |rib i| } )

Page 35: Tree Pattern Matching to Subset Matching in Linear Time

Analysis of Step 5 (1)

∑(truncated paths) ∑i O( min{ m, |rib i| } )

= ∑(all ribs) O( min{ m, |rib| } )

Page 36: Tree Pattern Matching to Subset Matching in Linear Time

Analysis of Step 5 (2)

∑ O( min{ m, |rib| } )

(big rib = a rib of size ≥ m; small rib = a rib of size < m)

= ∑ O( min{ m, |big rib| } ) + ∑ O( min{ m, |small rib| } )

= O( m · (# big ribs) ) + ∑ O( |small rib| )
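The split of the sum into large and small subtrees is plain arithmetic and can be checked numerically. In the sketch below a "rib" is taken to be a subtree hanging off a path, a rib counts as big when its size is at least m, and the function names and sample sizes are made up for illustration.

```python
from typing import List


def reduction_cost(rib_sizes: List[int], m: int) -> int:
    """Sum of min(m, size) over all ribs, i.e. the cost before splitting."""
    return sum(min(m, size) for size in rib_sizes)


def split_cost(rib_sizes: List[int], m: int) -> int:
    """The same cost, written as m * (# big ribs) + (total size of small ribs)."""
    big = [size for size in rib_sizes if size >= m]
    small = [size for size in rib_sizes if size < m]
    return m * len(big) + sum(small)


ribs = [1, 7, 3, 10, 4]  # hypothetical rib sizes
m = 4
print(reduction_cost(ribs, m), split_cost(ribs, m))
```

Both functions return the same value for any input, which is why bounding the number of big ribs and the total size of the small ribs is enough.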

Page 37: Tree Pattern Matching to Subset Matching in Linear Time

Analysis of Step 5 (3)

O( m · (# big ribs) ) + ∑ O( |small rib| )

It remains to prove:

Part 1. # big ribs = O(n/m)

Part 2. ∑ O( |small rib| ) = O(n)

Page 38: Tree Pattern Matching to Subset Matching in Linear Time

Analysis of Step 5 (4)

Part 2: ∑ O( |small rib| ) = O(n)

∵ the small ribs are disjoint

[Figure: truncated paths with big ribs (≥ m) and small ribs (< m).]

Page 39: Tree Pattern Matching to Subset Matching in Linear Time

Marked nodes

Def: A node in T is marked if its left and right subtrees both contain ≥ m nodes.

Page 40: Tree Pattern Matching to Subset Matching in Linear Time

# marked nodes is O(n/m)

[Figure: an example with m = 2; the subtrees below the marked nodes are each labeled ≥ m.]

Page 41: Tree Pattern Matching to Subset Matching in Linear Time

# marked nodes is O(n/m)

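[Figure: the marked nodes act as the internal nodes of a binary tree whose external nodes are disjoint subtrees, each labeled ≥ m.]

∵ (# external nodes) · m ≤ n

∴ # external nodes ≤ n/m

⇒ # marked nodes = # internal nodes ≤ n/m − 1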
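The bound can be sanity-checked on a small tree. The sketch below counts marked nodes (nodes whose left and right subtrees both contain at least m nodes) in a tree given as nested `(left, right)` tuples. The representation and function names are assumptions made for this illustration.

```python
def count_marked(tree, m):
    """Return (size, number of marked nodes) for a tree of nested (left, right) tuples."""
    if tree is None:
        return 0, 0
    left_size, left_marked = count_marked(tree[0], m)
    right_size, right_marked = count_marked(tree[1], m)
    # A node is marked when both of its subtrees contain at least m nodes.
    here = 1 if left_size >= m and right_size >= m else 0
    return 1 + left_size + right_size, left_marked + right_marked + here


def complete(depth):
    """Complete binary tree with 2**depth - 1 nodes."""
    return None if depth == 0 else (complete(depth - 1), complete(depth - 1))


n, marked = count_marked(complete(4), 2)
print(n, marked, marked <= n / 2 - 1)
```

For the complete tree with 15 nodes and m = 2, only the root and its two children are marked, which is within the bound n/m − 1 = 6.5.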

Page 42: Tree Pattern Matching to Subset Matching in Linear Time

Analysis of Step 5 (5)

Part 1. # big ribs = O(n/m)

If a truncated path carries k > 1 big ribs (ribs of size ≥ m), then it contains at least k − 1 marked nodes.

[Figure: a truncated path with several big ribs.]

Page 43: Tree Pattern Matching to Subset Matching in Linear Time

Analysis of Step 5 (6)

Over all truncated paths with k > 1 big ribs, the total number of big ribs is O(n/m).

Remaining question: how many truncated paths have exactly k = 1 big rib?

Page 44: Tree Pattern Matching to Subset Matching in Linear Time

Analysis of Step 5 (7)

O(n/m) paths.

[Figure: a part marked ≥ m/2.]

Page 45: Tree Pattern Matching to Subset Matching in Linear Time

An observation

Only O(n/m) truncated paths have k > 1 big ribs.

Only O(n/m) truncated paths have k = 1 big rib.

Only O(n/m) truncated paths have k = 0 big ribs.

Hence there are only O(n/m) truncated paths.

Page 46: Tree Pattern Matching to Subset Matching in Linear Time

Disjoint Lemma

Let C be a set of disjoint θ-paths that satisfy Properties 1 to 3. Then C contains O(n/m) θ-paths.

Pf: Only O(n/m) θ-paths have k > 1 big ribs. Only O(n/m) θ-paths have k = 1 big rib. Only O(n/m) θ-paths have k = 0 big ribs.

Page 47: Tree Pattern Matching to Subset Matching in Linear Time

Review

Step 1: Find all maximal θ-paths
Step 2: Filtering
Step 3: Truncation
Step 4: Filtering again
Step 5: Reduce to subset matching

Page 48: Tree Pattern Matching to Subset Matching in Linear Time

What about the removed parts?

[Figure: the removed parts, drawn as blocks of θ, next to the pattern P.]

Time: O(m)

Page 49: Tree Pattern Matching to Subset Matching in Linear Time

The Last Job

Step 1: Find all maximal θ-paths
Step 2: Filtering (only O(n/m) paths left)
Step 3: Truncation
Step 4: Filtering again
Step 5: Reduce to subset matching

Page 50: Tree Pattern Matching to Subset Matching in Linear Time

Tail Lemma

The tail of a path is not touched by any other path.

Page 51: Tree Pattern Matching to Subset Matching in Linear Time

Chains

. . .

Page 52: Tree Pattern Matching to Subset Matching in Linear Time

Chain Lemma

On a single chain:

(1) The paths numbered 1, 3, 5, 7, … are disjoint.

(2) The paths numbered 0, 2, 4, 6, 8, … are disjoint.

[Figure: a chain whose paths are numbered 0, 1, 2, 3.]

Page 53: Tree Pattern Matching to Subset Matching in Linear Time

Truncated-Chains Lemma(1)

Truncated-Chains lemma: If the first two paths of each chain are removed, then the truncated chains are disjoint.

Page 54: Tree Pattern Matching to Subset Matching in Linear Time

Truncated-Chains Lemma (2)

[Figure: two chains ρ and ρ'.]

Page 55: Tree Pattern Matching to Subset Matching in Linear Time

Truncated-Chains Lemma(3)

Case 1: [Figure: chains ρ and ρ', with a segment marked ≥ |θ|.]

Page 56: Tree Pattern Matching to Subset Matching in Linear Time

Truncated-Chains Lemma(4)

Case 2: [Figure: chains ρ and ρ', with a segment marked ≥ |θ|.]

Page 57: Tree Pattern Matching to Subset Matching in Linear Time

Truncated-Chains Lemma(5)

Case 3: [Figure: chains ρ and ρ', with a segment marked ≥ |θ|.]

Page 58: Tree Pattern Matching to Subset Matching in Linear Time

Truncated-Chains Lemma(6)

Case 4: [Figure: chains ρ and ρ', with a segment marked ≥ |θ|.]

Page 59: Tree Pattern Matching to Subset Matching in Linear Time

Almost Over (1)

Chain lemma: On a single chain, (1) the paths numbered 1, 3, 5, 7, … are disjoint, and (2) the paths numbered 0, 2, 4, 6, … are disjoint.

Truncated-Chains lemma: After the paths numbered 0 and 1 are removed, the chains are disjoint.

⇒ Over all chains, (1) the paths numbered 3, 5, 7, … are disjoint, and (2) the paths numbered 2, 4, 6, … are disjoint.

Page 60: Tree Pattern Matching to Subset Matching in Linear Time

Almost Over (2)

(1) The paths numbered 3, 5, 7, … are disjoint. (2) The paths numbered 2, 4, 6, … are disjoint.

By the Disjoint Lemma: (1) there are O(n/m) paths numbered 3, 5, 7, …, and (2) there are O(n/m) paths numbered 2, 4, 6, ….

Page 61: Tree Pattern Matching to Subset Matching in Linear Time

Almost Over (3)

If # chains = O(n/m), then

the paths numbered 0 and 1 total O(2n/m) = O(n/m).

Page 62: Tree Pattern Matching to Subset Matching in Linear Time

Over (1)

[Figure: three groups of chains, each labeled "maximal connected chains".]

Page 63: Tree Pattern Matching to Subset Matching in Linear Time

Over (2)

If a group of maximal connected chains contains k > 1 chains, then it contains k − 1 marked nodes.

Page 64: Tree Pattern Matching to Subset Matching in Linear Time

Over (3)

Over all groups of maximal connected chains with k > 1 chains, the total number of chains is O(n/m).

Page 65: Tree Pattern Matching to Subset Matching in Linear Time

Over (4)

By the Disjoint Lemma: there are O(n/m) groups of maximal connected chains with exactly k = 1 chain.

Page 66: Tree Pattern Matching to Subset Matching in Linear Time

Unproved Lemmas

Maximal θ-paths in linear time. (Lemma 2.2 in M. Dubiner, Z. Galil, and E. Magen. Faster Tree Pattern Matching, J. ACM, 1994.)

Chain Lemma. (Lemma 5.5 in R. Cole and R. Hariharan. Tree Pattern Matching to Subset Matching in Linear Time, SIAM J. Computing, 2003.)