Non-projective Dependency Parsing using Spanning Tree Algorithms
R98922004 Yun-Nung Chen (陳縕儂), first-year CSIE master's student
Reference
Non-projective Dependency Parsing using Spanning Tree Algorithms. Ryan McDonald, Fernando Pereira, Kiril Ribarov, Jan Hajič. HLT/EMNLP 2005.
Introduction
Example of Dependency Tree
 Each word depends on exactly one parent
Projective
 Words in linear order, satisfying
▪ Edges do not cross
▪ A word and its descendants form a contiguous substring of the sentence
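The contiguous-substring property gives a direct test for projectivity: no two edges of the tree may cross. A minimal sketch (the function name and head-list encoding are illustrative, not from the paper):

```python
def is_projective(heads):
    """heads[d] = head of word d (words are 1..n, head 0 is the root);
    heads[0] is unused. Returns True iff no two edges cross."""
    edges = [(min(h, d), max(h, d)) for d, h in enumerate(heads[1:], start=1)]
    for i, (l1, r1) in enumerate(edges):
        for l2, r2 in edges[i + 1:]:
            # two edges cross iff exactly one endpoint of one edge
            # lies strictly inside the span of the other
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True
```

For "John hit the ball" with hit as the root's child, every word and its descendants cover a contiguous substring, so the check passes; a head assignment with crossing edges fails it.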
Non-projective Examples
English
▪ Mostly projective, with some non-projective constructions
Languages with more flexible word order
▪ Mostly non-projective: German, Dutch, Czech
Advantage of Dependency Parsing
Related work
▪ Relation extraction
▪ Machine translation
Main Idea of the Paper
Dependency parsing can be formalized as the search for a maximum spanning tree
in a directed graph
Dependency Parsing and Spanning Trees
Edge based Factorization (1/3)
sentence: x = x1 … xn
the directed graph Gx = (Vx, Ex) is given by
▪ Vx = {x0 = root, x1, …, xn}
▪ Ex = {(i, j) : i ≠ j, 0 ≤ i ≤ n, 1 ≤ j ≤ n}
dependency tree y for x: the subgraph Gy = (Vy, Ey) with
▪ Vy = Vx
▪ Ey = {(i, j) : there is a dependency from xi to xj}
Edge based Factorization (2/3)
scores of edges: s(i, j) = w · f(i, j)
score of a dependency tree y for sentence x:
 s(x, y) = Σ(i,j)∈y s(i, j) = Σ(i,j)∈y w · f(i, j)
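With edge-based factorization, the score of a tree is simply the sum of its edge scores. A small sketch of this decomposition (the function names and feature encoding are illustrative, not from the paper):

```python
def edge_score(w, f):
    # s(i, j) = w · f(i, j): dot product of the weight vector
    # and the feature vector of edge (i, j)
    return sum(wk * fk for wk, fk in zip(w, f))

def tree_score(w, feats, tree):
    # s(x, y) = sum of s(i, j) over the edges (i, j) of tree y
    return sum(edge_score(w, feats[edge]) for edge in tree)
```

Because the score factors over edges, finding the best tree reduces to a maximum spanning tree problem over the individually scored edges.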
Edge based Factorization (3/3)
x = John hit the ball with the bat
[Figure: the words root, John, hit, the, ball, with, the, bat arranged into three candidate dependency trees y1, y2, y3 for the same sentence]
Two Focus Points
1) How to learn the weight vector w
2) How to find the tree with the maximum score
Maximum Spanning Trees
dependency trees for x = spanning trees for Gx
the dependency tree with maximum score for x = the maximum spanning tree of Gx
Maximum Spanning Tree Algorithm
Chu-Liu-Edmonds Algorithm (1/12)
Input: graph G = (V, E)
Output: a maximum spanning tree of G
① For each vertex, greedily select the incoming edge with the highest weight
▪ If the result is a tree: done
▪ If there is a cycle in G:
② Contract the cycle into a single vertex and recalculate the weights of edges going into and out of the cycle
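The two steps above can be sketched as a recursive implementation for dense graphs. This is an illustrative reconstruction, not the paper's code; it assumes a complete score matrix with float("-inf") for missing edges and node 0 as the root:

```python
def _find_cycle(head):
    """Return the set of nodes on a cycle among the chosen edges, or None."""
    n = len(head)
    for start in range(1, n):
        seen, v = set(), start
        while v != 0 and v not in seen:
            seen.add(v)
            v = head[v]
        if v == start:                       # walking from start returned to start
            cycle, u = {start}, head[start]
            while u != start:
                cycle.add(u)
                u = head[u]
            return cycle
    return None

def chu_liu_edmonds(score):
    """score[h][d] = weight of edge h -> d; node 0 is the root.
    Returns head[d] for every node d (head[0] is unused)."""
    n = len(score)
    # 1) greedily pick the highest-weight incoming edge for each non-root node
    head = [0] * n
    for d in range(1, n):
        head[d] = max((h for h in range(n) if h != d), key=lambda h: score[h][d])
    cycle = _find_cycle(head)
    if cycle is None:
        return head
    # 2) contract the cycle into node c and recalculate edge weights
    rest = [v for v in range(n) if v not in cycle]   # rest[0] == 0 (the root)
    c = len(rest)
    NEG = float("-inf")
    new_score = [[NEG] * (c + 1) for _ in range(c + 1)]
    enter, leave = {}, {}
    for i, u in enumerate(rest):
        for j, v in enumerate(rest):
            if u != v:
                new_score[i][j] = score[u][v]
        # u -> cycle: enter where replacing the cycle edge loses the least
        best = max(cycle, key=lambda d: score[u][d] - score[head[d]][d])
        enter[u] = best
        new_score[i][c] = score[u][best] - score[head[best]][best]
        # cycle -> u: leave from the best-scoring node inside the cycle
        best = max(cycle, key=lambda h: score[h][u])
        leave[u] = best
        new_score[c][i] = score[best][u]
    # 3) recurse on the contracted graph, then expand the solution
    sub = chu_liu_edmonds(new_score)
    result = list(head)                  # cycle nodes keep their cycle edge...
    for j in range(1, c + 1):
        if j == c:                       # ...except where the chosen edge enters
            u = rest[sub[j]]
            result[enter[u]] = u         # break the cycle at the entry node
        else:
            v = rest[j]
            result[v] = leave[v] if sub[j] == c else rest[sub[j]]
    return result
```

On the John/saw/Mary example from the following slides this returns root→saw, saw→John, saw→Mary, matching the walkthrough.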
Chu-Liu-Edmonds Algorithm (2/12)
x = John saw Mary
[Figure Gx with edge weights: root→John 9, root→saw 10, root→Mary 9, John→saw 20, John→Mary 3, saw→John 30, saw→Mary 30, Mary→John 11, Mary→saw 0]
Chu-Liu-Edmonds Algorithm (3/12)
For each word, find the highest-scoring incoming edge
[Figure Gx with the selected edges highlighted: saw→John 30, John→saw 20, saw→Mary 30]
Chu-Liu-Edmonds Algorithm (4/12)
If the result is a
▪ Tree: terminate and output
▪ Cycle: contract and recalculate
[Figure Gx: the selected edges John→saw 20 and saw→John 30 form a cycle]
Chu-Liu-Edmonds Algorithm (5/12)
Contract and recalculate
▪ Contract the cycle into a single node
▪ Recalculate edge weights going into and out of the cycle
[Figure Gx with the cycle C = {John, saw} to be contracted]
Chu-Liu-Edmonds Algorithm (6/12)
Outgoing edges of the cycle
▪ The weight of an edge from the cycle to a word is the maximum weight over edges from any node inside the cycle to that word
[Figure: contracted node C = {John, saw}; C→Mary takes max(John→Mary 3, saw→Mary 30) = 30]
Chu-Liu-Edmonds Algorithm (7/12)
Incoming edges of the cycle
▪ The weight of an edge from a word x into the cycle is s(x, v) − s(a(v), v) + s(C), maximized over entry nodes v in the cycle, where a(v) is v's parent inside the cycle and s(C) is the score of the cycle
[Figure: contracted graph with the incoming edges to C being recalculated]
Chu-Liu-Edmonds Algorithm (8/12)
x = root
▪ s(root, John) − s(a(John), John) + s(C) = 9 − 30 + 50 = 29
▪ s(root, saw) − s(a(saw), saw) + s(C) = 10 − 20 + 50 = 40
[Figure: contracted graph; root→C takes the larger value, 40]
Chu-Liu-Edmonds Algorithm (9/12)
x = Mary
▪ s(Mary, John) − s(a(John), John) + s(C) = 11 − 30 + 50 = 31
▪ s(Mary, saw) − s(a(saw), saw) + s(C) = 0 − 20 + 50 = 30
[Figure: contracted graph with edges root→C 40, Mary→C 31, C→Mary 30, root→Mary 9]
Chu-Liu-Edmonds Algorithm (10/12)
Keep the edges of the highest-scoring tree inside the cycle, then run the algorithm recursively on the contracted graph
[Figure: contracted graph Gx with node C and edges root→C 40, Mary→C 31, C→Mary 30, root→Mary 9]
Chu-Liu-Edmonds Algorithm (11/12)
Find the incoming edge with the highest score for each word
▪ The result is a tree: terminate and output
[Figure: the contracted graph resolves to root→C 40 and C→Mary 30]
Chu-Liu-Edmonds Algorithm (12/12)
Expanding the contracted node yields the maximum spanning tree of Gx
[Figure: MST of Gx: root→saw 10, saw→John 30, saw→Mary 30]
Complexity of Chu-Liu-Edmonds Algorithm
Each recursive call takes O(n²) to find the highest-scoring incoming edge for each word
At most O(n) recursive calls (contracting at most n times)
Total: O(n³)
Tarjan gives an efficient implementation of the algorithm that runs in O(n²) for dense graphs
Algorithm for Projective Trees
Eisner Algorithm: O(n³)
▪ Bottom-up dynamic programming
▪ Maintains the nested structural constraint (non-crossing constraint)
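For comparison, the Eisner dynamic program can be sketched as follows. This is an illustrative reconstruction that returns only the score of the best projective tree; adding backpointers to the two tables would recover the tree itself. It assumes a complete score matrix with float("-inf") for missing edges and node 0 as the root:

```python
def eisner(score):
    """score[h][d] = weight of edge h -> d; node 0 is the root.
    Returns the score of the best projective dependency tree."""
    n = len(score)                       # number of nodes, including the root
    NEG = float("-inf")
    # C = complete spans, I = incomplete spans;
    # direction 0 = head on the right, direction 1 = head on the left
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        C[i][i][0] = C[i][i][1] = 0.0
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # attach an edge between i and j over two facing complete spans
            best = max(C[i][q][1] + C[q + 1][j][0] for q in range(i, j))
            I[i][j][0] = best + score[j][i]      # edge j -> i
            I[i][j][1] = best + score[i][j]      # edge i -> j
            # extend an incomplete span with an adjacent complete span
            C[i][j][0] = max(C[i][q][0] + I[q][j][0] for q in range(i, j))
            C[i][j][1] = max(I[i][q][1] + C[q][j][1] for q in range(i + 1, j + 1))
    return C[0][n - 1][1]                # best tree headed by the root
```

On the John/saw/Mary weights the best tree (root→saw, saw→John, saw→Mary) happens to be projective, so the projective and non-projective maxima coincide at 70.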
Online Large Margin Learning
Online Large Margin Learning
Supervised learning
▪ Target: train the weight vector w over edge features (e.g. the PoS tags of the two words an edge connects)
▪ Training data: sentence–tree pairs; testing data: the sentence x alone
MIRA Learning Algorithm
Margin Infused Relaxed Algorithm (MIRA)
▪ dt(x): the set of possible dependency trees for x
▪ Keep the new weight vector as close as possible to the old one, subject to the margin constraints s(xt, yt) − s(xt, y′) ≥ L(yt, y′) for y′ ∈ dt(xt)
▪ The final weight vector is the average of the weight vectors after each iteration
Single-best MIRA
▪ Uses only the single margin constraint for the best tree y′ under the current weights
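With a single margin constraint the quadratic program has a closed-form solution, so a single-best MIRA step can be sketched as below (the function name and flat feature-vector encoding are illustrative, not from the paper):

```python
def single_best_mira_update(w, f_gold, f_pred, loss):
    """One single-best MIRA step: the smallest change to w that makes the
    gold tree outscore the current best tree y' by at least L(y_t, y').
    f_gold and f_pred are the summed feature vectors of the two trees."""
    diff = [g - p for g, p in zip(f_gold, f_pred)]            # f(y_t) - f(y')
    violation = loss - sum(wi * di for wi, di in zip(w, diff))
    norm_sq = sum(d * d for d in diff)
    if violation <= 0 or norm_sq == 0:
        return list(w)                   # margin constraint already satisfied
    tau = violation / norm_sq            # closed form for a single constraint
    return [wi + tau * di for wi, di in zip(w, diff)]
```

After the update, w · (f_gold − f_pred) equals exactly the required loss, i.e. the constraint holds with no slack, which is the "as close as possible to the old vector" property.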
Factored MIRA
Local constraints, for each word j:
▪ s(correct incoming edge for j) − s(any other incoming edge for j) ≥ 1, i.e. each local constraint enforces a margin of 1
▪ Summed over a tree, the correct spanning tree then outscores an incorrect spanning tree by the number of incorrect edges
▪ More restrictive than the original constraints
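The local constraints can be enumerated directly, which is what makes the factored formulation tractable. A sketch of collecting the violated per-edge constraints (the function name and encoding are illustrative):

```python
def violated_edge_constraints(score, gold_head):
    """Factored MIRA's local constraints: for each word j with correct head l,
    require s(l, j) - s(k, j) >= 1 for every other candidate head k.
    Returns the list of violated (k, j) pairs; gold_head[0] is unused."""
    violations = []
    n = len(score)
    for j in range(1, n):
        l = gold_head[j]
        for k in range(n):
            if k != j and k != l and score[l][j] - score[k][j] < 1:
                violations.append((k, j))
    return violations
```

There are only O(n²) such constraints per sentence, versus the exponentially many whole-tree constraints of the original formulation.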
Experiments
Experimental Setting
Language: Czech
▪ More flexible word order than English, hence non-projective dependencies
Features: derived from the Czech PoS tags (standard PoS, case, gender, tense)
Ratio of non-projective to projective
▪ Less than 2% of total edges are non-projective
▪ Czech-A: the entire PDT
▪ Czech-B: only the 23% of sentences with at least one non-projective dependency
Compared Systems
COLL1999
▪ The projective lexicalized phrase-structure parser
N&N2005
▪ The pseudo-projective parser
McD2005
▪ The projective parser using the Eisner algorithm and 5-best MIRA
Single-best MIRA / Factored MIRA
▪ The non-projective parsers using Chu-Liu-Edmonds
Results of Czech
Czech-A (entire PDT)
System                     Accuracy  Complete
COLL1999 (O(n^5))          82.8      -
N&N2005                    80.0      31.8
McD2005 (O(n^3))           83.3      31.3
Single-best MIRA (O(n^2))  84.1      32.2
Factored MIRA (O(n^2))     84.4      32.3

Czech-B (only the non-projective sentences)
System                     Accuracy  Complete
COLL1999 (O(n^5))          -         -
N&N2005                    -         -
McD2005 (O(n^3))           74.8      0.0
Single-best MIRA (O(n^2))  81.0      14.9
Factored MIRA (O(n^2))     81.5      14.3
Results of English
English
System                     Accuracy  Complete
McD2005 (O(n^3))           90.9      37.5
Single-best MIRA (O(n^2))  90.2      33.2
Factored MIRA (O(n^2))     90.2      32.3
English dependency trees are all projective, and the Eisner algorithm exploits this a priori knowledge that every tree is projective
Thanks for your attention!