Dependency Parsing: Parsing Algorithms
Peng Huang
Outline
• Introduction: Phrase Structure Grammar, Dependency Grammar, Comparison and Conversion
• Dependency Parsing: formal definition
• Parsing Algorithms: introduction, dynamic programming, constraint satisfaction, deterministic search
Introduction
• Syntactic parsing: finding the correct syntactic structure of a sentence in a given formalism/grammar
• Two formalisms: Dependency Grammar (DG) and Phrase Structure Grammar (PSG)
Phrase Structure Grammar (PSG)
• Breaks a sentence into constituents (phrases), which are then broken into smaller constituents
• Describes phrase structure and clause structure, e.g. NP, PP, VP
• Structures are often recursive: the clever tall blue-eyed old … man
Tree Structure
Dependency Grammar
• Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies
• Interested in grammatical relations between individual words (governing and dependent words)
• Does not propose a recursive structure; rather a network of relations
• These relations can also have labels
ComparisonComparison
• Dependency structures explicitly represent – Head-dependent relations (directed arcs)– Functional categories (arc labels)– Possibly some structural categories (parts-of-speech)
• Phrase structure explicitly represent – Phrases (non-terminal nodes)– Structural categories (non-terminal labels)– Possibly some functional categories (grammatical
functions)
7
Conversion … PSG to DG
• The head of a phrase governs/dominates all its siblings
• Heads are identified using heuristics
• Dependency relations are established between the head of each phrase (as parent) and its siblings (as children)
• The tree thus formed is the unlabeled dependency tree
PSG to DG
Conversion … DG to PSG
• Each head together with its dependents (and their dependents) forms a constituent of the sentence
• It is difficult to assign structural categories (NP, VP, S, PP, etc.) to these derived constituents
• Every projective dependency grammar has a strongly equivalent context-free grammar, but not vice versa [Gaifman 1965]
Dependency Tree
Formal definition:
• An input word sequence w1…wn
• A dependency graph D = (W, E), where W is the set of nodes, i.e. word tokens in the input sequence, and E is the set of unlabeled tree edges (wi, wj), with wi, wj ∈ W; (wi, wj) indicates an edge from wi (parent) to wj (child)
• The task of mapping an input string to a dependency graph satisfying certain conditions is called dependency parsing
Well-formedness
A dependency graph is well-formed iff:
• Single head: each word has exactly one head
• Acyclic: the graph is acyclic
• Connected: the graph is a single tree containing all the words in the sentence
• Projective: if word A depends on word B, then all words between A and B are also subordinate to B (i.e. dominated by B)
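These conditions can be checked directly on a head function. A minimal sketch, assuming words are numbered 1…n with 0 as an artificial ROOT (the helper names are mine, not from the slides):

```python
def is_well_formed(n, heads):
    """heads: dict mapping each word 1..n to its head (0 = ROOT).
    Single head is implied by the dict; check acyclicity and connectedness."""
    if set(heads) != set(range(1, n + 1)):
        return False                      # some word has no head
    for child in heads:
        seen, node = set(), child
        while node != 0:                  # walk up the head chain to ROOT
            if node in seen:
                return False              # cycle
            seen.add(node)
            node = heads[node]
    return True                           # every word reaches ROOT: one tree

def is_projective(heads):
    """For every arc (h, d), each word strictly between h and d
    must be dominated by h, i.e. reachable from h via head chains."""
    for d, h in heads.items():
        for w in range(min(h, d) + 1, max(h, d)):
            node = w
            while node not in (0, h):     # climb until we hit h or ROOT
                node = heads[node]
            if node != h:
                return False
    return True
```

For example, heads = {1: 2, 2: 0, 3: 2} is a well-formed projective tree, while an arc from word 2 to word 4 spanning a word 3 that hangs elsewhere makes the tree non-projective.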
Non-projective dependency tree
Ram saw a dog yesterday which was a Yorkshire Terrier
Non-projectivity shows up as crossing arcs when the dependencies are drawn above the sentence. English has very few non-projective cases.
Dependency Parsing (P. Mannem, May 28, 2008)
• Dependency-based parsers can be broadly categorized into
– Grammar-driven approaches: parsing done using grammars
– Data-driven approaches: parsing by training on annotated/un-annotated data
Parsing MethodsParsing Methods
• Three main traditions– Dynamic programming
• CYK, Eisner, McDonald
– Constraint satisfaction• Maruyama, Foth et al., Duchier
– Deterministic search• Covington, Yamada and Matsumuto, Nivre
Dynamic Programming
Basic idea: treat dependencies as constituents; use, e.g., a CYK parser (with minor modifications).
Eisner 1996
Two novel aspects:
• Modified parsing algorithm
• Probabilistic dependency parsing
Time requirement: O(n³)
Modification: instead of storing subtrees, store spans.
Span: a substring such that no interior word links to any word outside the span.
Idea: in a span, only the boundary words are active, i.e. still need a head or a child.
Example
Sentence: Red figures on the screen indicated falling stocks (_ROOT_)
Example spans: [Red figures], [indicated falling stocks]

Assembly of the correct parse
• Start by combining adjacent words into minimal spans: [Red figures], [figures on], [on the]
• Combine spans which overlap in one word; this word must be governed by a word in the left or the right span:
– [on the] + [the screen] → [on the screen]
– [figures on] + [on the screen] → [figures on the screen]
• Invalid span: [Red figures on the screen]
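The span-based recurrences can be sketched as a small O(n³) dynamic program over complete and incomplete spans. This is a score-only sketch of Eisner's idea (back-pointers for recovering the actual tree are omitted, and the arc-score matrix is invented for illustration):

```python
def eisner(scores):
    """scores[h][d] = score of arc h -> d; row/column 0 is _ROOT_.
    Returns the score of the best projective tree in O(n^3) time."""
    n = len(scores)
    NEG = float("-inf")
    # C[s][t][d]: complete span, I[s][t][d]: incomplete span;
    # d = 1 means the head is the left end s, d = 0 means the right end t.
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for s in range(n):
        C[s][s][0] = C[s][s][1] = 0.0
    for k in range(1, n):
        for s in range(n - k):
            t = s + k
            # incomplete span: add the arc between the two boundary words
            best = max(C[s][r][1] + C[r + 1][t][0] for r in range(s, t))
            I[s][t][0] = best + scores[t][s]   # arc t -> s
            I[s][t][1] = best + scores[s][t]   # arc s -> t
            # complete span: attach a finished span under the boundary head
            C[s][t][0] = max(C[s][r][0] + I[r][t][0] for r in range(s, t))
            C[s][t][1] = max(I[s][r][1] + C[r][t][1] for r in range(s + 1, t + 1))
    return C[0][n - 1][1]                      # best tree headed at _ROOT_
```

For a two-word toy sentence with scores = [[0, 5, 1], [0, 0, 4], [0, 3, 0]], the best tree takes ROOT → w1 (score 5) and w1 → w2 (score 4), totaling 9.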
McDonald's Maximum Spanning Trees
• Score of a dependency tree = sum of the scores of its dependencies
• Scores are independent of the other dependencies
• If scores are available, parsing can be formulated as a maximum spanning tree problem
• Uses online learning to determine the weight vector w
McDonald's Algorithm
• Scores the entire dependency tree
• Online learning
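Because the tree score factors over arcs, the highest-scoring tree is a maximum spanning tree of the arc-score graph. Real parsers use the Chu-Liu/Edmonds algorithm; purely as a toy illustration (my own sketch, not McDonald's code), a brute-force search over head assignments makes the formulation concrete:

```python
from itertools import product

def tree_score(heads, scores):
    # heads[i] = head of word i+1 (0 = ROOT); score = sum of arc scores
    return sum(scores[h][d] for d, h in enumerate(heads, start=1))

def is_tree(heads):
    # every word must reach ROOT (node 0) without revisiting a node
    for d in range(1, len(heads) + 1):
        seen, node = set(), d
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

def best_tree(scores):
    n = len(scores) - 1                 # number of words, excluding ROOT
    trees = (h for h in product(range(n + 1), repeat=n) if is_tree(h))
    return max(trees, key=lambda h: tree_score(h, scores))
```

With scores = [[0, 5, 1], [0, 0, 4], [0, 3, 0]], best_tree returns (0, 1): w1 attaches to ROOT and w2 to w1, with score 9. Note that, unlike Eisner's algorithm, the spanning-tree formulation also admits non-projective trees.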
Constraint Satisfaction
• Uses Constraint Dependency Grammar
• The grammar consists of a set of boolean constraints, i.e. logical formulas that describe well-formed trees
• A constraint is a logical formula with variables that range over a set of predefined values
• Parsing is defined as a constraint satisfaction problem
• Constraint satisfaction removes values that contradict constraints
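As a toy illustration (the word list, tags, and constraints below are invented, not taken from any real Constraint Dependency Grammar), each word's head is a variable ranging over node positions, and parsing keeps only the assignments that satisfy every constraint:

```python
from itertools import product

words = ["the", "dog", "barks"]              # positions 1..3; 0 is ROOT
tags = {1: "DET", 2: "NOUN", 3: "VERB"}

def satisfies(assignment):
    heads = {d + 1: h for d, h in enumerate(assignment)}
    for d, h in heads.items():
        if h == d:
            return False                     # no word heads itself
        if tags[d] == "DET" and not (h > d and tags[h] == "NOUN"):
            return False                     # a determiner depends on a noun to its right
        if tags[d] == "NOUN" and not (h == 0 or tags[h] == "VERB"):
            return False                     # a noun depends on a verb or ROOT
        if tags[d] == "VERB" and h != 0:
            return False                     # the verb is the root
    return list(heads.values()).count(0) == 1   # exactly one root

# enumerate all head assignments; the constraints filter out the bad ones
parses = [a for a in product(range(len(words) + 1), repeat=len(words))
          if satisfies(a)]
```

Only one assignment survives: (2, 3, 0), i.e. "the" attaches to "dog", "dog" to "barks", and "barks" to ROOT. Real constraint-based parsers (Maruyama, Foth et al.) propagate constraints to prune candidate values rather than enumerating all assignments.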
Deterministic Parsing
Basic idea:
• Derive a single syntactic representation (dependency graph) through a deterministic sequence of elementary parsing actions
• Sometimes combined with backtracking or repair
Shift-Reduce Type Algorithms
• Data structures:
– a stack [. . . , wi]S of partially processed tokens
– a queue [wj, . . .]Q of remaining input tokens
• Parsing actions built from atomic actions:
– adding arcs (wi → wj, wi ← wj)
– stack and queue operations
• Left-to-right parsing
Yamada's Algorithm
Three parsing actions:
• Shift: [. . .]S [wi, . . .]Q ⇒ [. . . , wi]S [. . .]Q
• Left: [. . . , wi, wj]S [. . .]Q ⇒ [. . . , wi]S [. . .]Q (adds wi → wj)
• Right: [. . . , wi, wj]S [. . .]Q ⇒ [. . . , wj]S [. . .]Q (adds wi ← wj)
Multiple passes over the input give time complexity O(n²).
Yamada and Matsumoto
• Parsing in several rounds: deterministic, bottom-up, O(n²)
• Looks at pairs of words
• Three actions: shift, left, right
• Shift: shifts focus to the next word pair
Yamada and Matsumoto
• Left: decides that the right word depends on the left one (wi → wj)
• Right: decides that the left word depends on the right one (wi ← wj)
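Under the action definitions above (Left attaches wj under wi, Right attaches wi under wj), the rounds can be sketched with a gold-tree oracle standing in for the classifier. Yamada and Matsumoto train SVMs to choose actions; the oracle and the head analysis of the running example are my simplifications:

```python
def yamada_parse(words, gold_heads):
    """words: [1..n]; gold_heads: child -> head (0 = ROOT).
    The oracle picks actions; repeated passes give O(n^2). Returns arcs."""
    # pending[w]: how many dependents of w are still unattached;
    # a word may only be removed once it has collected all its dependents
    pending = {w: list(gold_heads.values()).count(w) for w in [0] + words}
    nodes, arcs = list(words), {}
    changed = True
    while len(nodes) > 1 and changed:        # one left-to-right pass per round
        changed, i = False, 0
        while i < len(nodes) - 1:
            a, b = nodes[i], nodes[i + 1]    # the focused word pair
            if gold_heads[b] == a and pending[b] == 0:
                arcs[b] = a                  # Left: a -> b, remove b
                pending[a] -= 1
                del nodes[i + 1]
                changed = True
            elif gold_heads[a] == b and pending[a] == 0:
                arcs[a] = b                  # Right: a <- b, remove a
                pending[b] -= 1
                del nodes[i]
                changed = True
            else:
                i += 1                       # Shift: move to the next pair
    return arcs                              # the surviving node is the root
```

On the example sentence "Red figures on the screen indicated falling stocks" (words 1–8, assuming the analysis used on the later slides: figures heads Red and on, on heads screen, screen heads the, indicated heads figures and stocks, stocks heads falling), the oracle recovers all arcs in three passes.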
Nivre's Algorithm
Four parsing actions:
• Shift: [. . .]S [wi, . . .]Q ⇒ [. . . , wi]S [. . .]Q
• Reduce: [. . . , wi]S [. . .]Q ⇒ [. . .]S [. . .]Q (requires ∃wk : wk → wi)
• Left-Arc: [. . . , wi]S [wj, . . .]Q ⇒ [. . .]S [wj, . . .]Q (adds wi ← wj; requires ¬∃wk : wk → wi)
• Right-Arc: [. . . , wi]S [wj, . . .]Q ⇒ [. . . , wi, wj]S [. . .]Q (adds wi → wj; requires ¬∃wk : wk → wj)
Nivre's Algorithm
Characteristics:
• Arc-eager processing of right-dependents
• A single pass over the input gives worst-case time complexity O(n) (at most 2n transitions)
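A gold-tree oracle version of the arc-eager transition system can be sketched as follows (the oracle stands in for Nivre's trained classifier, and the helper names are mine):

```python
def arc_eager_parse(n, gold):
    """Oracle arc-eager parser: words 1..n, with 0 = ROOT on the stack.
    gold: child -> head. Returns (arcs, action trace)."""
    stack, queue, arcs, trace = [0], list(range(1, n + 1)), {}, []
    while queue:
        s, b = stack[-1], queue[0]
        if s != 0 and gold[s] == b:              # Left-Arc: wi <- wj
            arcs[s] = b
            stack.pop()
            trace.append("Left-arc")
        elif gold[b] == s:                       # Right-Arc: wi -> wj
            arcs[b] = s
            stack.append(queue.pop(0))
            trace.append("Right-arc")
        elif any(gold[b] == k or gold.get(k) == b for k in stack[:-1]):
            stack.pop()                          # Reduce: wi already has its head
            trace.append("Reduce")
        else:
            stack.append(queue.pop(0))           # Shift
            trace.append("Shift")
    return arcs, trace
```

Run on the example sentence of the following slides, "Red figures on the screen indicated falling stocks" (assuming figures heads Red and on, on heads screen, screen heads the, indicated heads figures and stocks, stocks heads falling), it produces the action sequence Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc.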
Example
Sentence: Red figures on the screen indicated falling stocks (_ROOT_)
Action sequence (the stack S / queue Q snapshots of the original slides are omitted): Shift, Left-arc, Shift, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce, Left-arc, Right-arc, Shift, Left-arc, Right-arc, Reduce, Reduce
The End
• Q&A