
Dependency Grammars and Parsing

CMSC 473/673

UMBC


Outline

Review: PCFGs and CKY

Dependency Grammars

Dependency Parsing (Shift-reduce)

From syntax to semantics


Probabilistic Context Free Grammar

Set of weighted (probabilistic) rewrite rules, comprised of terminals and non-terminals

Terminals: the words in the language (the lexicon), e.g., Baltimore

Non-terminals: symbols that can trigger rewrite rules, e.g., S, NP, Noun

(Sometimes) Pre-terminals: symbols that can only trigger lexical rewrites, e.g., Noun

1.0 S → NP VP
0.4 NP → Det Noun
0.3 NP → Noun
0.2 NP → Det AdjP
0.1 NP → NP PP
1.0 PP → P NP
0.34 AdjP → Adj Noun
0.26 VP → V NP
0.0003 Noun → Baltimore
…

Q: What are the distributions? What must sum to 1?

A: P(X → Y Z | X): for each left-hand side X, the probabilities of all rules that rewrite X must sum to 1
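A minimal sketch of that normalization check, assuming the grammar is stored as (probability, left-hand side, right-hand side) triples. Only the S, NP and PP rules are included, because the slide elides the remaining rules with "…"; the function name `lhs_totals` is made up for this illustration.

```python
from collections import defaultdict

# Rules copied from the slide. AdjP, VP and Noun are left out because the
# slide does not list all of their rewrites.
RULES = [
    (1.0, "S", ("NP", "VP")),
    (0.4, "NP", ("Det", "Noun")),
    (0.3, "NP", ("Noun",)),
    (0.2, "NP", ("Det", "AdjP")),
    (0.1, "NP", ("NP", "PP")),
    (1.0, "PP", ("P", "NP")),
]

def lhs_totals(rules):
    """Sum rule probabilities separately for each left-hand side."""
    totals = defaultdict(float)
    for prob, lhs, _rhs in rules:
        totals[lhs] += prob
    return dict(totals)

totals = lhs_totals(RULES)
for lhs, total in sorted(totals.items()):
    # Compare with a tolerance: floating-point sums are rarely exactly 1.0.
    print(lhs, "sums to 1" if abs(total - 1.0) < 1e-9 else "does NOT sum to 1")
```

Each left-hand side defines its own distribution, so the check is done per non-terminal rather than over the whole grammar.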


Probabilistic Context Free Grammar

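Probability of the tree [S [NP [Noun Baltimore]] [VP [Verb is] [NP a great city]]]:

$$
\begin{aligned}
p(\text{tree}) ={} & p(\text{S} \to \text{NP VP}) \cdot p(\text{NP} \to \text{Noun}) \cdot p(\text{Noun} \to \text{Baltimore}) \\
& \cdot\, p(\text{VP} \to \text{Verb NP}) \cdot p(\text{Verb} \to \text{is}) \cdot p(\text{NP} \to \text{a great city})
\end{aligned}
$$

The product of the probabilities of the individual rules used in the derivation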
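The same product can be sketched in code. The tree is written as nested tuples, and the probabilities for `Verb → is` and `NP → a great city` are invented for illustration because the slides do not give them; the other four values come from the grammar slide (using its `VP → V NP` weight for `VP → Verb NP`). Working in log space is a common choice to avoid underflow on long derivations.

```python
import math

# (label, child, child, ...); a plain string is a word or word sequence.
TREE = ("S",
        ("NP", ("Noun", "Baltimore")),
        ("VP", ("Verb", "is"), ("NP", "a great city")))

RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Noun",)): 0.3,
    ("Noun", ("Baltimore",)): 0.0003,
    ("VP", ("Verb", "NP")): 0.26,
    ("Verb", ("is",)): 0.1,             # hypothetical value
    ("NP", ("a great city",)): 0.01,    # hypothetical value
}

def tree_log_prob(tree, rule_prob):
    """Sum the log-probabilities of every rule used in the tree."""
    if isinstance(tree, str):           # a leaf: no rule is applied here
        return 0.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    total = math.log(rule_prob[(label, rhs)])
    for child in children:
        total += tree_log_prob(child, rule_prob)
    return total

tree_prob = math.exp(tree_log_prob(TREE, RULE_PROB))
print(tree_prob)
```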

“Papa ate the caviar with a spoon”

Example from Jason Eisner

Word positions: 0 Papa 1 ate 2 the 3 caviar 4 with 5 a 6 spoon 7

Entire grammar (assume uniform weights):

S → NP VP
NP → Det N
NP → NP PP
VP → V NP
VP → VP PP
PP → P NP
NP → Papa
N → caviar
N → spoon
V → spoon
V → ate
P → with
Det → the
Det → a

First: Let’s find all NPs

(NP, 0, 1): Papa
(NP, 2, 4): the caviar
(NP, 5, 7): a spoon
(NP, 2, 7): the caviar with a spoon

Second: Let’s find all VPs

(VP, 1, 7): ate the caviar with a spoon
(VP, 1, 4): ate the caviar

Third: Let’s find all Ss

(S, 0, 7): Papa ate the caviar with a spoon
(S, 0, 4): Papa ate the caviar

[Chart figure: rows are span start (0 to 6), columns are span end (1 to 7); the highlighted cells hold (NP, 0, 1), (VP, 1, 7) and (S, 0, 7)]

CKY Recognizer

Input:
* string of N words
* grammar in CNF

Output: True (with parse)/False

Data structure: N*N table T
Rows indicate span start (0 to N-1)
Columns indicate span end (1 to N)

T[i][j] lists constituents spanning i → j

For Viterbi in HMMs: build table left-to-right

For CKY in trees:
1. build smallest-to-largest &
2. left-to-right

CKY Recognizer

T = Cell[N][N+1]

// width-1 spans: lexical rules X → word_j
for(j = 1; j ≤ N; ++j) {
  T[j-1][j].add(X for non-terminal X in G if X → word_j)
}

// wider spans, smallest to largest; start ≤ N - width so the last span ends at N
for(width = 2; width ≤ N; ++width) {
  for(start = 0; start ≤ N - width; ++start) {
    end = start + width
    for(mid = start+1; mid < end; ++mid) {
      // X covers start..end if Y covers start..mid and Z covers mid..end
      for(non-terminal Y : T[start][mid]) {
        for(non-terminal Z : T[mid][end]) {
          T[start][end].add(X for rule X → Y Z : G)
        }
      }
    }
  }
}

return (S in T[0][N])
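As a runnable counterpart to the pseudocode, here is a small Python sketch of the recognizer over the "Papa ate the caviar with a spoon" grammar from the earlier slide. The data layout (a list of binary rules, a word-to-tags dictionary, and a chart keyed by `(start, end)`) is an assumption chosen for brevity, not the course's reference implementation.

```python
from collections import defaultdict

# Binary rules X -> Y Z as (X, Y, Z); lexical rules as word -> {X, ...}.
BINARY = [
    ("S", "NP", "VP"), ("NP", "Det", "N"), ("NP", "NP", "PP"),
    ("VP", "V", "NP"), ("VP", "VP", "PP"), ("PP", "P", "NP"),
]
LEXICAL = {
    "Papa": {"NP"}, "caviar": {"N"}, "spoon": {"N", "V"}, "ate": {"V"},
    "with": {"P"}, "the": {"Det"}, "a": {"Det"},
}

def cky_chart(words):
    """Return chart[(start, end)] = set of non-terminals covering that span."""
    n = len(words)
    chart = defaultdict(set)
    for j, word in enumerate(words, start=1):
        chart[(j - 1, j)] |= LEXICAL.get(word, set())
    for width in range(2, n + 1):                 # smallest to largest
        for start in range(0, n - width + 1):     # left to right
            end = start + width
            for mid in range(start + 1, end):
                for x, y, z in BINARY:
                    if y in chart[(start, mid)] and z in chart[(mid, end)]:
                        chart[(start, end)].add(x)
    return chart

def recognize(words, goal="S"):
    """True if the whole string can be derived from the goal symbol."""
    return goal in cky_chart(words)[(0, len(words))]

sentence = "Papa ate the caviar with a spoon".split()
print(recognize(sentence))
```

This version loops over grammar rules inside each split point rather than over pairs of cell entries; both orders fill the same chart.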

CKY is Versatile: PCFG Tasks

Task | PCFG algorithm name | HMM analog
Find any parse | CKY recognizer | none
Find the most likely parse (for an observed sequence) | CKY weighted Viterbi | Viterbi
Calculate the (log) likelihood of an observed sequence w1, …, wN | Inside algorithm | Forward algorithm
Learn the grammar parameters | Inside-outside algorithm (EM) | Forward-backward/Baum-Welch (EM)
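To illustrate how one chart procedure serves two of these tasks, the sketch below fills a probability chart and takes the operation that merges competing analyses as a parameter: `max` gives the Viterbi score and addition gives the inside probability. The rule probabilities are hypothetical (the slides only say "assume uniform weights"), and plain probabilities are used instead of logs to keep the example short.

```python
from collections import defaultdict

# Hypothetical probabilities; each left-hand side sums to 1.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("NP", "Det", "N"): 0.4, ("NP", "NP", "PP"): 0.2,
    ("VP", "V", "NP"): 0.6, ("VP", "VP", "PP"): 0.4,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {
    ("NP", "Papa"): 0.4,
    ("N", "caviar"): 0.5, ("N", "spoon"): 0.5,
    ("V", "ate"): 0.5, ("V", "spoon"): 0.5,
    ("P", "with"): 1.0,
    ("Det", "the"): 0.5, ("Det", "a"): 0.5,
}

def cky_score(words, combine, goal="S"):
    """Score the sentence; `combine` merges alternative analyses of a span."""
    n = len(words)
    chart = defaultdict(float)      # chart[(start, end, X)], 0.0 if absent
    for j, word in enumerate(words, start=1):
        for (x, w), p in LEXICAL.items():
            if w == word:
                chart[(j - 1, j, x)] = combine(chart[(j - 1, j, x)], p)
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for (x, y, z), p in BINARY.items():
                    score = p * chart[(start, mid, y)] * chart[(mid, end, z)]
                    if score > 0.0:
                        key = (start, end, x)
                        chart[key] = combine(chart[key], score)
    return chart[(0, n, goal)]

sentence = "Papa ate the caviar with a spoon".split()
best = cky_score(sentence, max)                      # most likely parse
inside = cky_score(sentence, lambda a, b: a + b)     # sum over all parses
print(best, inside)
```

The sentence has two parses under this grammar (PP attached to the VP or to the NP), so the inside probability is the sum of two derivation probabilities while the Viterbi score keeps only the larger one.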


Structure vs. Word Relations

Constituency trees/analyses: based on structure

Dependency analyses: based on word relations

Remember: (P)CFGs Help Clearly Show Ambiguity

I ate the meal with friends

[Figure: two constituency trees for this sentence, differing in where the PP “with friends” attaches (to the verb phrase or to the noun phrase)]

CFGs to Dependencies

I ate the meal with friends

[Figure: the same two constituency trees, shown alongside word-to-word dependency arcs]


CFGs to Labeled Dependencies

I ate the meal with friends

[Figure: the same two constituency trees, with dependency arcs labeled nsubj, dobj and nmod in both analyses]


Labeled Dependencies

Word-to-word labeled relations

Chris ate

nsubj arc: governor (head) = ate, dependent = Chris

de Marneffe et al., 2014


http://universaldependencies.org/


(Labeled) Dependency Parse

Directed graphs

Vertices: linguistic blobs in a sentence

Edges: (labeled) arcs

Often directed trees

1. A single root node with no incoming arcs

2. Each vertex except root has exactly one incoming arc

3. Unique path from the root node to each vertex
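The three tree conditions can be checked mechanically. The sketch below assumes words are numbered 1..N, the root is vertex 0, and arcs are `(head, dependent)` pairs; the function name and representation are illustrative choices, not part of the slides.

```python
def is_dependency_tree(num_words, arcs):
    """Check the three conditions for a directed tree rooted at vertex 0."""
    heads = {}
    for head, dep in arcs:
        if not (0 <= head <= num_words):
            return False                  # arc from an unknown vertex
        if dep == 0 or dep in heads:
            return False                  # root has an incoming arc, or a word has two
        heads[dep] = head
    if set(heads) != set(range(1, num_words + 1)):
        return False                      # some word has no incoming arc
    for word in range(1, num_words + 1):
        seen, node = set(), word
        while node != 0:                  # walk up towards the root
            if node in seen:
                return False              # cycle: the root cannot reach this word
            seen.add(node)
            node = heads[node]
    return True

# "Papa ate the caviar": 1=Papa, 2=ate, 3=the, 4=caviar, 0=root
good = {(0, 2), (2, 1), (4, 3), (2, 4)}
cyclic = {(0, 2), (2, 1), (3, 4), (4, 3)}
print(is_dependency_tree(4, good), is_dependency_tree(4, cyclic))
```

Because every non-root vertex has exactly one head, reaching the root by following heads upward is equivalent to having a unique path from the root down to that vertex.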


Projective Dependency Trees

No crossing arcs

SLP3: Figs 14.2, 14.3

✔ Projective

✖ Not projective

Non-projective parses capture
• certain long-range dependencies
• free word order
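"No crossing arcs" translates directly into a pairwise test on arc endpoints. This is a quadratic-time sketch, assuming the parse is given as a list where `heads[i]` is the position of the head of word `i+1` and 0 stands for the root; the example head lists are made up for illustration.

```python
def is_projective(heads):
    """True if no two arcs cross when drawn above the sentence."""
    # Store each arc as (left endpoint, right endpoint).
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for l1, r1 in arcs:
        for l2, r2 in arcs:
            # The second arc starts strictly inside the first and ends outside it.
            if l1 < l2 < r1 < r2:
                return False
    return True

# "Papa ate the caviar": Papa<-ate, ate<-root, the<-caviar, caviar<-ate
print(is_projective([2, 0, 4, 2]))
# A made-up 4-word tree whose arcs 1-3 and 2-4 cross
print(is_projective([3, 4, 0, 3]))
```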


Are CFGs for Naught?

Nope! Simple algorithm from Xia and Palmer (2001)

1. Mark the head child of each node in a phrase structure, using “appropriate” head rules.

2. In the dependency structure, make the head of each non-head child depend on the head of the head-child.

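Papa ate the caviar with a spoon

NP V D N P D N

[Figure: constituency tree over these tags, built up as NP (the caviar), NP (a spoon), PP, VP, VP, S, with each node annotated by its head word: the NPs by caviar and spoon, the PP by spoon, both VPs by ate, and S by ate]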
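The two steps can be sketched as a single recursive pass over a tree written as nested tuples. The `HEAD_RULES` table below is a toy stand-in for “appropriate” head rules, chosen so that the heads match the annotations on the slide (for example, the PP is headed by its noun); it also assumes no word occurs twice in the sentence, since words are used directly as identifiers.

```python
# For each parent label, child labels to try (in order) when picking the head child.
HEAD_RULES = {"S": ["VP"], "VP": ["VP", "V"], "NP": ["NP", "N"], "PP": ["NP"]}

TREE = ("S",
        ("NP", "Papa"),
        ("VP",
         ("VP", ("V", "ate"), ("NP", ("D", "the"), ("N", "caviar"))),
         ("PP", ("P", "with"), ("NP", ("D", "a"), ("N", "spoon")))))

def to_dependencies(tree, arcs):
    """Return the head word of `tree`; append (head, dependent) pairs to arcs."""
    label, *children = tree
    if isinstance(children[0], str):        # pre-terminal: (tag, word)
        return children[0]
    child_heads = [to_dependencies(c, arcs) for c in children]
    # Step 1: mark the head child using the first rule entry that matches.
    head_index = next(i for wanted in HEAD_RULES[label]
                      for i, c in enumerate(children) if c[0] == wanted)
    # Step 2: the head of each non-head child depends on the head of the head child.
    for i, head_word in enumerate(child_heads):
        if i != head_index:
            arcs.append((child_heads[head_index], head_word))
    return child_heads[head_index]

arcs = []
root_word = to_dependencies(TREE, arcs)
print(root_word, arcs)
```

Different head rules give different dependency trees from the same constituency tree, which is why the slide puts “appropriate” in quotes.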

Dependency Post-Processing (Keep Tree Structure)

Dependency Post-Processing (Get Possible Graph Structure)

Amaranthus, collectively known as amaranth, is a cosmopolitan genus of annual or short-lived perennial plants.


(Some) Dependency Parsing Algorithms

Dynamic Programming

Eisner Algorithm (Eisner 1996)

Transition-based

Shift-reduce, arc standard

Graph-based

Maximum spanning tree


Shift-Reduce Parsing

Recall from CMSC 331

Bottom-up

Tools: input words, some special root symbol ($), and a stack to hold configurations

Shift:
– move tokens onto the stack
– match top two elements of the stack against the grammar (RHS)

Reduce:
– If match occurred, place LHS symbol on the stack


Shift-Reduce Dependency Parsing

Tools: input words, some special root symbol ($), and a stack to hold configurations

Shift:
– move tokens onto the stack
– decide if top two elements of the stack form a valid (good) grammatical dependency

Reduce:
– If there’s a valid relation, place head on the stack

decide how? Search problem!

what is valid? Learn it!

what are the possible actions?


Shift-Reduce Actions

Possibility | Action Name | Action Meaning
Assign the current word as the head of some previously seen word | LEFTARC | Assert a head-dependent relation between the word at the top of stack and the word directly beneath it; remove the lower word from the stack
Assign some previously seen word as the head of the current word | RIGHTARC | Assert a head-dependent relation between the second word on the stack and the word at the top; remove the word at the top of the stack
Postpone processing the current word; add it for later | SHIFT | Remove the word from the front of the input buffer and push it onto the stack

These three actions make up the “arc standard” transition system.
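The three actions can be sketched as operations on a Python list used as the stack (top at the end), a list used as the buffer, and a list of `(head, dependent)` pairs. The function name `apply_action` and the in-place style are choices made for this example; the action sequence is the one from the "Papa ate the caviar" trace in this deck.

```python
def apply_action(action, stack, buffer, deps):
    """Apply one arc-standard action in place; stack[-1] is the top."""
    if action == "SHIFT":
        stack.append(buffer.pop(0))
    elif action == "LEFTARC":
        # Top is the head of the word directly beneath it; remove the lower word.
        dependent = stack.pop(-2)
        deps.append((stack[-1], dependent))
    elif action == "RIGHTARC":
        # Second word is the head of the top word; remove the top word.
        dependent = stack.pop()
        deps.append((stack[-1], dependent))
    else:
        raise ValueError(f"unknown action: {action}")

stack, buffer, deps = ["$"], "Papa ate the caviar".split(), []
for action in ["SHIFT", "SHIFT", "LEFTARC", "SHIFT", "SHIFT",
               "LEFTARC", "RIGHTARC", "RIGHTARC"]:
    apply_action(action, stack, buffer, deps)

print(stack, buffer, deps)
```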


Arc Standard Parsing

state ← {[root], [words], []}

while state ≠ {[root], [], [(deps)]} {
  t ← ORACLE(state)
  state ← APPLY(t, state)
}

return state


Arc Standard Parsing

state ← {[root], [words], []}

while state ≠ {[root], [], [(deps)]} {
  t ← PREDICT(state)
  state ← APPLY(t, state)
}

return state

what is valid? Learn it!


Learning An Oracle (Predictor)

Training data: dependency treebank

Input: configuration

Output: {LEFTARC, RIGHTARC, SHIFT}

t ← ORACLE(state)

• Choose LEFTARC if it produces a correct head-dependent relation given the reference parse and the current configuration
• Choose RIGHTARC if
  • it produces a correct head-dependent relation given the reference parse and
  • all of the dependents of the word at the top of the stack have already been assigned
• Otherwise, choose SHIFT
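These three rules can be sketched as a function that reads the reference parse and returns the next action, plus a loop that replays it to produce training pairs. The representation (a `gold_head` dictionary from each word to its head, words used as identifiers) is an assumption for this example, and it presumes a projective reference tree with no repeated words.

```python
def oracle(stack, deps, gold_head):
    """Pick the next arc-standard action from the reference parse."""
    if len(stack) >= 2:
        top, second = stack[-1], stack[-2]
        if gold_head.get(second) == top:
            return "LEFTARC"
        attached = {dependent for _head, dependent in deps}
        # Every gold dependent of the top word must already be attached.
        top_done = all(d in attached for d, h in gold_head.items() if h == top)
        if gold_head.get(top) == second and top_done:
            return "RIGHTARC"
    return "SHIFT"

def oracle_actions(words, gold_head, root="$"):
    """Replay the oracle on one sentence; return its actions and arcs."""
    stack, buffer, deps, actions = [root], list(words), [], []
    while buffer or len(stack) > 1:
        action = oracle(stack, deps, gold_head)
        actions.append(action)
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFTARC":
            dependent = stack.pop(-2)
            deps.append((stack[-1], dependent))
        else:  # RIGHTARC
            dependent = stack.pop()
            deps.append((stack[-1], dependent))
    return actions, deps

GOLD = {"Papa": "ate", "ate": "$", "the": "caviar", "caviar": "ate"}
actions, deps = oracle_actions("Papa ate the caviar".split(), GOLD)
print(actions)
```

The second RIGHTARC condition is what delays attaching "ate" to the root until "caviar" has been attached to "ate"; without it the head would be popped too early.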


Training the Predictor

Predict action t given configuration s

t = φ(s)

Extract features of the configuration

Examples: word forms, lemmas, POS, morphological features

How? Perceptron, Maxent, Support Vector Machines, Multilayer Perceptrons, Neural Networks


Papa ate the caviar

If time permits, work through on board


Parse steps for “Papa ate the caviar”

Deps | Stack (top first) | Word Buffer | Action
--- | $ | Papa ate the caviar | SHIFT
--- | Papa $ | ate the caviar | SHIFT
--- | ate Papa $ | the caviar | LEFTARC
ate->Papa | ate $ | the caviar | SHIFT
ate->Papa | the ate $ | caviar | SHIFT
ate->Papa | caviar the ate $ | --- | LEFTARC
ate->Papa, caviar->the | caviar ate $ | --- | RIGHTARC
ate->Papa, caviar->the, ate->caviar | ate $ | --- | RIGHTARC
ate->Papa, caviar->the, ate->caviar, $->ate | $ | --- | ---


Arc Standard Parsing

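state ← {[root], [words], []}

while state ≠ {[root], [], [(deps)]} {
  t ← ORACLE(state)
  state ← APPLY(t, state)
}

return state

Q: What is the time complexity?

A: Linear in the sentence length: each word is shifted once and removed from the stack once, so an N-word sentence takes 2N transitions

Q: What’s potentially problematic?

A: This is a greedy algorithm: each action is committed to without reconsidering earlier choices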


Becoming Less Greedy

Beam search

Breadth-first search strategy (CMSC 471/671)

At each stage, keep K options open
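A rough sketch of beam search over arc-standard transitions follows. Everything model-related is an assumption: `toy_score` stands in for a learned scorer and simply rewards arcs found in a small reference set, and `legal_actions` encodes one reasonable set of constraints (the root symbol is never made a dependent, and a word is attached to the root only once the buffer is empty) that the slides do not spell out.

```python
def legal_actions(stack, buffer):
    actions = []
    if buffer:
        actions.append("SHIFT")
    if len(stack) >= 3:                       # keep "$" from becoming a dependent
        actions.append("LEFTARC")
    if len(stack) >= 2 and (len(stack) > 2 or not buffer):
        actions.append("RIGHTARC")
    return actions

def new_arc(state, action):
    """The (head, dependent) pair an action would create, or None for SHIFT."""
    stack = state[0]
    if action == "LEFTARC":
        return (stack[-1], stack[-2])
    if action == "RIGHTARC":
        return (stack[-2], stack[-1])
    return None

def step(state, action):
    """Return the successor state; states are immutable tuples."""
    stack, buffer, deps = state
    if action == "SHIFT":
        return (stack + (buffer[0],), buffer[1:], deps)
    arc = new_arc(state, action)
    if action == "LEFTARC":
        return (stack[:-2] + stack[-1:], buffer, deps + (arc,))
    return (stack[:-1], buffer, deps + (arc,))

def beam_parse(words, score, k):
    beam = [(0.0, (("$",), tuple(words), ()))]
    for _ in range(2 * len(words)):           # every full parse has 2N transitions
        candidates = [(total + score(state, action), step(state, action))
                      for total, state in beam
                      for action in legal_actions(state[0], state[1])]
        # Keep only the K highest-scoring partial parses.
        beam = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    return beam[0]

GOLD = {("ate", "Papa"), ("caviar", "the"), ("ate", "caviar"), ("$", "ate")}

def toy_score(state, action):
    arc = new_arc(state, action)
    if arc is None:
        return 0.0
    return 1.0 if arc in GOLD else -1.0

total, (stack, buffer, deps) = beam_parse("Papa ate the caviar".split(), toy_score, k=3)
print(total, deps)
```

With K = 1 this reduces to the greedy parser; larger K trades time for the chance to recover from a locally attractive but wrong action.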


Evaluation

Exact Match (per-sentence accuracy)

Unlabeled Attachment Score (UAS)

Labeled Attachment Score (LS, LAS)

Recall/Precision/F1 for particular relation types
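As a small illustration of the attachment scores, the sketch below assumes each parse is a list with one `(head position, label)` pair per word, aligned between gold and predicted parses. UAS counts words whose head is correct; LAS additionally requires the label to match. The example parses are made up.

```python
def attachment_scores(gold, predicted):
    """Return (UAS, LAS) for one sentence or a concatenated corpus."""
    assert len(gold) == len(predicted) and gold, "parses must align and be non-empty"
    total = len(gold)
    uas = sum(g[0] == p[0] for g, p in zip(gold, predicted)) / total
    las = sum(g == p for g, p in zip(gold, predicted)) / total
    return uas, las

# "Papa ate the caviar"; heads are 1-based word positions, 0 is the root.
gold = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "dobj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (2, "nmod")]

uas, las = attachment_scores(gold, pred)
exact_match = gold == pred      # per-sentence accuracy counts only perfect parses
print(uas, las, exact_match)
```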


Parsing as a Core NLP Problem

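[Diagram: sentence 1 through sentence 4 are parsed as independent operations by a Parser that uses a Grammar; the parser output is compared against gold (correct) reference trees to give an evaluation score, and is passed on to other NLP tasks (entity coref., MT, Q&A, …)]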


From Dependencies to Shallow Semantics


From Syntax to Shallow Semantics

Angeli et al. (2015)

“Open Information Extraction”

A sampling of efforts:

http://corenlp.run/ (constituency & dependency)

https://github.com/hltcoe/predpatt

http://openie.allenai.org/

http://www.cs.rochester.edu/research/knext/browse/ (constituency trees)

http://rtw.ml.cmu.edu/rtw/