10. Lexicalized and Probabilistic Parsing -Speech and Language Processing-
Probabilistic and Lexicalized Parsing
![Page 1: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/1.jpg)
Probabilistic and Lexicalized Parsing
CS 4705
![Page 2: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/2.jpg)
Probabilistic CFGs: PCFGs
• Weighted CFGs
  – Attach weights to the rules of a CFG
  – Compute weights of derivations
  – Use the weights to choose preferred parses
• Utility: pruning and ordering the search space, disambiguation, language modeling for ASR
• Parsing with weighted grammars: among all possible parses of S, find the parse T′ whose derivation has the maximum weight
• $T'(S) = \arg\max_{T \in \tau(S)} W(T, S)$
• Probabilistic CFGs are one form of weighted CFGs
![Page 3: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/3.jpg)
Rule Probability
• Attach probabilities to grammar rules
• Expansions for a given non-terminal sum to 1
  R1: VP → V .55
  R2: VP → V NP .40
  R3: VP → V NP NP .05
• Estimate probabilities from annotated corpora
  – E.g. Penn Treebank
  – P(R1) = count(R1) / count(VP)
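The relative-frequency estimate above can be sketched in a few lines of Python. The rule occurrences below are invented for illustration (they are not Penn Treebank counts), and `estimate_rule_probs` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical rule occurrences, as if read off a tiny treebank.
# Each item is (LHS, RHS); the counts are made up for illustration.
observed_rules = (
    [("VP", ("V",))] * 11
    + [("VP", ("V", "NP"))] * 8
    + [("VP", ("V", "NP", "NP"))] * 1
)


def estimate_rule_probs(rules):
    """Return P(rule | LHS) = count(rule) / count(LHS) for every observed rule."""
    rule_counts = Counter(rules)
    lhs_counts = Counter(lhs for lhs, _ in rules)
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


rule_probs = estimate_rule_probs(observed_rules)
for (lhs, rhs), p in rule_probs.items():
    print(f"{lhs} -> {' '.join(rhs)}: {p:.2f}")
```

With these made-up counts (11, 8 and 1 out of 20 VPs) the estimates come out as .55, .40 and .05, and the expansions of VP sum to 1.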
![Page 4: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/4.jpg)
Derivation Probability
• For a derivation T = {R1 … Rn}:
  – Probability of the derivation: the product of the probabilities of the rules expanded in the tree
    $P(T) = \prod_{i=1}^{n} P(R_i)$
  – Most probable parse:
    $T^{*} = \arg\max_{T} P(T)$
  – Probability of a sentence: the sum over all possible derivations for the sentence
    $P(S) = \sum_{T \in \tau(S)} P(T, S)$
• Note the independence assumption: a rule's probability does not depend on where in the tree the rule is expanded.
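A minimal sketch of these three quantities, assuming a toy grammar for the sentence "John called Mary from Denver". All rule probabilities are invented for illustration, and the two derivations are written out by hand rather than produced by a parser.

```python
import math

# Toy rule probabilities, invented for illustration (not from a treebank).
RULE_PROBS = {
    "S -> NP VP": 1.0,
    "VP -> V NP": 0.7,
    "VP -> VP PP": 0.3,
    "NP -> NP PP": 0.1,
    "NP -> John": 0.3,
    "NP -> Mary": 0.3,
    "NP -> Denver": 0.3,
    "PP -> P NP": 1.0,
    "V -> called": 1.0,
    "P -> from": 1.0,
}


def derivation_prob(rules):
    """P(T): product of the probabilities of the rules used in the derivation."""
    return math.prod(RULE_PROBS[r] for r in rules)


# Derivation 1: the PP "from Denver" attaches to the VP.
vp_attach = ["S -> NP VP", "NP -> John", "VP -> VP PP", "VP -> V NP",
             "V -> called", "NP -> Mary", "PP -> P NP", "P -> from", "NP -> Denver"]
# Derivation 2: the PP attaches to the NP "Mary".
np_attach = ["S -> NP VP", "NP -> John", "VP -> V NP", "V -> called",
             "NP -> NP PP", "NP -> Mary", "PP -> P NP", "P -> from", "NP -> Denver"]

p_vp = derivation_prob(vp_attach)
p_np = derivation_prob(np_attach)
sentence_prob = p_vp + p_np  # P(S): sum over all derivations of the sentence
best_name, best_prob = max(
    [("VP attachment", p_vp), ("NP attachment", p_np)], key=lambda t: t[1]
)
print(f"{p_vp:.5f} {p_np:.5f} {sentence_prob:.5f} {best_name}")
```

Under these assumed numbers the VP-attachment derivation is the most probable parse; different rule probabilities could reverse the preference.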
![Page 5: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/5.jpg)
One Approach: CYK Parser
• Bottom-up parsing via dynamic programming
  – Assign probabilities to constituents as they are completed and placed in a table
  – Use the maximum probability for each constituent type going up the tree to S
• The intuition:
  – We know the probabilities for constituents lower in the tree, so as we construct higher-level constituents we don’t need to recompute them
![Page 6: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/6.jpg)
CYK (Cocke-Younger-Kasami) Parser
• Bottom-up parser with top-down filtering
• Uses dynamic programming to store intermediate results (cf. Earley algorithm for top-down case)
• Input: PCFG in Chomsky Normal Form
  – Rules of form A → w or A → B C; no ε
• Chart: array [i,j,A] to hold the probability that non-terminal A spans the input from position i to position j (positions are numbered 0 to n, between the words)
  – Start state(s): (i, i+1, A) for each A → w_{i+1}
  – End state: (0, n, S), where n is the input size
  – Next-state rules: (i,k,B), (k,j,C) ⇒ (i,j,A) if A → B C
• Maintain back-pointers to recover the parse
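The chart-filling scheme can be sketched as a small probabilistic CKY routine. This is a sketch under stated assumptions, not the course's reference implementation: the toy grammar and all probabilities are invented, positions are numbered from 0 so the goal entry is (0, n, S), and the names `LEXICAL`, `BINARY`, `pcky` and `tree` are hypothetical.

```python
# Toy PCFG in Chomsky Normal Form; all probabilities are invented.
LEXICAL = {  # word -> [(A, P(A -> word))]
    "John": [("NP", 0.3)],
    "Mary": [("NP", 0.3)],
    "Denver": [("NP", 0.3)],
    "called": [("V", 1.0)],
    "from": [("P", 1.0)],
}
BINARY = [  # (A, B, C, P(A -> B C))
    ("S", "NP", "VP", 1.0),
    ("VP", "V", "NP", 0.7),
    ("VP", "VP", "PP", 0.3),
    ("NP", "NP", "PP", 0.1),
    ("PP", "P", "NP", 1.0),
]


def pcky(words):
    """Fill best[i, j, A] with the max probability of A spanning words[i:j]."""
    n = len(words)
    best, back = {}, {}
    # Start states: (i, i+1, A) for each lexical rule A -> word.
    for i, word in enumerate(words):
        for label, p in LEXICAL.get(word, []):
            best[i, i + 1, label] = p
            back[i, i + 1, label] = word
    # Next-state rule: (i,k,B) and (k,j,C) give (i,j,A) if A -> B C.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for a, b, c, p in BINARY:
                    cand = p * best.get((i, k, b), 0.0) * best.get((k, j, c), 0.0)
                    if cand > best.get((i, j, a), 0.0):
                        best[i, j, a] = cand       # keep only the max
                        back[i, j, a] = (k, b, c)  # back-pointer
    return best, back


def tree(back, i, j, label):
    """Follow back-pointers to print the best parse as a bracketed string."""
    entry = back[i, j, label]
    if isinstance(entry, str):
        return f"({label} {entry})"
    k, b, c = entry
    return f"({label} {tree(back, i, k, b)} {tree(back, k, j, c)})"


words = "John called Mary from Denver".split()
best, back = pcky(words)
prob = best[0, len(words), "S"]
parse = tree(back, 0, len(words), "S")
print(parse, prob)
```

Storing only the maximum per (i, j, A) is what makes this a Viterbi-style search; summing the candidates instead would give the total probability of the span.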
![Page 7: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/7.jpg)
Structural Ambiguity
• S → NP VP
• VP → V NP
• NP → NP PP
• VP → VP PP
• PP → P NP
• NP → John | Mary | Denver
• V → called
• P → from
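John called Mary from Denver
Parse 1 (PP attached to the VP): [S [NP John] [VP [VP [V called] [NP Mary]] [PP [P from] [NP Denver]]]]
Parse 2 (PP attached to the NP): [S [NP John] [VP [V called] [NP [NP Mary] [PP [P from] [NP Denver]]]]]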
![Page 8: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/8.jpg)
Example
John called Mary from Denver
![Page 9: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/9.jpg)
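Base Case: A → w
Chart cells [i,j] filled by the lexical rules (positions 0 to 5 fall between the words):
• [0,1] John: NP
• [1,2] called: V
• [2,3] Mary: NP
• [3,4] from: P
• [4,5] Denver: NP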
![Page 10: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/10.jpg)
Recursive Cases: A → B C
• New cell [0,2] John called: X (no rule combines NP and V)
![Page 11: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/11.jpg)
• New cell [1,3] called Mary: VP (VP → V NP)
![Page 12: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/12.jpg)
• New cell [2,4] Mary from: X (no rule combines NP and P)
![Page 13: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/13.jpg)
• New cell [3,5] from Denver: PP (PP → P NP)
![Page 14: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/14.jpg)
• New cell [0,3] John called Mary: S (S → NP VP)
![Page 15: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/15.jpg)
• New cell [1,4] called Mary from: X (no rule applies)
![Page 16: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/16.jpg)
• New cell [2,5] Mary from Denver: NP (NP → NP PP)
![Page 17: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/17.jpg)
• New cell [0,4] John called Mary from: X (no rule applies)
![Page 18: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/18.jpg)
• New cell [1,5] called Mary from Denver: VP
![Page 19: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/19.jpg)
• Cell [1,5] called Mary from Denver: VP (chart contents unchanged from the previous slide)
![Page 20: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/20.jpg)
• Cell [1,5] has two derivations, labelled VP1 and VP2: VP → VP PP over [1,3] + [3,5], and VP → V NP over [1,2] + [2,5]
![Page 21: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/21.jpg)
• New cell [0,5] John called Mary from Denver: S (S → NP VP), built over the VP in [1,5] with its two derivations VP1 and VP2
![Page 22: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/22.jpg)
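Completed chart, by span length (X marks a span with no constituent):
• [0,1] NP, [1,2] V, [2,3] NP, [3,4] P, [4,5] NP
• [0,2] X, [1,3] VP, [2,4] X, [3,5] PP
• [0,3] S, [1,4] X, [2,5] NP
• [0,4] X, [1,5] VP
• [0,5] S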
![Page 23: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/23.jpg)
Problems with PCFGs
• Probability model is based only on the rules in the derivation.
• Lexical insensitivity:
  – Doesn’t use words in any real way
  – But structural disambiguation is lexically driven
    • PP attachment often depends on the verb, its object, and the preposition
    • I ate pickles with a fork.
    • I ate pickles with relish.
• Context insensitivity of the derivation
  – Doesn’t take into account where in the derivation a rule is used
    • Pronouns are more often subjects than objects
    • She hates Mary.
    • Mary hates her.
• Solution: Lexicalization
  – Add lexical information to each rule
  – I.e. condition the rule probabilities on the actual words
![Page 24: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/24.jpg)
An example: Phrasal Heads
• Phrasal heads can ‘take the place of’ whole phrases, defining the most important characteristics of the phrase
• Phrases are generally identified by their heads
  – The head of an NP is a noun, of a VP the main verb, of a PP the preposition
• Each PCFG rule’s LHS shares a lexical item with a non-terminal in its RHS
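Head sharing between a rule's LHS and one RHS daughter can be sketched as head percolation over a parse tree. The head table `HEAD_CHILD`, the tuple encoding of trees, and the example tree are simplifying assumptions made for illustration; real head-finding tables are considerably more detailed.

```python
# Assumed head table: for each parent label, daughter labels to try in order.
HEAD_CHILD = {"S": ["VP"], "VP": ["V", "VP"], "NP": ["N", "NP"], "PP": ["P"]}


def lexicalize(node):
    """Turn (label, children...) into (label, head_word, lexicalized_children)."""
    label, *rest = node
    if len(rest) == 1 and isinstance(rest[0], str):
        return (label, rest[0], [])  # preterminal: the word is its own head
    children = [lexicalize(child) for child in rest]
    for wanted in HEAD_CHILD[label]:
        for child in children:
            if child[0] == wanted:
                return (label, child[1], children)  # copy the head upward
    raise ValueError(f"no head daughter found for {label}")


def lexicalized_rules(node):
    """List the lexicalized rules used in the tree, top-down."""
    label, head, children = node
    if not children:
        return []
    rhs = " ".join(f"{c[0]}({c[1]})" for c in children)
    rules = [f"{label}({head}) -> {rhs}"]
    for child in children:
        rules.extend(lexicalized_rules(child))
    return rules


example = ("S",
           ("NP", ("N", "workers")),
           ("VP", ("V", "dumped"),
                  ("NP", ("N", "sacks")),
                  ("PP", ("P", "into"), ("NP", ("Det", "a"), ("N", "bin")))))
rules = lexicalized_rules(lexicalize(example))
for rule in rules:
    print(rule)
```

Each printed rule carries the same word on its LHS and on exactly one RHS daughter, e.g. VP(dumped) -> V(dumped) NP(sacks) PP(into).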
![Page 25: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/25.jpg)
Increase in Size of Rule Set in Lexicalized CFG
• If R is the set of binary-branching rules in the CFG and Σ is the lexicon, the lexicalized rule set has size O(2*|Σ|*|R|)
• For unary rules: O(|Σ|*|R|)
![Page 26: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/26.jpg)
Example (correct parse)
Attribute grammar
![Page 27: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/27.jpg)
Example (less preferred)
![Page 28: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/28.jpg)
Computing Lexicalized Rule Probabilities
• We started with rule probabilities as before
  – VP → V NP PP, with probability P(rule | VP)
    • E.g., count of this rule divided by the number of VPs in a treebank
• Now we want lexicalized probabilities
  – VP(dumped) → V(dumped) NP(sacks) PP(into)
    • i.e., P(rule | VP ∧ dumped is the verb ∧ sacks is the head of the NP ∧ into is the head of the PP)
  – Not likely to have significant counts in any treebank
![Page 29: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/29.jpg)
Exploit the Data You Have
• So, exploit the independence assumption and collect the statistics you can…
• Focus on capturing
  – Verb subcategorization
    • Particular verbs have affinities for particular VPs
  – Objects’ affinity for their predicates
    • Mostly their mothers and grandmothers
    • Some objects fit better with some predicates than others
![Page 30: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/30.jpg)
Verb Subcategorization
• Condition particular VP rules on their heads
  – E.g. for a rule r: VP → V NP PP
    • P(r | VP) becomes P(r ∧ V=dumped | VP ∧ dumped)
  – How do you get the probability?
    • How many times was rule r used with dumped, divided by the number of VPs that dumped appears in, in total
    • How predictive of r is the verb dumped?
  – Captures affinity between VP heads (verbs) and VP rules
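The count ratio described above can be sketched as follows. The (head verb, VP expansion) observations are invented for illustration, and `subcat_prob` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical (head verb, VP expansion) pairs, as if extracted from a
# head-annotated treebank. The counts are made up.
vp_observations = (
    [("dumped", "V NP PP")] * 6
    + [("dumped", "V NP")] * 3
    + [("dumped", "V PP")] * 1
    + [("slept", "V")] * 4
    + [("slept", "V PP")] * 1
)


def subcat_prob(rule, head, observations):
    """P(rule | VP, head): uses of the rule with this head / VPs with this head."""
    pair_counts = Counter(observations)
    head_counts = Counter(h for h, _ in observations)
    if head_counts[head] == 0:
        return 0.0  # unseen head: no estimate without smoothing
    return pair_counts[(head, rule)] / head_counts[head]


p_dumped = subcat_prob("V NP PP", "dumped", vp_observations)
p_slept = subcat_prob("V NP PP", "slept", vp_observations)
print(p_dumped, p_slept)
```

With these assumed counts, dumped strongly predicts VP → V NP PP while slept never takes it; a real system would smooth the zero rather than rule the expansion out.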
![Page 31: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/31.jpg)
Example (correct parse)
![Page 32: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/32.jpg)
Example (less preferred)
![Page 33: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/33.jpg)
Affinity of Phrasal Heads for Other Heads: PP Attachment
• Verbs with preps vs. nouns with preps
• E.g. dumped with into vs. sacks with into
  – How often is dumped the head of a VP which includes a PP daughter with into as its head, relative to other PP heads? I.e., what is P(into | PP, dumped is mother VP’s head)?
  – Vs. how often is sacks the head of an NP with a PP daughter whose head is into, relative to other PP heads? I.e., what is P(into | PP, sacks is mother’s head)?
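The comparison of the two conditional probabilities can be sketched with relative frequencies. The (mother head, PP head) pairs are invented for illustration, and `pp_head_prob` and `preferred_attachment` are hypothetical helper names.

```python
from collections import Counter

# Hypothetical (mother head, PP head) pairs from a head-annotated treebank.
# The counts are made up for illustration.
pp_observations = (
    [("dumped", "into")] * 2
    + [("dumped", "with")] * 5
    + [("dumped", "on")] * 2
    + [("sacks", "of")] * 8
    + [("sacks", "with")] * 1
)


def pp_head_prob(prep, mother_head, observations):
    """P(prep | PP, mother head), estimated by relative frequency."""
    pair_counts = Counter(observations)
    mother_counts = Counter(m for m, _ in observations)
    if mother_counts[mother_head] == 0:
        return 0.0
    return pair_counts[(mother_head, prep)] / mother_counts[mother_head]


def preferred_attachment(verb, noun, prep, observations):
    """Attach the PP to whichever head has the higher affinity for prep."""
    p_verb = pp_head_prob(prep, verb, observations)
    p_noun = pp_head_prob(prep, noun, observations)
    if p_verb == p_noun:
        return "tie"
    return "VP" if p_verb > p_noun else "NP"


p_verb_into = pp_head_prob("into", "dumped", pp_observations)
p_noun_into = pp_head_prob("into", "sacks", pp_observations)
choice = preferred_attachment("dumped", "sacks", "into", pp_observations)
print(p_verb_into, p_noun_into, choice)
```

Here into is seen under dumped but never under sacks, so the sketch prefers VP attachment for "dumped sacks into …".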
![Page 34: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/34.jpg)
But Other Relationships do Not Involve Heads (Hindle & Rooth ’91)
• Affinity of gusto for ate is greater than for spaghetti; and affinity of marinara for spaghetti is greater than for ate
Ate spaghetti with gusto: [Vp(ate) [Vp(ate) [v ate] [np spaghetti]] [Pp(with) with gusto]]
Ate spaghetti with marinara: [Vp(ate) [v ate] [Np(spag) [np spaghetti] [Pp(with) with marinara]]]
![Page 35: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/35.jpg)
Log-linear models for Parsing
• Why restrict the conditioning to the elements of a rule?
  – Use even larger context: word sequence, word types, sub-tree context, etc.
• Compute P(y|x), where f_i(x,y) tests properties of the context and λ_i is the weight of feature i
  $P(y \mid x) = \dfrac{e^{\sum_i \lambda_i f_i(x, y)}}{\sum_{y' \in Y} e^{\sum_i \lambda_i f_i(x, y')}}$
• Use as scores in CKY algorithm to find best parse
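A minimal sketch of the log-linear score, applied to a two-way attachment decision. The feature functions, the weights, and the candidate labels are all invented for illustration; in practice the weights would be learned from data.

```python
import math


def log_linear_probs(x, candidates, features, weights):
    """P(y | x) proportional to exp(sum_i lambda_i * f_i(x, y)), normalized over candidates."""
    scores = [sum(w * f(x, y) for f, w in zip(features, weights)) for y in candidates]
    top = max(scores)  # subtract the max before exponentiating, for stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return {y: e / z for y, e in zip(candidates, exps)}


# Hypothetical binary features f_i(x, y) over a word sequence x and a decision y.
def f_manner_vp(x, y):
    return 1.0 if y == "VP-attach" and x.endswith("gusto") else 0.0


def f_food_np(x, y):
    return 1.0 if y == "NP-attach" and x.endswith("marinara") else 0.0


def f_np_default(x, y):
    return 1.0 if y == "NP-attach" else 0.0


features = [f_manner_vp, f_food_np, f_np_default]
weights = [2.0, 2.0, 0.5]  # assumed lambda_i values, not learned
candidates = ["VP-attach", "NP-attach"]

probs_gusto = log_linear_probs("ate spaghetti with gusto", candidates, features, weights)
probs_marinara = log_linear_probs("ate spaghetti with marinara", candidates, features, weights)
print(probs_gusto, probs_marinara)
```

Because features can look at any part of x and y, nothing restricts them to the elements of a single rule; the normalization over candidates is what turns the scores into probabilities.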
![Page 36: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/36.jpg)
Supertagging: Almost parsing
Poachers now control the underground trade
[Figure: each word of the sentence (poachers, now, control, the, underground, trade) shown with its alternative candidate supertags, i.e. elementary trees built from S, NP, VP, V, N, Adv, Adj and Det nodes]
![Page 37: Probabilistic and Lexicalized Parsing](https://reader035.fdocuments.net/reader035/viewer/2022062321/5681399b550346895da134be/html5/thumbnails/37.jpg)
Summary
• Parsing context-free grammars
  – Top-down and bottom-up parsers
  – Mixed approaches (CKY, Earley parsers)
• Preferences over parses using probabilities
  – Parsing with PCFG and PCKY algorithms
• Enriching the probability model
  – Lexicalization
  – Log-linear models for parsing
  – Super-tagging