Page 1:

CSA305: Natural Language Algorithms

Probabilistic Phrase Structure Grammars (PCFGs)

December 2004

Page 2:

Handling Ambiguities

• The Earley Algorithm is equipped to represent ambiguities efficiently but not to resolve them.

• Methods available for resolving ambiguities include:
  – Semantics (choose the parse that makes sense).
  – Statistics (choose the parse that is most likely).

• Probabilistic context-free grammars (PCFGs) offer a solution.

Page 3:

PCFG

• A PCFG is a 5-tuple (NT, T, P, S, D) where D is a function that assigns a probability to each rule p ∈ P.

• A PCFG augments each rule with a conditional probability: A → α [p]

• Formally, this is the probability of a given expansion given the LHS non-terminal, i.e. P(A → α | A).
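As a minimal illustration (a sketch in plain Python; the dictionary representation and the names pcfg and rule_prob are assumptions, not from the slides), D can be encoded as a mapping from rules to probabilities:

    # Sketch: the probability function D as a dict mapping
    # a rule (LHS, RHS) to P(LHS -> RHS | LHS).
    pcfg = {
        ("S", ("NP", "VP")):        0.80,
        ("S", ("Aux", "NP", "VP")): 0.15,
        ("S", ("VP",)):             0.05,
    }

    def rule_prob(lhs, rhs):
        """Return P(lhs -> rhs | lhs); 0.0 if the rule is not in the grammar."""
        return pcfg.get((lhs, tuple(rhs)), 0.0)

(The example probabilities are the S rules from the fragment on Page 5.)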

Page 4:

Example PCFG

[Slide shows a table of grammar rules annotated with probabilities; not reproduced in this transcript.]

Page 5:

Example PCFG Fragment

• S → NP VP [.80]
  S → Aux NP VP [.15]
  S → VP [.05]

• The sum of the conditional probabilities for a given LHS non-terminal = 1.

• A PCFG can be used to estimate the probability of each parse tree for a sentence S.
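A quick sanity check of this sum-to-one constraint, using the dictionary sketch from Page 3 (assumed representation):

    from collections import defaultdict

    def check_normalisation(pcfg, tol=1e-9):
        """Check that, for each LHS non-terminal, its rule probabilities sum to 1."""
        totals = defaultdict(float)
        for (lhs, _rhs), p in pcfg.items():
            totals[lhs] += p
        return {lhs: abs(t - 1.0) <= tol for lhs, t in totals.items()}

    # For the S rules above: 0.80 + 0.15 + 0.05 = 1.0, so the check passes for S.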

Page 6:

Probability of a Parse Tree

• For sentence S, the probability assigned by a PCFG to a parse tree T is given by

P(T) = ∏_{n ∈ T} P(r(n))

where n ranges over the nodes of T and r(n) is the production rule that produced n,

• i.e. the product of the probabilities of all the rules r used to expand each node n in T.
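A sketch of this product in Python, with a parse tree encoded as nested (label, children) pairs in which a leaf's children field is the word itself (an assumed encoding, not from the slides):

    def tree_prob(tree, rule_prob):
        """P(T): the product of P(r(n)) over every node n of the tree T."""
        label, children = tree
        if isinstance(children, str):                  # lexical rule A -> w
            return rule_prob(label, (children,))
        p = rule_prob(label, tuple(c[0] for c in children))
        for child in children:                         # recurse into each subtree
            p *= tree_prob(child, rule_prob)
        return p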

Page 7:

Ambiguous Sentence

[Slide shows the two parse trees T_L and T_R for an ambiguous sentence; not reproduced.]

P(T_L) = 1.5 × 10⁻⁶

P(T_R) = 1.7 × 10⁻⁶

P(S) = P(T_L) + P(T_R) = 3.2 × 10⁻⁶

Page 8:

The Parsing Problem for PCFGs

• The parsing problem for PCFGs is to produce the most likely parse for a given sentence, i.e. to compute the T spanning the sentence whose probability is maximal.

• The CYK algorithm assumes that the grammar is in Chomsky Normal Form (CNF):
  – No ε-productions
  – Rules of the form A → B C or A → a

Page 9:

CKY Algorithm – Base Case

• Base case: covering input substrings of length 1 (i.e. individual words). In CNF, the probability p has to come from that of the corresponding lexical rule

• A → w [p]

Page 10:

CKY Algorithm – Recursive Case

• Recursive case: input substrings of length > 1:

A ⇒* w_ij if and only if there is a rule A → B C and some i < k < j such that B ⇒* w_ik and C ⇒* w_kj

• In this case P(w_ij) is obtained by multiplying together P(w_ik) and P(w_kj).

• These probabilities are already available in other parts of the table.

• Take the maximum value over all such splits and rules.

Page 11:

Probabilistic CKY Algorithm for a Sentence of Length n

1. for k := 1 to n do
2.   π[k-1, k, A] := P(A → w_k)
3.   for i := k-2 downto 0 do
4.     for j := k-1 downto i+1 do
5.       π[i, k, A] := max [ π[i, j, B] * π[j, k, C] * P(A → B C) ] for each A → B C ∈ G
6. return π[0, n, S]
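A runnable Python rendering of this pseudocode (a sketch: the grammar encoding — lexical for A → w rules, binary for A → B C rules — and the function name prob_cky are assumptions):

    from collections import defaultdict

    def prob_cky(words, lexical, binary, start="S"):
        """Probabilistic CKY for a grammar in CNF.
        lexical: word -> list of (A, P(A -> word))
        binary:  list of (A, B, C, P(A -> B C))
        pi[(i, k, A)] holds the best probability of A spanning words[i:k]."""
        n = len(words)
        pi = defaultdict(float)
        back = {}                                  # backpointers for the best parse
        for k in range(1, n + 1):
            # base case: spans of length 1 get their probability from A -> w_k
            for A, p in lexical.get(words[k - 1], []):
                pi[(k - 1, k, A)] = p
            # recursive case: combine two adjacent shorter spans that meet at j
            for i in range(k - 2, -1, -1):
                for j in range(k - 1, i, -1):
                    for A, B, C, p in binary:
                        cand = pi[(i, j, B)] * pi[(j, k, C)] * p
                        if cand > pi[(i, k, A)]:
                            pi[(i, k, A)] = cand
                            back[(i, k, A)] = (j, B, C)
        return pi[(0, n, start)], back

Reading the best tree out of back is then a standard recursive walk from (0, n, S).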

Page 12:

[Figure: a row of input words with position indices i, i+1, …, j, …, k-1, k, illustrating how the span from i to k is split at point j.]

Page 13:

Probabilistic Earley

Non-Probabilistic Completer

Procedure Completer( (B → Z •, [j,k]) )
  for each (A → X • B Y, [i,j]) in chart[j] do
    enqueue( (A → X B • Y, [i,k]), chart[k] )

Probabilistic Completer

Procedure Completer( (B → Z •, [j,k], P_jk) )
  for each (A → X • B Y, [i,j], P_ij) in chart[j] do
    enqueue( (A → X B • Y, [i,k], P_ij * P_jk), chart[k] )

Page 14:

Discovery of Probabilities – Normal Rules

• Use a corpus of already-parsed sentences.

• Example: Penn Treebank (Marcus et al. 1993)
  – parse trees for the 1M-word Brown Corpus
  – skeleton parsing: partial parses, leaving out the "hard" things (such as PP-attachment)

• Parse the corpus and take statistics. This has to account for ambiguity.

• P(α → β | α) = C(α → β) / Σ_γ C(α → γ) = C(α → β) / C(α)
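A sketch of this relative-frequency estimate over a parsed corpus, reusing the nested (label, children) tree encoding assumed earlier:

    from collections import Counter

    def estimate_rule_probs(trees):
        """MLE: P(alpha -> beta | alpha) = C(alpha -> beta) / C(alpha)."""
        rule_counts, lhs_counts = Counter(), Counter()
        def visit(node):
            label, children = node
            if isinstance(children, str):          # lexical rule A -> w
                rhs = (children,)
            else:
                rhs = tuple(c[0] for c in children)
                for child in children:
                    visit(child)
            rule_counts[(label, rhs)] += 1         # C(alpha -> beta)
            lhs_counts[label] += 1                 # C(alpha)
        for tree in trees:
            visit(tree)
        return {r: c / lhs_counts[r[0]] for r, c in rule_counts.items()}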

Page 15:

Penn Treebank – Example 1

((S (NP (NP Pierre Vinken) ,
        (ADJP (NP 61 years) old) ,)
    will
    (VP join
        (NP the board)
        (PP as (NP a nonexecutive director))
        (NP Nov 29)))
 .)

Page 16:

Penn Treebank – Example 2

( (S (NP (DT The) (NNP Fulton) (NNP County) (NNP Grand) (NNP Jury) )
     (VP (VBD said)
         (NP (NNP Friday) )
         (SBAR (-NONE- 0)
           (S (NP (DT an) (NN investigation)
                  (PP (IN of)
                      (NP (NP (NNP Atlanta) ) (POS 's)
                          (JJ recent) (JJ primary) (NN election) )))
              (VP (VBD produced)
                  (NP (OQUOTE OQUOTE) (DT no) (NN evidence) (CQUOTE CQUOTE)
                      (SBAR (IN that)
                        (S (NP (DT any) (NNS irregularities) )
                           (VP (VBD took) (NP (NN place) ))))))))))
  (PERIOD PERIOD) )

Page 17:

Problems with PCFGs

• The Fundamental Independence Assumption (FIA): a CFG assumes that the expansion of any one non-terminal is independent of the expansion of any other non-terminal.

• Hence rule probabilities are always multiplied together.

• The FIA is not always realistic, however.

Page 18:

Problems with PCFGs

• Difficulty in representing dependencies between parse-tree nodes:
  – Structural: dependencies between the expansion of a node N and anything above N in the parse tree, such as M.
  – Lexical: dependencies between the expansion of a node N and occurrences of particular words in text segments dominated by N.

Page 19:

Tree Dependencies

[Figure: a parse tree in which node M spans positions p…s and dominates node N, which spans positions q…r.]

Page 20:

Structural Dependency

• By examination of text corpora, it has been shown (Kuno 1972) that there is a strong tendency (c. 2:1) in English and other languages for the subject of a sentence to be a pronoun:
  – She's able to take her baby to work versus
  – Joanna worked until she had a family

• whilst the object tends to be a non-pronoun:
  – All the people signed the confessions versus
  – Some laws prohibit it

Page 21:

Expansion sometimes depends on ancestor nodes

[Figure: parse tree for "he saw Mr. Bush": the subject NP expands as NP → Pron (he), while the object NP inside the VP expands as NP → N (Mr. Bush).]

Page 22:

Dependencies cannot be stated

• These dependencies could be captured if it were possible to say that the probabilities associated with, e.g., NP → Pron and NP → N depend on whether the NP is subject or object.

• However, this cannot normally be said in a standard PCFG.

Page 23:

Lexical Dependencies

• Consider the sentence: "Moscow sent soldiers into Afghanistan."

• Suppose the grammar includes
  NP → NP PP
  VP → VP PP

• There will typically be 2 parse trees.

Page 24:

PP Attachment Ambiguity

[Figure: two parse trees for "Moscow sent soldiers into Afghanistan" (tagged N V N P N): in one, the PP "into Afghanistan" attaches to the VP; in the other, it attaches to the NP "soldiers". Caption: 67% of PPs attach to NPs; 33% of PPs attach to VPs.]

Page 25:

PP Attachment Ambiguity

[Figure: the same two parse trees for "Moscow sent soldiers from Afghanistan" (tagged N V N P N), where the PP "from Afghanistan" attaches either to the VP or to the NP "soldiers". Caption: 67% of PPs attach to NPs; 33% of PPs attach to VPs.]

Page 26:

Lexical Properties

• Raw statistics on the use of these two rules suggest
  NP → NP PP (67%)
  VP → VP PP (33%)

• In this case the raw statistics are misleading and yield the wrong conclusion.

• The correct parse should be decided on the basis of the lexical properties of the verb "send into" alone, since we know that the basic pattern for this verb is (NP) send (NP) (PP_into), where the PP_into attaches to the VP.

Page 27:

Lexicalised PCFGs

• Basic idea: each syntactic constituent is associated with a head which is a single word.

• Each non-terminal in a parse tree is annotated with that single word.

• Michael Collins (1999), Head-Driven Statistical Models for Natural Language Parsing, PhD thesis (see the author's website).

Page 28:

Lexicalised Tree

[Figure: a parse tree in which every non-terminal is annotated with its head word, e.g. VP(dumped), NP(sacks); not reproduced.]

Page 29:

Generating Lexicalised Parse Trees

• To generate such a tree, each rule must identify exactly one right hand side constituent to be the head daughter.

• Then the headword of a node is inherited from the headword of the head daughter.

• In the case of a lexical item, the head is clearly itself (though the word might undergo minor inflectional modification).

Page 30:

Finding the Head Constituent

• In some cases this is very easy, e.g.
  NP[N] → Det N (the man)
  VP[V] → V NP (... asked John)

• In other cases it isn't:
  PP[?] → P NP (to London)

• Many modern linguistic theories include a component that defines what heads are.

Page 31:

Discovery of Probabilities – Lexicalised Rules

• Need to establish the individual probabilities of, e.g.
  VP(dumped) → V(dumped) NP(sacks) PP(into)
  VP(dumped) → V(dumped) NP(cats) PP(into)
  VP(dumped) → V(dumped) NP(hats) PP(into)
  VP(dumped) → V(dumped) NP(sacks) PP(above)

• Problem: no corpus is big enough to train with this number of rules (nearly all the rules would have zero counts).

• Need to make independence assumptions that allow counts to be clustered.

• Which independence assumptions?

Page 32:

Charniak’s (1997) Approach

• Normal PCFG: probability is conditioned only on the syntactic category, i.e. p(r(n) | c(n)).

• Charniak additionally conditioned the probability of a given rule expansion on the head of the non-terminal: p(r(n) | c(n), h(n)).

• N.B. This approach pools together the statistics of all the individual rules on the previous slide, i.e. as

VP(dumped) → V NP PP
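A sketch of the pooled, head-conditioned estimate (hypothetical counting code, not Charniak's actual implementation): all four lexicalised rules on the previous slide fall under the single conditioning pair (VP, dumped).

    from collections import Counter

    rule_counts = Counter()   # counts of (category, head, expansion)
    cond_counts = Counter()   # counts of (category, head)

    def observe(category, head, expansion):
        """Record one observed expansion, e.g. ('VP', 'dumped', ('V', 'NP', 'PP'))."""
        rule_counts[(category, head, expansion)] += 1
        cond_counts[(category, head)] += 1

    def p_rule(category, head, expansion):
        """Estimate p(r(n) | c(n), h(n)) by relative frequency."""
        denom = cond_counts[(category, head)]
        return rule_counts[(category, head, expansion)] / denom if denom else 0.0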

Page 33:

Probability of Head

• Now that we have added heads as a conditioning factor, we must also decide how to compute the probability of a head.

• The null assumption, that all heads are equally probable, is unrealistic (different verbs have different frequencies of occurrence).

• Charniak therefore adopted a better assumption: the probability of a node n having head h depends on
  – the syntactic category of n, and
  – the head of n's mother.

Page 34:

Including Head of Mother

• So instead of equal probabilities for all heads, we have

p(h(n) = word_i | c(n), h(m(n)))

• Relating this to the circled node in our previous figure, we have

p(h(n) = sacks | c(n) = NP, h(m(n)) = dumped)

Page 35:

Probability of a Complete Parse

STANDARD PCFGFor sentence S, the probability assigned by a PCFG to a parse tree T was given by

P(r(n))

where n is a node of T and r(n) is the production rule that produced n

HEAD DRIVEN PCFGTo include probability of complete parseP(T) =

p(r(n)|c(n),h(n)) * p(h(n)|c(n),

h(m(n)))nT nT
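A sketch of this head-driven product (the flat per-node encoding and the function names are assumptions made for brevity):

    def head_driven_tree_prob(nodes, p_rule, p_head):
        """P(T) = prod over n of p(r(n)|c(n),h(n)) * p(h(n)|c(n),h(m(n))).
        nodes: one (category, head, expansion, mother_head) tuple per node of T."""
        prob = 1.0
        for category, head, expansion, mother_head in nodes:
            prob *= p_rule(category, head, expansion)    # rule given category and head
            prob *= p_head(head, category, mother_head)  # head given category and mother's head
        return prob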

Page 36:

Evaluating Parsers

• Let
  A = # correct constituents in candidate parse
  B = # correct constituents in treebank parse
  C = # total constituents in candidate parse

• Labelled Recall = A/B

• Labelled Precision = A/C
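A small sketch of these two measures, treating a parse as a set of labelled constituents (label, start, end); the set encoding is an assumption:

    def labelled_scores(candidate, treebank):
        """candidate, treebank: sets of (label, start, end) constituents."""
        A = len(candidate & treebank)   # correct constituents in candidate parse
        B = len(treebank)               # constituents in the treebank parse
        C = len(candidate)              # total constituents in candidate parse
        return A / B, A / C             # labelled recall, labelled precision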