
Natural Language Processing (CSEP 517):
Phrase Structure Syntax and Parsing

Noah Smith
© 2017 University of Washington
nasmith@cs.washington.edu

April 24, 2017

To-Do List

- Online quiz: due Sunday
- Ungraded mid-quarter survey: due Sunday
- Read: Jurafsky and Martin (2008, ch. 12–14) and Collins (2011)
- A3 due May 7 (Sunday)

Finite-State Automata

A finite-state automaton (plural "automata") consists of:

- A finite set of states S
- An initial state s₀ ∈ S
- A set of final states F ⊆ S
- A finite alphabet Σ
- Transitions δ : S × Σ → 2^S
  - Special case: a deterministic FSA defines δ : S × Σ → S

A string x ∈ Σⁿ is recognizable by the FSA iff there is a sequence ⟨s₀, …, sₙ⟩ such that sₙ ∈ F and

$$\bigwedge_{i=1}^{n} [\![\, s_i \in \delta(s_{i-1}, x_i) \,]\!]$$

Such a sequence is sometimes called a path.
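For concreteness, here is a minimal Python sketch of nondeterministic FSA recognition under the definition above; the names (recognize, delta, finals) are illustrative, with δ represented as a dict from (state, symbol) pairs to sets of next states.

    def recognize(x, s0, finals, delta):
        """Return True iff some path through the FSA consumes all of x
        and ends in a final state."""
        states = {s0}  # all states reachable after consuming a prefix of x
        for symbol in x:
            states = set().union(*(delta.get((s, symbol), set()) for s in states))
            if not states:
                return False  # no live path remains
        return bool(states & finals)

    # Toy FSA for the regular language ab*: states {0, 1}, final state 1.
    delta = {(0, "a"): {1}, (1, "b"): {1}}
    assert recognize("abb", 0, {1}, delta)
    assert not recognize("ba", 0, {1}, delta)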


Terminology from Theory of Computation

- A regular expression can be:
  - the empty string (usually denoted ε) or a symbol from Σ
  - a concatenation of regular expressions (e.g., abc)
  - an alternation of regular expressions (e.g., ab|cd)
  - a Kleene star of a regular expression (e.g., (abc)*)
- A language is a set of strings.
- A regular language is a language expressible by a regular expression.
- Important theorem: every regular language can be recognized by an FSA, and every FSA's language is regular.

Proving a Language Isn't Regular

Pumping lemma (for regular languages): if L is an infinite regular language, then there exist strings x, y, and z, with y ≠ ε, such that $xy^nz \in L$ for all n ≥ 0.

[Figure: an FSA with a path from s₀ to a state s reading x, a loop at s reading y, and a path from s to a final state s_f reading z.]

If L is infinite and such x, y, z do not exist, then L is not regular.

If L₁ and L₂ are regular, then L₁ ∩ L₂ is regular.

If L₁ ∩ L₂ is not regular, and L₁ is regular, then L₂ is not regular.

Claim: English is not regular.

L₁ = (the cat|mouse|dog)* (ate|bit|chased)* likes tuna fish

L₂ = English

L₁ ∩ L₂ = (the cat|mouse|dog)ⁿ (ate|bit|chased)ⁿ⁻¹ likes tuna fish

L₁ ∩ L₂ is not regular, but L₁ is ⇒ L₂ is not regular.

the cat likes tuna fish

the cat the dog chased likes tuna fish

the cat the dog the mouse scared chased likes tuna fish

the cat the dog the mouse the elephant squashed scared chased likes tuna fish

the cat the dog the mouse the elephant the flea bit squashed scared chased likes tuna fish

the cat the dog the mouse the elephant the flea the virus infected bit squashed scared chased likes tuna fish

Linguistic Debate

Chomsky put forward an argument like the one we just saw.

(Chomsky gets credit for formalizing a hierarchy of types of languages: regular, context-free, context-sensitive, recursively enumerable. This was an important contribution to CS!)

Some are unconvinced, because after a few center embeddings, the examples become unintelligible.

Nonetheless, most agree that natural language syntax isn't well captured by FSAs.

Noun Phrases

What, exactly, makes a noun phrase? Examples (Jurafsky and Martin, 2008):

- Harry the Horse
- the Broadway coppers
- they
- a high-class spot such as Mindy's
- the reason he comes into the Hot Box
- three parties from Brooklyn

Constituents

More general than noun phrases: constituents are groups of words.

Linguists characterize constituents in a number of ways, including:

- where they occur (e.g., "NPs can occur before verbs")
- where they can move in variations of a sentence
  - On September 17th, I'd like to fly from Atlanta to Denver
  - I'd like to fly on September 17th from Atlanta to Denver
  - I'd like to fly from Atlanta to Denver on September 17th
- what parts can move and what parts can't
  - *On September I'd like to fly 17th from Atlanta to Denver
- what they can be conjoined with
  - I'd like to fly from Atlanta to Denver on September 17th and in the morning

Recursion and Constituents

this is the house

this is the house that Jack built

this is the cat that lives in the house that Jack built

this is the dog that chased the cat that lives in the house that Jack built

this is the flea that bit the dog that chased the cat that lives in the house that Jack built

this is the virus that infected the flea that bit the dog that chased the cat that lives in the house that Jack built

Not Constituents (Pullum, 1991)

- If on a Winter's Night a Traveler (by Italo Calvino)
- Nuclear and Radiochemistry (by Gerhart Friedlander et al.)
- The Fire Next Time (by James Baldwin)
- A Tad Overweight, but Violet Eyes to Die For (by G. B. Trudeau)
- Sometimes a Great Notion (by Ken Kesey)
- [how can we know the] Dancer from the Dance (by Andrew Holleran)

Context-Free Grammar

A context-free grammar consists of:

- A finite set of nonterminal symbols 𝒩
- A start symbol S ∈ 𝒩
- A finite alphabet Σ of "terminal" symbols, distinct from 𝒩
- A production rule set ℛ, each rule of the form "N → α", where:
  - the lefthand side N is a nonterminal from 𝒩
  - the righthand side α is a sequence of zero or more terminals and/or nonterminals: α ∈ (𝒩 ∪ Σ)*
- Special case: Chomsky normal form constrains each α to be either a single terminal symbol or two nonterminals

An Example CFG for a Tiny Bit of English (from Jurafsky and Martin, 2008)

S → NP VP
S → Aux NP VP
S → VP
NP → Pronoun
NP → Proper-Noun
NP → Det Nominal
Nominal → Noun
Nominal → Nominal Noun
Nominal → Nominal PP
VP → Verb
VP → Verb NP
VP → Verb NP PP
VP → Verb PP
VP → VP PP
PP → Preposition NP

Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Pronoun → I | she | me
Proper-Noun → Houston | NWA
Aux → does
Preposition → from | to | on | near | through
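As a data structure, a CFG is little more than its rule set. Here is a minimal Python sketch (names illustrative) encoding a fragment of the grammar above as a dict from lefthand sides to sets of righthand-side tuples, with a check for the Chomsky normal form special case:

    # A fragment of the toy grammar: LHS -> set of RHS tuples.
    GRAMMAR = {
        "S":       {("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)},
        "NP":      {("Pronoun",), ("Proper-Noun",), ("Det", "Nominal")},
        "Nominal": {("Noun",), ("Nominal", "Noun"), ("Nominal", "PP")},
        "VP":      {("Verb",), ("Verb", "NP"), ("VP", "PP")},
        "PP":      {("Preposition", "NP")},
        "Det":     {("that",), ("this",), ("a",)},
        "Noun":    {("book",), ("flight",), ("meal",), ("money",)},
        "Verb":    {("book",), ("include",), ("prefer",)},
    }

    def is_cnf(grammar, nonterminals):
        """Chomsky normal form: every RHS is either one terminal
        or exactly two nonterminals."""
        return all(
            (len(rhs) == 1 and rhs[0] not in nonterminals)
            or (len(rhs) == 2 and all(s in nonterminals for s in rhs))
            for rhss in grammar.values() for rhs in rhss
        )

    print(is_cnf(GRAMMAR, set(GRAMMAR)))  # False: e.g., S -> Aux NP VP has 3 symbols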

Example Phrase Structure Tree

(S (Aux does)
   (NP (Det this) (Noun flight))
   (VP (Verb include)
       (NP (Det a) (Noun meal))))

The phrase-structure tree represents both the syntactic structure of the sentence and the derivation of the sentence under the grammar. E.g., the VP node with children Verb and NP corresponds to the rule VP → Verb NP.

The First Phrase-Structure Tree (Chomsky, 1956)

(Sentence (NP the man)
          (VP (V took)
              (NP the book)))

Where do natural language CFGs come from?

As evidenced by the discussion in Jurafsky and Martin (2008), building a CFG for a natural language by hand is really hard.

- You need lots of categories to make sure all and only grammatical sentences are included.
- Categories tend to start exploding combinatorially.
- Alternative grammar formalisms are typically used for manual grammar construction; these are often based on constraints and a powerful algorithmic tool called unification.

Standard approach today:

1. Build a corpus of annotated sentences, called a treebank. (Memorable example: the Penn Treebank; Marcus et al., 1993.)
2. Extract rules from the treebank. (A sketch of rule extraction follows the LISP-encoded example below.)
3. Optionally, use statistical models to generalize the rules.

Example from the Penn Treebank

(S (NP-SBJ (NP (NNP Pierre) (NNP Vinken))
           (, ,)
           (ADJP (NP (CD 61) (NNS years)) (JJ old))
           (, ,))
   (VP (MD will)
       (VP (VB join)
           (NP (DT the) (NN board))
           (PP-CLR (IN as)
                   (NP (DT a) (JJ nonexecutive) (NN director)))
           (NP-TMP (NNP Nov.) (CD 29)))))

LISP Encoding in the Penn Treebank

( (S
    (NP-SBJ-1
      (NP (NNP Rudolph) (NNP Agnew) )
      (, ,)
      (UCP
        (ADJP
          (NP (CD 55) (NNS years) )
          (JJ old) )
        (CC and)
        (NP
          (NP (JJ former) (NN chairman) )
          (PP (IN of)
            (NP (NNP Consolidated) (NNP Gold) (NNP Fields) (NNP PLC) ))))
      (, ,) )
    (VP (VBD was)
      (VP (VBN named)
        (S
          (NP-SBJ (-NONE- *-1) )
          (NP-PRD
            (NP (DT a) (JJ nonexecutive) (NN director) )
            (PP (IN of)
              (NP (DT this) (JJ British) (JJ industrial) (NN conglomerate) ))))))
    (. .) ))
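Here is a minimal sketch of step 2 of the standard approach (rule extraction), assuming trees arrive in the LISP encoding above. The names tokenize, parse, and rules are illustrative; the unlabeled outer bracket of each Penn Treebank tree is given the empty label and skipped, and preterminal rules (e.g., NNP → Rudolph) come out as lexicon rules.

    import re

    def tokenize(sexp):
        return re.findall(r"\(|\)|[^\s()]+", sexp)

    def parse(tokens):
        """Parse one bracketed tree into (label, children); leaves are words."""
        assert tokens.pop(0) == "("
        label = "" if tokens[0] == "(" else tokens.pop(0)  # outer bracket: no label
        children = []
        while tokens[0] != ")":
            children.append(parse(tokens) if tokens[0] == "(" else tokens.pop(0))
        tokens.pop(0)                                      # consume ")"
        return (label, children)

    def rules(tree, out):
        """One rule per labeled internal node: LHS -> tuple of child labels."""
        if isinstance(tree, str):
            return
        label, children = tree
        if label:
            out.append((label, tuple(c if isinstance(c, str) else c[0]
                                     for c in children)))
        for child in children:
            rules(child, out)

    # e.g., extracted = []; rules(parse(tokenize(ptb_text)), extracted)
    # yields ('VP', ('VBD', 'VP')), ('NP', ('NNP', 'NNP')), ... for the tree above.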

Some Penn Treebank Rules with Counts

40717  PP → IN NP
33803  S → NP-SBJ VP
22513  NP-SBJ → -NONE-
21877  NP → NP PP
20740  NP → DT NN
14153  S → NP-SBJ VP .
12922  VP → TO VP
11881  PP-LOC → IN NP
11467  NP-SBJ → PRP
11378  NP → -NONE-
11291  NP → NN
…
989  VP → VBG S
985  NP-SBJ → NN
983  PP-MNR → IN NP
983  NP-SBJ → DT
969  VP → VBN VP
…
100  VP → VBD PP-PRD
100  PRN → : NP :
100  NP → DT JJS
100  NP-CLR → NN
99   NP-SBJ-1 → DT NNP
98   VP → VBN NP PP-DIR
98   VP → VBD PP-TMP
98   PP-TMP → VBG NP
97   VP → VBD ADVP-TMP VP
…
10  WHNP-1 → WRB JJ
10  VP → VP CC VP PP-TMP
10  VP → VP CC VP ADVP-MNR
10  VP → VBZ S , SBAR-ADV
10  VP → VBZ S ADVP-TMP

Penn Treebank Rules: Statistics

- 32,728 rules in the training section (not including 52,257 lexicon rules)
- 4,021 rules in the development section
- overlap: 3,128

(Phrase-Structure) Recognition and Parsing

Given a CFG (𝒩, S, Σ, ℛ) and a sentence x, the recognition problem is:

    Is x in the language of the CFG?

The proof is a derivation.

Related problem, parsing:

    Show one or more derivations for x, using ℛ.

With reasonable grammars, the number of parses is exponential in |x|.

Ambiguity

(S (NP I)
   (VP shot
       (NP an (Nominal (Nominal elephant)
                       (PP in my pajamas)))))

(S (NP I)
   (VP (VP shot
           (NP an (Nominal elephant)))
       (PP in my pajamas)))

Parser Evaluation

Represent a parse tree as a collection of tuples ⟨⟨ℓ₁, i₁, j₁⟩, ⟨ℓ₂, i₂, j₂⟩, …, ⟨ℓₙ, iₙ, jₙ⟩⟩, where:

- ℓₖ is the nonterminal labeling the kth phrase
- iₖ is the index of the first word in the kth phrase
- jₖ is the index of the last word in the kth phrase

Example: the tree

(S (Aux does) (NP (Det this) (Noun flight)) (VP (Verb include) (NP (Det a) (Noun meal))))

maps to ⟨⟨S, 1, 6⟩, ⟨NP, 2, 3⟩, ⟨VP, 4, 6⟩, ⟨NP, 5, 6⟩⟩.

Convert the gold-standard tree and the system's hypothesized tree into this representation, then estimate precision, recall, and F1.

Tree Comparison Example

Left tree:  (S (NP I) (VP shot (NP an (Nominal (Nominal elephant) (PP in (NP my pajamas))))))
Right tree: (S (NP I) (VP (VP shot (NP an (Nominal elephant))) (PP in (NP my pajamas))))

Only in the left tree:  ⟨NP, 3, 7⟩, ⟨Nominal, 4, 7⟩
In both trees:          ⟨NP, 1, 1⟩, ⟨S, 1, 7⟩, ⟨VP, 2, 7⟩, ⟨PP, 5, 7⟩, ⟨NP, 6, 7⟩, ⟨Nominal, 4, 4⟩
Only in the right tree: ⟨VP, 2, 4⟩, ⟨NP, 3, 4⟩
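A minimal sketch of labeled-bracket precision, recall, and F1 on the example above, treating the left tree as gold and the right tree as the hypothesis. Real evaluators (e.g., for Parseval) add multiset handling and label-set conventions; this version treats each tree as a set of ⟨label, i, j⟩ tuples and assumes neither set is empty.

    def prf1(gold, guess):
        """Labeled-bracket precision, recall, and F1 over span-tuple sets."""
        tp = len(gold & guess)
        p = tp / len(guess)
        r = tp / len(gold)
        return p, r, 2 * p * r / (p + r)

    common = {("NP", 1, 1), ("S", 1, 7), ("VP", 2, 7),
              ("PP", 5, 7), ("NP", 6, 7), ("Nominal", 4, 4)}
    gold  = common | {("NP", 3, 7), ("Nominal", 4, 7)}   # left tree
    guess = common | {("VP", 2, 4), ("NP", 3, 4)}        # right tree
    print(prf1(gold, guess))  # (0.75, 0.75, 0.75): 6 of 8 brackets match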

Two Views of Parsing

1. Incremental search: the state of the search is the partial structure built so far; each action incrementally extends the tree.
   - Often greedy, with a statistical classifier deciding what action to take in each state.
2. Discrete optimization: define a scoring function and seek the tree with the highest score.
   - Today: scores are defined using the rules.

$$\text{predict}(x) = \operatorname{argmax}_{t} \prod_{r \in \mathcal{R}} s(r)^{c_t(r)} = \operatorname{argmax}_{t} \sum_{r \in \mathcal{R}} c_t(r) \log s(r)$$

where t is constrained to range over grammatical trees with x as their yield. Denote this set 𝒯ₓ.

Probabilistic Context-Free Grammar

A probabilistic context-free grammar consists of:

- A finite set of nonterminal symbols 𝒩
- A start symbol S ∈ 𝒩
- A finite alphabet Σ of "terminal" symbols, distinct from 𝒩
- A production rule set ℛ, each rule of the form "N → α", where:
  - the lefthand side N is a nonterminal from 𝒩
  - the righthand side α is a sequence of zero or more terminals and/or nonterminals: α ∈ (𝒩 ∪ Σ)*
  - Special case: Chomsky normal form constrains each α to be either a single terminal symbol or two nonterminals
- For each N ∈ 𝒩, a probability distribution p(∗ | N) over the rules with N as their lefthand side

PCFG Example

Build a tree top-down, one rule at a time; the derivation's score is the product of the probabilities of the rules used.

1. Write down the start symbol: S. Score so far: 1.
2. Choose a rule from the "S" distribution. Here: S → Aux NP VP. Multiply in p(Aux NP VP | S).
3. Choose a rule from the "Aux" distribution. Here: Aux → does. Multiply in p(does | Aux).
4. Choose a rule from the "NP" distribution. Here: NP → Det Noun. Multiply in p(Det Noun | NP).
5. Choose a rule from the "Det" distribution. Here: Det → this. Multiply in p(this | Det).
6. Choose a rule from the "Noun" distribution. Here: Noun → flight. Multiply in p(flight | Noun).
7. Choose a rule from the "VP" distribution. Here: VP → Verb NP. Multiply in p(Verb NP | VP).
8. Choose a rule from the "Verb" distribution. Here: Verb → include. Multiply in p(include | Verb).
9. Choose a rule from the "NP" distribution. Here: NP → Det Noun. Multiply in p(Det Noun | NP).
10. Choose a rule from the "Det" distribution. Here: Det → a. Multiply in p(a | Det).
11. Choose a rule from the "Noun" distribution. Here: Noun → meal. Multiply in p(meal | Noun).

The finished tree is (S (Aux does) (NP (Det this) (Noun flight)) (VP (Verb include) (NP (Det a) (Noun meal)))), with score

p(Aux NP VP | S) · p(does | Aux) · p(Det Noun | NP) · p(this | Det) · p(flight | Noun) · p(Verb NP | VP) · p(include | Verb) · p(Det Noun | NP) · p(a | Det) · p(meal | Noun)
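A minimal sketch of this scoring computation, reusing the (label, children) tree representation from the rule-extraction sketch above. The dict p of rule probabilities, and the numbers in it, are illustrative assumptions, not estimates from any treebank.

    import math

    def log_score(tree, p):
        """Log-probability of a derivation: sum one log rule probability
        per internal node."""
        if isinstance(tree, str):      # a word: no rule applied here
            return 0.0
        label, children = tree
        rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
        return math.log(p[(label, rhs)]) + sum(log_score(c, p) for c in children)

    # Illustrative probabilities (made up for the example):
    p = {("S", ("Aux", "NP", "VP")): 0.15, ("Aux", ("does",)): 1.0,
         ("NP", ("Det", "Noun")): 0.3, ("Det", ("this",)): 0.2,
         ("Noun", ("flight",)): 0.1, ("VP", ("Verb", "NP")): 0.4,
         ("Verb", ("include",)): 0.3, ("Det", ("a",)): 0.5,
         ("Noun", ("meal",)): 0.05}
    tree = ("S", [("Aux", ["does"]),
                  ("NP", [("Det", ["this"]), ("Noun", ["flight"])]),
                  ("VP", [("Verb", ["include"]),
                          ("NP", [("Det", ["a"]), ("Noun", ["meal"])])])])
    print(log_score(tree, p))  # log of the product of the ten rule probabilities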

PCFG as a Noisy Channel

source → T → channel → X

The PCFG defines the source model.

The channel is deterministic: it erases everything except the tree's leaves (the yield).

Decoding:

$$\operatorname{argmax}_{t}\; p(t) \cdot \begin{cases} 1 & \text{if } t \in \mathcal{T}_x \\ 0 & \text{otherwise} \end{cases} \;=\; \operatorname{argmax}_{t \in \mathcal{T}_x} p(t)$$

Probabilistic Parsing with CFGs

- How to set the probabilities p(righthand side | lefthand side)?
- How to decode/parse?

Probabilistic CKY (Cocke and Schwartz, 1970; Kasami, 1965; Younger, 1967)

Input:

- a PCFG (𝒩, S, Σ, ℛ, p(∗ | ∗)) in Chomsky normal form
- a sentence x (let n be its length)

Output: argmax_{t ∈ 𝒯ₓ} p(t | x) (if x is in the language of the grammar)

Probabilistic CKY

Base case: for i ∈ {1, …, n} and for each N ∈ 𝒩:

$$s_{i:i}(N) = p(x_i \mid N)$$

For each i, k such that 1 ≤ i < k ≤ n and each N ∈ 𝒩:

$$s_{i:k}(N) = \max_{L, R \in \mathcal{N},\; j \in \{i, \ldots, k-1\}} p(L\ R \mid N) \cdot s_{i:j}(L) \cdot s_{(j+1):k}(R)$$

(The rule N → L R splits the span: L derives x_i … x_j and R derives x_{j+1} … x_k.)

Solution:

$$s_{1:n}(S) = \max_{t \in \mathcal{T}_x} p(t)$$
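A compact sketch of probabilistic CKY under these definitions. The grammar encoding (dicts binary and lexical) and function names are assumptions, and spans are 0-based and half-open rather than the slides' 1-based inclusive indices; looping over all splits and rules gives the O(|ℛ|n³) runtime noted in the Remarks below.

    import math
    from collections import defaultdict

    def cky(words, binary, lexical, start="S"):
        """Probabilistic CKY for a CNF PCFG.

        binary:  dict (L, R) -> list of (N, p) for rules N -> L R
        lexical: dict word -> list of (N, p) for rules N -> word
        Returns (best tree, log-probability), or None if x is not derivable."""
        n = len(words)
        chart = defaultdict(dict)  # chart[(i, k)][N] = (log score, backpointer)
        for i, w in enumerate(words):
            for N, p in lexical.get(w, []):
                chart[(i, i + 1)][N] = (math.log(p), w)
        for width in range(2, n + 1):
            for i in range(0, n - width + 1):
                k = i + width
                cell = chart[(i, k)]
                for j in range(i + 1, k):            # split point
                    for L, (ls, _) in chart[(i, j)].items():
                        for R, (rs, _) in chart[(j, k)].items():
                            for N, p in binary.get((L, R), []):
                                score = math.log(p) + ls + rs
                                if N not in cell or score > cell[N][0]:
                                    cell[N] = (score, (j, L, R))
        if start not in chart[(0, n)]:
            return None
        def build(i, k, N):                          # follow backpointers
            _, bp = chart[(i, k)][N]
            if isinstance(bp, str):
                return (N, [bp])
            j, L, R = bp
            return (N, [build(i, j, L), build(j, k, R)])
        return build(0, n, start), chart[(0, n)][start][0]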

Parse Chart

[Figure: the triangular chart for a five-word sentence x₁ … x₅. Cells s_{i:k}(∗) are filled diagonal by diagonal: first the width-1 cells s_{1:1}(∗), …, s_{5:5}(∗), then s_{1:2}(∗), …, s_{4:5}(∗), and so on, ending with s_{1:5}(∗).]

Remarks

- Space and runtime requirements? O(|𝒩|n²) space, O(|ℛ|n³) runtime.
- Recovering the best tree? Backpointers.
- Probabilistic Earley's algorithm does not require the grammar to be in Chomsky normal form.

The Declarative View of CKY

CKY as a deductive system over items [i, k, N]:

- Axioms: [i, i, N] with weight p(x_i | N), for each position i and each N ∈ 𝒩.
- Rule: from [i, j, L] and [j + 1, k, R], derive [i, k, N] with weight p(L R | N), for each rule N → L R.
- Goal: [1, n, S].

Probabilistic CKY with an Agenda

1. Initialize every item's value in the chart to the "default" (zero).
2. Place all initializing updates onto the agenda.
3. While the agenda is not empty and the goal is not reached:
   - Pop the highest-priority update from the agenda (item I with value v).
   - If I = goal, then return v.
   - If v > chart(I):
     - chart(I) ← v
     - Find all combinations of I with other items in the chart, generating new possible updates; place these on the agenda.

Any priority function will work! But smart ordering will save time.

This idea can also be applied to other algorithms (e.g., Viterbi). A sketch follows below.
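A sketch of the agenda loop, assuming the same binary/lexical grammar encoding as the CKY sketch above; here the priority is the update's own log-probability, so, as in uniform-cost search, the first time the goal pops it carries its best score.

    import heapq
    import math

    def agenda_cky(words, binary, lexical, start="S"):
        """Best-first CKY: a max-heap agenda of (score, item) updates."""
        n = len(words)
        chart = {}    # item (i, k, N) -> best log score found so far
        agenda = []   # max-heap simulated by negating scores
        for i, w in enumerate(words):
            for N, p in lexical.get(w, []):
                heapq.heappush(agenda, (-math.log(p), (i, i + 1, N)))
        goal = (0, n, start)
        while agenda:
            neg, item = heapq.heappop(agenda)
            v = -neg
            if item == goal:
                return v
            if v <= chart.get(item, -math.inf):
                continue                      # stale or non-improving update
            chart[item] = v
            i, k, N = item
            # combine item with adjacent items already in the chart
            for (a, b, M), u in list(chart.items()):
                if b == i:                    # M N adjacent: spans (a, i), (i, k)
                    for P, p in binary.get((M, N), []):
                        heapq.heappush(agenda, (-(u + v + math.log(p)), (a, k, P)))
                if a == k:                    # N M adjacent: spans (i, k), (k, b)
                    for P, p in binary.get((N, M), []):
                        heapq.heappush(agenda, (-(v + u + math.log(p)), (i, b, P)))
        return None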

Starting Point: Phrase Structure

(S (NP (DT The) (NN luxury) (NN auto) (NN maker))
   (NP (JJ last) (NN year))
   (VP (VBD sold)
       (NP (CD 1,214) (NN cars))
       (PP (IN in)
           (NP (DT the) (NNP U.S.)))))

Parent Annotation (Johnson, 1998)

(S^ROOT (NP^S (DT^NP The) (NN^NP luxury) (NN^NP auto) (NN^NP maker))
        (NP^S (JJ^NP last) (NN^NP year))
        (VP^S (VBD^VP sold)
              (NP^VP (CD^NP 1,214) (NN^NP cars))
              (PP^VP (IN^PP in)
                     (NP^PP (DT^NP the) (NNP^NP U.S.)))))

Increases the "vertical" Markov order:

p(children | parent, grandparent)
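This transform is a one-liner over the (label, children) representation used in the earlier sketches; the "^" separator is a common convention, not fixed by the slide.

    def parent_annotate(tree, parent="ROOT"):
        """Append the parent's label to each nonterminal (Johnson, 1998)."""
        if isinstance(tree, str):
            return tree                       # words are left unchanged
        label, children = tree
        return (f"{label}^{parent}",
                [parent_annotate(c, parent=label) for c in children])

Applied to the tree on the previous slide, this produces exactly the S^ROOT, NP^S, DT^NP, ... labels shown above.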

Headedness

(S (NP (DT The) (NN luxury) (NN auto) (NN maker))
   (NP (JJ last) (NN year))
   (VP (VBD sold)
       (NP (CD 1,214) (NN cars))
       (PP (IN in)
           (NP (DT the) (NNP U.S.)))))

Suggests "horizontal" markovization:

$$p(\text{children} \mid \text{parent}) = p(\text{head} \mid \text{parent}) \cdot \prod_i p(i\text{th sibling} \mid \text{head}, \text{parent})$$

Lexicalization

(S_sold (NP_maker (DT_The The) (NN_luxury luxury) (NN_auto auto) (NN_maker maker))
        (NP_year (JJ_last last) (NN_year year))
        (VP_sold (VBD_sold sold)
                 (NP_cars (CD_1,214 1,214) (NN_cars cars))
                 (PP_in (IN_in in)
                        (NP_U.S. (DT_the the) (NNP_U.S. U.S.)))))

Each node shares a lexical head with its head child.
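A sketch of head propagation with a toy head-rules table; real tables (e.g., Collins' head rules) are much larger and specify a search direction per category, so HEAD_RULES here is purely illustrative (right-to-left search, default last child).

    # Toy head rules: for each parent label, child labels in priority order.
    HEAD_RULES = {"S": ["VP", "NP"], "VP": ["VBD", "VP"],
                  "NP": ["NN", "NNP", "NNS"], "PP": ["IN"]}

    def lexicalize(tree):
        """Return (label, head_word, children): each node shares a lexical
        head with its head child."""
        label, children = tree
        if len(children) == 1 and isinstance(children[0], str):
            return (label, children[0], children)    # preterminal: head = word
        kids = [lexicalize(c) for c in children]
        head = next((k for wanted in HEAD_RULES.get(label, [])
                     for k in reversed(kids) if k[0] == wanted), kids[-1])
        return (label, head[1], kids)

On the tree above, these toy rules reproduce the slide's heads (maker, year, cars, in, U.S., sold).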

Transformations on Trees

Starting around 1998, many different ideas, both linguistic and statistical, emerged about how to transform treebank trees.

All of these make the grammar larger, and therefore all frequencies become sparser, so there has been a lot of research on smoothing the rule probabilities.

Parent annotation, headedness, markovization, and lexicalization; also category refinement by linguistic rules (Klein and Manning, 2003).

- These are reflected in some versions of the popular Stanford and Berkeley parsers.

Tree Decorations (Klein and Manning, 2003)

- Mark nodes with only one child as UNARY
- Mark DTs (determiners) and RBs (adverbs) when they are only children
- Annotate POS tags with their parents
- Split IN (prepositions; six ways), AUX, CC, %
- NPs: temporal, possessive, base
- Annotate VPs with their head tag (finite vs. others)
- DOMINATES-V
- RIGHT-RECURSIVE NP

Machine Learning and Parsing

- Define arbitrary features on trees, based on linguistic knowledge; to parse, use a PCFG to generate a k-best list of parses, then train a log-linear model to rerank (Charniak and Johnson, 2005).
  - k-best parsing: Huang and Chiang (2005)
- Define rule-local features on trees (and any part of the input sentence); minimize hinge or log loss.
  - These exploit dynamic programming algorithms for training (CKY for arbitrary scores, and the sum-product version).
- Learn refinements on the constituents, as latent variables (Petrov et al., 2006).
- Neural, too:
  - Socher et al. (2013) define compositional vector grammars that associate each phrase with a vector, calculated as a function of its subphrases' vectors. Used essentially to rerank.
  - Dyer et al. (2016): recurrent neural network grammars, generative models like PCFGs that encode arbitrary previous derivation steps in a vector. Parsing requires some tricks.

References

Eugene Charniak and Mark Johnson. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proc. of ACL, 2005.

Noam Chomsky. Three models for the description of language. IEEE Transactions on Information Theory, 2(3):113–124, 1956.

John Cocke and Jacob T. Schwartz. Programming languages and their compilers: Preliminary notes. Technical report, Courant Institute of Mathematical Sciences, New York University, 1970.

Michael Collins. Probabilistic context-free grammars, 2011. URL http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf.

Chris Dyer, Adhiguna Kuncoro, Miguel Ballesteros, and Noah A. Smith. Recurrent neural network grammars, 2016. To appear.

Liang Huang and David Chiang. Better k-best parsing. In Proc. of IWPT, 2005.

Mark Johnson. PCFG models of linguistic tree representations. Computational Linguistics, 24(4):613–632, 1998.

Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, second edition, 2008.

Tadao Kasami. An efficient recognition and syntax-analysis algorithm for context-free languages. Technical Report AFCRL-65-758, Air Force Cambridge Research Lab, 1965.

Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In Proc. of ACL, 2003.

Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330, 1993.

Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proc. of COLING-ACL, 2006.

Geoffrey K. Pullum. The Great Eskimo Vocabulary Hoax and Other Irreverent Essays on the Study of Language. University of Chicago Press, 1991.

Richard Socher, John Bauer, Christopher D. Manning, and Andrew Y. Ng. Parsing with compositional vector grammars. In Proc. of ACL, 2013.

Daniel H. Younger. Recognition and parsing of context-free languages in time n³. Information and Control, 10(2), 1967.