Source: stp.lingfil.uu.se/~nivre/master/5LN710/2012-12-06.pdf

Phrase structure grammar and phrase structure parsing

Natural Language Processing (5LN710)

2012-12-06

Marco Kuhlmann

Department of Linguistics and Philology

Monday, 10 December 2012

What is syntax?

• Syntax addresses the question of how sentences are constructed in particular languages.

• A grammar is a set of rules that govern the composition of sentences.

• Parsing refers to the process of analyzing an utterance in terms of its syntactic structure.

Why should you care?

Syntactic information is important for many tasks:

• Question answering: What books did he like?

• Grammar checking: He is friend of mine.

• Information extraction: Oracle acquired Sun.

Theoretical frameworks

• Generative syntax: Noam Chomsky (1928–)

• Dependency syntax: Lucien Tesnière (1893–1954)

• Categorial syntax: Kazimierz Ajdukiewicz (1890–1963)

Overview

• Phrase structure grammars

• Phrase structure parsing

• Dependency parsing

Schedule

week 49 (2 hrs taught, 6 hrs reading, 8 hrs assignment)

06/12 10–12 Lecture 1

week 50 (4 hrs taught, 8 hrs reading, 28 hrs assignment)

11/12 10–12 Lab (detailed discussion of the assignment)

13/12 10–12 Lecture 2

week 51 (8 hrs assignment)

17/12 Deadline for the assignment (early bird)

week 01

06/01 Deadline for the assignment (regular)

week 03

20/01 Deadline for the assignment (completions and belated hand-ins)

Reading

• Daniel Jurafsky and James H. Martin. Speech and Language Processing. 2nd edition. Pearson Education, 2009. Chapters 13 and 14.

• Sandra Kübler, Ryan McDonald, and Joakim Nivre. Dependency Parsing. Morgan and Claypool, 2009. Chapter 3.

Phrase structure grammars

Constituency

• A basic observation about syntactic structure is that groups of words can act as single units.

Los Angeles. A high-class spot such as Mindy’s. Three parties from Brooklyn. They.

• Such groups of words are called constituents.

• Constituents tend to have similar internal structure, and behave similarly with respect to other units.

Examples of constituents

• noun phrases (NP): she, the house, Robin Hood and his merry men, a high-class spot such as Mindy’s

• verb phrases (VP): blushed, loves Mary, was told to sit down and be quiet, lived happily ever after

• prepositional phrases (PP): on it, with the telescope, through the foggy dew, apart from everything I have said so far

Overview

• Constituency

• Context-free grammar (CFG)

• Ambiguity

• Probabilistic context-free grammar (PCFG)

Context-free grammar

• simple yet powerful formalism to describe the syntactic structure of natural languages

• developed in the mid-1950s by Noam Chomsky

• allows one to specify rules that state how a constituent can be segmented into smaller and smaller constituents, down to the level of individual words

A context-free grammar (CFG) consists of

• a finite set of nonterminal symbols

• a finite set of terminal symbols

• a distinguished nonterminal symbol S (‘sentence’)

• a finite set of rules of the form A → α, where A is a nonterminal and α is a possibly empty sequence of nonterminal and terminal symbols
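The four components above map directly onto a small data structure. The following is a minimal sketch, not part of the lecture: the class name `CFG`, its field names, and the lexical rules in the usage example are assumptions made for illustration, while the phrase-level rules are a subset of the sample grammar of this lecture. The disjointness check reflects the usual convention that terminals and nonterminals are distinct symbols.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar: nonterminals, terminals, start symbol, rules."""
    nonterminals: frozenset
    terminals: frozenset
    start: str
    rules: tuple  # pairs (A, alpha); alpha is a possibly empty tuple of symbols

    def __post_init__(self):
        if self.nonterminals & self.terminals:
            raise ValueError("nonterminals and terminals must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a nonterminal")
        symbols = self.nonterminals | self.terminals
        for lhs, rhs in self.rules:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            for symbol in rhs:
                if symbol not in symbols:
                    raise ValueError(f"unknown symbol {symbol!r} in a rule for {lhs!r}")


grammar = CFG(
    nonterminals=frozenset(
        {"S", "NP", "VP", "Nominal", "Pronoun", "Det", "Noun", "Verb"}
    ),
    terminals=frozenset({"I", "want", "a", "morning", "flight"}),
    start="S",
    rules=(
        ("S", ("NP", "VP")),
        ("NP", ("Pronoun",)),
        ("NP", ("Det", "Nominal")),
        ("Nominal", ("Nominal", "Noun")),
        ("Nominal", ("Noun",)),
        ("VP", ("Verb", "NP")),
        # Lexical rules: assumed here for illustration only.
        ("Pronoun", ("I",)),
        ("Verb", ("want",)),
        ("Det", ("a",)),
        ("Noun", ("morning",)),
        ("Noun", ("flight",)),
    ),
)
```

Constructing a grammar whose start symbol is not a nonterminal, or whose rules mention undeclared symbols, raises a `ValueError`.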

A sample context-free grammar

Grammar rule                Example

S → NP VP                   I + want a morning flight
NP → Pronoun                I
NP → Proper-Noun            Los Angeles
NP → Det Nominal            a flight
Nominal → Nominal Noun      morning flight
Nominal → Noun              flights
VP → Verb                   do
VP → Verb NP                want + a flight
VP → Verb NP PP             leave + Boston + in the morning
VP → Verb PP                leaving + on Thursday
PP → Preposition NP         from + Los Angeles

A sample phrase structure tree

(S
  (NP (Pro I))
  (VP (Verb prefer)
      (NP (Det a)
          (Nom (Nom (Noun morning))
               (Noun flight)))))

The root (S) is at the top of the tree; the leaves (the words) are at the bottom.

Problems with context-free grammars

• While context-free grammar can account for much of the syntactic structure of English, it is not a perfect solution.

• Two problems for context-free grammar:

• agreement constraints

• subcategorization constraints

Agreement

• The term agreement refers to constraints that hold between constituents that take part in a rule or a set of rules.

• In English, subject and verb agree in number:

[This flight] [leaves on Monday].

* [These flights] [leaves on Monday].

• Our earlier rules are deficient in the sense that they do not capture this constraint. They overgenerate.

• One possible solution: Fold agreement information into the rules.

• While this approach is sound, it is practically infeasible: The grammars get too large.

Grammar rule                  Example

S → NP[sg] VP[sg]             this flight + leaves on Monday
NP[sg] → Det[sg] Nom[sg]      this + flight
VP[sg] → Verb[sg] PP          leaves + on Monday
NP[pl] → Det[pl] Nom[pl]      these + flights
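To get a feeling for why folding agreement into the rules makes grammars large, the sketch below multiplies out a handful of rules over all feature values. It is an illustration under simplifying assumptions, not a method from the lecture: the function name `fold_features`, the set `agreeing`, and the extra `person` feature are hypothetical, and every annotated symbol within one rule is assumed to share the same feature values.

```python
from itertools import product


def fold_features(rules, agreeing, features):
    """Return one copy of each rule per combination of feature values.

    Symbols listed in `agreeing` receive the feature annotation; rules that
    mention no such symbol are kept unchanged.
    """
    expanded = []
    for lhs, rhs in rules:
        if lhs not in agreeing and not any(s in agreeing for s in rhs):
            expanded.append((lhs, rhs))
            continue
        for values in product(*features.values()):
            tag = "[" + ",".join(values) + "]"

            def annotate(symbol):
                return symbol + tag if symbol in agreeing else symbol

            expanded.append((annotate(lhs), tuple(annotate(s) for s in rhs)))
    return expanded


rules = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Nom")),
    ("VP", ("Verb", "PP")),
    ("PP", ("Preposition", "NP")),
]
agreeing = {"NP", "VP", "Det", "Nom", "Verb"}

number_only = fold_features(rules, agreeing, {"number": ["sg", "pl"]})
number_person = fold_features(
    rules, agreeing, {"number": ["sg", "pl"], "person": ["1", "2", "3"]}
)
print(len(rules), len(number_only), len(number_person))
```

Each additional feature multiplies the number of affected rules by the number of values of that feature, which is the growth the slide warns about.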

Subcategorization

• English VPs consist of a main verb along with zero or more constituents that we can call arguments.

• But not all verbs are allowed to participate in all of those rules: we need to subcategorize them.

Grammar rule         Example

VP → Verb            sleep
VP → Verb NP         want + a flight
VP → Verb NP PP      leave + Boston + in the morning
VP → Verb PP         leave + on Thursday

Modern grammars may have several hundred subcategories. Examples:

Verb              Example

sleep             John slept.
find + NP         Please find [a flight to New York].
give + NP + NP    Give [me] [a cheaper fare].
help + NP + PP    Can you help [me] [with a flight]?
prefer + TO-VP    I prefer [to leave earlier].
told + S          I was told [United has a flight].
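One way to picture subcategorization is as a lookup table from verbs to the argument sequences they accept. The sketch below encodes the six frames listed above; the dictionary layout and the function name `licenses` are assumptions made for illustration, and each verb is given only the single frame shown in the examples.

```python
# Subcategorization frames from the examples above. Each verb maps to the
# set of argument sequences it accepts (here: exactly one frame per verb).
SUBCAT = {
    "sleep": {()},
    "find": {("NP",)},
    "give": {("NP", "NP")},
    "help": {("NP", "PP")},
    "prefer": {("TO-VP",)},
    "told": {("S",)},
}


def licenses(verb, arguments):
    """Return True if `verb` accepts exactly this sequence of arguments."""
    return tuple(arguments) in SUBCAT.get(verb, set())


print(licenses("give", ["NP", "NP"]))   # frame listed for "give"
print(licenses("sleep", ["NP"]))        # no such frame for "sleep"
```

A grammar without such a table treats all verbs alike and therefore accepts verb phrases such as "sleep a flight".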

Problems with context-free grammars

• CFGs appear to be adequate for describing a lot of the syntactic structure of English.

The situation for other languages is less clear …

• Problems include agreement and subcategorization.

These problems can be solved within the CFG formalism, but the resulting grammars are not practical.

• There are other, more elegant formalisms.

But these formalisms have more formal power than CFG, and are harder to parse.

Ambiguity

I booked a flight from LA.

• This sentence is ambiguous. In what way?

• What should happen if we parse the sentence?

Analysis 1: the PP is part of the nominal (a flight from LA).

(S
  (NP (Pro I))
  (VP (Verb booked)
      (NP (Det a)
          (Nom (Nom (Noun flight))
               (PP from LA)))))

Analysis 2: the PP is attached to the verb phrase (booked ... from LA).

(S
  (NP (Pro I))
  (VP (Verb booked)
      (NP (Det a)
          (Nom (Noun flight)))
      (PP from LA)))

Combinatorial explosion

[Figure: linear, cubic, and exponential growth plotted for input sizes 1 to 8; the vertical axis runs from 0 to 1500.]

Probabilistic context-free grammar

• The number of possible parse trees grows rapidly with the length of the input.

• But not all parse trees are equally useful.

Example: I booked a flight from Los Angeles.

• In many applications, we want the ‘best’ parse tree, or the first few best trees.

• Special case: ‘best’ = ‘most probable’

Probabilistic context-free grammars

A probabilistic context-free grammar (PCFG) is a context-free grammar where

• each rule r has been assigned a probability p(r) between 0 and 1

• the probabilities of rules with the same left-hand side sum up to 1
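The two conditions can be checked mechanically. The sketch below does so for the rule probabilities of the sample PCFG of this lecture; the dictionary layout and the function name `is_proper` are assumptions made for illustration, and exact fractions are used to avoid rounding issues.

```python
from collections import defaultdict
from fractions import Fraction

# Rules are keyed by (left-hand side, right-hand side).
PCFG = {
    ("S", ("NP", "VP")): Fraction(1),
    ("NP", ("Pronoun",)): Fraction(1, 3),
    ("NP", ("Proper-Noun",)): Fraction(1, 3),
    ("NP", ("Det", "Nominal")): Fraction(1, 3),
    ("Nominal", ("Nominal", "PP")): Fraction(1, 3),
    ("Nominal", ("Noun",)): Fraction(2, 3),
    ("VP", ("Verb", "NP")): Fraction(8, 9),
    ("VP", ("Verb", "NP", "PP")): Fraction(1, 9),
    ("PP", ("Preposition", "NP")): Fraction(1),
}


def is_proper(pcfg):
    """Check both PCFG conditions: 0 <= p(r) <= 1, and sums of 1 per left-hand side."""
    totals = defaultdict(Fraction)
    for (lhs, _rhs), p in pcfg.items():
        if not 0 <= p <= 1:
            return False
        totals[lhs] += p
    return all(total == 1 for total in totals.values())


print(is_proper(PCFG))
```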

A sample PCFG

Rule                    Probability

S → NP VP               1
NP → Pronoun            1/3
NP → Proper-Noun        1/3
NP → Det Nominal        1/3
Nominal → Nominal PP    1/3
Nominal → Noun          2/3
VP → Verb NP            8/9
VP → Verb NP PP         1/9
PP → Preposition NP     1

The probability of a parse tree

The probability of a parse tree is defined as the product of the probabilities of the rules that have been used to build the parse tree.
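This definition can be sketched in a few lines for the two analyses of "I booked a flight from LA". The nested-tuple tree encoding and the function names are assumptions made for illustration. As in the slides, nodes that directly dominate words contribute no factor, and the PP "from LA" is left unanalysed; both choices are simplifications of this sketch.

```python
from fractions import Fraction
from math import prod

# Rule probabilities from the sample PCFG of this lecture.
PCFG = {
    ("S", ("NP", "VP")): Fraction(1),
    ("NP", ("Pronoun",)): Fraction(1, 3),
    ("NP", ("Proper-Noun",)): Fraction(1, 3),
    ("NP", ("Det", "Nominal")): Fraction(1, 3),
    ("Nominal", ("Nominal", "PP")): Fraction(1, 3),
    ("Nominal", ("Noun",)): Fraction(2, 3),
    ("VP", ("Verb", "NP")): Fraction(8, 9),
    ("VP", ("Verb", "NP", "PP")): Fraction(1, 9),
    ("PP", ("Preposition", "NP")): Fraction(1),
}


def rules_used(tree):
    """List the grammar rules used in a tree given as (label, child, ...)."""
    label, *children = tree
    if all(isinstance(child, str) for child in children):
        return []  # only words below this node: no rule probability is counted
    used = [(label, tuple(child[0] for child in children))]
    for child in children:
        used.extend(rules_used(child))
    return used


def tree_probability(tree, pcfg):
    """Product of the probabilities of the rules used to build the tree."""
    return prod(pcfg[rule] for rule in rules_used(tree))


np_attachment = (
    "S", ("NP", ("Pronoun", "I")),
    ("VP", ("Verb", "booked"),
     ("NP", ("Det", "a"),
      ("Nominal", ("Nominal", ("Noun", "flight")), ("PP", "from", "LA")))))

vp_attachment = (
    "S", ("NP", ("Pronoun", "I")),
    ("VP", ("Verb", "booked"),
     ("NP", ("Det", "a"), ("Nominal", ("Noun", "flight"))),
     ("PP", "from", "LA")))

print(tree_probability(np_attachment, PCFG))  # 16/729
print(tree_probability(vp_attachment, PCFG))  # 2/243, that is 6/729
```

Under this grammar the analysis with the PP inside the nominal is the more probable one.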

Analysis 1 (PP inside the nominal):

(S
  (NP (Pro I))
  (VP (Verb booked)
      (NP (Det a)
          (Nom (Nom (Noun flight))
               (PP from LA)))))

Rule                    Probability

S → NP VP               1/1
NP → Pronoun            1/3
VP → Verb NP            8/9
NP → Det Nominal        1/3
Nominal → Nominal PP    1/3
Nominal → Noun          2/3

Probability: 16/729

Analysis 2 (PP attached to the verb phrase):

(S
  (NP (Pro I))
  (VP (Verb booked)
      (NP (Det a)
          (Nom (Noun flight)))
      (PP from LA)))

Rule                    Probability

S → NP VP               1/1
NP → Pronoun            1/3
VP → Verb NP PP         1/9
NP → Det Nominal        1/3
Nominal → Noun          2/3

Probability: 6/729

Summary

• Context-free grammar can be used to provide a formal account of the syntax (of English).

• There are phenomena that context-free grammar is not able to handle gracefully (agreement, subcategorization).

• Natural language is ambiguous.

• One way to address the problem of ambiguity is to add probabilities to the rules of a grammar.

Phrase structure parsing

Parsing

Parsing is the automatic analysis of a sentence with respect to its syntactic structure.

Phrase structure trees

(S
  (NP (Pro I))
  (VP (Verb prefer)
      (NP (Det a)
          (Nom (Nom (Noun morning))
               (Noun flight)))))

The root (S) is at the top of the tree; the leaves (the words) are at the bottom.

Conventions

• We are given a probabilistic context-free grammar G and a sentence w = w_1 … w_n, where the w_i are individual words.

• We write S for the start symbol of G.

• We write |G| for the number of rules in G.

• We write |w| for the length of w; that is, |w| = n.

The CKY algorithm

• We are interested in an efficient algorithm that can compute a most probable parse tree for the sentence w given the grammar G.

• This task is also known as Viterbi parsing.

• By ‘efficient’, we mean that the algorithm should run in polynomial time with respect to the combined size |G| + |w|.

Dynamic programming

General idea: Do not work with individual objects, but group objects into classes that share relevant properties, and manipulate only these classes.

Signatures

A signature [min, max, C] stands for the trees with root category C that cover all words between the positions min and max.

The number of signatures

• The number of parse trees grows exponentially with |w|.

• The number of signatures only grows polynomially with |w|.
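The contrast between the two growth rates can be made concrete with a small calculation. The sketch below is an illustration under stated assumptions: as a stand-in for the number of parse trees it counts only the unlabelled binary bracketings of n words (a Catalan number), and it counts signatures [min, max, C] with 0 ≤ min < max ≤ n for a hypothetical grammar with 10 categories. The function names are made up for this example.

```python
from math import comb


def binary_tree_shapes(n):
    """Number of binary bracketings of n words (the Catalan number C(n-1))."""
    return comb(2 * (n - 1), n - 1) // n


def signatures(n, categories):
    """Number of signatures [min, max, C] with 0 <= min < max <= n."""
    return n * (n + 1) // 2 * categories


for n in (5, 10, 20):
    print(n, binary_tree_shapes(n), signatures(n, categories=10))
```

The number of bracketings grows exponentially with n, while the number of signatures grows only quadratically in n for a fixed set of categories.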

Fencepost positions

We view the sentence w as a fence with n holes, one hole for each word w_i, and we number the fenceposts from 0 to n.

0 I 1 want 2 a 3 morning 4 flight 5
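Fencepost positions correspond to the way many programming languages index slices: the words between fenceposts lo and hi are exactly the slice from lo to hi. A minimal sketch, with the helper name `words_between` chosen for this example:

```python
words = "I want a morning flight".split()


def words_between(words, lo, hi):
    """Return the words between fenceposts lo and hi (0 <= lo <= hi <= n)."""
    return words[lo:hi]


print(words_between(words, 0, 1))           # the first word
print(words_between(words, 2, 5))           # the last three words
print(words_between(words, 0, len(words)))  # the whole sentence
```

With this convention, a span from lo to hi contains hi − lo words, and two adjacent spans share exactly one fencepost.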

The problem, reformulated

• Original problem: Find a most probable parse tree t for the sentence w according to the grammar G.

• Reformulated problem: Find a most probable parse tree t with signature [0, |w|, S].

• The CKY algorithm can solve this problem in time O(|G| · |w|³).

Restrictions

• The CKY algorithm as we present it here can only handle rules of one of two forms: C → w_i (preterminal) and C → C_1 C_2 (binary).

• This restriction is not a problem theoretically, but requires pre- and postprocessing.
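One part of the preprocessing can be sketched as follows: rules with more than two symbols on the right-hand side are split into chains of binary rules, and the helper categories are removed from the tree again afterwards. This is a sketch under assumptions, not the procedure prescribed in the lecture: the naming scheme with "|" for fresh categories and both function names are hypothetical, and unary rules such as NP → Pronoun are not handled here.

```python
from fractions import Fraction


def binarize(rules):
    """Split rules with more than two right-hand-side symbols into binary rules.

    `rules` maps (lhs, rhs) to a probability. The original probability is kept
    on the first new rule; the helper rules get probability 1, so the product
    along a derivation is unchanged.
    """
    result = {}
    for (lhs, rhs), p in rules.items():
        while len(rhs) > 2:
            fresh = lhs + "|" + "_".join(rhs[1:])  # assumed naming scheme
            result[lhs, (rhs[0], fresh)] = p
            lhs, rhs, p = fresh, rhs[1:], 1
        result[lhs, rhs] = p
    return result


def unbinarize(tree):
    """Postprocessing: splice the children of helper nodes into their parent."""
    label, *children = tree
    flat = []
    for child in children:
        if isinstance(child, str):
            flat.append(child)
            continue
        child = unbinarize(child)
        if "|" in child[0]:
            flat.extend(child[1:])
        else:
            flat.append(child)
    return (label, *flat)


binary = binarize({("VP", ("Verb", "NP", "PP")): Fraction(1, 9)})
for rule, p in binary.items():
    print(rule, p)
```

The sketch assumes that "|" does not occur in the category names of the original grammar.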

Preterminal rules and binary rules

• preterminal rules: rules that rewrite a part-of-speech tag to a token, i.e. rules of the form C → w_i.

Pro → I, Verb → booked, Noun → flight

• binary rules: rules that rewrite a syntactic category to exactly two other categories: C → C_1 C_2.

S → NP VP, NP → Det Nom, VP → V NP
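With only these two rule forms, Viterbi parsing can be sketched as a table that stores, for each signature [lo, hi, C], the best probability found so far together with a backpointer. The code below is a minimal sketch, not the lecture's reference implementation. The function names, the toy grammar, and all lexical probabilities are assumptions; unary rules have been collapsed into preterminal rules (for example NP → I) so that the grammar fits the two forms.

```python
from fractions import Fraction


def cky(words, lexical, binary, start="S"):
    """Return (probability, tree) of a most probable parse, or None."""
    n = len(words)
    best = {}  # signature (lo, hi, C) -> (probability, backpointer)
    for i, word in enumerate(words):
        for (c, w), p in lexical.items():
            if w == word:
                best[i, i + 1, c] = (p, word)
    for width in range(2, n + 1):
        for lo in range(n - width + 1):
            hi = lo + width
            for mid in range(lo + 1, hi):
                for (c, (c1, c2)), p in binary.items():
                    left = best.get((lo, mid, c1))
                    right = best.get((mid, hi, c2))
                    if left is None or right is None:
                        continue
                    q = p * left[0] * right[0]
                    if (lo, hi, c) not in best or q > best[lo, hi, c][0]:
                        best[lo, hi, c] = (q, (mid, c1, c2))
    if (0, n, start) not in best:
        return None
    return best[0, n, start][0], build(best, 0, n, start)


def build(best, lo, hi, c):
    """Follow the backpointers to reconstruct the tree for a signature."""
    _, back = best[lo, hi, c]
    if isinstance(back, str):
        return (c, back)
    mid, c1, c2 = back
    return (c, build(best, lo, mid, c1), build(best, mid, hi, c2))


# Toy grammar (assumed for illustration).
binary = {
    ("S", ("NP", "VP")): Fraction(1),
    ("NP", ("Det", "Nom")): Fraction(1, 3),
    ("Nom", ("Nom", "PP")): Fraction(1, 3),
    ("VP", ("Verb", "NP")): Fraction(8, 9),
    ("VP", ("VP", "PP")): Fraction(1, 9),
    ("PP", ("Prep", "NP")): Fraction(1),
}
lexical = {
    ("NP", "I"): Fraction(1, 3),
    ("NP", "LA"): Fraction(1, 3),
    ("Nom", "flight"): Fraction(2, 3),
    ("Verb", "booked"): Fraction(1),
    ("Det", "a"): Fraction(1),
    ("Prep", "from"): Fraction(1),
}

probability, tree = cky("I booked a flight from LA".split(), lexical, binary)
print(probability)
print(tree)
```

The three nested loops over span width, start position, and split point give the cubic dependence on |w|, and the innermost loop over the binary rules gives the linear dependence on |G|.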
