Computational Linguistics Introduction

November 2011 CLINT-LN CFG 1 Computational Linguistics Introduction Context Free Grammars



Context Free Grammars


Chomsky Hierarchy


Context Free Grammars (CFGs)

Sets of rules expressing how symbols of the language fit together, e.g.

S -> NP VP
NP -> Det N
Det -> the
N -> dog



What Does Context Free Mean?

• LHS of rule is just one symbol.

• Can have
NP -> Det N

• Cannot have
X NP Y -> X Det N Y

• Everyday examples of context sensitivity
y e -> i e
[b]# -> [p]#
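The single-symbol restriction on the left-hand side can be checked mechanically. The sketch below is illustrative only: the helper names `parse_rule` and `is_context_free`, and the whitespace-separated rule encoding, are assumptions and do not come from the slides.

```python
def parse_rule(text):
    """Split 'LHS -> RHS' into two tuples of whitespace-separated symbols."""
    lhs, rhs = text.split("->")
    return tuple(lhs.split()), tuple(rhs.split())


def is_context_free(rule):
    """A rule is context free when its left-hand side is exactly one symbol."""
    lhs, _rhs = rule
    return len(lhs) == 1


allowed = parse_rule("NP -> Det N")
rejected = parse_rule("X NP Y -> X Det N Y")

print(is_context_free(allowed))   # the LHS is the single symbol NP
print(is_context_free(rejected))  # the LHS carries the context X ... Y
```

The check looks only at the length of the left-hand side, which is all the slide's definition requires.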


Grammar Symbols

• Non Terminal Symbols

• Terminal Symbols
– Words
– Preterminals


Non Terminal Symbols

• Symbols which have definitions

• Symbols which appear on the LHS of rules

S -> NP VP
NP -> Det N
Det -> the
N -> dog


Non Terminal Symbols

• The same non-terminal can have several definitions

S -> NP VP
NP -> Det N
NP -> N
Det -> the
N -> dog


Terminal Symbols

• Symbols which appear in final string

• Correspond to words

• Are not defined by the grammar

S -> NP VP
NP -> Det N
Det -> the
N -> dog


Parts of Speech (POS)

• NT Symbols which produce terminal symbols are sometimes called pre-terminals
S -> NP VP
NP -> Det N
Det -> the
N -> dog

• Sometimes we are interested in the shape of sentences formed from pre-terminals
Det N V
Aux N V D N
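The three symbol classes described on the preceding slides can be read off a rule list automatically. The following is a minimal sketch under assumptions: the `(lhs, rhs)` list encoding and the `classify` helper are hypothetical, and the rules are the four-rule fragment shown above.

```python
# The example fragment from the slides, encoded as (lhs, rhs) pairs.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("Det", ["the"]),
    ("N", ["dog"]),
]


def classify(rules):
    """Return (non_terminals, undefined_symbols, pre_terminals) for a rule list."""
    # Non-terminals are the symbols that appear on a left-hand side.
    non_terminals = {lhs for lhs, _ in rules}
    # Symbols used on a right-hand side but never defined by any rule.
    rhs_symbols = {sym for _, rhs in rules for sym in rhs}
    undefined = rhs_symbols - non_terminals
    # Pre-terminals rewrite directly to a single undefined symbol.
    pre_terminals = {
        lhs for lhs, rhs in rules
        if len(rhs) == 1 and rhs[0] in undefined
    }
    return non_terminals, undefined, pre_terminals


non_terminals, undefined, pre_terminals = classify(RULES)
print(sorted(non_terminals))
print(sorted(undefined))
print(sorted(pre_terminals))
```

In a complete grammar the undefined symbols are exactly the terminals. Here VP is also reported as undefined, because the slide fragment never gives it a rule.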


CFG - formal definition

A CFG is a tuple (N, Σ, R, S)

• N is a set of non-terminal symbols

• Σ is a set of terminal symbols disjoint from N

• R is a set of rules, each of the form A -> β, where A is a non-terminal and β is a string of symbols from (Σ ∪ N)*

• S is a designated start symbol


CFG - Example

grammar:
S -> NP VP
NP -> N
VP -> V NP

lexicon:
V -> kicks
N -> John
N -> Bill

N = {S, NP, VP, N, V}

Σ = {kicks, John, Bill}

R = the grammar and lexicon rules above

S = "S"
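The tuple in this example can be written down directly as data. The sketch below is illustrative: the dictionary encoding of R and the names `SIGMA`, `well_formed` and `expand` are assumptions. It checks the conditions of the formal definition and then enumerates every sentence the grammar generates, which is possible here because the grammar has no recursion.

```python
from itertools import product

N = {"S", "NP", "VP", "N", "V"}
SIGMA = {"kicks", "John", "Bill"}
R = {
    "S": [["NP", "VP"]],
    "NP": [["N"]],
    "VP": [["V", "NP"]],
    "V": [["kicks"]],
    "N": [["John"], ["Bill"]],
}
START = "S"


def well_formed(n, sigma, r, start):
    """Check the conditions of the formal definition on a candidate tuple."""
    if n & sigma or start not in n:
        return False  # terminals must be disjoint from N; S must be in N
    return all(
        lhs in n and all(sym in n | sigma for sym in rhs)
        for lhs, alternatives in r.items()
        for rhs in alternatives
    )


def expand(symbol):
    """All word lists derivable from symbol (only safe without recursion)."""
    if symbol in SIGMA:
        return [[symbol]]
    results = []
    for rhs in R[symbol]:
        # Combine every expansion of each right-hand-side symbol.
        for parts in product(*(expand(sym) for sym in rhs)):
            results.append([word for part in parts for word in part])
    return results


print(well_formed(N, SIGMA, R, START))
sentences = sorted(" ".join(words) for words in expand(START))
print(sentences)
```

With two nouns and one verb, the grammar generates exactly four sentences.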


Class Exercise

• Write grammars that generate the following languages, for m, n > 0
(ab)^m
a^n b^m
a^n b^n

• Which of these are Regular?

• Which of these are Context Free?



(ab)^m for m > 0

S -> a b
S -> a b S

S -> a X
X -> b Y
Y -> a b
Y -> S
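One way to check a proposed answer is to enumerate the short strings a grammar generates and compare them with the intended language. The sketch below does this for the first grammar above (S -> a b, S -> a b S). The `language` helper and the dictionary encoding are assumptions, and the length bound is valid only because no rule has an empty right-hand side.

```python
import re
from collections import deque

RULES = {"S": [["a", "b"], ["a", "b", "S"]]}


def language(rules, start, max_len):
    """Terminal strings of at most max_len symbols derivable from start."""
    seen = {(start,)}
    queue = deque(seen)
    found = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            found.add("".join(form))
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            # Forms never shrink, so longer ones can be discarded.
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found


strings = language(RULES, "S", 6)
print(sorted(strings, key=len))
print(all(re.fullmatch(r"(ab)+", s) for s in strings))
```

The regular expression `(ab)+` describes the same strings, which is one way to see that this language is regular as well as context free.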



a^n b^m

S -> A B

A -> a

A -> a A

B -> b

B -> b B

S -> a AB

AB -> a AB

AB -> B

B -> b

B -> b B
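The second grammar above is right-linear: every rule has at most one non-terminal, and it sits at the right edge. Such a grammar can be run like a finite-state machine, with non-terminals as states. The sketch below is a minimal illustration under assumptions: the tuple encoding, the `closure` and `accepts` helpers and the extra `ACCEPT` state are hypothetical, and "AB" is treated as a single non-terminal name.

```python
import re
from itertools import product

# Right-linear grammar from the slide; "AB" is one non-terminal name.
RIGHT_LINEAR = {
    "S": [("a", "AB")],
    "AB": [("a", "AB"), ("B",)],
    "B": [("b",), ("b", "B")],
}
ACCEPT = "<accept>"  # hypothetical extra state reached by rules like B -> b


def closure(rules, states):
    """Add every state reachable through unit rules such as AB -> B."""
    pending, seen = list(states), set(states)
    while pending:
        state = pending.pop()
        for rhs in rules.get(state, []):
            if len(rhs) == 1 and rhs[0] in rules and rhs[0] not in seen:
                seen.add(rhs[0])
                pending.append(rhs[0])
    return seen


def accepts(rules, start, string):
    """Run the grammar as a non-deterministic finite automaton."""
    states = closure(rules, {start})
    for ch in string:
        nxt = set()
        for state in states:
            for rhs in rules.get(state, []):
                if rhs[0] == ch:
                    # A -> a B moves to B; A -> a moves to the accept state.
                    nxt.add(rhs[1] if len(rhs) == 2 else ACCEPT)
        states = closure(rules, nxt)
    return ACCEPT in states


# Compare with the regular expression a+b+ on all strings up to length 5.
candidates = ["".join(p) for n in range(6) for p in product("ab", repeat=n)]
agree = all(
    accepts(RIGHT_LINEAR, "S", s) == bool(re.fullmatch(r"a+b+", s))
    for s in candidates
)
print(agree)
```

The same construction does not work for a^n b^n, because matching the two counts needs unbounded memory, and a fixed set of states cannot provide it.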


Grammar Defines a Structure

grammar:
S -> NP VP
NP -> N
VP -> V NP

lexicon:
V -> kicks
N -> John
N -> Bill

[S [NP [N John]] [VP [V kicks] [NP [N Bill]]]]


Different Grammar, Different Structure

grammar:
S -> NP NP
NP -> N V
NP -> N

lexicon:
V -> kicks
N -> John
N -> Bill

[S [NP [N John] [V kicks]] [NP [N Bill]]]
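The two structures can be produced mechanically by parsing the same sentence with each grammar. The sketch below uses a small top-down backtracking parser. It is an illustration under assumptions: the names `parse`, `parse_seq` and `trees` and the bracketed output format are hypothetical, and the method would loop forever on left-recursive rules (neither grammar here has any).

```python
LEXICON = {"V": [["kicks"]], "N": [["John"], ["Bill"]]}
GRAMMAR_1 = {"S": [["NP", "VP"]], "NP": [["N"]], "VP": [["V", "NP"]], **LEXICON}
GRAMMAR_2 = {"S": [["NP", "NP"]], "NP": [["N", "V"], ["N"]], **LEXICON}


def parse(rules, symbol, words, i):
    """Yield (bracketed_tree, next_index) for symbol starting at words[i]."""
    if symbol not in rules:
        # A terminal must match the next word exactly.
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in rules[symbol]:
        for children, j in parse_seq(rules, rhs, words, i):
            yield "[" + symbol + " " + " ".join(children) + "]", j


def parse_seq(rules, symbols, words, i):
    """Yield (list_of_trees, next_index) for a sequence of symbols."""
    if not symbols:
        yield [], i
        return
    for tree, j in parse(rules, symbols[0], words, i):
        for rest, k in parse_seq(rules, symbols[1:], words, j):
            yield [tree] + rest, k


def trees(rules, sentence, start="S"):
    """All complete parses of the sentence, as bracketed strings."""
    words = sentence.split()
    return [t for t, end in parse(rules, start, words, 0) if end == len(words)]


print(trees(GRAMMAR_1, "John kicks Bill"))
print(trees(GRAMMAR_2, "John kicks Bill"))
```

Both grammars accept the sentence, but the first groups the verb with its object while the second groups it with the subject.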


Which Grammar is Best?

• The structure assigned by the grammar should be appropriate.

• The structure should
– Be understandable
– Allow us to make generalisations.
– Reflect the underlying meaning of the sentence.


Ambiguity

• A grammar is ambiguous if it assigns two or more structures to the same sentence.
NP -> NP CONJ NP
NP -> N
lexicon:
CONJ -> and
N -> John
N -> Bill

• The grammar should not generate too many possible structures for the same sentence.
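How quickly the number of structures grows can be measured by counting parse trees. The sketch below counts the trees that the coordination grammar above assigns to a phrase. It is illustrative only: the `count_trees` helper and the span-based counting method are assumptions, chosen because the rule NP -> NP CONJ NP is left-recursive and would make a naive top-down parser loop. The method relies on there being no empty rules.

```python
from functools import lru_cache

RULES = {
    "NP": [("NP", "CONJ", "NP"), ("N",)],
    "CONJ": [("and",)],
    "N": [("John",), ("Bill",)],
}


def count_trees(phrase, start="NP"):
    """Number of distinct parse trees for the phrase rooted in start."""
    words = tuple(phrase.split())

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Trees with root symbol covering exactly words[i:j]."""
        if symbol not in RULES:
            return int(j == i + 1 and words[i] == symbol)
        return sum(count_seq(rhs, i, j) for rhs in RULES[symbol])

    @lru_cache(maxsize=None)
    def count_seq(symbols, i, j):
        """Ways for a symbol sequence to cover exactly words[i:j]."""
        if len(symbols) == 1:
            return count(symbols[0], i, j)
        # Every symbol covers at least one word, so spans always shrink.
        return sum(
            count(symbols[0], i, k) * count_seq(symbols[1:], k, j)
            for k in range(i + 1, j - len(symbols) + 2)
        )

    return count(start, 0, len(words))


for phrase in ("John and Bill",
               "John and Bill and John",
               "John and Bill and John and Bill"):
    print(phrase, "->", count_trees(phrase))
```

Two names give one structure, three give two, and four give five, so the number of readings grows rapidly with each added conjunct.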


Criteria for Evaluating Grammars

• Does it undergenerate?

• Does it overgenerate?

• Does it assign appropriate structures to sentences it generates?

• Is it simple to understand? How many rules are there?

• Does it contain just a few generalisations or is it full of special cases?

• How ambiguous is it? How many structures does it assign for a given sentence?


PATR-II

• The PATR-II formalism can be viewed as a computer language for encoding linguistic information.

• A PATR-II grammar consists of a set of rules and a lexicon.

• The rules are CFG rules augmented with constraints.

• The lexicon provides information about terminal symbols.


Example PATR-II Grammar and Lexicon

Grammar (grammar.grm)

Rule

s -> np vp

Rule

np -> n

Rule

vp -> v

Lexicon (lexicon.lex)

\w uther

\c n

\w sleeps

\c v
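The lexicon above uses one field per line, with `\w` introducing a word and `\c` giving its category. A minimal sketch of reading that layout is shown below. It assumes only the two markers that appear on the slide; the `read_lexicon` helper and the dictionary keys are hypothetical, and this is not the PATR-II software's own loader.

```python
# Lexicon text copied from the slide; a raw string keeps the backslashes.
LEXICON_TEXT = r"""
\w uther
\c n

\w sleeps
\c v
"""


def read_lexicon(text):
    """Collect {'word': ..., 'cat': ...} entries from \\w / \\c field lines."""
    entries, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        marker, _, value = line.partition(" ")
        if marker == r"\w":
            # A \w line starts a new entry.
            current = {"word": value.strip(), "cat": None}
            entries.append(current)
        elif marker == r"\c" and current is not None:
            # A \c line gives the category of the most recent word.
            current["cat"] = value.strip()
    return entries


lexicon = read_lexicon(LEXICON_TEXT)
categories = {entry["word"]: entry["cat"] for entry in lexicon}
print(categories)
```

With the categories in hand, the sentence "uther sleeps" maps to the sequence n v, which the three rules above (s -> np vp, np -> n, vp -> v) accept.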