795M, Winter 2000

Beyond PCFGs

Chris Brew

Ohio State University


Beyond PCFGs

– Shift-reduce parsers
– Probabilistic LR parsers
– Data-oriented parsers


Motivation

– Get round the limitations of the PCFG model
– Exploit knowledge about individual words
– Build better language models


Shift-reduce

Simple version (see the sketch after this list):
– either shift a word from the input list to the parse stack
– or reduce the two elements on top of the parse stack to a single tree

Hermjakob and Mooney (cmp-lg 9706002):
– structures rather than just trees and words
– a more complex parse action language
– not just binary rules
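A minimal sketch of the simple version in Python; the representation and the choose_action hook are my own illustration, not Hermjakob and Mooney's machinery:

```python
# Simple shift-reduce skeleton: shift moves the next word onto the
# parse stack; reduce combines the top two stack elements into one
# tree. How actions are chosen is left abstract -- in Hermjakob and
# Mooney it is a learned classifier.

def parse(words, choose_action):
    """choose_action(stack, words) returns ('shift',) or ('reduce', label)."""
    stack = []
    while words or len(stack) > 1:
        action = choose_action(stack, words)
        if action[0] == 'shift':
            stack.append(words.pop(0))               # word -> parse stack
        else:
            right, left = stack.pop(), stack.pop()   # top two elements...
            stack.append((action[1], left, right))   # ...become one tree
    return stack[0]

# A hard-wired action sequence for a three-word sentence:
actions = iter([('shift',), ('shift',), ('shift',),
                ('reduce', 'VP'), ('reduce', 'S')])
print(parse(['Matthew', 'likes', 'Euan'], lambda s, w: next(actions)))
# ('S', 'Matthew', ('VP', 'likes', 'Euan'))
```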


Machine learning for shift-reduce

– The supervisor shows the system correct sequences of parsing actions
– The system tries to learn to predict the correct actions; this needs a feature language
– As it learns, the supervisor has less need to override the actions chosen by the system
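One way to picture the supervision, as a sketch (the function names are placeholders, not from the paper): replay the gold action sequence through the parser state, emitting one training example per step.

```python
# Replay a supervisor-approved action sequence, recording one
# (feature vector, action) pair at each parser state. A classifier
# is then fit on these pairs; extract_features and apply_action
# stand in for the real feature language and state update.

def make_examples(sentence, gold_actions, extract_features, apply_action):
    stack, buffer, examples = [], list(sentence), []
    for action in gold_actions:
        examples.append((extract_features(stack, buffer), action))
        stack, buffer = apply_action(stack, buffer, action)
    return examples
```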


Examples of the feature language

– broad syntactic class of the third element on the stack
– the tense of the first element of the input list
– Does the top element of the stack contain an object?
– Could the top frame be an adjectival degree adverb (e.g. very)?
– Is frame1 a possible agent/patient of frame2?
– Do frame1 and frame2 satisfy subject-verb agreement?
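A hedged sketch of what a few such features might look like as code; the attribute names and the dict representation of parser-state elements are invented for illustration:

```python
# Each feature is a function of the current parse stack and input
# list. Elements are assumed to be dicts of linguistic attributes,
# e.g. {'class': 'verb', 'tense': 'past', 'has_object': True}.

def feature_vector(stack, input_list):
    def attr(seq, idx, key):
        try:
            return seq[idx][key]
        except (IndexError, KeyError):
            return None
    return {
        'class_of_3rd_stack_element': attr(stack, -3, 'class'),
        'tense_of_1st_input_element': attr(input_list, 0, 'tense'),
        'top_of_stack_has_object':    attr(stack, -1, 'has_object'),
        'top_is_degree_adverb':       attr(stack, -1, 'class') == 'degree_adverb',
    }
```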


Hand-crafted knowledge used

– 205 features, all moderately local (no references to the 1000th element of the stack or anything like that)
– a lexical knowledge base with 4356 nodes
– a subcategorisation table for 242 verbs
– But the association between features and actions is learned


Various different hybrid decision structures

The best was a hierarchical list of decision trees which encoded information about the task. Schematically:
– decide whether to do anything
  » if not, we are done
  » if so, decide whether to do a reduction
    – if so, decide which reduction
    – if not, decide what sort of shift to do
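Rendered as control flow (each stubbed predicate stands in for one learned decision tree in the hierarchy; this is a schematic, not the paper's implementation):

```python
# The hierarchical decision structure: each question is answered by
# its own decision tree, stubbed here as a plain function.

def decide(state, do_anything, is_reduction, which_reduction, which_shift):
    if not do_anything(state):                      # do anything at all?
        return ('done',)
    if is_reduction(state):                         # reduction or shift?
        return ('reduce', which_reduction(state))   # which reduction
    return ('shift', which_shift(state))            # what sort of shift
```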


Evaluation

– Corpus of 272 annotated sentences; 17-fold cross-validation (17 blocks of 16 sentences each)
– Average sentence length 17.1 words, with 43.5 parse actions per sentence
– Labelled precision of 92.7% and recall of 92.8% (Parseval measures)
– Correct structure and labelling for 26.8% of sentences (i.e. about 1 in 4 sentences are completely correct)
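For reference, the arithmetic behind Parseval precision and recall, in a simplified form that compares sets of labelled constituent spans (real Parseval scoring adds conventions, e.g. about punctuation):

```python
# Labelled precision/recall over (label, start, end) constituent spans.

def parseval(candidate_spans, gold_spans):
    matched = len(candidate_spans & gold_spans)
    return (matched / len(candidate_spans),   # precision
            matched / len(gold_spans))        # recall
```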


Comments on Hermjakob and Mooney

– A lot of grunt work is needed, but not as much as for a full rationalist NLP system
– The knowledge used is micro-modular: very small pieces of highly independent knowledge
– The test set is small and the sentences are short
– Fairly robust
– Good on small-scale tests in an English/German MT task


Probabilistic LR Parsers

Briscoe and Carroll (Computational Linguistics 19(1), pp. 25-59)

PCFGs give these subtrees the same probability:

[Figure: two binary trees over five N nodes, one left-branching, (N (N N N) N), and one right-branching, (N N (N N N)); a PCFG built on the rule N → N N assigns them the same probability.]
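A toy check of the point, under an assumed probability for the one branching rule: both trees use N → N N exactly twice, so the products of rule probabilities are identical.

```python
# A PCFG scores a tree as the product of its rule probabilities, so
# left- and right-branching trees built from the same rules tie.

RULE_PROB = {('N', ('N', 'N')): 0.4}   # assumed toy probability

def tree_prob(tree):
    """tree is a leaf label (str) or a (label, child, child) tuple."""
    if isinstance(tree, str):
        return 1.0
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = RULE_PROB[(label, rhs)]
    for child in children:
        p *= tree_prob(child)
    return p

left_branching  = ('N', ('N', 'N', 'N'), 'N')    # (N (N N N) N)
right_branching = ('N', 'N', ('N', 'N', 'N'))    # (N N (N N N))
assert tree_prob(left_branching) == tree_prob(right_branching)  # both 0.4 * 0.4
```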


LR Parsing

– Builds a parsing table which gives parsing actions and gotos for possible combinations of parser state and input symbol
– There may be parsing action conflicts, in which more than one action is available
– In programming language grammars, you almost never want conflicts
– In NL grammars, you have no escape!


Probabilistic LR

When there is a conflict, non-deterministically execute all possible actions.

But score them according to a probability distribution.

So where do the probabilities come from? And what do they mean? See analysis in Stolcke’s paper relating them to his forward and inner probabilities.
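A hedged sketch of the control structure (the configuration, table lookup, and step function are placeholders; Briscoe and Carroll's actual probability model attaches probabilities to actions in the LR table):

```python
# Explore every action at a conflict, weighting each branch by the
# probability attached to that action in the parse table; complete
# parses are ranked by their accumulated scores.

def probabilistic_lr(init_config, actions_with_probs, step, is_complete):
    agenda, results = [(init_config, 1.0)], []
    while agenda:
        config, score = agenda.pop()
        if is_complete(config):
            results.append((config, score))
            continue
        for action, prob in actions_with_probs(config):  # >1 at a conflict
            agenda.append((step(config, action), score * prob))
    return sorted(results, key=lambda r: -r[1])   # best-scoring first
```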


LR parsing using Alvey Tools Grammar

– Wide-coverage unification grammar written by Claire Grover and Ted Briscoe
– Build LR tables from the context-free backbone of this grammar
– Interactively build a disambiguated training corpus by supervising the choice of parse actions


Evaluation

– Very good performance on LDOCE noun definitions: 76% correct structure and labelling
– State-of-the-art results in later work on tag sequence grammars, where the available lexical information is more restricted (54% correct structure and labelling)
– Work is underway to bring this technique to Wall Street Journal data for comparison with other methods


Data-oriented parsing

Rens Bod: Enriching Linguistics with Statistics: Performance Models of Natural Language, Ph.D. thesis, Amsterdam

– Treebank data again (this time ATIS, 600 sentences)
– Radical rejection of the context-free assumption
– Count subtrees of arbitrary depth, not rule applications


A corpus

Three training trees:

(S (NP Matthew) (VP (V hates) (NP Patrick)))
(S (NP Jamie) (VP (V likes) (NP Patrick)))
(S (NP Matthew) (VP (V likes) (NP Euan)))


Tree fragments

Some of the fragments of the third tree, in bracket notation (a bare category marks a frontier node where the subtree below has been cut off):

(S (NP Matthew) (VP (V likes) (NP Euan)))
(NP Euan)
(V likes)
(NP Matthew)
(S NP VP)
(S NP (VP V NP))
(S (NP Matthew) VP)
(S (NP Matthew) (VP V (NP Euan)))
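A sketch of fragment extraction under the usual DOP convention (my code, not Bod's): at each nonterminal child you either cut, leaving a bare frontier category, or keep going with any fragment rooted at that child.

```python
# Enumerate all DOP fragments of a tree. Trees are (label, child, ...)
# tuples with strings as words; a bare category string in a fragment
# marks a frontier node where the subtree below was cut off.

from itertools import product

def rooted_fragments(tree):
    """All fragments whose root is the root node of `tree`."""
    label, *children = tree
    options = []
    for child in children:
        if isinstance(child, str):
            options.append([child])   # a word stays in place
        else:                         # cut here, or keep any sub-fragment
            options.append([child[0]] + list(rooted_fragments(child)))
    for combo in product(*options):
        yield (label, *combo)

def fragments(tree):
    """Fragments rooted at every nonterminal node of the tree."""
    if isinstance(tree, str):
        return
    yield from rooted_fragments(tree)
    for child in tree[1:]:
        yield from fragments(child)

tree = ('S', ('NP', 'Matthew'), ('VP', ('V', 'likes'), ('NP', 'Euan')))
print(len(list(fragments(tree))))   # 17 fragments, including (S NP VP)
```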


The probability of a tree

– The probability of a tree is the sum of the probabilities of all the ways of making it out of fragments
– The probability of a fragment is given as a ratio between the frequency of the fragment and the total frequency of all fragments having that root
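In symbols (the standard DOP1 definitions, with |f| the corpus frequency of fragment f, d ranging over derivations, and T over trees):

```latex
P(f) = \frac{|f|}{\sum_{f'\colon \mathrm{root}(f') = \mathrm{root}(f)} |f'|},
\qquad
P(d) = \prod_{f \in d} P(f),
\qquad
P(T) = \sum_{d \text{ derives } T} P(d).
```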


Complexity

– It's hard to find efficient algorithms for sewing together DOP trees (cf. Sima'an for solutions)
– Only very small corpora are feasible
– In practice, fragment depth may have to be limited
– Many tree fragments are very rare, so there is an issue about smoothing


Evaluation

– Several variations were studied: DOP4 gets parse accuracies around 80% without a hand-coded dictionary, DOP5 around 90% with one
– Results are to be interpreted with caution due to the small size of the corpus
– Evaluation on the Dutch OVIS domain suggests that DOP is not competitive with Groningen's more labour-intensive system (but maybe that's not the point)


Where to find out more

– Papers by Bod, Carroll, and Hermjakob
– Manning and Schütze, ch. 12
– http://xxx.soton.ac.uk/archive/cs/intro.html (subarea Computation and Language)