Transcript of: A Memory-Based Model of Syntactic Analysis: Data Oriented Parsing

Page 1

A Memory-Based Model of Syntactic Analysis: Data Oriented Parsing

Remko Scha, Rens Bod, Khalil Sima’an

Institute for Logic, Language and Computation

University of Amsterdam

Page 2

Outline of the lecture

Introduction
Disambiguation
Data Oriented Parsing
DOP1: computational aspects and experiments
Memory Based Learning framework
Conclusions

Page 3

Introduction

Human language cognition: Analogy-based processes on a store of past experiences

Modern linguistics: a set of rules
Language processing algorithms: a performance model of human language processing
Competence grammar as a broad framework for performance models
Memory / analogy-based language processing

Page 4

The Problem of Ambiguity Resolution

Every input string has an unmanageably large number of analyses

Uncertain input – generate guesses and choose one

Syntactic disambiguation might be a side effect of semantic disambiguation

Page 5

The Problem of Ambiguity Resolution

Frequency of occurrence of lexical items and syntactic structures: people register frequencies

People prefer analyses they have already experienced over constructing new ones

More frequent analyses are preferred to less frequent ones

Page 6

From Probabilistic Competence-Grammars to Data-Oriented Parsing

Probabilistic information derived from past experience

Characterization of the possible sentence-analyses of the language

A stochastic grammar should:
Define: all sentences and all analyses
Assign: a probability to each
Achieve: the preferences that people display when they choose between sentences or analyses

Page 7

Stochastic Grammar

These predictions are limited

Platitudes and conventional phrases

Allow redundancy

Use Tree Substitution Grammar

Page 8

Stochastic Tree Substitution Grammar

Set of elementary trees

Tree rewrite process

Redundant model

Statistically relevant phrases

Memory based processing model

Page 9

Memory based processing model

Data-oriented parsing approach:
A corpus of utterances – past experience
An STSG to analyze new input

In order to describe a specific DOP model we need (see the sketch below):
A formalism for representing utterance-analyses
An extraction function
Combination operations
A probability model
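The four ingredients just listed can be pictured as a tiny interface. This is a minimal illustrative sketch in Python, not the authors' formalization; the class and field names are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

Tree = Any       # an utterance-analysis, e.g. a labelled syntactic tree
Fragment = Any   # a unit extracted from an analysis, e.g. a subtree

@dataclass
class DOPModel:
    """A specific DOP model is fixed by four choices (hypothetical field names)."""
    corpus: list                                         # past experiences: analysed utterances
    extract: Callable[[Tree], Iterable[Fragment]]        # extraction function: analysis -> fragments
    combine: Callable[[Fragment, Fragment], Fragment]    # combination operation, e.g. substitution
    probability: Callable[[Fragment], float]             # probability model over fragments
```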

Page 10

A Simple Data Oriented Parsing Model: DOP1

Our corpus: DOP1 works from an imaginary corpus of two trees (see Page 12)

Possible subtrees t of a corpus tree T:
t consists of more than one node
t is connected
except for the leaf nodes of t, each node in t has the same daughter nodes as the corresponding node in T

Stochastic Tree Substitution Grammar – the set of subtrees

Generation process – composition: A ∘ B means that B is substituted on the leftmost non-terminal leaf node of A
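A minimal sketch of the composition operation, using (label, children) tuples as trees: terminals are plain strings and a node with an empty children list is an open substitution site. The representation and function name are illustrative assumptions, not the original implementation.

```python
# Trees are (label, children) tuples; children is a list of subtrees or terminal strings.
# A node with an empty children list is an open substitution site.

def compose(a, b):
    """A ∘ B: substitute B at the leftmost open non-terminal leaf of A.

    Requires the label of that leaf to match the root label of B (the usual
    substitution condition); returns a new tree, leaving A and B untouched.
    """
    done = False

    def walk(node):
        nonlocal done
        label, children = node
        if not children and not done:          # leftmost open substitution site
            if label != b[0]:
                raise ValueError(f"root {b[0]!r} cannot fill a {label!r} site")
            done = True
            return b
        new_children = [c if isinstance(c, str) else walk(c) for c in children]
        return (label, new_children)

    result = walk(a)
    if not done:
        raise ValueError("A has no open substitution site")
    return result

# Example: (S (NP she) (VP (V saw) NP)) composed with (NP the dress)
s = ("S", [("NP", ["she"]), ("VP", [("V", ["saw"]), ("NP", [])])])
np = ("NP", ["the", "dress"])
print(compose(s, np))   # fills the open NP under VP
```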

Page 11

Example of sub trees

Page 12

DOP1 – an imaginary corpus of two trees:

Tree 1: [S [NP she] [VP [V wanted] [NP [NP the dress] [PP [P on] [NP the rack]]]]]
("She wanted the dress on the rack")

Tree 2: [S [NP she] [VP [VP [V saw] [NP the dress]] [PP [P with] [NP the telescope]]]]
("She saw the dress with the telescope")

Page 13

Derivation and parse #1

Composition of corpus subtrees:
[S [NP she] [VP [VP [V saw] NP] PP]] ∘ [NP the dress] ∘ [PP [P with] [NP the telescope]]

Resulting parse:
[S [NP she] [VP [VP [V saw] [NP the dress]] [PP [P with] [NP the telescope]]]]

She saw the dress with the telescope.

Page 14

Derivation and parse #2

A different combination of corpus subtrees yields the same parse:
[S [NP she] [VP [VP [V saw] [NP the dress]] [PP [P with] [NP the telescope]]]]

She saw the dress with the telescope.

Page 15

Probability Computations:

Probability of substituting a subtree t on a specific node:
P(t) = #(t) / Σ_{t': root(t') = root(t)} #(t')

Probability of a derivation D = t1 ∘ ... ∘ tn:
P(D) = P(t1 ∘ ... ∘ tn) = Π_i P(ti)

Probability of a parse tree T:
P(T) = Σ_{D derives T} P(D)
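A small sketch of these three quantities over a toy bag of subtree counts; subtrees are identified here by bracketed strings and root(t) is read off as the first label, which is an assumption made only for this illustration.

```python
from collections import Counter
from math import prod

# Toy fragment bank: each subtree is identified by a bracketed string.
counts = Counter({
    "(NP the dress)": 2,
    "(NP the rack)": 1,
    "(PP (P on) NP)": 1,
    "(PP (P with) NP)": 1,
})

def root(t: str) -> str:
    """Root label of a bracketed subtree string."""
    return t[1:].split()[0].rstrip(")")

def p_subtree(t: str) -> float:
    """P(t) = #(t) / sum of #(t') over subtrees t' with the same root label."""
    same_root = sum(c for u, c in counts.items() if root(u) == root(t))
    return counts[t] / same_root

def p_derivation(subtrees) -> float:
    """P(t1 ∘ ... ∘ tn) = product of the subtree probabilities."""
    return prod(p_subtree(t) for t in subtrees)

def p_parse(derivations) -> float:
    """P(T) = sum of P(D) over all derivations D of the parse T."""
    return sum(p_derivation(d) for d in derivations)

print(p_subtree("(NP the dress)"))                        # 2/3
print(p_derivation(["(PP (P on) NP)", "(NP the rack)"]))  # 1/2 * 1/3 = 1/6
```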

Page 16

Computational Aspects of DOP1

Parsing
Disambiguation
Most Probable Derivation vs. Most Probable Parse

Optimizations

Page 17

Parsing

Chart-like parse forest

Derivation forest
Treat each elementary tree t as a context-free rule: root(t) —> yield(t)
Label each phrase with its syntactic category and its full elementary tree
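A sketch of this rewriting trick: each elementary tree contributes a rule root(t) —> yield(t) while keeping a pointer back to the full tree, so a standard chart parser can build the derivation forest. The (label, children) tree encoding follows the earlier sketch; the helper names are assumptions.

```python
# Same (label, children) tree encoding as before: terminals are strings,
# and a node with an empty children list is an open substitution site.

def frontier(node):
    """Left-to-right frontier of a tree: terminals and open substitution sites."""
    label, children = node
    if not children:
        return [label]                       # open non-terminal site
    out = []
    for c in children:
        out.extend([c] if isinstance(c, str) else frontier(c))
    return out

def as_cfg_rule(t):
    """Represent elementary tree t as (root(t), yield(t), t) for chart parsing."""
    return (t[0], tuple(frontier(t)), t)     # keep the full tree for later use

t = ("S", [("NP", ["she"]), ("VP", [("V", ["saw"]), ("NP", [])])])
print(as_cfg_rule(t))   # ('S', ('she', 'saw', 'NP'), <the full tree>)
```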

Page 18

Elementary trees of an example STSG

Page 19

Derivation forest for the string abcd

Page 20

Derivations and parse trees for the string abcd

Page 21

Derivations and parse trees for the string abcd

Page 22

Disambiguation

The derivation forest defines all derivations and parses
The most likely parse must be chosen
The MPP in DOP1
MPP vs. MPD

Page 23

Most Probable Derivation

Viterbi algorithm:
Eliminate low-probability sub-derivations in a bottom-up fashion
Select the most probable sub-derivation at each chart entry and eliminate the other sub-derivations of that root node

Page 24

Viterbi algorithm

Two derivations for abc; if P(d1) > P(d2), eliminate the right-hand derivation (d2)

Page 25

Algorithm 1 – Computing the probability of most probable derivation

Input: an STSG <S, R, P>
Elementary trees in R are in CNF
A —> t_H : tree t with root A and frontier label sequence H
<A, i, j> : non-terminal A in chart entry (i, j) after parsing the input w1,...,wn
P_MPD : the probability of the MPD of the input string w1,...,wn

Page 26

Algorithm 1 – Computing the probability of most probable derivation
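The slide shows the algorithm itself as a figure. A hedged sketch of what such a computation can look like: a Viterbi/CKY pass over CNF-encoded elementary-tree rules that keeps, per chart entry and root label, only the best sub-derivation probability. The rule encoding and the names below are assumptions made for illustration.

```python
from collections import defaultdict

def mpd_probability(words, lexical, binary):
    """Viterbi/CKY-style computation of the probability of the most probable
    derivation, over CNF encodings root(t) -> yield(t) of the elementary trees.

    lexical: dict  A -> list of (word, prob)   (elementary trees with a one-word yield)
    binary:  dict  A -> list of (B, C, prob)   (elementary trees with yield B C)
    Returns, per root label, the best derivation probability for the whole string.
    """
    n = len(words)
    best = defaultdict(float)                 # (A, i, j) -> max sub-derivation probability
    for i, w in enumerate(words):
        for a, entries in lexical.items():
            for word, p in entries:
                if word == w:
                    best[(a, i, i + 1)] = max(best[(a, i, i + 1)], p)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for a, entries in binary.items():
                    for b, c, p in entries:
                        q = p * best[(b, i, k)] * best[(c, k, j)]
                        if q > best[(a, i, j)]:
                            best[(a, i, j)] = q
    return {a: p for (a, i, j), p in best.items() if i == 0 and j == n}

# Tiny illustrative grammar (probabilities are made up):
lexical = {"A": [("a", 1.0)], "B": [("b", 1.0)]}
binary = {"S": [("A", "B", 0.7), ("A", "S", 0.3)]}
print(mpd_probability(["a", "b"], lexical, binary))   # {'S': 0.7}
```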

Page 27

The Most Probable Parse

Computing the MPP in an STSG is NP-hard
Monte Carlo method:
Sample derivations
Observe the most frequently generated parse tree
Estimate the parse tree probabilities
Random-first search
The algorithm relies on the Law of Large Numbers

Page 28

Algorithm 2: Sampling a random derivation

for length := 1 to n do
  for start := 0 to n - length do
    for each root node X in chart-entry (start, start + length) do:
      1. select at random a tree from the distribution of elementary trees with root node X
      2. eliminate the other elementary trees with root node X from this chart-entry
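A runnable sketch of this sampling loop, assuming the chart has already been filled during parsing; the chart representation (a dict from spans to root-labelled lists of weighted elementary trees) is an assumption made for the illustration.

```python
import random

def sample_derivation(chart, n):
    """Algorithm 2 sketch: in every chart entry, keep one randomly chosen
    elementary tree per root node, drawn according to the tree weights.

    chart: dict mapping (start, start+length) -> dict mapping root label X
           -> list of (tree, prob) pairs entered during parsing.
    Returns a pruned copy of the chart; the surviving trees define one
    random derivation of the whole sentence.
    """
    pruned = {}
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            entry = chart.get((start, start + length), {})
            kept = {}
            for x, trees in entry.items():
                probs = [p for _, p in trees]
                kept[x] = [random.choices(trees, weights=probs, k=1)[0]]  # steps 1 + 2
            pruned[(start, start + length)] = kept
    return pruned

# Toy chart for a two-word sentence (trees abbreviated as strings, made-up weights):
chart = {
    (0, 1): {"NP": [("(NP she)", 1.0)]},
    (1, 2): {"V": [("(V saw)", 1.0)]},
    (0, 2): {"S": [("(S (NP she) (V saw))", 0.6), ("(S NP V)", 0.4)]},
}
print(sample_derivation(chart, 2)[(0, 2)]["S"])
```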

Page 29

Results of Algorithm 2

Random derivation for the whole sentence

A first guess for the MPP
Compute the size of the sampling set
Probability of error: an upper bound, where 0 is the index of the MPP, i the index of parse i, and N the number of sampled derivations

No unique MPP – ambiguity

Page 30

Reminder

V[X] = E[X^2] - E[X]^2

0 <= P[X] <= 1

sigma(X) = sqrt(V[X])

Page 31

Conclusions – lower bound for N

Lower bound for N:
pi is the probability of parse i
B is the probability estimated from frequencies in N samples
Var(B) = pi*(1-pi)/N
Since 0 <= pi <= 1, pi*(1-pi) <= 1/4, so Var(B) <= 1/(4*N)
s = sqrt(Var(B)), so s <= 1/(2*sqrt(N))
Hence 1/(4*s^2) <= N
Example: N >= 100 gives s <= 0.05

Page 32

Algorithm 3: Estimating the parse probabilities

Given a derivation forest of a sentence and a threshold sm for the standard error:
  N := the smallest integer larger than 1/(4*sm^2)
  repeat N times:
    sample a random derivation from the derivation forest
    store the parse generated by this derivation
  for each parse i:
    estimate its conditional probability given the sentence by pi := #(i) / N
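A runnable sketch of Algorithm 3. The derivation forest is abstracted into a `sample_parse` callable that returns the parse produced by one random derivation (in a full system this would be Algorithm 2 plus reading off the parse); the toy sampler below is purely illustrative.

```python
import math
import random
from collections import Counter

def estimate_parse_probabilities(sample_parse, sm):
    """Algorithm 3 sketch: draw N = ceil(1/(4*sm^2)) random derivations and
    estimate each parse's conditional probability as its relative frequency."""
    n = math.ceil(1 / (4 * sm ** 2))
    freq = Counter(sample_parse() for _ in range(n))
    return {parse: count / n for parse, count in freq.items()}, n

# Illustrative stand-in: two competing parses with made-up probabilities 0.7 / 0.3.
def toy_sampler():
    return random.choices(["parse_A", "parse_B"], weights=[0.7, 0.3])[0]

estimates, n = estimate_parse_probabilities(toy_sampler, sm=0.05)
print(n)          # 100 samples for a standard error threshold of 0.05
print(estimates)  # roughly {'parse_A': ~0.7, 'parse_B': ~0.3}
```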

Page 33

Complexity of Algorithm 3

Assumes a value for the maximum allowed standard error
Samples a number of derivations that is guaranteed to achieve that error
The number of required samples grows quadratically in the inverse of the chosen error (N ≈ 1/(4*sm^2))

Page 34

Optimizations

Sima’an: the MPD in time linear in the STSG size
Bod: the MPP estimated on a small random sample of subtrees
Sekine and Grishman: use only subtrees rooted in S or NP
Goodman: a different polynomial-time method

Page 35

Experimental Properties of DOP1

Experiments on the ATIS corpus:
MPP vs. MPD
Impact of fragment size
Impact of fragment lexicalization
Impact of fragment frequency

Experiments on SRI-ATIS and OVIS:
Impact of subtree depth

Page 36

Experiments on ATIS corpus

ATIS = Air Travel Information System

750 annotated sentence analyses

Annotations taken from the Penn Treebank

Purpose: compare the accuracy obtained with undiluted DOP1 to that obtained with restricted STSGs

Page 37

Experiments on ATIS corpus

Divide into training and test sets: 90% = 675 sentences for training, 10% = 75 for testing

Convert training set into fragments and enrich with probabilities

Test set sentences parsed with sub trees from the training set

MPP was estimated from 100 sampled derivations

Parse accuracy = % of MPP that are identical to test set parses

Page 38

Results

On 10 random training / test splits of ATIS:

Average parse accuracy = 84.2%

Standard deviation = 2.9 %

Page 39

Impact of overlapping fragments: MPP vs. MPD

Can the MPD achieve parse accuracies similar to the MPP?
Can the MPD do better than the MPP? The role of overlapping fragments
Accuracy of the MPD on the test set: 69%
Compared with the accuracy achieved by the MPP on the test set: 69% vs. 85%
Conclusion: overlapping fragments play an important role in predicting the appropriate analysis of a sentence

Page 40

The impact of fragment size

Large fragments capture more lexical/syntactic dependencies than small ones.

The experiment:
Use DOP1 with a restricted maximum subtree depth
With max depth 1, DOP1 reduces to an SCFG
Compute the accuracies for both MPD and MPP at each maximum depth

Page 41

Impact of fragment size

Page 42

Impact of fragment lexicalization

Lexicalized fragments: more words -> more lexical dependencies

Experiment:
Use versions of DOP1 that restrict the maximum number of words per fragment
Check the accuracy for MPP and MPD

Page 43

Impact of fragment lexicalization

Page 44

Impact of fragment frequency

Frequent fragments contribute more
Large fragments are less frequent than small ones but might contribute more
Experiment:
Restrict fragments to a minimum number of occurrences
No other restrictions
Check the accuracy for the MPP

Page 45

Impact of fragment frequency

Page 46

Experiments on SRI-ATIS and OVIS

Employ the MPD because the corpora are bigger
Tests performed on DOP1 and SDOP
Use a set of heuristic criteria for selecting the fragments: constraints on the form of the subtrees (see the sketch below):
d – upper bound on the depth
n – upper bound on the number of substitution sites
l – upper bound on the number of terminals
L – upper bound on the number of consecutive terminals
Apply the constraints to all subtrees except those of depth 1
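A sketch of such a fragment filter over the (label, children) trees used in the earlier sketches; the bound names follow the slide (d, n, l, L), and everything else (depth convention, helper names) is an illustrative assumption.

```python
def frontier(node):
    """Frontier symbols of a fragment, tagged as ('term', word) or ('site', label)."""
    label, children = node
    if not children:
        return [("site", label)]
    out = []
    for c in children:
        out.extend([("term", c)] if isinstance(c, str) else frontier(c))
    return out

def depth(node):
    """Longest root-to-leaf path in edges; a fragment covering one CFG rule has depth 1."""
    label, children = node
    if not children:
        return 0
    return 1 + max(0 if isinstance(c, str) else depth(c) for c in children)

def max_consecutive_terminals(front):
    best = run = 0
    for kind, _ in front:
        run = run + 1 if kind == "term" else 0
        best = max(best, run)
    return best

def keep_fragment(t, d=4, n=2, l=7, L=3):
    """Selection criteria from the slide: bounds on depth (d), substitution sites (n),
    terminals (l) and consecutive terminals (L); depth-1 fragments are always kept."""
    if depth(t) <= 1:
        return True
    front = frontier(t)
    sites = sum(1 for kind, _ in front if kind == "site")
    terms = sum(1 for kind, _ in front if kind == "term")
    return (depth(t) <= d and sites <= n and terms <= l
            and max_consecutive_terminals(front) <= L)

fragment = ("VP", [("V", ["saw"]), ("NP", [])])   # depth 2, 1 site, 1 terminal
print(keep_fragment(fragment))                    # True
```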

Page 47

Experiments on SRI-ATIS and OVIS

Parameter settings: d <= 4, n <= 2, l <= 7, L <= 3

DOP(i) – DOP with the subtree depth bounded by i

Evaluation metrics:
Recognized
Tree Language Coverage (TLC)
Exact match
Labeled bracketing recall and precision
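A sketch of two of these metrics, exact match and labeled bracketing precision/recall, over trees in the same (label, children) encoding; how constituent spans are counted here is an assumption for illustration, not the exact evaluation used in the experiments.

```python
def labelled_spans(node, start=0):
    """Set of (label, start, end) constituents, with positions counted in words."""
    label, children = node
    spans, pos = set(), start
    for c in children:
        if isinstance(c, str):
            pos += 1                                      # a terminal advances one word
        else:
            child_spans = labelled_spans(c, pos)
            spans |= child_spans
            pos = max(end for _, _, end in child_spans)   # right edge of the child
    spans.add((label, start, pos))
    return spans

def exact_match(gold, test):
    return gold == test

def bracketing_scores(gold, test):
    """Labeled bracketing precision and recall between two trees."""
    g, t = labelled_spans(gold), labelled_spans(test)
    correct = len(g & t)
    return correct / len(t), correct / len(g)             # (precision, recall)

gold = ("S", [("NP", ["she"]), ("VP", [("V", ["saw"]), ("NP", ["the", "dress"])])])
test = ("S", [("NP", ["she"]), ("VP", [("V", ["saw"]), ("NP", ["the", "dress"])])])
print(exact_match(gold, test), bracketing_scores(gold, test))   # True (1.0, 1.0)
```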

Page 48

Experiments on SRI-ATIS

13,335 syntactically annotated utterances
Annotation scheme originated from the Core Language Engine system
Fixed parameters except the subtree depth bound: n <= 2, l <= 4, L <= 3
Training set – 12,335 trees; test set – 1,000 trees
Experiment: train and test with different upper bounds on depth (takes more than 10 days for DOP(4)!)

Page 49

Impact of subtree depth – SRI-ATIS

Page 50

Experiments on OVIS corpus

10,000 syntactically and semantically annotated trees
Both annotations are treated as one, which gives more non-terminal symbols
Utterances are answers to questions in a dialogue -> short utterances (average length 3.43 words)
Sima’an’s results – sentences with at least 2 words, average length 4.57 words
Parameter settings: n <= 2, l <= 7, L <= 3

Page 51

Experiments on OVIS corpus

Experiment:

Check different subtree depth upper bounds: 1, 3, 4, 5
Training set with 9,000 trees; test set with 1,000 trees

Page 52

Impact of sub tree depth - OVIS

Page 53

Summary of results

ATIS:
Parse accuracy is 85%
Overlapping fragments have an impact on accuracy
Accuracy increases as fragment depth increases, both for MPP and MPD
The optimal lexicalization maximum for ATIS is 8 words per fragment
Accuracy decreases if the lower bound on fragment frequency increases (for the MPP)

Page 54

Summary of results

SRI-ATIS:
The availability of more data is more crucial to the accuracy of the MPD
Depth has an impact
Accuracy improves when using memory-based parsing (DOP(2)) rather than an SCFG (DOP(1))

Page 55

Summary of results

OVIS:
Recognition power is not affected by depth
No big difference in exact match (means and standard deviations) between DOP1(1) and DOP1(4)

Page 56

DOP: probabilistic recursive MBL

Relationship between present DOP framework and Memory Based Learning framework

DOP extends MBL to deal with disambiguation

MBL vs. DOP: flat or intermediate descriptions vs. hierarchical ones

Page 57

Case Based Reasoning - CBR

Case-based learning: lazy learning that does not generalize at training time (lazy generalization)
Classify by means of a similarity function
We refer to this paradigm as MBL
CBR vs. other variants of MBL differ in:
Task concept
Similarity function
Learning task

Page 58

The DOP framework and the CBR method

A formalism for representing utterance-analyses – the case description language
An extraction function – retrieval of units
Combination operations – reuse and revision
Missing in DOP: a similarity function
Extending CBR: a probability model

A DOP model defines a CBR system for natural language analysis

Page 59

DOP1 and CBR methods

DOP1 as an extension of a CBR system:
<string, tree> = a classified instance
Retrieve subtrees and construct a tree
Sentence = instance; tree = class
Set of sentences = instance space; set of trees = class space
Frontier, SSF, <str, st> pairs
An infinite runtime case base containing instance-class-weight triples: <SSF, subtree, probability>
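The runtime case base described above can be pictured as a collection of such triples; a minimal illustrative rendering, with assumed field names and a made-up weight:

```python
from typing import NamedTuple

class Case(NamedTuple):
    ssf: tuple          # the instance description (the SSF of the slide)
    subtree: tuple      # the class: the full subtree, here as a (label, children) pair
    probability: float  # the weight attached to this instance-class pair

# A made-up entry for the fragment [VP [V saw] NP]:
case = Case(ssf=("saw", "NP"),
            subtree=("VP", [("V", ["saw"]), ("NP", [])]),
            probability=0.25)   # illustrative weight, not a corpus value
```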

Page 60

DOP1 and CBR methods

Task and similarity function:
Task = disambiguation
Similarity function:
Parsing -> a recursive string-matching procedure
Ambiguity -> computing the probabilities and selecting the highest

Conclusion: DOP1 is a lazy, probabilistic, recursive CBR classifier

Page 61

DOP vs. other MBL approaches in NLP

K-NN vs. DOP
Memory Based Sequence Learning (MBSL) vs. DOP:
DOP – a stochastic model for computing probabilities
MBSL – ad hoc heuristics for computing scores
DOP – a globally based ranking strategy over alternative analyses
MBSL – a locally based one
Different generalization power

Page 62

Conclusions

Memory-based aspects of the DOP model
Disambiguation
Probabilities to account for frequencies
DOP as a probabilistic, recursive memory-based model
DOP1 – properties, computational aspects and experiments
DOP and MBL – differences