
Working with Discourse Representation Theory

Patrick Blackburn & Johan Bos

Lecture 5: Applying DRT

Today

Given what we know about DRT, both from a theoretical and practical perspective, can we use it for practical applications?

Outline

Spoken dialogue system with DRT: using DRT and inference to control a mobile robot
Wide-coverage parsing with DRT: Recognising Textual Entailment

Human-Computer Dialogue

Focus on small domains
Grammatical coverage ensured
Background knowledge encoding

Spoken dialogue system: Godot the robot
Speech recognition and synthesis
People could give Godot directions, ask it questions, tell it new information
Godot was a REAL robot

Godot the Robot

Godot with Tetsushi Oka

Typical conversation with Godot

Simple dialogues in the beginning:
Human: Robot?
Robot: Yes?
Human: Where are you?
Robot: I am in the hallway.
Human: OK. Go to the rest room!

Adding DRT to the robot

The language model that the robot used for speech recognition returned DRSs

We used the model builder MACE and the theorem prover SPASS for inference

The model produced by MACE was used to find out what the robot should do

This was possible as MACE produces minimal models

Of course we also checked for consistency and informativity
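A minimal sketch of these two checks (our own rendering, not the robot's actual code): provable/1 is a hypothetical wrapper around a theorem prover such as SPASS, and imp/2 and not/1 are illustrative constructors for first-order formulas.

% NewInfo is consistent with the context if Context -> ~NewInfo is
% not provable, and informative if Context -> NewInfo is not provable.
% provable/1 is a hypothetical prover wrapper (e.g. around SPASS).
consistent(Context, NewInfo) :-
    \+ provable(imp(Context, not(NewInfo))).

informative(Context, NewInfo) :-
    \+ provable(imp(Context, NewInfo)).

% An utterance is accepted only if it passes both checks.
acceptUtterance(Context, NewInfo) :-
    consistent(Context, NewInfo),
    informative(Context, NewInfo).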

Advanced conversation with Godot

Dealing with inconsistency and informativity:
Human: Robot, where are you?
Robot: I am in the hallway.
Human: You are in my office.
Robot: No, that is not true.
Human: You are in the hallway.
Robot: Yes, I know!

Videos of Godot

Video 1: Godot in the basement of Buccleuch Place

Video 2: Screenshot of dialogue manager with DRSs and camera view of Godot

Minimal Models

Model builders normally generate models by iteration over the domain size

As a side-effect, the output is a model with a minimal domain size

From a linguistic point of view, this is interesting, as there is no redundant information
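A sketch of that iteration, assuming a hypothetical buildModel/3 wrapper around MACE that succeeds only if a model of the given domain size exists:

% Try domain sizes 1,2,3,... and stop at the first success, so the
% model found has a minimal domain. The upper bound is a practical
% cut-off, not part of the method.
minimalModel(Formula, Model) :-
    between(1, 20, Size),
    buildModel(Formula, Size, Model),
    !.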

Using models

Examples:

Turn on a light.
Turn on every light.
Turn on everything except the radio.
Turn off the red light or the blue light.
Turn on another light.

Adding presupposition

Godot was connected to an automated home environment

One day, I asked Godot to switch on all the lights. However, Godot refused to do this, responding that it was unable to do so.

Why was that? At first I thought that the theorem prover made a mistake. But it turned out that one of the lights was already on.

Intermediate Accommodation

Because I had coded "switch on X" with the precondition that X is not on, the theorem prover found a proof of inconsistency.

Coding this as a presupposition would not give an inconsistency, but a beautiful case of intermediate accommodation.

In other words, "Switch on all the lights!" is read as "Switch on all the lights that are currently off", not as "All lights are off; switch them on."

Sketch of resolution

The presupposition off(y), triggered by "switch on", still has to be resolved. In linear notation ([referents | conditions], with => for DRS implication), it is shown in the box where it was triggered:

[x | robot(x), [y | light(y), off(y)] => [e | switch(e), agent(e,x), theme(e,y)]]

Global Accommodation

The presupposed condition is accommodated in the outermost box:

[x | robot(x), off(y), [y | light(y)] => [e | switch(e), agent(e,x), theme(e,y)]]

Intermediate Accommodation

The presupposed condition is accommodated in the antecedent of the implication:

[x | robot(x), [y | light(y), off(y)] => [e | switch(e), agent(e,x), theme(e,y)]]

Local Accommodation

The presupposed condition is accommodated in the consequent:

[x | robot(x), [y | light(y)] => [e | switch(e), agent(e,x), theme(e,y), off(y)]]

Outline

Spoken dialogue system with DRT: using DRT and inference to control a mobile robot
Wide-coverage parsing with DRT: Recognising Textual Entailment

Wide-coverage DRT

Nowadays we have robust wide-coverage parsers that use stochastic methods for producing a parse tree

Trained on the Penn Treebank
Examples are parsers like those of Collins and Charniak

Wide-coverage parsers

Say we wished to produce DRSs on the output of these parsers

We would need quite detailed syntax derivations

Closer inspection reveals that many of these parsers use several thousand phrase structure rules

Often, long distance dependencies are not recovered

Combinatory Categorial Grammar

CCG is a lexicalised theory of grammar (Steedman 2001)

Deals with complex cases of coordination and long-distance dependencies

Lexicalised, hence easy to implement
English wide-coverage grammar
Fast robust parser available

Categorial Grammar

Lexicalised theory of syntax
Many different lexical categories
Few grammar rules

Finite set of categories defined over a base of core categories
Core categories: s, np, n, pp
Combined categories: np/n, s\np, (s\np)/np

CCG: type-driven lexicalised grammar

Category Name Examples

N noun Ralph, car

NP noun phrase Everyone

NP/N determiner a, the, every

S\NP intrans. verb walks, smiles

(S\NP)/NP transitive verb loves, hates

(S\NP)\(S\NP) adverb quickly

CCG: combinatorial rules

Forward Application (FA)
Backward Application (BA)
Generalised Forward Composition (FC)
Backward Crossed Composition (BC)
Type Raising (TR)
Coordination
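To see how little machinery the application rules need, here is a sketch with categories encoded as Prolog terms; fs/2 and bs/2 are our own encoding of '/' and '\':

% fs(X,Y) encodes X/Y, bs(X,Y) encodes X\Y; s, np, n are atoms.
combine(fs(X, Y), Y, X).    % Forward Application:  X/Y  Y  =>  X
combine(Y, bs(X, Y), X).    % Backward Application: Y  X\Y  =>  X

% Deriving "a spokesman lied":
% ?- combine(fs(np, n), n, NP), combine(NP, bs(s, np), S).
% NP = np, S = s.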

CCG derivation

NP/N:a   N:spokesman   S\NP:lied
-------------------------------- (FA)
       NP: a spokesman
----------------------------------------- (BA)
       S: a spokesman lied

Coordination in CCG

np:Artie (s\np)/np:likes (x\x)/x:and np:Tony (s\np)/np:hates np:beans

---------------- (TR) ---------------- (TR)

s/(s\np):Artie s/(s\np):Tony

------------------------------------ (FC) --------------------------------------- (FC)

s/np: Artie likes s/np:Tony hates

------------------------------------------------------- (FA)

(s/np)\(s/np):and Tony hates

--------------------------------------------------------------------------------- (BA)

s/np: Artie likes and Tony hates

------------------------------------------------------ (FA)

s: Artie likes and Tony hates beans

The Glue

Use the Lambda Calculus to combine CCG with DRT

Each lexical entry gets a DRS with lambda-bound variables, representing the “missing” information

Each combinatorial rule in CCG gets a semantic interpretation, again using the tools of the lambda calculus
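A minimal sketch of the glue (our simplified rendering of the course's Prolog tools): lam/2 and app/2 encode abstraction and application (written @ on the slides), drs/2 and merge/2 encode boxes and the merge ';'. It assumes lambda-bound variables are Prolog variables and each abstraction is applied at most once, so substitution is plain unification; the real implementation does proper alpha-conversion.

% betaConvert(+Term, -Reduced): simplified beta conversion.
betaConvert(X, X) :-
    var(X), !.
betaConvert(app(Fun, Arg), Result) :- !,
    betaConvert(Fun, Reduced),
    (   Reduced = lam(X, Body)
    ->  X = Arg,                  % substitute by unification
        betaConvert(Body, Result)
    ;   Result = app(Reduced, Arg)
    ).
betaConvert(merge(B1, B2), drs(Refs, Conds)) :- !,
    betaConvert(B1, drs(R1, C1)),
    betaConvert(B2, drs(R2, C2)),
    append(R1, R2, Refs),         % merge: union the referents...
    append(C1, C2, Conds).        % ...and the conditions
betaConvert(Term, Term).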

Interpreting Combinatorial Rules

Each combinatorial rule in CCG is expressed in terms of the lambda calculus:

Forward Application: FA(α,β) = α@β
Backward Application: BA(α,β) = β@α
Type Raising: TR(α) = λx.x@α
Function Composition: FC(α,β) = λx.α@(β@x)
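In the same term notation, the semantic side of these rules is a handful of one-liners (a sketch; fa/3, ba/3, tr/2, fc/3 are our own names):

fa(Alpha, Beta, app(Alpha, Beta)).                  % FA(α,β) = α@β
ba(Alpha, Beta, app(Beta, Alpha)).                  % BA(α,β) = β@α
tr(Alpha, lam(X, app(X, Alpha))).                   % TR(α) = λx.x@α
fc(Alpha, Beta, lam(X, app(Alpha, app(Beta, X)))).  % FC(α,β) = λx.α@(β@x)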

CCG: lexical semantics

Category   Semantics                                Example
N          λx.[ | spokesman(x)]                     spokesman
NP/N       λp.λq.(([x | ] ; p@x) ; q@x)             a
S\NP       λx.(x @ λy.[e | lie(e), agent(e,y)])     lied

CCG derivation (with semantics)

NP/N: a          λp.λq.(([x | ] ; p@x) ; q@x)
N: spokesman     λz.[ | spokesman(z)]
S\NP: lied       λx.(x @ λy.[e | lie(e), agent(e,y)])

Forward Application gives NP: a spokesman

  λp.λq.(([x | ] ; p@x) ; q@x) @ λz.[ | spokesman(z)]
  = λq.(([x | ] ; [ | spokesman(x)]) ; q@x)
  = λq.([x | spokesman(x)] ; q@x)

Backward Application gives S: a spokesman lied

  λx.(x @ λy.[e | lie(e), agent(e,y)]) @ λq.([x | spokesman(x)] ; q@x)
  = λq.([x | spokesman(x)] ; q@x) @ λy.[e | lie(e), agent(e,y)]
  = [x | spokesman(x)] ; [e | lie(e), agent(e,x)]
  = [x, e | spokesman(x), lie(e), agent(e,x)]
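Putting the pieces together: a runnable sketch of the derivation above, using the betaConvert/2 glue sketched earlier (lexicon/2 and spokesmanLied/1 are our own illustrative names; the entries follow the lexical semantics table):

% Lexical semantics for "a", "spokesman", "lied".
lexicon(a, lam(P, lam(Q, merge(merge(drs([X], []), app(P, X)),
                               app(Q, X))))).
lexicon(spokesman, lam(Z, drs([], [spokesman(Z)]))).
lexicon(lied, lam(NP, app(NP, lam(Y, drs([E], [lie(E), agent(E, Y)]))))).

% FA builds the NP, BA applies the verb to it.
spokesmanLied(Drs) :-
    lexicon(a, Det),
    lexicon(spokesman, Noun),
    lexicon(lied, Verb),
    betaConvert(app(Verb, app(Det, Noun)), Drs).

% ?- spokesmanLied(Drs).
% Drs = drs([X, E], [spokesman(X), lie(E), agent(E, X)]).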

The Clark & Curran Parser

Use standard statistical techniques
Robust wide-coverage parser: Clark & Curran (ACL 2004)
Grammar derived from CCGbank: 409 different categories (Hockenmaier & Steedman, ACL 2002)
Results: 96% coverage on the WSJ (Bos et al., COLING 2004)

Applications

Has been used for different kinds of applications:
Question Answering
Recognising Textual Entailment

Recognising Textual Entailment

A task for NLP systems to recognise entailment between two (short) texts

Introduced in 2004/2005 as part of the PASCAL Network of Excellence

Proved to be a difficult, but popular task
PASCAL provided a development and test set of several hundred examples

RTE Example (entailment)

RTE 1977 (TRUE)

His family has steadfastly denied the charges.

-----------------------------------------------------

The charges were denied by his family.

RTE Example (no entailment)

RTE 2030 (FALSE)

Lyon is actually the gastronomical capital of France.

-----------------------------------------------------

Lyon is the capital of France.

Aristotle’s Syllogisms

All men are mortal.

Socrates is a man.

-------------------------------

Socrates is mortal.

ARISTOTLE 1 (TRUE)

How to deal with RTE

There are several methods. We will look at five of them to see how difficult RTE actually is.

Recognising Textual Entailment

Method 1: Flipping a coin

Flipping a coin

Advantages: easy to implement
Disadvantages: just 50% accuracy

Recognising Textual Entailment

Method 2: Calling a friend

Calling a friend

Advantages: high accuracy (95%)
Disadvantages: lose friends, high phone bill

Recognising Textual Entailment

Method 3: Ask the audience

Ask the audience

RTE 893 (????)

The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
----------------------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.

Human Upper Bound

RTE 893 (TRUE)

The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
----------------------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.

Recognising Textual Entailment

Method 4: Word Overlap

Word Overlap Approaches

Popular approach
Ranging in sophistication from a simple bag of words to the use of WordNet
Accuracy rates ca. 55%
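A sketch of the simplest variant, plain bag-of-words overlap (tokenisation, stemming and the WordNet refinements are left out):

% Proportion of hypothesis tokens that also occur in the text;
% predict entailment when the score exceeds some tuned threshold.
overlap(TextTokens, HypTokens, Score) :-
    findall(W,
            ( member(W, HypTokens), memberchk(W, TextTokens) ),
            Shared),
    length(Shared, S),
    length(HypTokens, N),
    N > 0,
    Score is S / N.

% The Lyon pair shows why this fails: every hypothesis token occurs
% in the text, so the score is maximal, yet there is no entailment.
% ?- overlap([lyon,is,actually,the,gastronomical,capital,of,france],
%            [lyon,is,the,capital,of,france], Score).
% Score = 1.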

Word Overlap

Advantages: relatively straightforward algorithm
Disadvantages: hardly better than flipping a coin

RTE State-of-the-Art

Pascal RTE challenge
Hard problem
Requires semantics

[Chart: "Accuracy RTE 2004/5 (n=25)" — number of systems (0 to 10) per accuracy bin, from 0.49-0.50 up to 0.59-0.60]

Recognising Textual Entailment

Method 5: Using DRT

Inference

How do we perform inference with DRSs?
Translate the DRS into first-order logic, use off-the-shelf inference engines.

What kind of inference engines?
Theorem provers
Model builders
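A sketch of the standard translation for the DRS fragment used here, with drs/2 boxes and not/1 and imp/2 conditions as in the earlier sketches (basic conditions pass through as atoms):

% Referents become existentials; the referents of an antecedent box
% become universals scoping over the implication.
drs2fol(drs([], [Cond]), Formula) :-
    cond2fol(Cond, Formula).
drs2fol(drs([], [Cond1, Cond2|Conds]), and(F1, F2)) :-
    cond2fol(Cond1, F1),
    drs2fol(drs([], [Cond2|Conds]), F2).
drs2fol(drs([X|Refs], Conds), some(X, Formula)) :-
    drs2fol(drs(Refs, Conds), Formula).

cond2fol(not(Drs), not(Formula)) :-
    drs2fol(Drs, Formula).
cond2fol(imp(drs([], Conds), Drs2), imp(F1, F2)) :-
    drs2fol(drs([], Conds), F1),
    drs2fol(Drs2, F2).
cond2fol(imp(drs([X|Refs], Conds), Drs2), all(X, Formula)) :-
    cond2fol(imp(drs(Refs, Conds), Drs2), Formula).
cond2fol(Basic, Basic).            % e.g. spokesman(X), agent(E,X)

% ?- drs2fol(drs([X,E], [spokesman(X), lie(E), agent(E,X)]), F).
% F = some(X, some(E, and(spokesman(X), and(lie(E), agent(E,X))))).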

Using Theorem Proving

Given a textual entailment pair T/H with text T and hypothesis H:
Produce DRSs for T and H
Translate these DRSs into FOL
Give this to the theorem prover: T′ → H′
If the theorem prover finds a proof, then T entails H
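As a sketch, with drs2fol/2 from above and reusing the hypothetical provable/1 prover wrapper (here calling Vampire rather than SPASS):

% T entails H if T' -> H' is a theorem.
entails(TextDrs, HypDrs) :-
    drs2fol(TextDrs, T),
    drs2fol(HypDrs, H),
    provable(imp(T, H)).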

Vampire (Riazanov & Voronkov 2002)

Let’s try this. We will use the theorem prover Vampire (currently the best known theorem prover for FOL)

This gives us good results for:
apposition
relative clauses
coordination
intersective adjectives/complements
passive/active alternations

Example (Vampire: proof)

On Friday evening, a car bomb exploded outside a Shiite mosque in Iskandariyah, 30 miles south of the capital.

-----------------------------------------------------

A bomb exploded outside a mosque.

RTE-2 112 (TRUE)

Example (Vampire: proof)

Initially, the Bundesbank opposed the introduction of the euro but was compelled to accept it in light of the political pressure of the capitalist politicians who supported its introduction.

-----------------------------------------------------

The introduction of the euro has been opposed.

RTE-2 489 (TRUE)

Background Knowledge

However, it doesn’t give us good results for cases requiring additional knowledge:
Lexical knowledge
World knowledge

We will use WordNet as a start to get additional knowledge
All of WordNet is too much, so we create MiniWordNets

MiniWordNets

Use hyponym relations from WordNet to build an ontology
Do this only for the relevant symbols
Convert the ontology into first-order axioms
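A sketch of the conversion step over illustrative hyponym/2 facts (the real MiniWordNets extract exactly the edges needed for the symbols in the T/H pair):

% Illustrative hyponym edges (WordNet-style).
hyponym(researcher, person).
hyponym(worker, person).
hyponym(smoker, user).
hyponym(user, person).

% Each edge becomes an inclusion axiom: all x (Hypo(x) -> Hyper(x)).
hyponymAxiom(all(X, imp(HypoAtom, HyperAtom))) :-
    hyponym(Hypo, Hyper),
    HypoAtom  =.. [Hypo, X],
    HyperAtom =.. [Hyper, X].

% ?- hyponymAxiom(A).
% A = all(X, imp(researcher(X), person(X))) ;
% A = all(X, imp(worker(X), person(X))) ; ...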

MiniWordNet: an example

Example text:

There is no asbestos in our products now. Neither Lorillard nor the researchers who studied the workers were aware of any research on smokers of the Kent cigarettes.


The derived axioms include:

∀x(user(x) → person(x))
∀x(worker(x) → person(x))
∀x(researcher(x) → person(x))
∀x(person(x) → ¬risk(x))
∀x(person(x) → ¬cigarette(x))
…

Using Background Knowledge

Given a textual entailment pair T/H with text T and hypothesis H:
Produce DRSs for T and H
Translate drs(T) and drs(H) into FOL
Create background knowledge for T & H
Give this to the theorem prover: (BK & T′) → H′

MiniWordNets at work

Background Knowledge: ∀x(soar(x) → rise(x))

Crude oil prices soared to record levels.

-----------------------------------------------------

Crude oil prices rise.

RTE 1952 (TRUE)

Troubles with theorem proving

Theorem provers are extremely precise.

They won’t tell you when there is “almost” a proof.

Even if there is a little background knowledge missing, Vampire will say: NO

Vampire: no proof

RTE 1049 (TRUE)

Four Venezuelan firefighters who were traveling to a training course in Texas were killed when their sport utility vehicle drifted onto the shoulder of a highway and struck a parked truck.

----------------------------------------------------------------

Four firefighters were killed in a car accident.

Using Model Building

Need a robust way of inference
Use the model builder Paradox (Claessen & Sorensson 2003)

Use the size of the (minimal) model
Compare the sizes of the models of T and T&H
If the difference is small, then it is likely that T entails H

Using Model Building

Given a textual entailment pair T/H with text T and hypothesis H:
Produce DRSs for T and H
Translate these DRSs into FOL
Generate background knowledge
Give this to the model builder:

i) BK & T′
ii) BK & T′ & H′

If the models for i) and ii) are similar in size, then we predict that T entails H
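A sketch of the comparison, assuming a hypothetical modelSize/2 wrapper that runs Paradox and returns the domain size of the model it finds:

% Predict entailment when adding H' barely grows the minimal model.
% The zero threshold mirrors the two examples below; Bos & Markert
% actually feed the raw difference to a classifier as a feature.
entailsByModelSize(BK, T, H) :-
    modelSize(and(BK, T), Size1),
    modelSize(and(and(BK, T), H), Size2),
    Diff is Size2 - Size1,
    Diff =< 0.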

Example 1

T: John met Mary in Rome
H: John met Mary

Model of T: 3 entities
Model of T+H: 3 entities

Model size difference: 0
Prediction: entailment

Example 2

T: John met Mary
H: John met Mary in Rome

Model of T: 2 entities
Model of T+H: 3 entities

Model size difference: 1
Prediction: no entailment

Model size differences

Of course this is a very rough approximation
But it turns out to be a useful one
Gives us a notion of robustness

Of course we need to deal with negation as well: give ¬T and ¬[T & H] to the model builder
Not necessarily one unique minimal model

Lack of Background Knowledge

RTE-2 235 (TRUE)

Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.

---------------------------------------------------------------

There is a territorial waters dispute.

How well does this work?

We tried this at the RTE 2004/05
Combined this with a shallow approach (word overlap)
Using standard machine learning methods to build a decision tree

Features used:
Proof (yes/no)
Model size
Model size difference
Word overlap
Task (source of RTE pair)

RTE Results 2004/5

Method         Accuracy   CWS
Shallow        0.569      0.624
Deep           0.562      0.608
Hybrid (S+D)   0.577      0.632
Hybrid+Task    0.612      0.646

(Bos & Markert 2005)

Conclusions

We have the tools for doing computational semantics in a principled way using DRT

For many applications, success depends on the ability to systematically generate background knowledge:
Small restricted domains [dialogue]
Open domain

What we did in this course

We introduced DRT, a notational variant of first-order logic.

Semantically, we can handle in DRT anything we can in FOL, including events.

Moreover, because it is so close to FOL, we can use first-order methods to implement inference for DRT.

The DRT box syntax is essentially about nesting contexts, which allows a uniform treatment of anaphoric phenomena.

Moreover, this works not only on the theoretical level, but is also implementable, and even applicable.

What we hope you got out of it

First, we hope we made you aware that nowadays computational semantics is able to handle some difficult problems.

Second, we hope we made you aware that DRT is not just a theory. It is a complete architecture allowing us to experiment with computational semantics.

Third, we hope you are aware that state-of-the-art inference engines can help to study or apply semantics.

Where you can find more

For more on DRT read the standard textbook devoted to DRT by Kamp and Reyle. This book discusses not only the basic theory, but also plurals, tense, and aspect.

Where you can find more

For more on the basic architecture underlying this work on computational semantics, and in particular on implementations of the lambda calculus and the parallel use of theorem provers and model builders, see:

www.blackburnbos.org

Where you can find more

All of the theory we discussed in this course is implemented in Prolog. This software can be downloaded from www.blackburnbos.org. For an introduction to Prolog written very much with this software in mind, try Learn Prolog Now!

www.learnprolognow.org