Working with Discourse Representation Theory Patrick Blackburn & Johan Bos Lecture 5 Applying DRT
Working with Discourse Representation Theory
Patrick Blackburn & Johan Bos
Lecture 5: Applying DRT
Today
Given what we know about DRT, both from a theoretical and practical perspective, can we use it for practical applications?
Outline
- Spoken dialogue system with DRT: using DRT and inference to control a mobile robot
- Wide-coverage parsing with DRT: Recognising Textual Entailment
Human-Computer Dialogue
- Focus on small domains
- Grammatical coverage ensured
- Background knowledge encoding
- Spoken dialogue system: Godot the robot
- Speech recognition and synthesis
- People could give Godot directions, ask it questions, tell it new information
- Godot was a REAL robot
Godot the Robot
Godot with Tetsushi Oka
Typical conversation with Godot
Simple dialogues in the beginning:

Human: Robot?
Robot: Yes?
Human: Where are you?
Robot: I am in the hallway.
Human: OK. Go to the rest room!
Adding DRT to the robot
The language model that the robot used for speech recognition returned DRSs
We used the model builder MACE and the theorem prover SPASS for inference
The model produced by MACE was used to find out what the robot should do
This was possible as MACE produces minimal models
Of course we also checked for consistency and informativity
Advanced conversation with Godot
Dealing with inconsistency and informativity:

Human: Robot, where are you?
Robot: I am in the hallway.
Human: You are in my office.
Robot: No, that is not true.
Human: You are in the hallway.
Robot: Yes, I know!
Videos of Godot
Video 1: Godot in the basement of Buccleuch Place
Video 2: Screenshot of dialogue manager with DRSs and camera view of Godot
Minimal Models
Model builders normally generate models by iteration over the domain size
As a side-effect, the output is a model with a minimal domain size
From a linguistic point of view, this is interesting, as there is no redundant information
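The iterate-over-domain-size strategy can be sketched in a few lines of Python. This is a toy illustration, not MACE's actual interface: all names here (`build_minimal_model`, `subsets`, the constraint functions) are our own, and only unary predicates over small finite domains are handled.

```python
from itertools import combinations, product

def subsets(xs):
    """All subsets of xs, as frozensets."""
    for r in range(len(xs) + 1):
        for combo in combinations(xs, r):
            yield frozenset(combo)

def build_minimal_model(predicates, constraints, max_size=4):
    """Try domain sizes 1, 2, ... and return the first model found.

    Because smaller domains are tried first, the result has a minimal
    domain size -- the side-effect the lecture relies on.
    """
    for size in range(1, max_size + 1):
        domain = list(range(size))
        exts = list(subsets(domain))
        # try every assignment of an extension to each predicate
        for choice in product(exts, repeat=len(predicates)):
            interp = dict(zip(predicates, choice))
            if all(holds(domain, interp) for holds in constraints):
                return domain, interp
    return None

# "There is a light that is on" -- satisfiable with a single entity,
# so the minimal model has domain size 1.
constraints = [lambda dom, i: len(i["light"] & i["on"]) >= 1]
model = build_minimal_model(["light", "on"], constraints)
```

The first model returned interprets both `light` and `on` as {0} over a one-element domain: no redundant entities, exactly the "no redundant information" property exploited below.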
Using models
Examples:
Turn on a light.
Turn on every light.
Turn on everything except the radio.
Turn off the red light or the blue light.
Turn on another light.
Adding presupposition
Godot was connected to an automated home environment
One day, I asked Godot to switch on all the lights. However, Godot refused, responding that it was unable to do so.
Why was that? At first I thought that the theorem prover had made a mistake. But it turned out that one of the lights was already on.
Intermediate Accommodation
Because I had coded "switch on X" with the precondition that X is not on, the theorem prover found a proof.
Coding this as a presupposition would not give an inconsistency, but a beautiful case of intermediate accommodation.
In other words: Switch on all the lights!
[All lights are off; switch them on.]
[= Switch on all the lights that are currently off]
Sketch of resolution
[x | Robot(x), ([y | Light(y), Off(y)] ⇒ [e | switch(e), Agent(e,x), Theme(e,y)])]
Global Accommodation
[x | Robot(x), Off(y), ([y | Light(y)] ⇒ [e | switch(e), Agent(e,x), Theme(e,y)])]
Intermediate Accommodation
[x | Robot(x), ([y | Light(y), Off(y)] ⇒ [e | switch(e), Agent(e,x), Theme(e,y)])]
Local Accommodation
[x | Robot(x), ([y | Light(y)] ⇒ [e | switch(e), Agent(e,x), Theme(e,y), Off(y)])]
Outline
- Spoken dialogue system with DRT: using DRT and inference to control a mobile robot
- Wide-coverage parsing with DRT: Recognising Textual Entailment
Wide-coverage DRT
Nowadays we have robust wide-coverage parsers that use stochastic methods for producing a parse tree
Trained on the Penn Treebank
Examples are the parsers of Collins and Charniak
Wide-coverage parsers
Say we wished to produce DRSs on the output of these parsers
We would need quite detailed syntax derivations
Closer inspection reveals that many of these parsers use thousands of phrase-structure rules
Often, long distance dependencies are not recovered
Combinatory Categorial Grammar
CCG is a lexicalised theory of grammar (Steedman 2001)
Deals with complex cases of coordination and long-distance dependencies
- Lexicalised, hence easy to implement
- English wide-coverage grammar
- Fast robust parser available
Categorial Grammar
- Lexicalised theory of syntax
- Many different lexical categories
- Few grammar rules

Finite set of categories defined over a base of core categories
Core categories: s, np, n, pp
Combined categories: np/n, s\np, (s\np)/np
CCG: type-driven lexicalised grammar
Category       Name               Examples
N              noun               Ralph, car
NP             noun phrase        everyone
NP/N           determiner         a, the, every
S\NP           intransitive verb  walks, smiles
(S\NP)/NP      transitive verb    loves, hates
(S\NP)\(S\NP)  adverb             quickly
CCG: combinatorial rules
- Forward Application (FA)
- Backward Application (BA)
- Generalised Forward Composition (FC)
- Backward Crossed Composition (BC)
- Type Raising (TR)
- Coordination
CCG derivation

NP/N:a   N:spokesman   S\NP:lied
------------------------- (FA)
    NP: a spokesman
---------------------------------------- (BA)
    S: a spokesman lied
Coordination in CCG
np:Artie   (s\np)/np:likes   (x\x)/x:and   np:Tony   (s\np)/np:hates   np:beans
---------------- (TR)                      ---------------- (TR)
 s/(s\np):Artie                             s/(s\np):Tony
------------------------- (FC)              ------------------------- (FC)
 s/np: Artie likes                          s/np: Tony hates
                           ------------------------------------------ (FA)
                           (s/np)\(s/np): and Tony hates
--------------------------------------------------------------------- (BA)
 s/np: Artie likes and Tony hates
--------------------------------------------------------------------------- (FA)
 s: Artie likes and Tony hates beans
The Glue
Use the Lambda Calculus to combine CCG with DRT
Each lexical entry gets a DRS with lambda-bound variables, representing the “missing” information
Each combinatorial rule in CCG gets a semantic interpretation, again using the tools of the lambda calculus
Interpreting Combinatorial Rules
Each combinatorial rule in CCG is expressed in terms of the lambda calculus:

Forward Application:   FA(α, β) = α@β
Backward Application:  BA(α, β) = β@α
Type Raising:          TR(α) = λx. x@α
Function Composition:  FC(α, β) = λx. α@(β@x)
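Read as lambda-calculus clauses, the combinatory rules can be written directly as Python functions, with application f@a rendered as the ordinary call f(a). The function names are ours; this is only an illustration of the rule semantics.

```python
# The CCG combinatory rules as plain Python higher-order functions.
def fa(fn, arg):        # Forward Application:  FA(alpha, beta) = alpha@beta
    return fn(arg)

def ba(arg, fn):        # Backward Application: BA(alpha, beta) = beta@alpha
    return fn(arg)

def tr(arg):            # Type Raising: TR(alpha) = \x. x@alpha
    return lambda fn: fn(arg)

def fc(f, g):           # Function Composition: FC(alpha, beta) = \x. alpha@(beta@x)
    return lambda x: f(g(x))
```

With simple arithmetic functions standing in for categories, `fc(lambda x: x + 1, lambda x: x * 2)(3)` composes as expected, and `tr(3)` turns the argument 3 into a function that applies its own argument to 3.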
CCG: lexical semantics
Category   Semantics                                  Example
N          λx. [ | spokesman(x)]                      spokesman
NP/N       λp. λq. (([x | ] ; p@x) ; q@x)             a
S\NP       λX. (X @ λy. [e | lie(e), agent(e,y)])     lied
CCG derivation

NP/N:a                           N:spokesman             S\NP:lied
λp. λq. (([x | ] ; p@x) ; q@x)   λz. [ | spokesman(z)]   λX. (X @ λy. [e | lie(e), agent(e,y)])

Step 1 (FA), "a" applied to "spokesman":

  λp. λq. (([x | ] ; p@x) ; q@x) @ λz. [ | spokesman(z)]
  = λq. (([x | ] ; [ | spokesman(x)]) ; q@x)
  = λq. ([x | spokesman(x)] ; q@x)

  NP: a spokesman

Step 2 (BA), "lied" applied to "a spokesman":

  λX. (X @ λy. [e | lie(e), agent(e,y)]) @ λq. ([x | spokesman(x)] ; q@x)
  = λq. ([x | spokesman(x)] ; q@x) @ λy. [e | lie(e), agent(e,y)]
  = [x | spokesman(x)] ; [e | lie(e), agent(e,x)]
  = [x, e | spokesman(x), lie(e), agent(e,x)]

  S: a spokesman lied
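The derivation for "a spokesman lied" can be replayed with a minimal Python sketch: a DRS as a record of referents and conditions, `merge` for the ';' operator, and ordinary Python lambdas for the λ-DRSs. The representation and names are our own, not those of any released DRT system.

```python
from dataclasses import dataclass

@dataclass
class DRS:
    refs: tuple = ()
    conds: tuple = ()
    def merge(self, other):
        """DRS merge, written ';' in the lecture: union referents and conditions."""
        return DRS(self.refs + other.refs, self.conds + other.conds)

# Lexical semantics, cf. the table above (conditions are tuples like
# ("spokesman", "x"); referents are plain strings, ignoring renaming).
a         = lambda p: lambda q: DRS(("x",)).merge(p("x")).merge(q("x"))
spokesman = lambda z: DRS((), (("spokesman", z),))
lied      = lambda np: np(lambda y: DRS(("e",), (("lie", "e"), ("agent", "e", y))))

np_sem = a(spokesman)   # Step 1 (FA): NP "a spokesman"
s_sem  = lied(np_sem)   # Step 2 (BA): S  "a spokesman lied"
```

`s_sem` comes out as the DRS with referents x, e and conditions spokesman(x), lie(e), agent(e,x), matching the final box of the derivation.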
The Clark & Curran Parser
- Uses standard statistical techniques
- Robust wide-coverage parser: Clark & Curran (ACL 2004)
- Grammar derived from CCGbank: 409 different categories; Hockenmaier & Steedman (ACL 2002)
- Results: 96% coverage on the WSJ; Bos et al. (COLING 2004)
Applications
Has been used for different kinds of applications:
- Question Answering
- Recognising Textual Entailment
Recognising Textual Entailment
A task for NLP systems: recognise entailment between two (short) texts
Introduced in 2004/2005 as part of the PASCAL Network of Excellence
Proved to be a difficult, but popular task
PASCAL provided a development set and a test set of several hundred examples
RTE Example (entailment)
RTE 1977 (TRUE)
His family has steadfastly denied the
charges.
-----------------------------------------------------
The charges were denied by his family.
RTE Example (no entailment)
RTE 2030 (FALSE)
Lyon is actually the gastronomical capital
of France.
-----------------------------------------------------
Lyon is the capital of France.
Aristotle’s Syllogisms
All men are mortal.
Socrates is a man.
-------------------------------
Socrates is mortal.
ARISTOTLE 1 (TRUE)
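Entailments of this simple syllogistic shape can be checked with a naive forward chainer over Horn clauses: facts are ground atoms, and a rule (body, head) is read as "all x (body(x) → head(x))". This is a toy sketch of what a theorem prover does here, not a real prover.

```python
def forward_chain(facts, rules):
    """Saturate the fact set under the unary Horn rules."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            for pred, arg in list(facts):
                if pred == body and (head, arg) not in facts:
                    facts.add((head, arg))
                    changed = True
    return facts

facts = {("man", "socrates")}                 # Socrates is a man.
rules = [("man", "mortal")]                   # All men are mortal.
derived = forward_chain(facts, rules)
entailed = ("mortal", "socrates") in derived  # Socrates is mortal?
```

The conclusion atom is derived, so the syllogism is confirmed TRUE, in miniature, by saturation.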
How to deal with RTE
There are several methods. We will look at five of them to see how difficult RTE actually is.
Recognising Textual Entailment
Method 1: Flipping a coin
Flipping a coin
Advantages: easy to implement
Disadvantages: just 50% accuracy
Recognising Textual Entailment
Method 2:
Calling a friend
Calling a friend
Advantages: high accuracy (95%)
Disadvantages: lose friends; high phone bill
Recognising Textual Entailment
Method 3:
Ask the audience
Ask the audience
RTE 893 (????)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
----------------------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
Human Upper Bound
RTE 893 (TRUE)
The first settlements on the site of Jakarta were established at the mouth of the Ciliwung, perhaps as early as the 5th century AD.
----------------------------------------------------------------
The first settlements on the site of Jakarta were established as early as the 5th century AD.
Recognising Textual Entailment
Method 4:
Word Overlap
Word Overlap Approaches
Popular approach
Ranging in sophistication from a simple bag of words to the use of WordNet
Accuracy rates ca. 55%
Word Overlap
Advantages Relatively straightforward algorithm
Disadvantages Hardly better than flipping a coin
RTE State-of-the-Art
Pascal RTE challenge
Hard problem; requires semantics

[Histogram: accuracy of the 25 systems at RTE 2004/5, binned from 0.49 to 0.60]
Recognising Textual Entailment
Method 5:
Using DRT
Inference
How do we perform inference with DRSs? Translate the DRS into first-order logic and use off-the-shelf inference engines.
What kind of inference engines? Theorem provers and model builders.
Using Theorem Proving
Given a textual entailment pair T/H with text T and hypothesis H:
- Produce DRSs for T and H
- Translate these DRSs into FOL
- Give this to the theorem prover: T' → H'
If the theorem prover finds a proof, then T entails H.
Vampire (Riazanov & Voronkov 2002)
Let's try this. We will use the theorem prover Vampire (currently the best known theorem prover for FOL)
This gives us good results for:
- apposition
- relative clauses
- coordination
- intersective adjectives/complements
- passive/active alternations
Example (Vampire: proof)
On Friday evening, a car bomb exploded
outside a Shiite mosque in Iskandariyah,
30 miles south of the capital.
-----------------------------------------------------
A bomb exploded outside a mosque.
RTE-2 112 (TRUE)
Example (Vampire: proof)
Initially, the Bundesbank opposed the
introduction of the euro but was compelled
to accept it in light of the political pressure
of the capitalist politicians who supportedits introduction.
-----------------------------------------------------
The introduction of the euro has been opposed.
RTE-2 489 (TRUE)
Background Knowledge
However, it doesn’t give us good results for cases requiring additional knowledge Lexical knowledge World knowledge
We will use WordNet as a start to get additional knowledge
All of WordNet is too much, so we create MiniWordNets
MiniWordNets
- Use hyponym relations from WordNet to build an ontology
- Do this only for the relevant symbols
- Convert the ontology into first-order axioms
MiniWordNet: an example
Example text:
There is no asbestos in our products now. Neither Lorillard nor the researchers who studied the workers were aware of any research on smokers of the Kent cigarettes.
∀x(user(x) → person(x))
∀x(worker(x) → person(x))
∀x(researcher(x) → person(x))
∀x(person(x) → ¬risk(x))
∀x(person(x) → ¬cigarette(x))
…
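The conversion of hyponym links into first-order axioms can be sketched as string generation over (hyponym, hypernym) pairs. The pairs below are from the example above; the concrete axiom syntax (`all x (... -> ...)`) is illustrative, chosen to resemble a typical prover input format rather than any specific tool's.

```python
def hyponym_axiom(sub, sup):
    """An is-a link becomes a universally quantified implication."""
    return f"all x ({sub}(x) -> {sup}(x))"

# hyponym -> hypernym pairs, as extracted for the example text
pairs = [("user", "person"), ("worker", "person"), ("researcher", "person")]
axioms = [hyponym_axiom(sub, sup) for sub, sup in pairs]
```

Only the symbols that actually occur in T and H need axioms, which is what keeps a MiniWordNet small compared to all of WordNet.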
Using Background Knowledge
Given a textual entailment pair T/H with text T and hypothesis H:
- Produce DRSs for T and H
- Translate drs(T) and drs(H) into FOL
- Create background knowledge for T & H
- Give this to the theorem prover: (BK & T') → H'
MiniWordNets at work
Background knowledge: ∀x(soar(x) → rise(x))
Crude oil prices soared to record levels.
-----------------------------------------------------
Crude oil prices rise.
RTE 1952 (TRUE)
Troubles with theorem proving
Theorem provers are extremely precise.
They won’t tell you when there is “almost” a proof.
Even if there is a little background knowledge missing, Vampire will say:
NO
Vampire: no proof
RTE 1049 (TRUE)
Four Venezuelan firefighters who were traveling
to a training course in Texas were killed when their
sport utility vehicle drifted onto the shoulder of a
highway and struck a parked truck.
----------------------------------------------------------------
Four firefighters were killed in a car accident.
Using Model Building
Need a robust way of inference: use the model builder Paradox (Claessen & Sorensson 2003)
Use the size of the (minimal) model: compare the size of the model of T with that of T&H
If the difference is small, then it is likely that T entails H
Using Model Building
Given a textual entailment pair T/H with text T and hypothesis H:
- Produce DRSs for T and H
- Translate these DRSs into FOL
- Generate background knowledge
- Give this to the model builder:
  i) BK & T'
  ii) BK & T' & H'
If the models for i) and ii) are similar in size, then we predict that T entails H.
Example 1
T: John met Mary in Rome
H: John met Mary

Model of T: 3 entities
Model of T+H: 3 entities
Model size difference: 0
Prediction: entailment
Example 2
T: John met Mary
H: John met Mary in Rome

Model of T: 2 entities
Model of T+H: 3 entities
Model size difference: 1
Prediction: no entailment
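The size comparison in the two examples can be mimicked with a crude entity count over conditions. In the real system the sizes come from Paradox building models of the first-order translations, so this sketch (our own names, conditions as plain tuples) only illustrates the comparison step.

```python
def model_size(conditions):
    """Number of distinct entities mentioned -- a stand-in for domain size."""
    return len({arg for _, *args in conditions for arg in args})

def predict_entailment(t_conds, h_conds, threshold=0):
    """Predict entailment when adding H barely grows the model of T."""
    size_t  = model_size(t_conds)
    size_th = model_size(t_conds + h_conds)
    return (size_th - size_t) <= threshold

# Example 1: T = "John met Mary in Rome", H = "John met Mary"
t1 = [("meet", "john", "mary"), ("in", "john", "rome")]
h1 = [("meet", "john", "mary")]
ex1 = predict_entailment(t1, h1)   # difference 0 -> entailment

# Example 2: T = "John met Mary", H = "John met Mary in Rome"
t2 = [("meet", "john", "mary")]
h2 = [("meet", "john", "mary"), ("in", "john", "rome")]
ex2 = predict_entailment(t2, h2)   # difference 1 -> no entailment
```

The asymmetry is the point: H adds nothing to the model of T in the first pair, but forces a new entity (Rome) in the second.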
Model size differences
Of course this is a very rough approximation, but it turns out to be a useful one: it gives us a notion of robustness.
Of course we need to deal with negation as well: give ¬T and ¬(T & H) to the model builder. Note that there is not necessarily one unique minimal model.
Lack of Background Knowledge
RTE-2 235 (TRUE)
Indonesia says the oil blocks are within its borders, as does Malaysia, which has also sent warships to the area, claiming that its waters and airspace have been violated.
---------------------------------------------------------------
There is a territorial waters dispute.
How well does this work?
We tried this at RTE 2004/05
Combined it with a shallow approach (word overlap)
Used standard machine learning methods to build a decision tree
Features used:
- Proof (yes/no)
- Model size
- Model size difference
- Word overlap
- Task (source of RTE pair)
RTE Results 2004/5
              Accuracy   CWS
Shallow       0.569      0.624
Deep          0.562      0.608
Hybrid (S+D)  0.577      0.632
Hybrid+Task   0.612      0.646
Bos & Markert 2005
Conclusions
We have got the tools for doing computational semantics in a principled way using DRT
For many applications, success depends on the ability to systematically generate background knowledge
- Small restricted domains [dialogue]
- Open domain
What we did in this course
We introduced DRT, a notational variant of first-order logic.
Semantically, we can handle in DRT anything we can in FOL, including events.
Moreover, because it is so close to FOL, we can use first-order methods to implement inference for DRT.
The DRT box syntax is essentially about nesting contexts, which allows a uniform treatment of anaphoric phenomena.
Moreover, this works not only on the theoretical level, but is also implementable, and even applicable.
What we hope you got out of it
First, we hope we made you aware that nowadays computational semantics is able to handle some difficult problems.
Second, we hope we made you aware that DRT is not just a theory. It is a complete architecture allowing us to experiment with computational semantics.
Third, we hope you are aware that state-of-the-art inference engines can help to study or apply semantics.
Where you can find more
For more on DRT read the standard textbook devoted to DRT by Kamp and Reyle. This book discusses not only the basic theory, but also plurals, tense, and aspect.
Where you can find more
For more on the basic architecture underlying this work on computational semantics, and particular on implementations on the lambda calculus, and parallel use of theorem provers and model builders, see:
www.blackburnbos.org
Where you can find more
All of the theory we discussed in this course is implemented in Prolog. This software can be downloaded from www.blackburnbos.org. For an introduction to Prolog written very much with this software in mind, try Learn Prolog Now!
www.learnprolognow.org