Using construction grammar in conversational systems
Using Construction Grammar in Conversational Systems
Marie-Claire Jenkins, PhD Thesis
(High level overview)
Overview
This thesis was motivated by the machine's limitations in understanding natural language and in forming responses. The limitations and complexities of current search engine querying were also a factor.
Conversational systems are good for testing possible solutions and are useful on the web.
We used methods that are not common in these systems:
- Construction Grammar (CxG)
- OWL ontologies
- Lexical semantics
- A new stemmer (UEA-Lite)
What I'm going to talk about
• Conversational systems: what they are and how they work & what their limitations are
• The Turing test and the Loebner prize
• 2 early experimental systems that we built
• OWL ontologies vs databases
• Construction grammar and Fluid construction grammar
• UEA-Lite stemmer
• Machine learning component
• KIA system diagram
• Evaluation methods and learnings
Things I covered in my research:
- Natural language understanding
- Natural language generation
- Human computer interaction
- Service oriented systems
Things I didn't cover in my research:
- Knowledge acquisition
- Open domains
- Affective behaviour
- Everything else
Conversational systems
They are more commonly referred to as "chatbots" or “Artificial Conversational Entities”
They converse with a user in natural language and simulate a human-human conversation.
They need to:
- "Understand" the user input
- Retrieve relevant information
- Generate a natural language response
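The three-stage loop above can be sketched in a few lines. This is a minimal illustration only, not the thesis system's actual code; the knowledge table and component functions are invented placeholders.

```python
# A toy "understand -> retrieve -> generate" chatbot loop.
# KNOWLEDGE and all function bodies are invented for illustration.
KNOWLEDGE = {"opening hours": "We are open 9am-5pm, Monday to Friday."}

def understand(user_input):
    """'Understand' the input: here, just normalise it and extract a topic key."""
    text = user_input.lower().strip("?!. ")
    return next((topic for topic in KNOWLEDGE if topic in text), None)

def retrieve(topic):
    """Retrieve relevant information for the interpreted topic."""
    return KNOWLEDGE.get(topic)

def respond(fact):
    """Generate a natural language response (here, a trivial wrapper)."""
    return fact if fact else "Sorry, I don't know about that yet."

print(respond(retrieve(understand("What are your opening hours?"))))
```

Real systems replace each placeholder with the machinery described in the rest of this talk: parsing and disambiguation for "understand", an ontology lookup for "retrieve", and a generation component for "respond".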
There are 3 different kinds of chatbots...
Social chatbots
Their purpose is to chat freely about anything at all with a user, much like you would with a friend. They are used online for fun.
Educational chatbots
Their purpose is to help the user learn about something such as a new language, history or geography. They are often used in schools
Service oriented chatbots
Their purpose is to help customers find their way around the website and also to answer questions about their products & services.
How they work
There are a variety of methods used but the most popular are:
- Database driven
- AIML (Artificial Intelligence Markup Language, XML based)
- Canned responses
- Stochastic methods
- Supervised learning
- Named entity recognition
- Templates
“Phrase Based systems” are seen as generalized templates at the sentence level (like phrase structure rules) or at the discourse level.
1. A phrasal pattern is selected [subject noun verb]
2. Each part of the pattern is expanded [noun modifiers]
3. When each phrasal pattern has been replaced by one or more words, the process ends (END)
They are very difficult to build because the phrasal interrelationships must be clearly specified otherwise there can be inappropriate phrase expansions.
Phrase-based systems
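The expansion steps above amount to recursive rewriting of phrasal patterns until only words remain. A minimal sketch, assuming a toy grammar (the pattern names and word lists are invented for illustration):

```python
import random

# Toy phrase-based generation: patterns expand until every slot is a word.
PATTERNS = {
    "S": ["NP", "VP"],       # step 1: the selected phrasal pattern [subject verb ...]
    "NP": ["Det", "Noun"],
    "VP": ["Verb", "NP"],
}
WORDS = {
    "Det": ["the", "a"],
    "Noun": ["user", "system", "question"],
    "Verb": ["answers", "retrieves"],
}

def expand(symbol):
    """Recursively expand a pattern (step 2) until only words remain (step 3: END)."""
    if symbol in WORDS:                # terminal slot: pick a word
        return [random.choice(WORDS[symbol])]
    words = []
    for part in PATTERNS[symbol]:      # expand each part of the pattern
        words.extend(expand(part))
    return words

print(" ".join(expand("S")))
```

Even in this toy, an inappropriate expansion (e.g. a noun phrase where a verb is needed) would produce garbage, which is why the phrasal interrelationships must be specified so carefully.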
In “Feature-based systems” each possible alternative is represented by a feature and each sentence is specified by them.
Sentence generation is achieved by using all of these features until the sentence is determined.
Features may include: positive/negative, past/present, statement/question…
Strength: any distinction in language can be a feature
Weakness: very hard to maintain feature inter-relationships and to control feature selection.
Feature-based systems
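A feature-based generator can be sketched as a lookup keyed on a bundle of fully resolved features. This is an invented miniature (feature names and surface forms are not from the thesis) that shows both the strength and the maintenance problem:

```python
# Toy feature-based generation: a sentence is specified by its feature bundle,
# and generation happens only once every feature alternative is decided.
# Feature names and realisations are invented for illustration.
REALISATIONS = {
    ("statement", "present", "positive"): "The system answers the question.",
    ("statement", "present", "negative"): "The system does not answer the question.",
    ("statement", "past",    "positive"): "The system answered the question.",
    ("question",  "present", "positive"): "Does the system answer the question?",
}

def generate(mood, tense, polarity):
    """Select a surface form once all features (mood, tense, polarity) are fixed."""
    return REALISATIONS[(mood, tense, polarity)]

print(generate("question", "present", "positive"))
```

Any distinction in language can become a feature, but the table of feature combinations grows multiplicatively, which is exactly the inter-relationship maintenance problem noted above.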
Tests on dialogue from the human-human customer service system on a large commercial website reveal that there is no consistency in language or phrase formulation.
There is a very small amount of Formulaic language (canned responses).
A question was never formulated in the same way and never answered in the same way (apart from formulaicity).
This makes it hard for us to produce templates or anticipate user utterances.
Observations from live data
More Limitations
Main issues with existing systems:
- Scalability- Knowledge & information storage- User input disambiguation- Response generation (word order, vocabulary, etc...)- Knowledge/information retrieval- Anaphora- Managing the dialogue- Displaying appropriate behaviour (affective issues)- Knowledge assimilation- Evaluation
Turing test
“A machine is termed capable of thinking if it can, under certain prescribed conditions imitate a human by answering questions sufficiently well to deceive a human questioner for a reasonable period of time.” (Turing)
Objections to the test include proving intelligence, "understanding" and other things.
My personal opinion has changed since the beginning of my PhD research:
“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” (Dijkstra)
Turing test illustration (sources: Wikipedia, XKCD)
Loebner prize
This yearly contest is run by Hugh Loebner who has offered a $100,000 prize for the 1st chatbot to pass the Turing test
This test is controversial. Marvin Minsky said:
“I do hope that someone will volunteer to violate this proscription so that Mr. Loebner will indeed revoke his stupid prize, save himself some money, and spare us the horror of this obnoxious and unproductive annual publicity campaign.”
Loebner prize diagram
Michael Mauldin, Carnegie Mellon
We built a conversational chatbot and entered it into the Loebner prize (2006). It was designed and built in 2 months and operated on a closed domain.
Reason: to run on a small database requiring little manual labour. We used ngrams, weighted responses, a vector approach, Perl, Brill, UEA-Lite, wildcards, and AIML.
We were a finalist and we learned that:
- A small database worked for a small amount of time
- A database system makes for a laborious build and limited information (well used systems work much better)
- Template methods are limited
- Canned responses are awkward
- AIML is restrictive
KIA: the HCI tests
We designed a system to research human-machine interaction and human behaviour: this is a test on humans and not of the system.
We included functions that were meant to test user persistence with query repair, emotive response, language etc...
Results: users persist, are emotive, sensitive to interface design and more.
Details available in our paper
KIA – a CxG & OWL driven system
Databases vs OWL ontologies:
Databases focus on local semantics and ontologies on global semantics.
In ontologies the semantics are explicit and in databases implicit.
Ontologies allow data to be reused whereas database schemas cannot be reused.
Ontologies are portable between websites to facilitate maintenance and construction
Restrictions in databases do not allow for all of the necessary relations to be built into the data.
Database (Wordpress Bits)
OWL Ontology (Richard Durban)
OWL flavour
We used OWL (Web Ontology Language) as it is more expressive than other semantic web languages and is built to enable ontologies to be created easily.
It is a semantic markup language and an extension of RDF (Resource Description Framework).
There are different subsets of OWL: OWL Full, OWL Lite and OWL DL (Description Logic).
We chose to use OWL DL.
Why Ontologies & why OWL DL?
Taxonomies are also not as expansive as ontologies.
“At one extreme there are ontologies, and at the other mind maps and pathfinder networks, and in between taxonomies and browsable hierarchies.” (Brewster and Wilks)
Ontologies have a greater potential for inference and a greater degree of formality.
OWL DL has stricter restrictions which are necessary in our type of system.
It has maximum expressiveness without losing the computational completeness (all entailments will be computed) and decidability (all computations will finish in finite time) of reasoning systems.
OWL Ontology example: Koala
What do we store in there?
- All of the domain knowledge (e.g. all about koalas)
- The collection of constructions (commonly used when discussing koalas)
- Canned responses (formulaic language)
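To make the storage concrete, here is a small sketch of what koala domain knowledge might look like in OWL DL, written in Turtle syntax. The namespace, class names, and property names are invented for illustration and are not taken from the thesis ontology:

```turtle
@prefix :     <http://example.org/koala#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# A class hierarchy with explicit, machine-readable semantics.
:Koala a owl:Class ; rdfs:subClassOf :Marsupial .

# A property with a declared domain and range.
:eats a owl:ObjectProperty ; rdfs:domain :Animal ; rdfs:range :Plant .

# An OWL DL restriction: everything a koala eats is eucalyptus.
:Koala rdfs:subClassOf [ a owl:Restriction ;
                         owl:onProperty :eats ;
                         owl:allValuesFrom :Eucalyptus ] .
```

Because the semantics are explicit, a DL reasoner can draw inferences from restrictions like this, which is the kind of global-semantics reuse a database schema does not give us.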
KIA system domain knowledge
Construction Grammar
It is a cognitive linguistic method and it is:
- Constraint based
- Generative
- Non-derivational
- A monostratal grammatical model
- Incorporates the cognitive and interactional foundations of language
- Consists of taxonomies of families of constructions
- Uses entire constructions as the primary unit of grammar
- Is a pairing of form and meaning (metonomic)
- Frames used in CxG != regular frames because the argument structure types invoke frames which designate event types
- The verb alone is not the main unit of meaning, the construction itself is
Diagram: words, constructions, sentences
Constructions make sense in computing
Example of CxG
Semantics: relational predicate involving a singer
Syntactics: predicate requires arguments and "Heather" is the subject
Generative Grammar
Construction Grammar
Advantages of CxG
- Adapts to changing language patterns easily
- Takes into consideration both semantics and syntactics
- Constructions are easier to manage than words as the atomic unit
- Allows for integration into bigger collections of constructions
- Can be computed
UEA-Lite stemmer
After testing the system with all available stemmers, we realised that we needed to design our own to facilitate topic/construction detection.
UEA-Lite stems conservatively to orthographically correct word forms and recognizes words which do not need to be stemmed.
There are Perl, Java and Ruby versions.
More information here (an updated paper to follow soon)
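The idea of conservative stemming to orthographically correct word forms can be sketched as a small rule list plus a protected-word set. This is an invented miniature in the spirit of UEA-Lite, not its actual rules; the real stemmer's rule set and protected list are far larger:

```python
import re

# Conservative stemming sketch: stems are real English word forms, and
# words that should not be stemmed are recognised and left alone.
# Rules and the protected list are invented for illustration.
PROTECTED = {"news", "series", "analysis", "this", "is"}  # never stemmed

RULES = [
    (re.compile(r"ies$"), "y"),     # "queries" -> "query"
    (re.compile(r"sses$"), "ss"),   # "classes" -> "class"
    (re.compile(r"s$"), ""),        # "constructions" -> "construction"
]

def stem(word):
    """Return an orthographically correct stem, or the word itself if protected."""
    if word.lower() in PROTECTED:
        return word
    for pattern, replacement in RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word)
    return word

print(stem("queries"))   # a real word form, "query", not a truncated stem
print(stem("news"))      # protected: not mangled into "new"
```

Because stems stay orthographically valid, they can be matched directly against entries in the ontology and the construction inventory.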
Machine learning
It identifies constructions (NP or VP); the syntactic pole and the semantic pole feed in information so that constructions can be loaded with meaning and form information.
The machine learning engine finds sets of constructions which commonly work in conjunction with each other or that have been used in conjunction in the past.
The weights are adjusted each time a new construction is added. This happens when the system encounters a new instance.
The engine runs through this data and calculates the probability of finding the right matches to the query information.
Algorithms
- Jaccard Distance to weight the constructions (how often different constructions are found in conjunction, partial or complete)
- Naive Bayes algorithm clusters all of the constructions according to their different features in our training set (requires little training data)
Once the data has been processed through the Naive Bayes algorithm we know which constructions are often found with others, and in what order. We not only look at the syntax but also at the semantic aspect both in isolation and in conjunction with each other.
The role of the classifier is to determine which categories future constructions belong to, and also to tell us which constructions are a likely match to a query.
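The Jaccard weighting step can be sketched directly, assuming each construction is tracked by the set of utterances it occurs in. The construction names and utterance ids below are invented for illustration:

```python
# Weighting construction pairs by Jaccard distance: constructions that are
# often found in conjunction get a small distance (a strong association).
# Occurrence sets are invented for illustration.
occurrences = {
    "NP-greeting": {1, 2, 3, 5},
    "VP-request":  {2, 3, 5, 8},
    "NP-closing":  {7, 9},
}

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|: 0.0 means always together, 1.0 means never."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

d = jaccard_distance(occurrences["NP-greeting"], occurrences["VP-request"])
print(round(d, 2))  # 3 shared utterances out of 5 total -> distance 0.4
```

Partial co-occurrence shows up as an intermediate distance, which is what lets the weights distinguish "commonly used together" from "occasionally used together".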
Naïve Bayes for CxG
P(constructions) doesn't change over time. Naive Bayes estimates a multinomial distribution over categories, which is the prior distribution of categories. We can therefore say that:
Best category = argmax over cat in cats of P(constructions | cat) * P(cat)
If c1, c2, ... cn are the constructions in the document, then:
Best category = argmax over cat in cats of P(c1|cat) * P(c2|cat) * ... * P(cn|cat) * P(cat)
System diagram
There are many more components to the system than presented in this presentation, as you can see.
Evaluation methods
There are no robust evaluation methods for conversational systems, but we found that a mixture of the following worked well:
- Human evaluation (feedback form)
- "Pourpre" to evaluate sentence complexity (Jimmy Lin)
- Expected vs given response score
Evaluation is not finished as yet but the initial results are encouraging with good knowledge retrieval and construction selection.
Things that didn't work
Using LSI/PLSI to determine the similarity between individual utterances in order to extract useful constructions failed.
The reasons:
LSI is an information retrieval method, and Q&A systems require a higher level of accuracy.
Information retrieval uses a hammer, and every problem is a nail. Subtler systems require a more delicate approach.
It is very hard to get LSI to work at sentence level; indeed, it has been shown that it does not scale down to this granularity.
The fact that it can't capture polysemy is acceptable, because we disambiguate prior to this stage and append the information to constructions.
Fluid Construction Grammar (FCG) (also didn't work!)
- Bi-directional (using rules)
- Selects meanings and maps them into the real world.
- "fluid" because it takes into consideration the fact that users change and update their grammars often.
- User input can be broken down syntactically in order to gain meaning from the grammatical components, whilst also being able to map the semantic relationships
BUT: not developed enough to work well in our system
Also: bi-directional rules are very hard to write
Some Outcomes & Learnings
- Construction Grammar is a useful method for NLU & NLG
- OWL ontologies are well suited to these systems
- Stemming affects the system greatly
- Fluid CxG is not practical at this time
- Better evaluation methods need to be developed
- The Turing test is not useful as it does not prove machine intelligence or understanding
- User perception is a key area of research
Applications & Future work
- Assisted search
- Summarization systems
- Content creation
- Speech systems
- Sentiment analysis
- More powerful AI module
- Anaphora resolution
- Open domain testing
- Improved machine learning
- Further work on query disambiguation methods
Thank you
Find me at:
http://www.scienceforseo.com
http://twitter.com/missmcj