Using construction grammar in conversational systems
Using Construction Grammar in Conversational Systems
Marie-Claire Jenkins, PhD Thesis
(High level overview)
Overview
This thesis was motivated by the machine's limitations in understanding natural language and in forming responses. The limitations and complexities of current search engine querying were also a factor.
Conversational systems are good for testing possible solutions and are useful on the web.
We used methods that are not common in these systems:
- Construction Grammar (CxG)
- OWL ontologies
- Lexical semantics
- A new stemmer (UEA-Lite)
What I'm going to talk about
• Conversational systems: what they are and how they work & what their limitations are
• The Turing test and the Loebner prize
• 2 early experimental systems that we built
• OWL ontologies vs databases
• Construction grammar and Fluid construction grammar
• UEA-Lite stemmer
• Machine learning component
• KIA system diagram
• Evaluation methods and learnings
Things I covered in my research:
- Natural language understanding
- Natural language generation
- Human computer interaction
- Service oriented systems
Things I didn't cover in my research:
- Knowledge acquisition
- Open domains
- Affective behaviour
- Everything else
Conversational systems
They are more commonly referred to as "chatbots" or “Artificial Conversational Entities”
They converse with a user in natural language and simulate a human-human conversation.
They need to:
- "Understand" the user input
- Retrieve relevant information
- Generate a natural language response
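The three-stage loop above can be sketched in a few lines. This is a minimal illustration only, not the thesis system's actual code; the knowledge table and component functions are invented placeholders.

```python
# A toy "understand -> retrieve -> generate" chatbot loop.
# KNOWLEDGE and all function bodies are invented for illustration.
KNOWLEDGE = {"opening hours": "We are open 9am-5pm, Monday to Friday."}

def understand(user_input):
    """'Understand' the input: here, just normalise it and extract a topic key."""
    text = user_input.lower().strip("?!. ")
    return next((topic for topic in KNOWLEDGE if topic in text), None)

def retrieve(topic):
    """Retrieve relevant information for the interpreted topic."""
    return KNOWLEDGE.get(topic)

def respond(fact):
    """Generate a natural language response (here, a trivial wrapper)."""
    return fact if fact else "Sorry, I don't know about that yet."

print(respond(retrieve(understand("What are your opening hours?"))))
```

Real systems replace each placeholder with the machinery described in the rest of this talk: parsing and disambiguation for "understand", an ontology lookup for "retrieve", and a generation component for "respond".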
There are 3 different kinds of chatbots...
Social chatbots
Their purpose is to chat freely about anything at all with a user, much like you would with a friend. They are used online for fun.
Educational chatbots
Their purpose is to help the user learn about something such as a new language, history or geography. They are often used in schools
Service oriented chatbots
Their purpose is to help customers find their way around the website and also to answer questions about their products & services.
How they work
There are a variety of methods used but the most popular are:
- Database driven
- AIML (Artificial Intelligence Markup Language, XML based)
- Canned responses
- Stochastic methods
- Supervised learning
- Named entity recognition
- Templates
“Phrase Based systems” are seen as generalized templates at the sentence level (like phrase structure rules) or at the discourse level.
1. A phrasal pattern is selected [subject noun verb]
2. Each part of the pattern is expanded [noun modifiers]
3. When each phrasal pattern has been replaced by one or more words, the process ends (END)
They are very difficult to build because the phrasal interrelationships must be clearly specified otherwise there can be inappropriate phrase expansions.
Phrase-based systems
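The expansion steps above amount to recursive rewriting of phrasal patterns until only words remain. A minimal sketch, assuming a toy grammar (the pattern names and word lists are invented for illustration):

```python
import random

# Toy phrase-based generation: patterns expand until every slot is a word.
PATTERNS = {
    "S": ["NP", "VP"],       # step 1: the selected phrasal pattern [subject verb ...]
    "NP": ["Det", "Noun"],
    "VP": ["Verb", "NP"],
}
WORDS = {
    "Det": ["the", "a"],
    "Noun": ["user", "system", "question"],
    "Verb": ["answers", "retrieves"],
}

def expand(symbol):
    """Recursively expand a pattern (step 2) until only words remain (step 3: END)."""
    if symbol in WORDS:                # terminal slot: pick a word
        return [random.choice(WORDS[symbol])]
    words = []
    for part in PATTERNS[symbol]:      # expand each part of the pattern
        words.extend(expand(part))
    return words

print(" ".join(expand("S")))
```

Even in this toy, an inappropriate expansion (e.g. a noun phrase where a verb is needed) would produce garbage, which is why the phrasal interrelationships must be specified so carefully.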
In “Feature-based systems” each possible alternative is represented by a feature and each sentence is specified by them.
Sentence generation is achieved by using all of these features until the sentence is determined.
Features may include: positive/negative, past/present, statement/question…
Strength: any distinction in language can be a feature
Weakness: very hard to maintain feature inter-relationships and to control feature selection.
Feature-based systems
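A feature-based generator can be sketched as a lookup keyed on a bundle of fully resolved features. This is an invented miniature (feature names and surface forms are not from the thesis) that shows both the strength and the maintenance problem:

```python
# Toy feature-based generation: a sentence is specified by its feature bundle,
# and generation happens only once every feature alternative is decided.
# Feature names and realisations are invented for illustration.
REALISATIONS = {
    ("statement", "present", "positive"): "The system answers the question.",
    ("statement", "present", "negative"): "The system does not answer the question.",
    ("statement", "past",    "positive"): "The system answered the question.",
    ("question",  "present", "positive"): "Does the system answer the question?",
}

def generate(mood, tense, polarity):
    """Select a surface form once all features (mood, tense, polarity) are fixed."""
    return REALISATIONS[(mood, tense, polarity)]

print(generate("question", "present", "positive"))
```

Any distinction in language can become a feature, but the table of feature combinations grows multiplicatively, which is exactly the inter-relationship maintenance problem noted above.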
Tests on dialogue from the human-human customer service system on a large commercial website reveal that there is no consistency in language or phrase formulation.
There is a very small amount of Formulaic language (canned responses).
A question was never formulated in the same way and never answered in the same way (apart from formulaicity).
This makes it hard for us to produce templates or anticipate user utterances.
Observations from live data
More Limitations
Main issues with existing systems:
- Scalability- Knowledge & information storage- User input disambiguation- Response generation (word order, vocabulary, etc...)- Knowledge/information retrieval- Anaphora- Managing the dialogue- Displaying appropriate behaviour (affective issues)- Knowledge assimilation- Evaluation
Turing test
“A machine is termed capable of thinking if it can, under certain prescribed conditions imitate a human by answering questions sufficiently well to deceive a human questioner for a reasonable period of time.” (Turing)
Objections to the test include proving intelligence, "understanding" and other things.
My personal opinion has changed since the beginning of my PhD research:
“The question of whether a computer can think is no more interesting than the question of whether a submarine can swim.” (Dijkstra)
Turing test illustration (sources: Wikipedia, XKCD)
Loebner prize
This yearly contest is run by Hugh Loebner who has offered a $100,000 prize for the 1st chatbot to pass the Turing test
This test is controversial. Marvin Minsky said:
“I do hope that someone will volunteer to violate this proscription so that Mr. Loebner will indeed revoke his stupid prize, save himself some money, and spare us the horror of this obnoxious and unproductive annual publicity campaign.”
Loebner prize diagram
Michael Mauldin, Carnegie Mellon
We built a conversational chatbot and entered it into the Loebner prize (2006). It was designed and built in 2 months and operated on a closed domain.
Reason: to run on a small database requiring little manual labour. We used ngrams, weighted responses, a vector approach, Perl, Brill, UEA-Lite, wildcards, and AIML.
We were a finalist and we learned that:
- A small database worked for a small amount of time
- A database system makes for a laborious build and limited information (well used systems work much better)
- Template methods are limited
- Canned responses are awkward
- AIML is restrictive
KIA: the HCI tests
We designed a system to research human-machine interaction and human behaviour: this is a test on humans and not of the system.
We included functions that were meant to test user persistence with query repair, emotive response, language etc...
Results: users persist, are emotive, sensitive to interface design and more.
Details available in our paper
KIA – a CxG & OWL driven system
Databases vs OWL ontologies:
Databases focus on local semantics and ontologies on global semantics.
In ontologies the semantics are explicit and in databases implicit.
Ontologies allow data to be reused whereas database schemas cannot be reused.
Ontologies are portable between websites to facilitate maintenance and construction
Restrictions in databases do not allow for all of the necessary relations to be built into the data.
Database (Wordpress Bits)
OWL Ontology (Richard Durban)
OWL flavour
We used OWL (Web Ontology Language) as it is more expressive than other semantic web languages and is built to enable ontologies to be created easily.
It is a semantic markup language and an extension of RDF (Resource Description Framework).
There are different subsets of OWL: OWL Full, OWL Lite and OWL DL (Description Logic).
We chose to use OWL DL.
Why Ontologies & why OWL DL?
Taxonomies are also not as expansive as ontologies.
“At one extreme there are ontologies, and at the other mind maps and pathfinder networks, and in between taxonomies and browsable hierarchies.” (Brewster and Wilks)
Ontologies have a greater potential for inference and a greater degree of formality.
OWL DL has stricter restrictions which are necessary in our type of system.
It has maximum expressiveness without losing the computational completeness (all entailments will be computed) and decidability (all computations will finish in finite time) of reasoning systems.
OWL Ontology example: Koala
What do we store in there?
- All of the domain knowledge (e.g. all about koalas)
- The collection of constructions (commonly used when discussing koalas)
- Canned responses (formulaic language)
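To make the storage concrete, here is a small sketch of what koala domain knowledge might look like in OWL DL, written in Turtle syntax. The namespace, class names, and property names are invented for illustration and are not taken from the thesis ontology:

```turtle
@prefix :     <http://example.org/koala#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# A class hierarchy with explicit, machine-readable semantics.
:Koala a owl:Class ; rdfs:subClassOf :Marsupial .

# A property with a declared domain and range.
:eats a owl:ObjectProperty ; rdfs:domain :Animal ; rdfs:range :Plant .

# An OWL DL restriction: everything a koala eats is eucalyptus.
:Koala rdfs:subClassOf [ a owl:Restriction ;
                         owl:onProperty :eats ;
                         owl:allValuesFrom :Eucalyptus ] .
```

Because the semantics are explicit, a DL reasoner can draw inferences from restrictions like this, which is the kind of global-semantics reuse a database schema does not give us.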
KIA system domain knowledge
Construction Grammar
It is a cognitive linguistic method and it is:
- Constraint based
- Generative
- Non-derivational
- A monostratal grammatical model
- Incorporates the cognitive and interactional foundations of language
- Consists of taxonomies of families of constructions
- Uses entire constructions as the primary unit of grammar
- Is a pairing of form and meaning (metonomic)
- Frames used in CxG != regular frames because the argument structure types invoke frames which designate event types
- The verb alone is not the main unit of meaning, the construction itself is
Diagram: words, constructions, sentences
Constructions make sense in computing
Example of CxG
Semantics: relational predicate involving a singer
Syntactics: predicate requires arguments and "Heather" is the subject
Generative Grammar
Construction Grammar
Advantages of CxG
- Adapts to changing language patterns easily
- Takes into consideration both semantics and syntactics
- Constructions are easier to manage than words as the atomic unit
- Allows for integration into bigger collections of constructions
- Can be computed
UEA-Lite stemmer
After testing the system with all available stemmers, we realised that we needed to design our own to facilitate topic/construction detection.
UEA-Lite stems conservatively to orthographically correct word forms and recognizes words which do not need to be stemmed.
There are Perl, Java and Ruby versions.
More information here (an updated paper to follow soon)
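The idea of conservative stemming to orthographically correct word forms can be sketched as a small rule list plus a protected-word set. This is an invented miniature in the spirit of UEA-Lite, not its actual rules; the real stemmer's rule set and protected list are far larger:

```python
import re

# Conservative stemming sketch: stems are real English word forms, and
# words that should not be stemmed are recognised and left alone.
# Rules and the protected list are invented for illustration.
PROTECTED = {"news", "series", "analysis", "this", "is"}  # never stemmed

RULES = [
    (re.compile(r"ies$"), "y"),     # "queries" -> "query"
    (re.compile(r"sses$"), "ss"),   # "classes" -> "class"
    (re.compile(r"s$"), ""),        # "constructions" -> "construction"
]

def stem(word):
    """Return an orthographically correct stem, or the word itself if protected."""
    if word.lower() in PROTECTED:
        return word
    for pattern, replacement in RULES:
        if pattern.search(word):
            return pattern.sub(replacement, word)
    return word

print(stem("queries"))   # a real word form, "query", not a truncated stem
print(stem("news"))      # protected: not mangled into "new"
```

Because stems stay orthographically valid, they can be matched directly against entries in the ontology and the construction inventory.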
Machine learning
It identifies constructions (NP or VP); the syntactic pole and the semantic pole feed in information so that constructions can be loaded with meaning and form information.
The machine learning engine finds sets of constructions which commonly work in conjunction with each other or that have been used in conjunction in the past.
The weights are adjusted each time a new construction is added. This happens when the system encounters a new instance.
The engine runs through this data and calculates the probability of finding the right matches to the query information.
Algorithms
- Jaccard Distance to weight the constructions (how often different constructions are found in conjunction, partial or complete)
- Naive Bayes algorithm clusters all of the constructions according to their different features in our training set (requires little training data)
Once the data has been processed through the Naive Bayes algorithm we know which constructions are often found with others, and in what order. We not only look at the syntax but also at the semantic aspect both in isolation and in conjunction with each other.
The role of the classifier is to determine which categories future constructions belong to, and also to tell us which constructions are a likely match to a query.
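The Jaccard weighting step can be sketched directly, assuming each construction is tracked by the set of utterances it occurs in. The construction names and utterance ids below are invented for illustration:

```python
# Weighting construction pairs by Jaccard distance: constructions that are
# often found in conjunction get a small distance (a strong association).
# Occurrence sets are invented for illustration.
occurrences = {
    "NP-greeting": {1, 2, 3, 5},
    "VP-request":  {2, 3, 5, 8},
    "NP-closing":  {7, 9},
}

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|: 0.0 means always together, 1.0 means never."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

d = jaccard_distance(occurrences["NP-greeting"], occurrences["VP-request"])
print(round(d, 2))  # 3 shared utterances out of 5 total -> distance 0.4
```

Partial co-occurrence shows up as an intermediate distance, which is what lets the weights distinguish "commonly used together" from "occasionally used together".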
Naïve Bayes for CxG
P(constructions) doesn't change over time. Naive Bayes estimates a multinomial distribution over categories, which is the prior distribution of categories. We can therefore say that:
Best category = argmax over cat in cats of P(constructions | cat) * P(cat)
If c1, c2, ... cn are the constructions in the document, then:
Best category = argmax over cat in cats of P(c1|cat) * P(c2|cat) * ... * P(cn|cat) * P(cat)
System diagram
There are many more components to the system than presented in this presentation, as you can see.
Evaluation methods
There are no robust evaluation methods for conversational systems, but we found that a mixture of the following worked well:
- Human evaluation (feedback form)
- "Pourpre" to evaluate sentence complexity (Jimmy Lin)
- Expected vs given response score
Evaluation is not finished as yet but the initial results are encouraging with good knowledge retrieval and construction selection.
Things that didn't work
Using LSI/PLSI to determine the similarity between individual utterances in order to extract useful constructions failed.
The reasons:
LSI is an information retrieval method, and Q&A systems require a higher level of accuracy.
Information retrieval uses a hammer, and every problem is a nail. Subtler systems require a more delicate approach.
It is very hard to get LSI to work at sentence level; indeed, it has been shown that it does not scale down to this granularity.
The fact that it can't capture polysemy is acceptable, because we disambiguate prior to this stage and append the information to constructions.
Fluid Construction Grammar (FCG) (also didn't work!)
- Bi-directional (using rules)
- Selects meanings and maps them into the real world.
- "fluid" because it takes into consideration the fact that users change and update their grammars often.
- User input can be broken down syntactically in order to gain meaning from the grammatical components, whilst also being able to map the semantic relationships
BUT: not developed enough to work well in our system
Also: bi-directional rules are very hard to write
Some Outcomes & Learnings
- Construction Grammar is a useful method for NLU & NLG
- OWL ontologies are well suited to these systems
- Stemming affects the system greatly
- Fluid CxG is not practical at this time
- Better evaluation methods need to be developed
- The Turing test is not useful as it does not prove machine intelligence or understanding
- User perception is a key area of research
Applications & Future work
- Assisted search
- Summarization systems
- Content creation
- Speech systems
- Sentiment analysis
- More powerful AI module
- Anaphora resolution
- Open domain testing
- Improved machine learning
- Further work on query disambiguation methods
Thank you
Find me at:
http://www.scienceforseo.com
http://twitter.com/missmcj