By: Vanessa López, Enrico Motta. Knowledge Media Institute, Open University. Ontology-driven question answering in AquaLog. 9th International Conference on Applications of Natural Language to Information Systems, NLDB’04. {v.lopez, e.motta}@open.ac.uk

By: Vanessa López, Enrico Motta
Knowledge Media Institute, Open University

Ontology-driven question answering in AquaLog

9th International Conference on Applications of Natural Language to Information Systems, NLDB’04

{v.lopez, e.motta}@open.ac.uk

Ontology-driven Question Answering in AquaLog Vanessa López

NLDB’04

Index

• Motivation: NL front-end for the Semantic Web

• AquaLog approach

• System Architecture

• Examples

• Evaluation/ Discussion

• Future Lines/ Conclusions

AquaLog was born out of our vision of engineering semantics on the web, so I would like to start by talking about the motivation for AquaLog and its contribution within our Semantic Web vision.

The Semantic Web Vision

• The future web:

- knowledge to be managed in an automatic way

Semantics through:

- Set of representation languages: RDF, …

- Structures for knowledge: ontologies

ASSUMPTION: ontology-based semantic markup will become widely available

Engineering Semantics on the Web

In the Semantic Web vision, rich, ontology-based semantic markup is widely available to support human users in locating and making sense of information. The web is set for an intellectual revolution. Ever since the net began to hold more information than anyone can consume, the dream has been some kind of intelligent internet, or Semantic Web: a web which can understand what it contains. The use of ontologies and of a set of representation languages to add semantics to the web opens the way to novel and sophisticated forms of question answering (as discussed by McGuinness in her recent essay about QA on the web in IEEE Intelligent Systems). (Semantics is the meaning associated with the information on the web: information about a domain in the form of taxonomies or relationships, organized inside our structures for knowledge, or ontologies.)

Question-Answering

- Novel, sophisticated Question Answering using Semantic Mark-up

- Semantic Mark-up queried directly

Similar scenario: asking NL queries to databases (semantic mark-up viewed as a knowledge base)

• So… Ontology portable

While semantic information can be used in different ways to improve question answering and semantic search, an important consequence of the availability of semantic markup on the net is that it can be queried directly. Of course, this scenario is similar to asking NL queries to databases, which has long been an area of research in the AI and database communities, even if in the past decade it has somewhat gone out of fashion. However, it is our view that the Semantic Web provides a new and potentially important context in which the results from this area can be applied. We believe that in the Semantic Web scenario it makes sense to provide query answering systems which are portable with respect to ontologies, where the user is able to select an ontology and ask queries with respect to the domain covered by that ontology.

For instance..

For instance, we are currently augmenting our departmental website at KMi with semantic markup by instantiating an ontology describing academic life with information about personnel, projects, technologies and events, which is automatically extracted from departmental databases and unstructured web pages. This semantic markup makes it possible to answer search queries such as "the homepage of enrico motta". Moreover, as pointed out before, we can also query this semantic markup directly and ask a query such as "what are the projects in the semantic web area?"; by reasoning about the semantic markup and the ontology we can get the answer.


• Example!!:

What are the projects of enrico motta?

AquaLog: Approach

Diagram: NL SENTENCE INPUT -> LINGUISTIC & QUERY CLASSIFICATION -> QUERY TRIPLES -> RELATION SIMILARITY SERVICE -> ONTOLOGY COMPATIBLE TRIPLES -> INFERENCE ENGINE -> ANSWER

Intermediate triples: <subject, predicate, object> + features

AquaLog is a portable QA system which takes queries expressed in natural language and an ontology as input and returns answers from one or more KBs (which instantiate the input ontology). AquaLog adopts a triple-based data model. The role of the linguistic component is to translate the input query into a set of intermediate triples. These are further processed by the module called Relation Similarity Service to produce ontology-compliant queries. There are two main reasons for adopting a triple-based data model. First, although not all possible queries can be represented in the binary relational model, in practice queries that fit the model occur very frequently. Secondly, RDF-based knowledge representation formalisms for the Semantic Web (RDF, OWL) also subscribe to this binary relational model and express statements as <subject, predicate, object>.
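The intermediate triple described here can be pictured with a small Python sketch. It is only an illustration, not AquaLog's code: the class name `QueryTriple`, its field names and the lowercase category label are assumptions made for the example, and the triple is filled in by hand from the "projects of john domingue" slide rather than produced by a linguistic component.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueryTriple:
    """Hypothetical intermediate triple: <subject, predicate, object> + features."""
    subject: Optional[str]    # None marks the unknown value being asked for
    predicate: Optional[str]  # None marks an unknown relation
    obj: Optional[str]
    category: str = "unclassified"
    features: Dict[str, str] = field(default_factory=dict)

    def unknowns(self) -> List[str]:
        """Names of the slots that still have to be resolved against the ontology."""
        slots = {"subject": self.subject, "predicate": self.predicate, "object": self.obj}
        return [name for name, value in slots.items() if value is None]


# "What are the projects of john domingue" -> VALUE? - PROJECTS - JOHN DOMINGUE
triple = QueryTriple(
    subject=None,
    predicate="projects",
    obj="john domingue",
    category="wh-unknown-term",
)
print(triple.unknowns())
```

Keeping the unknown slots explicit is one simple way to let a later mapping step see what it still has to find in the ontology.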

AquaLog: Approach

2 main subtasks:

• Linguistic Component: intermediate representation from the input query

• Relation Similarity Service: map the intermediate representation to the KB

Diagram: AQUALOG; PLUG-INS; Ontologies; Knowledge Bases

So, the tasks of producing an intermediate logical representation from the input query and mapping the intermediate query onto the target KB are independent of each other. A key feature of the architecture is the use of a plug-in mechanism which allows AquaLog to be configured for different knowledge representation languages (currently we use it with our OCML-based knowledge representation infrastructure).

Linguistic Component

Diagram: USER’S SESSION; QUERY INTERFACE; NL QUERY; LINGUISTIC COMPONENT; GATE LIBRARIES (TOKENS, NOUNS, VERBS, PREPS, WH-S, FEATURES); JAPE (TERMS, RELATIONS, WH-TERMS, QUERY-PATTERN-CLASSIFICATION); NL QUERY TRIPLE(s)

AquaLog uses the GATE infrastructure and resources as part of the linguistic component. After the execution of the GATE controller a set of syntactic annotations is returned. The annotations include information about tokens, nouns, prepositions and verbs, and features such as voice and tense for the verbs or categories such as singular/plural for the nouns. We extended the set of annotations returned by GATE by identifying relations, question indicators and patterns or types of questions through the use of JAPE grammars, which consist of a set of phases that run sequentially, each of them defined as a set of rules.

JAPE is a language for writing regular expressions over annotations, and for using patterns matched in this way as the basis for creating more annotations. A regular language can only describe sets of strings (linear sequences of items), not graphs, and GATE’s model of annotations is based on graphs. When there is structure in the graph being matched that requires more than the power of a regular automaton to recognise, JAPE chooses an alternative arbitrarily. However, this is not the bad news that it seems to be, as it turns out that in many useful cases the data stored in annotation graphs in GATE (and other language processing systems) can be regarded as simple sequences, and matched deterministically with regular expressions.
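The idea of matching regular expressions over a sequence of annotations can be sketched in Python. This is not JAPE and does not call GATE: the tag names, the one-letter encoding and the "verb followed by preposition" pattern are assumptions made for the example, and the annotations are written by hand for one of the questions from the slides.

```python
import re
from typing import List, Optional, Tuple

Annotation = Tuple[str, str]  # (token, tag), hand-written here, not GATE output

# Hypothetical one-letter codes so the tag sequence can be matched as a string.
TAG_CODES = {"WH": "W", "VERB": "V", "DET": "D", "NOUN": "N", "PREP": "P"}


def find_relation(annotations: List[Annotation]) -> Optional[List[str]]:
    """Return the tokens of the first verb + preposition pair, if there is one."""
    codes = "".join(TAG_CODES[tag] for _, tag in annotations)
    match = re.search(r"VP", codes)  # a verb immediately followed by a preposition
    if match is None:
        return None
    return [token for token, _ in annotations[match.start():match.end()]]


question = [
    ("What", "WH"), ("are", "VERB"), ("the", "DET"), ("research", "NOUN"),
    ("areas", "NOUN"), ("covered", "VERB"), ("by", "PREP"), ("the", "DET"),
    ("akt", "NOUN"), ("project", "NOUN"),
]
print(find_relation(question))
```

Because each annotation maps to exactly one character, a match position in the encoded string is also a position in the annotation list, which is what lets an ordinary regular expression stand in for a pattern over annotations.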

Linguistic Component

WH-GENERIC TERM: projects? – involved – semantic web; person/organization? – managed (passive) – motta

WH-UNKNOWN TERM: value? – job title – motta

WH-UNKNOWN RELATION:

DESCRIPTION:

AFFIRMATIVE-NEGATIVE:

COMBINATION OF BASIC QUERIES: value? – web address – peter; person/organization? – has interest – semantic web

For instance, to get the relations we exploited the fact that natural language commonly employs a preposition to express a relationship.

Relation Similarity Service

Diagram: LINGUISTIC TRIPLE; RSS; ONTOLOGIES; USER; MECHANISM(S)

The RSS is the backbone of the QA system. It is called after the NL query has been transformed into the triple, and it is responsible for producing an ontology-compliant query. The RSS tries to make sense of the input query by looking at the structure of the ontology (in terms of taxonomies and relationships) and the information stored in the KB. There is no single strategy here, since a query can be interpreted in different ways; the RSS also uses string algorithms, WordNet and a lexicon obtained through a learning mechanism. An important aspect of the RSS is that it is interactive: it will ask the user for help when it is not able to disambiguate.

The relation similarity service

THE PROBLEM: relations/concepts similarities? Translated query vs. ontological structures (dynamic)

Who is the secretary in Kmi?

secretary(person, KMI) -> RSS -> works-in-unit (secretary, knowledge-media-institute)

Research institute

For example, in the question "who is the secretary of kmi" the first step is to identify that kmi is a research institute in the target KB, and that "who" may be a person (sometimes it can be an organization). So, the problem becomes one of finding a relation which links a person (or any of its subclasses, like academics, students, ...) to a research institute. But in this case there is also a successful match for secretary in the KB (in which secretary is not a relation but a concept, a subclass of person), so the problem now is to find relations between secretary (which is more restricted than person) and research institute. We can see the answer in the next screenshot.
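The search for a relation linking two classes through the taxonomy can be sketched as follows. This is a toy illustration under stated assumptions, not the RSS itself: the `SUBCLASS_OF` table, the `RELATIONS` list with its domains and ranges, and the function names are invented for the example; only the class and relation names that appear in the slides (secretary, person, research institute, works-in-unit, has-project-member) are taken from the talk.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy: class -> direct superclass (None for a root class).
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "person": None,
    "secretary": "person",
    "organization": None,
    "research-institute": "organization",
    "project": None,
}

# Hypothetical relations as (name, domain class, range class).
RELATIONS: List[Tuple[str, str, str]] = [
    ("works-in-unit", "person", "organization"),
    ("has-project-member", "project", "person"),
]


def ancestors(cls: str) -> List[str]:
    """The class itself followed by all of its superclasses."""
    chain = []
    current: Optional[str] = cls
    while current is not None:
        chain.append(current)
        current = SUBCLASS_OF[current]
    return chain


def candidate_relations(subject_cls: str, object_cls: str) -> List[str]:
    """Relations whose domain and range cover the two classes or their superclasses."""
    subject_classes = set(ancestors(subject_cls))
    object_classes = set(ancestors(object_cls))
    return [
        name for name, domain, rng in RELATIONS
        if domain in subject_classes and rng in object_classes
    ]


print(candidate_relations("secretary", "research-institute"))
```

Walking up the taxonomy is what lets a relation declared on person and organization apply to the more specific secretary and research institute.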

What are the projects of john domingue

CATEGORY: WH-UNKNOWN TERM

QUERY TRIPLE: VALUE? – PROJECTS – JOHN DOMINGUE

USER’S FEEDBACK REQUIRED!!

PROJECT? – ? – JOHN DOMINGUE

This is a similar case in which projects, instead of being a relation of john domingue, is a concept, so we are in fact looking for an unknown relation between projects and john domingue. There are multiple relations that are possible candidates for interpreting the query, and string matching and lexical resources cannot help, so the user's feedback is required.
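The fallback to the user when several relations remain can be sketched as a small function. This is an assumption about the control flow, not AquaLog's implementation: `choose_relation` and the `ask_user` callback are hypothetical names, and in the example the callback is a stand-in that simply picks the second candidate instead of prompting anyone.

```python
from typing import Callable, List


def choose_relation(candidates: List[str], ask_user: Callable[[List[str]], str]) -> str:
    """Pick a relation automatically if possible, otherwise defer to the user."""
    if not candidates:
        raise ValueError("no candidate relation found in the ontology")
    if len(candidates) == 1:
        return candidates[0]  # unambiguous: no feedback needed
    choice = ask_user(candidates)  # ambiguous: the user decides
    if choice not in candidates:
        raise ValueError(f"unexpected choice: {choice!r}")
    return choice


candidates = ["has-project-member", "has-project-leader"]
# Stand-in for an interactive prompt: always picks the second option.
picked = choose_relation(candidates, ask_user=lambda options: options[1])
print(picked)
```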

ONTO TRIPLE: PROJECTS – HAS-PROJECT-MEMBER (OR) HAS-PROJECT-LEADER – JOHN-DOMINGUE

ANSWER: LIST OF PROJECTS

What are the research areas covered by the akt project?

CATEGORY: WH-GENERIC TERM

QUERY TRIPLE: RESEARCH AREAS – COVERED – AKT

ONTO TRIPLE: RESEARCH-AREA – ADDRESSES-GENERIC-AREA-OF-INTEREST – AKT

SOLUTION: LIST OF RESEARCH AREAS


It is important to emphasize that calling on the user to disambiguate is only done if no information is available in AquaLog to allow the system to disambiguate the query by itself. For instance, let's consider the 2 queries shown in the example. On the left screen we are looking for the homepage of peter and, given that the system is unable to disambiguate between peter scott, sharpe and whalley, the user's feedback is required. However, on the right screen we are asking for the homepage of the peter who has an interest in semantic web; in this case AquaLog doesn't need assistance from the user, given that only one of the three Peters has an interest in semantic web. It also finds that the relation homepage corresponds to has-web-address in the ontology thanks to the use of the lexicon.
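The two "peter" queries can be mimicked with a toy knowledge base. Everything in the data below is an assumption for illustration: the notes do not say which of the three Peters has the interest in semantic web, so the interests assigned here are invented, and `find_people` and `needs_feedback` are hypothetical helper names.

```python
from typing import Dict, List, Optional

# Hypothetical KB entries; the interests are invented for the example.
PEOPLE: List[Dict[str, object]] = [
    {"name": "peter scott", "interests": ["e-learning"]},
    {"name": "peter sharpe", "interests": ["semantic web"]},
    {"name": "peter whalley", "interests": ["multimedia"]},
]


def find_people(first_name: str, interest: Optional[str] = None) -> List[str]:
    """Names matching the first name, optionally narrowed by an interest."""
    matches = [p for p in PEOPLE if first_name in str(p["name"]).split()]
    if interest is not None:
        matches = [p for p in matches if interest in p["interests"]]
    return [str(p["name"]) for p in matches]


def needs_feedback(matches: List[str]) -> bool:
    """The user is only asked when the KB leaves more than one candidate."""
    return len(matches) > 1


left = find_people("peter")                   # "homepage of peter"
right = find_people("peter", "semantic web")  # "... who has interest in semantic web"
print(left, needs_feedback(left))
print(right, needs_feedback(right))
```

The extra constraint in the second query does the disambiguation, so the interactive step is skipped.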

AquaLog: Architecture

Diagram: USER’S SESSION; QUERY INTERFACE; ANSWERING PROCESSING INTERFACE; LINGUISTIC COMPONENT (GATE libraries, configuration files); RELATION SIMILARITY SERVICE (string pattern libraries, configuration files, help RSS-IE modules, interpreter, WordNet thesaurus libraries, ontologies, knowledge bases); QUERY; TRIPLES; ontology-compliant query; ‘raw’ answer; answer; user’s feedback; post-process semantic modules

Evaluation

• Initial study:

- Satisfy the users’ expectations about the range of questions?

- Possible extensions to the ontology and linguistic components?

- 76 questions: no linguistic constraints

- 48.68% of the total were handled correctly

Aktive-reference ontology

LINGUISTIC FAILURE (NLP -> TRIPLE): 69% of errors

DATA MODEL FAILURE (NL too complicated for triples): 0% of errors

RSS FAILURE (Query TRIPLE -> Onto TRIPLE): 7.6% of errors

CONCEPTUAL FAILURE (ontology does not cover query): 10.2% of errors

SERVICE FAILURE (requires ranking and similarity services): 20.5% of errors

We performed an initial study to assess to what extent the AquaLog system, using the KMi KB, satisfied the users' expectations about the range of questions the system should be able to answer, and also to identify the nature of possible extensions to the ontology and linguistic components. We asked 10 members of KMi to ask questions about the lab without providing them with any information about AquaLog's linguistic ability, only about the conceptual coverage of the ontology, and we also asked them not to ask questions which required temporal reasoning. However, it was ok if the questions contained spelling mistakes. We collected in total 76 different questions, which is quite ok considering that no linguistic restriction was imposed on the questions.

Linguistic failure: when the NLP component is unable to generate the intermediate representation (the question can usually be reformulated and answered).

Data model failure: the NL query is too complicated for the intermediate representation. This never occurred; our intuition is that this is not only because triples are a good way to model queries but also because of the ontology-driven nature of AquaLog.

Service failure: requires services to be defined over the ontologies. For instance, "top researchers" or "successful projects" requires a mechanism for ranking.

Examples?
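As a quick sanity check on the figures, the percentages can be turned back into approximate question counts. The counts computed below are implied by the 76 questions from the notes and the percentages on the slide; they are not reported in the talk itself. The failure shares add up to more than 100%, which would be consistent with some questions failing for more than one reason, but that reading is an assumption.

```python
TOTAL_QUESTIONS = 76   # from the notes
HANDLED_SHARE = 48.68  # percent handled correctly, from the slide

# Share of errors per failure type, from the slide (percent).
FAILURE_SHARES = {
    "linguistic": 69.0,
    "data model": 0.0,
    "rss": 7.6,
    "conceptual": 10.2,
    "service": 20.5,
}

handled = round(TOTAL_QUESTIONS * HANDLED_SHARE / 100)
failed = TOTAL_QUESTIONS - handled

# Approximate number of failed questions behind each percentage.
implied_counts = {
    kind: round(failed * share / 100) for kind, share in FAILURE_SHARES.items()
}
share_sum = round(sum(FAILURE_SHARES.values()), 1)

print(handled, failed)
print(implied_counts)
print(share_sum)
```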

AquaLog version 2

- Improved linguistic coverage: which researchers wrote publications related to social aspects?

- Implementing services

- Similarity services: is there a project similar to akt?

- Ranking services: what are the most successful projects?

NEW VERSION TO HANDLE 87% OF THE FAILURES

After the evaluation study, what can be done to improve it? Services?

Current work: Example

Are there any projects about ontologies sponsored by epsrc?

CLAUSE

CATEGORY: WH-GENERIC-1TERM-CLAUSE

QUERY TRIPLE: SPONSORED (PROJECTS, ONTOLOGY, EPSRC)

CATEGORY: WH-UNKNOWN-REL

ONTO TRIPLE: ADDRESSES-GENERIC-AREA-OF-INTEREST (PROJECT?, ONTOLOGIES)

CATEGORY: WH-GENERIC-TERM

ONTO TRIPLE: FUNDING SOURCE (PROJECT?, EPSRC)

SOLUTION (WH-GENERIC-1TERM-CLAUSE): COMBINATION OF LISTS

Get all the projects about ontologies and get all the projects sponsored by epsrc.
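One plausible reading of "combination of lists" for this question is an intersection: keep the projects that appear in both answers. The sketch below assumes that reading; the project names are placeholders, not data from the KMi knowledge base, and `combine_lists` is a hypothetical helper.

```python
from typing import List


def combine_lists(first: List[str], second: List[str]) -> List[str]:
    """Keep the items of the first list that also appear in the second, in order."""
    allowed = set(second)
    return [item for item in first if item in allowed]


# Placeholder answers to the two sub-queries.
about_ontologies = ["project-a", "project-b", "project-c"]
sponsored_by_epsrc = ["project-c", "project-d", "project-a"]

print(combine_lists(about_ontologies, sponsored_by_epsrc))
```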

Current work: Example

Which projects are headed by researchers in akt?

CLAUSE; IS A CLASS!

CATEGORY: WH-3TERM

QUERY TRIPLE: HEADED (PROJECTS, RESEARCHERS, AKT)

CATEGORY: WH-3TERM

ONTO TRIPLE: HAS-PROJECT-MEMBER OR LEADER (PROJECT?, RESEARCHER)

CATEGORY: WH-UNKNOWN-REL

ONTO TRIPLE: HAS-PROJECT-MEMBER OR LEADER (RESEARCHERS, AKT)

SOLUTION (WH-3-TERM – CLAUSE TO THE 2 TERM): GET THE FIRST LIST FOR THE CLAUSE AND GET A LIST FOR EACH OF THE ELEMENTS IN THE LIST
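The two-step solution for "Which projects are headed by researchers in akt?" (first resolve the clause, then run one query per element of the resulting list) can be sketched like this. The two lookup tables and all researcher and project names other than akt are placeholders invented for the example, and the function name is hypothetical.

```python
from typing import Dict, List

# Placeholder KB: project -> members, and project -> leader.
PROJECT_MEMBERS: Dict[str, List[str]] = {
    "akt": ["researcher-a", "researcher-b"],
}
PROJECT_LEADER: Dict[str, str] = {
    "project-x": "researcher-a",
    "project-y": "researcher-c",
    "project-z": "researcher-b",
}


def projects_headed_by_members_of(project: str) -> Dict[str, List[str]]:
    """Step 1: list the researchers in the project. Step 2: one lookup per researcher."""
    researchers = PROJECT_MEMBERS.get(project, [])
    return {
        researcher: [p for p, leader in PROJECT_LEADER.items() if leader == researcher]
        for researcher in researchers
    }


print(projects_headed_by_members_of("akt"))
```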

Conclusion

• Novel RSS Service: combination of pattern matching, lexicon & reasoning about the ontology (taxonomy, relationships).

• Term (Triple): instance/class

• Relation (Triple): relation/class (not necessarily known)

• Linguistic Component: GATE (Sheffield University). Very flexible through the use of patterns: currently around 26 linguistic patterns.

• String algorithms find matches in the ontology for any of the triple terms. Based on a combination of string distance metrics for name matching tasks (open source: Carnegie Mellon University – Pittsburgh)

• Portable: little configuration effort

• Portability across ontologies has to be evaluated
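The string-matching bullet above can be illustrated with a short sketch. It does not use the Carnegie Mellon library mentioned on the slide; Python's standard `difflib.SequenceMatcher` is swapped in as a stand-in similarity measure, and the threshold, the `normalize` step and the function names are assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def normalize(name: str) -> str:
    """Lowercase and treat hyphens and underscores as spaces."""
    return name.lower().replace("-", " ").replace("_", " ")


def best_matches(term: str, ontology_names: List[str],
                 threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Ontology names whose similarity to the term reaches the threshold, best first."""
    scored = []
    for name in ontology_names:
        score = SequenceMatcher(None, normalize(term), normalize(name)).ratio()
        if score >= threshold:
            scored.append((name, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


names = ["research-area", "has-project-member", "works-in-unit"]
matches = best_matches("research areas", names)
print(matches[0][0])
```

A real system would likely combine several metrics, as the slide says; a single ratio with a fixed threshold is only the simplest possible version.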


End

Thanks for your attention
