Natural Language Interfaces to Ontologies LarKC PhD Symposium, Beijing, 14 November 2010 Danica Damljanović University of Sheffield danica@dcs.shef.ac.uk


Page 1:

Natural Language Interfaces to Ontologies

LarKC PhD Symposium, Beijing, 14 November 2010

Danica Damljanović, University of Sheffield

danica@dcs.shef.ac.uk

Page 2:

What are Natural Language Interfaces to Ontologies?

Page 3:

Customisation

[Diagram labels: Ontology editing (e.g. using Protégé); Domain lexicon; NLI for querying; Domain knowledge; WordNet; Domain expert; Ontology engineer; NLI for ontology authoring]

Page 4:

The Objective

• Increase the usability of Natural Language Interfaces to ontologies
– For end users: increase precision and recall
– For application developers: decrease the time for customisation

Page 5:

Previous Work: QuestIO


Page 6:

But...

• Ontologies are not perfect:
– ontology lexicalisations are often missing or too many
– ranking based on ontology structure might be misleading
• Encouraging users to use keywords might be misleading
• User evaluation:
– defined tasks: user satisfaction reaching 90%
– undefined tasks: user satisfaction low (~44%)

Page 7:

FREyA - Feedback, Refinement, Extended Vocabulary Aggregator

• Feedback: showing the user the system's interpretation of the query
• Refinement:
– resolving ambiguity: generating a dialog whenever one term refers to more than one concept in the ontology (precision)
• Extended Vocabulary:
– expressiveness: generating a dialog whenever an "unknown" term appears in the question (recall)
– portability: no need for customisation by application developers
• The dialog:
– is generated by combining syntactic parsing and ontology-based lookup
– learns from the user's selections

Page 8:

FREyA Workflow

• Potential Ontology Concept (POC)
• Ontology Concept (OC)

[Workflow diagram labels: NL query; POCs; OCs; Identify the Answer Type; Answer Type; triples; SPARQL; answer; learn]

Page 9:

Find Potential Ontology Concepts


Page 10:

Finding Ontology Concepts

Page 11:

Mapping POC to OCs: Ambiguities

[Diagram labels: new york; population; POC (twice); geo:City; geo:State (twice); geo:cityPopulation]

Page 12:

New York is a city

Page 13:

New York is a state

Page 14:

Ambiguous Lexicon

IF: POC, OC (context) | THEN: candidate OC, function

new york geo:State -

new york geo:City -

population geo:State geo:statePopulation -

population geo:City geo:cityPopulation -
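The lexicon rows above can be read as IF/THEN rules: if a POC appears in a given OC context, then propose a candidate OC (and optionally a function). The following is a minimal Python sketch of such a lookup, not FREyA's actual implementation. The names LexiconRule, candidates and needs_dialog are hypothetical, and the sketch assumes that the two "new york" rows have an empty context column.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LexiconRule:
    poc: str                        # term from the question (Potential Ontology Concept)
    context: Optional[str]          # neighbouring Ontology Concept, or None if any context fits
    candidate: Optional[str]        # Ontology Concept proposed for the POC
    function: Optional[str] = None  # e.g. "max" or "min"; None where the slide shows "-"


# Rows from the "Ambiguous Lexicon" slide.
# Assumption: the "new york" rows carry no context.
RULES = [
    LexiconRule("new york", None, "geo:State"),
    LexiconRule("new york", None, "geo:City"),
    LexiconRule("population", "geo:State", "geo:statePopulation"),
    LexiconRule("population", "geo:City", "geo:cityPopulation"),
]


def candidates(poc: str, context: Optional[str] = None,
               rules: List[LexiconRule] = RULES) -> List[Optional[str]]:
    """Return the candidate OCs of all rules whose IF part matches."""
    poc = poc.lower()
    return [r.candidate for r in rules
            if r.poc == poc and (r.context is None or r.context == context)]


def needs_dialog(poc: str, context: Optional[str] = None,
                 rules: List[LexiconRule] = RULES) -> bool:
    """A term is ambiguous, and a dialog is needed, when several rules fire."""
    return len(candidates(poc, context, rules)) > 1


print(candidates("New York"))                  # both geo:State and geo:City fire
print(candidates("population", "geo:City"))    # only the city-specific property fires
```

In this sketch "new york" triggers a dialog because two rules match, while "population" becomes unambiguous once the user has chosen geo:City or geo:State as its context.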

Page 15:

The User Controls the Output

[Diagram labels: POC (three times); state; area; point; geo:State; geo:stateArea; geo:isLowestPointOf; geo:LoPoint; geo:loElevation; max; min]

Page 16:

WHAT IS THE LOWEST POINT OF THE STATE WITH THE LARGEST AREA?

TRIPLES:
?firstJoker - geo:isLowestPointOf - geo:State
geo:State - (max) geo:stateArea - ?lastJoker

SPARQL:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?firstJoker ?p0 ?c1 ?p2 ?lastJoker
where {
  { { ?c1 ?p0 ?firstJoker } UNION { ?firstJoker ?p0 ?c1 } .
    filter (?p0=<http://www.mooney.net/geo#isLowestPointOf>) . }
  ?c1 rdf:type <http://www.mooney.net/geo#State> .
  ?c1 ?p2 ?lastJoker .
  filter (?p2=<http://www.mooney.net/geo#stateArea>) .
}
ORDER BY DESC(xsd:double(?lastJoker))

Page 17:

WHAT IS THE LOWEST POINT OF THE STATE WITH THE LARGEST AREA?

TRIPLES:
?firstJoker - (min) geo:loElevation - geo:LoPoint
geo:LoPoint - ?joker3 - geo:State
geo:State - (max) geo:stateArea - ?lastJoker

SPARQL:
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>
select ?firstJoker ?p0 ?c1 ?joker3 ?c2 ?p3 ?lastJoker
where {
  ?c1 ?p0 ?firstJoker .
  filter (?p0=<http://www.mooney.net/geo#loElevation>) .
  ?c1 rdf:type <http://www.mooney.net/geo#LoPoint> .
  { { ?c2 ?joker3 ?c1 } UNION { ?c1 ?joker3 ?c2 } }
  ?c2 rdf:type <http://www.mooney.net/geo#State> .
  ?c2 ?p3 ?lastJoker .
  filter (?p3=<http://www.mooney.net/geo#stateArea>) .
}
ORDER BY ASC(xsd:double(?firstJoker)) DESC(xsd:double(?lastJoker))

The answer for both interpretations is Death Valley.
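In the two queries above, a (max) function on a triple element shows up as a DESC sort key and a (min) function as an ASC sort key, each cast with xsd:double. The following Python sketch reproduces only that step. The helper name order_by_clause and its input format are assumptions made for illustration, not part of FREyA.

```python
from typing import List, Tuple

# Mapping observed in the slides: (max) -> DESC, (min) -> ASC.
DIRECTION = {"max": "DESC", "min": "ASC"}


def order_by_clause(functions: List[Tuple[str, str]]) -> str:
    """Build an ORDER BY clause from (function, variable) pairs, in query order.

    Returns an empty string when no function was selected.
    """
    keys = [f"{DIRECTION[fn]}(xsd:double({var}))" for fn, var in functions]
    return "ORDER BY " + " ".join(keys) if keys else ""


# First interpretation: only the state area carries a function.
print(order_by_clause([("max", "?lastJoker")]))
# Second interpretation: lowest elevation first, then largest area.
print(order_by_clause([("min", "?firstJoker"), ("max", "?lastJoker")]))
```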

Page 18:

New Lexicon

IF: POC, OC (context) | THEN: candidate OC, function

area geo:State geo:stateArea -

largest geo:stateArea geo:stateArea max

point geo:State geo:LoPoint -

lowest geo:LoPoint geo:loElevation min

lowest geo:isLowestPointOf - -
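Rows such as these are added to the lexicon as the user makes selections in the dialog. As a rough illustration of how remembered selections could reorder later suggestions, here is a minimal count-based Python sketch. It is a stand-in for illustration only and is not the learning model used in FREyA; the class SelectionMemory and its methods are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, Optional[str], str]  # (POC, OC context, chosen candidate OC)


class SelectionMemory:
    """Remember which candidate the user picked for a POC in a given context."""

    def __init__(self) -> None:
        self._counts: Dict[Key, int] = {}

    def record(self, poc: str, context: Optional[str], chosen: str) -> None:
        """Store one user selection made in a dialog."""
        key = (poc, context, chosen)
        self._counts[key] = self._counts.get(key, 0) + 1

    def rank(self, poc: str, context: Optional[str], options: List[str]) -> List[str]:
        """Order options by how often they were chosen; ties keep their input order."""
        return sorted(options,
                      key=lambda oc: -self._counts.get((poc, context, oc), 0))


memory = SelectionMemory()
options = ["geo:HiPoint", "geo:LoPoint"]
print(memory.rank("point", "geo:State", options))   # nothing learned yet: order unchanged
memory.record("point", "geo:State", "geo:LoPoint")  # the user picks geo:LoPoint
print(memory.rank("point", "geo:State", options))   # geo:LoPoint now comes first
```

A counter like this cannot tell "lowest point" from "highest point", since both share the POC "point" and the context geo:State.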

Page 19:

Learning

Page 20:

ESWC 2010

FREyA: a Natural Language Interface to Ontologies

03 June 2010

http://gate.ac.uk/freya

Page 21:

Evaluation: correctness

Mooney GeoQuery dataset, 250 questions

34 answered with no dialog; 14 failed to be answered

Precision = recall = 94.4%
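The reported figure can be checked from the counts on this slide. The short Python calculation below assumes that every question not listed as failed was answered correctly, and that the system attempted all 250 questions, which would explain why precision and recall coincide.

```python
total_questions = 250   # Mooney GeoQuery questions used in the evaluation
failed = 14             # questions that failed to be answered
no_dialog = 34          # questions answered without any dialog

# Assumption: everything that did not fail counts as correct.
correct = total_questions - failed
precision = correct / total_questions   # all questions attempted
recall = correct / total_questions      # all questions have an answer

# Correctly answered questions that needed at least one dialog.
with_dialog = correct - no_dialog

print(f"correct: {correct}, precision = recall = {precision:.1%}")
print(f"answered correctly with dialog: {with_dialog}")
```

Under these assumptions the number of correctly answered questions that required a dialog comes out at 202, the same figure used for the learning evaluation.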

Page 22:

Evaluation: Learning

• 10-fold cross-validation on the 202 Mooney GeoQuery questions that could be correctly mapped into SPARQL and required a dialog
• Performance improved from 0.25 to 0.48
• Errors: ambiguity and sparseness

Page 23:

Evaluation: Ranking

Mean Reciprocal Rank: 0.76 (default ranking based on string similarity and synonym detection)
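Mean Reciprocal Rank averages 1/rank over all dialogs, where rank is the position of the correct suggestion in the list shown to the user. The Python sketch below shows the calculation. The ranks in the example are made up for illustration and are not taken from the evaluation.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank, where each rank is the 1-based position of the correct option."""
    if not ranks:
        raise ValueError("at least one rank is required")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical dialogs: the correct option was shown 1st, 2nd, 1st and 4th.
example_ranks = [1, 2, 1, 4]
print(mean_reciprocal_rank(example_ranks))  # (1 + 0.5 + 1 + 0.25) / 4 = 0.6875
```

An MRR of 1.0 would mean the correct option was always ranked first.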

Page 24:

Learning the Correct Ranking

Randomly selected 103 dialogs from the 202 questions (343 dialogs in total)

MRR increased by 0.06, from 0.72 to 0.78

Page 25:

Evaluation: Answer Type

[Chart: Answer Type. Categories: Correct (1 dialog); Correct (no dialog); Incorrect. Values shown: 45.60%, 53.20%, 0.01%]

Page 26:

Evaluation: Customisation

• Small empirical evaluation with 1 subject who was not familiar with ontologies and NLP
• No training, short introduction to the domain
• 17 questions asked in total; 3 were cancelled by the user during one of the dialogs
• Of the remaining 14 questions, 78.57% (11) were correctly answered
• 21.43% (3) failed or were incorrectly answered

Page 27:

Conclusion

• Combining syntactic parsing with ontology-based lookup through user interaction can increase the precision and recall of NLIs to ontologies,

• while reducing the time for customisation by shifting it from application developers to end users.

Page 28:

Next steps

• Improvement of the learning model to avoid errors due to ambiguities
– point -> geo:HiPoint or geo:LoPoint
• Using the lexicon to improve other systems

Page 29:

More information...

• D. Damljanovic, M. Agatonovic, H. Cunningham: Natural Language Interfaces to Ontologies: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction. In Proceedings of the 7th Extended Semantic Web Conference (ESWC 2010), Springer Verlag, Heraklion, Greece, May 31-June 3, 2010.

• D. Damljanovic, M. Agatonovic, H. Cunningham: Identification of the Question Focus: Combining Syntactic Analysis and Ontology-based Lookup through the User Interaction. In Proceedings of the 7th Language Resources and Evaluation Conference (LREC 2010), ELRA 2010, La Valletta, Malta, May 17-23, 2010.

• D. Damljanovic: Towards portable controlled natural languages for querying ontologies. In Rosner, M., Fuchs, N., eds.: Proceedings of the 2nd Workshop on Controlled Natural Language. Lecture Notes in Computer Science. Springer Berlin/Heidelberg, Marettimo Island, Sicily (September 2010)