Controlled Language for Ontology Editing
Adam Funk, Valentin Tablan, Kalina Bontcheva, Hamish Cunningham,
Brian Davis, Siegfried Handschuh
University of Sheffield NLP
Purpose
• To provide a controlled language for basic ontology-editing (and later, querying) functions:
  – easy to learn from examples and simple rules
  – relatively easy to deploy (Java, GATE)
  – unambiguous
  – compact (e.g., create many classes or instances with one sentence)
  – natural but grammatically lax
Implementation
• Developed and tested in the GATE GUI, but deployable as a service
• GATE application using text as input to modify an ontology
• Based partly on standard NLP components and modified IE components, with manipulation of the GATE ontology API
Syntax
• Quoted chunks: words in pairs of single or double quotes
• Keyphrases: identified and tagged by the gazetteer (is, are: Copula; is a: InstanceOf; forget: Negate)
• Prepositions and determiners: POS-tagged
• Chunks: everything else
• ChunkLists: one or more chunks separated by and or commas
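As a rough illustration, the chunking scheme above can be sketched in a few lines of Python. This is not the actual implementation (which, as the slides say, uses GATE's gazetteer and POS tagger); the regular expression and the longest-match-first keyphrase lookup are assumptions made purely for the sketch.

```python
import re

# A small subset of the keyphrases the slides mention, mapped to their tags.
KEYPHRASES = {"is a": "InstanceOf", "is": "Copula", "are": "Copula", "forget": "Negate"}

def tag_sentence(sentence):
    """Split a CLOnE-style sentence into quoted chunks, keyphrases, and
    plain chunks. Simplified sketch only; the real system uses GATE
    components, not this regex."""
    tokens = re.findall(r"'[^']*'|\"[^\"]*\"|[\w-]+|[.,]", sentence)
    tagged = []
    i = 0
    while i < len(tokens):
        two = " ".join(tokens[i:i + 2]).lower()
        one = tokens[i].lower()
        if two in KEYPHRASES:                      # longest match first ("is a")
            tagged.append((two, KEYPHRASES[two])); i += 2
        elif one in KEYPHRASES:
            tagged.append((one, KEYPHRASES[one])); i += 1
        elif tokens[i][0] in "'\"":                # quoted chunk
            tagged.append((tokens[i].strip("'\""), "QuotedChunk")); i += 1
        elif tokens[i] in ".,":
            tagged.append((tokens[i], "Punct")); i += 1
        else:
            tagged.append((tokens[i], "Chunk")); i += 1
    return tagged

print(tag_sentence("Alice is a person."))
```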
Syntax and semantics
• 10 syntactic rules
• Some have up to three semantic rules; CLOnE refers to the ontology to select one deterministically
• Create and delete classes, subclass relations and instances
• Create and instantiate datatype and object properties
Syntax and semantics
• Rule: ChunkList0 InstanceOf Chunk1 "."
• Example: Alice Jones and Bob Smith are persons.
• Semantics: If Chunk1 names a class, create instances of it. Otherwise return an error message.
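The semantics of this rule can be sketched against a toy in-memory ontology. The dictionary below stands in for the GATE ontology API that the real implementation manipulates, and the naive singularisation is an assumption for the sketch, not part of CLOnE.

```python
# Toy "ontology": class name -> set of instance names.
ontology = {"person": set(), "document": set()}

def instance_of(chunk_list, class_chunk):
    """Semantics of: ChunkList0 InstanceOf Chunk1 "."
    If Chunk1 names a known class, create an instance of it for each
    chunk; otherwise return an error message, as the slide describes."""
    cls = class_chunk.rstrip("s").lower()   # naive singularisation (assumption)
    if cls not in ontology:
        return f"Error: '{class_chunk}' does not name a class."
    for name in chunk_list:
        ontology[cls].add(name)
    return f"Created {len(chunk_list)} instance(s) of '{cls}'."

print(instance_of(["Alice Jones", "Bob Smith"], "persons"))
```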
Syntax and semantics
• Rule: ChunkList0 Copula Chunk Prep ChunkList1 "."
• Examples: Persons are authors of documents. Carl Pollard and Ivan Sag are authors of 'Head-Driven Phrase-Structure Grammar'.
• Flexible semantics:
  – Create a property between two classes.
  – Instantiate a suitable property between two instances.
  – Return an error message (mixed classes and instances, or a chunk that can't be dereferenced).
Syntax and semantics
• Rule: Negate ChunkList “.”
• Example: Forget projects, journals and 'Department of Computer Science'.
• Semantics: Delete each class or instance in the list.
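A minimal sketch of the Negate rule's semantics, again on a toy stand-in for the ontology (the sets, the singularisation, and the error wording are assumptions for illustration):

```python
# Toy ontology: names of classes and instances currently defined.
classes = {"project", "journal"}
instances = {"Department of Computer Science"}

def forget(chunk_list):
    """Semantics of: Negate ChunkList "."
    Delete each class or instance named in the list; report anything
    that cannot be dereferenced."""
    errors = []
    for name in chunk_list:
        key = name.rstrip("s").lower()      # naive singularisation (assumption)
        if key in classes:
            classes.discard(key)
        elif name in instances:
            instances.discard(name)
        else:
            errors.append(f"Error: cannot dereference '{name}'.")
    return errors

forget(["projects", "journals", "Department of Computer Science"])
# classes and instances are now both empty
```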
Evaluation
• Pre-test questionnaire to let users rate their own knowledge of ontologies and CLs
• Short manual on ontologies and both tools
• Two progressive lists of 6 simple tasks, A & B: CLOnE A then Protégé B, or Protégé A then CLOnE B
• SUS and SUS-based questionnaires
Evaluation
• “Repeated-measures, task-based” evaluation of CLOnE in comparison with Protégé
• Sample size = 15 (sufficient for SUS)
• Evenly split by task-tool association and tool order
Evaluation
• 95% confidence intervals of SUS scores (SUS baseline is 65 to 70%)
Evaluation: correlations
• Pre-test score has no correlation with task times or SUS results.
• Correlations between C/P, CLOnE SUS and Protégé SUS show coherence of the set of questionnaires.
Evaluation: correlations
• Task times for both tools are moderately correlated with each other, but not with SUS values.
• Both tools are technically suitable for both tasks.
• We do not claim that CLOnE is faster for simple tasks, just that users prefer it.
Evaluation: sample quality
• Sample is sufficient for SUS evaluation
• Sample quality according to task-tool association, tool order, and subject type?
Evaluation: sample quality
• SUS values for both tools were slightly lower for task list B: waning interest as the evaluation progressed
• Similar task times for A & B: similar effort required (in any case, the task-tool association was almost evenly split)
• Consistent SUS and C/P values between groups G and NG
Continuing work
• Bugfixes, technical improvements
• Better error messages
• Support for distinct string, date and numeric datatypes
• Development of CLOnE-QL query language
• Implementation of a web-service for question-answering from an ontology
Acknowledgements
• KnowledgeWeb (EU Network of Excellence IST-2004-507482)
• TAO (EU FP6 project IST-2004-026460)
• SEKT (EU FP6 project IST IP-2003-506826)
• Líon (Science Foundation Ireland project SFI/02/CE1/1131)
• NEPOMUK (EU project FP6-027705)
Evaluation summary
Questionnaire CIs
A data sample’s 95% confidence interval is a range 95% likely to contain the mean of the whole population that the sample represents.
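The definition above can be made concrete with a small computation. The sketch below uses the normal approximation (z = 1.96) and made-up scores; for a sample of 15, as in the study, Student's t distribution would strictly be more appropriate.

```python
import math

def confidence_interval_95(sample):
    """95% confidence interval for the population mean, using the
    normal approximation (z = 1.96). Illustrative only; the scores
    below are invented, not the study's SUS data."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
    half_width = 1.96 * math.sqrt(var / n)
    return mean - half_width, mean + half_width

lo, hi = confidence_interval_95([72, 65, 80, 68, 75, 70, 78, 66, 74, 69])
print(f"95% CI: ({lo:.1f}, {hi:.1f})")
```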
Correlation coefficients
• +1 = perfect correlation (a straight ascending line on a scatter plot)
• +0.7 = strong correlation
• 0 = no correlation (random scatter plot)
• -0.7 = strong negative correlation
• -1 = perfect negative correlation
Correlation coefficients
• Pearson's formula assumes that the two variables are linearly meaningful; especially suitable for physical measurements
• Spearman's formula assumes only that they are ordinally meaningful (ranking); suitable for subjective measures such as many in social sciences
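The distinction can be illustrated with a monotonic but non-linear relationship, where Spearman's coefficient is exactly 1 while Pearson's falls below 1. These are minimal sketches (the rank helper assumes no tied values):

```python
def pearson(xs, ys):
    """Pearson's r: linear correlation between two numeric variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks, so only the
    ordering matters (hence its suitability for subjective scores)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

# Monotonic but non-linear: Spearman sees a perfect relationship.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(round(spearman(xs, ys), 3))   # 1.0
print(round(pearson(xs, ys), 3))    # below 1.0
```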
Sample quality
Subsequent improvements
• Better handling of punctuation inside quoted chunks
• A catch-all syntactic rule that produces an error message for unparseable sentences
• Support for different datatypes: string, date, numeric
• Better unit-testing
• Embedded in web-service
Subsequent improvements
• Use the features of the new GATE ontology API for more efficient dereferencing of names and RDF-friendly handling of synonyms
• Web-application using CLOnE-QL for question answering
• Better documentation of the input language