Controlled Language for Ontology Editing
Adam Funk, Valentin Tablan, Kalina Bontcheva, Hamish Cunningham,
Brian Davis, Siegfried Handschuh
University of Sheffield NLP
Purpose
• To provide a controlled language for basic ontology-editing (and later, querying) functions:
  – easy to learn from examples and simple rules
  – relatively easy to deploy (Java, GATE)
  – unambiguous
  – compact (e.g., create many classes or instances with one sentence)
  – natural but grammatically lax
Implementation
• Developed and tested in the GATE GUI, but deployable as a service
• GATE application using text as input to modify an ontology
• Based partly on standard NLP components and modified IE components, with manipulation of the GATE ontology API
Syntax
• Quoted chunks: words in pairs of single or double quotes
• Keyphrases: identified and tagged by the gazetteer (is, are: Copula; is a: InstanceOf; forget: Negate)
• Prepositions and determiners: POS-tagged
• Chunks: everything else
• ChunkLists: one or more chunks separated by and or commas
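As a rough illustration, the chunking scheme above can be sketched in a few lines of Python. This is not the actual implementation (which, as the slides say, uses GATE's gazetteer and POS tagger); the regular expression and the longest-match-first keyphrase lookup are assumptions made purely for the sketch.

```python
import re

# A small subset of the keyphrases the slides mention, mapped to their tags.
KEYPHRASES = {"is a": "InstanceOf", "is": "Copula", "are": "Copula", "forget": "Negate"}

def tag_sentence(sentence):
    """Split a CLOnE-style sentence into quoted chunks, keyphrases, and
    plain chunks. Simplified sketch only; the real system uses GATE
    components, not this regex."""
    tokens = re.findall(r"'[^']*'|\"[^\"]*\"|[\w-]+|[.,]", sentence)
    tagged = []
    i = 0
    while i < len(tokens):
        two = " ".join(tokens[i:i + 2]).lower()
        one = tokens[i].lower()
        if two in KEYPHRASES:                      # longest match first ("is a")
            tagged.append((two, KEYPHRASES[two])); i += 2
        elif one in KEYPHRASES:
            tagged.append((one, KEYPHRASES[one])); i += 1
        elif tokens[i][0] in "'\"":                # quoted chunk
            tagged.append((tokens[i].strip("'\""), "QuotedChunk")); i += 1
        elif tokens[i] in ".,":
            tagged.append((tokens[i], "Punct")); i += 1
        else:
            tagged.append((tokens[i], "Chunk")); i += 1
    return tagged

print(tag_sentence("Alice is a person."))
```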
Syntax and semantics
• 10 syntactic rules
• Some have up to three semantic rules; CLOnE refers to the ontology to select one deterministically
• Create and delete classes, subclass relations and instances
• Create and instantiate datatype and object properties
Syntax and semantics
• Rule: ChunkList0 InstanceOf Chunk1 "."
• Example: Alice Jones and Bob Smith are persons.
• Semantics: If Chunk1 names a class, create instances of it. Otherwise return an error message.
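The semantics of this rule can be sketched against a toy in-memory ontology. The dictionary below stands in for the GATE ontology API that the real implementation manipulates, and the naive singularisation is an assumption for the sketch, not part of CLOnE.

```python
# Toy "ontology": class name -> set of instance names.
ontology = {"person": set(), "document": set()}

def instance_of(chunk_list, class_chunk):
    """Semantics of: ChunkList0 InstanceOf Chunk1 "."
    If Chunk1 names a known class, create an instance of it for each
    chunk; otherwise return an error message, as the slide describes."""
    cls = class_chunk.rstrip("s").lower()   # naive singularisation (assumption)
    if cls not in ontology:
        return f"Error: '{class_chunk}' does not name a class."
    for name in chunk_list:
        ontology[cls].add(name)
    return f"Created {len(chunk_list)} instance(s) of '{cls}'."

print(instance_of(["Alice Jones", "Bob Smith"], "persons"))
```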
Syntax and semantics
• Rule: ChunkList0 Copula Chunk Prep ChunkList1 "."
• Examples: Persons are authors of documents. Carl Pollard and Ivan Sag are authors of 'Head-Driven Phrase-Structure Grammar'.
• Flexible semantics:
  – Create a property between two classes.
  – Instantiate a suitable property between two instances.
  – Return an error message (mixed classes and instances, or a chunk that can't be dereferenced).
Syntax and semantics
• Rule: Negate ChunkList “.”
• Example: Forget projects, journals and 'Department of Computer Science'.
• Semantics: Delete each class or instance in the list.
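A minimal sketch of the Negate rule's semantics, again on a toy stand-in for the ontology (the sets, the singularisation, and the error wording are assumptions for illustration):

```python
# Toy ontology: names of classes and instances currently defined.
classes = {"project", "journal"}
instances = {"Department of Computer Science"}

def forget(chunk_list):
    """Semantics of: Negate ChunkList "."
    Delete each class or instance named in the list; report anything
    that cannot be dereferenced."""
    errors = []
    for name in chunk_list:
        key = name.rstrip("s").lower()      # naive singularisation (assumption)
        if key in classes:
            classes.discard(key)
        elif name in instances:
            instances.discard(name)
        else:
            errors.append(f"Error: cannot dereference '{name}'.")
    return errors

forget(["projects", "journals", "Department of Computer Science"])
# classes and instances are now both empty
```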
Evaluation
• Pre-test questionnaire to let users rate their own knowledge of ontologies and CLs
• Short manual on ontologies and both tools
• Two progressive lists of 6 simple tasks, A & B: CLOnE A then Protégé B, or Protégé A then CLOnE B
• SUS and SUS-based questionnaires
Evaluation
• “Repeated-measures, task-based” evaluation of CLOnE in comparison with Protégé
• Sample size = 15 (sufficient for SUS)
• Evenly split by task-tool association and tool order
Evaluation
• 95% confidence intervals of SUS scores (SUS baseline is 65 to 70%)
Evaluation: correlations
• Pre-test score has no correlation with task times or SUS results.
• Correlations between C/P, CLOnE SUS and Protégé SUS show coherence of the set of questionnaires.
Evaluation: correlations
• Task times for both tools are moderately correlated with each other, but not with SUS values.
• Both tools are technically suitable for both tasks.
• We do not claim that CLOnE is faster for simple tasks, just that users prefer it.
Evaluation: sample quality
• Sample is sufficient for SUS evaluation
• Sample quality according to task-tool association, tool order, and subject type?
Evaluation: sample quality
• SUS values for both tools were slightly lower for task list B: waning interest as the evaluation progressed
• Similar task times for A & B: similar effort required (in any case, the task-tool association was almost evenly split)
• Consistent SUS and C/P values between groups G and NG
Continuing work
• Bugfixes, technical improvements
• Better error messages
• Support for distinct string, date and numeric datatypes
• Development of CLOnE-QL query language
• Implementation of a web-service for question-answering from an ontology
Acknowledgements
• KnowledgeWeb (EU Network of Excellence IST-2004-507482)
• TAO (EU FP6 project IST-2004-026460)
• SEKT (EU FP6 project IST IP-2003-506826)
• Líon (Science Foundation Ireland project SFI/02/CE1/1131)
• NEPOMUK (EU project FP6-027705)
Evaluation summary
Questionnaire CIs
A data sample’s 95% confidence interval is a range 95% likely to contain the mean of the whole population that the sample represents.
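The definition above can be made concrete with a small computation. The sketch below uses the normal approximation (z = 1.96) and made-up scores; for a sample of 15, as in the study, Student's t distribution would strictly be more appropriate.

```python
import math

def confidence_interval_95(sample):
    """95% confidence interval for the population mean, using the
    normal approximation (z = 1.96). Illustrative only; the scores
    below are invented, not the study's SUS data."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)   # sample variance
    half_width = 1.96 * math.sqrt(var / n)
    return mean - half_width, mean + half_width

lo, hi = confidence_interval_95([72, 65, 80, 68, 75, 70, 78, 66, 74, 69])
print(f"95% CI: ({lo:.1f}, {hi:.1f})")
```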
Correlation coefficients
• +1 = perfect correlation (a straight ascending line on a scatter plot)
• +0.7 = strong correlation
• 0 = no correlation (random scatter plot)
• -0.7 = strong negative correlation
• -1 = perfect negative correlation
Correlation coefficients
• Pearson's formula assumes that the two variables are linearly meaningful; especially suitable for physical measurements
• Spearman's formula assumes only that they are ordinally meaningful (ranking); suitable for subjective measures such as many in social sciences
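The distinction can be illustrated with a monotonic but non-linear relationship, where Spearman's coefficient is exactly 1 while Pearson's falls below 1. These are minimal sketches (the rank helper assumes no tied values):

```python
def pearson(xs, ys):
    """Pearson's r: linear correlation between two numeric variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the ranks, so only the
    ordering matters (hence its suitability for subjective scores)."""
    def ranks(vs):
        order = sorted(range(len(vs)), key=lambda i: vs[i])
        r = [0.0] * len(vs)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    return pearson(ranks(xs), ranks(ys))

# Monotonic but non-linear: Spearman sees a perfect relationship.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(round(spearman(xs, ys), 3))   # 1.0
print(round(pearson(xs, ys), 3))    # below 1.0
```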
Sample quality
Subsequent improvements
• Better handling of punctuation inside quoted chunks
• A catch-all syntactic rule that produces an error message for unparseable sentences
• Support for different datatypes: string, date, numeric
• Better unit-testing
• Embedded in web-service
Subsequent improvements
• Use the features of the new GATE ontology API for more efficient dereferencing of names and RDF-friendly handling of synonyms
• Web-application using CLOnE-QL for question answering
• Better documentation of the input language