A System for A Semi-Automatic Ontology Annotation Kiril Simov, Petya Osenova, Alexander Simov,...
Date posted: 20-Dec-2015
A System for A Semi-Automatic Ontology Annotation
Kiril Simov, Petya Osenova, Alexander Simov, Anelia Tincheva, Borislav Kirilov
BulTreeBank Group
LML, IPP, BAS
CALP 2007, RANLP, Borovets
Outline of the Talk
Motivation
Requirements for the system
Parameters of semantic annotation
– General overview
– Problematic issues
CLaRK System
– Basic architecture in brief
– The new functionalities
Conclusions
Motivation (1)
The creation of automatic systems for semantic annotation needs:
– Reliably annotated corpora with semantic information = gold standard data
Motivation (2)
Semantic annotation requires various types of support:
– an appropriate source of semantic information (domain ontology)
– comprehensive annotation guidelines
– a system to support semi-automatic creation of such corpora (CLaRK)
Motivation (3)
The annotation process follows two steps:
– chunk annotation: identification of the text segment which represents a given concept or relation in the text
– concept selection: a chunk might represent more than one concept or relation depending on the context
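The two steps can be sketched in a few lines of Python. This is only an illustration, not the CLaRK implementation: the toy lexicon, the `Chunk` class and both function names are hypothetical, and the English surface forms are assumed for the example. The `choose` callback stands in for the human annotator.

```python
import re
from dataclasses import dataclass

# Hypothetical toy lexicon: surface term -> candidate concepts.
LEXICON = {
    "link": ["Connection", "Hyperlink"],
    "alphanumeric display": ["AlphanumericDisplay"],
}

@dataclass
class Chunk:
    start: int
    end: int
    text: str
    candidates: list

def annotate_chunks(text):
    """Step 1 (chunk annotation): find segments that may carry a concept."""
    # Longest terms first, so multi-word terms win over their substrings.
    terms = sorted(LEXICON, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        Chunk(m.start(), m.end(), m.group(0), LEXICON[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]

def select_concept(chunk, choose):
    """Step 2 (concept selection): one candidate is taken as is,
    several candidates are handed to `choose` (the annotator)."""
    if len(chunk.candidates) == 1:
        return chunk.candidates[0]
    return choose(chunk)
```

Only the ambiguous chunk ("link") reaches the annotator; the unambiguous one is resolved automatically, which is the semi-automatic part.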
Motivation (4)
We follow the ideas of Erdmann et al. 2000 that the manual (or semi-automatic) semantic annotation is a cyclic process mixing:
– the actual annotation, and
– the evolution of the ontology
In our case we also include the lexicon and the concept annotation grammar in the process of concurrent development.
Support requirements for the system (1)
Search for a text segment: helps the annotator to determine the exact segment of text which is the carrier of the concept or relation from the ontology
Concept selection: determines which concept/relation is to be added to the annotation of the corresponding text segment
Support requirements for the system (2)
Ontology evolution: updates the ontology in the following cases:
– a new concept/relation is necessary for the annotation of a text segment
– an existing concept needs to be changed in order to be more precise
Lexicon/grammar evolution: updates them when:
– there are changes in the ontology
– there are new expressions for already existing concepts/relations
Support requirements for the system (3)
Annotation evolution: after changes in the ontology and/or the lexicon/grammar it is necessary to update the previously made annotations
In the implementation of these functionalities we follow the requirements for a semantic annotation system as stated in Uren et al. 2006
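Annotation evolution can be pictured as propagating ontology changes over the existing annotations. The sketch below is an assumption about how this could look, not the mechanism used in CLaRK: `propagate_changes`, the dictionary layout and the concept names `Link`, `Screen` and `Display` are all hypothetical. A one-to-one change is applied automatically, while a concept that was split is returned to the annotator.

```python
def propagate_changes(annotations, changes):
    """Update earlier annotations after the ontology has changed.

    annotations: list of dicts with at least a "concept" key.
    changes: old concept name -> list of replacement concept names
             (hypothetical format for this sketch).
    Returns (updated, needs_review).
    """
    updated, needs_review = [], []
    for ann in annotations:
        replacements = changes.get(ann["concept"])
        if replacements is None:
            # The concept was not touched by the ontology change.
            updated.append(ann)
        elif len(replacements) == 1:
            # Unambiguous change: rewrite the annotation automatically.
            updated.append({**ann, "concept": replacements[0]})
        else:
            # The concept was split or removed: a human has to decide.
            needs_review.append({**ann, "candidates": replacements})
    return updated, needs_review
```

Keeping the undecided cases in a separate list mirrors the cyclic process described earlier: the annotator revisits only what the change actually affects.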
Parameters of semantic annotation
The ideal prerequisite for semantic annotation is the interaction among the following three components:
[Diagram labels: Domain ontology, Lexicons, Grammars; concepts, terms, link of terms to concepts, domain texts]
Domain ontologies (3)
We use English as lingua franca (as usual)
HOWEVER:
We rely on the meanings of the concepts
We aim at reconciling the discrepancy between knowledge conceptualization and language lexicalization
– If there is no lexicalized term for a concept, then either
  – one of the terms is selected as a name (ASCII vs. ASCII code table), or
  – a concept name is constructed as a phrase (BarWithButtons vs. Toolbar)
Terminological lexicons (1)
Lists of the main keywords in a certain domain
Free expressions are also allowed
Example: Example: AlphanumericDisplay [a display that gives the information in the form of characters (numbers or letters)]
In Bulgarian: 9 spelling and lexical variants
буквеноцифров дисплей, буквено-цифров дисплей, символен дисплей, буквеноцифров монитор, буквено-цифров монитор, символен монитор, буквеноцифров екран, буквено-цифров екран, символен екран
Terminological lexicons (2)
Generalized structure of the Lexicon
(1) a representative term which constitutes the meaning for all the term wordings within the entry. This term usually ensures the mapping to the relevant concept
(2) an explanation of the concept meaning in the lingua franca (usually English, but in fact it might be any natural language)
(3) a set of terms in a given language that have the meaning expressed by the leading term
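The three-part entry structure maps naturally onto a small record type. The following is a minimal sketch, assuming the representative term doubles as the concept name; `LexiconEntry` and `build_term_index` are hypothetical names, and the data is the AlphanumericDisplay example given earlier.

```python
from dataclasses import dataclass, field

@dataclass
class LexiconEntry:
    representative: str   # (1) representative term, maps to the concept
    explanation: str      # (2) gloss in the lingua franca
    terms: set = field(default_factory=set)  # (3) wordings in one language

def build_term_index(entries):
    """Invert the lexicon: each wording -> set of representative terms."""
    index = {}
    for entry in entries:
        for term in entry.terms:
            index.setdefault(term.lower(), set()).add(entry.representative)
    return index

entry = LexiconEntry(
    representative="AlphanumericDisplay",
    explanation="a display that gives the information in the form of "
                "characters (numbers or letters)",
    terms={
        "буквеноцифров дисплей", "буквено-цифров дисплей", "символен дисплей",
        "буквеноцифров монитор", "буквено-цифров монитор", "символен монитор",
        "буквеноцифров екран", "буквено-цифров екран", "символен екран",
    },
)
index = build_term_index([entry])
```

The inverted index maps a wording to a set rather than a single value, so that one wording can point to several concepts when it is ambiguous.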
Grammars
Two interconnected steps:
(1) concept annotation step (by cascaded regular grammars in CLaRK)
(2) disambiguation step (by constraint facilities in CLaRK)
The quality of the grammar determines the coverage and precision of the annotation, and hence the efficiency of the search
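The idea of a cascade, where each grammar works on the output of the previous one, can be imitated with Python's `re` module. This is not CLaRK's grammar formalism, which operates on XML documents; the rules, the tag names `mod`, `head` and `c`, and the English word lists are assumptions made for the sketch.

```python
import re

# Stage 1: mark single words as modifiers or heads (hypothetical rules).
STAGE1 = [
    (re.compile(r"\b(alphanumeric)\b", re.IGNORECASE), r"<mod>\1</mod>"),
    (re.compile(r"\b(display|monitor|screen)\b", re.IGNORECASE),
     r"<head>\1</head>"),
]

# Stage 2: works on stage 1 output and groups modifier + head
# into one concept chunk.
STAGE2 = [
    (re.compile(r"<mod>([^<]+)</mod>\s+<head>([^<]+)</head>"),
     r'<c concept="AlphanumericDisplay">\1 \2</c>'),
]

# Stage 3: drop the intermediate marks that were not grouped.
STAGE3 = [
    (re.compile(r"</?(?:mod|head)>"), ""),
]

def apply_cascade(text, stages):
    """Apply each stage in order; later stages see earlier markup."""
    for stage in stages:
        for pattern, replacement in stage:
            text = pattern.sub(replacement, text)
    return text
```

Because stage 2 only matches the marks left by stage 1, widening the word lists in stage 1 extends coverage without touching the later rules.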
Interaction among modules
[Diagram labels: Ontology, Lexicalized Terms, Free Phrases, Grammars, Domain Text]
Problematic issues in semantic annotation
Ambiguous cases need disambiguation (LINK as Connection and Hyperlink)
Due to the problems of coverage and precision of the ontology the following operations are also needed:
– addition, extension, deletion of concepts or their correction
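For the LINK example, a context-based choice between Connection and Hyperlink might look as follows. This is a simple cue-word heuristic used as a stand-in, not the constraint facility of CLaRK; the cue lists and the function name `disambiguate` are hypothetical.

```python
# Hypothetical cue words that favour one reading of an ambiguous term.
CONTEXT_CUES = {
    "Hyperlink": {"click", "page", "web", "url"},
    "Connection": {"network", "cable", "modem"},
}

def disambiguate(candidates, context_tokens):
    """Return the candidate best supported by the context,
    or None when the context does not decide (ask the annotator)."""
    if not candidates:
        return None
    context = {token.lower() for token in context_tokens}
    scores = {
        concept: len(CONTEXT_CUES.get(concept, set()) & context)
        for concept in candidates
    }
    best = max(scores.values())
    winners = [c for c in candidates if scores[c] == best]
    if best > 0 and len(winners) == 1:
        return winners[0]
    return None
```

Returning `None` instead of guessing keeps the undecided cases with the human annotator.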
CLaRK: architecture and tools
[Diagram labels: CLaRK, XML, Regular grammars, Constraints, Editing operations, Extraction, Sort, Statistics, XPath Engine, Macro Language]
http://www.bultreebank.org/clark/index.html
The CLaRK System: previous work flow architecture
Tool preparation phase
– Writing grammars
– Writing constraints, etc.
Document Processing
– Application of grammars, constraints
– User input: selection of constraint options, selection of grammar application
Revision of the tools
The CLaRK System: new work flow architecture
Tool preparation phase
– Writing grammars
– Writing constraints, etc.
Document Processing
– Application of grammars, constraints
– User input: selection of constraint options, selection of grammar application
– Processing-time revision of the tools
Revision of the tools
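The difference between the two work flows is that the tools can now be revised while a document is being processed. A minimal sketch of such a loop is given below, with a plain dictionary standing in for the tools; `annotate`, `process_with_revision` and the `review` callback (the annotator's feedback) are hypothetical and do not reflect the CLaRK interface.

```python
def annotate(doc, lexicon):
    """Toy annotation: tag every word that the lexicon knows."""
    return [(word, lexicon[word]) for word in doc.split() if word in lexicon]

def process_with_revision(documents, lexicon, review):
    """Process documents one by one, revising the lexicon on the fly.

    review(doc, annotations) returns new lexicon entries proposed by
    the annotator, or an empty dict when nothing is missing.
    """
    results = []
    for doc in documents:
        annotations = annotate(doc, lexicon)
        new_entries = review(doc, annotations)
        if new_entries:
            # Processing-time revision: update the tool and re-apply it
            # to the current document immediately.
            lexicon.update(new_entries)
            annotations = annotate(doc, lexicon)
        results.append(annotations)
    return results
```

A term added while processing one document is already available for the next one, so the share of manual work shrinks as processing goes on.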
Conclusions
We presented an architecture for the semantic annotation of XML documents in a domain from two points of view: linguistic adequacy and implementation
The process of semantic annotation interleaves with ontology / lexicon / grammar evolution
This way of combining the three tasks allows the annotation process also to develop from almost completely manual work towards an effective semi-automatic support module
Thank you!
Ever moving CLaRK Functionalities
User running for better tools