Ontology Summit 2013: Ontology Evaluation Across the Ontology Lifecycle
Unsupervised Ontology Acquisition from plain texts: The OntoGain method
Efthymios Drymonas, Kalliopi Zervanou, Euripides G.M. Petrakis
Intelligent Systems Laboratory, http://www.intelligence.tuc.gr
Technical University of Crete (TUC), Chania, Greece
OntoGain
- A platform for unsupervised ontology acquisition from text
- Application independent
- Builds an ontology of multi-word term concepts
- Adjusts existing methods for taxonomy and relation acquisition to handle multi-word concepts
- Outputs the ontology in OWL
- Good results on medical and computer science corpora
Why multi-word term concepts?
- They make up the majority of terminological expressions
- They convey classificatory information, expressed as modifiers: e.g. “carotid artery disease” denotes a type of “artery disease”, which in turn is a type of “disease”
- They lead to a more expressive and compact ontology lexicon
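The modifier structure above suggests a simple way to read broader terms off a multi-word term: drop leading modifiers one at a time. The Python sketch below assumes a head-final noun compound (the head is the last word); `modifier_chain` and `is_a_pairs` are hypothetical helpers for illustration, not part of OntoGain.

```python
def modifier_chain(term):
    """List the term and its broader terms, dropping one leading modifier at a time.

    Assumption: head-final noun compound, i.e. the head is the last word.
    """
    words = term.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def is_a_pairs(term):
    """Pair each term in the chain with the next, broader one: (narrower, broader)."""
    chain = modifier_chain(term)
    return list(zip(chain, chain[1:]))


print(is_a_pairs("carotid artery disease"))
# [('carotid artery disease', 'artery disease'), ('artery disease', 'disease')]
```

The head-final assumption breaks for prepositional terms such as "treatment of hypertension", whose head is not the last word.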
Ontology Learning Steps
- Concept Extraction: C/NC-value
- Taxonomy Induction: clustering, Formal Concept Analysis
- Non-taxonomic Relations: association rules, probabilistic algorithm
The C/NC-Value method [Frantzi et al., 2000]
- Identifies multi-word term phrases denoting domain concepts
- Noun phrases are extracted first, with the part-of-speech filter ((adj | noun)+ | ((adj | noun)* (noun prep)?) (adj | noun)*) noun
- C-Value: a term validity criterion, relying on the hypothesis that multi-word terms tend to consist of other terms
- NC-Value: uses context information (valid terms tend to appear in specific contexts and co-occur with other terms)
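The noun-phrase filter can be applied as a regular expression over part-of-speech tags. The sketch below is an illustration, not OntoGain's code: it assumes Penn Treebank-style tags (JJ*, NN*, IN) and maps each token to one character, so regex positions line up with token positions. `candidate_terms` and `min_words` are hypothetical names.

```python
import re


def tag_class(tag):
    """Map a Penn Treebank-style tag to A (adjective), N (noun), P (preposition) or O (other)."""
    if tag.startswith("JJ"):
        return "A"
    if tag.startswith("NN"):
        return "N"
    if tag == "IN":
        return "P"
    return "O"


# The filter's first alternative, (adj|noun)+ noun, is subsumed by the second,
# so only the second is encoded; listing the first one first would make Python
# commit to shorter spans (e.g. "artery disease" instead of "artery disease of heart").
NP_FILTER = re.compile(r"[AN]*(?:NP)?[AN]*N")


def candidate_terms(tagged, min_words=2):
    """Return word sequences matching the filter; tagged is a list of (word, tag) pairs."""
    classes = "".join(tag_class(tag) for _, tag in tagged)
    terms = []
    for m in NP_FILTER.finditer(classes):
        words = [w for w, _ in tagged[m.start():m.end()]]
        if len(words) >= min_words:  # keep multi-word candidates only
            terms.append(" ".join(words))
    return terms


tagged = [("carotid", "JJ"), ("artery", "NN"), ("disease", "NN"), ("requires", "VBZ"),
          ("treatment", "NN"), ("of", "IN"), ("hypertension", "NN"), (".", ".")]
print(candidate_terms(tagged))
# ['carotid artery disease', 'treatment of hypertension']
```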
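C-Value: Statistical Part (Concept Extraction)
For a candidate term a:
- f(a): total frequency of occurrence of a
- f(b): frequency of a as part of a longer candidate term b
- T_a: the set of longer candidate terms that contain a; P(T_a): the number of these longer terms
- |a|: the length of the candidate string (in words)

$$
\text{C-value}(a) =
\begin{cases}
\log_2 |a| \cdot f(a), & \text{if } a \text{ is not nested,}\\[4pt]
\log_2 |a| \cdot \left( f(a) - \dfrac{1}{P(T_a)} \displaystyle\sum_{b \in T_a} f(b) \right), & \text{otherwise.}
\end{cases}
$$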
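A direct transcription of the C-value formula into Python may help check the arithmetic. This sketch reads f(b) as the corpus frequency of the longer candidate b, and the frequencies in the example are hypothetical. Note that single-word candidates always score 0, since log2(1) = 0.

```python
import math


def is_nested_in(a, b):
    """True if term a occurs as a contiguous word sequence inside the longer term b."""
    wa, wb = a.split(), b.split()
    return len(wa) < len(wb) and any(
        wb[i:i + len(wa)] == wa for i in range(len(wb) - len(wa) + 1)
    )


def c_value(freq):
    """C-value for each candidate term; freq maps term -> f(a)."""
    scores = {}
    for a, f_a in freq.items():
        t_a = [b for b in freq if is_nested_in(a, b)]  # T_a: longer candidates containing a
        weight = math.log2(len(a.split()))            # log2 |a|, with |a| in words
        if not t_a:
            scores[a] = weight * f_a
        else:
            # subtract the average frequency of the longer terms that contain a
            scores[a] = weight * (f_a - sum(freq[b] for b in t_a) / len(t_a))
    return scores


freq = {"artery disease": 10, "carotid artery disease": 4, "coronary artery disease": 2}
scores = c_value(freq)
print(round(scores["artery disease"], 2))          # 7.0
print(round(scores["carotid artery disease"], 2))  # 6.34
```

"artery disease" is nested in two longer candidates, so its frequency is discounted by their average, (4 + 2) / 2 = 3.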
C/NC-Value sample results

| Output term | C/NC value |
|---|---|
| web page | 1740.11 |
| information retrieval | 1274.14 |
| search engine | 1103.99 |
| machine learning | 727.70 |
| computer science | 723.82 |
| experimental result | 655.125 |
| text mining | 645.57 |
| natural language processing | 582.83 |
| world wide web | 557.33 |
| large number | 530.67 |
| artificial intelligence | 515.73 |
| relevant document | 468.22 |
| similarity measure | 464.64 |
| information extraction | 443.29 |
| knowledge discovery | 435.79 |
Ontology Learning Steps
Preprocessing, Concept Extraction, Taxonomy Induction, Non-taxonomic Relations
Taxonomy Induction
- Aims at organizing concepts into a hierarchical structure where each concept is related to its respective broader and narrower terms
- Two methods in OntoGain: agglomerative clustering and Formal Concept Analysis (FCA)
Agglomerative Clustering
- Proceeds bottom-up: initially each term is considered a cluster
- Similarity between all pairs of clusters is computed
- At each step, the most similar clusters are merged, as long as they share terms with common heads
- Similarity measures: group average between clusters, a Dice-like formula between terms
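The merge loop described above can be sketched in Python. The slide does not give the Dice-like term formula, so this sketch assumes the plain Dice coefficient over the two terms' word sets, and takes the head of a term to be its last word; both are assumptions, and `agglomerate` is a hypothetical helper. The recorded merges form a binary hierarchy over the terms.

```python
from itertools import combinations


def dice(t1, t2):
    """Assumed Dice-like term similarity: overlap of the two terms' word sets."""
    w1, w2 = set(t1.split()), set(t2.split())
    return 2 * len(w1 & w2) / (len(w1) + len(w2))


def head(term):
    # Assumption: the head of an English noun compound is its last word.
    return term.split()[-1]


def group_average(c1, c2):
    """Average pairwise term similarity between two clusters."""
    return sum(dice(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def share_head(c1, c2):
    return bool({head(t) for t in c1} & {head(t) for t in c2})


def agglomerate(terms):
    """Bottom-up clustering; returns final clusters and the merges in order."""
    clusters = [frozenset([t]) for t in terms]  # each term starts as its own cluster
    merges = []
    while True:
        # only pairs of clusters that share a head are allowed to merge
        candidates = [(group_average(c1, c2), c1, c2)
                      for c1, c2 in combinations(clusters, 2) if share_head(c1, c2)]
        if not candidates:
            break
        _, c1, c2 = max(candidates, key=lambda x: x[0])
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        merges.append((c1, c2, merged))
    return clusters, merges


terms = ["hierarchical method", "agglomerative method",
         "agglomerative clustering method", "clustering method", "web page"]
clusters, merges = agglomerate(terms)
print(len(clusters))  # 2
```

"web page" never merges because no other term shares its head.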
Formal Concept Analysis (FCA) [Ganter et al., 1999]
- FCA relies on the idea that objects (terms) are associated with their attributes (verbs)
- It finds common attributes (verbs) between objects and forms object clusters that share common attributes
- Formal concepts are connected by the sub-concept relationship:

$$(O_1, A_1) \leq (O_2, A_2) \iff O_1 \subseteq O_2 \quad (\iff A_2 \subseteq A_1)$$
FCA Example
Takes as input a matrix showing associations between terms (concepts) and attributes (verbs).
Attributes: submit, test, describe, print, compute, search
Html form * * *
Hierarchical clustering * *
Text retrieval *
Root node * * * *
Single cluster * * *
Web page * *
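Formal concepts can be enumerated from such a matrix by closing every object set under the shared-attribute operation. The column positions of the matrix above did not survive extraction, so this Python sketch uses a hypothetical context that keeps only the term-verb links implied by the two formal concepts listed on the next slide. It enumerates all object subsets, which is fine for toy inputs but exponential in general.

```python
from itertools import combinations


def intent(objs, context, attributes):
    """Attributes shared by every object in objs (all attributes if objs is empty)."""
    shared = set(attributes)
    for o in objs:
        shared &= context[o]
    return frozenset(shared)


def extent(attrs, context):
    """Objects that have every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)


def formal_concepts(context):
    """Naively enumerate all (extent, intent) pairs; context maps object -> attribute set."""
    attributes = set().union(*context.values())
    objects = list(context)
    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            b = intent(objs, context, attributes)
            concepts.add((extent(b, context), b))
    return concepts


def is_subconcept(c1, c2):
    """(O1, A1) <= (O2, A2) iff O1 is a subset of O2."""
    return c1[0] <= c2[0]


# Hypothetical context (a subset of the slide's matrix)
context = {
    "hierarchical clustering": {"compute", "search"},
    "root node": {"compute", "search"},
    "single cluster": {"compute", "search"},
    "html form": {"print", "search"},
    "web page": {"print", "search"},
}
concepts = formal_concepts(context)
print(len(concepts))  # 4
```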
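FCA Taxonomy
Formal concepts:
- ({hierarchical clustering, root node, single cluster}, {compute, search})
- ({html form, web page}, {print, search})
Not all concept-verb dependencies (c, v) are interesting; only those whose conditional probability exceeds a threshold t are kept, where f(c, v) is the co-occurrence frequency of c and v and f(v) is the frequency of v:

$$P(c \mid v) = \frac{f(c, v)}{f(v)} > t$$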
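The filter P(c|v) = f(c, v) / f(v) > t amounts to counting. In this Python sketch the (concept, verb) observations and the threshold are hypothetical; `filter_pairs` is an illustrative name.

```python
from collections import Counter


def filter_pairs(pairs, t):
    """Keep (concept, verb) pairs whose P(c|v) = f(c, v) / f(v) exceeds t."""
    f_cv = Counter(pairs)                 # f(c, v): co-occurrence counts
    f_v = Counter(v for _, v in pairs)    # f(v): verb counts
    return {(c, v): n / f_v[v] for (c, v), n in f_cv.items() if n / f_v[v] > t}


pairs = ([("web page", "print")] * 3 + [("html form", "print")]
         + [("web page", "search")] * 2 + [("root node", "search")] * 2)
print(filter_pairs(pairs, 0.3))
# {('web page', 'print'): 0.75, ('web page', 'search'): 0.5, ('root node', 'search'): 0.5}
```

Here ("html form", "print") is dropped because it accounts for only 1 of the 4 occurrences of "print".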
Non-Taxonomic Relations Extraction Phase
Concept Extraction, Taxonomy Induction, Non-Taxonomic Relations
Non-Taxonomic Relations
- Concepts are also characterized by attributes and by relations to other concepts in the hierarchy
- Such relations are typically expressed by a verb relating a pair of concepts
- Two approaches: association rules and a probabilistic approach
Association Rules [Agrawal et al., 1993]
- Originally introduced to predict the purchase behavior of customers
- Extract terms connected by a subject-verb-object relation
- Enhance them with more general terms from the taxonomy
- Eliminate redundant relations: those with predictive accuracy < t
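The steps above can be sketched as mining domain-to-range rules from subject-verb-object triples. This Python sketch makes several assumptions not stated on the slide: "predictive accuracy" is approximated by rule confidence, count(d, r) / count(d); the taxonomy is a hypothetical acyclic child-to-parent map; and each rule is labelled with its most frequent verb. `mine_relations` is a hypothetical helper.

```python
from collections import Counter, defaultdict


def ancestors(term, parent):
    """The term itself plus all its ancestors in an acyclic child -> parent map."""
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def mine_relations(triples, parent, min_conf):
    """Mine domain -> range rules; rules with confidence below min_conf are dropped."""
    dom_count = Counter()
    pair_count = Counter()
    verbs = defaultdict(Counter)
    for subj, verb, obj in triples:
        # generalize both arguments with their taxonomy ancestors
        for d in ancestors(subj, parent):
            dom_count[d] += 1
            for r in ancestors(obj, parent):
                pair_count[(d, r)] += 1
                verbs[(d, r)][verb] += 1
    rules = {}
    for (d, r), n in pair_count.items():
        conf = n / dom_count[d]
        if conf >= min_conf:
            label = verbs[(d, r)].most_common(1)[0][0]
            rules[(d, r)] = (label, conf)
    return rules


triples = [("man", "suffer from", "head_ache"),
           ("woman", "suffer from", "stomach_ache"),
           ("woman", "visit", "doctor")]
parent = {"man": "patient", "woman": "patient",
          "head_ache": "ache", "stomach_ache": "ache"}
rules = mine_relations(triples, parent, 0.6)
print(rules[("patient", "ache")][0])  # suffer from
```

Generalizing through the taxonomy is what lets the rule patient -> ache (confidence 2/3) survive, while each specific "woman" rule (confidence 1/2) is dropped.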
Association Rules: Example

| Domain | Range | Label |
|---|---|---|
| chiasmal syndrome | pituitary disproportion | cause by |
| medial collateral ligament | surgical treatment | need |
| blood transfusion | antibiotic prophylaxis | result |
| lipid peroxidation | cardiopulmonary bypass | lead to |
| prostate specific antigen | prostatectomy | follow |
| chronic fatigue syndrome | cardiac function | yield |
| right ventricular infraction | radionuclide ventriculography | analyze by |
| creatinine clearance | arteriovenous hemofiltration | achieve |
| cardioplegic solution | superoxide dismutase | give |
| bacterial translocation | antibiotic prophylaxis | decrease |
| accurate diagnosis | clinical suspicion | depend |
| ultrasound examination | clinical suspicion | give |
| total body oxygen consumption | epidural analgesia | attenuate by |
| coronary arteriography | physician | perform by |
Probabilistic approach [Cimiano et al., 2006]
- Collect verbal relations from the corpus
- Find the most general relation with respect to the verb, using frequency of occurrence; e.g. Suffer_from(man, head_ache) and Suffer_from(woman, stomach_ache) generalize to Suffer_from(patient, ache)
- Select relationships satisfying a conditional probability measure: associations above a threshold t are accepted
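One way to pick the generalization level is sketched below in Python. It is a simplification under stated assumptions, not Cimiano et al.'s exact measure: for each verb and argument slot it computes P(c | verb, slot), the share of argument occurrences subsumed by taxonomy concept c, and keeps the most specific concept whose share exceeds t. The taxonomy and relations are hypothetical.

```python
from collections import defaultdict


def ancestors(term, parent):
    """The term itself plus its ancestors in an acyclic child -> parent map."""
    chain = [term]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def generalize(relations, parent, t):
    """Map each verb to its (subject, object) concepts at the chosen generalization level."""
    by_verb = defaultdict(list)
    for verb, subj, obj in relations:
        by_verb[verb].append((subj, obj))
    result = {}
    for verb, args in by_verb.items():
        lifted = []
        for slot in (0, 1):
            values = [a[slot] for a in args]
            coverage = defaultdict(int)
            for v in values:
                for c in ancestors(v, parent):
                    coverage[c] += 1
            # P(c | verb, slot): share of argument occurrences subsumed by c
            probs = {c: n / len(values) for c, n in coverage.items()}
            accepted = [c for c, p in probs.items() if p > t]
            # choose the deepest (most specific) accepted concept, or None
            best = max(accepted, key=lambda c: (len(ancestors(c, parent)), probs[c]), default=None)
            lifted.append(best)
        result[verb] = tuple(lifted)
    return result


relations = [("suffer_from", "man", "head_ache"), ("suffer_from", "woman", "stomach_ache")]
parent = {"man": "patient", "woman": "patient", "head_ache": "ache", "stomach_ache": "ache"}
print(generalize(relations, parent, 0.6))  # {'suffer_from': ('patient', 'ache')}
```

With t >= 0.5 and a tree-shaped taxonomy the choice is unique, since two distinct concepts at the same depth cannot both cover more than half of the arguments.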
Evaluation
- Relevance judgments are provided by humans
- Measures: precision and recall
- We examined the 200 top-ranked concepts and their respective relations in 500 lines
- Results are reported on the OhsuMed and Computer Science corpora
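For reference, precision and recall over human relevance judgments reduce to set arithmetic. The slides do not say how the set of relevant items (the recall denominator) was assembled, so this Python sketch simply assumes a given gold set; the judgments in the example are hypothetical.

```python
def precision_recall(extracted, relevant):
    """Precision = |correct| / |extracted|; recall = |correct| / |relevant|."""
    extracted, relevant = set(extracted), set(relevant)
    correct = extracted & relevant
    precision = len(correct) / len(extracted) if extracted else 0.0
    recall = len(correct) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments
extracted = ["web page", "search engine", "large number"]
relevant = ["web page", "search engine", "text mining", "machine learning"]
p, r = precision_recall(extracted, relevant)
print(round(p, 3), r)  # 0.667 0.5
```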
Results

| Processing layer | Method | Precision (OhsuMed) | Recall (OhsuMed) | Precision (Comp. Science) | Recall (Comp. Science) |
|---|---|---|---|---|---|
| Concept Extraction | C/NC-Value | 89.7% | 91.4% | 86.7% | 89.6% |
| Taxonomic Relations | Formal Concept Analysis | 47.1% | 41.6% | 44.2% | 48.6% |
| Taxonomic Relations | Hierarchical Clustering | 71.2% | 67.3% | 71.3% | 62.7% |
| Non-Taxonomic Relations | Association Rules | 71.8% | 67.7% | 72.8% | 61.7% |
| Non-Taxonomic Relations | Probabilistic | 62.7% | 55.9% | 61.6% | 49.4% |
Comparison with Text2Onto [Cimiano & Völker, 2005]
- Text2Onto produces huge lists of plain single-word terms, and relations lacking semantic meaning
- Text2Onto cannot handle large texts
- It cannot export results in OWL
Conclusions
OntoGain:
- Multi-word term concepts
- Exports the ontology in OWL
- Domain independent
Results:
- C/NC-Value yields good results
- Clustering outperforms FCA
- Association rules perform better than the probabilistic approach based on verbal expressions
Future Work
- Explore more methods and combinations (e.g., clustering and FCA)
- Hearst patterns for discovering additional relation types (part-of)
- Discover attributes and cardinality constraints
- Incorporate term similarity information from WordNet and MeSH
- Resolve term ambiguities
Thank you!
Questions?
Preprocessing
- Tokenization, POS tagging, shallow parsing (OpenNLP suite)
- Lemmatization (WordNet Java Library)
- Applied in all steps of OntoGain
- Shallow parsing is used in relation acquisition for the detection of verbal dependencies
- Terms sharing a head tend to be similar: e.g. "hierarchical method" and "agglomerative method" are both methods
- Nested terms are related to each other: e.g. "agglomerative clustering method" and "clustering method" should be associated
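Both cues can be detected with simple string checks. This Python sketch assumes the head is the last word of the term and treats nesting as contiguous word containment; `related_pairs` is a hypothetical helper, not an OntoGain function.

```python
from itertools import combinations


def head(term):
    # Assumption: the head is the last word of the term.
    return term.split()[-1]


def is_nested(shorter, longer):
    """True if `shorter` occurs as a contiguous word sequence inside `longer`."""
    s, l = shorter.split(), longer.split()
    return len(s) < len(l) and any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def related_pairs(terms):
    """Return (pairs sharing a head, pairs where one term is nested in the other)."""
    same_head, nested = [], []
    for a, b in combinations(terms, 2):
        if head(a) == head(b):
            same_head.append((a, b))
        if is_nested(a, b) or is_nested(b, a):
            nested.append((a, b))
    return same_head, nested


terms = ["hierarchical method", "agglomerative method",
         "agglomerative clustering method", "clustering method"]
same_head, nested = related_pairs(terms)
print(nested)  # [('agglomerative clustering method', 'clustering method')]
```

"agglomerative method" is not nested in "agglomerative clustering method", because its words are not contiguous there; the two are related only through their shared head.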