Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain...
![Page 1: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/1.jpg)
Ontology Alignment
![Page 2: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/2.jpg)
Problem Statement
Given N Ontologies (O1, …, On)
◦ In a Particular Domain
◦ Different Level of Coverage
Goal
◦ Evaluate Commonality of Entities
◦ Rank Entities
![Page 3: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/3.jpg)
Challenges & Solutions
Ontology Alignments
◦ Largest Common Subgraph (LCS)
◦ Vector Space Model (TF/IDF)
Accuracy of Entities in Aligned Concepts
◦ Ranking Entities
![Page 4: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/4.jpg)
LCS Algorithm for Multiple Ontologies
Find the LCS for two ontologies
Align the LCS with the other ontologies
![Page 5: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/5.jpg)
Largest Common Subgraph (LCS) Algorithm between two Ontologies
![Page 6: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/6.jpg)
Data Structure for LCS Algorithm
[Figure: two ontology graphs, one with concepts C1–C7 and one with concepts C’1–C’6]
Similarity Measure for Corresponding Entities
Node Similarity + Structural Similarity
C1: (C1, C’1, .95), (C1, C’6, .77), (C1, C’3, .71), (C1, C’4, .65), (C1, C’5, .54), (C1, C’2, .34)
C2: (C2, C’3, .85), (C2, C’2, .67), (C2, C’1, .51), (C2, C’4, .45), (C2, C’5, .24), (C2, C’6, .14)
C3: (C3, C’4, .90), (C3, C’1, .67), (C3, C’3, .51), (C3, C’2, .45), (C3, C’5, .34), (C3, C’6, .24)
C4: (C4, C’2, .95), (C4, C’1, .65), (C4, C’3, .51), (C4, C’4, .45), (C4, C’5, .23), (C4, C’6, .14)
C5: (C5, C’4, .80), (C5, C’1, .67), (C5, C’3, .65), (C5, C’2, .35), (C5, C’5, .34), (C5, C’6, .24)
C6: (C6, C’1, .20), (C6, C’1, .15), (C6, C’3, .12), (C6, C’2, .12), (C6, C’5, .09), (C6, C’6, .08)
C7: (C7, C’4, .31), (C7, C’1, .25), (C7, C’3, .23), (C7, C’2, .15), (C7, C’5, .14), (C7, C’6, .12)
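The per-concept lists above can be pictured as a map from each concept to its candidate matches, sorted by descending similarity. The sketch below is only an illustration of that layout. The function name `candidate_lists` and the dictionary input are assumptions, and the scores are the C1 and C2 rows from the slide.

```python
def candidate_lists(scores):
    """Group (concept, candidate) -> score pairs into one list per concept,
    sorted from the most to the least similar candidate."""
    lists = {}
    for (concept, candidate), score in scores.items():
        lists.setdefault(concept, []).append((candidate, score))
    for concept in lists:
        lists[concept].sort(key=lambda pair: pair[1], reverse=True)
    return lists


# Scores copied from the C1 and C2 rows of the slide (deliberately unordered here).
scores = {
    ("C1", "C'2"): 0.34, ("C1", "C'1"): 0.95, ("C1", "C'6"): 0.77,
    ("C1", "C'3"): 0.71, ("C1", "C'4"): 0.65, ("C1", "C'5"): 0.54,
    ("C2", "C'1"): 0.51, ("C2", "C'3"): 0.85, ("C2", "C'2"): 0.67,
    ("C2", "C'4"): 0.45, ("C2", "C'5"): 0.24, ("C2", "C'6"): 0.14,
}
lists = candidate_lists(scores)
best_for_c1 = lists["C1"][0]  # highest-scoring candidate for C1
```

Keeping each list sorted means the most promising candidate for a concept is always at index 0, which is convenient for any matching procedure that tries the best candidates first.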
![Page 7: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/7.jpg)
Node Similarity: Instance-based
Representing types using N-grams*
Node Similarity (Name-Match)
◦ Find Common N-gram (N = 2) for corresponding columns

CA

| StrName | FENAME | Status |
| --- | --- | --- |
| LOCUST-GROVE DR | LOCUST GROVE | BUILT |
| LOUISE LN | LOUISE | BUILT |

CB

| Street | Laddress | Raddress |
| --- | --- | --- |
| TRAIL RANGE DR | 1600 | 1798 |
| CR45/MANET CT | 2500 | 2598 |

N-gram types from A.StrName = {LO, OC, CU, ST, …}
N-gram types from B.Street = {TR, RA, R4, 5/, …}
*Jeffrey Partyka, Neda Alipanah, Latifur Khan, Bhavani Thuraisingham & Shashi Shekhar, “Content Based Ontology Matching for GIS Datasets“, ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), Page: 407-410, Irvine, California, USA, November 2008.
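As a rough illustration of the 2-gram extraction on this slide, the sketch below collects the distinct character 2-grams of each column and intersects them. The helper names are assumptions, and so is the choice to keep spaces and punctuation as ordinary characters. The slide does not say how those are handled.

```python
def ngrams(value, n=2):
    """Return the set of character n-grams of one instance value."""
    return {value[i:i + n] for i in range(len(value) - n + 1)}


def column_ngrams(values, n=2):
    """Union of the n-gram sets of all instance values in a column."""
    grams = set()
    for value in values:
        grams |= ngrams(value, n)
    return grams


# Instance values taken from the StrName and Street columns of the slide.
a_grams = column_ngrams(["LOCUST-GROVE DR", "LOUISE LN"])
b_grams = column_ngrams(["TRAIL RANGE DR", "CR45/MANET CT"])
common = a_grams & b_grams  # n-gram types shared by both columns
```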
![Page 8: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/8.jpg)
Node Similarity: Instance-based
Visualizing Entropy and Conditional Entropy
$H(C) = -\sum_i p_i \log p_i$ for all $x \in C_1 \cup C_2$
$H(C \mid T) = H(C, T) - H(T)$ for all $x \in C_1 \cup C_2$ and $t \in T$
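To make the two quantities concrete, here is a minimal sketch that estimates them from a list of (column, type) observations. It uses the chain-rule identity H(C | T) = H(C, T) − H(T) and the ratio H(C | T) / H(C) as the similarity score. The function names, the base-2 logarithm, and the toy observations are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (base 2) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conditional_entropy(pairs):
    """H(C | T) = H(C, T) - H(T) for a list of (column, type) pairs."""
    return entropy(pairs) - entropy([t for _, t in pairs])


def similarity(pairs):
    """H(C | T) / H(C); defined as 0.0 when H(C) is zero."""
    h_c = entropy([c for c, _ in pairs])
    return conditional_entropy(pairs) / h_c if h_c > 0 else 0.0


# Toy data: every type occurs in both columns, so the type says nothing about the column.
shared = [("C1", "LO"), ("C2", "LO"), ("C1", "OC"), ("C2", "OC")]
# Toy data: each type occurs in one column only, so the type determines the column.
disjoint = [("C1", "LO"), ("C1", "OC"), ("C2", "TR"), ("C2", "RA")]
```

With these toy inputs the ratio is 1 when the columns share all their types and 0 when they share none, which is the behaviour one would want from a similarity score.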
![Page 9: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/9.jpg)
Node Similarity: Faults of this Method
• Semantically similar columns are not guaranteed to have a high similarity score

A ∈ O1

| City | Country |
| --- | --- |
| Dallas | USA |
| Houston | USA |
| Kingston | Jamaica |
| Halifax | Canada |
| Mexico City | Mexico |

B ∈ O2

| ctyName | country |
| --- | --- |
| Shanghai | China |
| Beijing | China |
| Tokyo | Japan |
| New Delhi | India |
| Kuala Lumpur | Malaysia |

2-grams extracted from A: {Da, al, la, as, Ho, ou, us, …}
2-grams extracted from B: {Sh, ha, an, ng, gh, ha, ai, Be, ei, ij, …}
![Page 10: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/10.jpg)
Node Similarity: Instance-based
K-medoid + NGD instance similarity
Step 1: Extract distinct keywords from compared columns
Keywords extracted from columns = {Johnson, Rd., School, 15th, …}
Step 2: Group distinct keywords together into semantic clusters
“Rd.”, “Dr.”, “St.”, “Pwy”, …
“Johnson”, “School”, “Dr.”, …
Step 3: Calculate Similarity
Similarity = H(C|T) / H(C)
[Figure: keywords of C1 ∪ C2 grouped into clusters, with markers distinguishing Column 1 from Column 2]

C1 ∈ O1

| roadName | City |
| --- | --- |
| Johnson Rd. | Plano |
| School Dr. | Richardson |
| Zeppelin St. | Lakehurst |

C2 ∈ O2

| Road | County |
| --- | --- |
| Custer Pwy | Collin |
| 15th St. | Collin |
| Parker Rd. | Collin |
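The slide does not spell out NGD. Assuming it denotes the Normalized Google Distance, the distance between two keywords can be computed from search hit counts as in the sketch below. The function name, its parameters, and the counts in the example are hypothetical and are not taken from the slides.

```python
import math


def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from hit counts.

    fx, fy: number of pages containing each keyword on its own
    fxy:    number of pages containing both keywords
    n:      total number of indexed pages (normalizing constant)
    """
    if fxy == 0:
        return float("inf")  # the keywords never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Hypothetical hit counts, for illustration only.
always_together = ngd(fx=100, fy=100, fxy=100, n=10_000)  # keywords always co-occur
rarely_together = ngd(fx=100, fy=100, fxy=2, n=10_000)    # keywords rarely co-occur
```

A clustering step such as K-medoid can then use these pairwise distances directly, since it only needs a distance between keywords rather than coordinates.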
![Page 11: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/11.jpg)
Node Similarity: Instance-based
Problems with K-medoid + NGD*
It is possible that two different geographic entities (e.g., Dallas, TX and Dallas County) in the same location will have a very low computed NGD value and thus be mistaken for being similar:

| roadName | City |
| --- | --- |
| Johnson Rd. | Plano |
| School Dr. | Richardson |
| Zeppelin St. | Lakehurst |
| Alma Dr. | Richardson |
| Preston Rd. | Addison |
| Dallas Pkwy | Dallas |

| Road | County |
| --- | --- |
| Custer Pwy | Cooke |
| 15th St. | Collin |
| Parker Rd. | Collin |
| Alma Dr. | Collin |
| Campbell Rd. | Denton |
| Harry Hines Blvd. | Dallas |
*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Semantic Schema Matching Without Shared Instances,” to appear in Third IEEE International Conference on Semantic Computing, Berkeley, CA, USA - September 14-16, 2009.
![Page 12: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/12.jpg)
Node Similarity: Instance-based
Using geographic type information*
We use a gazetteer to determine the geographic type of an instance:
[Figure: O1, O2, Geotypes]
*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Geographically-Typed Semantic Schema Matching,” to appear in ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009.
![Page 13: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/13.jpg)
Node Similarity: Instance-based
Results of Geographic Matching Over 2 Separate Road Network Data Sources
![Page 14: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/14.jpg)
Structural Similarity
◦ Structural Similarity Measurement
I. Neighbor Similarity
[Figure: concepts C1, C2, C3, C5, C6 in one ontology and C’1, C’3, C’4, C’5 in the other]
![Page 15: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/15.jpg)
Structural Similarity
Structural Similarity Measurement
I. Properties Similarity
[Figure: two ontology graphs over concepts C1–C7 and C’1–C’6, with edges labeled isA, subClass, hasFlavor, hasColor, hasFood, hasDrink, hasTopping]
RTC1 = [3 isA, 2 subClass, 1 hasFlavor, 1 hasColor, 0 hasFood, 1 hasTopping]
RTC2 = [1 isA, 1 subClass, 2 hasFlavor, 0 hasColor, 1 hasFood]
![Page 16: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/16.jpg)
Similarity
Results of Pairwise Ontology Matching (I3CON Benchmark)
Matching using Name Similarity + RTS
Matching using Name Similarity + (RTS and Neighbor)
![Page 17: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/17.jpg)
Ontology Matching
Vector Space Model (VSM)
Define the VSM for Each Entity
• Collection of words in label, edge types, comment and neighbors.
[Figure: two ontology graphs over concepts C1–C7 and C’1–C’6, with edges labeled isA, subClass, hasFlavor, hasColor, hasFood, hasDrink, hasTopping]
VSM(C1) = [1 C1, 1 C2, 1 C3, 1 C5, 1 C6, 1 isA, 2 subClass, 1 hasFlavor]
VSM(C’1) = [1 C’3, 1 C’4, 1 C’5, 1 isA, 2 hasFlavor]
![Page 18: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/18.jpg)
Ontology Matching
Vector Space Model (VSM)
• Update VSM by Word Score Using TF/IDF
• Calculate Cosine Similarity for corresponding entities
Cos(VSM(C1), VSM(C2))
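The two steps above can be sketched as follows: each entity's bag of words is reweighted with TF/IDF and the resulting vectors are compared with cosine similarity. The exact weighting is not given on the slides. The smoothed IDF used here, the function names, and the extra entity C'2 are assumptions. The bags for C1 and C'1 follow the word counts listed for VSM(C1) and VSM(C’1).

```python
import math
from collections import Counter


def tfidf(bags):
    """Turn {entity: [words]} into {entity: {word: tf * idf}}.

    Uses raw counts as tf and a smoothed idf, log((1 + N) / (1 + df)) + 1,
    so that words shared by every entity keep a non-zero weight (an assumption).
    """
    n = len(bags)
    df = Counter()
    for words in bags.values():
        df.update(set(words))
    idf = {w: math.log((1 + n) / (1 + d)) + 1.0 for w, d in df.items()}
    return {
        entity: {w: count * idf[w] for w, count in Counter(words).items()}
        for entity, words in bags.items()
    }


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(x * v[w] for w, x in u.items() if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


bags = {
    "C1": ["C1", "C2", "C3", "C5", "C6", "isA", "subClass", "subClass", "hasFlavor"],
    "C'1": ["C'3", "C'4", "C'5", "isA", "hasFlavor", "hasFlavor"],
    "C'2": ["hasDrink"],  # hypothetical bag, added only for contrast
}
vectors = tfidf(bags)
score = cosine(vectors["C1"], vectors["C'1"])
unrelated = cosine(vectors["C1"], vectors["C'2"])
```

Here C1 and C'1 share only the edge-type words isA and hasFlavor, so their score is positive but well below 1, while C1 and the hypothetical C'2 share nothing and score 0.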
![Page 19: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/19.jpg)
Aligned Concepts
• Aggregate different ontologies
• Example
![Page 20: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/20.jpg)
Aligned Concepts
• Statistical Model
[Figure: Global Ontology with Entity1–Entity4 and weights α1–α4 and β1–β10 (β10 = 1). Source terms shown: O1:Person, O2:Person, O1:hasFather, O1:hasMaleParent, O2:hasFather, O1:hasMon, O1:hasMother, O1:hasFemaleParent, O2:hasMother, O1:Harry]
![Page 21: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/21.jpg)
Aligned Concepts
• Calculate the probabilities of appearance of each entity in the Global Ontology (GO)
• Use Maximum Likelihood Estimation
• Calculate and
![Page 22: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/22.jpg)
Reification
Reification can be regarded as metadata about RDF/OWL statements.
Ontology alignment approaches rely on probabilistic measures to find matches between concepts in different ontologies.
Reification data can be attached to the alignment information to show the 'match factor' between two concepts in OWL 2.
Advanced analytic algorithms can benefit from reification in establishing the relevance of search results.
![Page 23: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/23.jpg)
OWL 2
OWL 2 is an extension to OWL. Some of the new features in OWL 2 are as follows:
• Syntactic sugar (e.g., disjoint union of classes)
• Property chains
• Richer datatypes, data ranges
• Qualified cardinality restrictions
• New constructs that increase expressivity
• Simple metamodeling capabilities
• Extended annotation capabilities
The following link lists all the new features in OWL 2: http://www.w3.org/TR/2009/REC-owl2-new-features-20091027/
![Page 24: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/24.jpg)
Ontology Extraction from Text Documents
![Page 25: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/25.jpg)
Problem Statement
Our solution for ontology construction of documents
◦ Use a hierarchical clustering algorithm to build a hierarchy for documents
  • Hierarchical Agglomerative Clustering (HAC)
  • Modified Self-Organizing Tree (MSOT)
  • Hierarchical Growing Self-Organizing Tree (HGSOT)
◦ Assign a concept to each node in the hierarchy
  • Usage of WordNet
![Page 26: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/26.jpg)
Concept Assignment
Concept Assignment to document
• LVQ1: the topic vector (t) is built by training with the training documents.
• Clusters in LVQ are predefined. Each topic cluster is represented by a node in the output map, and the LVQ uses pre-labeled data for training.
• Only the best match node’s vector (winning vector) will be updated, rather than its neighbors. The vector updating rule uses the following equations:
$w_i(t+1) = w_i(t) + \alpha(t)\,\sigma(i,c)\,(x - w_i(t))$ if data x and best match node c belong to the same class,
$w_i(t+1) = w_i(t) - \alpha(t)\,\sigma(i,c)\,(x - w_i(t))$ if data x and best match node c belong to different classes.
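As an illustration of the update rule, the sketch below performs one step of the standard LVQ1 procedure: it finds the best match node, then moves only that node's vector toward the sample when the classes agree and away from it when they differ. The function name, the constant learning rate `lr`, and the toy vectors are assumptions. This is not presented as the authors' implementation.

```python
def lvq1_step(prototypes, labels, x, x_label, lr):
    """One LVQ1 update. Modifies prototypes in place and returns the winner index.

    prototypes: list of weight vectors (lists of floats), one per output node
    labels:     class label of each prototype
    x, x_label: training vector and its class
    lr:         learning rate for this step (assumed constant here)
    """
    # Best match node c: the prototype closest to x in squared Euclidean distance.
    c = min(
        range(len(prototypes)),
        key=lambda i: sum((w - xi) ** 2 for w, xi in zip(prototypes[i], x)),
    )
    # Move toward x if the classes agree, away from x otherwise.
    sign = 1.0 if labels[c] == x_label else -1.0
    prototypes[c] = [w + sign * lr * (xi - w) for w, xi in zip(prototypes[c], x)]
    return c


# Toy example with two topic nodes.
same = [[0.0, 0.0], [1.0, 1.0]]
winner_same = lvq1_step(same, ["a", "b"], x=[0.2, 0.0], x_label="a", lr=0.5)

diff = [[0.0, 0.0], [1.0, 1.0]]
winner_diff = lvq1_step(diff, ["a", "b"], x=[0.2, 0.0], x_label="b", lr=0.5)
```

In both calls node 0 wins. In the first it is pulled halfway toward the sample, in the second it is pushed the same distance away, and node 1 is left untouched either way.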
![Page 27: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/27.jpg)
Concept Assignment
◦ Concept sense disambiguation
One keyword may be associated with more than one concept in WordNet.
For example, the keyword “gold” has four senses in WordNet and the keyword “copper” has five.
To disambiguate concepts we apply the same technique (i.e., the cosine similarity measure) used in topic tracking.
To construct a vector for each sense we use the short description of that sense that appears in WordNet.
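A minimal sketch of this disambiguation step, assuming plain word-count vectors: the sense whose description is closest to the document's keywords under cosine similarity is chosen. The sense names and descriptions below are invented placeholders, not actual WordNet entries, and `disambiguate` is a hypothetical helper.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two word-count vectors (Counter objects)."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def disambiguate(context_words, sense_descriptions):
    """Return the sense whose description vector is closest to the context."""
    context = Counter(w.lower() for w in context_words)
    return max(
        sense_descriptions,
        key=lambda s: cosine(context, Counter(sense_descriptions[s].lower().split())),
    )


# Placeholder descriptions written for this example, not quoted from WordNet.
senses = {
    "gold (metal)": "a soft yellow metallic element used in jewelry",
    "gold (color)": "a deep yellow color",
}
chosen = disambiguate(["metallic", "element", "mine", "jewelry"], senses)
```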
![Page 28: Ontology Alignment. Problem Statement Given N Ontologies (O 1,…, O n ) ◦ In a Particular Domain ◦ Different Level of Coverage Goal ◦ Evaluate Commonality.](https://reader030.fdocuments.net/reader030/viewer/2022020717/56649d055503460f949d85df/html5/thumbnails/28.jpg)
Concept Assignment
Concept assignment for leaf node
◦ If a majority of the documents have the same concept, we assign that concept to the leaf.
◦ If there is no majority, we choose from WordNet a generic concept that covers all the concepts and assign it to the leaf.
Concept assignment for non-leaf node
◦ If a majority of the children have the same concept, we assign that concept to the internal node.
◦ If there is no majority, we choose from WordNet a generic concept that covers all the concepts and assign it to the internal node.
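The rule for both leaf and internal nodes can be sketched as a majority vote with a fallback. The sketch assumes that "generic concept" means the closest common ancestor in a hypernym hierarchy. The `parent_of` map is a toy stand-in for WordNet, and all names are hypothetical.

```python
from collections import Counter


def ancestors(concept, parent_of):
    """The concept followed by its chain of increasingly generic ancestors."""
    path = [concept]
    while concept in parent_of:
        concept = parent_of[concept]
        path.append(concept)
    return path


def assign_concept(member_concepts, parent_of):
    """Majority concept if one exists, else the closest common ancestor."""
    if not member_concepts:
        return None
    concept, count = Counter(member_concepts).most_common(1)[0]
    if count > len(member_concepts) / 2:
        return concept  # a strict majority shares this concept
    others = [set(ancestors(c, parent_of)) for c in member_concepts[1:]]
    for candidate in ancestors(member_concepts[0], parent_of):
        if all(candidate in path for path in others):
            return candidate  # most specific concept covering all members
    return None


# Toy hypernym links standing in for WordNet.
parent_of = {"gold": "metal", "copper": "metal", "metal": "substance"}
majority = assign_concept(["gold", "gold", "copper"], parent_of)
fallback = assign_concept(["gold", "copper"], parent_of)
```

The same function serves leaves, where the members are the concepts of the documents, and internal nodes, where the members are the concepts already assigned to the children.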