Ontology Alignment
Problem Statement
Given N ontologies (O1, …, On)
◦ In a particular domain
◦ With different levels of coverage
Goal
◦ Evaluate the commonality of entities
◦ Rank entities
Challenges & Solutions
Ontology Alignment
◦ Largest Common Subgraph (LCS)
◦ Vector Space Model (TF/IDF)
Accuracy of Entities in Aligned Concepts
◦ Ranking Entities
LCS Algorithm for Multiple Ontologies
◦ Find the LCS for two ontologies
◦ Align the LCS with the other ontologies
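The two steps above can be sketched as a left fold over the ontologies. This is a simplified sketch, assuming each ontology is a set of (subject, relation, object) edges whose concepts have already been mapped to shared identifiers. Under that assumption the common subgraph of two ontologies reduces to the intersection of their edge sets. The names `pairwise_lcs` and `multi_lcs` and the example edges are hypothetical.

```python
from functools import reduce
from typing import FrozenSet, Iterable, Tuple

# An edge is (subject concept, relation type, object concept).
Edge = Tuple[str, str, str]
Ontology = FrozenSet[Edge]


def pairwise_lcs(a: Ontology, b: Ontology) -> Ontology:
    """Common subgraph of two ontologies, assuming concepts are already matched by label.

    With node identities fixed, the largest common edge-induced subgraph is
    the set of edges present in both ontologies.
    """
    return a & b


def multi_lcs(ontologies: Iterable[Ontology]) -> Ontology:
    """LCS of the first two ontologies, then align that LCS with each remaining one."""
    return reduce(pairwise_lcs, ontologies)


# Hypothetical example ontologies.
o1 = frozenset({("Pizza", "isA", "Food"), ("Pizza", "hasTopping", "Cheese"), ("Cheese", "isA", "Food")})
o2 = frozenset({("Pizza", "isA", "Food"), ("Cheese", "isA", "Food"), ("Wine", "isA", "Drink")})
o3 = frozenset({("Pizza", "isA", "Food"), ("Cheese", "isA", "Dairy")})

print(sorted(multi_lcs([o1, o2, o3])))  # [('Pizza', 'isA', 'Food')]
```

When concepts are not yet matched, a real LCS must also search over concept mappings, which is what the similarity lists for corresponding entities are for.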
Largest Common Subgraph (LCS) Algorithm between Two Ontologies
Data Structure for LCS Algorithm
[Figure: concepts C1 to C7 of one ontology and C’1 to C’6 of the other]
Similarity Measure for Corresponding Entities
Node Similarity + Structural Similarity
C1: (C1,C’1,.95), (C1,C’6,.77), (C1,C’3,.71), (C1,C’4,.65), (C1,C’5,.54), (C1,C’2,.34)
C2: (C2,C’3,.85), (C2,C’2,.67), (C2,C’1,.51), (C2,C’4,.45), (C2,C’5,.24), (C2,C’6,.14)
C3: (C3,C’4,.90), (C3,C’1,.67), (C3,C’3,.51), (C3,C’2,.45), (C3,C’5,.34), (C3,C’6,.24)
C4: (C4,C’2,.95), (C4,C’1,.65), (C4,C’3,.51), (C4,C’4,.45), (C4,C’5,.23), (C4,C’6,.14)
C5: (C5,C’4,.80), (C5,C’1,.67), (C5,C’3,.65), (C5,C’2,.35), (C5,C’5,.34), (C5,C’6,.24)
C6: (C6,C’1,.20), (C6,C’1,.15), (C6,C’3,.12), (C6,C’2,.12), (C6,C’5,.09), (C6,C’6,.08)
C7: (C7,C’4,.31), (C7,C’1,.25), (C7,C’3,.23), (C7,C’2,.15), (C7,C’5,.14), (C7,C’6,.12)
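Each row above is a candidate list for one concept, sorted by descending similarity. One simple way to turn such lists into a one-to-one alignment is a greedy pass over all candidate pairs, highest score first. This is a sketch under that assumption (the slides do not say how the lists are consumed); `greedy_alignment` and the 0.5 threshold are hypothetical, only rows C1 to C5 are transcribed, and the code uses ASCII apostrophes (C'1) for the O2 concepts.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[str, str, float]  # (concept in O1, concept in O2, similarity)

# Rows C1..C5 from the slide.
CANDIDATES: Dict[str, List[Candidate]] = {
    "C1": [("C1", "C'1", .95), ("C1", "C'6", .77), ("C1", "C'3", .71), ("C1", "C'4", .65), ("C1", "C'5", .54), ("C1", "C'2", .34)],
    "C2": [("C2", "C'3", .85), ("C2", "C'2", .67), ("C2", "C'1", .51), ("C2", "C'4", .45), ("C2", "C'5", .24), ("C2", "C'6", .14)],
    "C3": [("C3", "C'4", .90), ("C3", "C'1", .67), ("C3", "C'3", .51), ("C3", "C'2", .45), ("C3", "C'5", .34), ("C3", "C'6", .24)],
    "C4": [("C4", "C'2", .95), ("C4", "C'1", .65), ("C4", "C'3", .51), ("C4", "C'4", .45), ("C4", "C'5", .23), ("C4", "C'6", .14)],
    "C5": [("C5", "C'4", .80), ("C5", "C'1", .67), ("C5", "C'3", .65), ("C5", "C'2", .35), ("C5", "C'5", .34), ("C5", "C'6", .24)],
}


def greedy_alignment(candidates: Dict[str, List[Candidate]], threshold: float = 0.5) -> List[Candidate]:
    """Accept the best remaining pair whose two concepts are both still unmatched."""
    pairs = sorted((c for row in candidates.values() for c in row), key=lambda c: c[2], reverse=True)
    used_src, used_tgt = set(), set()
    matches: List[Candidate] = []
    for src, tgt, score in pairs:
        if score < threshold:
            break  # pairs are sorted, so every later pair is below the threshold too
        if src in used_src or tgt in used_tgt:
            continue
        used_src.add(src)
        used_tgt.add(tgt)
        matches.append((src, tgt, score))
    return matches


for src, tgt, score in greedy_alignment(CANDIDATES):
    print(src, tgt, score)
```

With the 0.5 threshold C5 stays unmatched, because every target it still could take scores below 0.5 for it; lowering the threshold to 0.34 pairs it with C'5.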
Node Similarity: Instance-based
Representing Types Using N-grams*
Node Similarity (Name Match)
◦ Find common N-grams (N = 2) for corresponding columns
CA
StrName | FENAME | Status
LOCUST-GROVE DR | LOCUST GROVE | BUILT
LOUISE LN | LOUISE | BUILT
CB
Street | Laddress | Raddress
TRAIL RANGE DR | 1600 | 1798
CR45/MANET CT | 2500 | 2598
N-gram types from A.StrName = {LO, OC, CU, ST, …}
N-gram types from B.Street = {TR, RA, R4, 5/, …}
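A minimal sketch of extracting 2-gram types from two columns and finding the types they share, assuming n-grams are taken over the raw strings with spaces and punctuation kept (the slide's "5/" suggests punctuation is kept; its treatment of spaces is an assumption). The function names are hypothetical.

```python
from typing import List, Set


def char_ngrams(value: str, n: int = 2) -> List[str]:
    """All overlapping character n-grams of one instance, punctuation and spaces included."""
    return [value[i:i + n] for i in range(len(value) - n + 1)]


def ngram_types(column: List[str], n: int = 2) -> Set[str]:
    """Distinct n-gram types over every instance in a column."""
    return {gram for value in column for gram in char_ngrams(value, n)}


str_name = ["LOCUST-GROVE DR", "LOUISE LN"]   # A.StrName
street = ["TRAIL RANGE DR", "CR45/MANET CT"]  # B.Street
print(sorted(ngram_types(str_name) & ngram_types(street)))  # [' D', 'DR', 'E ']
```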
*Jeffrey Partyka, Neda Alipanah, Latifur Khan, Bhavani Thuraisingham & Shashi Shekhar, “Content Based Ontology Matching for GIS Datasets,” ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2008), pp. 407-410, Irvine, California, USA, November 2008.
Node Similarity: Instance-based
Visualizing Entropy and Conditional Entropy
$H(C) = -\sum_i p_i \log p_i$, for all $x \in C_1 \cup C_2$
$H(C \mid T) = H(C, T) - H(T)$, for all $x \in C_1 \cup C_2$ and $t \in T$
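The two formulas above, combined into the ratio H(C | T) / H(C) that the slides use as the similarity score, can be computed from 2-gram counts. A sketch, assuming each 2-gram occurrence x is one sample, C records which column it came from, T is its 2-gram type, and logs are base 2 (the base cancels in the ratio). `entropy_similarity` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Iterable, List


def char_ngrams(value: str, n: int = 2) -> List[str]:
    return [value[i:i + n] for i in range(len(value) - n + 1)]


def entropy(counts: Iterable[int]) -> float:
    """Shannon entropy (bits) of a distribution given by raw counts."""
    counts = [c for c in counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def entropy_similarity(col1: List[str], col2: List[str], n: int = 2) -> float:
    """Similarity = H(C|T) / H(C), with H(C|T) = H(C,T) - H(T)."""
    joint: Counter = Counter()  # (column label, n-gram type) -> occurrences
    for label, column in (("C1", col1), ("C2", col2)):
        for value in column:
            for gram in char_ngrams(value, n):
                joint[(label, gram)] += 1
    col_counts: Counter = Counter()
    type_counts: Counter = Counter()
    for (label, gram), count in joint.items():
        col_counts[label] += count
        type_counts[gram] += count
    h_c = entropy(col_counts.values())
    if h_c == 0:
        return 0.0  # one column contributed no n-grams
    h_c_given_t = entropy(joint.values()) - entropy(type_counts.values())
    return h_c_given_t / h_c


print(round(entropy_similarity(["LOUISE LN"], ["LOUISE LN"]), 3))  # 1.0
print(round(entropy_similarity(["AAA"], ["BBB"]), 3))  # 0.0
```

The ratio approaches 1 when every type is spread evenly over both columns (knowing the type says nothing about the column) and 0 when each type occurs in only one column.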
Node Similarity: Faults of This Method
• Semantically similar columns are not guaranteed to have a high similarity score
City | Country
Dallas | USA
Houston | USA
Kingston | Jamaica
Halifax | Canada
Mexico City | Mexico

ctyName | country
Shanghai | China
Beijing | China
Tokyo | Japan
New Delhi | India
Kuala Lumpur | Malaysia
A ∈ O1, B ∈ O2
2-grams extracted from A: {Da, al, la, as, Ho, ou, us, …}
2-grams extracted from B: {Sh, ha, an, ng, gh, ha, ai, Be, ei, ij, …}
Node Similarity: Instance-based
K-medoid + NGD Instance Similarity
Compared columns: C1 ∈ O1 (Column 1) and C2 ∈ O2 (Column 2)
Step 1: Extract distinct keywords from the compared columns (C1 ∪ C2)
Keywords extracted from columns = {Johnson, Rd., School, 15th, …}
Step 2: Group the distinct keywords into semantic clusters
Clusters: {“Rd.”, “Dr.”, “St.”, “Pwy”, …}, {“Johnson”, “School”, “Dr.”, …}
Step 3: Calculate similarity
Similarity = H(C | T) / H(C)
C1:
roadName | City
Johnson Rd. | Plano
School Dr. | Richardson
Zeppelin St. | Lakehurst
C2:
Road | County
Custer Pwy | Collin
15th St. | Collin
Parker Rd. | Collin
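Step 2 can be sketched as k-medoid clustering with the Normalized Google Distance (NGD) as the distance between keywords. This sketch uses the standard NGD formula, a simple alternating k-medoid loop initialized with the first k keywords, and hypothetical hit counts in place of real search-engine queries; `ngd`, `assign`, `k_medoids`, `HITS` and `N_PAGES` are all hypothetical names.

```python
import math
from typing import Callable, Dict, FrozenSet, List

# Hypothetical hit counts f(x) and f(x, y); a real system would query a search engine.
HITS: Dict[FrozenSet[str], float] = {
    frozenset(["Rd."]): 1e8, frozenset(["St."]): 1e8,
    frozenset(["Johnson"]): 1e7, frozenset(["School"]): 1e7,
    frozenset(["Rd.", "St."]): 5e7, frozenset(["Johnson", "School"]): 5e6,
    frozenset(["Rd.", "Johnson"]): 1e4, frozenset(["Rd.", "School"]): 1e4,
    frozenset(["St.", "Johnson"]): 1e4, frozenset(["St.", "School"]): 1e4,
}
N_PAGES = 1e10  # hypothetical number of indexed pages


def ngd(x: str, y: str) -> float:
    """Normalized Google Distance: 0 for terms that always co-occur, larger for unrelated terms."""
    fx = math.log(HITS[frozenset([x])])
    fy = math.log(HITS[frozenset([y])])
    fxy = math.log(HITS[frozenset([x, y])])
    return (max(fx, fy) - fxy) / (math.log(N_PAGES) - min(fx, fy))


def assign(items: List[str], medoids: List[str], dist: Callable[[str, str], float]) -> Dict[str, List[str]]:
    """Attach every item to its nearest medoid."""
    clusters: Dict[str, List[str]] = {m: [] for m in medoids}
    for item in items:
        clusters[min(medoids, key=lambda m: dist(item, m))].append(item)
    return clusters


def k_medoids(items: List[str], k: int, dist: Callable[[str, str], float], max_iter: int = 20) -> Dict[str, List[str]]:
    """Alternate assignment and medoid update until the medoids stop changing."""
    medoids = list(items[:k])  # naive initialization: the first k keywords
    for _ in range(max_iter):
        clusters = assign(items, medoids, dist)
        # New medoid = member with the smallest total distance to the rest of its cluster.
        updated = [min(members, key=lambda c: sum(dist(c, o) for o in members))
                   for members in clusters.values()]
        if set(updated) == set(medoids):
            break
        medoids = updated
    return assign(items, medoids, dist)


keywords = ["Rd.", "Johnson", "St.", "School"]
print(k_medoids(keywords, 2, ngd))
# {'Rd.': ['Rd.', 'St.'], 'Johnson': ['Johnson', 'School']}
```

In this reading of Step 3, the resulting clusters play the role of the types T in H(C | T) / H(C), replacing raw n-grams.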
Node Similarity: Instance-based
Problems with K-medoid + NGD*
Two different geographic entities in the same location (e.g., Dallas, TX and Dallas County) may have a very low computed NGD value and thus be mistaken for being similar:
roadName | City
Johnson Rd. | Plano
School Dr. | Richardson
Zeppelin St. | Lakehurst
Alma Dr. | Richardson
Preston Rd. | Addison
Dallas Pkwy | Dallas

Road | County
Custer Pwy | Cooke
15th St. | Collin
Parker Rd. | Collin
Alma Dr. | Collin
Campbell Rd. | Denton
Harry Hines Blvd. | Dallas
*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Semantic Schema Matching Without Shared Instances,” to appear in Third IEEE International Conference on Semantic Computing, Berkeley, CA, USA - September 14-16, 2009.
Node Similarity: Instance-based
Using Geographic Type Information*
We use a gazetteer to determine the geographic type of an instance:
[Figure: instances from O1 and O2 mapped to geotypes]
*Jeffrey Partyka, Latifur Khan, Bhavani Thuraisingham, “Geographically-Typed Semantic Schema Matching,” to appear in ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems (ACM GIS 2009), Seattle, Washington, USA, November 2009.
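A minimal sketch of geotype-based comparison, assuming a gazetteer that maps an instance string to a single geographic type and comparing columns by the cosine of their geotype counts. The gazetteer entries, the cosine choice, and the function names are hypothetical; the slide does not say how geotypes are turned into a score. The sketch does show the intended effect: a column of cities and a column of counties no longer look alike just because they describe the same area.

```python
import math
from collections import Counter
from typing import Dict, List

# Hypothetical gazetteer entries (instance -> geographic type).
GAZETTEER: Dict[str, str] = {
    "Plano": "city", "Richardson": "city", "Lakehurst": "city", "Addison": "city",
    "Collin": "county", "Cooke": "county", "Denton": "county",
}


def geotype_profile(column: List[str]) -> Counter:
    """Count geographic types of the instances the gazetteer knows; unknown instances are skipped."""
    return Counter(GAZETTEER[v] for v in column if v in GAZETTEER)


def cosine(p: Counter, q: Counter) -> float:
    dot = sum(p[k] * q[k] for k in p)
    norm = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


city_col = ["Plano", "Richardson", "Lakehurst"]
county_col = ["Cooke", "Collin", "Collin"]
other_city_col = ["Addison", "Richardson"]
print(cosine(geotype_profile(city_col), geotype_profile(county_col)))      # 0.0
print(cosine(geotype_profile(city_col), geotype_profile(other_city_col)))  # 1.0
```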
Node Similarity: Instance-based
Results of Geographic Matching over Two Separate Road Network Data Sources
Structural Similarity
◦ Structural Similarity Measurement
I. Neighbor Similarity
[Figure: neighborhoods of concepts C1, C2, C3, C5, C6 and C’1, C’3, C’4, C’5]
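The slides do not give a formula for neighbor similarity, so the following is only one plausible sketch: the fraction of a concept's neighbors whose current match in O2 is a neighbor of the candidate concept, normalized by the larger neighborhood. The formula, `neighbor_similarity`, and the partial mapping are assumptions; the neighbor sets for C1 and C’1 are read off VSM(C1) and VSM(C’1) on the VSM slide, written with ASCII apostrophes.

```python
from typing import Dict, Set


def neighbor_similarity(n1: Set[str], n2: Set[str], mapping: Dict[str, str]) -> float:
    """Share of neighbors of c whose matched concept is a neighbor of c' (hypothetical formula)."""
    if not n1 or not n2:
        return 0.0
    matched = sum(1 for n in n1 if mapping.get(n) in n2)
    return matched / max(len(n1), len(n2))


neighbors_c1 = {"C2", "C3", "C5", "C6"}
neighbors_c1p = {"C'3", "C'4", "C'5"}
mapping = {"C2": "C'3", "C3": "C'4"}  # hypothetical partial alignment
print(neighbor_similarity(neighbors_c1, neighbors_c1p, mapping))  # 0.5
```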
Structural Similarity
◦ Structural Similarity Measurement
II. Properties Similarity
[Figure: two ontologies with concepts C1 to C7 and C’1 to C’6, connected by isA, subClass, hasFlavor, hasColor, hasFood, hasDrink and hasTopping relations]
RTC1 = [3 isA, 2 subClass, 1 hasFlavor, 1 hasColor, 0 hasFood, 1 hasTopping]
RTC2 = [1 isA, 1 subClass, 2 hasFlavor, 0 hasColor, 1 hasFood]
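The relation-type count vectors RTC1 and RTC2 can be compared with a cosine measure. This is a sketch, assuming RTS (relation type similarity) is the cosine of the two count vectors with missing relation types counted as 0; the slides do not state the exact measure, and `rts` is a hypothetical name.

```python
import math
from typing import Dict


def rts(rtc1: Dict[str, int], rtc2: Dict[str, int]) -> float:
    """Cosine similarity of two relation-type count vectors; absent types count as 0."""
    keys = set(rtc1) | set(rtc2)
    dot = sum(rtc1.get(k, 0) * rtc2.get(k, 0) for k in keys)
    n1 = math.sqrt(sum(v * v for v in rtc1.values()))
    n2 = math.sqrt(sum(v * v for v in rtc2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


RTC1 = {"isA": 3, "subClass": 2, "hasFlavor": 1, "hasColor": 1, "hasFood": 0, "hasTopping": 1}
RTC2 = {"isA": 1, "subClass": 1, "hasFlavor": 2, "hasColor": 0, "hasFood": 1}
print(round(rts(RTC1, RTC2), 4))  # 0.6614
```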
Similarity Results of Pairwise Ontology Matching (I3CON Benchmark)
Matching using Name Similarity + RTS
Matching using Name Similarity + (RTS and Neighbor)
Ontology Matching
Vector Space Model (VSM)
Define the VSM for each entity
• Collection of words in the label, edge types, comment and neighbors
[Figure: ontologies with concepts C1 to C7 and C’1 to C’6, connected by isA, subClass, hasFlavor, hasColor, hasFood, hasDrink and hasTopping relations]
VSM(C1) = [1 C1, 1 C2, 1 C3, 1 C5, 1 C6, 1 isA, 2 subClass, 1 hasFlavor]
VSM(C’1) = [1 C’3, C’4, 1 C’5, 1 isA, 2 hasFlavor]
Ontology Matching
Vector Space Model (VSM)
• Update the VSM with word scores using TF/IDF
• Calculate the cosine similarity for corresponding entities: Cos(VSM(C1), VSM(C2))
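A sketch of the two bullets above, assuming each entity's VSM is a bag of words, TF is the raw count, IDF is log(N / df) over all entities, and cosine similarity is computed on the weighted vectors. The three example VSMs and the function names are hypothetical. Note the effect of IDF: a word that occurs in every entity (here isA) gets weight 0 and stops contributing to similarity.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[Counter]) -> List[Dict[str, float]]:
    """Weight each raw term count by log(N / df), where df counts entities containing the term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # iterating a Counter yields each key once
    return [{term: tf * math.log(n / df[term]) for term, tf in doc.items()} for doc in docs]


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical VSMs for three entities (words from labels, edge types and neighbors).
vsms = [
    Counter({"pizza": 1, "isA": 1, "hasTopping": 2}),
    Counter({"pizza": 1, "isA": 1, "hasTopping": 1}),
    Counter({"wine": 1, "isA": 1, "hasColor": 1}),
]
weighted = tfidf(vsms)
print(round(cosine(weighted[0], weighted[1]), 3))  # 0.949
print(cosine(weighted[0], weighted[2]))            # 0.0
```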
Aligned Concepts
• Aggregate different ontologies
• Example
Aligned Concepts
• Statistical Model
[Figure: a global ontology whose entities Entity1 to Entity4 group concepts from O1 and O2 (O1:Person, O2:Person, O1:hasFather, O1:hasMaleParent, O2:hasFather, O1:hasMon, O1:hasMother, O1:hasFemaleParent, O2:hasMother, O1:Harry); the entities are labeled α1 to α4 and the concepts β1 to β10 (β10 = 1)]
Aligned Concepts
• Calculate the probability of appearance of each entity in the global ontology (GO)
• Use maximum likelihood estimation
• Calculate and
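The slides do not spell out the statistical model, so this is only a sketch under explicit assumptions. Each entity's appearance in a source ontology is treated as an independent Bernoulli trial, and each entity's members as draws from a categorical distribution, loosely mirroring the α and β labels in the figure. Under those assumptions the maximum likelihood estimates are relative frequencies. The grouping of concepts into entities and the names `alpha_mle` and `beta_mle` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Member = Tuple[str, str]  # (source ontology, concept or instance label)


def alpha_mle(entities: Dict[str, List[Member]], ontologies: List[str]) -> Dict[str, float]:
    """MLE of P(entity appears in an ontology): ontologies containing it / all ontologies."""
    n = len(ontologies)
    return {e: len({o for o, _ in members}) / n for e, members in entities.items()}


def beta_mle(members: List[Member]) -> Dict[Member, float]:
    """MLE of a categorical distribution over an entity's members: count / total."""
    counts = Counter(members)
    total = sum(counts.values())
    return {m: c / total for m, c in counts.items()}


# Hypothetical grouping of the figure's concepts into entities.
entities = {
    "E_person": [("O1", "Person"), ("O2", "Person")],
    "E_father": [("O1", "hasFather"), ("O1", "hasMaleParent"), ("O2", "hasFather")],
    "E_harry": [("O1", "Harry")],
}
print(alpha_mle(entities, ["O1", "O2"]))  # {'E_person': 1.0, 'E_father': 1.0, 'E_harry': 0.5}
print(beta_mle(entities["E_father"]))
```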
Reification
Reification can be considered metadata about RDF/OWL statements.
Ontology alignment approaches rely on probabilistic measures to find matches between concepts in different ontologies.
Reification data can be attached to the alignment information to record the 'match factor' between two concepts in OWL 2.
Advanced analytic algorithms can benefit from reification when establishing the relevance of search results.
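To make the 'match factor' idea concrete, here is a sketch that describes one alignment statement with the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) and attaches a score to it. The triples are plain Python tuples with prefixed names rather than calls to an RDF library; `ex:matchFactor` is a hypothetical property, and owl:equivalentClass is only one possible alignment relation. OWL 2 also offers axiom annotations (owl:Axiom with owl:annotatedSource, owl:annotatedProperty, owl:annotatedTarget) for the same purpose.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def reify_alignment(stmt: str, source: str, target: str, match_factor: float,
                    relation: str = "owl:equivalentClass") -> List[Triple]:
    """Describe the statement (source relation target) via RDF reification and attach a score."""
    return [
        (stmt, "rdf:type", "rdf:Statement"),
        (stmt, "rdf:subject", source),
        (stmt, "rdf:predicate", relation),
        (stmt, "rdf:object", target),
        (stmt, "ex:matchFactor", f'"{match_factor}"^^xsd:decimal'),  # hypothetical property
    ]


for s, p, o in reify_alignment("_:m1", "o1:Person", "o2:Person", 0.95):
    print(s, p, o, ".")
```

Reifying a statement describes it without asserting it; a system that also wants the match to hold would add the triple o1:Person owl:equivalentClass o2:Person separately.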
OWL 2
OWL 2 is an extension of OWL. Some of the new features in OWL 2 are:
• Syntactic sugar (e.g., disjoint union of classes)
• Property chains
• Richer datatypes and data ranges
• Qualified cardinality restrictions
• New constructs that increase expressivity
• Simple metamodeling capabilities
• Extended annotation capabilities
The following link lists all the new features in OWL 2: http://www.w3.org/TR/2009/REC-owl2-new-features-20091027/
Ontology Extraction from Text Documents
Problem Statement
Our solution for constructing an ontology from documents:
◦ Use a hierarchical clustering algorithm to build a hierarchy of documents
  • Hierarchical Agglomerative Clustering (HAC)
  • Modified Self-Organizing Tree (MSOT)
  • Hierarchical Growing Self-Organizing Tree (HGSOT)
◦ Assign a concept to each node in the hierarchy
  • Using WordNet
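HAC is the first of the three clustering options listed. A minimal sketch, assuming documents are already term vectors, cosine distance, and average linkage (the slides name neither the distance nor the linkage); `hac` and the example vectors are hypothetical. It returns the dendrogram as nested tuples of document indices, the kind of hierarchy whose nodes later receive concepts.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple, Union

Tree = Union[int, Tuple["Tree", "Tree"]]  # leaf = document index, node = merged pair


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def hac(vectors: Sequence[Sequence[float]]) -> Tree:
    """Average-linkage agglomerative clustering; returns the dendrogram as nested tuples."""
    clusters: List[Tuple[Tree, List[int]]] = [(i, [i]) for i in range(len(vectors))]

    def linkage(a: List[int], b: List[int]) -> float:
        return sum(cosine_distance(vectors[i], vectors[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]][1], clusters[p[1]][1]))
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_i, tree_j), members_i + members_j))
    return clusters[0][0]


docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.2, 0.8]]  # hypothetical term vectors
print(hac(docs))  # ((0, 1), (2, 3))
```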
Concept Assignment
Concept assignment to documents
LVQ1: a topic vector (t) is built by training on the training documents.
Clusters in LVQ are predefined. Each topic cluster is represented by a node in the output map, and LVQ uses pre-labeled data for training. Only the best-matching node’s vector (the winning vector) is updated, not its neighbors. The vector update rule uses the following equations:
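If data x and the best-matching node c belong to the same class:
$w_i(t+1) = w_i(t) + \alpha(t)\,\delta(c, i)\,[x(t) - w_i(t)]$
If data x and the best-matching node c belong to different classes:
$w_i(t+1) = w_i(t) - \alpha(t)\,\delta(c, i)\,[x(t) - w_i(t)]$
where α(t) is the learning rate and δ(c, i) = 1 if i = c and 0 otherwise, so only the winning vector changes.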
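The update rule above can be sketched as a small LVQ1 training loop. Assumptions: squared Euclidean distance picks the best-matching node c, α(t) decays linearly from `alpha0` to 0, and the 2-D vectors and the topic labels "gold" and "copper" are hypothetical stand-ins for document and topic vectors. `nearest` and `lvq1_train` are hypothetical names.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(prototypes: Sequence[Vector], x: Sequence[float]) -> int:
    """Index of the best-matching node c (smallest squared Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: sum((p - q) ** 2 for p, q in zip(prototypes[i], x)))


def lvq1_train(prototypes: Sequence[Vector], proto_labels: Sequence[str],
               data: Sequence[Vector], labels: Sequence[str],
               alpha0: float = 0.3, epochs: int = 20) -> List[Vector]:
    w = [list(p) for p in prototypes]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x, y in zip(data, labels):
            alpha = alpha0 * (1 - step / total_steps)  # linearly decaying alpha(t)
            c = nearest(w, x)
            sign = 1.0 if proto_labels[c] == y else -1.0  # attract if same class, repel otherwise
            # Only the winning vector w_c moves: delta(c, i) = 0 for every other node.
            w[c] = [wc + sign * alpha * (xi - wc) for wc, xi in zip(w[c], x)]
            step += 1
    return w


proto_labels = ["gold", "copper"]
data = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
labels = ["gold", "gold", "copper", "copper"]
trained = lvq1_train([[0.4, 0.4], [0.6, 0.6]], proto_labels, data, labels)
print(proto_labels[nearest(trained, [0.05, 0.05])])  # gold
```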
Concept Assignment
◦ Concept sense disambiguation
A single keyword may be associated with more than one concept in WordNet.
The keyword “gold” has four senses in WordNet, and the keyword “copper” has five.
To disambiguate concepts, we apply the same technique used in topic tracking (i.e., the cosine similarity measure).
To construct a vector for each sense, we use the short description (gloss) that WordNet provides.
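A sketch of gloss-based sense disambiguation with cosine similarity, assuming whitespace tokenization into bag-of-words vectors and picking the sense whose gloss vector is closest to the context vector. The sense labels and glosses are hypothetical paraphrases, not WordNet synset IDs or text, and `disambiguate` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Dict


def bag(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def disambiguate(context: str, glosses: Dict[str, str]) -> str:
    """Pick the sense whose gloss vector is most similar to the context vector."""
    ctx = bag(context)
    return max(glosses, key=lambda sense: cosine(ctx, bag(glosses[sense])))


# Hypothetical sense labels and paraphrased glosses (not copied from WordNet).
GOLD_SENSES = {
    "gold/metal": "a soft yellow precious metal element used in jewelry and coins",
    "gold/color": "a deep yellow color",
}
print(disambiguate("the price of the precious metal rose on the commodity market", GOLD_SENSES))
# gold/metal
```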
Concept Assignment
Concept assignment for a leaf node
◦ If a majority of the documents have the same concept, we assign that concept to the leaf.
◦ If there is no majority, we assign the leaf a generic WordNet concept that covers all of the documents' concepts.
Concept assignment for a non-leaf node
◦ If a majority of the children have the same concept, we assign that concept to the internal node.
◦ If there is no majority, we assign the internal node a generic WordNet concept that covers all of the children's concepts.
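The rules above can be sketched as a majority vote with a fallback to the lowest common hypernym. Assumptions: "majority" means strictly more than half, and a small hypothetical hypernym map stands in for WordNet's hierarchy. The same function serves leaf nodes (documents' concepts) and internal nodes (children's concepts); the function names are hypothetical.

```python
from collections import Counter
from typing import List, Optional

# Toy hypernym links (child -> parent): a hypothetical stand-in for WordNet's hierarchy.
PARENT = {
    "gold": "precious_metal", "silver": "precious_metal",
    "precious_metal": "metal", "copper": "metal", "metal": "substance",
}


def hypernym_path(concept: str) -> List[str]:
    """The concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def generic_concept(concepts: List[str]) -> Optional[str]:
    """Lowest concept that is an ancestor (or self) of every input concept."""
    paths = [hypernym_path(c) for c in concepts]
    shared = set(paths[0]).intersection(*paths[1:])
    return next((c for c in paths[0] if c in shared), None)


def assign_concept(concepts: List[str]) -> Optional[str]:
    """Majority concept if one exists, otherwise a generic concept covering all of them."""
    counts = Counter(concepts)
    top, freq = counts.most_common(1)[0]
    if freq > len(concepts) / 2:
        return top
    return generic_concept(list(counts))


print(assign_concept(["gold", "gold", "silver"]))    # gold
print(assign_concept(["gold", "silver", "copper"]))  # metal
```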