A Scalable Approach to Learn Semantic Models of Structured Sources
Mohsen Taheriyan
Craig Knoblock
Pedro Szekely
Jose Luis Ambite
8th IEEE International Conference on Semantic Computing
2
Motivation
How to express the intended meaning of data?
Explicit semantics are missing from many structured sources
creator? actor? rightsHolder?
artwork? movie? referenced entity?
3
Map the Source to the Domain Ontology
EDM: Europeana Data Model
SKOS: Simple Knowledge Organization System
FOAF: Friend of a Friend
AAC: American Art Collaborative
ElementsGr2: RDA Group 2 Elements
ORE: Open Archives Initiative Object Reuse and Exchange
DCTerms: Dublin Core Metadata Terms
Data Source: artworks in the Indianapolis Museum of Art
Domain ontologies
Semantic Model: a mapping from the source to the concepts and relationships defined by the domain ontologies
4
Semantic Model
[Figure: semantic model graph]
Classes: aac:CulturalHeritageObject, edm:WebResource, skos:Concept, aac:Person, edm:EuropeanaAggregation
Properties: dcterms:title, edm:aggregatedCHO, skos:prefLabel, ElementsGr2:nameOfThePerson, rdf:type, edm:hasResource, dcterms:creator, edm:hasType, dcterms:description
Key ingredient to automate source discovery, data integration, and publishing RDF triples
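To make the role of a semantic model in publishing RDF concrete, here is a minimal sketch. It assumes a heavily simplified model with two columns and one relationship taken from the slide's vocabulary. The function name `row_to_triples`, the blank-node naming scheme, and the sample row are all hypothetical, not part of the authors' system.

```python
# Hypothetical, simplified semantic model: each column maps to a
# (domain class, data property) pair, and relationships link the classes.
COLUMN_TYPES = {
    "title": ("aac:CulturalHeritageObject", "dcterms:title"),
    "artist": ("aac:Person", "ElementsGr2:nameOfThePerson"),
}
RELATIONSHIPS = [
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
]


def row_to_triples(row, row_id, column_types, relationships):
    """Turn one source row into (subject, predicate, object) triples."""

    def node(cls):
        # One blank node per class per row, e.g. _:Person0 (assumed naming).
        return f"_:{cls.split(':')[1]}{row_id}"

    triples = []
    # Type assertion for every class used by the model.
    for cls in sorted({cls for cls, _ in column_types.values()}):
        triples.append((node(cls), "rdf:type", cls))
    # One literal-valued triple per mapped column present in the row.
    for column, (cls, prop) in column_types.items():
        if column in row:
            triples.append((node(cls), prop, f'"{row[column]}"'))
    # Object properties connecting the class instances.
    for subject_cls, prop, object_cls in relationships:
        triples.append((node(subject_cls), prop, node(object_cls)))
    return triples


# Placeholder row, not real museum data.
triples = row_to_triples(
    {"title": "Example Title", "artist": "Example Artist"},
    0,
    COLUMN_TYPES,
    RELATIONSHIPS,
)
for t in triples:
    print(" ".join(t), ".")
```

The point of the sketch is that once the column-to-ontology mapping and the class relationships are known, triple generation is mechanical; the hard part, which the rest of the talk addresses, is learning that mapping.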
5
Problem: How to automatically learn a semantic model for a source
6
Main Idea
Sources in the same domain often have similar data
Exploit knowledge of known semantic models to hypothesize a semantic model for a new source
7
Previous Approach (ISWC 2013)
Input
• Sample data from new source (S)
• Domain Ontologies (O)
• Known semantic models
Steps
1. Learn semantic types for attributes(S)
2. Construct Graph G=(V,E)
3. Generate mappings between attributes(S) and V
4. Generate and rank semantic models
Output
• A ranked set of semantic models for S
8
Limitations
• Considers only one semantic type (label) for each attribute
• Not scalable to sources with a large number of attributes
9
Contributions
• Consider K candidate semantic types per attribute
• A beam search algorithm to score and prune the mappings
10
Example
Domain ontologies: EDM, SKOS, FOAF, AAC, ElementsGr2, ORE, DCTerms
Known Semantic Models:
S1: Dallas Museum, S1(title, creationDate, name, birthDate, deathDate, type)
S2: The Metropolitan Museum of Art, S2(name, copyright, materials, dimensions, imageUrl)
New source: Indianapolis Museum of Art, S(title, label, image, type, artist)
Goal: Semantic model for source S
[Figures: semantic model of S1; semantic model of S2]
11
Approach
Input
• Sample data from new source (S)
• Domain Ontologies (O)
• Known semantic models
Steps
1. Learn semantic types for attributes(S)
2. Construct Graph G=(V,E)
3. Generate mappings between attributes(S) and V
4. Generate and rank semantic models
Output
• A ranked set of semantic models for S
12
Learn Semantic Types
• A CRF-based machine learning technique to learn semantic types for each attribute from its data
• Semantic Type
– Ontology Class: <class_uri>
– Data Property + Domain Class: <class_uri, property_uri>
• Pick top K semantic types according to their confidence values
New source: S(title, label, image, type, artist)
title  <aac:CulturalHeritageObject, dcterms:title>  0.19
       <aac:CulturalHeritageObject, rdfs:label>  0.08
label  <aac:CulturalHeritageObject, dcterms:description>  0.7
       <aac:Person, ElementsGr2:note>  0.03
image  <edm:WebResource>  0.58
       <foaf:Document>  0.41
type   <skos:Concept, skos:prefLabel>  0.82
       <skos:Concept, rdfs:label>  0.15
name   <foaf:Person, foaf:name>  0.27
       <aac:Person, ElementsGr2:nameOfThePerson>  0.19
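The "pick top K" step can be sketched as a sort on confidence. The snippet below is only an illustration. It assumes the classifier (the CRF-based labeler in the talk) has already produced `(semantic type, confidence)` pairs; the function name `top_k_semantic_types` is hypothetical, and the values are the ones shown on the slide.

```python
def top_k_semantic_types(candidates, k):
    """Return the k candidates with the highest confidence, best first."""
    return sorted(candidates, key=lambda c: c[1], reverse=True)[:k]


# (semantic type, confidence) pairs copied from the slide.
candidates = {
    "title": [
        ("<aac:CulturalHeritageObject, rdfs:label>", 0.08),
        ("<aac:CulturalHeritageObject, dcterms:title>", 0.19),
    ],
    "image": [
        ("<foaf:Document>", 0.41),
        ("<edm:WebResource>", 0.58),
    ],
    "name": [
        ("<aac:Person, ElementsGr2:nameOfThePerson>", 0.19),
        ("<foaf:Person, foaf:name>", 0.27),
    ],
}

top1 = {attr: top_k_semantic_types(c, 1) for attr, c in candidates.items()}
top2 = {attr: top_k_semantic_types(c, 2) for attr, c in candidates.items()}
print(top1["name"])
```

With K = 1 only `<foaf:Person, foaf:name>` survives for the last attribute; with K = 2 the lower-ranked `<aac:Person, ElementsGr2:nameOfThePerson>` stays available to later steps.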
14
Build Graph G: Add Known Models
15
Build Graph G: Add Semantic Types
16
Build Graph G: Expand with Paths from Ontologies
18
Map Source Attributes to the Graph
New source: S(title, label, image, type, artist)
title  <aac:CulturalHeritageObject, dcterms:title>
       <aac:CulturalHeritageObject, rdfs:label>
label  <aac:CulturalHeritageObject, dcterms:description>
       <aac:Person, ElementsGr2:note>
image  <edm:WebResource>
       <foaf:Document>
type   <skos:Concept, skos:prefLabel>
       <skos:Concept, rdfs:label>
name   <foaf:Person, foaf:name>
       <aac:Person, ElementsGr2:nameOfThePerson>
21
Scalability Issue
• Multiple mappings from attributes(S) to nodes of G
– Each attribute has more than one semantic type
– Multiple matches for each semantic type
• Not feasible to generate all possible mappings
– The size of the graph may be large
– The source may have many attributes
• Exponential in the number of attributes
– N attributes with M matches each yield M^N mappings
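The growth is easy to check numerically. The sketch below multiplies the number of matches per attribute. The helper name `count_mappings` is hypothetical, and the first input is the per-attribute match count quoted for source S16 on the Performance slide (29*29*29*31*22).

```python
import math


def count_mappings(matches_per_attribute):
    """Number of complete mappings = product of the match counts."""
    return math.prod(matches_per_attribute)


# Source S16 from the Performance slide: 5 attributes.
s16 = count_mappings([29, 29, 29, 31, 22])
print(s16)  # 16633298

# Uniform case: N = 10 attributes with M = 3 matches each gives M**N.
uniform = count_mappings([3] * 10)
print(uniform)  # 59049
```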
22
Prune the Mappings
• Score the partial mappings after mapping each attribute
– Coherence: number of nodes in a mapping that belong to the same component
– Confidence: average confidence of the semantic types in mapping m
– Score = arithmetic mean of coherence and confidence
• Beam search
– Keep only the top K mappings after mapping each attribute
• The number of mappings never exceeds a fixed size after mapping each attribute
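A minimal sketch of the pruning loop follows, under stated assumptions. Each match is a `(graph node, confidence)` pair. The `component_of` table saying which known model each node came from is invented for the example. Coherence is normalized as the largest same-component group divided by the number of mapped nodes, which is one plausible reading of the slide rather than the authors' exact definition. All function names are hypothetical.

```python
from collections import Counter


def coherence(nodes, component_of):
    """Fraction of mapped nodes that fall in the most common component."""
    counts = Counter(component_of[n] for n in nodes)
    return max(counts.values()) / len(nodes)


def score(mapping, component_of):
    """Arithmetic mean of coherence and average confidence."""
    nodes = [node for node, _ in mapping]
    confidence = sum(conf for _, conf in mapping) / len(mapping)
    return (coherence(nodes, component_of) + confidence) / 2


def beam_search(matches_per_attribute, component_of, beam_width):
    """Map attributes one at a time, keeping only the best partial mappings."""
    beam = [[]]
    for matches in matches_per_attribute:
        # Extend every surviving partial mapping with every match.
        expanded = [partial + [m] for partial in beam for m in matches]
        expanded.sort(key=lambda m: score(m, component_of), reverse=True)
        beam = expanded[:beam_width]  # prune to the beam width
    return beam


# Two attributes with confidences from the slides; components are assumed.
matches_per_attribute = [
    [("edm:WebResource", 0.58), ("foaf:Document", 0.41)],
    [("foaf:Person", 0.27), ("aac:Person", 0.19)],
]
component_of = {
    "edm:WebResource": "S1",
    "aac:Person": "S1",
    "foaf:Document": "S2",
    "foaf:Person": "S2",
}

result = beam_search(matches_per_attribute, component_of, beam_width=2)
print([node for node, _ in result[0]])
```

In this toy input the best mapping pairs `edm:WebResource` with the lower-confidence `aac:Person`, because both nodes sit in the same assumed component. With beam width K and at most M matches per attribute, each step scores at most K*M partial mappings, so the work grows linearly with the number of attributes instead of exponentially.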
23
Score the Mappings: Example Mapping 2
New source: S(title, label, image, type, artist)
(the semantic type chosen for each attribute is the one listed with its confidence)
title  <aac:CulturalHeritageObject, dcterms:title>, 0.19
       <aac:CulturalHeritageObject, rdfs:label>
label  <aac:CulturalHeritageObject, dcterms:description>, 0.7
       <aac:Person, ElementsGr2:note>
image  <edm:WebResource>, 0.58
       <foaf:Document>
type   <skos:Concept, skos:prefLabel>, 0.82
       <skos:Concept, rdfs:label>
name   <foaf:Person, foaf:name>, 0.27
       <aac:Person, ElementsGr2:nameOfThePerson>
Coherence: 4/9 = 0.44
Confidence: 0.56
Score: 0.5
24
Score the Mappings: Example Mapping 1
New source: S(title, label, image, type, artist)
(the semantic type chosen for each attribute is the one listed with its confidence)
title  <aac:CulturalHeritageObject, dcterms:title>, 0.19
       <aac:CulturalHeritageObject, rdfs:label>
label  <aac:CulturalHeritageObject, dcterms:description>, 0.7
       <aac:Person, ElementsGr2:note>
image  <edm:WebResource>, 0.58
       <foaf:Document>
type   <skos:Concept, skos:prefLabel>, 0.82
       <skos:Concept, rdfs:label>
name   <foaf:Person, foaf:name>
       <aac:Person, ElementsGr2:nameOfThePerson>, 0.19
Coherence: 6/9 = 0.66
Confidence: 0.55
Score: 0.605
This mapping gets a higher score even though it uses the 2nd-ranked semantic type for artist
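The two example scores are consistent with the score being the arithmetic mean of the coherence and confidence values printed on the slides. As a worked check, using only those printed values (the symbols are shorthand introduced here):

```latex
\mathrm{score}(m) = \frac{\mathrm{coherence}(m) + \mathrm{confidence}(m)}{2}

\text{Example Mapping 2: } \frac{0.44 + 0.56}{2} = 0.50
\qquad
\text{Example Mapping 1: } \frac{0.66 + 0.55}{2} = 0.605
```

Mapping 1 loses 0.01 in confidence by using the lower-ranked type but gains 0.22 in coherence, so it ranks first.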
26
Generate Semantic Models
• Select top M mappings
• Compute a Steiner tree for each mapping
– A minimal tree connecting the nodes of the mapping
• Each tree is a candidate model
• Rank candidate models (Steiner trees)
– Cost
– Score of the corresponding mapping
27
Steiner Tree
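Computing an exact Steiner tree is NP-hard, so implementations generally approximate it. The sketch below is not the authors' algorithm. It swaps in a simple shortest-path heuristic: grow the tree from one terminal and repeatedly attach the nearest remaining terminal by its cheapest path. It also treats the graph as undirected. The function names, the graph, and the edge weights are all illustrative assumptions.

```python
import heapq


def add_edge(graph, a, b, weight):
    """Insert an undirected weighted edge into a dict-of-dicts graph."""
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def nearest_target_path(graph, sources, targets):
    """Multi-source Dijkstra: cheapest path from any source to any target."""
    dist = {s: 0.0 for s in sources}
    prev = {}
    heap = [(0.0, s) for s in sorted(sources)]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return float("inf"), []


def approx_steiner_tree(graph, terminals):
    """Greedy approximation: returns (set of edges, total cost)."""
    terminals = list(terminals)
    tree_nodes = {terminals[0]}
    remaining = set(terminals[1:])
    edges, total = set(), 0.0
    while remaining:
        cost, path = nearest_target_path(graph, tree_nodes, remaining)
        if not path:
            raise ValueError("terminals are not connected")
        edges.update(frozenset(pair) for pair in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= set(path)
        total += cost
    return edges, total


# Toy graph; weights are invented for the example.
g = {}
add_edge(g, "aac:CulturalHeritageObject", "aac:Person", 1)
add_edge(g, "aac:CulturalHeritageObject", "skos:Concept", 1)
add_edge(g, "edm:EuropeanaAggregation", "aac:CulturalHeritageObject", 1)
add_edge(g, "edm:EuropeanaAggregation", "edm:WebResource", 1)
add_edge(g, "aac:CulturalHeritageObject", "edm:WebResource", 5)

edges, total = approx_steiner_tree(
    g,
    ["aac:CulturalHeritageObject", "aac:Person", "skos:Concept", "edm:WebResource"],
)
print(total)
```

Here `edm:EuropeanaAggregation` is not a terminal, yet it ends up in the tree because routing through it is cheaper than the direct weight-5 edge. That is the role of Steiner nodes: classes no attribute maps to can still appear in the learned model as connectors.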
28
Evaluation
• Dataset
– 29 museum data sources
– 332 attributes (average 11 per source)
• Domain ontologies
– EDM, SKOS, FOAF, ORE, ElementsGr2, AAC, DCTerms
– 119 classes, 351 properties
• Compute precision and recall between learned models and correct models
– Precision: how many of the learned relationships are correct?
– Recall: how many of the correct relationships are learned?
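The two questions correspond to set-based precision and recall over relationships. A minimal sketch, assuming each model is represented as a set of `(subject class, property, object class)` triples; this representation, the function name, and the example triples are illustrative rather than the evaluation code used in the talk.

```python
def precision_recall(learned, correct):
    """Precision and recall of learned relationships against the gold model."""
    learned, correct = set(learned), set(correct)
    hits = len(learned & correct)
    precision = hits / len(learned) if learned else 0.0
    recall = hits / len(correct) if correct else 0.0
    return precision, recall


# Illustrative gold model with three relationships.
correct = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "edm:hasType", "skos:Concept"),
    ("edm:EuropeanaAggregation", "edm:aggregatedCHO", "aac:CulturalHeritageObject"),
}
# Learned model: one correct relationship and one wrong one.
learned = {
    ("aac:CulturalHeritageObject", "dcterms:creator", "aac:Person"),
    ("aac:CulturalHeritageObject", "dcterms:creator", "foaf:Person"),
}

precision, recall = precision_recall(learned, correct)
print(precision, recall)
```

One of the two learned relationships is right, so precision is 0.5; one of the three correct relationships was found, so recall is 1/3.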
29
Quality
k = 1: the correct semantic type was learned for only 62% of attributes
k = 4: the correct semantic type was among the 4 learned types for 87% of attributes
[Figure: precision and recall versus the number of known semantic models (0 to 28), plotted for k=1, k=4, and correct semantic types; vertical axis from 0.3 to 1.0]
Performance
The previous approach was not able to learn semantic models for sources with more than 4 attributes within the 1-hour timeout
Example: S16, with only 5 attributes, has 16,633,298 mappings (29*29*29*31*22)
[Figure: time versus number of attributes (0 to 30) for the previous approach and the new approach (Kbeam = 100); time axis from 0 to 60]
31
Related Work
• Schema matching & schema mapping
– iMAP [Dhamankar et al., 2004], Clio [Fagin et al., 2009]
• Mapping databases and spreadsheets to ontologies
– Mapping languages: D2R [Bizer, 2003], D2RQ [Bizer and Seaborne, 2004], R2RML [Das et al., 2012]
– Tools: RDOTE [Vavliakis et al., 2010], RDF123 [Han et al., 2008], XLWrap [Langegger and Wöß, 2009]
– String similarity between column names and ontology terms [Polfliet and Ichise, 2010]
• Understand semantics of Web tables
– Use column headers and cell values to find the labels and relations from a database of labels and relations populated from the Web [Wang et al., 2012] [Limaye et al., 2010] [Venetis et al., 2011]
• Exploit Linked Open Data (LOD)
– Link the values to the entities in LOD to find the types of the values and their relationships [Muñoz et al., 2013] [Mulwad et al., 2013]
• Learn Semantic Definitions of Online Information Sources [Carman and Knoblock, 2007]
– Learns LAV rules from known sources
– Only learns descriptions that are conjunctive combinations of known descriptions
32
Future Work
• Scalability with respect to the number of known models
– Create a more compact graph
– Consolidate overlapping segments of known models
• Leverage Linked Open Data (LOD)
– Exploit the relationships between instances
– Improve the accuracy of the learned relations
• Integrate the new approach in Karma
– http://www.isi.edu/integration/karma
– @KarmaSemWeb