Melinda: Methods and tools for Web Data Interlinking
Introduction Framework Tools Application Conclusions
Melinda
Methods and tools for Web data Interlinking
François Scharffe
December @ STI Innsbruck
1 Introduction
2 Framework
3 Tools
4 Application
5 Conclusions
Publishing datasets on the Web
Four publication principles
1 Resources are identified by URIs.
2 URIs are dereferenceable.
3 When a URI is dereferenced, a description of the identified
resource should be returned, ideally adapted through content
negotiation.
4 Published Web datasets must contain links to other Web
datasets.
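Principles 2 and 3 can be illustrated with a small sketch of a dereferencing request that uses content negotiation. The helper name build_rdf_request and the choice of text/turtle as the requested media type are assumptions made for illustration; the request is only built here, not sent.

```python
from urllib.request import Request

def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET request for an RDF description of `uri`.

    The Accept header is what drives content negotiation: the server can
    return HTML to a browser and RDF to a client that asks for it.
    """
    return Request(uri, headers={"Accept": media_type})

# Hypothetical usage with a DBpedia resource URI.
req = build_rdf_request("http://dbpedia.org/resource/Johann_Sebastian_Bach")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing the request to urllib.request.urlopen would actually dereference the URI; whether a given server honours the Accept header depends on its configuration.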
Interlinking datasets
Links are contained in specific datasets
<http://www.example.org/linkset/DBPedia-MB>
    a void:Linkset ;
    void:target <http://www.dbpedia.org> ;
    void:target <http://www.musicbrainz.org> .

# Link contained in <http://www.example.org/linkset/DBPedia-MB>:
<http://www.dbpedia.org/resource/Johann_Sebastian_Bach>
    owl:sameAs
    <http://www.musicbrainz.org/artist/24f1766e-9635-4d58-a4d4-9413f9f98a4c> .
Web Data Cloud
Goodie: Open Data is coming up
data.gov, US Data Act
data.gov.uk, Sir Tim Berners-Lee on the track
Other initiatives around: from the EU, Open Data initiatives
What do we do?
We propose a framework capturing the various data
interlinking methods
We study existing tools and position them in the framework
We propose an architecture that combines ontology
alignment and interlinking tools
General approach
[Figure: data interlinking connects URI1 and URI2 with an owl:sameAs link.]
Fig.: The data interlinking problem.
Manual resource alignment
[Figure: a URI transformation maps URI1 to URI2, yielding an owl:sameAs link.]
Fig.: URI transformation.
Matching identifiers - Example
[Figure: URI alignment links the two URIs below with owl:sameAs.]
http://dbpedia.org/resource/Johann_Sebastian_Bach
http://www.lastfm.fr/music/Johann+Sebastian+Bach
Fig.: URI transformation example
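The example can be reproduced with a simple string rewrite. The sketch below assumes that the Last.fm URI is obtained from the DBpedia local name by replacing underscores with spaces and encoding spaces as "+"; this rule is inferred from the single pair of URIs shown and is not a documented mapping. The function names are hypothetical.

```python
from urllib.parse import quote_plus, unquote

DBPEDIA_PREFIX = "http://dbpedia.org/resource/"
LASTFM_PREFIX = "http://www.lastfm.fr/music/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def dbpedia_to_lastfm(uri: str) -> str:
    """Rewrite a DBpedia resource URI into the corresponding Last.fm URI.

    Assumed rule: underscores in the local name become spaces,
    and spaces are then encoded as '+'.
    """
    if not uri.startswith(DBPEDIA_PREFIX):
        raise ValueError(f"not a DBpedia resource URI: {uri}")
    local_name = unquote(uri[len(DBPEDIA_PREFIX):])
    return LASTFM_PREFIX + quote_plus(local_name.replace("_", " "))

def same_as_triple(subject: str, obj: str) -> str:
    """Serialize an owl:sameAs link as one N-Triples line."""
    return f"<{subject}> <{OWL_SAME_AS}> <{obj}> ."

source = "http://dbpedia.org/resource/Johann_Sebastian_Bach"
print(same_as_triple(source, dbpedia_to_lastfm(source)))
```

A rewrite of this kind only works when both datasets derive their URIs from the same label, which is why it sits at the manual end of the framework.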
Datasets sharing a common ontology
[Figure: URI1 and URI2 are both described by ontology O1; resource matching of datasets described by the same ontology produces an owl:sameAs link.]
Fig.: Matching two datasets described according to the same ontology.
Datasets sharing a common ontology - Example
[Figure: URI1 (DBPedia) and URI2 (Musicbrainz) both have type mo:MusicArtist and carry first and last name properties; the values shown are "Johann-Sebastian Bach" and "Jean-Sébastien Bach". A resource matching algorithm for datasets described according to a common ontology compares them.]
Fig.: Matching data sharing a common ontology
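When both datasets use the same ontology, matching reduces to comparing the values of the same properties. The sketch below is one possible instance of this idea, not the algorithm used by any tool in this talk: records are plain dictionaries with the hypothetical keys "first" and "last", and string similarity comes from Python's difflib rather than from a dedicated matching library.

```python
from difflib import SequenceMatcher

def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on difflib."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()

def record_similarity(r1: dict, r2: dict, properties=("first", "last")) -> float:
    """Average the similarity of the values of the shared properties."""
    scores = [string_similarity(r1[p], r2[p]) for p in properties]
    return sum(scores) / len(scores)

# Hypothetical records for two resources of type mo:MusicArtist.
dbpedia_artist = {"first": "Johann-Sebastian", "last": "Bach"}
musicbrainz_artist = {"first": "Jean-Sébastien", "last": "Bach"}
other_artist = {"first": "Wolfgang Amadeus", "last": "Mozart"}

print(record_similarity(dbpedia_artist, musicbrainz_artist))
print(record_similarity(dbpedia_artist, other_artist))
```

Whether a given score is high enough to emit an owl:sameAs link depends on a threshold that has to be chosen per dataset pair.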
Matching datasets having heterogeneous ontologies
[Figure: URI1 is described by ontology O1 and URI2 by ontology O2; resource matching of datasets described by different ontologies relies on an implicit alignment and produces an owl:sameAs link.]
Fig.: Two datasets matched using an implicit alignment.
Example
[Figure: URI1 (OpenCyc) has type Classical Music Performer and an English ID property; URI2 (Musicbrainz) has type mo:MusicArtist with givenname and name properties. The literal values shown include "Johann", "Jean-Sébastien" and "Bach".]
General interlinking framework
[Figure: ontology matching between O1 and O2 produces an alignment, which data interlinking uses to link URI1 and URI2 with owl:sameAs.]
Fig.: General framework for data interlinking involving ontology matching.
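The framework can be sketched as a two-step pipeline in which the alignment is an explicit input to the interlinking step. Everything below is illustrative: the alignment is a plain dictionary from O1 property names to O2 property names, the property names and URIs are hypothetical, and difflib stands in for whatever comparison method a real tool would use.

```python
from difflib import SequenceMatcher

def translate(record: dict, alignment: dict) -> dict:
    """Re-express a record described with O1 using O2 property names."""
    return {alignment[p]: v for p, v in record.items() if p in alignment}

def interlink(source: dict, target: dict, alignment: dict, threshold: float = 0.8) -> list:
    """Return owl:sameAs links between source and target resources.

    `source` and `target` map resource URIs to property/value records.
    """
    links = []
    for s_uri, s_record in source.items():
        translated = translate(s_record, alignment)
        for t_uri, t_record in target.items():
            shared = [p for p in translated if p in t_record]
            if not shared:
                continue
            score = sum(
                SequenceMatcher(None, translated[p].casefold(), t_record[p].casefold()).ratio()
                for p in shared
            ) / len(shared)
            if score >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links

# Hypothetical alignment (output of ontology matching) and datasets.
alignment = {"first": "givenname", "last": "name"}
source = {"urn:example:a1": {"first": "Johann Sebastian", "last": "Bach"}}
target = {
    "urn:example:b1": {"givenname": "Johann Sebastian", "name": "Bach"},
    "urn:example:b2": {"givenname": "Wolfgang Amadeus", "name": "Mozart"},
}
print(interlink(source, target, alignment))
```

Keeping the alignment as data rather than hard-coding property pairs in the matcher is the difference between an explicit and an implicit alignment.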
Processes and specifications
            process               result
instance    link specification    linkset
class       matcher               alignment
Tab.: Matching process, interlinks, and their results.
Analysis criteria
Degree of automation
  Is the tool completely automatic?
  Does the tool need to be parameterized by the user? With what kind of parameters (data matching techniques, ontology alignment)?
Matching techniques used
  String matching?
  External functions (value conversion, data transformations)?
  Similarity propagation?
  Other techniques?
Domain: Is the tool specific to a given domain?
Analysis criteria (continued)
Ontologies
  Does the tool take into account the ontologies associated with the datasets?
  Does the tool allow interlinking datasets described according to different ontologies?
  If the ontologies differ, does the tool perform ontology alignment?
Output
  What does the tool produce as output?
  Does the tool offer to merge the two input datasets?
Postprocessing: Does the tool perform any post-processing operations?
Six interlinking tools
RKB-CRS: Coreference resolution service of the RKB RDF Knowledge Base.
LD-Mapper: Interlinking tool for the Music Ontology (MO).
ODD-Linker: Interlinking tool based on SQL record matching.
RDF-AI: Interlinking and data fusion tool.
Silk and Silk LSL: Interlinking tool and link specification language.
Knofuss architecture: Interlinking and data fusion tool with ontology alignment.
Six interlinking tools
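[Figure: the general framework (O1, O2, URI1, URI2, resource comparison method, owl:sameAs) annotated with the six tools Silk, ODD-Linker, LD-Mapper, RDF-AI, Knofuss and RKB-CRS, placed relative to the implicit alignment, explicit alignment and ontology matching system components.]
Fig.: Tools positioned in the defined framework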
Application
Let us consider a link specification between DBPedia and Geonames:
<Silk>
  <Prefix id="rdfs" namespace="http://www.w3.org/2000/01/rdf-schema#" />
  <Prefix id="dbpedia" namespace="http://dbpedia.org/ontology/" />
  <Prefix id="gn" namespace="http://www.geonames.org/ontology#" />
  <DataSource id="dbpedia">
    <EndpointURI>http://demo_sparql_server1/sparql</EndpointURI>
    <Graph>http://dbpedia.org</Graph>
  </DataSource>
  <DataSource id="geonames">
    <EndpointURI>http://demo_sparql_server2/sparql</EndpointURI>
    <Graph>http://sws.geonames.org/</Graph>
  </DataSource>
  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
  <Interlink id="cities">
    <LinkType>owl:sameAs</LinkType>
    <SourceDataset dataSource="dbpedia" var="a">
      <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
    </SourceDataset>
    <TargetDataset dataSource="geonames" var="b">
      <RestrictTo>?b rdf:type gn:P</RestrictTo>
    </TargetDataset>
    <LinkCondition>
      <AVG>
        <Compare metric="jaroSimilarity">
          <Param name="str1" path="?a/rdfs:label" />
          <Param name="str2" path="?b/gn:name" />
        </Compare>
        <Compare metric="numSimilarity">
          <Param name="num1" path="?a/dbpedia:populationTotal" />
          <Param name="num2" path="?b/gn:population" />
        </Compare>
      </AVG>
    </LinkCondition>
  </Interlink>
</Silk>
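To make the link condition concrete, here is a minimal sketch of what the specification above computes for one candidate pair: the average of a Jaro string similarity and a numeric similarity, classified against the accept and verify thresholds. The Jaro function follows the standard definition. The numeric similarity (ratio of the smaller to the larger value) and the reading of the two thresholds are assumptions, not Silk's documented behaviour, and the sample city values are made up.

```python
def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Only characters within the sliding window may match.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2
    return (matches / len1 + matches / len2 + (matches - transpositions) / matches) / 3

def num_similarity(a: float, b: float) -> float:
    """Assumed numeric similarity: smaller value divided by larger value."""
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    return 1.0 if a == b else min(a, b) / max(a, b)

def link_score(label: str, name: str, population_a: float, population_b: float) -> float:
    """AVG aggregation of the two Compare elements."""
    return (jaro(label, name) + num_similarity(population_a, population_b)) / 2

def classify(score: float, accept: float = 0.9, verify: float = 0.7) -> str:
    """Assumed reading of <Thresholds>: accept, send to verification, or reject."""
    if score >= accept:
        return "accept"
    return "verify" if score >= verify else "reject"

# Made-up values for a DBPedia city and a Geonames place.
score = link_score("Innsbruck", "Innsbruck", 100000, 90000)
print(score, classify(score))
```

Under this reading, accepted pairs would go to accepted_links.n3 and borderline pairs to verify_links.n3.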
Application
The alignment implicitly contained in the link specification:
:dbp-geo a align:Alignment ;
    align:onto1 <http://dbpedia.org/ontology/> ;
    align:onto2 <http://www.geonames.org/ontology#> ;
    align:map :map1, :map2, :map3 .
:map1 a align:Cell ;
    align:entity1 dbpedia:City ; align:entity2 gn:P ;
    align:relation align:subsumedBy .
:map2 a align:Cell ;
    align:entity1 dbpedia:populationTotal ; align:entity2 gn:population ;
    align:relation align:equivalent .
:map3 a align:Cell ;
    align:entity1 rdfs:label ; align:entity2 gn:name ;
    align:relation align:equivalent .

# Refined cells: each property is restricted to its domain class.
:map2 a align:Cell ;
    align:entity1 [ a align:Property ; edoal:and dbpedia:populationTotal ;
        edoal:and [ a edoal:PropertyDomainRestriction ; edoal:domain dbpedia:City ] ] ;
    align:entity2 [ a align:Property ; edoal:and gn:population ;
        edoal:and [ a edoal:PropertyDomainRestriction ; edoal:domain gn:P ] ] ;
    align:relation align:equivalent .
:map3 a align:Cell ;
    align:entity1 [ a align:Property ; edoal:and rdfs:label ;
        edoal:and [ a edoal:PropertyDomainRestriction ; edoal:domain dbpedia:City ] ] ;
    align:entity2 [ a align:Property ; edoal:and gn:name ;
        edoal:and [ a edoal:PropertyDomainRestriction ; edoal:domain gn:P ] ] ;
    align:relation align:equivalent .
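The correspondences are implicit in the sense that they can be read off the link specification itself. The sketch below extracts them from an abridged copy of the Interlink element shown earlier, using only the Python standard library. It recovers which classes and properties are paired, but not the relation (equivalent, subsumedBy), which the specification does not state. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the Interlink element of the link specification.
SPEC = """
<Interlink id="cities">
  <SourceDataset dataSource="dbpedia" var="a">
    <RestrictTo>?a rdf:type dbpedia:City</RestrictTo>
  </SourceDataset>
  <TargetDataset dataSource="geonames" var="b">
    <RestrictTo>?b rdf:type gn:P</RestrictTo>
  </TargetDataset>
  <LinkCondition>
    <AVG>
      <Compare metric="jaroSimilarity">
        <Param name="str1" path="?a/rdfs:label" />
        <Param name="str2" path="?b/gn:name" />
      </Compare>
      <Compare metric="numSimilarity">
        <Param name="num1" path="?a/dbpedia:populationTotal" />
        <Param name="num2" path="?b/gn:population" />
      </Compare>
    </AVG>
  </LinkCondition>
</Interlink>
"""

def restricted_class(dataset: ET.Element) -> str:
    """Extract the class from a restriction like '?a rdf:type dbpedia:City'."""
    return dataset.findtext("RestrictTo").split()[-1]

def implicit_alignment(spec_xml: str) -> list:
    """Return (entity1, entity2) pairs implied by an Interlink element."""
    root = ET.fromstring(spec_xml)
    pairs = [(restricted_class(root.find("SourceDataset")),
              restricted_class(root.find("TargetDataset")))]
    for compare in root.iter("Compare"):
        paths = [param.get("path") for param in compare.findall("Param")]
        if len(paths) == 2:
            # '?a/rdfs:label' -> 'rdfs:label'
            pairs.append(tuple(path.split("/", 1)[1] for path in paths))
    return pairs

for entity1, entity2 in implicit_alignment(SPEC):
    print(entity1, "<->", entity2)
```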
Application
Using the alignment, the link specification can be simplified.
<UseAlignment rdf:resource="#dbp-geo" />
<Interlink id="cities">
  <LinkType>owl:sameAs</LinkType>
  <LinkCell rdf:resource="#map1" />
  <LinkCondition>
    <AVG>
      <Compare metric="jaroSimilarity"><CellParam rdf:resource="#map3" /></Compare>
      <Compare metric="numSimilarity"><CellParam rdf:resource="#map2" /></Compare>
    </AVG>
  </LinkCondition>
  <Thresholds accept="0.9" verify="0.7" />
  <Output acceptedLinks="accepted_links.n3" verifyLinks="verify_links.n3" mode="truncate" />
</Interlink>
Conclusions
We propose a framework for data interlinking on the Web of data.
We have presented existing tools and positioned them with respect to the framework.
We propose a simplification of the interlinking task and demonstrate it on an example.
Our current work aims at more interoperability for link specifications:
  Is it possible to construct more generic link specifications, i.e. attached to datasets or ontologies?
  Is it possible to automatically find the key properties that identify matching pairs?
For more
http://melinda.inrialpes.fr
François Scharffe and Jérôme Euzenat. Linked data meets
ontology matching: enhancing data interlinking through
ontology alignments. (submitted to WWW'2010).