[Databeers] 06/05/2014 - Boris Villazon: “Data Integration - A Linked Data approach”
1
Data Integration, A Linked Data Approach
Boris Villazón-Terrazas, @boricles
Slides available at: http://www.slideshare.net/boricles/
2
ToC
» Introduction
» Linked Data
» Use Cases
3
Introduction
Current data systems combine data from a tremendous number of resources
….
4
Introduction
We use the term data shape to refer to how data is arranged and structured.
[Diagram labels: resource, data shape]
1. Michael Hausenblas, Boris Villazon-Terrazas, Richard Cyganiak. Data shapes and data transformations. arXiv preprint arXiv:1211.1565
Fundamental data shapes
• tabular
• tree
• graph
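To make the three shapes concrete, here is a minimal sketch that holds the same records as a table, a tree, and a graph. The sample records and the helper names `tabular_to_tree` and `tabular_to_graph` are assumptions made for illustration; they do not come from the talk or the cited paper.

```python
# Hypothetical sample records; none of these values come from the talk.
rows = [
    {"id": "emp1", "name": "Ana", "dept": "sales"},
    {"id": "emp2", "name": "Luis", "dept": "sales"},
]


def tabular_to_tree(rows, group_key):
    """Nest each row under the value of group_key (tabular -> tree)."""
    tree = {}
    for row in rows:
        children = tree.setdefault(row[group_key], {})
        children[row["id"]] = {
            key: value
            for key, value in row.items()
            if key not in ("id", group_key)
        }
    return tree


def tabular_to_graph(rows):
    """Emit one (node, edge label, node) edge per non-id cell (tabular -> graph)."""
    return {
        (row["id"], key, value)
        for row in rows
        for key, value in row.items()
        if key != "id"
    }


tree = tabular_to_tree(rows, "dept")
graph = tabular_to_graph(rows)
```

The tree has to pick one column as its root, while the graph keeps every cell as an independent edge; this is one reason the graph shape is convenient as a target for integration.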
5
Introduction
Data Integration
6
Classic Web
MovieDB
CIA World FactBook
Data exposed to the Web via HTML, PDF, etc.
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
7
Classic Web
Information from single pages can be found via search engines
Complex queries over multiple pages / data sources?
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
8
What do we actually want?
Use the Web like a single global database
Move from a Web of documents to a Web of Data
MovieDB
CIA World FactBook
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
9
Linked Data enables such a Web of Data
MovieDB
CIA World FactBook
Global Identifier: URI (Uniform Resource Identifier), a string of characters used to identify a name or a resource on the Internet.
http://cia.../Bolivia
http://imdb.../TLLuvia
Data Model: RDF (Resource Description Framework), a standard model for data interchange on the Web.
http://.../population 8000000
http://.../name “Even the Rain”
Access Mechanism: HTTP
Connection: Typed Links
http://.../filming_location
© Slide adapted from “5min Introduction to Linked Data”- Olaf Hartig
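The four ingredients above can be sketched with plain tuples standing in for RDF triples. The slide elides the real URIs, so every `http://example.org/...` identifier below is a hypothetical placeholder, and the helper names are assumptions; only the literal values ("Even the Rain", 8000000) and the property names come from the slide.

```python
# Hypothetical URIs: the slide elides the real ones, so these are placeholders.
MOVIE = "http://example.org/movies/TLLuvia"
BOLIVIA = "http://example.org/countries/Bolivia"
NAME = "http://example.org/vocab/name"
POPULATION = "http://example.org/vocab/population"
FILMING_LOCATION = "http://example.org/vocab/filming_location"

# Each source publishes (subject, predicate, object) triples.
movie_db = {
    (MOVIE, NAME, "Even the Rain"),
    (MOVIE, FILMING_LOCATION, BOLIVIA),  # typed link into the other source
}
factbook = {
    (BOLIVIA, POPULATION, 8000000),
}


def objects(triples, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def population_of_filming_locations(triples, movie):
    """Follow filming_location links, then look up each place's population."""
    result = {}
    for place in objects(triples, movie, FILMING_LOCATION):
        for population in objects(triples, place, POPULATION):
            result[place] = population
    return result


web_of_data = movie_db | factbook
answer = population_of_filming_locations(web_of_data, MOVIE)
```

Neither source can answer the question alone; the typed link and the shared identifier for Bolivia are what let a single query span both.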
11
Exploitation
Streaming resources
12
Enterprise Linked Data
Linked Data is not necessarily free data
Enterprises have many disparate data sources and data silos
Linked Data provides global identifiers for data, accessible through the Web infrastructure, and typed links between data that may come from different applications
The graph-based RDF data model allows data to be consumed and merged without complex structural transformations
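As a rough illustration of merging without structural transformation, the sketch below treats each silo as a set of triples and merges them by set union. The silo names, the product identifier, and the property names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical identifier and values, used only for illustration.
PRODUCT = "http://example.com/id/product/42"

crm_silo = {
    (PRODUCT, "label", "Widget"),
    (PRODUCT, "soldBy", "http://example.com/id/office/A"),
}
erp_silo = {
    (PRODUCT, "label", "Widget"),  # same statement as in the CRM silo
    (PRODUCT, "unitPrice", 9.5),
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse into one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Group everything known about one subject by property."""
    description = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            description[p].add(o)
    return dict(description)


merged = merge(crm_silo, erp_silo)
product_view = describe(merged, PRODUCT)
```

Because both silos use the same global identifier for the product, no schema alignment step is needed before the union; in practice, agreeing on those identifiers is where most of the integration effort goes.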
13
Enterprise Linked Data
Office A
Office B
Office C
Products
Company 1
Agency A
Services
Agency B
Company 2
14
GeoLinkedData Ecuador – http://geo.linkeddata.ec
Image taken from http://www.spatialytics.org/projects/geokettle/
RDF Generator Plugins
• GeoKettle
  - Spatially enabled version of the generic ETL tool Kettle (Pentaho Data Integration)
  - Powerful, metadata-driven spatial ETL tool dedicated to the integration of different geospatial data resources
Extract
Transform
Load
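The extract, transform, load flow can be sketched in a few lines. This is not GeoKettle or its RDF generator plugins; it is a standard-library illustration of the same idea, and the CSV content, the namespaces, and the function names are assumptions.

```python
import csv
import io

# Hypothetical input and namespaces; not taken from the GeoLinkedData project.
SOURCE_CSV = "id,name,province\nr1,Sample River 1,GUAYAS\nr2,Sample River 2,GUAYAS\n"
BASE = "http://example.org/resource/"
VOCAB = "http://example.org/ontology/"


def extract(text):
    """Extract: read tabular records from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def escape(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def transform(rows):
    """Transform: turn each non-id cell into one N-Triples statement."""
    lines = []
    for row in rows:
        subject = f"<{BASE}{row['id']}>"
        for column, value in row.items():
            if column == "id":
                continue
            lines.append(f'{subject} <{VOCAB}{column}> "{escape(value)}" .')
    return lines


def load(lines, store):
    """Load: add the statements to a store (here just an in-memory set)."""
    store.update(lines)
    return store


store = load(transform(extract(SOURCE_CSV)), set())
```

A real pipeline would load into a triple store and would map geometry columns to a spatial vocabulary; the in-memory set only keeps the sketch self-contained.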
15
Publication and exploitation
Parliament
SPARQL
http://purl.org/Ecuador/geo/sparql
Rivers of the province of Guayas
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX sf: <http://www.opengis.net/ont/sf#>
PREFIX units: <http://www.opengis.net/def/uom/OGC/1.0/>

SELECT DISTINCT ?r ?label ?Figure ?r2 ?Figure2
WHERE {
  ?r rdf:type <http://geo.linkeddata.ec/ontology/riosdobles_promsa> .
  ?r rdfs:label ?label .
  ?r geo:hasGeometry ?geo .
  ?geo rdf:type ?geoType .
  ?geo geo:asWKT ?Figure .

  ?r2 rdf:type <http://geo.linkeddata.ec/ontology/provincias_promsa> .
  ?r2 rdfs:label "GUAYAS"@es .
  ?r2 geo:hasGeometry ?geo2 .
  ?geo2 rdf:type ?geoType2 .
  ?geo2 geo:asWKT ?Figure2 .

  FILTER (geof:sfIntersects(?Figure2, ?Figure)) .
}
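A query like this is typically sent to the endpoint over HTTP, following the SPARQL Protocol, with the query text in a `query` parameter. The sketch below only builds the request and does not send it; the endpoint URL is the one shown on the slide and may no longer be online, and the short placeholder query and the function name are assumptions.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint as shown on the slide; it may no longer be reachable.
ENDPOINT = "http://purl.org/Ecuador/geo/sparql"

# Short placeholder query; substitute the full query text as needed.
QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 1"


def build_sparql_request(endpoint, query):
    """Build (without sending) an HTTP GET request carrying the query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialization of SPARQL results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
url = request.full_url
```

Passing `request` to `urllib.request.urlopen` would execute the query against the endpoint, assuming it is online and accepts GET requests.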
16
Publication and exploitation
Parliament
SPARQL
Rivers of the province of Guayas
PREFIX geo: <http://www.opengis.net/ont/geosparql#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX geof: <http://www.opengis.net/def/function/geosparql/>
PREFIX sf: <http://www.opengis.net/ont/sf#>
PREFIX units: <http://www.opengis.net/def/uom/OGC/1.0/>

SELECT DISTINCT ?r ?label ?Figure ?r2 ?Figure2
WHERE {
  ?r a <http://geo.linkeddata.ec/ontology/riosdobles_promsa> .
  ?r rdfs:label ?label .
  ?r geo:hasGeometry ?geo .
  ?geo rdf:type ?geoType .
  ?geo geo:asWKT ?Figure .

  ?r2 a <http://geo.linkeddata.ec/ontology/provincias_promsa> .
  ?r2 rdfs:label "GUAYAS"@es .
  ?r2 geo:hasGeometry ?geo2 .
  ?geo2 rdf:type ?geoType2 .
  ?geo2 geo:asWKT ?Figure2 .

  FILTER (geof:sfIntersects(?Figure2, ?Figure)) .
}
http://200.0.31.28:8081/map4rdf-0.0.4-OL-SNAPSHOT/#dashboard
17
iSOCO, tentative example
Enterprise Linked Data
Data Source: iSOCO ICM
Data Source: iSOCO Lab
Data Source: iSOCO ST
Linked Data Platform
External data sources
Added value services
18
http://datosenlazados.org/cms/
http://linkeddata.ec/