Data Shapes and Data Transformations
Michael Hausenblas (1), Boris Villazón-Terrazas (2), and Richard Cyganiak (1)
(1) DERI, NUI Galway, Ireland
(2) iSOCO, Madrid, Spain
Paper available at: http://arxiv.org/abs/1211.1565
Motivation
Current data systems combine data from a tremendous number of resources [1].
[Figure: extract, transform, load]
[1] Pat Helland. If You Have Too Much Data, then 'Good Enough' Is Good Enough. Queue, 9:40:40-40:50, May 2011. http://queue.acm.org/detail.cfm?id=1988603
Motivation
We use the term data shape to refer to how data is arranged and structured.
[Figure: a resource and its data shape]
Tabular
A tabular data shape organizes data items into a
table.
Location                  Environmental Services
Carlow County Council     40
Cavan County Council      36
Clare County Council      38
Cork City Council         51
Cork County Council       47
Donegal County Council    45
Dublin City Council       43
Tree
A tree data shape organizes data items into a
hierarchy. A data item is designated to be the root of
the tree while the remaining data items are
partitioned into non-empty sets each of which is a
subtree of the root.
Graph
A graph data shape consists of a set of vertices and a set of edges.
An edge is a pair of vertices; the two vertices are called the
endpoints of the edge.
Features
Lossless transformation: all queries that are
possible on the original shape are also possible
on the resultant shape. A transformation that does
not preserve this property is lossy.
Tabular - Tabular
• RDB – RDB
• SQL SELECT:
    SELECT Location AS Region, EServices AS EnvServices
    FROM services
• Declarative
• No information loss
• No provenance
• Standard language, SQL
Location                  EServices
Carlow County Council     40
Cavan County Council      36
Clare County Council      38
Cork City Council         51
Cork County Council       47
Donegal County Council    45
Dublin City Council       43
Data shape transformation:
Region                    EnvServices
Carlow County Council     40
Cavan County Council      36
Clare County Council      38
Cork City Council         51
Cork County Council       47
Donegal County Council    45
Dublin City Council       43
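The column renaming on this slide can be tried out with a small sketch. It assumes an in-memory SQLite database as a stand-in for the relational store (the slides do not name a database engine) and loads only the first three rows of the table; the helper name rename_columns is hypothetical.

```python
import sqlite3

# Sample rows copied from the slide's table (first three only).
ROWS = [
    ("Carlow County Council", 40),
    ("Cavan County Council", 36),
    ("Clare County Council", 38),
]


def rename_columns(rows):
    """Load rows into an in-memory table and project them under new column names."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the RDB
    try:
        conn.execute("CREATE TABLE services (Location TEXT, EServices INTEGER)")
        conn.executemany("INSERT INTO services VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT Location AS Region, EServices AS EnvServices FROM services"
        )
        columns = [description[0] for description in cursor.description]
        return columns, cursor.fetchall()
    finally:
        conn.close()


columns, result = rename_columns(ROWS)
print(columns)
print(result)
```

Every value of the input table is still present in the result, only under different column names, which is why the slide classifies this transformation as having no information loss.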
Tabular - Tree
• RDB – XML
• XML representation of a relational database
• Operational
• No Information loss
Location                  EnvironmentalServices
Carlow County Council     40
Cavan County Council      36
Clare County Council      38
Cork City Council         51
Cork County Council       47
Donegal County Council    45
Dublin City Council       43
[Figure: data shape transformation]
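A minimal sketch of what an XML representation of the table above might look like. The transcript does not preserve the resulting XML, so the element names services and row are assumptions, as is the helper name rows_to_xml; only the column names come from the slide.

```python
import xml.etree.ElementTree as ET

# Sample rows copied from the slide's table (first three only).
ROWS = [
    ("Carlow County Council", 40),
    ("Cavan County Council", 36),
    ("Clare County Council", 38),
]


def rows_to_xml(rows):
    """Build one <row> element per table row, with one child element per column."""
    root = ET.Element("services")  # hypothetical root element name
    for location, env_services in rows:
        row = ET.SubElement(root, "row")  # hypothetical row element name
        ET.SubElement(row, "Location").text = location
        ET.SubElement(row, "EnvironmentalServices").text = str(env_services)
    return root


xml_text = ET.tostring(rows_to_xml(ROWS), encoding="unicode")
print(xml_text)
```

The code walks the rows step by step rather than stating a mapping, which matches the "operational" label on the slide.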
Tabular - Graph
• RDB – RDF
• W3C RDB2RDF WG – R2RML [1]
• Declarative
• No Information loss
• W3C Recommendation
ID    Name
10    Venus
20    Felipe
[Figure: data shape transformation via an R2RML mapping]
[1] http://www.w3.org/TR/r2rml/
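To illustrate the idea of turning rows into graph edges, here is a hand-written sketch that emits one triple per table cell. It is not an R2RML processor and does not use R2RML syntax; the base IRI, the table name People, the IRI layout, and the helper rows_to_triples are all assumptions made for illustration.

```python
BASE = "http://example.com/"  # hypothetical base IRI, not from the slides

# Rows copied from the slide's table.
PEOPLE = [
    {"ID": 10, "Name": "Venus"},
    {"ID": 20, "Name": "Felipe"},
]


def rows_to_triples(rows, table, key):
    """Emit one (subject, predicate, object) triple per cell of each row."""
    triples = []
    for row in rows:
        # The subject identifies the row through its key column.
        subject = f"{BASE}{table}/{key}={row[key]}"
        for column, value in row.items():
            predicate = f"{BASE}{table}#{column}"
            triples.append((subject, predicate, value))
    return triples


for triple in rows_to_triples(PEOPLE, "People", "ID"):
    print(triple)
```

Because every cell ends up as exactly one triple, the original table can be rebuilt from the graph, which is the sense in which the slide speaks of no information loss.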
Tree - Tabular
• XML - RDB
• A technique and tool that rely on the XSD of the XML document [1]
• Operational
• No Information loss
Location                  EnvironmentalServices
Carlow County Council     40
Cavan County Council      36
Clare County Council      38
Cork City Council         51
Cork County Council       47
Donegal County Council    45
Dublin City Council       43
[Figure: data shape transformation]
[1] Amy Flik, Transforming XML into a Relational Database Using XML Schema Document Type, 2009. http://scholarworks.gvsu.edu/cistechlib/48/
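The reverse direction, from a tree to rows, can be sketched as follows. The cited technique derives the relational layout from the XSD; this sketch does not do that and instead assumes a hard-coded column list. The sample XML, its services and row element names, and the helper xml_to_rows are likewise assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; only the column names come from the slide.
XML_TEXT = """
<services>
  <row><Location>Carlow County Council</Location><EnvironmentalServices>40</EnvironmentalServices></row>
  <row><Location>Cavan County Council</Location><EnvironmentalServices>36</EnvironmentalServices></row>
</services>
""".strip()

# Assumption: the column list is given by hand instead of being read from an XSD.
COLUMNS = ["Location", "EnvironmentalServices"]


def xml_to_rows(xml_text, columns):
    """Flatten each <row> element into a tuple with one value per column."""
    root = ET.fromstring(xml_text)
    return [
        tuple(row.findtext(column) for column in columns)
        for row in root.iter("row")
    ]


rows = xml_to_rows(XML_TEXT, COLUMNS)
print(rows)
```

Note that all values come back as strings; recovering column types is one of the things a schema-driven approach would add.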
Tree - Tree
• XML - XML
• XSLT [1]
• Declarative
• No Information loss
• W3C Recommendation
[Figure: data shape transformation]
[1] http://www.w3.org/TR/xslt
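As a rough illustration of a tree-to-tree transformation, the sketch below copies an XML tree while renaming some elements. It is a plain Python stand-in, not XSLT: the Python standard library has no XSLT processor, so the declarative part is reduced to a rename table. The sample document, the rename table, and the helper rename_elements are assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical rename table playing the role of the transformation rules.
RENAMES = {"Location": "Region", "EnvironmentalServices": "EnvServices"}


def rename_elements(element, renames):
    """Return a copy of the tree with tags renamed; text and attributes are kept."""
    copy = ET.Element(renames.get(element.tag, element.tag), dict(element.attrib))
    copy.text = element.text
    copy.tail = element.tail
    for child in element:
        copy.append(rename_elements(child, renames))
    return copy


source = ET.fromstring(
    "<services><row>"
    "<Location>Cork City Council</Location>"
    "<EnvironmentalServices>51</EnvironmentalServices>"
    "</row></services>"
)
target = rename_elements(source, RENAMES)
print(ET.tostring(target, encoding="unicode"))
```

Since the rename table maps each tag to a distinct new tag, the source tree could be recovered by applying the inverse table, so nothing is lost in this particular case.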
Tree - Graph
• XML - RDF
• Gleaning Resource Descriptions from Dialects of Languages (GRDDL) [1]
• Declarative
• No Information loss
• W3C Recommendation
[Figure: data shape transformation]
[1] http://www.w3.org/TR/grddl/
Graph - Tabular
• RDF - RDB
• SPARQL SELECT [1]
• Declarative
• Information loss
• W3C Recommendation
[Figure: data shape transformation]
[1] http://www.w3.org/TR/rdf-sparql-query/
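The information loss noted on this slide can be made concrete with a small sketch. It is not a SPARQL engine; it only mimics the effect of a SELECT that projects a fixed set of predicates into columns. The sample triples, the ex: prefix, and the helper select are assumptions.

```python
# Hypothetical graph: two names plus one edge between the two resources.
TRIPLES = [
    ("ex:venus", "ex:name", "Venus"),
    ("ex:venus", "ex:knows", "ex:felipe"),
    ("ex:felipe", "ex:name", "Felipe"),
]


def select(triples, predicates):
    """Build one row per subject, with one column per requested predicate."""
    rows = {}
    for subject, predicate, obj in triples:
        if predicate in predicates:
            # A second value for the same predicate would overwrite the first,
            # which is another way a fixed-width row can lose information.
            rows.setdefault(subject, {})[predicate] = obj
    return [
        (subject,) + tuple(values.get(p) for p in predicates)
        for subject, values in sorted(rows.items())
    ]


table = select(TRIPLES, ["ex:name"])
print(table)
```

The ex:knows edge does not appear anywhere in the resulting table, so a query about who knows whom can be answered on the graph but not on the table.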
Graph - Tree
• RDF - XML
• Rhizomik ReDeFer RDF2XHTML [1], relies on XSLT
• Declarative (XSLT)
• Information loss
• Ad-hoc tool
[Figure: data shape transformation]
[1] http://rhizomik.net/html/redefer/
Graph - Graph
• RDF - RDF
• SPARQL CONSTRUCT [1]
• Declarative
• No Information loss
• W3C Recommendation
[Figure: data shape transformation]
[1] http://www.w3.org/TR/rdf-sparql-query/
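A graph-to-graph transformation can be sketched as a function that emits a new set of triples from an old one. The code below is a hand-written stand-in for the effect of a simple CONSTRUCT query that swaps predicates, not a SPARQL implementation; the sample triples, the prefixes, and the helper construct are assumptions.

```python
def construct(triples, predicate_map):
    """Emit a new graph in which mapped predicates are replaced; other triples are copied."""
    return {
        (subject, predicate_map.get(predicate, predicate), obj)
        for subject, predicate, obj in triples
    }


# Hypothetical source graph.
SOURCE = {
    ("ex:venus", "ex:name", "Venus"),
    ("ex:venus", "ex:knows", "ex:felipe"),
}

target = construct(SOURCE, {"ex:name": "foaf:name"})
for triple in sorted(target):
    print(triple)
```

The result keeps every edge of the source graph, so no information is lost here. That holds only as long as no two source predicates are mapped to the same target predicate.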
Discussion
We can perform lossless data shape transformations
between certain shapes.
A number of data shape transformations are already
standardized:
- For RDB2RDF, see R2RML and Direct Mapping.
- For XML2XML, see XSLT.
- For XML2RDF, see GRDDL.
Some data shape transformations are declarative in nature.
In certain cases we have to deal with lossy transformations.