Data Shapes and Data Transformations

Michael Hausenblas (1), Boris Villazón-Terrazas (2), and Richard Cyganiak (1)
(1) DERI, NUI Galway, Ireland
(2) iSOCO, Madrid, Spain
Paper available at: http://arxiv.org/abs/1211.1565


2

ToC

» Motivation

» Fundamental data shapes

» Data shapes transformations

» Discussion


4

Motivation

Current data systems combine data from a tremendous number of resources [1].

[Figure: extract, transform, load]

1. Pat Helland. If You Have Too Much Data, then 'Good Enough' Is Good Enough. Queue, 9:40:40-40:50, May 2011. http://queue.acm.org/detail.cfm?id=1988603

5

Motivation

We use the term data shape to refer to how data is arranged and structured.

[Figure: resource, data shape]


7

Tabular

A tabular data shape organizes data items into a table.

Location                 Environmental Services
Carlow County Council    40
Cavan County Council     36
Clare County Council     38
Cork City Council        51
Cork County Council      47
Donegal County Council   45
Dublin City Council      43

8

Tree

A tree data shape organizes data items into a hierarchy. A data item is designated to be the root of the tree, while the remaining data items are partitioned into non-empty sets, each of which is a subtree of the root.
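The recursive definition above can be sketched as a small data structure. This is a minimal illustration only: the `Node` class, the `count_items` helper, and the grouping of councils under "Ireland", "Cork" and "Dublin" are assumptions made for the example and do not come from the slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One data item plus its (possibly empty) list of subtrees."""
    label: str
    children: List["Node"] = field(default_factory=list)


def count_items(node: Node) -> int:
    """Count the data items in the tree rooted at `node`."""
    return 1 + sum(count_items(child) for child in node.children)


# Hypothetical hierarchy; only the council names appear on the slides.
root = Node("Ireland", [
    Node("Cork", [Node("Cork City Council"), Node("Cork County Council")]),
    Node("Dublin", [Node("Dublin City Council")]),
])
```

Every node other than the root belongs to exactly one subtree, which is what distinguishes this shape from a graph.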

9

Graph

A graph data shape consists of a set of vertexes and a set of edges. An edge is a pair of vertexes; the two vertexes are called the endpoints of the edge.
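As a rough sketch of this definition, a graph can be held as two plain sets. The vertex names and both helper functions below are hypothetical and chosen only for illustration.

```python
# A graph as a set of vertexes and a set of edges (pairs of vertexes).
vertexes = {"Cork City Council", "Cork County Council", "Cork"}
edges = {
    ("Cork City Council", "Cork"),
    ("Cork County Council", "Cork"),
}


def endpoints_valid(vertexes, edges):
    """Check that both endpoints of every edge are known vertexes."""
    return all(u in vertexes and v in vertexes for u, v in edges)


def neighbours(vertex, edges):
    """Vertexes sharing an edge with `vertex`, treating edges as undirected."""
    return {v if u == vertex else u for u, v in edges if vertex in (u, v)}
```

Unlike in a tree, nothing here prevents a vertex from being reachable along several paths or from taking part in a cycle.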


11

Features

Input/Output, generic data shape, and specific implementation

Declarative/Operational

12

Features

Lossless transformation: all queries that are possible on the original shape are also possible on the resultant shape. Otherwise the transformation is lossy.
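One hedged way to make the query-preservation criterion concrete is to compare a transformation that can be undone with one that cannot. The sketch below uses invertibility as a stand-in for "all queries remain possible"; that proxy, the helper names `rename` and `project`, and the two sample rows are assumptions made for illustration.

```python
rows = [
    {"Location": "Carlow County Council", "EServices": 40},
    {"Location": "Cavan County Council", "EServices": 36},
]


def rename(rows, mapping):
    """Rename columns; every value is kept, so the step can be undone."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in rows]


def project(rows, keep):
    """Keep only the listed columns; dropped values cannot be recovered."""
    return [{k: row[k] for k in keep} for row in rows]


# Renaming is reversible: applying the inverse mapping restores the input.
renamed = rename(rows, {"Location": "Region", "EServices": "EnvServices"})
restored = rename(renamed, {"Region": "Location", "EnvServices": "EServices"})

# Projection is not: a query over EServices has nothing left to read.
projected = project(rows, ["Location"])
```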

13

Tabular - Tabular

• RDB – RDB

• SQL SELECT:

    SELECT Location AS Region, EServices AS EnvServices
    FROM services

• Declarative

• No Information loss

• No provenance

• Standard language, SQL

Location                 EServices
Carlow County Council    40
Cavan County Council     36
Clare County Council     38
Cork City Council        51
Cork County Council      47
Donegal County Council   45
Dublin City Council      43

Data shape transformation

Region                   EnvServices
Carlow County Council    40
Cavan County Council     36
Clare County Council     38
Cork City Council        51
Cork County Council      47
Donegal County Council   45
Dublin City Council      43
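The SELECT statement from this slide can be tried against an in-memory SQLite database. This is only a sketch: the slide does not say which database engine was used, the column types are assumed, and only three of the rows are loaded.

```python
import sqlite3

# In-memory database holding a subset of the slide's `services` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Location TEXT, EServices INTEGER)")
conn.executemany(
    "INSERT INTO services VALUES (?, ?)",
    [
        ("Carlow County Council", 40),
        ("Cavan County Council", 36),
        ("Clare County Council", 38),
    ],
)

# The query from the slide: same rows, renamed columns.
cursor = conn.execute(
    "SELECT Location AS Region, EServices AS EnvServices FROM services"
)
columns = [description[0] for description in cursor.description]
result = cursor.fetchall()
conn.close()
```

The result has the same number of rows and the same values as the input; only the column names differ, which is why the slide lists no information loss.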

14

Tabular - Tree

• RDB – XML

• XML representation of a relational database

• Operational

• No Information loss

Location                 EnvironmentalServices
Carlow County Council    40
Cavan County Council     36
Clare County Council     38
Cork City Council        51
Cork County Council      47
Donegal County Council   45
Dublin City Council      43

Data shape transformation
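The XML output for this slide did not survive in the transcript. As a rough sketch of what an operational table-to-XML step could look like, the code below wraps each row in a `row` element with one child element per column. The element layout, the `services` root name, and the `table_to_xml` function are assumptions, not the representation shown in the talk.

```python
import xml.etree.ElementTree as ET

COLUMNS = ["Location", "EnvironmentalServices"]
rows = [("Carlow County Council", 40), ("Cavan County Council", 36)]


def table_to_xml(table_name, columns, rows):
    """Build <table><row><column>value</column>...</row>...</table>."""
    root = ET.Element(table_name)
    for row in rows:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_element, column).text = str(value)
    return root


tree = table_to_xml("services", COLUMNS, rows)
xml_text = ET.tostring(tree, encoding="unicode")
```

Every cell value ends up in exactly one element, so the table can be rebuilt from the XML, although the numeric type of the second column is only implicit in the text.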

15

Tabular - Graph

• RDB – RDF

• W3C RDB2RDF WG – R2RML [1]

• Declarative

• No Information loss

• W3C Recommendation

ID    Name
10    Venus
20    Felipe

Data shape transformation

R2RML Mapping

1. http://www.w3.org/TR/r2rml/
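The R2RML mapping itself is not reproduced in the transcript. To give a feel for the tabular-to-graph step, the sketch below turns each row into a subject and each cell into one triple, loosely in the style of a direct mapping. It is not R2RML and does not use an R2RML processor; the base IRI, the table name `People`, and the `rows_to_triples` function are all hypothetical.

```python
BASE = "http://example.org/"  # hypothetical base IRI, not from the slides

rows = [{"ID": 10, "Name": "Venus"}, {"ID": 20, "Name": "Felipe"}]


def rows_to_triples(table, key, rows):
    """Emit one (subject, predicate, object) triple per cell."""
    triples = set()
    for row in rows:
        # The subject IRI is derived from the table name and the key value.
        subject = f"{BASE}{table}/{key}={row[key]}"
        for column, value in row.items():
            triples.add((subject, f"{BASE}{table}#{column}", value))
    return triples


triples = rows_to_triples("People", "ID", rows)
```

Because every cell becomes a triple and the row identity is kept in the subject, the original table can be reassembled from the graph.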

16

Tree - Tabular

• XML - RDB

• A technique and tool that rely on the XSD of the XML [1]

• Operational

• No Information loss

Location                 EnvironmentalServices
Carlow County Council    40
Cavan County Council     36
Clare County Council     38
Cork City Council        51
Cork County Council      47
Donegal County Council   45
Dublin City Council      43

Data shape transformation

1. Amy Flik. Transforming XML into a Relational Database Using XML Schema Document Type, 2009. http://scholarworks.gvsu.edu/cistechlib/48/
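The cited tool derives the relational schema from the XSD. The sketch below is much simpler and is not that technique: it assumes a flat, hypothetical `services/row` layout and reads the column names straight from the element names, which is enough to show the tree-to-tabular direction.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; the slide's XML is not in the transcript.
XML_TEXT = """
<services>
  <row><Location>Carlow County Council</Location><EnvironmentalServices>40</EnvironmentalServices></row>
  <row><Location>Cavan County Council</Location><EnvironmentalServices>36</EnvironmentalServices></row>
</services>
"""


def xml_to_rows(xml_text):
    """Turn each <row> element into a dict keyed by child element name."""
    root = ET.fromstring(xml_text.strip())
    return [{cell.tag: cell.text for cell in row} for row in root.findall("row")]


rows = xml_to_rows(XML_TEXT)
```

Without a schema all values come back as strings; recovering column types is one reason a schema-driven approach is attractive.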

17

Tree - Tree

• XML - XML

• XSLT [1]

• Declarative

• No Information loss

• W3C Recommendation

Data shape transformation

1. http://www.w3.org/TR/xslt
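Python's standard library has no XSLT processor, so the sketch below does not run XSLT. It uses `xml.etree.ElementTree` to imitate the effect of a simple renaming stylesheet, purely to show a tree-to-tree transformation that keeps all content. The input document and the `RENAMES` table are hypothetical.

```python
import xml.etree.ElementTree as ET

source = ET.fromstring(
    "<services><row><Location>Cork City Council</Location>"
    "<EServices>51</EServices></row></services>"
)

# Old element name -> new element name; unlisted names are kept as they are.
RENAMES = {"Location": "Region", "EServices": "EnvServices"}


def rename_elements(element, renames):
    """Return a copy of the tree with element names replaced per `renames`."""
    copy = ET.Element(renames.get(element.tag, element.tag), element.attrib)
    copy.text = element.text
    copy.tail = element.tail
    for child in element:
        copy.append(rename_elements(child, renames))
    return copy


target = rename_elements(source, RENAMES)
target_text = ET.tostring(target, encoding="unicode")
```

An actual XSLT stylesheet would state the same rules declaratively as templates, whereas this Python version spells out the traversal step by step.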

18

Tree - Graph

• XML - RDF

• Gleaning Resource Descriptions from Dialects of Languages - GRDDL [1]

• Declarative

• No Information loss

• W3C Recommendation

Data shape transformation

1. http://www.w3.org/TR/grddl/
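GRDDL works by associating an XML document with a transformation that yields RDF. The sketch below skips that mechanism and only illustrates the resulting tree-to-graph step in plain Python. The input document, the namespace, and the `xml_to_triples` function are hypothetical.

```python
import xml.etree.ElementTree as ET

BASE = "http://example.org/"  # hypothetical namespace, not from the slides

doc = ET.fromstring(
    "<people><person id='10'><name>Venus</name></person>"
    "<person id='20'><name>Felipe</name></person></people>"
)


def xml_to_triples(root):
    """Treat each child of the root as a resource and its children as properties."""
    triples = set()
    for item in root:
        subject = f"{BASE}{item.tag}/{item.get('id')}"
        for prop in item:
            triples.add((subject, f"{BASE}{prop.tag}", prop.text))
    return triples


triples = xml_to_triples(doc)
```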

19

Graph - Tabular

• RDF - RDB

• SPARQL SELECT [1]

• Declarative

• Information loss

• W3C Recommendation

Data shape transformation

1. http://www.w3.org/TR/rdf-sparql-query/
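To see where the information loss comes from, the sketch below mimics a single-pattern SELECT over a small set of triples in plain Python. It is not a SPARQL engine; the `ex:` names and the `select` function are hypothetical. The result is roughly what `SELECT ?s ?o WHERE { ?s ex:name ?o }` would return.

```python
triples = {
    ("ex:10", "ex:name", "Venus"),
    ("ex:20", "ex:name", "Felipe"),
    ("ex:10", "ex:knows", "ex:20"),
}


def select(triples, predicate):
    """Rows of (subject, object) for one predicate, sorted for stable output."""
    return sorted((s, o) for s, p, o in triples if p == predicate)


rows = select(triples, "ex:name")

# Triples that did not match the pattern are absent from the table.
dropped = {t for t in triples if t[1] != "ex:name"}
```

The `ex:knows` edge matches no variable in the pattern, so it is missing from the tabular result and cannot be recovered from it.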

20

Graph - Tree

• RDF - XML

• Rhizomik ReDeFer RDF2XHTML [1], relies on XSLT

• Declarative (XSLT)

• Information loss

• Ad-hoc tool

Data shape transformation

1. http://rhizomik.net/html/redefer/
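The sketch below is not ReDeFer and produces plain XML rather than XHTML. It only illustrates why forcing a graph into a tree tends to lose structure: each subject becomes one element, and links to other subjects are flattened into text. The triples, element names, and `triples_to_xml` function are hypothetical.

```python
import xml.etree.ElementTree as ET

triples = [
    ("person10", "name", "Venus"),
    ("person20", "name", "Felipe"),
    ("person10", "knows", "person20"),
]


def triples_to_xml(triples):
    """Group triples by subject: one <resource> per subject, one child per triple."""
    root = ET.Element("graph")
    by_subject = {}
    for subject, predicate, obj in sorted(triples):
        if subject not in by_subject:
            by_subject[subject] = ET.SubElement(root, "resource", {"id": subject})
        # The object is written as text, even when it names another resource.
        ET.SubElement(by_subject[subject], predicate).text = obj
    return root


tree = triples_to_xml(triples)
```

In the output, `knows` holds the string `person20` rather than a link to the second resource, so a consumer can no longer tell a reference from a literal.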

21

Graph - Graph

• RDF - RDF

• SPARQL CONSTRUCT [1]

• Declarative

• No Information loss

• W3C Recommendation

Data shape transformation

1. http://www.w3.org/TR/rdf-sparql-query/
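A CONSTRUCT query builds a new graph from the matches of a pattern. The sketch below imitates `CONSTRUCT { ?s foaf:name ?o } WHERE { ?s ex:name ?o }` in plain Python; it is not a SPARQL engine, and the prefixes and the `construct` function are chosen for illustration only.

```python
triples = {
    ("ex:10", "ex:name", "Venus"),
    ("ex:20", "ex:name", "Felipe"),
}


def construct(triples, old_predicate, new_predicate):
    """Emit one new triple for every triple matching the old predicate."""
    return {(s, new_predicate, o) for s, p, o in triples if p == old_predicate}


result = construct(triples, "ex:name", "foaf:name")
```

Here every input triple matches the pattern, so the output graph carries the same information under a different vocabulary. A pattern that matched only part of the graph would drop the rest.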

22

Summary


24

Discussion

We can perform lossless data shape transformations between certain shapes.

A number of data shape transformations are already standards:

- For RDB2RDF, see R2RML and Direct Mapping.

- For XML2XML, see XSLT.

- For XML2RDF, see GRDDL.

Some data shape transformations are declarative in nature.

In certain cases we have to deal with lossy transformations.
