ISWC 2014 - Dandelion: from raw data to dataGEMs for developers

Stefano Parmesan, Tatiana Tarasova, Ugo Scaiella, Michele Barbera


This is the presentation given at ISWC 2014 in Riva del Garda. The session was titled "Developers Workshop" and focused on how presenters solved practical problems with Linked Data. We presented the Dandelion platform, our data curation workflow and the overall idea behind dataGEM APIs.

Transcript

Page 1

Dandelion: from raw data to dataGEMs for developers

Stefano Parmesan

Tatiana Tarasova

Ugo Scaiella

Michele Barbera

Page 2

A bit of context

• SpazioDati s.r.l.
• Italian startup: Pisa & Trento
• Members of the DBpedia Association
• Manage the Italian DBpedia

Page 3

Goal

• Close the gap between getting the data and using it

• Build a Knowledge Graph as-a-service:
  • Make it queryable
  • Make it stable, make it scale
  • Support different access levels

Page 4

How?

• Phase #1: PUT the data in
  • Data normalization
  • Entity deduplication

• Phase #2: GET the data out
  • Slices
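As a rough illustration of Phase #1, the sketch below normalizes labels and then merges records whose normalized key fields match. It is a minimal sketch, assuming exact matching on hypothetical `label` and `code` fields after normalization; it does not reproduce the similarity-based linking performed by a tool such as the Silk Framework, and the names `normalize_label` and `deduplicate` are illustrative only.

```python
import unicodedata
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Hypothetical normalization rule: strip accents, lower-case, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", label)
    without_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(without_accents.lower().split())


def deduplicate(records, key_fields):
    """Merge records whose normalized key fields match exactly (simple blocking, no fuzzy matching)."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize_label(str(record.get(f, ""))) for f in key_fields)
        groups[key].append(record)

    entities = []
    for group in groups.values():
        entity = {"sources": []}
        for record in group:
            for name, value in record.items():
                if name == "source":
                    entity["sources"].append(value)
                else:
                    # On conflicting values the first record seen wins (an arbitrary policy).
                    entity.setdefault(name, value)
        entities.append(entity)
    return entities


# Made-up input: two sources describe the same municipality with differently written labels.
records = [
    {"source": "istat", "label": "Agliè", "code": "001001", "elevation": 315},
    {"source": "dbpedia", "label": "  AGLIÈ ", "code": "001001", "population": 2644},
    {"source": "istat", "label": "Example Town", "code": "999999"},
]
entities = deduplicate(records, ["label", "code"])
print(len(entities))           # 2
print(entities[0]["sources"])  # ['istat', 'dbpedia']
```

Exact-key grouping is cheap and easy to reason about, but it misses near-duplicates (typos, abbreviations) that a similarity-based linker would catch.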

Page 5

How?

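[Figure: pipeline diagram. Stages: Data Normalisation → Entity Deduplication → Data Storage → Data Access, with the tools Azkaban, Silk Framework, Titan graph and dandelion.eu. Inputs: raw data from Source 1 to Source N. Other labels: Sample, Reconciliation Services, Linked Data, Slices, dataGEM.]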

Page 6

Why…

• … slices?
  • SQL-like APIs
  • Common knowledge, linked data

• … a graph at all?
  • Traversals
  • Data is centralized
  • Different sources, different access levels
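One way to picture a slice is as a SQL-like `SELECT ... FROM ... WHERE` over the entities of one type in the graph, restricted to the sources the caller is allowed to read. The sketch below is a minimal in-memory illustration under those assumptions; `Entity`, `query_slice` and the `allowed_sources` access check are hypothetical names, not the dandelion.eu API.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Entity:
    entity_id: str
    entity_type: str
    source: str  # dataset the entity comes from (public or private)
    props: dict = field(default_factory=dict)


def query_slice(
    entities,
    entity_type: str,
    columns: list,
    where: Optional[Callable[[dict], bool]] = None,
    allowed_sources: Optional[set] = None,
):
    """Roughly: SELECT columns FROM entity_type WHERE where(props), limited to readable sources."""
    rows = []
    for entity in entities:
        if entity.entity_type != entity_type:
            continue
        # Hypothetical access-level check: skip sources the caller may not read.
        if allowed_sources is not None and entity.source not in allowed_sources:
            continue
        if where is not None and not where(entity.props):
            continue
        rows.append({column: entity.props.get(column) for column in columns})
    return rows


graph = [
    Entity("ef4a", "ISTATAdministrativeDivision", "istat", {"label": "Agliè", "elevation": 315}),
    Entity("x1", "ISTATAdministrativeDivision", "private-customer", {"label": "Hidden", "elevation": 500}),
    Entity("p1", "Person", "dbpedia", {"label": "Someone"}),
]
rows = query_slice(
    graph,
    "ISTATAdministrativeDivision",
    ["label", "elevation"],
    where=lambda props: props.get("elevation", 0) > 100,
    allowed_sources={"istat", "dbpedia"},
)
print(rows)  # [{'label': 'Agliè', 'elevation': 315}]
```

Because all sources live in one centralized graph, the access level becomes just another filter on each query rather than a separate copy of the data per customer.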

Page 7

Why…

• … Titan/Gremlin?
  • Scalable
  • Richer (multi-property, arbitrary-depth queries)
  • Open source
  • Elasticsearch-powered

Page 8

And now what?

• Still a prototype:
  • Private beta access to slices (demo)
  • English and Italian DBpedia
  • Corporate private data

Page 9

Future?

• Phase #1b: PUT the data in
  • Scalable entity deduplication

• Phase #2b: GET the data out
  • API for graph traversal
  • Text analysis tools (dataTXT)
  • Customizations

Page 10

RDF mappings

<http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb>
    a code:ISTATAdministrativeDivision ;
    sd:childOf <http://data.spaziodati.eu/resource/7b7d45857f1372e1205bcfc87c19b2b2db2e0f59> ;
    sd:code "001001" ;
    sd:acheneID "ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb" ;
    code:cadastralCode "A074" ;
    sd:label "Agliè" ;
    code:elevation "315"^^xsd:int ;
    code:isCoastal "false"^^xsd:boolean ;
    code:isMountainous "false"^^xsd:boolean ;
    sd:level "60"^^xsd:int .

_:node194hhq904x1
    rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2574"^^xsd:int ;
    sd:acheneID "31e4104e62168ffc4c3d6d278ecc775effff6ebc" ;
    metaprop:validSince "2001-10-21"^^xsd:date .

_:node194hhq904x2
    rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2644"^^xsd:int ;
    sd:acheneID "f38e87252cc5614faeec4abbeedd6315f5d00e9f" ;
    metaprop:validSince "2011-10-09"^^xsd:date .
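In the mapping above, the population of Agliè is not a plain property: each value is a reified statement (`rdf:subject`, `rdf:predicate`, `rdf:object`) with its own `metaprop:validSince` date, so a client can ask which value held on a given day. The sketch below assumes that a value stays valid from its `validSince` date until a later statement about the same subject and predicate supersedes it; that reading and the helper `value_at` are assumptions, not documented Dandelion semantics.

```python
from datetime import date

AGLIE = "http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb"

# The two reified population statements from the example, as
# (subject, predicate, object, validSince) tuples.
STATEMENTS = [
    (AGLIE, "code:population", 2574, date(2001, 10, 21)),
    (AGLIE, "code:population", 2644, date(2011, 10, 9)),
]


def value_at(statements, subject, predicate, when):
    """Return the object of the latest statement whose validSince is on or before `when`."""
    candidates = [
        (valid_since, obj)
        for s, p, obj, valid_since in statements
        if s == subject and p == predicate and valid_since <= when
    ]
    if not candidates:
        return None  # no value known yet at that date
    return max(candidates)[1]  # tuples compare by date first


print(value_at(STATEMENTS, AGLIE, "code:population", date(2005, 1, 1)))    # 2574
print(value_at(STATEMENTS, AGLIE, "code:population", date(2014, 10, 20)))  # 2644
```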

Page 11

Graph structure

[Figure: graph structure, with labels for achene nodes, bristle nodes, type nodes and provenance nodes.]

Page 12

Traversing

• v.as('x').out('sd:childOf').loop('x'){ cur -> cur.object.outE('sd:childOf').hasNext() }.path()
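The Gremlin pipeline walks `sd:childOf` edges upward from a vertex until it reaches one with no outgoing `sd:childOf` edge, then returns the path it followed (for a municipality, its chain of administrative parents). The same idea in plain Python, as a sketch assuming each division has at most one parent stored in a dict; `ancestor_path` and the vertex IDs are hypothetical. One difference: the Gremlin `loop` step runs its body at least once, so a vertex without a parent yields no path there, whereas this function returns a one-element path.

```python
def ancestor_path(child_of: dict, start: str) -> list:
    """Follow childOf links from `start` until a vertex has no parent; return the visited path."""
    path = [start]
    seen = {start}
    current = start
    while current in child_of:
        current = child_of[current]
        if current in seen:
            # Defensive guard: a cyclic childOf chain would otherwise loop forever.
            raise ValueError(f"cycle in childOf at {current!r}")
        seen.add(current)
        path.append(current)
    return path


# Hypothetical IDs: municipality -> province -> region.
CHILD_OF = {
    "comune:Agliè": "provincia:Torino",
    "provincia:Torino": "regione:Piemonte",
}
print(ancestor_path(CHILD_OF, "comune:Agliè"))
# ['comune:Agliè', 'provincia:Torino', 'regione:Piemonte']
```

The depth of the walk is not fixed in advance, which is the kind of query a plain SQL join cannot express without knowing how many levels to join.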

Page 13

Stefano Parmesan