ISWC 2014 - Dandelion: from raw data to dataGEMs for developers
Dandelion: from raw data to dataGEMs for developers
Stefano Parmesan
Tatiana Tarasova
Ugo Scaiella
Michele Barbera
A bit of context
• SpazioDati s.r.l.
• Italian startup: Pisa & Trento
• Members of the DBpedia Association
• Manage the Italian DBpedia
Goal
• Close the gap between getting the data and using it
• Build a Knowledge Graph as-a-service:
  • Make it queryable
  • Make it stable, make it scale
  • Support different access levels
How?
• Phase #1: PUT the data in
  • Data normalization
  • Entity deduplication
• Phase #2: GET the data out
  • Slices
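To make Phase #1 concrete, here is a minimal Python sketch of what data normalization can mean in practice: records coming from different sources are renamed onto one shared schema and their values coerced to consistent types. The source names, the `FIELD_MAP` mappings and the Italian-style thousands separator are hypothetical assumptions for illustration, not SpazioDati's actual rules.

```python
import unicodedata

# Hypothetical mappings: each source names the same attributes differently.
FIELD_MAP = {
    "source1": {"nome": "label", "abitanti": "population", "codice": "code"},
    "source2": {"name": "label", "pop": "population", "istat_code": "code"},
}


def normalise_text(value: str) -> str:
    """Collapse whitespace and apply Unicode NFC so 'e' + accent == 'è'."""
    return unicodedata.normalize("NFC", " ".join(value.split()))


def normalise_record(source: str, record: dict) -> dict:
    """Rename fields to the shared schema and coerce values to consistent types."""
    mapping = FIELD_MAP[source]
    out = {}
    for key, value in record.items():
        if key not in mapping:
            continue  # drop fields the shared schema does not know about
        field = mapping[key]
        if field == "population":
            # Assumption: "." is used as a thousands separator ("2.644" -> 2644).
            out[field] = int(str(value).replace(".", ""))
        else:
            out[field] = normalise_text(str(value))
    out["source"] = source  # keep provenance for later stages
    return out
```

Keeping the `source` field on every normalized record is a deliberate choice: the later deduplication and access-level stages can still tell where each value came from.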
How?
Pipeline diagram:
Raw Data (Source 1 … Source N) → Data Normalisation (Azkaban) → Entity Deduplication (SilkFramework) → Data Storage (Titan Graph) → Data Access (dandelion.eu)
Outputs: Linked Data, Slices, dataGEM
Also shown: Sample, Reconciliation Services
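The Entity Deduplication stage can be illustrated with a generic blocking-plus-similarity sketch in Python. It is not the API or link-specification language of SilkFramework; it only shows the same basic idea of comparing candidate pairs on a string metric and accepting those above a threshold. The blocking field `code`, the `label` field and the 0.9 threshold are assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(records: list[dict], threshold: float = 0.9) -> list[tuple[int, int]]:
    """Return index pairs of records judged to describe the same entity.

    Blocking: only records sharing the same 'code' are compared, which avoids
    comparing every record against every other record in the dataset.
    """
    blocks: dict[str, list[int]] = {}
    for i, rec in enumerate(records):
        blocks.setdefault(rec["code"], []).append(i)

    pairs = []
    for indices in blocks.values():
        for i, j in combinations(indices, 2):
            if similarity(records[i]["label"], records[j]["label"]) >= threshold:
                pairs.append((i, j))
    return pairs
```

Blocking trades recall for speed: two records with different codes are never compared, so a wrong code in one source hides a duplicate. This is one reason scalable deduplication is listed as future work.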
Why…
• … slices?
  • SQL-like APIs
  • Common knowledge, linked data
• … a graph at all?
  • Traversals
  • Data is centralized
  • Different sources, different access levels
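A slice can be pictured as a flat, table-shaped view over part of the graph that developers query with familiar SQL-like filters instead of graph traversals. The sketch below assumes a hypothetical `municipalities` slice and uses Python's built-in `sqlite3` purely as a stand-in; it does not reproduce the dandelion.eu slices API.

```python
import sqlite3


def build_slice(entities: list[dict]) -> sqlite3.Connection:
    """Materialise a hypothetical 'municipalities' slice as an in-memory SQL table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE municipalities ("
        "code TEXT PRIMARY KEY, label TEXT, elevation INTEGER, is_coastal INTEGER)"
    )
    # Named placeholders are filled from each dict's keys.
    conn.executemany(
        "INSERT INTO municipalities VALUES (:code, :label, :elevation, :is_coastal)",
        entities,
    )
    return conn


def query_slice(conn: sqlite3.Connection, min_elevation: int) -> list[tuple]:
    """SQL-like access: filter and project rows without knowing the graph model."""
    cur = conn.execute(
        "SELECT label, elevation FROM municipalities WHERE elevation >= ? ORDER BY label",
        (min_elevation,),
    )
    return cur.fetchall()
```

For example, loading a row for Agliè (code "001001", elevation 315, not coastal, as in the RDF mapping in this deck) and calling `query_slice(conn, 100)` returns it as `("Agliè", 315)`; any other rows are up to the caller.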
Why…
• … Titan/Gremlin?
  • Scalable
  • Richer (multi-property, variable-depth queries)
  • Open source
  • Powered by Elasticsearch
And now what?
• Still a prototype:
  • Private beta access to slices (demo)
  • English and Italian DBpedia
  • Corporate private data
Future?
• Phase #1b: PUT the data in
  • Scalable entity deduplication
• Phase #2b: GET the data out
  • API for graph traversal
  • Text analysis tools (dataTXT)
  • Customizations
RDF mappings
<http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> a code:ISTATAdministrativeDivision ;
    sd:childOf <http://data.spaziodati.eu/resource/7b7d45857f1372e1205bcfc87c19b2b2db2e0f59> ;
    sd:code "001001" ;
    sd:acheneID "ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb" ;
    code:cadastralCode "A074" ;
    sd:label "Agliè" ;
    code:elevation "315"^^xsd:int ;
    code:isCoastal "false"^^xsd:boolean ;
    code:isMountainous "false"^^xsd:boolean ;
    sd:level "60"^^xsd:int .

_:node194hhq904x1 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2574"^^xsd:int ;
    sd:acheneID "31e4104e62168ffc4c3d6d278ecc775effff6ebc" ;
    metaprop:validSince "2001-10-21"^^xsd:date .

_:node194hhq904x2 rdf:subject <http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb> ;
    rdf:predicate code:population ;
    rdf:object "2644"^^xsd:int ;
    sd:acheneID "f38e87252cc5614faeec4abbeedd6315f5d00e9f" ;
    metaprop:validSince "2011-10-09"^^xsd:date .
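The two blank nodes above reify the same `code:population` triple with different `metaprop:validSince` dates, so a client has to choose which value applies at a given date. A minimal Python sketch, assuming the Turtle has already been parsed into tuples and assuming that a later `validSince` supersedes an earlier one (the data carries no end date):

```python
from datetime import date
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    """One reified triple plus its metaprop:validSince qualifier."""
    subject: str
    predicate: str
    obj: int
    valid_since: date


AGLIE = "http://data.spaziodati.eu/resource/ef4a83008d4ffd9e85c1fcf6dfe59c8758226cdb"

# The two reified population statements from the RDF mapping above.
STATEMENTS = [
    Statement(AGLIE, "code:population", 2574, date(2001, 10, 21)),
    Statement(AGLIE, "code:population", 2644, date(2011, 10, 9)),
]


def value_at(statements, subject: str, predicate: str, when: date) -> Optional[int]:
    """Return the object of the most recent statement already valid on `when`."""
    candidates = [
        s for s in statements
        if s.subject == subject and s.predicate == predicate and s.valid_since <= when
    ]
    if not candidates:
        return None  # no value was valid yet on that date
    return max(candidates, key=lambda s: s.valid_since).obj
```

With these two statements, `value_at(STATEMENTS, AGLIE, "code:population", date(2005, 1, 1))` returns 2574, and any date from 2011-10-09 onward returns 2644.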
Graph structure
Provenance nodes
Type nodes
Bristle node
Achene node
Traversing
• v.as('x').out('sd:childOf').loop('x'){ cur -> cur.object.outE('sd:childOf').hasNext() }.path()
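The traversal starts at vertex `v`, keeps following outgoing `sd:childOf` edges while one exists, and returns the full path walked, i.e. the chain of enclosing administrative divisions up to the root. The same logic in plain Python, assuming each node has at most one parent (stored as a dict from child id to parent id) and using hypothetical node ids:

```python
def path_to_root(child_of: dict[str, str], start: str) -> list[str]:
    """Follow sd:childOf links from `start` until a node with no parent is reached.

    Mirrors the Gremlin loop: step to the parent while an outgoing sd:childOf
    edge exists, then return every node visited, starting node included.
    """
    path = [start]
    seen = {start}
    current = start
    while current in child_of:
        current = child_of[current]
        if current in seen:  # guard against cycles in malformed data
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        path.append(current)
    return path
```

For example, `path_to_root({"municipality": "province", "province": "region"}, "municipality")` returns `["municipality", "province", "region"]`. One difference: the Gremlin expression emits no path when `v` itself has no parent, whereas this sketch returns `[start]`. Unlike Gremlin, it also does not branch when a node has several parents.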
Stefano Parmesan