agINFRA work on germplasm and soil Linked Data
Publishing agricultural databases as RDF
Luca Matteis1, Valeria Pesce2, Giovanni L’Abate3, Maria Antonietta Polombi3
1 Bioversity International, Via dei Tre Denari 472/a, 00057 Maccarese (Fiumicino) Rome, [email protected]
2 GFAR - The Global Forum on Agricultural Research c/o FAO, Viale delle Terme di Caracalla - 00153, Roma (Italy)
3 Consiglio per la Ricerca e la sperimentazione in Agricoltura, Centro di ricerca per l’agrobiologia e la pedologia (CRA-ABP), Piazza M. D’Azeglio, 30 - 50121 Florence (Italy)
20/09/2014
Motivation – Why Linked Data?
Data coming from different sources is hard to integrate because:
– Data is published using different formats (CSV, Excel, XML, JSON)
– Data is described using different standards (vocabularies, ontologies, taxonomies)
– Data is not linked together
Linked Open Data for germplasm and soil Luca Matteis 20/09/2014
Motivation – Why Linked Data?
Linked Data provides the principles that enable smarter integration of data:
– Publish your data as RDF (JSON-LD, Turtle, RDFa, etc.)
– Resources should be identified using resolvable HTTP URIs
– Link your resources to other RDF resources using HTTP URIs
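The three principles above can be illustrated with a minimal Python sketch that writes RDF statements as N-Triples lines. This is only an illustration, not the tooling used in the project: the `example.org` site URI and the helper names `is_http_uri` and `ntriple` are hypothetical, and the escaping is deliberately simplified.

```python
from urllib.parse import urlparse

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def is_http_uri(value):
    """Return True if value looks like an absolute HTTP(S) URI."""
    parsed = urlparse(value)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def ntriple(subject, predicate, obj, literal=False):
    """Format one statement as an N-Triples line.

    Simplification: only backslashes and double quotes are escaped
    in literals, and literals are always plain strings.
    """
    uris = [subject, predicate] if literal else [subject, predicate, obj]
    for uri in uris:
        if not is_http_uri(uri):
            raise ValueError(f"not an HTTP URI: {uri!r}")
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        term = f'"{escaped}"'
    else:
        term = f"<{obj}>"
    return f"<{subject}> <{predicate}> {term} ."


# Hypothetical resource URI, used only for this illustration.
SITE = "http://example.org/soil/site-1"

lines = [
    # Principles 1 and 2: an RDF statement about an HTTP-identified resource.
    ntriple(SITE, RDFS + "label", "Example soil site", literal=True),
    # Principle 3: a link to a resource published by someone else.
    ntriple(SITE, RDFS + "seeAlso", "http://dbpedia.org/resource/Rieti"),
]
print("\n".join(lines))
```

The check in `is_http_uri` only covers the syntactic part of the second principle; whether a URI actually resolves can only be verified by requesting it.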
5-star Linked Data
Data source analysis
[Diagram]
a) CRA CNCP soil data: PostgreSQL SISI, MSAccess
b) CRA PlantaRes germplasm data: PlantaRes, MySQL
Legend: original source, intermediate source, published data
1st Step: RDF Conversion
[Diagram]
a) CRA CNCP soil data: PostgreSQL SISI, MSAccess, D2RQ, published linked data
Legend: original source, intermediate source, published data
D2RQ
D2RQ automatically converts relational databases into RDF. It also publishes the data as Linked Data along with a SPARQL endpoint.
https://aginfra-sg.ct.infn.it/rdf/cncp/
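As a rough sketch of how a client might talk to such a SPARQL endpoint, the Python below builds a SPARQL Protocol GET request using only the standard library. The `/sparql` path appended to the URL above is an assumption, not something stated in the slides, and the query is a generic one that makes no assumptions about the dataset's vocabulary. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the endpoint lives at "sparql" under the published base URL.
ENDPOINT = "https://aginfra-sg.ct.infn.it/rdf/cncp/sparql"

# Generic query: list any ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint, query, timeout=30):
    """Send the query and return the parsed SPARQL JSON results."""
    request = urllib.request.Request(
        build_sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Building the URL needs no network access; run_query() does.
print(build_sparql_url(ENDPOINT, QUERY))
```

Calling `run_query(ENDPOINT, QUERY)` would perform the actual HTTP request; whether it succeeds depends on the server still being online at that address.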
2nd Step: Mapping to RDF Vocabularies
ID            type         Latitudine_WGS84  Longitudine_WGS84  ...
16.4LPhk1-1   observation  42.57             12.93              ...
becomes...
{
  "@id": "http://rdf.entecra.it/soil/16.4LPhk1-1",
  "@type": "soil:ObservedSoilSite",
  "geo:lat": "42.57",
  "geo:long": "12.93",
  ...
}
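The mapping from a table row to a JSON-LD resource could be sketched in Python as follows. The column names, the base URI and the `soil:ObservedSoilSite` type come from the slide; the `@context`, the `TYPE_MAP` table and the function name `row_to_jsonld` are assumptions added for illustration. In particular, the namespace behind the `soil:` prefix is not given in the slides, so a placeholder is used.

```python
import json
from urllib.parse import quote

BASE_URI = "http://rdf.entecra.it/soil/"

CONTEXT = {
    # W3C Basic Geo (WGS84) vocabulary.
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    # Placeholder: the real soil vocabulary namespace is not in the slides.
    "soil": "http://example.org/soil-vocabulary#",
}

# Assumed mapping from the "type" column to an RDF class.
TYPE_MAP = {"observation": "soil:ObservedSoilSite"}


def row_to_jsonld(row):
    """Turn one row (a dict keyed by column name) into a JSON-LD node."""
    return {
        "@context": CONTEXT,
        # Percent-encode the ID so it is safe inside a URI path.
        "@id": BASE_URI + quote(row["ID"], safe=""),
        "@type": TYPE_MAP[row["type"]],
        "geo:lat": row["Latitudine_WGS84"],
        "geo:long": row["Longitudine_WGS84"],
    }


row = {
    "ID": "16.4LPhk1-1",
    "type": "observation",
    "Latitudine_WGS84": "42.57",
    "Longitudine_WGS84": "12.93",
}
print(json.dumps(row_to_jsonld(row), indent=2))
```

Keeping the column-to-property mapping in plain data structures makes it easy to extend to the remaining columns, which the slide elides.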
3rd Step: Linking Data
[Diagram: links across the CNCP, GeoNames and DBpedia datasets]
16.4LPhk1-1  geo:lat  “42.57”
16.4LPhk1-1  geo:long  “12.93”
16.4LPhk1-1  gn:locatedIn  gn:6541462
gn:6541462  gn:name  “Rieti”
gn:6541462  gn:population  46187
gn:6541462  rdfs:seeAlso  dbpedia:Rieti
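To show what these links buy an application, here is a small Python sketch that stores the statements from the diagram as in-memory triples and walks outward from the soil site by following links. It is a toy model under stated assumptions: the `cncp:` prefix is hypothetical, the edge layout is my reading of the diagram, and resources and literal values are both kept as plain strings for simplicity.

```python
from collections import deque

# Triples as read from the diagram; "cncp:" is a hypothetical prefix.
TRIPLES = [
    ("cncp:16.4LPhk1-1", "geo:lat", "42.57"),
    ("cncp:16.4LPhk1-1", "geo:long", "12.93"),
    ("cncp:16.4LPhk1-1", "gn:locatedIn", "gn:6541462"),
    ("gn:6541462", "gn:name", "Rieti"),
    ("gn:6541462", "gn:population", "46187"),
    ("gn:6541462", "rdfs:seeAlso", "dbpedia:Rieti"),
]


def reachable_facts(start, triples):
    """Breadth-first walk: collect every triple reachable from start."""
    outgoing = {}
    for subject, predicate, obj in triples:
        outgoing.setdefault(subject, []).append((predicate, obj))

    seen = {start}
    queue = deque([start])
    facts = []
    while queue:
        node = queue.popleft()
        for predicate, obj in outgoing.get(node, []):
            facts.append((node, predicate, obj))
            if obj not in seen:
                seen.add(obj)
                queue.append(obj)
    return facts


for fact in reachable_facts("cncp:16.4LPhk1-1", TRIPLES):
    print(fact)
```

Starting from the soil site alone, the walk also reaches the name and population held by GeoNames and the pointer to DBpedia, which is the practical benefit of the `gn:locatedIn` link. A real crawler would dereference each HTTP URI instead of reading from a local list.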
Apps
We can now build applications that
– make queries via SPARQL endpoints
– crawl data by following links
– integrate various RDF dumps
– query data from multiple sources
Future work
– More interlinks between datasets
– Interlinks with AGROVOC and integration in the AGRIS portal
– Move URIs under the entecra.it domain
Acknowledgements
Riccardo Bruno (from INFN) and Gino Barreca (from CRA) for the help with configuring the server infrastructure.
Thank you!