Publish and use your data
LD4SC Summer School, 7th-12th June, Cercedilla, Spain
1st Summer School on Smart Cities and Linked Open Data (LD4SC-15)
Hands-on 4: Publish and use your data
Álvaro Sicilia, Filip Radulovic
Background
• Álvaro Sicilia
• Background: Computer Science
• From: Architecture, Representation & Computation (ARC), Engineering and Architecture La Salle (FUNITEC), Universitat Ramon Llull, Barcelona, Spain
• Since 2008 working with Semantic Web technologies
Background
- IntUBE (2008-2011), 7th Framework Programme: Intelligent use of buildings' energy information. Project Coordinator: VTT, Finland
- RÉPENER (2009-2012), Spanish National RDI Plan: Control and improvement of energy efficiency in buildings through the use of repositories. Project Coordinator: ARC Engineering and Architecture La Salle, Spain
- SEMANCO (2011-2014), 7th Framework Programme: Semantic Tools for Carbon Reduction in Urban Planning. Project Coordinator: ARC Engineering and Architecture La Salle, Spain
- OPTIMUS (2013-2016), 7th Framework Programme: Optimising the energy use in cities with smart decision support system. Project Coordinator: National Technical University of Athens, Greece
Index
• Requirements
• Guidelines for the Publication of Linked Data
• Guidelines for the Exploitation of Linked Data
• Hands-on Session
Linked Data life cycle
[Figure: Linked Data life cycle with the stages Specification, Modelling, Generation, Publication, Exploitation and Linking]
Requirements
• Legal framework of the dataset
The publication of energy-related data requires that these data are covered by a proper licensing framework so that they can later be re-used and exploited by the wider public. Legal compliance of Linked Open Data is closely related to data protection, IPR (Intellectual Property Rights), copyright (legal entitlements for creative works), and privacy.
Requirements
Requirements type: Public Domain
Suitability for LOD principles: Ideal
  License: Creative Commons CCZero (CC0)
  Comments:
  - The least restrictive CC license.
  - Removes all copyright restrictions from content.
  - Can only be applied by holders of authors' or neighbouring rights over the content.
  - The Public Domain Mark can be applied by anyone to content that is already free of copyright restrictions.
  License: Open Data Commons Public Domain Dedication and Licence (PDDL)
  Comments:
  - Allows unrestricted sharing, reuse, reproduction and adaptation of data and databases.

Requirements type: Attribution
Suitability for LOD principles: Attribution requirements may lead to attribution stacking
  License: Creative Commons Attribution 4.0 (CC-BY-4.0)
  Comments:
  - Allows the users to copy or remix the work in any way.
  - The users must attribute the work to the original creator.
  - The creator should provide a link to the Creative Commons page explaining the user's responsibilities and provide an easily accessed list of creators.
  License: Open Data Commons Attribution License (ODC-BY)
  Comments:
  - Allows users to copy, distribute, and use the database.
  - Allows users to produce works from the database, and to modify, transform and build upon the database.
  - Users must attribute any public use of the database or works produced from the database and make clear to others the license of the database.
Requirements
Requirements type: Attribution Share-Alike
Suitability for LOD principles: Attribution requirements may lead to attribution stacking. Share-alike requirements may lead to interoperability issues
  License: Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0)
  Comments:
  - Attribution restrictions are applied.
  - Users transforming the work into something new must distribute that work under the CC BY-SA license, or a similarly open license.
  License: Open Data Commons Open Database License (ODbL)
  Comments:
  - Users must keep the database technologically open and offer any adapted version of the database or works produced from it under the ODbL.
  - Limited commercial reuse of the database or its contents.
  - A separate Database Contents License (DbCL) can be applied to the contents of a database licensed under ODbL; it waives all rights in the individual contents.
Requirements
Advantages:
+ Easy to use
+ Widely adopted
+ Flexible
+ Available in human- and machine-readable forms
+ Direct links between the resource and its license
+ Symbolic representation of the license to recognize usage terms
Drawbacks:
- CC licenses are copyright based and designed to protect creative works (content); databases are collections of facts rather than creative works
- Third-party rights material included in the data may require additional clearances, and this is not stated in the license
- Cannot be revoked once applied: CC licenses, ODC-By and ODC-ODbL are irrevocable
Requirements
• Quality requirements according to ISO/IEC 25012
- Accuracy:
  - The published dataset must be semantically and syntactically accurate.
  - The dataset should not contain redundant values.
  - Accuracy is also expected for the reuse of the published dataset.
Requirements
• Quality requirements according to ISO/IEC 25012
- Completeness:
  - The published data items must be those needed to support the application for which the dataset is intended.
- Consistency:
  - Before publication, the dataset should be complete and consistent.
  - Conflicting statements and errors should be detected before publication.
Requirements
• Quality requirements according to ISO/IEC 25012
- Credibility:
  - Linked Open Data must be credible and fully compliant with the license, policy and terms of use derived from the provenance source.
  - Agreements and attributions should be defined where appropriate to make clear to users whether they can trust the data.
- Timeliness:
  - The publication process should be designed with maintenance in mind.
  - The dataset should be handled in a timely manner by the responsible dataset supporter in order to maintain and update it and to enable the usage and exploitation of the data.
  - The processes and tools should be able to support maintaining the dataset over time.
Requirements
• Requirements according to AENOR PNE 178301 standard on Smart Cities and Open Data
- Open data must be published through persistent URLs and using standard, structured, open, and non-proprietary formats that allow the unique identification of resources.
- Vocabularies used in open data must be published online through persistent URLs.
- Open datasets must be included in relevant open data catalogues.
- The organization must promote the reuse of open data by providing supporting documents and materials.
Requirements
• Requirements according to AENOR PNE 178301 standard on Smart Cities and Open Data
- Open data must be available by downloading the respective files and through web APIs or SPARQL endpoints.
- Non-discriminatory access to open data must be ensured by not requiring administrative procedures or user registration and by guaranteeing equal rights, non-discrimination and accessibility. Administrative procedures or user registration could be allowed in justified cases.
- The access to and use of open data must be periodically measured.
Requirements
• Requirements according to AENOR PNE 178301 standard on Smart Cities and Open Data
- Open data must be documented using metadata.
- Vocabularies used in open data must be documented using metadata.
- Open data licenses and use conditions must be documented and published online.
- Open data licenses must be standard, self-documented, based on existing standards, and preferably in a machine-processable format.
Requirements
Requirements summary:
• Legal compliance aspects → rights protection, license terms
• Quality of data and metadata → accuracy, completeness, consistency, credibility, and sustainability
• Publication requirements → data and vocabularies accessible
• Social requirements → maintenance and promotion
Index
• Requirements
• Guidelines for the Publication of Linked Data
• Guidelines for the Exploitation of Linked Data
• Hands-on Session
Guidelines for publication
Ensure legal compliance
Two possible cases:
a) The RDF dataset is generated from a data source that permits the use of the data, and the publication of the produced RDF dataset complies with the license and legal terms of the original data source → the RDF dataset can be published without obstacles
b) Otherwise, legal compliance has to be ensured first. Usually, legal aspects can be addressed by preserving the privacy of the data → data anonymization
Guidelines for publication
Ensure legal compliance → Data anonymization
Privacy-preserving data publishing:
a) Explicit identifiers are attributes that explicitly identify the entity of interest (e.g., id of a person, property number of a building)
b) Quasi-identifiers (QID) are sets of attributes that can potentially identify the entity of interest (e.g., date of birth, postal code, gender…)
Radulovic F., García-Castro R., Gómez-Pérez A.: Towards the Anonymisation of RDF Framework. In Proceedings of the 27th International Conference on Software Engineering and Knowledge Engineering (SEKE2015), Pittsburgh, Pennsylvania, USA. July 2015.
Guidelines for publication
Ensure legal compliance → Data anonymization
Techniques:
• Suppression is a technique in which some values or complete records in a dataset are replaced with some other specific value or record.

Original data:
Id   Building use   Consumption
1    School         12352
2    Residential    2334
3    School         15121
4    Office         5252

After suppression:
Id   Building use
1    School
2    Residential
3    School
4    Office
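The suppression step shown in the tables above can be sketched in a few lines of Python. This is only an illustration under assumptions: the records are plain dictionaries, the field names (`id`, `use`, `consumption`) are made up for the example, and the optional placeholder value is hypothetical.

```python
def suppress(records, attribute, placeholder=None):
    """Return copies of the records with `attribute` suppressed.

    With placeholder=None the attribute is dropped (as in the slide tables);
    otherwise its value is replaced by the given placeholder.
    """
    result = []
    for record in records:
        copy = dict(record)  # never modify the original data
        if placeholder is None:
            copy.pop(attribute, None)
        else:
            copy[attribute] = placeholder
        result.append(copy)
    return result


# Rows taken from the slide; the key names are assumptions.
buildings = [
    {"id": 1, "use": "School", "consumption": 12352},
    {"id": 2, "use": "Residential", "consumption": 2334},
    {"id": 3, "use": "School", "consumption": 15121},
    {"id": 4, "use": "Office", "consumption": 5252},
]

anonymised = suppress(buildings, "consumption")
for row in anonymised:
    print(row)
```

Working on copies keeps the original records available for the publisher while only the suppressed version is exposed.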
Guidelines for publication
Ensure legal compliance → Data anonymization
Techniques:
• Generalization is a technique that transforms pieces of data into more general data or sets of data, and is suited to the transformation of categorical attributes and discrete numerical attributes → use of postal codes instead of addresses, use of ranges of values instead of specific values

Original data:
Id   Postal Code   Consumption
1    08006         12352
2    08022         2334
3    08021         15121

After generalization of the postal code:
Id   Postal Code   Consumption
1    0800*         12352
2    0802*         2334
3    0802*         15121
Guidelines for publication
Ensure legal compliance → Data anonymization
Techniques:
• Generalization: use of ranges of values instead of specific values

Original data:
Id   Postal Code   Consumption
1    08006         12352
2    08022         2334
3    08021         15121

After generalization of the consumption:
Id   Postal Code   Consumption
1    08006         10k-14k
2    08022         1k-5k
3    08021         14k-20k
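Both generalization examples (masking the last digit of the postal code and replacing a value with a range) can be sketched as follows. The function names are hypothetical, and the interval list is an assumption: 1k-5k, 10k-14k and 14k-20k come from the slide, while 5k-10k is added only to close the gap.

```python
def generalise_postal_code(code, keep=4):
    """Keep the first `keep` characters and mask the rest with '*'."""
    return code[:keep] + "*" * (len(code) - keep)


# Half-open intervals [low, high) in the same unit as the consumption values.
# 5000-10000 is an assumed filler; the other three appear on the slide.
RANGES = [(1000, 5000), (5000, 10000), (10000, 14000), (14000, 20000)]


def generalise_to_range(value, ranges=RANGES):
    """Replace a specific value with the label of the interval containing it."""
    for low, high in ranges:
        if low <= value < high:
            return f"{low // 1000}k-{high // 1000}k"
    raise ValueError(f"no range defined for {value}")


rows = [("08006", 12352), ("08022", 2334), ("08021", 15121)]
for code, consumption in rows:
    print(generalise_postal_code(code), generalise_to_range(consumption))
```

The wider the intervals, the stronger the privacy protection and the lower the analytical value of the published data, so the interval list is a design decision of the publisher.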
Guidelines for publication
Ensure legal compliance → Data anonymization
Techniques:
• Data aggregation: for example, aggregate the energy consumption of buildings by type of building

Original data:
Id   Building use   Consumption
1    School         12352
2    Residential    2334
3    School         15121
4    Office         5252
5    Office         5623
6    Residential    3452

Aggregated by total:
Building use   Total Consumption
School         27473
Residential    5786
Office         10875
Guidelines for publication
Ensure legal compliance → Data anonymization
Techniques:
• Data aggregation: for example, aggregate the energy consumption of buildings by type of building

Original data:
Id   Building use   Consumption
1    School         12352
2    Residential    2334
3    School         15121
4    Office         5252
5    Office         5623
6    Residential    3452

Aggregated by average:
Building use   Ave. Consumption
School         13736.5
Residential    2893
Office         5437.5
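Aggregation by building use can be sketched as below. This is a minimal illustration computed from the six rows of the original table; the dictionary keys and the function name are assumptions.

```python
from collections import defaultdict


def aggregate_by(records, key, value):
    """Group records by `key` and return total and average of `value` per group."""
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record[value])
    return {
        name: {"total": sum(values), "average": sum(values) / len(values)}
        for name, values in groups.items()
    }


# The six rows of the original table; key names are hypothetical.
buildings = [
    {"id": 1, "use": "School", "consumption": 12352},
    {"id": 2, "use": "Residential", "consumption": 2334},
    {"id": 3, "use": "School", "consumption": 15121},
    {"id": 4, "use": "Office", "consumption": 5252},
    {"id": 5, "use": "Office", "consumption": 5623},
    {"id": 6, "use": "Residential", "consumption": 3452},
]

summary = aggregate_by(buildings, "use", "consumption")
for use, stats in summary.items():
    print(use, stats["total"], stats["average"])
```

Only the aggregated rows would be published; groups containing a single building would still reveal that building's value, so very small groups may need to be merged or suppressed.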
Guidelines for publication
Ensure legal compliance → Data anonymization
Techniques:
• Anatomization and permutation are techniques that, unlike suppression and generalization, do not modify the data but rather remove the relationship between the quasi-identifiers and the sensitive values.
• Perturbation is a technique in which original data are replaced with noise or synthetic data in such a way that statistical analyses based on the perturbed data do not significantly differ from the statistical analysis of the original data → e.g., replace observed values with the average computed on a small group of units
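The perturbation example mentioned above (replacing observed values with the average of a small group of units) can be sketched as follows. Sorting the values and grouping neighbours is an assumption made here to keep the groups homogeneous; the slide does not prescribe how the groups are formed, and the function name is hypothetical.

```python
def microaggregate(values, k=2):
    """Replace each value with the mean of its group of (up to) k similar values.

    Values are grouped by sorted order, so each group contains neighbours.
    The last group may be smaller than k when len(values) is not a multiple of k.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for start in range(0, len(order), k):
        group = order[start:start + k]
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = mean
    return result


# Consumption values reused from the aggregation example.
consumption = [12352, 2334, 15121, 5252, 5623, 3452]
perturbed = microaggregate(consumption, k=2)
print(perturbed)
print(sum(perturbed) == sum(consumption))  # the overall total is preserved
```

Because every group is replaced by its own mean, the overall total and mean of the dataset stay the same, which is the property the perturbation definition asks for.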
Guidelines for publication
Ensure legal compliance → Data anonymization
Steps to ensure the legal compliance of the dataset:
1. Identify the explicit identifiers, quasi-identifiers, and sensitive attributes in the dataset.
2. Select the techniques to use on the previously identified attributes in order to ensure legal compliance.
3. Apply the selected techniques to the identified attributes.
4. Modify the ontology, in case data anonymization implies changes in the data model.
Guidelines for publication
Ensure legal compliance → Examples
Leeds City Council example: the dataset on energy consumption of council sites within the Leeds jurisdiction is licensed under the Open Government Licence, which permits the use and modification of the data → no additional step is necessary to ensure legal compliance
Guidelines for publication
Ensure legal compliance → Examples
BECA energy consumption example
Anonymization needed for QIDs and sensitive attributes:

Attributes: {evaluation number}, {tenant identifier}, {residence identifier}
Anonymization technique: QIDs. Those attributes will be generalized (e.g., value 35698456 is generalized to 356*****).

Attributes: {building identifier, residence identifier, tenant number}
Anonymization technique: QID. Since residence identifier is already generalized, additional generalization will be performed for tenant number.

Attributes: {comment}
Anonymization technique: QID, because in some cases it contains information about tenant identifiers in natural language. Therefore, this attribute will be completely suppressed.

Attributes: {evaluation number, residence size}
Anonymization technique: The residence size attribute will be generalized by taking into account the interval to which the size belongs and then assigning the mean value of the interval (e.g., if the size of the residence is 38 square meters, the corresponding interval is 30-49 and, therefore, value 35 will be assigned).
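Applying a different technique to each identified attribute can be organised as a small "anonymization plan". The sketch below is only an assumption of how this might look in Python: the attribute names, the `PLAN` mapping and the sample record are hypothetical, and only the masking of 35698456 to 356***** and the suppression of the comment come from the example above.

```python
def mask(value, keep=3):
    """Generalize an identifier by keeping its first `keep` characters."""
    text = str(value)
    return text[:keep] + "*" * max(len(text) - keep, 0)


# Hypothetical plan: attribute name -> technique.
# A function generalizes the value; None means the attribute is suppressed.
PLAN = {
    "evaluation_number": mask,
    "tenant_identifier": mask,
    "residence_identifier": mask,
    "comment": None,
}


def anonymise(record, plan):
    """Apply the plan to one record; attributes not in the plan are kept as-is."""
    result = {}
    for attribute, value in record.items():
        if attribute not in plan:
            result[attribute] = value
        elif plan[attribute] is not None:
            result[attribute] = plan[attribute](value)
        # attributes mapped to None are dropped (suppression)
    return result


record = {
    "evaluation_number": "35698456",
    "comment": "tenant 12 reported a broken boiler",  # made-up free text
    "heating_demand": 120,                             # made-up value
}
print(anonymise(record, PLAN))
```

Keeping the plan as data makes the choices of step 2 (technique selection) explicit and separate from step 3 (applying them).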
Guidelines for publication
Publish the dataset and the ontology
The goal of this step is to make the ontology and the RDF dataset accessible through the Web following the Linked Data principles:
1. Use URIs to name (identify) things.
2. Use HTTP URIs so that these things can be looked up (interpreted, "dereferenced").
3. Provide useful information about what a name identifies when it is looked up, using open standards such as RDF, SPARQL, etc.
4. Refer to other things using their HTTP URI-based names when publishing data on the Web.
Guidelines for publication
Publish the dataset and the ontology
Steps:
1. To store the RDF data in a persistent repository where the data can then be accessed and queried → RDF repository
2. To enable resolvable HTTP URIs and content negotiation, i.e., the mechanisms for accessing the data through the Web.
3. To enable a SPARQL HTTP endpoint.
Guidelines for publication
Publish the dataset and the ontology
Steps:
1. To store the RDF data in a persistent repository where the data can then be accessed and queried → RDF repository
[Figure: two ways of serving SPARQL queries]
• dataset → RDF dump → triple store ← SPARQL queries: fast, but not up to date
• dataset (relational database) ← SQL ← RDF wrapper (R2RML mappings) ← SPARQL queries: not fast, but updated
Tool shown: Virtuoso server
Guidelines for publication
Publish the dataset and the ontology
Steps:
2. To enable resolvable HTTP URIs and content negotiation, i.e., the mechanisms for accessing the data through the Web
Pubby (image taken from: http://wifo5-03.informatik.uni-mannheim.de/pubby/):
Humans → HTML content
Computers → RDF content
Guidelines for publication
Publish the dataset and the ontology
Steps:
2. To enable resolvable HTTP URIs and content negotiation, i.e., the mechanisms for accessing the data through the Web
Pubby configuration
Go to: tomcat/webapps/pubby/WEB-INF/config.n3
Guidelines for publication
Content negotiation: HTML → for humans
Content negotiation: RDF → for computers
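The idea behind content negotiation can be sketched with a simplified server-side function that inspects the HTTP `Accept` header and decides which representation to return. This is not how Pubby implements it; the function name and the list of supported media types are assumptions, and wildcards such as `*/*` are deliberately not handled.

```python
SUPPORTED = ["text/html", "text/turtle", "application/rdf+xml"]


def negotiate(accept_header, default="text/html"):
    """Pick the supported media type with the highest q-value.

    Simplified sketch: ignores wildcards and falls back to `default`
    when no supported type is listed.
    """
    best, best_q = None, 0.0
    for part in accept_header.split(","):
        fields = part.strip().split(";")
        media_type = fields[0].strip()
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best, best_q = media_type, q
    return best or default


# A browser typically asks for HTML, a Linked Data client for RDF.
print(negotiate("text/html,application/xhtml+xml"))
print(negotiate("text/turtle"))
print(negotiate("application/rdf+xml;q=0.9, text/html;q=0.5"))
```

The same resource URI therefore yields an HTML page for humans and an RDF serialization for computers, depending only on the request header.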
Guidelines for publication
Publish the dataset and the ontology
Steps:
2. To enable resolvable HTTP URIs and content negotiation, i.e., the mechanisms for accessing the data through the Web
Alternatives:
• Linked Data API → Elda or Puelia
• W3C Linked Data Platform (LDP) specification → LDP4j or Apache Marmotta
Guidelines for publication
Publish the dataset and the ontology → Examples
Leeds City Council & BECA energy consumption examples
1. RDF dataset → OpenLink's Virtuoso Open Source repository.
   Ontology → http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption#, http://smartcity.linkeddata.es/BECA/ontology/EnergyConsumption#
2. HTTP access to the data → Linked Data API (ELDA)
3. SPARQL endpoint → Virtuoso at http://smartcity.linkeddata.es/lcc/sparql and http://smartcity.linkeddata.es/BECA/sparql
   RDF dumps → http://smartcity.linkeddata.es/lcc/lcc-dataset.ttl and http://smartcity.linkeddata.es/BECA/BECA-dataset.ttl
Guidelines for publication
Publish metadata and online documentation
The goal of this step is to create and publish the documentation of the RDF dataset and the ontology. This documentation is oriented to both humans and machines, and its purpose is to facilitate the usage of the dataset that is being made available.
Guidelines for publication
Publish metadata and online documentation
To create and publish human-readable metadata descriptions:
Create and publish human-readable documentation of the dataset and the ontology. Such documentation makes the data easier for consumers to use.
Guidelines for publication
Publish metadata and online documentation
To create and publish machine-readable metadata descriptions:
Two vocabularies published by the W3C allow describing datasets and data catalogues in RDF:
- VoID (Vocabulary of Interlinked Datasets)
- DCAT (Data Catalog Vocabulary)
Guidelines for publication
Publish metadata and online documentation
To create and publish machine-readable metadata descriptions:
- DCAT (Data Catalog Vocabulary)

    @prefix os: <http://a9.com/-/spec/opensearch/1.1/> .
    @prefix dct: <http://purl.org/dc/terms/> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix api: <http://purl.org/linked-data/api/vocab#> .
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .

    <http://smartcity.linkeddata.es/lcc> a dct:Dataset ;
        dct:license <http://purl.org/NET/rdflicense/ukogl1.0> ;
        dct:source "We acknowledge that this dataset uses data coming from the Leeds City Council by including, please check the machine-readable licenses provided here and further information at http://www.leeds.gov.uk/opendata/pages/developer-datasets.aspx" ;
        <http://www.w3.org/2002/07/owl#sameAs> <http://datahub.io/dataset/lcc-leeds-city-council-energy-consumption-linked-data> ;
        dct:publisher "The publisher of the dataset" ;
        dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
        dct:accrualPeriodicity <http://purl.org/linked-data/sdmx/2009/code#freq-W> .
Guidelines for publication
Publish metadata and online documentation
To create and publish machine-readable metadata descriptions:
- VoID (Vocabulary of Interlinked Datasets)

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix foaf: <http://xmlns.com/foaf/0.1/> .
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix void: <http://rdfs.org/ns/void#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    @prefix : <#> .

    ## your dataset
    <http://your.dataset.com/> rdf:type void:Dataset ;
        foaf:homepage <http://your.dataset.com/homepage> ;
        dcterms:title "Title of your dataset" ;
        dcterms:description "Description of your dataset." ;
        void:sparqlEndpoint <http://your.dataset.com/sparql> ;
        void:uriSpace "http://your.dataset.com/resource/" ;
        void:exampleResource <http://your.dataset.com/resource/URI/XXXX> ;
        dcterms:source "Description of the dataset source" ;
        dcterms:created "XXXX-XX-XX"^^xsd:date ;
        dcterms:license <http://creativecommons.org/licenses/by/3.0/> ;
        dcterms:subject <http://dbpedia.org/resource/Building> ;
        void:triples 150297 ;
        void:entities 18890 ;
        void:classes 65 ;
        void:properties 100 ;
        void:distinctSubjects 18962 ;
        void:distinctObjects 26097 .

    ## datasets you link to
    :Anotherdataset rdf:type void:Dataset ;
        foaf:homepage <http://another.dataset.com/homepage> ;
        dcterms:title "Another title" ;
        dcterms:description "Another description." ;
        void:exampleResource <http://another.dataset.com/resource/URI/XXXX> .

    :Yourdataset-Anotherdataset rdf:type void:Linkset ;
        void:linkPredicate <http://your.dataset.com/predicate-used-for-linking> ;
        void:target <http://your.dataset.com/> ;
        void:target :Anotherdataset .
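When the statistics of a dataset change regularly, the VoID description can be generated rather than edited by hand. The following is a minimal sketch that renders a few VoID properties as Turtle text using only string formatting; the function name and its parameters are hypothetical, and a real deployment would more likely use an RDF library.

```python
def void_description(dataset_uri, title, sparql_endpoint, triples):
    """Render a minimal void:Dataset description as Turtle text."""
    # Escape backslashes and double quotes so the title is a valid Turtle literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    lines = [
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
        f"    void:sparqlEndpoint <{sparql_endpoint}> ;",
        f"    void:triples {int(triples)} .",
    ]
    return "\n".join(lines)


# Placeholder URIs from the template above; the triple count is the template's value.
turtle = void_description(
    "http://your.dataset.com/",
    "Title of your dataset",
    "http://your.dataset.com/sparql",
    150297,
)
print(turtle)
```

The triple count could, for example, be refreshed from a `SELECT (COUNT(*) AS ?n)` query against the endpoint before each regeneration.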
Guidelines for publication
Publish metadata and online documentation → Examples
Leeds City Council example
DCAT description: http://smartcity.linkeddata.es/lcc/dcat.ttl
Ontology: http://smartcity.linkeddata.es/lcc/ontology/EnergyConsumption/
BECA energy consumption example
DCAT description: http://smartcity.linkeddata.es/BECA/dcat.ttl
Ontology: http://smartcity.linkeddata.es/BECA/ontology/EnergyConsumption
Guidelines for publication
Enable dataset discovery
The goal of this step is to enable mechanisms that complement the efforts of the previous step and allow both humans and machines to discover and better use the dataset.
1. To create a sitemap.
2. To register the dataset in a dataset catalogue.
3. To ensure the fulfillment of the requirements for addition to the LOD cloud.
Guidelines for publication
Enable dataset discovery
1. To create a sitemap.
A sitemap is a mechanism to inform search engines about the page structure of a web site in order to allow for more efficient crawling. It is widely adopted by major search engines and is therefore recommended for any type of web site, including datasets.
Guidelines for publication
Enable dataset discovery
1. To create a sitemap with sitemap4rdf.
There are tools like sitemap4rdf that can generate a sitemap based on the contents of a SPARQL endpoint: http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
sitemap4rdf {your_sparql_endpoint} {prefix_of_your_url}
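To show what such a tool produces, the sketch below builds a plain sitemap (following the sitemaps.org XML format) from a list of resource URIs. It does not reproduce sitemap4rdf, which obtains the URIs from the SPARQL endpoint; here the URI list and the function name are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(resource_uris):
    """Return a sitemap XML document listing one <url><loc> entry per URI."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for uri in resource_uris:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = uri
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical resource URIs; in practice they would come from the dataset.
uris = [
    "http://example.org/resource/building/1",
    "http://example.org/resource/building/2",
]
sitemap_xml = build_sitemap(uris)
print(sitemap_xml)
```

The resulting file is usually stored as sitemap.xml at the root of the site so that crawlers can find it.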
Guidelines for publication
Enable dataset discovery
2. To register the dataset in a dataset catalogue.
Several online data catalogues are available, ranging from general to corporate initiatives:
- Datahub: this data management platform covers a wide range of topics. It offers data collections, some of which are linked and open.
- Reegle: the gateway has already established itself as a popular information portal in the fields of renewable energy and energy efficiency. It offers all of its data under W3C standards, i.e., it is open and Linked Data in a non-proprietary format (RDF).
- OpenEI: a collaborative knowledge-sharing platform with free and open access to energy-related data, models, tools, and information. OpenEI features over 55,000 content pages, more than 600 downloadable datasets, regional gateways on a variety of energy-related topics, and numerous online tools.
Guidelines for publication
Enable dataset discovery
2. To register the dataset in a dataset catalogue.
Several online data catalogues are available, ranging from general to corporate initiatives:
- DataCatalogs: a comprehensive list of open data catalogues in the world, including representatives from local, regional and national governments, international organisations and numerous NGOs. National energy-related data are contained in the listed datasets.
- Google Public Data: a corporate initiative for the publication of large datasets that eases exploration, visualization and communication.
- READY4SmartCities: one of the direct outcomes of the project is a web portal providing an extended list of ontologies and datasets for smart cities, published in both human-readable (HTML web site) and machine-processable (RDF) formats.
Guidelines for publication
Enable dataset discovery
3. To ensure the fulfillment of the requirements for addition to the LOD cloud.
The dataset is checked with the record validator (http://validator.lod-cloud.net) provided by the LOD cloud website.
Guidelines for publication
Dataset promotion
Promotion is important to ensure that people are aware of the existence of the dataset and that it gets used. This way, third parties can query it, link it to other datasets and visualize it. The dataset can be promoted using different channels: Twitter, LinkedIn, mailing lists, workshops, conferences, VoCamps…
Guidelines for publication
Dataset support
Dataset support is a continuous process in which the creators and publishers of the dataset correct possible errors (both in the data themselves and technical errors), update the data when new data become available, and solve any technical problem that can affect the accessibility of the dataset. Dataset support is usually provided by the persons who participated in data generation and publication.
Index
• Requirements
• Guidelines for the Publication of Linked Data
• Guidelines for the Exploitation of Linked Data
• Hands-on Session
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 1 - SME use of public Linked Energy Data
Contributors: Filip Radulovic (UPM)
Use of different energy-related data sources, such as energy consumption, weather conditions, building and HVAC system features, and socio-economic indicators, to enable SMEs to develop new business models.
LD benefits: data integration/interlinking, data-driven decision-making.
Challenges:
• Manual access to data.
• Obtaining heterogeneous data from various sources.
• Different data formats and structures.
• Dynamically updated data.
• Instance-specific data with different detail levels.
• No open license associated with data.
• Data not available for reuse.
• Privacy protection.
• Partial or missing data.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 1 - SME use of public Linked Energy Data
EECITIES example
Use Case 1 - SME use of public Linked Energy Data
Data connected through the Semantic Energy Information Framework
[Figure: data and tools connected through the framework]
Tools: Energy assessment (SAP, UEP…), Energy simulation (URSOS, …), Energy analysis (data mining, …), GIS model (geometric data)
Data: Cadastre, GIS, Energy performance, Socioeconomic
Use Case 1 - SME use of public Linked Energy Data
Once a baseline reflecting the current state of the urban energy model has been created, different visualization tools can be used to identify problem areas.
[Figure: screenshots of the Cluster view, the Table view, Performance indicators filtering, and Multiple scale visualization]
Use Case 1 - SME use of public Linked Energy Data
Information concerning the selected building, which has not yet been assessed:
• Building geometry obtained from the 3D model
• Street address obtained from Google Geolocation services
• Performance values to be calculated with the energy assessment tool
• Year of construction obtained from the cadastre
Use Case 1 - SME use of public Linked Energy Data
Interface of the URSOS tool. The input data is automatically filled in thanks to the semantic integration of different data sources. Users can modify the input data in case there are errors.
• Year of construction from the Cadastre
• Geometry obtained from the 3D model
• Street address name and Street view from Google Geolocation services
• Wall, ground and roof properties from the building typologies database
• Ventilation from the building typologies database
Use Case 1 - SME use of public Linked Energy Data
[Figure: architecture of the integrated platform. Elements shown: Energy-related data (GIS data, Census, Cadastre, Climate, Typology, Socio-Economic); Semantic Energy Information Framework (ELITE Federation engine, Ontology OWL-DL liteA); Integrated Platform (3D Maps, URSOS Input form); URSOS Energy calculation engine. The numbers 1-5 mark the steps of the workflow.]
LD4SC Summer School 7th -‐ 12th June, Cercedilla, Spain
Use Case 1 -‐ SME use of public Linked Energy Data
URSOS Energy calcula2on engine
GIS data
Census Cadastre Climate
Typology Socio-‐Economic
Energy-‐related data Seman2c Energy Informa2on Framework
Integrated Plaeorm
ELITE Federa&on engine
Ontology OWL-‐DL liteA
URSOS Input form
3D Maps
1
2
3 5
41. The user selects a building 2. The ID of the selected
building is used to retrieve the building parameters form the data sources using SPARQL: Cadastre Census Building typologies
prefix sumo: <http://www.ontologyportal.org/SUMO.owl#> prefix semanco: <http://www.semanco-project.eu/2012/5/SEMANCO.owl#> SELECT DISTINCT ?age ?to ?from WHERE { ?age a semanco:Age_Class . ?age semanco:hasTo_Year ?age_to_instance . ?age_to_instance semanco:toYearValue ?to . filter(?to >= '1885') . ?age semanco:hasFrom_Year ?age_from_instance2 . ?age_from_instance2 semanco:fromYearValue ?from . filter(?from <= '1885') . }
prefix sumo: <http://www.ontologyportal.org/SUMO.owl#> prefix semanco: http://www.semanco-project.eu/2012/5/SEMANCO.owl# SELECT DISTINCT ?year WHERE { ?b a sumo:Building; semanco:hasAge [semanco:year_Of_ContructionValue ?year]; semanco:hasBuilding_Cadastral_Data [semanco:hasCadastral_Reference ?ref]. ?ref semanco:cadref1Value "2402012". }
LD4SC Summer School 7th -‐ 12th June, Cercedilla, Spain
Use Case 1 -‐ SME use of public Linked Energy Data
URSOS Energy calcula2on engine
GIS data
Census Cadastre Climate
Typology Socio-‐Economic
Energy-‐related data Seman2c Energy Informa2on Framework
Integrated Plaeorm
ELITE Federa&on engine
Ontology OWL-‐DL liteA
URSOS Input form
3D Maps
1
2
3 5
41. The user selects a building 2. The ID of the selected
building is used to retrieve the building parameters form the data sources using SPARQL: Cadastre Census Building typologies
prefix sumo: <http://www.ontologyportal.org/SUMO.owl#> prefix semanco: <http://www.semanco-project.eu/2012/5/SEMANCO.owl#> SELECT DISTINCT ?age ?to ?from WHERE { ?age a semanco:Age_Class . ?age semanco:hasTo_Year ?age_to_instance . ?age_to_instance semanco:toYearValue ?to . filter(?to >= '1885') . ?age semanco:hasFrom_Year ?age_from_instance2 . ?age_from_instance2 semanco:fromYearValue ?from . filter(?from <= '1885') . }
prefix sumo: <http://www.ontologyportal.org/SUMO.owl#> prefix semanco: http://www.semanco-project.eu/2012/5/SEMANCO.owl# SELECT DISTINCT ?year WHERE { ?b a sumo:Building; semanco:hasAge [semanco:year_Of_ContructionValue ?year]; semanco:hasBuilding_Cadastral_Data [semanco:hasCadastral_Reference ?ref]. ?ref semanco:cadref1Value "2402012". }
prefix sumo: <http://www.ontologyportal.org/SUMO.owl#> prefix semanco: <http://www.semanco-project.eu/2012/5/SEMANCO.owl#> SELECT DISTINCT ?uvalue where { ?b semanco:hasSpace [ semanco:hasCS_Envelope [semanco:hasBottom_Floor ?bf]]; semanco:hasAge <http://www.semanco-project.eu/manresa/age_class/1>. ?bf semanco:hasBottom_Floor_U-value [semanco:bottom_Floor_U-valueValue ?uvalue]. ?bf semanco:hasBottom_Floor_Type [semanco:bottom_Floor_TypeValue "Bottom"]. }
Use Case 1 - SME use of public Linked Energy Data
[Diagram; numbered steps 1-5 link the following components]
URSOS energy calculation engine
URSOS input form
3D Maps
Integrated Platform
Semantic Energy Information Framework
ELITE Federation engine
Ontology OWL-DL liteA
Energy-related data: GIS data, Census, Cadastre, Climate, Typology, Socio-Economic
1. The user selects a building.
2. The ID of the selected building is used to retrieve the building parameters from the data sources (Cadastre, Census, Building typologies) using SPARQL:
Year of construction, looked up by cadastral reference:
  PREFIX sumo: <http://www.ontologyportal.org/SUMO.owl#>
  PREFIX semanco: <http://www.semanco-project.eu/2012/5/SEMANCO.owl#>
  SELECT DISTINCT ?year WHERE {
    ?b a sumo:Building ;
       semanco:hasAge [ semanco:year_Of_ContructionValue ?year ] ;
       semanco:hasBuilding_Cadastral_Data [ semanco:hasCadastral_Reference ?ref ] .
    ?ref semanco:cadref1Value "2402012" .
  }
Age class that contains the year of construction (here 1885):
  PREFIX sumo: <http://www.ontologyportal.org/SUMO.owl#>
  PREFIX semanco: <http://www.semanco-project.eu/2012/5/SEMANCO.owl#>
  SELECT DISTINCT ?age ?to ?from WHERE {
    ?age a semanco:Age_Class .
    ?age semanco:hasTo_Year ?age_to_instance .
    ?age_to_instance semanco:toYearValue ?to .
    FILTER(?to >= '1885') .
    ?age semanco:hasFrom_Year ?age_from_instance2 .
    ?age_from_instance2 semanco:fromYearValue ?from .
    FILTER(?from <= '1885') .
  }
U-value of the bottom floor for buildings of that age class:
  PREFIX sumo: <http://www.ontologyportal.org/SUMO.owl#>
  PREFIX semanco: <http://www.semanco-project.eu/2012/5/SEMANCO.owl#>
  SELECT DISTINCT ?uvalue WHERE {
    ?b semanco:hasSpace [ semanco:hasCS_Envelope [ semanco:hasBottom_Floor ?bf ] ] ;
       semanco:hasAge <http://www.semanco-project.eu/manresa/age_class/1> .
    ?bf semanco:hasBottom_Floor_U-value [ semanco:bottom_Floor_U-valueValue ?uvalue ] .
    ?bf semanco:hasBottom_Floor_Type [ semanco:bottom_Floor_TypeValue "Bottom" ] .
  }
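As a rough illustration of step 2, the year-of-construction query could be parameterised by the cadastral reference of the selected building and turned into a SPARQL protocol GET request. This is only a sketch: the endpoint URL and the function names are assumptions made for illustration, and the slides do not show how the platform actually issues its queries.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the slides do not give the real address.
ENDPOINT = "http://example.org/sparql"

# The query from the slide, with the cadastral reference left as a parameter.
YEAR_QUERY_TEMPLATE = """\
PREFIX sumo: <http://www.ontologyportal.org/SUMO.owl#>
PREFIX semanco: <http://www.semanco-project.eu/2012/5/SEMANCO.owl#>
SELECT DISTINCT ?year WHERE {{
  ?b a sumo:Building ;
     semanco:hasAge [ semanco:year_Of_ContructionValue ?year ] ;
     semanco:hasBuilding_Cadastral_Data [ semanco:hasCadastral_Reference ?ref ] .
  ?ref semanco:cadref1Value "{cadref}" .
}}"""


def build_year_query(cadref: str) -> str:
    """Return the year-of-construction query for one cadastral reference."""
    # Only accept alphanumeric references so the value cannot break out
    # of the quoted literal in the query.
    if not cadref.isalnum():
        raise ValueError("cadastral reference must be alphanumeric")
    return YEAR_QUERY_TEMPLATE.format(cadref=cadref)


def build_request_url(endpoint: str, query: str) -> str:
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(build_request_url(ENDPOINT, build_year_query("2402012")))
```

The same pattern would apply to the age class and U-value queries, with the year and the age class URI as parameters.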
Use Case 1 - SME use of public Linked Energy Data
Applying improvements, for example renovating the existing windows or replacing them with new ones.
Use Case 1 - SME use of public Linked Energy Data
Results after applying the improvement measures.
Use Case 1 - SME use of public Linked Energy Data
Projects can be compared with a multi-criteria decision tool included in the platform. Users can select the weight (importance) of the performance indicators.
In addition, other indicators defined by users can be included in the analysis, for example foreseen funding.
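One simple way to picture a weighted comparison of projects is a weighted average of normalised indicators. The sketch below is an assumption for illustration only: the slides do not say which multi-criteria method the platform uses, and the project names, indicator names and values are hypothetical.

```python
def weighted_score(indicators, weights):
    """Weighted average of indicator values (assumed normalised to 0..1,
    where higher is better)."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(indicators[name] * w for name, w in weights.items()) / total_weight


def rank_projects(projects, weights):
    """Return project names ordered from best to worst score."""
    return sorted(projects, key=lambda p: weighted_score(projects[p], weights),
                  reverse=True)


if __name__ == "__main__":
    # Hypothetical projects; "foreseen_funding" stands in for a user-defined indicator.
    projects = {
        "Project A": {"energy_saving": 0.8, "co2_reduction": 0.6, "foreseen_funding": 0.2},
        "Project B": {"energy_saving": 0.5, "co2_reduction": 0.9, "foreseen_funding": 0.9},
    }
    weights = {"energy_saving": 3, "co2_reduction": 2, "foreseen_funding": 1}
    for name in rank_projects(projects, weights):
        print(name, round(weighted_score(projects[name], weights), 3))
```

Changing the weights changes the ranking, which is the effect the user controls when selecting the importance of each indicator.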
Use Case 1 - SME use of public Linked Energy Data
SERVICE PLATFORM TO SUPPORT PLANNING OF ENERGY EFFICIENT CITIES
www.eecities.com
Services:
• Assessing the current energy performance of buildings in towns and cities.
• Identifying priority areas and buildings for energy efficiency interventions.
• Evaluating the impact of proposed new buildings at the urban level.
• Evaluating the impact of refurbishing buildings at the urban level.
• Evaluating the potential impact of local policies and interventions on Sustainable Energy Action Plans (SEAP).
• Generating missing data to enable the classification of buildings according to their energy performance.
• Integrating new data to be visualized and analysed.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 2 - The energy data portal
Contributors: Filip Radulovic (UPM)
The energy portal integrates energy data and provides access to various energy indicators from different data sources (e.g., heating consumption, water consumption and electricity consumption for different geographical regions).
LD benefits: data integration, data search, shareable data, reusable data, extensible data, multilingual support, data discovery, transparency, and a high degree of automation.
Challenges:
• Quality of data and metadata.
• Inconsistency between different sources.
• Wide variety of data formats.
• Data provenance.
• No open license associated with data.
• License complexity and diversity.
• Tracking changes in data.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 2 - The energy data portal
SEÍS example
Use Case 2 - The energy data portal
[Diagram of the SEÍS system data portal]
Services: Energy Model; Energy Performance Benchmarking; Examples of Energy Efficient Building; Energy Efficient Design Patterns; Enter a Building; Simulation
Ontology Repository
Data: Climate, Geographic, Monitoring, Certification
Use Case 2 - The energy data portal
Efficient buildings and their performance indicators, divided into two groups: Energy (heating and cooling demand, total primary energy, CO2 emissions) and Indoor Space (time above and below comfort).
Use Case 2 - The energy data portal
The information to be uploaded is divided into five categories: Project Data, Building Properties, Outdoor Environment, Operation and Performance. The data uploaded through this service is assigned to the terms of the ontology, thus ensuring the compatibility of the new data with the existing data.
Use Case 2 - The energy data portal
This service calculates the values of the indicators for energy-efficient and other buildings. The charts display the minimum, maximum and median values for each indicator and type of building. The indicator values of the selected buildings are shown in the orange circles.
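The benchmarking statistics described above can be sketched in a few lines. The building types and indicator values below are made up for illustration; they are not data from the SEÍS system.

```python
from collections import defaultdict
from statistics import median


def summarize(values):
    """Minimum, maximum and median of a list of indicator values."""
    return {"min": min(values), "max": max(values), "median": median(values)}


def summarize_by_type(buildings, indicator):
    """Group buildings by type and summarize one indicator per group."""
    groups = defaultdict(list)
    for building in buildings:
        groups[building["type"]].append(building[indicator])
    return {btype: summarize(vals) for btype, vals in groups.items()}


if __name__ == "__main__":
    # Hypothetical heating demand values (kWh/m2 per year).
    buildings = [
        {"type": "energy efficient", "heating_demand": 15},
        {"type": "energy efficient", "heating_demand": 25},
        {"type": "energy efficient", "heating_demand": 20},
        {"type": "other", "heating_demand": 90},
        {"type": "other", "heating_demand": 120},
    ]
    for btype, stats in summarize_by_type(buildings, "heating_demand").items():
        print(btype, stats)
```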
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 3 - Energy data infographic
Contributors: Filip Radulovic (UPM)
An energy data infographic allows the visualization of energy data, using a variety of visual analytics techniques, related to different aspects of these data, for example geographic regions or energy indicators.
LD benefits: data integration, multilingual support, data discovery, and transparency.
Challenges:
• Data provenance.
• No open license associated with data.
• Partial or missing data.
• Different data formats and structures.
• Quality of data and metadata.
• Inconsistency between different sources.
• Deprecated data.
• Data persistence.
• Data integration from several resources with diverse licenses and formats.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 3 - Energy data infographic
A script to create charts based on the results of a SPARQL query using the Google Visualization API: Example 1
A script to create charts based on the results of a SPARQL query using the Google Visualization API: Example 2
A script to create charts based on the results of a SPARQL query using the Google Visualization API: Example 3
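The example scripts themselves are not reproduced in this transcript. As a hedged sketch of the data-shaping step such a script needs, the function below converts a result in the SPARQL 1.1 Query Results JSON format into the JSON literal layout accepted by a Google Visualization DataTable (cols with id/label/type, rows with cells under "c"). The function name and the sample values are assumptions, and the type detection is deliberately crude.

```python
import json


def _as_number(text):
    """Return the value as float, or None if it is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def sparql_json_to_datatable(results):
    """Convert SPARQL JSON results into a DataTable-style dictionary."""
    variables = results["head"]["vars"]
    bindings = results["results"]["bindings"]
    cols, numeric = [], {}
    for var in variables:
        values = [b[var]["value"] for b in bindings if var in b]
        # A column is numeric only if every bound value parses as a number.
        numeric[var] = bool(values) and all(_as_number(v) is not None for v in values)
        cols.append({"id": var, "label": var,
                     "type": "number" if numeric[var] else "string"})
    rows = []
    for b in bindings:
        cells = []
        for var in variables:
            if var not in b:
                cells.append({"v": None})  # unbound variable becomes an empty cell
            elif numeric[var]:
                cells.append({"v": float(b[var]["value"])})
            else:
                cells.append({"v": b[var]["value"]})
        rows.append({"c": cells})
    return {"cols": cols, "rows": rows}


if __name__ == "__main__":
    # Hypothetical query result with a label and a consumption value.
    sample = {
        "head": {"vars": ["label", "gas"]},
        "results": {"bindings": [
            {"label": {"type": "literal", "value": "Building A"},
             "gas": {"type": "literal", "value": "12.5"}},
            {"label": {"type": "literal", "value": "Building B"}},
        ]},
    }
    print(json.dumps(sparql_json_to_datatable(sample), indent=2))
```

The resulting JSON can be embedded in an HTML page and passed to the charting code on the browser side.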
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 4 - Energy consumption map
Contributors: Filip Radulovic (UPM)
Energy consumption data from various sources and regions can be shown interactively on a geographic map.
LD benefits: data integration, data discovery, and transparency.
Challenges:
• Data provenance.
• No open license associated with data.
• Quality of data and metadata.
• Inconsistency between different sources.
• Deprecated data.
• Different data formats and structures.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 4 - Energy consumption map
Energy indicators shown on a geographical map. Left: SEMANCO platform with neighborhood boundaries. Right: SEÍS system with a heat map of energy-efficient buildings.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 5 - Linked Energy Data quality improvement
Contributors: Filip Radulovic (UPM)
A large number of Linked Data datasets are being published on the web, and the published data can be expected to be prone to errors, which can originate either in the Linked Data generation process or in the original data source.
LD benefits: reusable data, extensible data, and transparency.
Challenges:
• Enabling user feedback and updates.
• Machine-readable description of data quality improvements.
• Partial or missing data.
• Tracking changes in data.
• Reincorporating such improvements.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Contributors: Filip Radulovic (UPM)
The collection and analysis of Linked Energy Data may provide insight into situations that might lead to technical or environmental hazards and that would require human action.
LD benefits: shareable data, data discovery, and transparency.
Challenges:
• Quality of data and metadata.
• Inconsistency between different sources.
• Wide variety of data formats.
• Data provenance.
• Deprecated data.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Newcastle case study: the fuel poverty issue visualized and quantified through the SEMANCO platform.
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Energy data analysis:
RDF dataset (SPARQL queries) → Machine learning methods → New insights → Decision making
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
RapidMiner: a suite for implementing machine learning methods through a graphical interface. It includes the Weka repository (clustering, regression, classification…).
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Operators for retrieving RDF data from an endpoint with SPARQL queries, one per data series:
• Gas consumption 12/13
• Gas consumption 11/12
• Gas consumption 10/11
• Electricity consumption 12/13
• Electricity consumption 11/12
• Electricity consumption 10/11
Query for gas consumption 12/13:
  SELECT DISTINCT ?uprn ?label ?suburb ?gas1213 WHERE {
    ?feature <http://schema.org/containedIn> [ rdf:label ?suburb ] .
    ?feature rdf:label ?label .
    ?feature lcc:uprn ?uprn .
    ?observation ssnx:featureOfInterest ?feature .
    ?observation ssnx:observedProperty [ rdf:label "Gas" ] .
    ?observation ssnx:observationResult [ ssnx:hasValue [ lcc:hasQuantityValue ?gas1213 ] ] .
    ?observation ssnx:observationSamplingTime [ rdf:label ?time ] .
    FILTER regex(?time, "2012/2013") .
  }
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Operators for joining the results of the queries
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Operators for cleaning the data items
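Outside RapidMiner, the joining and cleaning steps could look roughly like the sketch below. It assumes that each query result is a list of rows keyed by the uprn variable from the query; the inner-join behaviour, the cleaning rule (drop rows whose values are missing or not numeric) and the sample values are illustrative assumptions, not a description of what the RapidMiner operators do.

```python
def join_on_key(tables, key="uprn"):
    """Inner-join several lists of row dictionaries on a shared key."""
    joined = {}
    for index, table in enumerate(tables):
        by_key = {row[key]: row for row in table}
        if index == 0:
            joined = {k: dict(row) for k, row in by_key.items()}
        else:
            # Keep only keys present in every table seen so far.
            joined = {k: {**joined[k], **by_key[k]} for k in joined if k in by_key}
    return list(joined.values())


def clean(rows, numeric_fields):
    """Drop rows with missing or non-numeric values; convert the rest to float."""
    cleaned = []
    for row in rows:
        try:
            converted = {field: float(row[field]) for field in numeric_fields}
        except (KeyError, TypeError, ValueError):
            continue
        cleaned.append({**row, **converted})
    return cleaned


if __name__ == "__main__":
    # Hypothetical results of two of the six queries.
    gas = [{"uprn": "1", "gas1213": "100"}, {"uprn": "2", "gas1213": "n/a"}]
    elec = [{"uprn": "1", "elec1213": "50"}, {"uprn": "2", "elec1213": "70"},
            {"uprn": "3", "elec1213": "9"}]
    print(clean(join_on_key([gas, elec]), ["gas1213", "elec1213"]))
```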
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Operator for clustering examples based on their properties. Parameters:
• Minimum number of clusters
• Maximum number of clusters
• Method for calculating the distance between items (similarity)
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Operator for preparing the final results
Guidelines for the Exploitation
Generic Linked Data use cases
Use Case 6 - Energy situation awareness
Results of the process: 4 clusters
Cluster 0
Cluster 1
Cluster 2
The buildings of each cluster have similar performance over the period 2010-2013. The City Council can propose public funding for refurbishing the poorly performing buildings based on the results of the clustering.
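To make the clustering step concrete, here is a minimal k-means sketch in plain Python. It is not the RapidMiner operator: the number of clusters is fixed rather than searched between a minimum and a maximum, the distance is squared Euclidean, the initial centroids are simply the first k points, and the consumption values are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=20):
    """Cluster points into k groups; returns (assignment, centroids)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]  # deterministic initialisation
    assignment = [None] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, new_assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_assignment == assignment:
            break  # converged
        assignment = new_assignment
    return assignment, centroids


if __name__ == "__main__":
    # Hypothetical yearly gas consumption (10/11, 11/12, 12/13) per building.
    buildings = [(100.0, 98.0, 95.0), (105.0, 101.0, 99.0),
                 (300.0, 310.0, 305.0), (290.0, 295.0, 300.0)]
    labels, centres = kmeans(buildings, k=2)
    print(labels)
```

Buildings that end up in the same cluster have similar consumption profiles over the three periods, which is the property the slide relies on when proposing refurbishment funding.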
Index
• Requirements
• Guidelines for the Publication of Linked Data
• Guidelines for the Exploitation of Linked Data
• Hands-on Session
What are we going to do?
Linked Data life cycle: Specification, Modelling, Generation, Linking, Publication, Exploitation
Publication Index
1. Ensure legal compliance
2. Publish the dataset and the ontology
3. Publish metadata and online documentation
4. Enable dataset discovery
5. Dataset promotion
6. Dataset support
Ensure legal compliance
• If the source of your dataset permits the use of its data, then the RDF dataset has to comply with the license and legal terms of the original data source.
• The main issue to take into account is data privacy: private entities described in the dataset (e.g. people) must not be identifiable.
Ensure legal compliance
• TASK 1: Check if your dataset preserves privacy
No explicit identifiers are used as literals or in URIs (e.g. national ID numbers, credit card numbers…).
If your dataset does not preserve privacy, go back to the RDF generation session and apply the following anonymization techniques:
– Suppression
– Generalization
– Anatomization
– Perturbation
– Aggregation
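Two of the listed techniques, suppression and generalization, are easy to illustrate on a record before it is converted to RDF. The field names and values below are hypothetical, and the sketch is not a complete anonymization procedure.

```python
def suppress(record, fields):
    """Suppression: remove identifying fields from a record."""
    return {key: value for key, value in record.items() if key not in fields}


def generalize_year(year, width=10):
    """Generalization: replace an exact year with a range of `width` years."""
    start = year - year % width
    return f"{start}-{start + width - 1}"


if __name__ == "__main__":
    # Hypothetical source record containing an explicit identifier.
    record = {"national_id": "X1234567", "owner": "Jane Doe",
              "year_of_construction": 1885, "heating_demand": 120.5}
    safe = suppress(record, {"national_id", "owner"})
    safe["year_of_construction"] = generalize_year(safe["year_of_construction"])
    print(safe)
```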
Publish the dataset and the ontology
• Make the ontology and the RDF dataset (and also the links to other datasets) accessible through the Web, following Linked Data principles:
– To store the RDF data into a persistent repository
– To enable resolvable HTTP URIs and content negotiation
– To enable a SPARQL HTTP endpoint
Publish the dataset and the ontology
• TASK 2: To store the RDF data into a persistent repository
Upload the RDF dataset and the ontology to the OpenLink Virtuoso server.
Deliverable: a report with the results of some SPARQL queries run against the Virtuoso server endpoint
Option a) Using Conductor (web interface) for small datasets (< 10 MB). The form asks for:
– RDF file in RDF/XML format
– Name of the graph (e.g. name of the dataset)
Option b) Using the Virtuoso console for big datasets (> 20 MB)
1. Store your RDF dump file, in RDF/XML format, in a folder on the server where Virtuoso is installed.
2. Go to the ISQL console.
3. Invoke this procedure:
   DB.DBA.RDF_LOAD_RDFXML_MT (file_to_string_output ('path_to/rdf_dump.rdf'), '', '{your_graph_name}');
4. If needed, clear the graph with this statement:
   SPARQL CLEAR GRAPH <{your_graph_name}>;
Publish the dataset and the ontology
• TASK 3 (optional): To enable resolvable HTTP URIs and content negotiation
Install Pubby to enable resolvable HTTP URIs.
Deliverable: dataset with resolvable URIs
Pubby configuration: go to tomcat/webapps/pubby/WEB-INF/config.n3
Publish metadata and online documentation
• Create and publish the documentation of the RDF dataset and the ontology. This documentation is aimed at both human and machine users, and its purpose is to facilitate the use of the dataset that is being made available.
– Description of the datasets in the DCAT and VoID vocabularies
– Human-readable documentation of the dataset and the ontology
Publish metadata and online documentation
• TASK 4: To describe the dataset in DCAT/VoID vocabularies
Deliverable: a DCAT or VoID file describing your dataset
DCAT example:
  @prefix os: <http://a9.com/-/spec/opensearch/1.1/> .
  @prefix dct: <http://purl.org/dc/terms/> .
  @prefix dcat: <http://www.w3.org/ns/dcat#> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
  @prefix api: <http://purl.org/linked-data/api/vocab#> .
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .

  <http://your.dataset.com/> a dcat:Dataset ;
      dct:license <http://purl.org/NET/rdflicense/ukogl1.0> ;
      dct:source "Description of the dataset source" ;
      <http://www.w3.org/2002/07/owl#sameAs> <http://datahub.io/dataset/XXX> ;
      dct:publisher "The publisher of the dataset" ;
      dct:language <http://id.loc.gov/vocabulary/iso639-1/en> ;
      dct:accrualPeriodicity <http://purl.org/linked-data/sdmx/2009/code#freq-W> .
Publish metadata and online documentation
• TASK 4: To describe the dataset in DCAT/VoID vocabularies
VoID example:
  @prefix : <#> .
  @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
  @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
  @prefix dcterms: <http://purl.org/dc/terms/> .
  @prefix void: <http://rdfs.org/ns/void#> .
  @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

  ## your dataset
  <http://your.dataset.com/> rdf:type void:Dataset ;
      foaf:homepage <http://your.dataset.com/homepage> ;
      dcterms:title "Title of your dataset" ;
      dcterms:description "Description of your dataset." ;
      void:sparqlEndpoint <http://your.dataset.com/sparql> ;
      void:uriSpace "http://your.dataset.com/resource/" ;
      void:exampleResource <http://your.dataset.com/resource/URI/XXXX> ;
      dcterms:source "Description of the dataset source" ;
      dcterms:created "XXXX-XX-XX"^^xsd:date ;
      dcterms:license <http://creativecommons.org/licenses/by/3.0/> ;
      dcterms:subject <http://dbpedia.org/resource/Building> ;
      void:triples 150297 ;
      void:entities 18890 ;
      void:classes 65 ;
      void:properties 100 ;
      void:distinctSubjects 18962 ;
      void:distinctObjects 26097 .

  ## datasets you link to
  :Anotherdataset rdf:type void:Dataset ;
      foaf:homepage <http://another.dataset.com/homepage> ;
      dcterms:title "Another title" ;
      dcterms:description "Another description." ;
      void:exampleResource <http://another.dataset.com/resource/URI/XXXX> .

  :Yourdataset-Anotherdataset rdf:type void:Linkset ;
      void:linkPredicate <http://your.dataset.com/predicate_used_for_linking> ;
      void:target <http://your.dataset.com/> ;
      void:target :Anotherdataset .
Publish metadata and online documentation
• TASK 4: To describe the dataset in DCAT/VoID vocabularies
Resources:
– http://www.w3.org/TR/vocab-dcat/
– http://www.w3.org/TR/void/
– https://code.google.com/p/void-impl/wiki/SPARQLQueriesForStatistics
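The statistics in a VoID description (void:triples, void:classes and so on) can be obtained from the endpoint with counting queries. The sketch below uses standard SPARQL 1.1 aggregates; the queries are written for this illustration rather than copied from the wiki page listed above, and the function names are assumptions. void:entities is left out because its definition depends on the URI pattern chosen for the dataset.

```python
# Counting queries for some VoID statistics (SPARQL 1.1 aggregates).
VOID_STAT_QUERIES = {
    "void:triples": "SELECT (COUNT(*) AS ?n) WHERE { ?s ?p ?o }",
    "void:classes": "SELECT (COUNT(DISTINCT ?c) AS ?n) WHERE { ?s a ?c }",
    "void:properties": "SELECT (COUNT(DISTINCT ?p) AS ?n) WHERE { ?s ?p ?o }",
    "void:distinctSubjects": "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s ?p ?o }",
    "void:distinctObjects": "SELECT (COUNT(DISTINCT ?o) AS ?n) WHERE { ?s ?p ?o }",
}


def scoped_query(term, graph_uri=None):
    """Return the counting query, optionally restricted to one named graph."""
    query = VOID_STAT_QUERIES[term]
    if graph_uri is None:
        return query
    head, pattern = query.split(" WHERE ", 1)
    return f"{head} WHERE {{ GRAPH <{graph_uri}> {pattern} }}"


def to_turtle(dataset_uri, counts):
    """Format counts as Turtle (the void: prefix must be declared elsewhere)."""
    if not counts:
        raise ValueError("at least one statistic is required")
    lines = [f"<{dataset_uri}> a void:Dataset ;"]
    items = list(counts.items())
    for index, (term, value) in enumerate(items):
        end = " ." if index == len(items) - 1 else " ;"
        lines.append(f"    {term} {int(value)}{end}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(scoped_query("void:triples", "http://your.dataset.com/graph"))
    print(to_turtle("http://your.dataset.com/",
                    {"void:triples": 150297, "void:classes": 65}))
```

The graph name passed to scoped_query would be the one chosen when loading the dataset into the triple store.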
Publish metadata and online documentation
• TASK 5 (optional): To create human-oriented documentation
With Widoco
Deliverable: an HTML document describing your ontology
1. Set up the config file: config/config.properties
2. Run this command:
   java -jar widoco-0.0.1-jar-with-dependencies.jar -ontFile {your_ontology_file.owl}
Enable dataset discovery
• To enable the mechanisms that allow both humans and machines to discover and better use the dataset.
– To create a sitemap to inform search engines about the page structure.
– To register the dataset in dataset catalogues (READY4SmartCities, Datahub, Reegle, OpenEI…)
– To ensure the fulfilment of the requirements for addition to the LOD cloud.
Enable dataset discovery
• TASK 6 (optional): To create a sitemap
With sitemap4rdf
Deliverable: an XML document with the sitemap of your dataset
1. Run this command:
   sitemap4rdf {your_sparql_endpoint} {prefix_of your_url}
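For orientation, a plain sitemap is an XML file listing the URLs of a site. The sketch below writes one for a handful of resource URIs using the standard sitemaps.org format. It is not what sitemap4rdf produces (that tool queries the SPARQL endpoint and may add RDF-specific information); the function name and the example URIs are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemaps.org XML document listing the given URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for address in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = address  # special characters are escaped
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical resource URIs under the dataset's URI prefix.
    print(build_sitemap([
        "http://your.dataset.com/resource/1",
        "http://your.dataset.com/resource/2",
    ]))
```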
Enable dataset discovery
• TASK 7 (optional): To register the dataset in dataset catalogues
In READY4SmartCities, Datahub, Reegle, OpenEI
Deliverable: a new record in a dataset catalogue for your dataset
1. Go to: http://smartcity.linkeddata.es/datasets/
2. Click on "through a detailed form" and fill in the form.
Enable dataset discovery
• TASK 8 (optional): To ensure the fulfilment of the requirements for addition to the LOD cloud
Using Data Hub LOD Datasets
Deliverable: a report describing the level of fulfilment of the LOD requirements of your dataset
1. Go to: http://validator.lod-cloud.net/
2. Validate your dataset (previously uploaded to the Data Hub repository) using the name of the dataset.
What are we going to do?
Linked Data life cycle: Specification, Modelling, Generation, Linking, Publication, Exploitation
Exploitation Index
1. Define a use case
2. Use your data
Define a use case
• To describe a use case for energy data exploitation.
– To define the use case for data exploitation and its requirements
Define a use case
• TASK 9: To select an existing use case or define a new use case for your dataset
Deliverable: a detailed description of a use case for your dataset.
Fill in the template:
• Use case title:
• Reference:
• Use case description:
• Objectives:
• Domain:
• Stakeholders:
• Requirements:
• Linked Data benefits:
• Challenges:
• External sources:
Use your data
• To implement the use case.
– To provide a report/visualization for your data
Use your data
• TASK 10: To provide a report/visualization for your data
Deliverable: a report/visualization in HTML/PDF explaining something about your use case using your dataset and the links to other datasets.
Example materials:
– Materials/sparqlhtml/Example 1.html
– Materials/sparqlhtml/Example 2.html
– Materials/sparqlhtml/Example 3.html
– Materials/LodLive
– Materials/rapidminer/rapidminer_clustering_leeds.xml
Define a use case: summary
• TASK 1: Check if your dataset preserves privacy
• TASK 2: To store the RDF data into a persistent repository
• TASK 3: To enable resolvable HTTP URIs and content negotiation
• TASK 4: To describe the dataset in DCAT/VoID vocabularies
• TASK 5: To create human-oriented documentation
• TASK 6: To create a sitemap
• TASK 7: To register the dataset in dataset catalogues
• TASK 8: To ensure the fulfilment of the requirements for addition to the LOD cloud
• TASK 9: To select an existing use case or define a new use case for your dataset
• TASK 10: To provide a report/visualization for your data
References / Bibliography
The contents of the presentation are based on:
• F. Radulovic, R. García-Castro, V. Suero, T. Tryferidis, K. Z. Tsagkari, D. Meskos. Deliverable D4.2: Requirements and guidelines for energy data publication. Technical report, READY4SmartCities Consortium. (January 2015)
• F. Radulovic, R. García-Castro, M. Poveda. Deliverable D4.3: Requirements and guidelines for energy data exploitation. Technical report, READY4SmartCities Consortium. (January 2015)
References
• EECITIES - http://www.eecities.com
• SEÍS system - http://www.seis-system.org
Links
• Virtuoso Server - https://github.com/openlink/virtuoso-opensource
• Pubby - http://wifo5-03.informatik.uni-mannheim.de/pubby/
• Elda - https://github.com/epimorphics/elda
• Puelia - https://code.google.com/p/puelia-php/
• LDP4j - http://www.ldp4j.org/
• Apache Marmotta - http://marmotta.apache.org/
• Sitemap4rdf - http://lab.linkeddata.deri.ie/2010/sitemap4rdf/
• LOD cloud validator - http://validator.lod-cloud.net
• SPARQL proxy - https://logd.tw.rpi.edu/ws/sparqlproxy.php
• Google Visualization API - https://developers.google.com/chart/
• Rapidminer - https://rapidminer.com/products/studio/
• Widoco - https://github.com/dgarijo/Widoco
• LOD live app - http://en.lodlive.it