Linked Statistical Data 101
ESS Workshop on dissemination of official statistics as open data
18-19 January 2017, Malta
Oscar Corcho
Escuela Técnica Superior de Ingenieros Informáticos
Universidad Politécnica de Madrid
Campus de Montegancedo sn, 28660 Boadilla del Monte, Madrid
http://www.oeg-upm.net/ [email protected]
Contents
• Foundations of (Linked) Open Data
  - For public administrations, in general
  - For statistical offices, in particular
• Linked Statistical Data by example
  - A use case from IAEST (Aragón Statistical Office)
• A bit of technical background
  - W3C RDF DataCube
• Preparing the discussion on benefits for different types of stakeholders
What is Open Data?
• Open data is data that can be freely used, re-used and redistributed by anyone - subject only, at most, to the requirement to attribute and share alike
• Key aspects:
  - Availability and access: the data must be available as a whole and at no more than a reasonable reproduction cost, preferably by downloading over the Internet. The data must also be available in a convenient and modifiable form.
  - Re-use and redistribution: the data must be provided under terms that permit re-use and redistribution including the intermixing with other datasets.
  - Universal participation: everyone must be able to use, re-use and redistribute - there should be no discrimination against fields of endeavour or against persons or groups
[source: Open Data Handbook, http://opendatahandbook.org/en/what-is-open-data/ ]
Relevant Legislation. Europe and Spain
• Open Access Initiative (2001). Scientific information; > 510 orgs
• Aarhus Convention (1998). Right to participate and access; 41 countries and the EU
• PSI Directives. PSI reuse (2003/98/EC and 2013/37/EU)
• Convention about access to official documentation (2009)
  - 12 countries
• Law 37/2007. PSI reuse (transposition of directive 2003/98/EC)
  - Modified in law 18/2015 (BOE 10/07/2015, directive 2013/37/EU)
• Law 11/2007. Citizen access to public services, and rights to good quality services
• RD 4/2010 Esquema Nacional de Interoperabilidad (National Interoperability Framework)
  - Open standards, technology neutral, open source
• RD 1495/2011. Develops Law 37/2007 for national agencies
• Norma Técnica de Interoperabilidad (Technical Interoperability Standard; 19/02/2013, BOE 4/3/2013)
[source: based on a presentation from Antonio Rodríguez Pascual (CNIG)]
An Explosion of Open Data Portals
Open Data and how to publish it
1) On a poster board
  - For those with a lot of free time available
  - Or those who happen to be there at the right time
Adapted from: Antonio Rodríguez Pascual (IGN)
Open Data and how to publish it
2) On a Web page or mobile app
  - For people, but not downloadable
Adapted from: Antonio Rodríguez Pascual (IGN)
Open Data and how to publish it
3) In files
  - These can be downloaded and used by humans in information systems (XML, HTML, CSV, GTFS, etc.)
  - Luckily, it is not a scanned PDF
Adapted from: Antonio Rodríguez Pascual (IGN)
Open Data and how to publish it
4) Via Web Services
  - They can be used by systems (sometimes persons)
  - They allow generating added value
  - Ease of integration in the application logic
Adapted from: Antonio Rodríguez Pascual (IGN)
All together…, Shaken, not stirred…
What is Linked Data?
1. Use URIs to identify resources
2. Use HTTP URIs, so that they can be found
3. Use de-referenceable URIs, that is, provide useful data (RDF, JSON, SPARQL)
4. Include links to other URIs.
• http://www.w3.org/DesignIssues/LinkedData.html
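Principle 3 is commonly realised through HTTP content negotiation: the client asks for an RDF representation with an `Accept` header. The following is a minimal Python sketch under stated assumptions: the helper name `build_rdf_request` is illustrative, Turtle (`text/turtle`) is assumed to be one of the formats the server offers, and the request is only built, not sent.

```python
from urllib.request import Request


def build_rdf_request(uri: str, media_type: str = "text/turtle") -> Request:
    """Build (but do not send) an HTTP GET asking for an RDF representation."""
    if not uri.startswith(("http://", "https://")):
        # Principle 2: identifiers should be HTTP URIs so they can be looked up.
        raise ValueError("expected an HTTP(S) URI")
    # Principle 3: ask the server for useful, machine-readable data.
    return Request(uri, headers={"Accept": media_type})


request = build_rdf_request(
    "http://opendata.aragon.es/recurso/territorio/Municipio/Ilche"
)
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; which media types a given server returns is not confirmed here.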
Open Data and how to publish it
5) Via APIs (semantically enhanced) and linked
  - To be used by systems (and sometimes persons)
  - It allows generating added-value services
  - Standardised formats (JSON, JSON-LD, RDF)
  - Standardised models (vocabularies, ontologies)
Difficult to reuse
√ Reusable. Not open
√ Reusable, open. Difficult to link together
√ Reusable, open, complete, easier to link
Data representation formats
And many more: JSON, JSON-LD, Shapefiles, KMZ, KML, PC-Axis, etc.
Recap: The 5-star categorisation from TBL (Tim Berners-Lee)
Which type of data and which (re)users?
[Diagram. Data types: infrastructure (cartography, streets, directories, codes…), microdata, macrodata and metadata, split into public and non-public. (Re)users: analysts, journalists, citizens, researchers.]
[source: Alberto González Yanes (ISTAC)]
Our use case: Aragón
• IAEST - Instituto Aragonés de Estadística
• Good open data ecosystem
  - Aragón Open Data
    • http://opendata.aragon.es/
  - Zaragoza
    • http://datos.zaragoza.es/
Reports and templates from Oracle BI
Current Web application for local statistics
Statistics about municipalities
• At IAEst Web
  - http://www.aragon.es/DepartamentosOrganismosPublicos/Institutos/InstitutoAragonesEstadistica/AreasGenericas/ci.EstadisticaLocal.detalleDepartamento
• At OpenDataAragón
  - http://opendata.aragon.es/catalogo/edificios-superficie-y-vivienda-comarcas
What have we done?
[Diagram: General architecture. Components shown: transformation process, Linked Data, publication process, SPARQL, Elda, API.]
This is not the purpose of my talk
https://github.com/aragonopendata/local-data-aragopedia
URIs for datasets
• Let’s look for the dataset on “Number of homes per owner per municipality”
  - Número de hogares por tipo de propietario por municipio
• The dataset has a URI
  - http://opendata.aragon.es/recurso/iaest/dataset/01-010013TM
What is behind that URI?
This is not the purpose of my talk
URIs for each observation
• And now we can point to specific observations in this dataset
  - In 2001, the number of buildings owned by one person in the municipality of Ilche
    • http://opendata.aragon.es/recurso/iaest/observacion/01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133
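The dataset and observation URIs shown so far appear to share a pattern. As a minimal sketch, assuming that pattern holds in general (it is inferred from these two examples only, and the function names are illustrative), an observation URI can be split to recover the URI of its dataset:

```python
from urllib.parse import urlparse

# Path prefix inferred from the single observation URI on the slide (an assumption).
OBSERVATION_PREFIX = "/recurso/iaest/observacion/"


def split_observation_uri(uri: str) -> tuple:
    """Return (dataset_code, observation_id) for an IAEST observation URI."""
    path = urlparse(uri).path
    if not path.startswith(OBSERVATION_PREFIX):
        raise ValueError(f"not an observation URI: {uri}")
    parts = path[len(OBSERVATION_PREFIX):].split("/")
    if len(parts) != 2:
        raise ValueError(f"unexpected observation path: {path}")
    return parts[0], parts[1]


def dataset_uri(dataset_code: str) -> str:
    """Rebuild the dataset URI following the pattern of the dataset slide."""
    return f"http://opendata.aragon.es/recurso/iaest/dataset/{dataset_code}"


obs = ("http://opendata.aragon.es/recurso/iaest/observacion/"
       "01-010013TM/00794aab-964f-35c7-8e7c-156c9bc60133")
code, obs_id = split_observation_uri(obs)
print(dataset_uri(code))
```

Parsing URIs like this is only a convenience. In Linked Data the more robust route is to dereference the observation and follow its link to the dataset.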
URIs for each observation
And links to other URIs in Aragón
• The municipality of Ilche
  - http://opendata.aragon.es/recurso/territorio/Municipio/Ilche
  - This information is owned by another department of the Government of Aragón
And links to codelists
• Types of owners
  - http://opendata.aragon.es/kos/iaest/clase-de-propietario
    • The community
    • A person
    • A society
    • A public organisation
SPARQL endpoint
The female population of Zaragoza aged 0-15 grew until 2013 and then declined
PREFIX qb: <http://purl.org/linked-data/cube#>
select distinct ?year ?personas
where {
  ?x a qb:Observation .
  ?x qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> .
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year .
  ?x <http://purl.org/linked-data/sdmx/2009/dimension#refArea> <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> .
  ?x <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos> <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> .
  ?x <http://opendata.aragon.es/def/iaest/dimension#sexo> <http://opendata.aragon.es/kos/iaest/sexo/mujeres> .
  ?x <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
}
ORDER BY ?year
Examples at https://github.com/aragonopendata/local-data-aragopedia/blob/master/consultas.md
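A query like this one can be sent from a program using the standard SPARQL protocol, which takes the query text in a `query` parameter. This is a minimal Python sketch under stated assumptions: the endpoint address in `ENDPOINT` is assumed rather than taken from the slides, the function names are illustrative, and the request is built but not sent. The parsing helper follows the standard SPARQL JSON results layout.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://opendata.aragon.es/sparql"  # assumed address, not from the slides

QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
SELECT DISTINCT ?year ?personas WHERE {
  ?x a qb:Observation ;
     qb:dataSet <http://opendata.aragon.es/recurso/iaest/dataset/03-030005TM> ;
     <http://purl.org/linked-data/sdmx/2009/dimension#refPeriod> ?year ;
     <http://purl.org/linked-data/sdmx/2009/dimension#refArea>
         <http://opendata.aragon.es/recurso/territorio/Municipio/Zaragoza> ;
     <http://opendata.aragon.es/def/iaest/dimension#edad-grandes-grupos>
         <http://opendata.aragon.es/kos/iaest/edad-grandes-grupos/0-a-15> ;
     <http://opendata.aragon.es/def/iaest/dimension#sexo>
         <http://opendata.aragon.es/kos/iaest/sexo/mujeres> ;
     <http://opendata.aragon.es/def/iaest/medida#personas> ?personas .
}
ORDER BY ?year
"""


def build_select_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows_from_results(payload: str) -> list:
    """Flatten a SPARQL JSON results document into a list of {variable: value}."""
    bindings = json.loads(payload)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in bindings]


request = build_select_request(ENDPOINT, QUERY)
print(request.get_header("Accept"))
```

The body returned by `urllib.request.urlopen(request)` could then be decoded and passed to `rows_from_results` to get one dictionary per year.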
W3C Data Cube
http://www.w3.org/TR/vocab-data-cube/
DataSets and Observations
Observations in a dataset
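[Diagram: an observation and its dataset, as triples]
iaest-data:01-010003M/22001/030-045 rdf:type qb:Observation
iaest-data:01-010003M/22001/030-045 qb:dataSet iaest:01-010003M
iaest-data:01-010003M/22001/030-045 sdmx:refArea aod:Abiego
iaest-data:01-010003M/22001/030-045 iaest:superficieUtil iaest-codelist:superficie030-045
iaest-data:01-010003M/22001/030-045 iaest:numeroHogares “1”^^xsd:int
iaest:01-010003M rdf:type qb:DataSet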
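The observation on this slide can be written down as a handful of subject, predicate, object triples. This is a minimal Python sketch that keeps the slide's prefixed names as plain strings, because the namespace URIs behind the prefixes are not given here. The helper names are illustrative.

```python
# The observation from the slide, as (subject, predicate, object) tuples.
OBS = "iaest-data:01-010003M/22001/030-045"
TRIPLES = [
    (OBS, "rdf:type", "qb:Observation"),
    (OBS, "qb:dataSet", "iaest:01-010003M"),
    (OBS, "sdmx:refArea", "aod:Abiego"),                                 # dimension
    (OBS, "iaest:superficieUtil", "iaest-codelist:superficie030-045"),   # dimension
    (OBS, "iaest:numeroHogares", '"1"^^xsd:int'),                        # measure
    ("iaest:01-010003M", "rdf:type", "qb:DataSet"),
]


def describe(subject, triples):
    """Map predicate -> object for one subject (assumes one value per predicate)."""
    return {p: o for s, p, o in triples if s == subject}


def observations_in(dataset, triples):
    """List the subjects attached to a dataset through qb:dataSet."""
    return [s for s, p, o in triples if p == "qb:dataSet" and o == dataset]


print(describe(OBS, TRIPLES)["sdmx:refArea"])
print(observations_in("iaest:01-010003M", TRIPLES))
```

An RDF library would normally hold these triples in a graph; plain tuples are used here only to keep the sketch dependency-free.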
DataCube Structure Definition
Describing the dataset
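[Diagram: the dataset and its structure definition]
Schema level:
qb:DataSet qb:structure qb:DataStructureDefinition
qb:DataStructureDefinition qb:component qb:ComponentSpecification
qb:ComponentSpecification qb:componentProperty qb:ComponentProperty
Instance level:
iaest:01-010003M rdf:type qb:DataSet
iaest:01-010003M qb:structure iaest--dsd:01-010003M
iaest--dsd:01-010003M rdf:type qb:DataStructureDefinition
iaest--dsd:01-010003M qb:component three component specifications:
  - qb:dimension sdmx:refArea
  - qb:dimension iaest:superficieUtil
  - qb:measure iaest:numeroHogares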
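One practical use of a structure definition is checking that an observation carries a value for every declared dimension and measure. This is a minimal Python sketch, assuming the structure above is held as two sets of property names. The dictionary layout and the function name are illustrative and are not part of the Data Cube vocabulary.

```python
# Components declared by the structure definition on this slide.
DSD = {
    "dimensions": {"sdmx:refArea", "iaest:superficieUtil"},
    "measures": {"iaest:numeroHogares"},
}


def missing_components(observation: dict, dsd: dict) -> set:
    """Return the declared dimensions and measures the observation lacks."""
    required = dsd["dimensions"] | dsd["measures"]
    return required - set(observation)


observation = {
    "qb:dataSet": "iaest:01-010003M",
    "sdmx:refArea": "aod:Abiego",
    "iaest:superficieUtil": "iaest-codelist:superficie030-045",
    "iaest:numeroHogares": 1,
}
print(missing_components(observation, DSD))
```

An empty result means the observation is complete with respect to the declared components. The check does not look at value types or code lists.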
Dimensions
[Diagram: dimensions and measures]
Classes: qb:DataSet, qb:DataStructureDefinition, qb:ComponentSpecification, qb:ComponentProperty, qb:DimensionProperty, qb:MeasureProperty, qb:Observation
Relations: qb:dataSet, qb:structure, qb:component, qb:componentProperty, qb:concept, rdfs:subClassOf, rdfs:range, rdf:type
Component properties: sdmx:refArea, iaest:superficieUtil, iaest:numeroHogares
Ranges: esadm:Municipio, iaest:SuperficieUtil, xsd:int
SKOS Codelists
[Diagram: SKOS code lists]
sdmx:CodeList rdfs:subClassOf skos:ConceptScheme
iaest:SuperficieUtil qb:codeList iaest-codelist:SuperficieUtil
iaest-codelist:SuperficieUtil rdf:type sdmx:CodeList
iaest-codelist:SuperficieUtil skos:hasTopConcept iaest-codelist:superficie030-045, iaest-codelist:superficie046-060, …, iaest-codelist:superficie180-mas
Each code rdf:type skos:Concept
Why Linked Statistical Data? (I)
• Facilitate data (re)use by developers outside our organisation
• Data access APIs (according to standards)
• Do they prefer CSVs, PCAxis, SDMX, RDF?
• Fine-grained data granularity (refer to specific facts)
• Integration with other data sources from other public or private organisations
  - E.g., Government of Aragón for municipalities
• Allow for queries across datasets
  - E.g., tell me how many municipalities may benefit from this funding that I am making available with these restrictions: number of registered companies lower than 5 and unemployed population higher than 15%
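The funding example works because both datasets identify municipalities with the same URIs, so the join key comes for free. This is a minimal Python sketch of that cross-dataset filter. The municipality identifiers "A", "B" and "C" and all figures are hypothetical placeholders, not IAEST data, and the function name is illustrative.

```python
BASE = "http://opendata.aragon.es/recurso/territorio/Municipio/"

# Placeholder values standing in for two separately published datasets.
registered_companies = {BASE + "A": 3, BASE + "B": 12, BASE + "C": 4}
unemployment_rate = {BASE + "A": 0.18, BASE + "B": 0.20, BASE + "C": 0.09}


def eligible_municipalities(companies, unemployment,
                            max_companies=5, min_unemployment=0.15):
    """Municipalities with fewer than max_companies and unemployment above the threshold."""
    shared = companies.keys() & unemployment.keys()  # join on the shared URI
    return sorted(
        uri for uri in shared
        if companies[uri] < max_companies and unemployment[uri] > min_unemployment
    )


matches = eligible_municipalities(registered_companies, unemployment_rate)
print(len(matches), matches)
```

Against a SPARQL endpoint the same join would be expressed in a single query over both datasets; the dictionaries here only illustrate the role of the shared identifiers.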
Why Linked Statistical Data? (II)
• Internal benefits as well
  - Codelists are made available and more visible internally
  - Methodology and metadata explicitly described as part of the RDF DataCube data (e.g., reference years in datasets)