How to Publish Open Data
Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Digital Enterprise Research Institute www.deri.ie
How to publish Open Data
Richard Cyganiak
Opening Up Government Data – Galway, 8 Nov 2011
http://www.StefanDecker.org/
TimBL’s 5-star plan for open data
★ Make your stuff available on the Web
★★ Make it available as structured data (e.g., an Excel sheet instead of image scan of a table)
★★★ Use a non-proprietary format (e.g., a CSV file instead of an Excel sheet)
★★★★ Use linked data format (i.e., URIs to identify things, and RDF to represent data)
★★★★★ Link your data to other people’s data to provide context
Source: http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/
Five-shamrock scheme
1. Publish data on the web
2. Publish data in a machine-processable format
3. Use an open standard format
4. Publish under an open license
5. List your data in a data catalog
1. Publish data on the web
Why?
The web is where people look for it first
Google can index it
Fewer phone calls and emails (and FoI requests) to answer
Lots of data is already there
Databases
Reports
Spreadsheets
Maps
2. Publish data in a machine-processable format
Why?
Allow others to do their own processing, analysis and visualisation of your data
New services, new ideas
Examples
CSO Quarterly National Household Survey
  http://cso.ie/qnhs/calendar_quarters_qnhs.htm
EPA enforcement files and ScraperWiki
  http://www.epa.ie/whatwedo/enforce/lic/info/
  https://views.scraperwiki.com/run/irish-epa-visuals/
Galway and Fingal planning applications
  http://lab.linkeddata.deri.ie/2010/planning-apps/
  Getting the data: 210 lines of code vs. 30 lines of code
Symptom: screenscraping
People use tools like ScraperWiki to get at data that isn't machine-readable
  https://scraperwiki.com/tags/ireland
Scraping is not the right way of doing this
  Expensive
  Brittle
  Strain on computing resources
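To illustrate why scraping is the more expensive and brittle route, the sketch below extracts the same two rows once from an HTML table and once from a CSV file. The sample data, column names and function names are invented for this illustration and are not taken from any of the sites mentioned on the slides.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample data: the same table as a web page and as a CSV file.
HTML_PAGE = """
<table>
  <tr><th>County</th><th>Applications</th></tr>
  <tr><td>Galway</td><td>120</td></tr>
  <tr><td>Fingal</td><td>95</td></tr>
</table>
"""
CSV_FILE = "County,Applications\nGalway,120\nFingal,95\n"


class TableScraper(HTMLParser):
    """Collects cell text row by row. Depends entirely on the page layout."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def scrape(html):
    """Scraping route: parse the markup and rebuild the records by hand."""
    parser = TableScraper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def read_csv(text):
    """Machine-readable route: one call to the standard library."""
    return list(csv.DictReader(io.StringIO(text)))


scraped = scrape(HTML_PAGE)
parsed = read_csv(CSV_FILE)
```

Both routes yield the same records here, but the scraper stops working as soon as the publisher changes the page layout, while the CSV reader only depends on the column names.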
Formats
Good: MS Excel, CSV, XML, JSON, Microdata
Not so good: Pure websites, MS Word
Bad: PDF
Really bad: Only charts/maps without numbers
Good practices
Publish in multiple formats, at least one machine-readable
  Publish Excel files alongside large PDF reports
  Publish CSV alongside database-backed web applications
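Publishing CSV alongside a database-backed web application can be a small amount of work. The following is a minimal sketch, assuming the data sits in a SQL database; it uses an in-memory SQLite database, and the table name, column names, roll numbers and the helper `export_table_as_csv` are all hypothetical.

```python
import csv
import io
import sqlite3


def export_table_as_csv(conn, table):
    """Return the full contents of `table` as CSV text, header row first."""
    # The table name is interpolated directly into the query, so it must
    # come from trusted code, never from user input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # column names
    writer.writerows(cursor)  # one CSV line per database row
    return out.getvalue()


# Hypothetical example data standing in for the application's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE schools (roll_number TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO schools VALUES (?, ?)",
    [
        ("00001A", "Example National School"),
        ("00002B", "Sample, Community College"),
    ],
)

csv_text = export_table_as_csv(conn, "schools")
```

The `csv` module takes care of quoting, so values that contain commas still come out as valid CSV.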
3. Use an open standard format
Why?
Not all formats are created equal
Some formats bring many tools and applications that people can already use
Quick tour of formats
CSV – Comma-Separated Values
  More open (and simpler) alternative to Excel format
  Can be opened in and exported from Excel, Google Spreadsheets, Google Refine, …
KML – Keyhole Markup Language
  Simple format for presenting geographic data
  Can be opened in Google Maps
RSS – Really Simple Syndication
  Notifications of updates of any kind
  Can be opened in RSS readers and many email clients
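As an illustration of how simple KML is, the sketch below builds a one-point KML document with Python's standard library. The function name `placemark_kml` is hypothetical and the coordinates are only an approximate location for Galway, chosen for this example.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"  # KML 2.2 namespace


def placemark_kml(name, lon, lat):
    """Return a KML document containing a single named point."""
    # Serialise the KML namespace as the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    placemark = ET.SubElement(kml, f"{{{KML_NS}}}Placemark")
    ET.SubElement(placemark, f"{{{KML_NS}}}name").text = name
    point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
    # KML expects longitude first, then latitude.
    ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = f"{lon},{lat}"
    return ET.tostring(kml, encoding="unicode")


# Approximate, illustrative coordinates.
kml_text = placemark_kml("Galway", -9.05, 53.27)
```

Saved with a `.kml` extension, a file like this should open in mapping tools that support the format.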
Developer-oriented formats
XML – Extensible Markup Language
  W3C (World Wide Web Consortium) standard, 1998
  established, reliable, ubiquitous
JSON – JavaScript Object Notation
  IETF (Internet Engineering Task Force) standard, 2006
  great for web APIs
  very simple; very fashionable right now
RDF – Resource Description Framework
  W3C standard, 2004
  great for data integration
  steeper learning curve
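To give a feel for the first two formats, here is a minimal sketch that serialises one record as JSON and as XML using only Python's standard library. The record, its field names and the element name `record` are made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the figures are invented for illustration.
record = {"county": "Galway", "year": 2011, "applications": 120}


def to_json(rec):
    """Serialise the record as a JSON object with keys in sorted order."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec, root_tag="record"):
    """Serialise the record as XML, one child element per field."""
    root = ET.Element(root_tag)
    for key, value in rec.items():
        child = ET.SubElement(root, key)
        child.text = str(value)  # XML text is always a string
    return ET.tostring(root, encoding="unicode")


json_text = to_json(record)
xml_text = to_xml(record)
```

JSON keeps the distinction between numbers and strings, while in plain XML every value is text unless a schema says otherwise. That is one reason JSON tends to be the simpler choice for web APIs.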
Also: standard classifications
Within your data, use the same categories as everybody else
CSO
  http://www.cso.ie/surveysandmethodologies/classifications_stan.htm
StatCentral list of classifications
  http://www.statcentral.ie/classifications.asp
Also: standard identifiers
Example: School roll numbers
  Department of Education publishes an Excel file with all school roll numbers
  Can be used to Google the same school on other websites, school evaluation reports etc.
Example: Ordnance Survey UK geo identifiers
  Uses URIs (web addresses) as identifiers
  http://data.ordnancesurvey.co.uk/doc/7000000000037256
  Great for use in RDF
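A rough sketch of what using URIs as identifiers in RDF can look like: the code below writes two statements in the N-Triples syntax, one of which links a resource to the Ordnance Survey URI quoted above. The `example.org` subject and the `locatedIn` property are hypothetical placeholders, and the helper functions are not part of any RDF library.

```python
def iri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def triple(subject, predicate, obj):
    """Format one subject-predicate-object statement as an N-Triples line."""
    return f"{subject} {predicate} {obj} ."


SCHOOL = "http://example.org/id/school/00001A"  # hypothetical identifier
LOCATED_IN = "http://example.org/def/locatedIn"  # hypothetical property
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"  # standard RDFS property
AREA = "http://data.ordnancesurvey.co.uk/doc/7000000000037256"  # URI from the slide

lines = [
    triple(iri(SCHOOL), iri(LABEL), literal('Example "National" School')),
    triple(iri(SCHOOL), iri(LOCATED_IN), iri(AREA)),
]
ntriples = "\n".join(lines) + "\n"
```

Because the second statement points at somebody else's identifier, anyone who also uses that URI can combine their data with yours without first agreeing on names.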
Linked Open Data Cloud
Summary
Prefer open, widely used standards
But: also prefer what you know best
Support multiple formats for different audiences where it makes sense
Great: CSV, KML, RSS, XML, JSON
4. Publish under an open license
Why?
Regulates what others can and cannot do with the data
For re-users, uncertainty about rights is a major concern
A good way to ensure that your organisation gets acknowledged
You need some non-discriminatory policy for giving rights to the data anyway (PSI directive)
Complex topic
Destroying a potential income stream?
Content licenses vs database licenses
Mixing and compatibility of licenses
  Wikipedia, OpenStreetMap
Irish PSI License
Created in response to PSI Directive
Available at http://psi.gov.ie/
Problems:
  Documents may not be used “for the principal purpose of advertising or promoting a particular product or service”
  Can't be combined with Wikipedia or OpenStreetMap
  Not an open license according to Open Definition http://opendefinition.org/
Open database licenses
http://opendefinition.org/licenses/
License features
You're allowed to do pretty much anything, provided you…
Attribution (“By”) – give credit
ShareAlike (“SA”) – adapted data must be published in the same way
Does Open Data have to be free?
Many would say yes
A matter of terminology and definitions
Either way there is nothing wrong with charging for certain data
Data protection
Personal information is not open data
Freedom of Information legislation
  http://foi.gov.ie/
Summary
Stating an explicit license is important
Irish PSI License: It's readily available, but not “open enough” for some applications
Open Data Commons licenses with various constraints
5. List your data in a data catalog
Why?
So that people know it exists
This is how the world learns about available data
This is how you learn what they do and need
Some key information about a dataset
What data is being published?
What's the license?
When was the data collected?
When will it be updated, if at all?
How was/is this data collected?
What was/is the data used for?
Contact person?
Where to give feedback?
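The questions above can be captured as a small, machine-readable description that accompanies each dataset. The sketch below is one possible shape, not an established catalog schema: the class `DatasetDescription`, its field names and every value in the example entry are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetDescription:
    """One catalog entry; each field answers one of the key questions."""

    title: str              # What data is being published?
    license: str            # What's the license?
    collected: str          # When was the data collected?
    update_schedule: str    # When will it be updated, if at all?
    collection_method: str  # How was/is this data collected?
    original_purpose: str   # What was/is the data used for?
    contact: str            # Contact person?
    feedback_url: str       # Where to give feedback?

    def missing_fields(self):
        """Names of the fields that were left blank."""
        return [name for name, value in asdict(self).items() if not value.strip()]


# Hypothetical entry with the license deliberately left blank.
entry = DatasetDescription(
    title="Planning applications 2010",
    license="",
    collected="2010",
    update_schedule="No updates planned",
    collection_method="Exported from the planning system",
    original_purpose="Processing of planning applications",
    contact="Open data team",
    feedback_url="http://example.org/feedback",
)

missing = entry.missing_fields()
entry_json = json.dumps(asdict(entry), indent=2)
```

A check like `missing_fields` can be run before a dataset is listed, so that an entry without a stated license is caught early.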
How to do this in practice?
Have a simple page on your website
Use an open community data catalog
Set up your own catalog
Use a national Irish data catalog???
Open community catalogs
The Data Hub http://thedatahub.org
Irish CKAN http://ie.ckan.net
Set up your own catalog
Requires a budget
Roll your own software?
  data.fingal.ie
Use open source, e.g., CKAN?
  data.gov.uk
  Berlin Open Data
  …
National Irish data catalog?
CSO's StatCentral?
Marine Institute's ISDE?
Who publishes the catalog in other countries?
  UK: Cabinet Office
  US: White House
  Australia: Dept of Finance and Deregulation
  New Zealand: Dept of Internal Affairs
Summary
Data catalogs make it easy to find data
Basic metadata, how to give feedback etc.
Important: How often are datasets accessed?
“Request a dataset” feature
Also: Open Data Ireland Google Group
  http://groups.google.com/group/open-data-ireland
Five-shamrock scheme
1. Publish data on the web
2. Publish data in a machine-processable format
3. Use an open standard format
4. Publish under an open license
5. List your data in a data catalog