How to Publish Open Data

Copyright 2010 Digital Enterprise Research Institute. All rights reserved. Digital Enterprise Research Institute, www.deri.ie. How to Publish Open Data. Richard Cyganiak. Opening Up Government Data – Galway, 8 Nov 2011. [email protected] http://www.StefanDecker.org/

description

A practical guide to publishing open data, presented at the Galway event of Irish Open Data Week 2011. Introducing the “five-shamrock scheme”!

Transcript of How to Publish Open Data

Page 1: How to Publish Open Data

Copyright 2010 Digital Enterprise Research Institute. All rights reserved.

Digital Enterprise Research Institute www.deri.ie

How to publish Open Data

Richard Cyganiak

Opening Up Government Data – Galway, 8 Nov 2011

[email protected] http://www.StefanDecker.org/

Page 2: How to Publish Open Data

TimBL’s 5-star plan for open data

★ Make your stuff available on the Web

★★ Make it available as structured data (e.g., an Excel sheet instead of image scan of a table)

★★★ Use a non-proprietary format (e.g., a CSV file instead of an Excel sheet)

★★★★ Use linked data format (i.e., URIs to identify things, and RDF to represent data)

★★★★★ Link your data to other people’s data to provide context

Source: http://inkdroid.org/journal/2010/06/04/the-5-stars-of-open-linked-data/

Page 8: How to Publish Open Data

Five-shamrock scheme

1. Publish data on the web

2. Publish data in a machine-processable format

3. Use an open standard format

4. Publish under an open license

5. List your data in a data catalog

Page 9: How to Publish Open Data

1. Publish data on the web

Page 10: How to Publish Open Data

Why?

The web is where people look for it first

Google can index it

Fewer phone calls and emails (and FoI requests) to answer

Page 11: How to Publish Open Data

Lots of data is already there

Databases, reports, spreadsheets, maps

Page 12: How to Publish Open Data

2. Publish data in a machine-processable format

Page 13: How to Publish Open Data

Why?

Allow others to do their own processing, analysis and visualisation of your data

New services, new ideas

Page 14: How to Publish Open Data

Examples

CSO Quarterly National Household Survey: http://cso.ie/qnhs/calendar_quarters_qnhs.htm

EPA enforcement files and ScraperWiki: http://www.epa.ie/whatwedo/enforce/lic/info/ and https://views.scraperwiki.com/run/irish-epa-visuals/

Galway and Fingal planning applications: http://lab.linkeddata.deri.ie/2010/planning-apps/

Getting the data: 210 lines of code vs. 30 lines of code

Page 15: How to Publish Open Data

Symptom: screenscraping

People use tools like ScraperWiki to get at data that isn't machine-readable: https://scraperwiki.com/tags/ireland

Scraping is not the right way of doing this: it is expensive, brittle, and a strain on computing resources
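
To make the brittleness point concrete, here is a minimal sketch that reads the same small table twice: once by scraping HTML, once from a CSV file. The sample data, the `TableScraper` class and the function names are hypothetical and invented for illustration; they are not taken from any of the sites mentioned in the slides.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical sample data, invented for illustration only.
HTML_PAGE = """
<table>
  <tr><th>Licence</th><th>Status</th></tr>
  <tr><td>A-001</td><td>Open</td></tr>
  <tr><td>A-002</td><td>Closed</td></tr>
</table>
"""
CSV_FILE = "Licence,Status\nA-001,Open\nA-002,Closed\n"


class TableScraper(HTMLParser):
    """Collects cell text row by row; it silently breaks if the page layout changes."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th"):
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a table cell is kept; whitespace between tags is ignored.
        if self._in_cell and self._row is not None:
            self._row.append(data.strip())


def scrape_table(html):
    """Scraping route: parse the markup, then guess that the first row is the header."""
    parser = TableScraper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


def read_csv(text):
    """Machine-readable route: the standard library does all the work."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


scraped = scrape_table(HTML_PAGE)
parsed = read_csv(CSV_FILE)
print(scraped == parsed)
```

Both routes yield the same records here, but the scraper depends on the exact markup of the page, while the CSV reader depends only on the published file.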

Page 16: How to Publish Open Data

Formats

Good: MS Excel, CSV, XML, JSON, Microdata

Not so good: Pure websites, MS Word

Bad: PDF

Really bad: Only charts/maps without numbers

Page 17: How to Publish Open Data

Good practices

Publish in multiple formats, at least one machine-readable

Publish Excel files alongside large PDF reports

Publish CSV alongside database-backed web applications
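
As one possible way to publish CSV alongside a database-backed application, the sketch below dumps a table to CSV text using only the Python standard library. The in-memory SQLite database, the `applications` table, its columns and its rows are all hypothetical stand-ins for whatever store the real application uses.

```python
import csv
import io
import sqlite3


def export_table_as_csv(conn, table):
    """Return the full contents of one table as CSV text, header row first."""
    # The table name is interpolated into the query, so pass only trusted, fixed names.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()


# Hypothetical in-memory database standing in for the application's data store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applications (ref TEXT, authority TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO applications VALUES (?, ?, ?)",
    [("11/001", "Galway", 2011), ("F11A/0002", "Fingal", 2011)],
)

csv_text = export_table_as_csv(conn, "applications")
print(csv_text)
```

The resulting text could be saved as a file next to the web application and linked from the same page, so that re-users never need to scrape the HTML view.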

Page 18: How to Publish Open Data

3. Use an open standard format

Page 19: How to Publish Open Data

Why?

Not all formats are created equal

Some formats bring many tools and applications that people can already use

Page 20: How to Publish Open Data

Quick tour of formats

CSV – Comma-Separated Values: a more open (and simpler) alternative to the Excel format; can be opened in and exported from Excel, Google Spreadsheets, Google Refine, …

KML – Keyhole Markup Language: a simple format for presenting geographic data; can be opened in Google Maps

RSS – Really Simple Syndication: notifications of updates of any kind; can be opened in RSS readers and many email clients

Page 21: How to Publish Open Data

Developer-oriented formats

XML – Extensible Markup Language: W3C (World Wide Web Consortium) standard, 1998; established, reliable, ubiquitous

JSON – JavaScript Object Notation: IETF (Internet Engineering Task Force) specification (RFC 4627), 2006; great for web APIs; very simple; very fashionable right now

RDF – Resource Description Framework: W3C standard, 2004; great for data integration; steeper learning curve
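
For a rough feel of the first two formats, the sketch below serialises one record as both JSON and XML with the Python standard library. The record, its field names and the `school` root element are hypothetical; a real dataset would follow whatever schema its publisher defines.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, invented for illustration.
record = {
    "roll_number": "12345A",
    "name": "Example National School",
    "county": "Galway",
}


def to_json(rec):
    """Serialise the record as a JSON object with keys in a stable order."""
    return json.dumps(rec, sort_keys=True)


def to_xml(rec, root_tag="school"):
    """Serialise the record as one XML element with a child element per field."""
    root = ET.Element(root_tag)
    for key, value in sorted(rec.items()):
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")


json_text = to_json(record)
xml_text = to_xml(record)
print(json_text)
print(xml_text)
```

The JSON version maps directly onto the data structures of most programming languages, which is part of why it suits web APIs; the XML version is more verbose but works with a long-established tool chain.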

Page 22: How to Publish Open Data

Also: standard classifications

Within your data, use the same categories as everybody else

CSO: http://www.cso.ie/surveysandmethodologies/classifications_stan.htm

StatCentral list of classifications: http://www.statcentral.ie/classifications.asp

Page 23: How to Publish Open Data

Also: standard identifiers

Example: School roll numbers. The Department of Education publishes an Excel file with all school roll numbers; these can be used to Google the same school on other websites, school evaluation reports etc.

Example: Ordnance Survey UK geo identifiers. These use URIs (web addresses) as identifiers, which is great for use in RDF: http://data.ordnancesurvey.co.uk/doc/7000000000037256
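
To illustrate why URI identifiers fit RDF well, the sketch below writes a single RDF statement in N-Triples syntax by hand. The subject URI is the one shown on the slide and `rdfs:label` is a standard RDF Schema property; the label text is a placeholder rather than real data, and the helper names are hypothetical. A real project would more likely use an RDF library than string formatting.

```python
# URI as given on the slide; the label text below is a placeholder, not real data.
PLACE_URI = "http://data.ordnancesurvey.co.uk/doc/7000000000037256"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in string literals."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def triple(subject_uri, predicate_uri, literal):
    """Format one N-Triples statement whose object is a plain string literal."""
    return f'<{subject_uri}> <{predicate_uri}> "{escape_literal(literal)}" .'


line = triple(PLACE_URI, RDFS_LABEL, 'Placeholder "label"')
print(line)
```

Because the subject is a web address, anyone else's data can refer to the same place simply by reusing that URI.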

Page 24: How to Publish Open Data

Linked Open Data Cloud

Page 25: How to Publish Open Data

Summary

Prefer open, widely used standards

But: also prefer what you know best

Support multiple formats for different audiences where it makes sense

Great: CSV, KML, RSS, XML, JSON

Page 26: How to Publish Open Data

4. Publish under an open license

Page 27: How to Publish Open Data

Why?

Regulates what others can and cannot do with the data

For re-users, uncertainty about rights is a major concern

A good way to ensure that your organisation gets acknowledged

You need some non-discriminatory policy for giving rights to the data anyway (PSI directive)

Page 28: How to Publish Open Data

Complex topic

Destroying a potential income stream?

Content licenses vs database licenses

Mixing and compatibility of licenses: Wikipedia, OpenStreetMap

Page 29: How to Publish Open Data

Irish PSI License

Created in response to PSI Directive

Available at http://psi.gov.ie/

Problems: Documents may not be used “for the principal purpose of advertising or promoting a particular product or service”; can't be combined with Wikipedia or OpenStreetMap

Not an open license according to Open Definition: http://opendefinition.org/

Page 30: How to Publish Open Data

Open database licenses

http://opendefinition.org/licenses/

Page 31: How to Publish Open Data

License features

You're allowed to do pretty much anything, provided you…

Attribution (“By”) – give credit

ShareAlike (“SA”) – adapted data must be published in the same way

Page 32: How to Publish Open Data

Does Open Data have to be free?

Many would say yes

A matter of terminology and definitions

Either way there is nothing wrong with charging for certain data

Page 33: How to Publish Open Data

Data protection

Personal information is not open data

Freedom of Information legislation: http://foi.gov.ie/

Page 34: How to Publish Open Data

Summary

Stating an explicit license is important

Irish PSI License: It's readily available, but not “open enough” for some applications

Open Data Commons licenses with various constraints

Page 35: How to Publish Open Data

5. List your data in a data catalog

Page 36: How to Publish Open Data

Why?

So that people know it exists

This is how the world learns about available data

This is how you learn what they do and need

Page 37: How to Publish Open Data

Some key information about a dataset

What data is being published?

What's the license?

When was the data collected?

When will it be updated, if at all?

How was/is this data collected?

What was/is the data used for?

Contact person?

Where to give feedback?
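
The questions above can be captured as a small machine-readable record. The sketch below is one possible shape, with one field per question; the `DatasetRecord` class, its field names and all example values are hypothetical and do not follow any particular catalog standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetRecord:
    """Hypothetical catalog entry; field names are illustrative, not a standard."""

    title: str
    description: str
    license: str
    collected: str
    update_schedule: str
    collection_method: str
    original_purpose: str
    contact: str
    feedback_url: str


def missing_fields(record):
    """Return the names of fields left empty, so gaps are visible before publishing."""
    return [name for name, value in asdict(record).items() if not value.strip()]


# Placeholder values only; none of this describes a real dataset.
record = DatasetRecord(
    title="Example planning applications",
    description="Hypothetical dataset used only to illustrate the fields",
    license="Placeholder: state the actual license here",
    collected="2011",
    update_schedule="",
    collection_method="Exported from an internal database",
    original_purpose="Processing planning applications",
    contact="Placeholder contact name",
    feedback_url="http://example.org/feedback",
)

catalog_entry = json.dumps(asdict(record), indent=2)
print(missing_fields(record))
print(catalog_entry)
```

Checking for empty fields before publishing is a cheap way to make sure every dataset answers all of the questions, whether the entry ends up on a simple web page or in a catalog.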

Page 38: How to Publish Open Data

How to do this in practice?

Have a simple page on your website

Use an open community data catalog

Set up your own catalog

Use a national Irish data catalog???

Page 39: How to Publish Open Data

Open community catalogs

The Data Hub http://thedatahub.org

Irish CKAN http://ie.ckan.net

Page 40: How to Publish Open Data

Set up your own catalog

Requires a budget

Roll your own software? Example: data.fingal.ie

Use open source, e.g., CKAN? Examples: data.gov.uk, Berlin Open Data, …

Page 41: How to Publish Open Data

National Irish data catalog?

CSO's StatCentral?

Marine Institute's ISDE?

Who publishes the catalog in other countries? UK: Cabinet Office; US: White House; Australia: Dept of Finance and Deregulation; New Zealand: Dept of Internal Affairs

Page 42: How to Publish Open Data

Summary

Data catalogs make it easy to find data

Basic metadata, how to give feedback etc

Important: How often are datasets accessed? “Request a dataset” feature

Also: Open Data Ireland Google Group: http://groups.google.com/group/open-data-ireland

Page 43: How to Publish Open Data

Five-shamrock scheme

1. Publish data on the web

2. Publish data in a machine-processable format

3. Use an open standard format

4. Publish under an open license

5. List your data in a data catalog