Linking Open Government Data at Scale

Extend Your Reach. Linking Open Government Data at Scale. YOW! 2016 Conference: Melbourne December 1-2, Brisbane December 5-6, Sydney December 8-9. Bernadette Hyland, CEO & co-founder, 3 Round Stones, Inc. @BernHyland

Transcript of Linking Open Government Data at Scale

Page 1: Linking Open Government Data at Scale

Extend Your Reach.

Linking Open Government Data at Scale

YOW! 2016 Conference

Melbourne December 1-2 ~ Brisbane December 5-6 ~ Sydney December 8-9

Bernadette Hyland, CEO & co-founder, 3 Round Stones, Inc. @BernHyland

Page 2: Linking Open Government Data at Scale

@BernHyland

Page 3: Linking Open Government Data at Scale


Page 4: Linking Open Government Data at Scale

Some data is expensive to collect …

Page 5: Linking Open Government Data at Scale


Page 6: Linking Open Government Data at Scale

Data on the Web today


Page 7: Linking Open Government Data at Scale

Lack of Context

credit: http://mhausenblas.info

Page 8: Linking Open Government Data at Scale

Required Context

credit: http://mhausenblas.info

Page 9: Linking Open Government Data at Scale

Linked data is intentionally for reuse

Page 10: Linking Open Government Data at Scale

Linked Data

Refers to a set of best practices for publishing and interlinking data for access by both humans and machines.

It builds on the RDF family of syntaxes (e.g., JSON-LD, N3, Turtle) and HTTP URIs.

Page 11: Linking Open Government Data at Scale

Linked Data can be published by a person or organization behind the firewall or on the public Web.

Linked Data published on the public Web is generally called Linked Open Data.

- W3C Linked Data Glossary

Page 12: Linking Open Government Data at Scale

Something --[a relationship]--> Something else

Page 13: Linking Open Government Data at Scale

UQ --[is a]--> University

Page 14: Linking Open Government Data at Scale

UQ --[label]--> The University of Queensland
UQ --[is a]--> University
UQ --[affiliation]--> Group of 8

Page 15: Linking Open Government Data at Scale

UQ --[label]--> The University of Queensland
UQ --[affiliation]--> Group of 8
UQ --[number of undergraduate students]--> 34228
UQ --[number of students]--> 48771
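The graph built up on these slides can be written down as subject-predicate-object triples. The following is a minimal sketch in plain Python, using only the values shown on the slides; the plain strings stand in for the URIs a real dataset would use, and the `match` helper is a hypothetical illustration, not part of any RDF library.

```python
# Each statement from the slides as a (subject, predicate, object) triple.
# Plain strings stand in for the HTTP URIs a real dataset would use.
TRIPLES = [
    ("UQ", "label", "The University of Queensland"),
    ("UQ", "is a", "University"),
    ("UQ", "affiliation", "Group of 8"),
    ("UQ", "number of undergraduate students", 34228),
    ("UQ", "number of students", 48771),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# Ask a question of the graph: how many students does UQ have?
students = match(TRIPLES, subject="UQ", predicate="number of students")
print(students[0][2])
```

Adding a new fact about UQ means appending one more triple; no schema change is needed, which is the property the later modeling steps rely on.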

Page 16: Linking Open Government Data at Scale

credit: http://json-ld.org/

Page 17: Linking Open Government Data at Scale

credit: https://callimachusproject.org

Page 18: Linking Open Government Data at Scale

# G8 universities ordered by the number of students
# at each university.
PREFIX dbo: <http://dbpedia.org/ontology/>
# rdfs: is declared explicitly so the query also runs on endpoints
# that do not predefine it.
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?students ?undergrads
where {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL {?s dbo:numberOfStudents ?students}
  OPTIONAL {?s dbo:numberOfUndergraduateStudents ?undergrads}
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC (?students)
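A query like this one can be sent to an endpoint over plain HTTP. The sketch below, in Python, only builds the request and shows how a SPARQL JSON result could be flattened; it assumes DBpedia's public endpoint at https://dbpedia.org/sparql, and the helper names `build_request` and `rows` are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: DBpedia's public endpoint; any SPARQL endpoint would do.
ENDPOINT = "https://dbpedia.org/sparql"

QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
select ?name ?students ?undergrads
where {
  ?s dbo:affiliation <http://dbpedia.org/resource/Group_of_Eight_(Australian_universities)> .
  ?s rdfs:label ?name .
  OPTIONAL {?s dbo:numberOfStudents ?students}
  OPTIONAL {?s dbo:numberOfUndergraduateStudents ?undergrads}
  FILTER ( lang(?name) = "en" )
}
ORDER BY DESC (?students)
"""


def build_request(endpoint, query):
    """Build (but do not send) a GET request carrying the query."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def rows(results):
    """Flatten a SPARQL JSON results document into plain dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]


request = build_request(ENDPOINT, QUERY)
print(request.full_url[:60])
```

Passing `request` to `urllib.request.urlopen` and the response to `json.load` would fetch live results. Variables left unbound by an OPTIONAL clause are simply absent from that row's dict.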


Page 26: Linking Open Government Data at Scale


Page 27: Linking Open Government Data at Scale

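Linked Data on the Web

[Diagram: "my data" as a graph]
measurement --[a]--> a measurement
measurement --[date]--> 2011-01-01
measurement --[value]--> 0
measurement --[units of measure]--> degrees Centigrade
measurement --[collected at]--> Galway Airport
measurement --[collected by]--> collector
collector --[a]--> Person
collector --[first name]--> Michael
collector --[last name]--> Hausenblas
...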
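The measurement in this diagram could be serialized as JSON-LD, one of the RDF syntaxes mentioned earlier. The sketch below is illustrative only: every `http://example.org/` identifier and the `ex:` vocabulary are hypothetical, and only the FOAF terms (`Person`, `firstName`, `lastName`) come from a published vocabulary.

```python
import json

# Hypothetical vocabulary and identifiers under example.org; only the FOAF
# terms (Person, firstName, lastName) come from a real, published vocabulary.
measurement = {
    "@context": {
        "ex": "http://example.org/vocab#",
        "foaf": "http://xmlns.com/foaf/0.1/",
        "xsd": "http://www.w3.org/2001/XMLSchema#",
        "date": {"@id": "ex:date", "@type": "xsd:date"},
        "value": "ex:value",
        "unitsOfMeasure": "ex:unitsOfMeasure",
        "collectedAt": {"@id": "ex:collectedAt", "@type": "@id"},
        "collectedBy": "ex:collectedBy",
    },
    "@id": "http://example.org/measurement/1",
    "@type": "ex:Measurement",
    "date": "2011-01-01",
    "value": 0,
    "unitsOfMeasure": "degrees Centigrade",
    # A link to another resource rather than a string label.
    "collectedAt": "http://example.org/place/galway-airport",
    "collectedBy": {
        "@id": "http://example.org/person/michael-hausenblas",
        "@type": "foaf:Person",
        "foaf:firstName": "Michael",
        "foaf:lastName": "Hausenblas",
    },
}

document = json.dumps(measurement, indent=2)
print(document)
```

The `@context` is what supplies the missing context: it maps each short key to a URI, so two publishers using the key `value` for different things can still be told apart.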

Page 28: Linking Open Government Data at Scale

“Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web took off as a web of hyperlinked documents which were exciting to read, but which could not be effectively used as data.”

- Tim Berners-Lee

Page 29: Linking Open Government Data at Scale


The Semantic Web morphed when it hit the marketplace

Page 30: Linking Open Government Data at Scale
Page 31: Linking Open Government Data at Scale

Governments & NGOs publishing & consuming Linked Data

Page 32: Linking Open Government Data at Scale

07 Nov 2007

Page 33: Linking Open Government Data at Scale

10 Nov 2007

Page 34: Linking Open Government Data at Scale

28 Feb 2008

Page 35: Linking Open Government Data at Scale

31 Mar 2008

Page 36: Linking Open Government Data at Scale

18 Sep 2008

Page 37: Linking Open Government Data at Scale

05 Mar 2009

Page 38: Linking Open Government Data at Scale

27 Mar 2009

Page 39: Linking Open Government Data at Scale

14 Jul 2009

Page 40: Linking Open Government Data at Scale

22 Sep 2009

Page 41: Linking Open Government Data at Scale

22 Sep 2010

Page 42: Linking Open Government Data at Scale
Page 43: Linking Open Government Data at Scale
Page 44: Linking Open Government Data at Scale

• Widens EPA’s audience (justifies relevance), for research, environmental justice

• More cost-effective than relational backed web portals

• Used for scientific R&D, green chemistry, and more

• Increased transparency

https://opendata.epa.gov

Page 45: Linking Open Government Data at Scale

7 Steps to Publish Linked Data

Source: W3C Best Practices for Publishing Linked Data, see https://www.w3.org/TR/ld-bp/

Page 46: Linking Open Government Data at Scale

Step #1 - Identify

Identify the dataset(s) to be modeled

• Request a copy of the logical and physical model of the database(s)

• Obtain data extracts (i.e., databases and/or spreadsheets) or create data in a way that can be replicated.

Page 47: Linking Open Government Data at Scale

Step #2 - Model Data

Model data without context to allow for reuse and easier merging of data sets

• Traditional DBAs organize data for specified Web services or applications

• In Linked Data, application logic does not drive the data schema, concepts, etc.

Page 48: Linking Open Government Data at Scale

Step #2 - Modeling (cont)

Look for real world objects of interest (e.g., people, places, things, locations, etc.) and model them.

• Investigate how others are already modeling similar or related data.

• Look for duplication & normalize the data

• Use common sense to decide whether or not to make a link

Page 49: Linking Open Government Data at Scale

Step #2 - Modeling (cont)

• Connect data from different sources & authoritative vocabularies

• Use URIs as names for your objects

• Put aside immediate needs of any application

• Don’t think about how an application will use your data

• Do think about time and how the data will change over time.

Page 50: Linking Open Government Data at Scale

Steps #3 & 4 - Name & Describe

Identifiers are at the heart of how things become useful as linked data.

We use the same mechanism for connecting data as the Web: the humble HTTP URI.

The Web is formed by HTTP URIs that are essentially connections linking pieces of information together.
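One way to name things with HTTP URIs is to derive a stable, readable identifier from each object's type and name. The sketch below is an assumption for illustration, not a method from the talk: the base `http://example.org/id/` and the `mint_uri` helper are hypothetical.

```python
import re

BASE = "http://example.org/id/"  # hypothetical base for minted URIs


def mint_uri(kind, name):
    """Derive an HTTP URI from an object's kind and human-readable name."""
    # Lowercase, then collapse every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return f"{BASE}{kind}/{slug}"


print(mint_uri("university", "The University of Queensland"))
```

Deriving the URI from a name keeps it readable, but it also ties the identifier to a label that may change; minting from a stable key in the source database is a common alternative.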

Page 51: Linking Open Government Data at Scale

Steps #5, 6 & 7 - Convert, Publish & Maintain

5. Write a script or process to convert the data set repeatedly

6. Publish to the Web and announce it!

7. Maintenance strategy
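Step 5 asks for a conversion that can be rerun whenever the source data changes. A minimal sketch of such a script follows, turning a CSV extract into N-Triples lines. The column names (`id`, `name`, `city`), the URI base, and the `ex:` vocabulary are all hypothetical; only `rdfs:label` is a real term.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/facility/"   # hypothetical URI base
VOCAB = "http://example.org/vocab#"     # hypothetical vocabulary
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def literal(text):
    """Escape a string as an N-Triples plain literal."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return '"' + escaped + '"'


def convert(csv_text):
    """Convert CSV rows (columns: id, name, city) to N-Triples lines."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + quote(row["id"], safe="") + ">"
        lines.append(f"{subject} <{RDFS_LABEL}> {literal(row['name'])} .")
        lines.append(f"{subject} <{VOCAB}city> {literal(row['city'])} .")
    # Sorted output makes reruns comparable with a plain text diff.
    return sorted(lines)


SAMPLE = "id,name,city\n42,Example Plant,Brisbane\n"
for line in convert(SAMPLE):
    print(line)
```

Because the output depends only on the input extract, the same script can be scheduled against each new extract as part of the maintenance strategy in step 7.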

Page 52: Linking Open Government Data at Scale

Take an iterative approach

1. Review of modeling decisions

2. Review vocabularies chosen and developed

3. Modify/update data conversion scripts

4. Do a maintenance walk-through with real use cases

5. Show how to explore data with SPARQL and visualizations

6. Discuss a persistent identifier strategy (think PURLs)
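The idea behind a persistent identifier strategy can be sketched as a lookup table that redirects a stable URL to wherever the data currently lives. The following is a simplified model, not the behavior of any particular PURL service: the URLs are hypothetical, and the use of a 302 redirect is an assumption chosen for illustration.

```python
# Hypothetical PURL table: persistent URL -> current location of the resource.
PURLS = {
    "http://purl.example.org/dataset/air-quality":
        "http://data.example.org/v2/air-quality",
}


def resolve(purl, table):
    """Return (status, location) roughly as a redirecting server would."""
    target = table.get(purl)
    if target is None:
        return 404, None
    # Redirect: clients keep citing the PURL, never the target.
    return 302, target


# When the data moves, only the table changes; published links stay valid.
PURLS["http://purl.example.org/dataset/air-quality"] = (
    "http://data.example.org/v3/air-quality"
)
print(resolve("http://purl.example.org/dataset/air-quality", PURLS))
```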

Page 53: Linking Open Government Data at Scale


Page 54: Linking Open Government Data at Scale


Page 55: Linking Open Government Data at Scale

Technical DNA of EPA Linked Data Services

• Built on Open Source Software

• Provides downloadable Linked Open Data (RDF, JSON-LD)

• Developer guide includes RESTful API, persistent URLs strategy

• Sample apps on GitHub (https://github.com/USEPA)

Page 56: Linking Open Government Data at Scale

Power of LOD

Combining data sets in a day with Linked Open Data from DBpedia & EPA. Next the EPA wanted more chemical data linked to their data…
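Why combining data sets can be quick is easier to see in code: when two sources use the same URI for the same thing, merging is a set union and the link can be followed immediately. The sketch below uses invented triples purely for illustration; none of the URIs or values come from the EPA or DBpedia data.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
CHEM = "http://example.org/chemical/benzene"  # hypothetical shared URI

# Two hypothetical extracts; the values are invented for illustration only.
source_a = {
    ("http://example.org/facility/42", "http://example.org/vocab#releases", CHEM),
}
source_b = {
    (CHEM, RDFS_LABEL, "Benzene"),
}

# Merging two graphs is a set union; no join keys or schema mapping needed.
merged = source_a | source_b


def describe(graph, subject):
    """Collect the predicate -> object pairs known about one subject."""
    return {p: o for (s, p, o) in graph if s == subject}


# Follow the link from the facility to the chemical across the two sources.
for s, p, o in sorted(source_a):
    print(s, "releases", describe(merged, o).get(RDFS_LABEL))
```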

Page 57: Linking Open Government Data at Scale

Specialist knowledge as Linked Open Data


Page 58: Linking Open Government Data at Scale

PubChem, the world’s largest open molecular database

Used by healthcare / life sciences industry worldwide - all Linked Open Data

Page 59: Linking Open Government Data at Scale

Shared vocabularies, including SKOS, RDFS, and OWL, are the “lingua franca” of data interoperability.

Other key vocabularies include Dublin Core, Geo, FOAF, ORG, and vCard.
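In practice, using a shared vocabulary means mapping local field names onto terms that other publishers also use. The sketch below expands prefixed names into full URIs using the published namespace of each vocabulary; it assumes "Geo" refers to the W3C Basic Geo vocabulary, and the local column names in `COLUMN_TO_TERM` are hypothetical.

```python
# Namespace URIs of the vocabularies named on the slide.
# Assumption: "Geo" means the W3C Basic Geo (WGS84) vocabulary.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}

# Hypothetical local column names mapped to shared terms.
COLUMN_TO_TERM = {
    "title": "dcterms:title",
    "contact_name": "foaf:name",
    "category": "skos:prefLabel",
    "lat": "geo:lat",
    "long": "geo:long",
}


def expand(curie):
    """Expand a prefixed name such as 'foaf:name' into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


for column, term in COLUMN_TO_TERM.items():
    print(column, "->", expand(term))
```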

Page 60: Linking Open Government Data at Scale

https://opendata.epa.gov


Page 61: Linking Open Government Data at Scale

[Architecture diagram; labels only]

Users: Public; Registered developer; Contractor (3 Round Stones, Inc.)

Clients: Web browser; application, script or automated client

Access points: SPARQL endpoint; REST API; Resource URIs

Linked Data management system, located at a Tier 1 Cloud Provider (FISMA compliant)

RDF Database

Page 62: Linking Open Government Data at Scale

Linked Data is a gift …

• A worldwide system of linked information systems

• Global addressing scheme for data integration that scales to the Web

• Nearly immediate data integration to billions of facts

Page 63: Linking Open Government Data at Scale

http://LinkedDataDeveloper.com


Page 65: Linking Open Government Data at Scale

How do I get started?

https://www.w3.org/TR/ld-bp/

Page 66: Linking Open Government Data at Scale

https://www.w3.org/2012/ldp/charter

Enterprise data interoperability

Page 67: Linking Open Government Data at Scale

Use your super powers for good!

@WhoGiveACrapTP