Briefing on US EPA Open Data Strategy using a Linked Data Approach

Bernadette Hyland CEO & co-founder David Wood CTO & co-founder 1400 Key Blvd, Ste 100 Arlington VA 22209 Tel. +1-877-290-2127 [email protected] @BernHyland [email protected] @prototypo [email protected] @3RoundStones Extend Your Reach. Better Data. Smarter Decisions This presentation delivered 18-Nov-2014 & is available at http://slideshare.net/3RoundStones

description

An overview presented by Ms. Bernadette Hyland on 18-Nov-2014 of the US EPA Open Data strategy, focusing on the Resource Conservation & Recovery Act (RCRA) dataset to be published as linked data. This work is in support of OMB Memorandum M-13-13, Open Data Policy: Managing Information as an Asset.

Transcript of Briefing on US EPA Open Data Strategy using a Linked Data Approach

Page 1: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Bernadette Hyland CEO & co-founder

David Wood CTO & co-founder

1400 Key Blvd, Ste 100

Arlington VA 22209

Tel. +1-877-290-2127

[email protected] @BernHyland

[email protected] @prototypo

[email protected] @3RoundStones

Extend Your Reach.

Better Data. Smarter Decisions

This presentation delivered 18-Nov-2014 & is available at http://slideshare.net/3RoundStones

Page 2: Briefing on US EPA Open Data Strategy using a Linked Data Approach

With everything else happening in the world, why does Open Data matter anyway?

Page 3: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Credits:
WV chemical spill: http://www.nytimes.com/2014/01/11/us/west-virginia-chemical-spill.html
Hurricane Sandy: http://www.nytimes.com/2012/10/28/us/hurricane-sandy-on-collision-course-with-winter-storm.html
Ebola: http://www.nytimes.com/interactive/2014/07/31/world/africa/ebola-virus-outbreak-qa.html

Page 4: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Discovery, access & re-use goes beyond government transparency & accountability …

Access to timely, accurate data is vital to first responders, legislators, scientists, policy makers, journalists & the general public

Page 5: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Taxpayers spend billions of dollars for our government to collect data

We, the people, expect government authorities to treat information as an asset. It complies with regulations (Information Quality Act, Section 508, protects PII) & is: public, accessible, described, reusable, complete, timely, sustainable over election cycles

Page 6: Briefing on US EPA Open Data Strategy using a Linked Data Approach

US Federal Government is listening … “Open Data” per M-13-13

• Public

• Accessible

• Described

• Reusable

• Complete

• Timely

• Managed Post-Release

• Project Open Data: OMB & OSTP online tools, best practices & schema to help agencies implement M-13-13

• On May 9, 2014, the Digital Accountability & Transparency Act (DATA Act) became Public Law 113-101

Page 7: Briefing on US EPA Open Data Strategy using a Linked Data Approach

The goal of treating information as an asset is not new …

Page 8: Briefing on US EPA Open Data Strategy using a Linked Data Approach

“Linked Data was part of my initial vision for the Web and is an important part of the Web’s future. The Web took off as a web of hyperlinked documents which were exciting to read, but which could not be effectively used as data.”

- Tim Berners-Lee

Page 9: Briefing on US EPA Open Data Strategy using a Linked Data Approach

We all know the ground truth of data on the Web

Page 10: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Lots of [government] open data without labels or context

Page 11: Briefing on US EPA Open Data Strategy using a Linked Data Approach

What is needed is … data that describes itself

Page 12: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Linked Open Data is called “self-describing” data

Linked Data is “A method of publishing structured data so that it can be interlinked & become more useful. … Extends Web pages to share information in a way that can be read automatically by computers.”

- Sir Tim Berners-Lee

Page 13: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Linked Data on the Web

[Diagram: a graph of linked statements. "my data" is collected by a collector, who is a Person with first name Michael and last name Hausenblas. The data includes a measurement with date 2011-01-01, value 0 and units of measure degrees Centigrade, collected at Galway Airport.]
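The graph on this slide can be written down as a list of subject, predicate, object statements ("triples"). The sketch below is only an illustration: the `http://example.org/` URIs and the property names are assumptions, since the slide does not give real identifiers, and the serializer is a simplified N-Triples writer that skips literal escaping and datatypes.

```python
# Hypothetical base URI: the slide gives labels only, not real identifiers.
EX = "http://example.org/"
# rdf:type is the standard RDF property behind the "a" arrow in the diagram.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

dataset = EX + "dataset/my-data"
person = EX + "person/mh"
measurement = EX + "measurement/1"

# Each statement is a (subject, predicate, object) triple.
# Objects that are URIs link to other things; other strings are literal values.
triples = [
    (dataset, EX + "collectedBy", person),
    (dataset, EX + "measurement", measurement),
    (person, RDF_TYPE, EX + "Person"),
    (person, EX + "firstName", "Michael"),
    (person, EX + "lastName", "Hausenblas"),
    (measurement, EX + "date", "2011-01-01"),
    (measurement, EX + "value", "0"),
    (measurement, EX + "unitsOfMeasure", "degrees Centigrade"),
    (measurement, EX + "collectedAt", EX + "place/galway-airport"),
]


def to_ntriples(statements):
    """Serialize triples as simplified N-Triples (no escaping, no datatypes)."""
    lines = []
    for s, p, o in statements:
        obj = f"<{o}>" if o.startswith("http://") else f'"{o}"'
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


print(to_ntriples(triples))
```

Because every label in the diagram becomes a URI or a typed statement, the value 0 is no longer a bare number: it travels with its date, its units and the place where it was collected.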

Page 14: Briefing on US EPA Open Data Strategy using a Linked Data Approach

• Linked Data has one amazing property: it can easily be combined with other Linked Data to form new knowledge.

• Based on 20+ year old idea

• A system of linked information systems
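The claim that Linked Data combines easily with other Linked Data can be illustrated with a small sketch. Everything in it is made up for illustration: the `http://example.org/` URIs, the property names and the sample values are assumptions, not EPA data. The point is only that two independently published sets of triples merge by plain set union when they use the same identifier for the same thing.

```python
# Hypothetical identifiers and values, used only to illustrate the idea.
EX = "http://example.org/"
facility = EX + "facility/123"  # the shared identifier both publishers use

# Dataset 1: a registry that knows the facility's name.
registry = {
    (facility, EX + "name", "Example Plant"),
}

# Dataset 2: release reports that know what was released, but not the name.
releases = {
    (facility, EX + "reportedRelease", EX + "release/1"),
    (EX + "release/1", EX + "chemical", "example chemical"),
}

# Combining the two graphs is a set union; no schema mapping is needed
# because both sides refer to the facility by the same URI.
combined = registry | releases


def objects(graph, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def chemicals_by_facility_name(graph, name):
    """Answer a question that neither source dataset can answer alone."""
    found = []
    facilities = sorted(s for s, p, o in graph if p == EX + "name" and o == name)
    for fac in facilities:
        for release in objects(graph, fac, EX + "reportedRelease"):
            found.extend(objects(graph, release, EX + "chemical"))
    return sorted(found)


print(chemicals_by_facility_name(combined, "Example Plant"))
```

Asked of either dataset on its own, the same question returns nothing; the "new knowledge" only appears once the graphs are joined.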

[Book cover: Linked Data: Structured Data on the Web, by David Wood, Marsha Zaidman and Luke Ruth with Michael Hausenblas; foreword by Tim Berners-Lee (Manning)]

Page 15: Briefing on US EPA Open Data Strategy using a Linked Data Approach
Page 16: Briefing on US EPA Open Data Strategy using a Linked Data Approach

We’re making progress with US Government Open Data

Page 17: Briefing on US EPA Open Data Strategy using a Linked Data Approach
Page 18: Briefing on US EPA Open Data Strategy using a Linked Data Approach

• On our third iteration of a catalog of datasets, using CKAN.

• >500k datasets from 200+ USG authorities

• Sustained executive support for data.gov via OMB & OSTP - Project Open Data

• GSA team who are engaging with Open Data / OSS / standards community

• Health, Energy, Law, Education & Public Safety specific communities in place.

• Agencies are [beginning] to name Chief Data Officers

But we still have a lot to do …

Page 19: Briefing on US EPA Open Data Strategy using a Linked Data Approach

RCRA = Resource Conservation and Recovery Act

A search for “EPA RCRA” displays the first RCRA dataset in 6th position :-(

First 5 results are for Facilities Registry Service …

This dataset is just one piece of a complex set of data in understanding solid waste reporting

Page 20: Briefing on US EPA Open Data Strategy using a Linked Data Approach

For example, the Right-to-Know Network is a consumer of EPA open data from data.gov

Page 21: Briefing on US EPA Open Data Strategy using a Linked Data Approach

They’ve built some nice visualizations!

Page 22: Briefing on US EPA Open Data Strategy using a Linked Data Approach

But the Toxics Release Inventory (TRI) is complicated data. The RTK Network would have benefited from more context had it been available from the EPA…

Page 23: Briefing on US EPA Open Data Strategy using a Linked Data Approach

RTK Network provides access to machine readable content (as XML) but … it lacks context

This data does not use shared vocabularies! :-(

No units of measure, no definition of codes
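The difference between machine readable data and data with context can be sketched as follows. This is a hypothetical illustration, not the RTK Network's or the EPA's actual format: the field names, the code "X1" and its definition are invented. The check flags every field that ships without a unit of measure or a definition.

```python
# A bare record, similar in spirit to context-free XML: values only.
# The code "X1" and the amount are invented for this illustration.
bare = {"code": "X1", "amount": "12.5"}

# Hypothetical lookup table a publisher could ship alongside the data.
CODE_DEFINITIONS = {"X1": "example waste category (hypothetical)"}

# The same record with its context attached to each value.
described = {
    "code": {"value": "X1", "definition": CODE_DEFINITIONS["X1"]},
    "amount": {"value": 12.5, "unit": "kilograms"},
}


def missing_context(record):
    """Return the names of fields that carry neither a unit nor a definition."""
    missing = []
    for field, content in record.items():
        has_context = isinstance(content, dict) and (
            "unit" in content or "definition" in content
        )
        if not has_context:
            missing.append(field)
    return sorted(missing)


print(missing_context(bare))       # fields a consumer would have to guess at
print(missing_context(described))  # nothing left to guess
```

A consumer of the bare record has to find out elsewhere what "X1" means and whether 12.5 is pounds, kilograms or tons; the described record answers both questions itself.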

Page 24: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Linked Data Management System For government open data publishing

Funded by

Page 25: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Landing page for new EPA Open Data site

Page 26: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Search for facilities in your neighborhood… Click through to an individual facility

Page 27: Briefing on US EPA Open Data Strategy using a Linked Data Approach

View by map or by table layout …

Page 28: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Another sample linked data app shows nuclear power plants regulated by the US EPA

Page 29: Briefing on US EPA Open Data Strategy using a Linked Data Approach

The power of Open … A useful app developed in 2 days using Open Government Data + Open Source + Open Web Standards …

Deployed on the cloud.

Page 30: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Key to data sources:

1 Wikidata Project (description)

2 OpenStreetMap (OSS)

3 Wikimedia Commons (photo)

4 Raw data available for developers (RDF/XML)

5 EPA Resource Conservation and Recovery Act (RCRA)

6 EPA Facilities (FRS)

7 EPA Toxics Release Inventory (TRI)

8 ABT Enviro Consultants (from a spreadsheet)

Page 31: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Pollution reports using linked data … Developed in < 1 week using open data & OSS

Page 32: Briefing on US EPA Open Data Strategy using a Linked Data Approach

List of input reports for this pollution report … Possible because of a linked data approach

Page 33: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Shared vocabularies, e.g. Places, Geographies, Dublin Core, Geo, FOAF, ORG, vCard, are the “lingua franca” of data interoperability
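As a rough sketch of what using a shared vocabulary means in practice: each vocabulary has a public namespace URI, and a short name such as `foaf:name` expands to one globally agreed identifier. The namespace URIs below are the published ones for Dublin Core Terms, FOAF, ORG and vCard; reading "Geo" on the slide as the W3C Basic Geo vocabulary is an assumption, and the `expand` helper is hypothetical.

```python
# Published namespace URIs for some of the vocabularies named on the slide.
# Mapping "Geo" to the W3C Basic Geo (WGS84) vocabulary is an assumption.
PREFIXES = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "org": "http://www.w3.org/ns/org#",
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(curie):
    """Expand a prefixed name such as 'foaf:name' to its full URI."""
    prefix, _, local = curie.partition(":")
    if prefix not in PREFIXES or not local:
        raise ValueError(f"unknown prefix or missing local name: {curie!r}")
    return PREFIXES[prefix] + local


for term in ("dcterms:title", "foaf:name", "org:Organization", "geo:lat"):
    print(term, "->", expand(term))
```

Two publishers who both describe a name with `foaf:name` are using the same URI, so a consumer can merge their data without asking either of them what a "name" column means.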

Page 34: Briefing on US EPA Open Data Strategy using a Linked Data Approach

WeatherHealth: A mobile app for chronic asthma/COPD patients with weather alerts

Funded by

Page 35: Briefing on US EPA Open Data Strategy using a Linked Data Approach
Page 36: Briefing on US EPA Open Data Strategy using a Linked Data Approach

[Diagram: User, connected to the data sources NOAA, US EPA AirNow, DBpedia, National Library of Medicine and US EPA SunWise]

Page 37: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Orgpedia: An open organizational data project on public & private companies

Funded by

Page 38: Briefing on US EPA Open Data Strategy using a Linked Data Approach
Page 39: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Callimachus apps allow for crowdsourcing

Page 40: Briefing on US EPA Open Data Strategy using a Linked Data Approach

How did we handle data publishing & application development for US EPA, Sentara Healthcare & Orgpedia?

Page 41: Briefing on US EPA Open Data Strategy using a Linked Data Approach

The leading Web application server for Linked Data

Fanatically standards compliant **

Used to create data-driven applications that combine data across silos

** http://www.w3.org/2013/data/

Page 42: Briefing on US EPA Open Data Strategy using a Linked Data Approach

[Diagram: HTML, enterprise data and documents; read/write; point to, include]

Our customers use Callimachus to:

Create responsive apps with many different data sources & types of data

Page 43: Briefing on US EPA Open Data Strategy using a Linked Data Approach

[Diagram: Content Management System compared with Linked Data Management System; labels: Unstructured Text, Text, Structured Data, Data]

Page 44: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Customers are creating data-driven applications with data in leading graph databases:

Page 45: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Callimachus conveniently supports in-browser development for faster iteration

Page 46: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Do not recreate the wheel!

Page 47: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Summary

• Billions of dollars are spent by taxpayers for government to collect useful information - e.g., geospatial data, population, healthcare, medicine & clinical trials, environment, energy, law, education …

• Data consumers must help government to fulfill its goal to treat “information as an asset” by participating & giving feedback

• Steady forward progress has been made; however, take care not to re-create the wheel!

• Use Open Data, Open Source, Web standards & published best practices whenever possible

• More work to be done …

Page 48: Briefing on US EPA Open Data Strategy using a Linked Data Approach

Additional Resources

• “Open by Default” presentation by Dr. David Wood to Virginia Commonwealth officials 10/7/2014, see http://www.slideshare.net/3roundstones/open-by-default-39976290

– Open Data is the idea that “certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control”. Open Data follows similar “open” concepts that have proven to be valuable in the information economy such as Open Standards, Open Source Software, Open Content and has been followed more recently by variations on the theme such as Open Science and Open Government.

– Linked Data Developer website, see http://linkeddatadeveloper.com/

– Linked Data: Structured Data on the Web, see http://books.google.com/books/about/Linked_Data.html?id=rA8-mQEACAAJ

– Add Linked Data to HTML with RDFa.info, see http://semanticweb.com/new-resource-for-web-developers-announced-add-linked-data-to-html_b28813

– See also RDFa website on GitHub, see https://github.com/rdfa/rdfa-website
