Data-gov Wiki: Towards Linking Government Data
Li Ding, Dominic DiFranzo, Alvaro Graves, James R. Michaelis, Xian Li,
Deborah L. McGuinness and Jim Hendler
Tetherless World Constellation
March 22, 2010
Outline
• Background
  – Open Government Data Initiative
  – data.gov
• The Data-gov Wiki
  – Making Government Linkable
  – Linking and Using Government Data
  – Provenance Issues
Background
http://www.data.gov/
http://www.whitehouse.gov/open
Open Government Data Initiative
• Transparency
• Participation
• Collaboration
Open Government Directive (Dec 8, 2009)
• Publish Government Information Online
• Improve the Quality of Government Information
• Create and Institutionalize a Culture of Open Government
• Create an Enabling Policy Framework for Open Government
data.gov, data.gov.uk and beyond
What’s next?
• More datasets
• More links
• More provenance
£30 million to fund "Institute of Web Science"
Statistics about data.gov
50 participating agencies: USDA, DOC, DOD, ED, DOE, HHS, DHS, HUD, DOI, DOJ, DOL, STATE, DOT, TREAS, VA, EPA, GSA, NASA, NSF, NRC, OPM, SBA, SSA, USAID, BBG, CFTC, CNS, EXIM, EOP, FCC, FDIC, FEC, FRB, IMLS, MSPB, NARA, NEA, NEH, NLRB, NTSB, OSHRC, ONHIR, OPIC, PBGC, RRB, SEC, SSS, TVA, CPSC, EEOC
Source: http://www.data.gov/metric accessed March 21, 2010
The Data-gov Wiki
http://data-gov.tw.rpi.edu/
About the data-gov Wiki
Mission
The data-gov project investigates the role of semantic web technologies, especially linked data, in producing, processing, and using government data found on data.gov.
Objectives
• Support linked government data publishing, applications, and provenance using semantic technologies
• Educate potential developers and users
• Enable social collaboration on linked government data
This project is run by the Tetherless World Constellation at RPI, headed by Professors Jim Hendler and Deborah McGuinness and led by Li Ding. Other team members include Dominic DiFranzo, Sarah Magidson, James Michaelis, Alvaro Graves, Adam Bell, Jin Guang Zheng, Xian Li, Tim Lebo, Gregory Todd Williams, and Peter Coons.
Data-gov Wiki Architecture
[Architecture diagram; labels: Data Web, Linked Data, LGD in RDF, Conversion, Enhancement, Usage, Knowledge Provenance]
LGD: Linked government data
Data-gov Cloud (Oct 2009)
US-COMMUNITY(2005-2007)
CASTNET(1990 – Present)
RECS(2005)
GOV-BUDGET(1962-2014)
TOXIC-RELEASE(2005-2008)
EARTHQUAKE(Present)
STATE-LIB(2006-2007)
PUBLIC-LIB(1992-2006)
MED-COST(1994-2009)
LABOR-STAT(19xx-Present)
DATA-GOV-CATALOG(present)
Government
Community
Services
Environment
CASTNET sites
RECS code
US agency
US location
Linked Data
USAspending(2008-2010)
GeoNames
http://data-gov.tw.rpi.edu/wiki/demos
data.gov + uk gov data + NY times + DBpedia
http://data-gov.tw.rpi.edu/wiki/Demo:_Comparing_US-USAID_and_UK-DFID_Global_Foreign_Aid
From Open Government Data (OGD) to Linked Government Data (LGD)
Make government data linkable
Account name: Donations, Donations for the Official Residence of the Vice President
Agency name: Executive Office of the President
RDF Conversion
• Minimal and extensible
• Web accessible
<rdf:Description rdf:about="#entry262">
  <dgp401:account_name>Donations, Donations for the Official Residence of the Vice President</dgp401:account_name>
  …
  <dgp401:agency_name>Executive Office of the President</dgp401:agency_name>
</rdf:Description>
Raw RDF: http://data-gov.tw.rpi.edu/raw/403/data-403.rdf
Raw Data: http://www.whitehouse.gov/omb/budget/fy2010/assets/receipts.csv
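The "minimal and extensible" conversion above can be illustrated with a short Python sketch. This is not the project's converter; it only assumes that each CSV row becomes one rdf:Description, that each column header becomes a property in a per-dataset namespace, and that every cell stays a plain literal. The namespace URI passed in below is an assumption patterned on the vocab URLs shown in these slides, and the function names are hypothetical.

```python
import csv
import io
import re
from xml.sax.saxutils import escape

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def column_to_property(header):
    """Turn a CSV header such as 'Account name' into 'account_name'."""
    return re.sub(r"[^0-9a-z]+", "_", header.strip().lower()).strip("_")


def csv_to_rdfxml(csv_text, prefix, namespace):
    """Emit one rdf:Description per row and one literal property per non-empty cell.

    Assumes every header starts with a letter, so the result is a valid XML name.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = [f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:{prefix}="{namespace}">']
    for number, row in enumerate(reader, start=1):
        lines.append(f'  <rdf:Description rdf:about="#entry{number}">')
        for header, value in row.items():
            if header is None or not value:
                continue  # skip overflow cells and empty or missing values
            prop = column_to_property(header)
            if not prop:
                continue  # skip columns whose header has no usable characters
            lines.append(f"    <{prefix}:{prop}>{escape(value)}</{prefix}:{prop}>")
        lines.append("  </rdf:Description>")
    lines.append("</rdf:RDF>")
    return "\n".join(lines)


# Two rows taken from the examples in these slides.
SAMPLE_CSV = (
    "Account name,Agency name\n"
    '"Donations, Donations for the Official Residence of the Vice President",'
    "Executive Office of the President\n"
    "Defense Vessel Transfer Receipt Account,Department of Defense--Military\n"
)

if __name__ == "__main__":
    # The namespace URI is an assumption, not taken from the project's files.
    print(csv_to_rdfxml(SAMPLE_CSV, "dgp401", "http://data-gov.tw.rpi.edu/vocab/p/401/"))
```

Because no cell is interpreted at conversion time, links and typing can be layered on later without regenerating the raw RDF.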
Linking at Conversion Time
Reuse Property
<rdf:Description rdf:about="#entry840">
  <dgp401:account_name>Defense Vessel Transfer Receipt Account</dgp401:account_name>
  …
  <dgp401:agency_name>Department of Defense--Military</dgp401:agency_name>
</rdf:Description>
Raw RDF: http://data-gov.tw.rpi.edu/raw/402/data-402.rdf
<rdf:Description rdf:about="#entry262">
  <dgp401:account_name>Donations, Donations for the Official Residence of the Vice President</dgp401:account_name>
  …
  <dgp401:agency_name>Executive Office of the President</dgp401:agency_name>
</rdf:Description>
Raw RDF: http://data-gov.tw.rpi.edu/raw/403/data-403.rdf
Linking using Semantic Wiki
Enrich ontology definition
Property Definition: http://data-gov.tw.rpi.edu/vocab.php?property=92/title
[[rdfs:subPropertyOf::Property:rdfs:label]]
<owl:DatatypeProperty rdf:about="http://data-gov.tw.rpi.edu/vocab/p/92/title">
  <rdfs:label>92/title</rdfs:label>
  <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#label"/>
  <rdfs:subPropertyOf rdf:resource="http://xmlns.com/foaf/0.1/name"/>
  <rdfs:subClassOf rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"/>
  …
</owl:DatatypeProperty>
Property Definition: http://data-gov.tw.rpi.edu/wiki/Property:92/title
Linking using Semantic Wiki
Connect entities using owl:sameAs
X Wrong Wikipedia Name Correct Wikipedia Name
Incremental Data Enhancement
<rdf:Description rdf:about="http://data-gov.tw.rpi.edu/raw/403/data-403.rdf#entry262">
  …..
  <agency_name_link rdf:resource="http://data-gov.tw.rpi.edu/vocab/Executive_Office_of_the_President"/>
</rdf:Description>
Enhance raw RDF with links: http://data-gov.tw.rpi.edu/linked/403/agency_403.rdf
Link to DBpedia: http://data-gov.tw.rpi.edu/vocab/Executive_Office_of_the_President
<swivt:Subject rdf:about="http://data-gov.tw.rpi.edu/vocab/Executive_Office_of_the_President">
  <rdfs:label>Executive Office of the President</rdfs:label>
  <rdf:type rdf:resource="http://data-gov.tw.rpi.edu/vocab/c/Agencies_of_the_United_States_government"/>
  <owl:sameAs rdf:resource="http://dbpedia.org/resource/Executive_Office_of_the_President"/>
  ……
</swivt:Subject>
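The incremental enhancement shown above can be sketched in Python as two extra layers of triples that sit beside the untouched raw RDF. This is an illustration under assumptions, not the project's code: triples are plain (subject, predicate, object) tuples, the name-to-URI rule (spaces replaced by underscores under the wiki vocab base) is inferred from the single example on this slide, and the `curated` table stands in for owl:sameAs statements that wiki users would maintain.

```python
VOCAB_BASE = "http://data-gov.tw.rpi.edu/vocab/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def name_to_vocab_uri(name):
    """Map a literal name to a wiki URI (assumed rule: whitespace becomes '_')."""
    return VOCAB_BASE + "_".join(name.split())


def link_layer(raw_triples, name_property, link_property):
    """For each literal name, add a link triple; raw triples are left as they are."""
    return [
        (subject, link_property, name_to_vocab_uri(value))
        for subject, predicate, value in raw_triples
        if predicate == name_property
    ]


def same_as_layer(link_triples, curated):
    """Attach owl:sameAs targets for linked entities found in the curated table."""
    targets = sorted({obj for _, _, obj in link_triples})
    return [(uri, OWL_SAME_AS, curated[uri]) for uri in targets if uri in curated]


RAW = [
    ("http://data-gov.tw.rpi.edu/raw/403/data-403.rdf#entry262",
     "agency_name", "Executive Office of the President"),
]

# Hypothetical stand-in for mappings curated on the semantic wiki.
CURATED = {
    VOCAB_BASE + "Executive_Office_of_the_President":
        "http://dbpedia.org/resource/Executive_Office_of_the_President",
}

if __name__ == "__main__":
    links = link_layer(RAW, "agency_name", "agency_name_link")
    for triple in links + same_as_layer(links, CURATED):
        print(triple)
```

Keeping each layer in its own file, as the raw/ and linked/ URLs above suggest, means a wrong mapping can be corrected without touching the converted data.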
Runtime Linking in Applications
• Link datasets by common literal value
• Link datasets by overlapping time
  – Align multiple time series
  – Let users comment on time series data
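Both kinds of runtime linking can be sketched in a few lines of Python. The sketch assumes records are dictionaries and time series are year-to-value mappings; the function names, the normalization rule, and all sample values are made up for illustration and do not come from data.gov.

```python
def normalize(text):
    """Compare literals case-insensitively and ignore extra whitespace."""
    return " ".join(text.split()).casefold()


def join_on_literal(left, right, key):
    """Pair records from two datasets whose `key` literals match after normalization."""
    index = {}
    for record in right:
        index.setdefault(normalize(record[key]), []).append(record)
    return [(l, r) for l in left for r in index.get(normalize(l[key]), [])]


def align_series(series_by_name):
    """Keep only the years present in every series, so values can be compared."""
    year_sets = [set(series) for series in series_by_name.values()]
    common = sorted(set.intersection(*year_sets)) if year_sets else []
    return {
        year: {name: series[year] for name, series in series_by_name.items()}
        for year in common
    }


if __name__ == "__main__":
    # Illustrative values only.
    receipts = [{"agency_name": "Executive Office of the President", "amount": 1}]
    outlays = [{"agency_name": "executive office  of the president", "amount": 2}]
    print(join_on_literal(receipts, outlays, "agency_name"))

    series = {
        "series_a": {2005: 1.0, 2006: 2.0, 2007: 3.0},
        "series_b": {2006: 10.0, 2007: 20.0, 2008: 30.0},
    }
    print(align_series(series))
```

Literal joins like this only work when the context of the data is known, for example when both columns are known to hold US agency names.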
Provenance Issues
Provenance Annotation
• Descriptions
• Relations
Dataset
Demo
Agency
Provenance Events
[Diagram labels: CSV2RDF, SemDiff, Archive, Enhance; visualize, derive, create, revision]
Results from Revision Provenance
The number of datasets published at data.gov has tripled since July 2009.
Dataset updates on data.gov are not limited to additions.
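One simple way to see that an update is more than an addition is to compare two revisions of a dataset as sets of triples. The Python sketch below is not the project's SemDiff tool; it assumes ground triples (no blank nodes) with stable subject identifiers across revisions, and the sample data is made up.

```python
def diff_revisions(old_triples, new_triples):
    """Return the triples added to and removed from a dataset between two revisions."""
    old, new = set(old_triples), set(new_triples)
    return {"added": new - old, "removed": old - new}


def classify_update(delta):
    """Label a revision by what changed: additions, deletions, both, or nothing."""
    added, removed = bool(delta["added"]), bool(delta["removed"])
    if added and removed:
        return "modification"
    if added:
        return "addition"
    if removed:
        return "deletion"
    return "unchanged"


if __name__ == "__main__":
    # Illustrative revisions of a tiny dataset.
    revision_1 = {("#entry1", "title", "Old title"), ("#entry2", "title", "Kept")}
    revision_2 = {("#entry1", "title", "New title"), ("#entry2", "title", "Kept")}
    delta = diff_revisions(revision_1, revision_2)
    print(classify_update(delta), delta)
```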
Conclusion
Conclusion - Observations
Minimal and extensible RDF conversion is useful for generating linked government data in a timely fashion
Literal names are still useful for linking data, especially when we know the context of the data
Social semantic web technologies can help distribute high-cost tasks, e.g. mapping entity names, to the crowd
Provenance is a growing requirement for the transparency of open data applications
Conclusions – Ongoing Work
Build hub datasets
GOV-BUDGET(1962-2014)
PUBLIC-LIB(1992-2006)
CASTNET sites
US agency
US location
USAspending(2008-2010)
Employment statistics
Medicare cost
IRS annual Tax report
DATA-GOV-CATALOG(present)
US Census State population
Blah, blah………..
skos:altLabel owl:sameAs
Conclusions – Ongoing Work Making Sense of LGD
AI + CI !
To appear in Web Sci 2010 conference – co-located with WWW 2010
Conclusions – Ongoing Work
Incremental knowledge on social semantic web
• A social semantic web website can substantially promote collaboration on knowledge accumulation (ontology as well as instance linkage)
• We need a tradeoff between costly high-quality conversion and ugly minimal conversion
#a dgp92:title “my title”
dgp92:title rdfs:subPropertyOf rdfs:label
#a rdfs:label “my title”
#a skos:prefLabel “my title”
#a foaf:name “my title”
?
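The step from the first two statements to `#a rdfs:label “my title”` is the standard RDFS subproperty entailment: if p is a subproperty of q and (s, p, o) holds, then (s, q, o) holds. A minimal Python sketch of that rule, assuming triples are tuples of prefixed names, is given below; the function name is hypothetical, and the foaf:name declaration is taken from the property definition shown earlier in these slides.

```python
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def infer_subproperty(triples):
    """Apply the subproperty rule until no new triples appear (a fixpoint)."""
    inferred = set(triples)
    while True:
        # All (sub, super) pairs currently declared.
        declared = {(s, o) for s, p, o in inferred if p == SUB_PROPERTY_OF}
        new = {
            (s, sup, o)
            for s, p, o in inferred
            for sub, sup in declared
            if p == sub
        } - inferred
        if not new:
            return inferred
        inferred |= new


if __name__ == "__main__":
    graph = {
        ("#a", "dgp92:title", "my title"),
        ("dgp92:title", SUB_PROPERTY_OF, "rdfs:label"),
        ("dgp92:title", SUB_PROPERTY_OF, "foaf:name"),
    }
    for triple in sorted(infer_subproperty(graph) - graph):
        print(triple)
```

Only declared superproperties are derived, so `#a skos:prefLabel “my title”` would appear only if someone added that subproperty statement on the wiki, which is the open question the "?" raises.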
Conclusions – Ongoing Work
Provenance is everywhere
Evaluate issues in exposing provenance data and improve semantic-difference computation.
provenance vocabulary, provenance awareness, provenance reasoning, provenance mining, …
Ok, it is really the final conclusion
• The data-gov project does not use much AI for now (mostly on the representation side), but even a little semantics goes a long way
• The massive knowledge accumulated in this project now raises a number of challenges for AI (especially on the computation side)
• Semantic technologies are not far from us: undergraduate students can build a demo quickly!
BTW,….
Questions?
Shameless self-promotions
• Link: http://data-gov.tw.rpi.edu/
• “Browsing and Finding Linked Data” by Shangguan this afternoon
• See us at demo/poster session, we have more exciting demos to show you
• IPAW 2010 (June 2010, Troy, NY) will be looking for late breaking news from you!