2011linked science4mccuskermcguinnessfinal


description

Linked Science 2011 talk on the importance of modeling sources and their usage, such as PML's source usage, and how it can be generalized using FRBR

Transcript of 2011linked science4mccuskermcguinnessfinal

Page 1: 2011linked science4mccuskermcguinnessfinal

Where did you hear that?

Information and the Sources They Come From

James P. McCusker (1), Timothy Lebo (1), Li Ding (1), Cynthia Chang (1), Paulo Pinheiro da Silva (2), and Deborah L. McGuinness (1)

(1) Tetherless World Constellation
(2) CyberShARE Center, University of Texas at El Paso

Linked Science 2011, Bonn, Germany

Page 2: 2011linked science4mccuskermcguinnessfinal

Background

To do Linked Science, you need to know where your data comes from.

Many studies show that people will not use applications unless they can access the information those applications rely on.

Studies done for DARPA (CALO), IARPA (NIMD), NSF (VSTO), AT&T (PROSE & FindUR), …

Page 3: 2011linked science4mccuskermcguinnessfinal

Selected Dimensions of Provenance

Derivational Provenance
• x derived_from y using rule z
• agent initiated and controlled process
• process used x
• y generated_by process
• subprocess triggered_by process
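Relations like these can be held as simple statements and followed back to an original source. The sketch below is illustrative only: the file names, the tuple layout, and the `derivation_chain` helper are assumptions made for this example, not part of PML or PROV, and it assumes each entity has at most one derived_from link and that the links contain no cycles.

```python
# Statements as (subject, relation, object) tuples; all names are illustrative.
statements = [
    ("chart.png", "derived_from", "table.csv"),
    ("table.csv", "derived_from", "report.xls"),
    ("table.csv", "generated_by", "conversion"),
    ("conversion", "used", "report.xls"),
]


def derivation_chain(entity, statements):
    """Follow derived_from links from an entity back to its original source."""
    # Map each derived entity to the entity it was derived from.
    sources = {s: o for s, rel, o in statements if rel == "derived_from"}
    chain = [entity]
    while chain[-1] in sources:
        chain.append(sources[chain[-1]])
    return chain


chain = derivation_chain("chart.png", statements)
print(" <- ".join(chain))
```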

This work grew out of our work on provenance languages and their environments, in particular:
• PML: Proof Markup Language
• Inference Web
• and now PROV: Provenance Model (W3C, in progress)

One perspective on Abstractive Provenance:
• Work
• Expression
• Manifestation
• Item

FRBR: Functional Requirements for Bibliographic Records

PROV: wasComplementOf

We are also exploring this in another area: Cell Lines vs. Cell Line Colonies

Page 4: 2011linked science4mccuskermcguinnessfinal

FRBR Stack for PML Primer

Work: The web page URL.

Expression: The page content.

Manifestation: The bytes that were downloaded.

Item: The specific physical copy.
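One way to picture the four levels is as a small record built from a single download. This is a minimal sketch under stated assumptions: the `FrbrStack` class, the `build_stack` helper, the example URL, the copy identifiers, and the choice of SHA-256 over re-encoded UTF-8 text as the Expression digest are all hypothetical and are not taken from PML or csv2rdf4lod.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FrbrStack:
    work: str           # the URL that identifies the source
    expression: str     # digest of the content, independent of encoding
    manifestation: str  # digest of the exact bytes that were downloaded
    item: str           # identifier of this specific copy


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_stack(url: str, payload: bytes, encoding: str, copy_id: str) -> FrbrStack:
    # Hypothetical normalization: decode, then re-encode as UTF-8, so that
    # two encodings of the same text share one Expression digest.
    text = payload.decode(encoding)
    return FrbrStack(
        work=url,
        expression=sha256_hex(text.encode("utf-8")),
        manifestation=sha256_hex(payload),
        item=copy_id,
    )


text = "PML Primer: café"
url = "http://example.org/primer"  # placeholder URL
utf8_copy = build_stack(url, text.encode("utf-8"), "utf-8", "copy-1")
latin1_copy = build_stack(url, text.encode("latin-1"), "latin-1", "copy-2")
```

In this sketch the two copies share a Work and an Expression but differ in Manifestation and Item, which is the distinction the stack is meant to capture.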

Page 5: 2011linked science4mccuskermcguinnessfinal

Information, Source, SourceUsage in PML

PML provides an explicit mechanism for linking Information to its Source.

Includes:

• Location (URI)

• Date of access

• Extraction method (file offsets, cell contents, etc.)
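The three pieces listed above can be sketched as one record that points from a piece of information back into its source. The class and field names here are hypothetical and do not reproduce PML's actual vocabulary; the URL, the access date, and the sample data are arbitrary, and byte offsets stand in for just one of the possible extraction methods.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SourceUsage:
    source_uri: str        # location of the source
    accessed_at: datetime  # date of access
    from_offset: int       # extraction method: start byte offset
    to_offset: int         # extraction method: end byte offset (exclusive)


def extract(document: bytes, usage: SourceUsage) -> bytes:
    """Return the span of the source that the usage record points at."""
    return document[usage.from_offset:usage.to_offset]


document = b"temperature,21.5\npressure,1013\n"
usage = SourceUsage(
    source_uri="http://example.org/readings.csv",             # placeholder URL
    accessed_at=datetime(2011, 10, 24, tzinfo=timezone.utc),  # arbitrary date
    from_offset=12,
    to_offset=16,
)
information = extract(document, usage)
print(information)
```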

Page 6: 2011linked science4mccuskermcguinnessfinal

Why FRBR for Information Sources?

Work: Source
Expression: Abstract Information
Manifestation: Concrete Information
Item: Specific Copy

These constructs can be reused for other provenance:
• File copy
• Format conversion
• Reproducibility

Page 7: 2011linked science4mccuskermcguinnessfinal

Source and Information in FRBR

Page 8: 2011linked science4mccuskermcguinnessfinal

Discussion

• Abstractive + derivational provenance is really powerful.
• Ability to identify content regardless of format and across multiple copies of data.
• Reproducibility is verifiable across different file formats and algorithms as long as the Expressions are the same.
• We have found that FRBR generalizes to information resources.
• Could be considered FRIR (Functional Requirements for Information Resources)
• We’ve implemented it in our LOGD converter (csv2rdf4lod).
• Maintain content links (same Expression) even when manually changing the data, like converting Excel to CSV.
• Particularly useful in much of our Linked Open Data work.
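The point about keeping the same Expression across a format conversion can be illustrated by hashing parsed cell values instead of serialized bytes. This is only a sketch of the idea: the `expression_digest` helper and its length-prefixed hashing scheme are assumptions made for this example, and the digest that csv2rdf4lod actually computes may differ.

```python
import csv
import hashlib
import io


def expression_digest(text: str, delimiter: str) -> str:
    """Digest the parsed cell values rather than the serialized bytes."""
    digest = hashlib.sha256()
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        # Prefix each row with its cell count and each cell with its length,
        # so row and cell boundaries stay unambiguous in the hashed stream.
        digest.update(len(row).to_bytes(8, "big"))
        for cell in row:
            encoded = cell.encode("utf-8")
            digest.update(len(encoded).to_bytes(8, "big"))
            digest.update(encoded)
    return digest.hexdigest()


def manifestation_digest(text: str) -> str:
    """Digest the exact serialized bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


csv_text = "name,value\nalpha,1\n"
tsv_text = "name\tvalue\nalpha\t1\n"

same_expression = expression_digest(csv_text, ",") == expression_digest(tsv_text, "\t")
same_manifestation = manifestation_digest(csv_text) == manifestation_digest(tsv_text)
print(same_expression, same_manifestation)
```

Here the comma-separated and tab-separated files hold the same cells, so the content-level digest matches while the byte-level digest does not.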

Page 9: 2011linked science4mccuskermcguinnessfinal

Conclusions

• Any serious provenance model for linked science must provide a mechanism for describing information sources and their usage.
  - Achieved with modeling primitives in PML.
• Abstractive + derivational provenance can express nuanced explanations of data access, transformation, and analysis.
• Links from information to source can be modeled using a combination of FRBR and a derivational provenance model.
  - Allows for unambiguous descriptions of data and information access and transformation.

Page 10: 2011linked science4mccuskermcguinnessfinal

Questions?

Thanks!

Also, come to SemantAqua demo on Tues and talk on Wed aft. 5:30

The Tetherless World Constellation is partially funded by DARPA, U.S. Department of Energy, Fujitsu, LGS, Lockheed Martin, Microsoft Research, NASA, National Ecological Observatory Network (NEON), the National Science Foundation, Qualcomm, and the Woods Hole Oceanographic Institution (WHOI). This research was partially funded by the National Science Foundation under CREST Grant No. HRD-0734825.

Page 11: 2011linked science4mccuskermcguinnessfinal

Source, Information, and SourceUsage in PML

Page 12: 2011linked science4mccuskermcguinnessfinal

Implementation: pcurl.py

Part of csv2rdf4lod:

$ pcurl.py --help

usage: pcurl.py [--help|-h] [--format|-f xml|turtle|n3|nt] [url ...]

Download a URL and compute Functional Requirements for Bibliographic Resources (FRBR) stacks using cryptographic digests for the resulting content.

Refer to http://purl.org/twc/pub/mccusker2012parallel for more information and examples.

optional arguments:

  url           url to compute a FRBR stack for.
  -h, --help    Show this help message and exit.
  -f, --format  File format for FRBR stacks. One of xml, turtle, n3, or nt.