Workflows and challenges

Data-driven Papers and Grand Challenges
Anita de Waard, [email protected]
Disruptive Technologies Director, Elsevier Labs
August 26, 2010

Science is made of information...

...that gets created...

...and destroyed.

What is the problem?

1. Researchers can’t keep track of their data.

2. Data is not stored in a way that is easy for authors.

3. For readers, article text is not linked to the underlying data.

The Vision
Work done with Ed Hovy, Phil Bourne, Gully Burns and Cartic Ramakrishnan

1. Research: Each item in the system has metadata (including provenance) and relations to other data items added to it.

2. Workflow: All data items created in the lab are added to a (lab-owned) workflow system.

3. Authoring: A paper is written in an authoring tool which can pull data with provenance from the workflow tool in the appropriate representation into the document.

4. Editing and review: Once the co-authors agree, the paper is ‘exposed’ to the editors, who in turn expose it to reviewers. Reports are stored in the authoring/editing system, and the paper is updated until it is validated.

5. Publishing and distribution: When a paper is published, a collection of validated information is exposed to the world. It remains connected to its related data item, and its heritage can be traced.

6. User applications: Distributed applications run on this ‘exposed data’ universe.

[Diagram labels: metadata; Review, Edit, Revise; Some other publisher. Sample paper text: "Rats were subjected to two grueling tests (click on fig 2 to see underlying data). These results suggest that the neurological pain pro-"]
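The first three steps of the vision (items carry metadata and provenance, every item lands in a lab-owned store, and a paper pulls data together with its heritage) can be illustrated with a minimal Python sketch. Everything named here (`DataItem`, `WorkflowStore`, `cite_in_paper`, the field names and the sample ids) is hypothetical and chosen only for illustration; the slides do not describe a concrete data model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataItem:
    """One item in the lab's workflow system (all field names are hypothetical)."""
    item_id: str
    payload: str
    metadata: Dict[str, str] = field(default_factory=dict)
    derived_from: List[str] = field(default_factory=list)  # provenance links


class WorkflowStore:
    """A toy, in-memory stand-in for a lab-owned workflow system."""

    def __init__(self) -> None:
        self._items: Dict[str, DataItem] = {}

    def add(self, item: DataItem) -> None:
        # Refuse items whose provenance points at something never recorded.
        for parent in item.derived_from:
            if parent not in self._items:
                raise KeyError(f"unknown provenance source: {parent}")
        self._items[item.item_id] = item

    def heritage(self, item_id: str) -> List[str]:
        """Return the ids of all ancestors of an item, nearest first."""
        seen: List[str] = []
        queue = list(self._items[item_id].derived_from)
        while queue:
            current = queue.pop(0)
            if current not in seen:
                seen.append(current)
                queue.extend(self._items[current].derived_from)
        return seen


def cite_in_paper(store: WorkflowStore, item_id: str, text: str) -> dict:
    """Attach a data item and its traced heritage to a passage of paper text."""
    return {"text": text, "data_item": item_id, "heritage": store.heritage(item_id)}


# Usage: a figure derived from raw measurements, cited from a paper.
store = WorkflowStore()
store.add(DataItem("raw-1", "raw measurements", {"created_by": "lab member"}))
store.add(DataItem("fig-2", "summary plot", {"kind": "figure"}, ["raw-1"]))
passage = cite_in_paper(store, "fig-2", "Click on fig 2 to see underlying data.")
print(passage["heritage"])  # ['raw-1']
```

Rejecting items with unknown provenance at insertion time is one possible design choice; it keeps every heritage chain traceable, at the cost of requiring sources to be registered first.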

What is needed to get there?

- Workflow tools: Linked-data-based workflow tools for all sciences: scalable, safe, and user-friendly (tool builders)
- Authoring and reviewing tools: that enable use of rich and provenance-tracked elements (tool builders)
- Metadata standards: Standards that allow exchange of information on any knowledge item created in a lab, including provenance/privacy/IPR (standards bodies)
- Semantic/Linked Data XML repositories (publishers)
- Publishing systems that are application servers (publishers)
- Social change: Scientists store, track and annotate their work (institutes, funding bodies, individuals)

A. Workflow tools are emerging

http://wings.isi.edu/

http://MyExperiment.org

http://VisTrails.org

B. Authoring ‘ecosystems’: e.g., SWAN

Slide by Tim Clark

SWAN Semantic Relationships

[Diagram. Node types: person, group, publication, PDFs, MSWORD file, Excel file, hypothesis, Claim, comment, gene, concept. Visibility: Private, Public. Relationships: makes, hasEvidence, describes, annotates, discussedIn, authoredBy, authorOf, shareWith.]

C. Example of Metadata: Harvard’s Annotation Ontology

Slide by Tim Clark

[Diagram of a single annotation. Labels shown: rdf:Type, Atomic, foaf:person, pav:createdOn (June 1, 2010), pav:createdBy (http://www.ht.org/foaf.rdf#me), ann:annotates (http://anyurl.com/sf_pat01.html), ann:context, onDocument, ImageSelector, InitEndCornerSelector, rdfs:SubClassOf, init (304, 507), end (380, 618), hasTag, Tag, hasTopic, “Linear skull fracture”, tag FMA:skull.]

Other annotations on the same document:
1. Atomic annotation on image (tag: “hematoma”)
2. General annotation (tag: “injury”)

Other annotations on similar documents:
1. General annotation (tag: “skull fracture”)
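To make the annotation slide concrete, here is a rough sketch of how its labels could be written down as subject-predicate-object triples in plain Python. The predicate and value strings are taken from the slide, but how they connect is an assumption reconstructed from the flattened diagram, and the identifiers `ex:annotation1` and `ex:selector1` are invented placeholders. This is not the ontology's official serialization.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

ANNOTATION = "ex:annotation1"  # hypothetical identifier, not on the slide
SELECTOR = "ex:selector1"      # hypothetical identifier, not on the slide
AUTHOR = "http://www.ht.org/foaf.rdf#me"
DOCUMENT = "http://anyurl.com/sf_pat01.html"

# Assumed wiring of the labels that appear on the slide.
triples: List[Triple] = [
    (ANNOTATION, "rdf:Type", "Atomic"),
    (ANNOTATION, "pav:createdOn", "June 1, 2010"),
    (ANNOTATION, "pav:createdBy", AUTHOR),
    (AUTHOR, "rdf:Type", "foaf:person"),
    (ANNOTATION, "ann:annotates", DOCUMENT),
    (ANNOTATION, "ann:context", SELECTOR),
    (SELECTOR, "rdf:Type", "InitEndCornerSelector"),
    ("InitEndCornerSelector", "rdfs:SubClassOf", "ImageSelector"),
    (SELECTOR, "init", "(304, 507)"),
    (SELECTOR, "end", "(380, 618)"),
    (ANNOTATION, "hasTag", "Linear skull fracture"),
    (ANNOTATION, "hasTopic", "FMA:skull"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return every object recorded for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def annotations_on(document: str) -> List[str]:
    """Return the ids of annotations that point at a document."""
    return [s for s, p, o in triples if p == "ann:annotates" and o == document]


print(annotations_on(DOCUMENT))        # ['ex:annotation1']
print(objects(ANNOTATION, "hasTag"))   # ['Linear skull fracture']
```

Because the annotation lives outside the document it annotates, queries such as `annotations_on` are what would surface the "other annotations on the same document" listed on the slide.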

D. Linked Data at Elsevier

<ce:section id=#123> this says: mice like cheese

said @anita on May 31 2010

but we all know she was jetlagged then

immutable, $$, proprietary

dynamic, personal, task-driven - open?
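The layering on this slide (a claim about a content fragment, a statement about who made that claim and when, and a comment on that statement) can be sketched as statements whose subject may be another statement. The sketch below is only an illustration under stated assumptions: the `Statement` class, the predicate names (`says`, `saidBy`, `saidOn`, `comment`), and which statement the comment attaches to are all hypothetical and not taken from any Elsevier system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Statement:
    statement_id: str
    subject: str    # a content fragment id or another statement's id
    predicate: str  # hypothetical predicate names
    obj: str


statements: List[Statement] = [
    Statement("s1", "ce:section#123", "says", "mice like cheese"),
    Statement("s2", "s1", "saidBy", "@anita"),
    Statement("s3", "s1", "saidOn", "May 31 2010"),
    # Assumption: the comment is attached to the attribution statement.
    Statement("s4", "s2", "comment", "but we all know she was jetlagged then"),
]


def about(target_id: str) -> List[Statement]:
    """Collect all statements that talk about a target, directly or indirectly."""
    found: List[Statement] = []
    frontier = [target_id]
    while frontier:
        current = frontier.pop(0)
        for st in statements:
            if st.subject == current:
                found.append(st)
                frontier.append(st.statement_id)
    return found


for st in about("ce:section#123"):
    print(st.statement_id, st.predicate, st.obj)
```

The fragment id itself never changes in this sketch, while the list of statements about it can keep growing.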

E. ScienceDirect Application Server

F. Social Change. Some next steps:

• 2010 - 2011: Try to gather resources, current leaders, etc. for ‘Future of Research Communication’ effort
  – Fall 2010: Develop virtual community (with Harvard)
  – August 2011: Dagstuhl Workshop:
    • Involve key people (include funding bodies, libraries, institutions) to see where bottlenecks are
    • Write white paper, implement
• 2011: ICCS ‘Executable Paper Challenge’?

Scope: Tools and processes to:

- Improve the process of creating, reviewing and editing scientific content
- Interpret, visualize or connect science knowledge
- Provide tools/ideas for measuring the impact of these improvements.

June 2008: 71 submissions from 15 countries.

August 2008: 10 semi-finalist teams, with access to:

- 500,000 full-text articles
- Plus EMTREE, EmBase, Scopus
- Created tool/demo
- Presented to the judges
- Wrote a paper (accepted for JWeb Semantics)

April 2009: Judges selected 4 finalist teams.

And the winners were: