Ucsd library10182010

Why is Scholarly Communication Broken and What Can Be Done? In Celebration of Open Access Week Philip E. Bourne University of California San Diego [email protected] UCSD Libraries Oct. 18, 2010


Page 1: Ucsd library10182010

UCSD Libraries

Why is Scholarly Communication Broken and What Can Be Done?

In Celebration of Open Access Week

Philip E. Bourne

University of California San Diego

[email protected]

Oct. 18, 2010

Page 2: Ucsd library10182010

UCSD Libraries

Disclaimer

• I am a domain (life) scientist, not a computer or information scientist

• I am fortunate enough to have a major biological resource (the Protein Data Bank) and a major biological journal (PLoS Computational Biology) as my playground

• I am part of the long tail

• I am naïve, but I am the majority

Oct. 18, 2010

Page 3: Ucsd library10182010

UCSD Libraries

Agenda

• Motivation

• What needs to be done?

• A few examples

• The role of the institution

Oct. 18, 2010

Page 4: Ucsd library10182010

UCSD Libraries

The Scientific Process is Too Slow to Respond to a Crisis – Either Global or Personal

Oct. 18, 2010

Motivation

http://knol.google.com/k/plos-currents-influenza#

By the time the paper is published we could all be dead

Page 5: Ucsd library10182010

UCSD Libraries

* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm

[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010]

1RUZ: 1918 H1 Hemagglutinin

3B7E: Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir

In a time of crisis, the need for fast access to accurate data, and to any knowledge of that data, is paramount

Oct. 18, 2010

Motivation

Page 6: Ucsd library10182010

UCSD Libraries

If that is not enough…

For some people the scientific process may be too slow to save their life

Oct. 18, 2010

Motivation

Page 7: Ucsd library10182010

UCSD Libraries

Josh Sommer – A Remarkable Young Man

Co-founder & Executive Director of the Chordoma Foundation

Oct. 18, 2010

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 8: Ucsd library10182010

UCSD Libraries

Chordoma

• A rare form of brain cancer

• No known drugs

• Treatment – surgical resection followed by intense radiation therapy

Oct. 18, 2010

Motivation

http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG

Page 9: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 10: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 11: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 12: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

Adapted: http://sagecongress.org/Presentations/Sommer.pdf

Motivation


If I have seen further it is only by standing on the shoulders of giants

Isaac Newton

From Josh’s point of view the climb up just takes too long

> 15 years and > $850M to be more precise

Page 13: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

http://sagecongress.org/Presentations/Sommer.pdf

Motivation

Page 14: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

Motivation

http://sagecongress.org/Presentations/Sommer.pdf

Page 15: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

Motivation

Page 16: Ucsd library10182010

UCSD Libraries

Now that we are all, hopefully, motivated, let us break this down into what, in my opinion, actually needs to be done

Here are a few big things …

Oct. 18, 2010

What Needs to be Done?

Page 17: Ucsd library10182010

UCSD Libraries

A Few Things to Accelerate the Rate of Scientific Discovery

• Better communication, data and knowledge access, and new modes of discovery, which means:

– We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives

– We need to be more open with both

– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery

– Reward systems need to change

– We need scientist management tools

– We need to be less fixated on the big data problems

– We need to unleash the full power of the Internet

Oct. 18, 2010 Easy Hard

Page 18: Ucsd library10182010

We Need Data and Knowledge About That Data to Interoperate

0. Full text of PLoS papers stored in a database

1. A link brings up figures from the paper

2. Clicking the paper figure retrieves data from the PDB, which is analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

The Knowledge and Data Cycle

1. User clicks on content

2. Metadata and webservices to data provide an interactive view that can be annotated

3. Selecting features provides a data/knowledge mashup

4. Analysis leads to new content I can share

PLoS Comp. Biol. 2005 1(3) e34
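A minimal sketch of the composite view in the steps above, not the system described in the cited paper. It assumes a tiny in-memory stand-in for the database (`STRUCTURES`, holding only the two structures named on the H1N1 slide) and a hypothetical helper, `composite_view`, that links each known structure ID to the paragraphs of a paper that mention it. Four-character tokens are treated only as candidates, because years such as 1918 have the same shape as an ID.

```python
import re

# Stand-in for a structure database (assumption): only the two entries
# named on the H1N1 slide are included.
STRUCTURES = {
    "1RUZ": "1918 H1 Hemagglutinin",
    "3B7E": "Neuraminidase of A/Brevig Mission/1/1918 H1N1 strain "
            "in complex with zanamivir",
}

# Four-character tokens that start with a digit. These are candidates only;
# each one is confirmed against STRUCTURES before it is linked.
CANDIDATE = re.compile(r"\b[0-9][A-Za-z0-9]{3}\b")


def composite_view(paragraphs):
    """Map each known structure ID to its title and the paragraphs citing it."""
    view = {}
    for index, text in enumerate(paragraphs):
        for token in CANDIDATE.findall(text):
            pdb_id = token.upper()
            if pdb_id not in STRUCTURES:
                continue  # e.g. a year such as "1918"
            entry = view.setdefault(
                pdb_id, {"title": STRUCTURES[pdb_id], "paragraphs": []}
            )
            if index not in entry["paragraphs"]:
                entry["paragraphs"].append(index)
    return view


if __name__ == "__main__":
    paper = [
        "The 1918 hemagglutinin structure (1RUZ) is discussed first.",
        "Zanamivir binding is shown in 3b7e and compared with 1RUZ.",
    ]
    for pdb_id, entry in composite_view(paper).items():
        print(pdb_id, entry["title"], entry["paragraphs"])
```

Checking candidates against the database, rather than trusting the pattern alone, is the design choice that keeps the literature-to-database links reliable in this sketch.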

Page 19: Ucsd library10182010

UCSD Libraries

We Need Data and Knowledge About That Data to Interoperate – What is Stopping Us?

• Governance – publishers vs. database providers

• Reward

• Metadata standards for provenance, privacy etc.

• Exemplars

• ….

Oct. 18, 2010

Caveat: Each discipline is different – I speak very much from a biomedical sciences perspective

Page 20: Ucsd library10182010

Certainly the Argument for Interoperability in the Biomedical Sciences is Strong

• PubMed contains 18,792,257 entries

• ~100,000 papers indexed per month

• In Feb 2009:

– 67,406,898 interactive searches were done

– 92,216,786 entries were viewed

• 1078 databases reported in NAR 2008

• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times

Data as of April 14, 2009

PLoS Comp. Biol. 2005 1(3) e34

What Needs to be Done?

Page 21: Ucsd library10182010

UCSD Libraries

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

Example Interoperability: The Database View

BMC Bioinformatics 2010 11:220

Oct. 18, 2010

What Needs to be Done?

Page 22: Ucsd library10182010

UCSD Libraries

Example Interoperability: The Literature View

http://biolit.ucsd.edu

Nucleic Acids Research 2008 36(S2) W385-389

Oct. 18, 2010

What Needs to be Done?

Page 23: Ucsd library10182010

UCSD Libraries

ICTP Trieste, December 10, 2007

Oct. 18, 2010

Page 24: Ucsd library10182010

UCSD Libraries

Semantic Tagging & Widgets are a Powerful Tool to Integrate Data and Knowledge of that Data, But as Yet Not Used Much

Oct. 18, 2010

Will Widgets and Semantic Tagging Change Computational Biology? PLoS Comp. Biol. 6(2) e1000673

What Needs to be Done?

Page 25: Ucsd library10182010

Semantic Tagging of Database Content in The Literature or Elsewhere

http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp

PLoS Comp. Biol. 6(2) e1000673

Semantic Tagging

Page 26: Ucsd library10182010

UCSD Libraries

Oct. 18, 2010

What Needs to be Done?

Page 27: Ucsd library10182010

UCSD Libraries

The Publishers are Starting to Do It

Oct. 18, 2010

From Anita de Waard, Elsevier

What Needs to be Done?

Page 28: Ucsd library10182010

UCSD Libraries

This is Literature Post-processing

Better to Get the Authors Involved

• Authors are the absolute experts on the content

• More effective distribution of labor

• Add metadata before the article enters the publishing process

Oct. 18, 2010

What Needs to be Done?

Page 29: Ucsd library10182010

UCSD Libraries

Word 2007 Add-in for authors

• Allows authors to add metadata as they write, before they submit the manuscript

• Authors are assisted by automated term recognition

– OBO ontologies

– Database IDs

• Metadata are embedded directly into the manuscript document via XML tags, OOXML format

– Open

– Machine-readable

• Open source, Microsoft Public License

http://www.codeplex.com/ucsdbiolit

Oct. 18, 2010

What Needs to be Done?
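A rough sketch of the idea behind the add-in, not its actual implementation: recognize known terms in a sentence and wrap each one in an XML tag that carries a database ID. The `TERMS` dictionary, the `<sentence>` and `<term>` element names, and the `tag_sentence` helper are all assumptions made for illustration. The real add-in works inside Word and writes OOXML, which this sketch does not attempt.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical term dictionary: surface form -> (source, identifier).
# The add-in draws on OBO ontologies and database IDs; these two entries
# are illustrative only.
TERMS = {
    "hemagglutinin": ("PDB", "1RUZ"),
    "neuraminidase": ("PDB", "3B7E"),
}

PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in TERMS) + r")\b",
    re.IGNORECASE,
)


def tag_sentence(sentence):
    """Return a <sentence> element with recognized terms wrapped in <term>."""
    root = ET.Element("sentence")
    root.text = ""
    last = None      # most recently created <term> element
    position = 0     # end of the previous match in the sentence
    for match in PATTERN.finditer(sentence):
        before = sentence[position:match.start()]
        if last is None:
            root.text += before
        else:
            last.tail = before
        source, identifier = TERMS[match.group(0).lower()]
        last = ET.SubElement(root, "term", {"source": source, "id": identifier})
        last.text = match.group(0)
        position = match.end()
    rest = sentence[position:]
    if last is None:
        root.text += rest
    else:
        last.tail = rest
    return root


if __name__ == "__main__":
    element = tag_sentence("Hemagglutinin and neuraminidase are surface proteins.")
    print(ET.tostring(element, encoding="unicode"))
```

Because the tags sit inside the text rather than in a separate file, the metadata travel with the manuscript, which is the point the slide makes about tagging before submission.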

Page 30: Ucsd library10182010

UCSD Libraries

Challenges

• Authors

– Carrot: if one or more publishers fast-tracked a paper that had semantic markup, it might catch on

• Publishers

– Carrot: competitive advantage

Oct. 18, 2010

What Needs to be Done?

Page 31: Ucsd library10182010

UCSD Libraries

A Few Things to Accelerate the Rate of Scientific Discovery

• Better communication, data and knowledge access, and new modes of discovery, which means:

– We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives

– We need to be more open with both

– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery

– Reward systems need to change

– We need scientist management tools

– We need to be less fixated on the big data problems

– We need to unleash the full power of the Internet

Oct. 18, 2010 Easy Hard

Page 32: Ucsd library10182010

UCSD Libraries

Reward Systems Need to Change

What is Needed?

• Author disambiguation

• Auditing (identification and metrics) of all scholarship - means new tools

• Seniors need to promote alternative forms of scholarship

• Juniors need to respond

Oct. 18, 2010

Reward Systems Need to Change

Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia PLoS Comp Biol to appear

Page 33: Ucsd library10182010

UCSD Libraries

Example Tools

Oct. 18, 2010

http://pubnet.gersteinlab.org/

http://www.researcherid.com/

http://www.biomedexperts.com

Page 34: Ucsd library10182010

UCSD Libraries

What Are these Alternative Forms of Scholarship?

Research [Grants]

Journal Article

Conference Paper

Poster Session

Reviews

Blogs

Community Service/Data Curation

Reward Systems Need to Change

Oct. 18, 2010

Page 35: Ucsd library10182010

UCSD Libraries

Ideally the ID will be Tagged to Every Piece of Scholarly Communication

I am Not a Scientist, I am a Number

PLoS Comp. Biol. 2008 4(12) e1000247

Reward Systems Need to Change

Oct. 18, 2010
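To illustrate why an ID tagged to every piece of scholarship helps with auditing, here is a small sketch under stated assumptions: the records, the ID format and the `audit` helper are all hypothetical. It counts forms of scholarship per author ID, so differently printed names collapse into one person, while two people who share a name stay apart.

```python
from collections import Counter, defaultdict

# Hypothetical records: (author name as printed, author ID, form of scholarship).
RECORDS = [
    ("J. Smith", "ID-0001", "journal article"),
    ("Jane Smith", "ID-0001", "blog"),
    ("Smith, J.", "ID-0001", "data curation"),
    ("J. Smith", "ID-0002", "journal article"),  # a different J. Smith
]


def audit(records):
    """Count forms of scholarship per author ID, ignoring the printed name."""
    totals = defaultdict(Counter)
    for _name, author_id, form in records:
        totals[author_id][form] += 1
    return totals


if __name__ == "__main__":
    for author_id, forms in audit(RECORDS).items():
        print(author_id, dict(forms))
```

Grouping by name instead of ID would merge the two J. Smith entries and split the three spellings of the first author, which is the disambiguation problem named above.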

Page 36: Ucsd library10182010

UCSD Libraries

A Few Things to Accelerate the Rate of Scientific Discovery

• Better communication, data and knowledge access, and new modes of discovery, which means:

– We need data and knowledge about that data to interoperate, i.e. we need new kinds of fast, versatile publications and data archives

– We need to be more open with both

– We need to think more about the tools that analyze, visualize and annotate data to maximize knowledge discovery

– Reward systems need to change

– We need scientist management tools

– We need to be less fixated on the big data problems

– We need to unleash the full power of the Internet

Oct. 18, 2010 Easy Hard

Page 37: Ucsd library10182010

UCSD Libraries

The Truth About My Laboratory

• I have ?? mail folders!

• The intellectual memory of my laboratory is in those folders

• This is an unhealthy hub and spoke mentality

We Need Scientist Management Tools

Oct. 18, 2010

Page 38: Ucsd library10182010

The Truth About My Laboratory

• I generate way more negative than positive data, but where is it?

• Content management is a mess

– Slides, posters…..

– Data, lab notebooks ….

– Collaborations, Journal clubs …

• Software is open but where is it?

• Farewell is for the data too

Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 2008 4(7): e1000136

We Need Scientist Management Tools

http://artbyvida.com/portfolio.php

Page 39: Ucsd library10182010

UCSD Libraries

Many Great Tools Out There

Oct. 18, 2010 We Need Scientist Management Tools

Taverna

Page 40: Ucsd library10182010

UCSD Libraries

Where I See the Problems

• The long tail is confused

• Lack of interoperability between the options

• The reward (publishing) is still removed from the available tools

Oct. 18, 2010 We Need Scientist Management Tools

Page 41: Ucsd library10182010

Science is Increasingly a Digital Workflow

Scientist

Idea

Experiment

Data

Conclusions

Publish

The Role of the Institution

Laboratory

Publisher

Page 42: Ucsd library10182010

Maybe The Line is Somewhere Else?

Scientist

Idea

Experiment

Data

Conclusions

Publish

Laboratory

Publisher

Institution

Lab Notebook

The Role of the Institution

Page 43: Ucsd library10182010

This Amounts to Publishing Workflows

But That Has its Problems

• Workflows are not linear

• Workflow : paper is not 1:1

• Confidentiality

• Peer review

• Infrastructure

• Community acceptance

• Reward system

The Role of the Institution

Page 44: Ucsd library10182010

Solutions to Publishing Workflows?

• New organizations (university as publisher?)

• Appropriate reward system

• Shared governance – author, institution, publisher

• Crowd sourcing the electronic printing press

The Role of the Institution

Page 45: Ucsd library10182010

Crowd Sourcing the Electronic Printing Press (aka Workshop: Beyond the PDF)

• Funded by DDCF, Microsoft, NCI, Sage Bionetworks:

• Aims:

– Define user requirements

– Establish a specification document

– Open source the development effort

– Have a commitment from a publisher to publish a research object using the system

– Act as an exemplar for what can be done

The Role of the Institution

Page 46: Ucsd library10182010

Logistics

• UC San Diego

• Jan 19-21, 2010

• Under the auspices of W3C

• FoRC will have a follow-on meeting

The Role of the Institution

Page 47: Ucsd library10182010

UCSD Libraries

Questions?

[email protected]

Oct. 18, 2010