
Page 1: The Role of Ontologies in Improved Scholarly Communication Philip E. Bourne University of California San Diego pbourne@ucsd.edu .

The Role of Ontologies in Improved Scholarly Communication

Philip E. Bourne
University of California San Diego

pbourne@ucsd.edu
http://www.sdsc.edu/pb

Page 2:

My Perspective …

• Ontology Developer (years ago – mmCIF - Bioinformatics 2002 18: 1280-128)
• Database Developer – RCSB PDB
• Supporter of open access (provided there is a business model) - editor in chief of PLoS Computational Biology
• Co-founder - SciVee Inc.
• I am becoming increasingly interested in scholarly communication
• I use ontologies to support this work

Page 3:

Objective Today

• Describe how we are using ontologies to try and improve scholarly communication

• Motivate you towards thinking about ontologies that should be developed

• Learn from you where we might spend our efforts

Page 4:

First Consider What Motivates Us to Improve Scholarly Communication

Page 5:

We Cannot Possibly Read a Fraction of the Papers We Should

Drivers of Change

Renear & Palmer 2009 Science 325:828-832

Page 6:

Hence We Are Scanning More, Reading Less

Renear & Palmer 2009 Science 325:828-832

Drivers of Change

Page 7:

The Truth About the Scientific eLaboratory

• I have ?? mail folders!

• The intellectual memory of my laboratory is in those folders

• This is an unhealthy hub and spoke mentality

Drivers of Change

Page 8:

The Truth About the Scientific eLaboratory

• I generate way more negative than positive data, but where is it?

• Content management is a mess
– Slides, posters…..
– Data, lab notebooks ….
– Collaborations, Journal clubs …

• Software is open but where is it?

• Farewell is for the data too

Drivers of Change

Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136

Page 9:

Data and the Publication Are Disjoint

• PubMed contains 18,792,257 entries

• ~100,000 papers indexed per month

• In Feb 2009:
– 67,406,898 interactive searches were done
– 92,216,786 entries were viewed

• 1078 databases reported in NAR 2008

• MetaBase http://biodatabase.org reports 2,651 entries edited 12,587 times

Biosciences Data as of April 14, 2009

Drivers of Change

Page 10:

Publishing Limitations

• A paper is an artifact of a previous era
• It is not the logical end product of eScience, hence:
– Work is omitted
– Article vs supplement is a mess
– Visualization may be limited
– Interaction and enquiry are non-existent
– Rich media can help, but are rarely used

Drivers of Change

Page 11:

We Need to do Better & The Game is Afoot

It is being driven from the top down and the bottom up

Page 12:

Ontologies & Semantic Tagging

Page 13:

[Diagram: BioLit data extraction/storage. Labels: Database IDs; Ontology terms; Text excerpts; Other…; BioLit MySQL database; XML; XML, Meta-data; <web services>; web; external databases]

Semantic Tagging

Page 14:

Tagging of PubMed Central

• Ontologies read from OBO Files
• Words converted to tree structures
• Matched to every non-trivial word in the paper
• Matches tagged
• A long paper can be matched to GO in less than 30 seconds

Semantic Tagging http://biolit.ucsd.edu
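The tagging steps on this slide can be illustrated with a minimal sketch. It is not the BioLit implementation: it assumes the "tree structure" is a word-level trie, and it uses a two-term OBO-style excerpt as sample data. The function names (read_obo_terms, build_trie, tag) are hypothetical. Every word is tried as a match start, so no stop-word filter for "non-trivial" words is included here.

```python
import re
from typing import Dict, List, Tuple

# Tiny OBO-style excerpt, used only for illustration.
SAMPLE_OBO = """\
[Term]
id: GO:0006915
name: apoptotic process

[Term]
id: GO:0006412
name: translation
"""


def tokenize(text: str) -> List[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def read_obo_terms(text: str) -> Dict[str, str]:
    """Return {term name: term id} for every [Term] stanza."""
    terms: Dict[str, str] = {}
    current_id = None
    in_term = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[4:]
        elif in_term and line.startswith("name: ") and current_id:
            terms[line[6:]] = current_id
    return terms


def build_trie(terms: Dict[str, str]) -> dict:
    """Build a word-level trie; the key "$id" marks the end of a term."""
    root: dict = {}
    for name, term_id in terms.items():
        node = root
        for word in tokenize(name):
            node = node.setdefault(word, {})
        node["$id"] = term_id
    return root


def tag(text: str, trie: dict) -> List[Tuple[str, str]]:
    """Return (matched phrase, term id) pairs, preferring the longest match."""
    words = tokenize(text)
    matches: List[Tuple[str, str]] = []
    i = 0
    while i < len(words):
        node, j, best = trie, i, None
        while j < len(words) and words[j] in node:
            node = node[words[j]]
            j += 1
            if "$id" in node:
                best = (j, node["$id"])
        if best:
            end, term_id = best
            matches.append((" ".join(words[i:end]), term_id))
            i = end
        else:
            i += 1
    return matches


trie = build_trie(read_obo_terms(SAMPLE_OBO))
print(tag("Caspases drive the apoptotic process and block translation.", trie))
```

Because each word is looked up by a single dictionary access per trie level, the cost grows roughly with the length of the paper rather than with the size of the ontology, which is one plausible reason a tree structure suits this task.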

Page 15:

Semantic Tagging http://biolit.ucsd.edu

Page 16:

ICTP Trieste, December 10, 2007

Semantic Tagging http://biolit.ucsd.edu

Page 17:

Provision of web services for this tagging may be the most valuable contribution.

Semantic Tagging

Page 18:

Database & Literature Integration

www.rcsb.org/pdb/explore/literature.do?structureId=1TIM

Context

BMC Bioinformatics 2010 11:220

Semantic Tagging

Page 19:

Semantic Tagging of Database Content

http://www.pdb.org

PLoS Comp. Biol. 6(2) e1000673

Semantic Tagging

Page 20:

Automatic Knowledge Discovery for Those with No Time to Read

Immunology Literature

Cardiac Disease Literature

Shared Function

Semantic Tagging
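One way to read the "shared function" idea is as the overlap between the ontology terms tagged in two bodies of literature. The sketch below assumes each paper has already been reduced to a set of term IDs. The paper names and term IDs are placeholders rather than real data, and the function names are hypothetical.

```python
from typing import Dict, Set


def terms_in_corpus(tagged_papers: Dict[str, Set[str]]) -> Set[str]:
    """Collect every term ID that appears in any paper of the corpus."""
    terms: Set[str] = set()
    for paper_terms in tagged_papers.values():
        terms |= paper_terms
    return terms


def shared_terms(corpus_a: Dict[str, Set[str]],
                 corpus_b: Dict[str, Set[str]]) -> Set[str]:
    """Return the term IDs that occur in both corpora."""
    return terms_in_corpus(corpus_a) & terms_in_corpus(corpus_b)


# Placeholder data: paper name -> term IDs found by semantic tagging.
immunology = {"paper-I1": {"TERM:1", "TERM:2"}, "paper-I2": {"TERM:3"}}
cardiac = {"paper-C1": {"TERM:3", "TERM:4"}, "paper-C2": {"TERM:2"}}

print(sorted(shared_terms(immunology, cardiac)))
```

A term that appears in both sets points to a function discussed in both fields, which a reader of only one field's literature would be unlikely to notice.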

Page 21:

This is Literature Post-processing

Better to Get the Authors Involved

• Authors are the absolute experts on the content

• More effective distribution of labor

• Add metadata before the article enters the publishing process

BMC Bioinformatics 2010 11:103

Semantic Tagging

Page 22:

Word 2007 Add-in for Authors

• Allows authors to add metadata as they write, before they submit the manuscript

• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs

• Metadata are embedded directly into the manuscript document via XML tags, OOXML format
– Open
– Machine-readable

• Open source, Microsoft Public License

http://www.codeplex.com/ucsdbiolit

Drivers of Change
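To make "metadata embedded via XML tags" concrete, here is a small sketch that wraps a recognized term in an XML element carrying its ontology and identifier. The element and attribute names (p, term, ontology, id) are invented for illustration and are not the add-in's actual OOXML schema. The mark_up function is likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def mark_up(sentence: str, term: str, ontology: str, term_id: str) -> ET.Element:
    """Wrap the first occurrence of `term` in a hypothetical <term> element."""
    start = sentence.find(term)
    if start < 0:
        raise ValueError(f"term not found: {term!r}")
    para = ET.Element("p")
    para.text = sentence[:start]  # text before the tagged term
    tagged = ET.SubElement(para, "term", {"ontology": ontology, "id": term_id})
    tagged.text = term
    tagged.tail = sentence[start + len(term):]  # text after the tagged term
    return para


sentence = "The protein regulates translation in yeast."
para = mark_up(sentence, "translation", "GO", "GO:0006412")
xml_text = ET.tostring(para, encoding="unicode")
print(xml_text)
```

The readable text is unchanged when the tags are stripped, while a program can find every tagged term and its identifier without any further text mining.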

Page 23:

Word 2007 Add-in Example of What it Looks Like - Ontologies

• Inline Recognition, Highlighting, and Mark-up of Informative Terms
– A recognized term will have a dotted, purple underline
– Hovering generates a Smart Tag above the term
  • add mark-up for this term
  • ignore this term
  • view the term in the ontology browser
  • If a recognized term appears in more than one ontology, all instances of that term will be listed
– Hovering over a marked-up term
  • option to apply mark-up to all recognized instances of term
  • stop recognizing a term
– Pass ontology terms back to provider

Semantic Tagging BMC Bioinformatics 2010 11:103

Page 24:

• Built-in Knowledge of Ontologies and Databases
– Add-in provides a list of biomedical ontologies to download
– and a list of databases for ID recognition (GenBank/RefSeq, UniProt, Protein Data Bank)
– A user may also supply a URL to download other ontologies

• Ontology Browser
– allows a user to select an ontology and then navigate through it to view terms and their relationships

BMC Bioinformatics 2010 11:103

Page 25:

Custom Metadata

• Ontologies do not contain all usages of a concept
• Add-in allows user to assign custom metadata

• Human Disease Ontology term: Leukemia, T-Cell, HTLV-II-Associated
• Synonym: Atypical hairy cell leukemia (disorder)
• Actual use in literature:
– hairy cell leukemia
– hairy-cell leukemia
– hairy T cell leukemia
– T cell hairy leukemia

BMC Bioinformatics 2010 11:103

Page 26:

Synonym mapping, disambiguation

• Inclusion of an additional set of synonyms for a term that reflect its use in natural language
– Automated finding of synonyms in extant literature
– Gather synonyms from term-mapping databases

• Incorporate a more sophisticated term recognition approach into the add-in

BMC Bioinformatics 2010 11:103
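The synonym idea can be sketched as a lookup table from normalized surface forms to one ontology term. The example reuses the Human Disease Ontology term, synonym, and literature variants quoted on the Custom Metadata slide. The normalization rule (lower-casing, hyphens treated as spaces) and the function names are assumptions made for illustration, not the add-in's method.

```python
import re
from typing import Dict, Optional

# Ontology term and surface forms as quoted on the Custom Metadata slide.
CANONICAL = "Leukemia, T-Cell, HTLV-II-Associated"
VARIANTS = [
    "Atypical hairy cell leukemia (disorder)",
    "hairy cell leukemia",
    "hairy-cell leukemia",
    "hairy T cell leukemia",
    "T cell hairy leukemia",
]


def normalize(phrase: str) -> str:
    """Lower-case, treat hyphens as spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[-\s]+", " ", phrase.lower()).split())


# Map every normalized variant to the single ontology term.
SYNONYMS: Dict[str, str] = {normalize(v): CANONICAL for v in VARIANTS}


def lookup(phrase: str) -> Optional[str]:
    """Return the ontology term for a phrase, or None if it is unknown."""
    return SYNONYMS.get(normalize(phrase))


print(lookup("Hairy-Cell  Leukemia"))
print(lookup("acute myeloid leukemia"))
```

Normalizing before lookup lets one stored synonym cover several spellings, but it does not resolve reordered words such as "T cell hairy leukemia", which still need their own entry.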

Page 27:

Challenges

• Author use
– Familiarity with ontologies, terms
– Agreement between co-authors

• End-use of semantically enriched manuscript

• Need to combine with NLM XML standard

Semantic Tagging BMC Bioinformatics 2010 11:103

Page 28:

Challenges: Author Use

IF one or more publishers fast-tracked papers that had semantic markup, I would argue it would catch on in no time

Semantic Tagging BMC Bioinformatics 2010 11:103

Page 29:

Where we Need {Better} Ontologies

1. To Support Mashups Between Different Types of Scholarly Output

Page 30:

Post-publication of Video and Paper

www.scivee.tv

Drivers of Change

Page 31:

Pubcast – Video Integrated with the Full Text of the Paper

Page 32:

Pubcasts - A Unique Technology

Don’t understand what you are reading? Click and have the author pop-up and explain it!

See the scientists and the experiments behind the research papers and textbooks

Pubcasts - A Blend of Video, text, tables, figures, PowerPoints, comments, ratings… ALL SYNCHRONIZED FOR RAPID LEARNING

Mashups – www.scivee.tv

Page 33:

Where we Need {Better} Ontologies

2. To Support Tagging of all Aspects of the Scholarly Product

Page 34:

Consider Today’s Academic Workflow

[Workflow diagram; labels: Research [Grants], Journal Article, Conference Paper, Poster Session, Feds, Societies, Publishers, Reviews, Blogs, Community Service/Data Curation]

What Should be Done?

Page 35:

Consider Tomorrow’s Academic Workflow

[Workflow diagram; labels: Research [Grants], Journal Article, Conference Paper, Poster Session, Feds, Societies, Publishers, Reviews, Blogs, Community Service/Data Curation, Ideas, Data, Hypotheses]

What Should be Done?

Page 36:

Maybe The Line is Somewhere Else?

Scientist

Idea

Experiment

Data

Conclusions

Publish

Laboratory

Publisher

Page 37:

Maybe The Line is Somewhere Else?

Scientist

Idea

Experiment

Data

Conclusions

Publish

What Should We Do?

Laboratory

Publisher

Institution

Lab Notebook

Page 38:

Crowd Sourcing the Electronic Printing Press (aka Workshop: Beyond the PDF)

• Proposal to the US National Science Foundation:

• Aims:
– Define user requirements
– Establish a specification document
– Open source the development effort
– Have a commitment from a publisher to publish a research object using the system
– Act as an exemplar for what can be done

Page 39:

Question: What if Everyone Had An Electronic Printing Press?

• Peer review might change?
• Bibliometrics might change?
• Business models will likely change?
• What happens to the database/literature divide?
• Societies might do more self publishing?
• We might have improved the dissemination of science, but will we have improved the comprehension?

Page 40:

General References

• What Do I Want from the Publisher of the Future PLoS Comp Biol http://www.sdsc.edu/pb

• Fourth Paradigm: Data Intensive Scientific Discovery http://research.microsoft.com/en-us/collaboration/fourthparadigm/

Page 41:

References to Exemplars

• Semantic Biochemical Journal - 2010: Using Utopia
• Article of the Future, Cell, 2009:
• Prospect, Royal Society of Chemistry, 2009:
• Adventures in Semantic Publishing, Oxford U, 2009:
• The Structured Digital Abstract, Seringhaus/Gerstein, 2008
• CWA Nanopublications – 2010

Page 42:

Acknowledgements

• BioLit Team
– Lynn Fink
– Parker Williams
– Marco Martinez
– Rahul Chandran
– Greg Quinn

• Microsoft Scholarly Communications
– Pablo Fernicola
– Lee Dirks
– Savas Parastatidis
– Alex Wade
– Tony Hey

• wwPDB team

• SciVee Team
– Apryl Bailey
– Tim Beck
– Leo Chalupa
– Lynn Fink
– Marc Friedman (CEO)
– Ken Liu
– Alex Ramos
– Willy Suwanto

http://www.scivee.tv
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit

Page 43:

Questions?

pbourne@ucsd.edu