The consequences for STM publishing

ASA – London – 22 February 2011

Jan Velterop – ACKnowledge Ltd.

What does

mean for STM publishing?

ACKnowledge

Nanopublication

GAME OVER!!!

Publishing ≠ knowledge transfer

Publishing = knowledge transfer

Why this change?

Record Keeping

Knowledge Transfer

Different interests

Why do scientists publish?

The Record of Science

1. The record*: “keeping the minutes of science”

*picture inspired by Geoffrey Bilder of CrossRef

2. Credit in the ego-system; the acknowledge economy*

*‘Acknowledge economy’ coined by Geoffrey Bilder of CrossRef

1 + 2 = the interface with officialdom

3. Transfer of information and knowledge

3 = the interface with science

Are the requirements for all three the same?

What we have may be good for the record and for credit...

...but is it satisfactory for the transfer of knowledge?

There is just too much to read

“Information consumes the attention of its recipients...

Datarrhoea?

Publicatarrh?

...hence a wealth of information creates a poverty of attention”

Herbert Simon

Should we have to make choices about what to read and what not?

Can we, truly?

How would we choose anyway?

Photo by: flickr - Rainbirder

Shouldn’t we take in ALL the knowledge in our area?

As well as satisfy the academic desire to avoid reading?

A new article in PubMed every 36 seconds
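As a rough check on this figure, the interval between articles can be converted into an annual count and back. This is only a sketch: it assumes a 365-day year and an even spread of articles over the year, and the function names are made up for illustration. The 830,000 used in the comparison is the 2009 figure cited in the article excerpt quoted on this slide.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def articles_per_year(seconds_between_articles: float) -> float:
    """Annual article count implied by one new article every N seconds."""
    return SECONDS_PER_YEAR / seconds_between_articles


def seconds_between_articles(articles_in_year: float) -> float:
    """Average interval in seconds implied by an annual article count."""
    return SECONDS_PER_YEAR / articles_in_year


if __name__ == "__main__":
    # One article every 36 seconds
    print(articles_per_year(36))
    # Interval implied by the "nearly 830,000 articles" cited for 2009
    print(round(seconds_between_articles(830_000), 1))
```

One article every 36 seconds works out to 876,000 a year, the same order of magnitude as the annual counts quoted in the excerpt.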

Scientists are struggling to make sense of the expanding scientific literature.

Corie Lok asks whether computational tools can do the hard work for them.

In 2002, when he began to make the transition from basic cell biology to research into Alzheimer’s disease, Virgil Muresan found himself all but overwhelmed by the sheer volume of literature on the disease. He and his wife, Zoia, both now at the University of Medicine and Dentistry of New Jersey in Newark, were hoping to test an idea that they had developed about the formation of the protein plaques in the brains of people with Alzheimer’s disease. But, as newcomers to the field, they were finding it almost impossible to figure out whether their hypothesis was consistent with existing publications. “It’s really difficult to be up to date with so much being published,” says Virgil Muresan.

And it’s a challenge that is increasingly facing researchers in every field. The 19 million citations and abstracts covered by the US National Library of Medicine’s PubMed search engine include nearly 830,000 articles published in 2009, up from some 814,000 in 2008 and around 772,000 in 2007. That growth rate shows no signs of abating, especially as emerging countries such as China and Brazil continue to ratchet up their research.

The Muresans, however, were able to make use of Semantic Web Applications in Neuromedicine (SWAN), one of a new generation of online tools designed to help researchers zero in on the papers most relevant to their interests, uncover connections and gaps that might not otherwise be obvious, and test and generate new hypotheses. “If you think about how much effort and money we put into just Alzheimer’s disease research, it is surprising that people don’t put more effort into harvesting the published knowledge,” says Elizabeth Wu, SWAN’s project manager. SWAN attempts to help researchers harvest that knowledge by providing a curated, browseable online repository of hypotheses in Alzheimer’s disease research. The hypothesis that the Muresans put into SWAN, for example, was that plaque formation begins when amyloid-β, the major component of brain plaques, forms seeds in the terminal regions of cells in the brainstem that then nucleate the plaques in the other parts of the brain into which the terminals reach. SWAN provides a visual, colour-coded display of the relationships between the hypotheses, as derived from the published literature, and shows where they may agree or conflict. The connections revealed by SWAN led the Muresans to new mouse-model experiments designed to strengthen their hypothesis. “SWAN has advanced our research, and focused it in a certain direction but also broadened it to other directions,” says Virgil Muresan.

The use of computers to help researchers drink from the literature firehose dates back to the early 1960s and the first experiments with techniques such as keyword searching. More recent efforts include the striking ‘maps of science’ that cluster papers together on the basis of how often they cite one another, or by similarities in the frequencies of certain keywords. As fascinating as these maps can be, however, they don’t get at the semantics of the papers – the fact that they are talking about specific entities such as genes and proteins, and making assertions about those entities (such as gene X regulates gene Y). The extraction of this kind of information is much harder to automate, because computers are notoriously poor at understanding what they are reading. Even so, informaticians and biologists are working together more and making considerable progress, says Maryann Martone, the chairwoman of the Society for Neuroscience’s neuroinformatics committee. Recently, a number of companies and academic researchers have begun to create tools that are useful for scientists, using various mixtures of automated analysis and manual curation (see ‘Power tools’, page 418).

Deeper meaning

The goal of these tools is to help researchers analyse and integrate the literature more efficiently than they can do through their own reading, to hone in on the most fruitful experiments to do and to make new predictions of gene functions, say, or drug side effects. The first step towards that goal is for the text- or semantic-mining tool to recognize key terms, or entities, such as genes and proteins. For example, academic publisher Elsevier, headquartered in Amsterdam, has piloted Reflect in two recent online issues of its journal Cell. The technology was developed at the European Molecular Biology Laboratory in Heidelberg, Germany, and won Elsevier’s Grand Challenge 2009 competition for new tools that improve the communication and use of scientific information. Reflect automatically recognizes and highlights the names of genes, proteins and small molecules in the Cell articles. Users clicking on a highlighted term will see a pop-up box containing information related to that term, such as sequence data and molecular structures, along with links to the sources of the data. Reflect obtains this information from its dictionary of millions of proteins and small molecules.

Such ‘entity recognition’ can be done fairly accurately by many mining tools today. But other tools take on the tougher challenge of recognizing relationships between the entities. Researchers from Leiden University and Erasmus University in Rotterdam, both in the Netherlands, have developed software called Peregrine, and used it to predict an undocumented interaction between two proteins:

“A new entrant in the field would need 19 years and 202 days to read all the relevant literature”

(Assumption: reading 5 articles an hour, 8 hours a day, 5 days a week, 50 weeks a year)

Ergo: nobody can have a comprehensive overview

• Shared knowledge between scientists is an illusion

• The chance that a specialist reading one paper a day will read a particular paper is only 1 in 8.9

• The chance that a colleague elsewhere will read the same paper in a given year is only 1 in 79

Fraser, A.G. and Dunstan, F.D. (2010), ‘On the impossibility of being expert’. BMJ 2010; 341:c6815, doi: 10.1136/bmj.c6815.
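The reading assumption above fixes a yearly throughput, which makes the arithmetic easy to reproduce. The sketch below is illustrative only: the function name is invented, and it assumes that "days" in the quoted figure means working days of 8 hours. The corpus size of 198,080 is back-calculated here to reproduce "19 years and 202 days" and is not a number taken from the slide or the paper.

```python
ARTICLES_PER_HOUR = 5
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 50

ARTICLES_PER_DAY = ARTICLES_PER_HOUR * HOURS_PER_DAY          # 40
WORKING_DAYS_PER_YEAR = DAYS_PER_WEEK * WEEKS_PER_YEAR        # 250
ARTICLES_PER_YEAR = ARTICLES_PER_DAY * WORKING_DAYS_PER_YEAR  # 10,000


def reading_time(n_articles: int) -> tuple[int, int]:
    """Return (years, working days) needed to read n_articles.

    Assumption: partial days are rounded up to a whole working day.
    """
    working_days = -(-n_articles // ARTICLES_PER_DAY)  # ceiling division
    return divmod(working_days, WORKING_DAYS_PER_YEAR)


if __name__ == "__main__":
    print(ARTICLES_PER_YEAR)      # articles read per year at this pace
    print(reading_time(198_080))  # back-calculated corpus size (assumption)
```

At this pace a reader gets through 10,000 articles a year, so any field producing more than that annually can never be read in full by one person.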

Current publishing: needle transport

Photo by Stewsnews

There is no way we can read everything – not even when it’s highly relevant

Create an overview first, perhaps?

And then home in on detail?

Back to the question: what does it all mean for publishing?

Content is King

Or “was” perhaps?

Though the emphasis is more and more on service, content still counts

Science literature is full of assertions

Most ‘concrete’ ones have the form of ‘triples’:

Subject Predicate Object

Concepts

Some examples of ‘triples’:

CAPN3_00265 – has pathogenicity – LGMD2A

Dystrophin – interacts with – SNT3

Calpain-3 – has GO annotation – proteasome endopeptidase activity
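A triple can be held in code as a three-field record. The sketch below is a minimal illustration, not the author's data model: the `Triple` type and the `about` helper are invented names, and only the Dystrophin/SNT3 pairing is stated explicitly elsewhere in the deck. The other two pairings are inferred from the order in which the labels appear on this slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One assertion: subject, predicate, object (field names are illustrative)."""
    subject: str
    predicate: str
    obj: str


# Pairings for the first and third rows are inferred from the slide layout.
TRIPLES = [
    Triple("CAPN3_00265", "has pathogenicity", "LGMD2A"),
    Triple("Dystrophin", "interacts with", "SNT3"),
    Triple("Calpain-3", "has GO annotation", "proteasome endopeptidase activity"),
]


def about(concept: str, triples: list[Triple] = TRIPLES) -> list[Triple]:
    """Return every triple in which the concept is the subject or the object."""
    return [t for t in triples if concept in (t.subject, t.obj)]


if __name__ == "__main__":
    for triple in about("SNT3"):
        print(triple.subject, triple.predicate, triple.obj)
```

Because each assertion is a small, uniform record, a collection of them can be filtered and combined by machine without anyone reading the papers they came from.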

Identifier: d63dd9a2-5c8c-11df-b0cb-001517ac506c

d0e6b292-5c61-11df-b0cb-001517ac506c dcf809bb-1a01-468a-a316-3b08de22dd46 b2437eb4-5ec4-11df-b0cb-001517ac506c

Concepts: unambiguously identified
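The identifiers shown on this slide have the shape of UUIDs. The sketch below does two things under stated assumptions: it parses the four identifiers from the slide with Python's standard `uuid` module, and it shows how several labels for one concept could resolve to a single identifier. The synonym table, the `example.org` namespace, and the `concept_id` function are hypothetical, and CAPN3 is treated as the same concept as Calpain-3 purely for illustration. This is not the concept store used in the author's system.

```python
import uuid

# The four identifiers as they appear on the slide.
SLIDE_IDENTIFIERS = [
    "d63dd9a2-5c8c-11df-b0cb-001517ac506c",
    "d0e6b292-5c61-11df-b0cb-001517ac506c",
    "dcf809bb-1a01-468a-a316-3b08de22dd46",
    "b2437eb4-5ec4-11df-b0cb-001517ac506c",
]

# Hypothetical namespace and synonym table, for illustration only.
_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/concepts")
_SYNONYMS = {
    "calpain-3": "calpain-3",
    "calpain 3": "calpain-3",
    "capn3": "calpain-3",
    "dystrophin": "dystrophin",
}


def uuid_versions(identifiers: list[str]) -> list[int]:
    """Parse each identifier as a UUID and return its version number."""
    return [uuid.UUID(text).version for text in identifiers]


def concept_id(label: str) -> uuid.UUID:
    """Map a label (any known synonym) to one stable identifier."""
    canonical = _SYNONYMS[label.strip().lower()]
    return uuid.uuid5(_NAMESPACE, canonical)


if __name__ == "__main__":
    print(uuid_versions(SLIDE_IDENTIFIERS))
    print(concept_id("CAPN3") == concept_id("Calpain-3"))
```

The point of the identifier is that an assertion refers to the concept itself, so spelling variants and synonyms in the literature no longer produce separate, unconnected statements.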

‘Assertions’

Dystrophin interacts with SNT3

Adding Attributes to Assertions

Examples of attributes:

assertedBy - states which entity asserted (i.e. created) the statement

curatedBy - states that a specified entity has curated the statement

isPeerReviewed - states that this statement has been peer reviewed

isPublished - states where this statement was first published

isEvidencedBy - states that another statement, Y, should be considered evidence for this statement X

createdOn - states the date/time that the statement was created

hasAuthor - states who claims authorship of the statement

isApprovedBy - states who approves of the statement

isDeprecatedBy - states that the statement is no longer in use by the entity in question
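The attribute list above can be read as the provenance fields that travel with an assertion. The sketch below is one possible in-memory shape, offered as an assumption rather than the published nanopublication model: the class name, the field types, and the example values ("example-curator", the date) are all invented for illustration. Only the attribute names and the Dystrophin/SNT3 assertion come from the slides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Nanopublication:
    """An assertion (subject, predicate, object) plus the attributes listed above."""
    assertion: tuple[str, str, str]
    asserted_by: str                       # assertedBy
    created_on: datetime                   # createdOn
    has_author: Optional[str] = None       # hasAuthor
    curated_by: Optional[str] = None       # curatedBy
    is_peer_reviewed: bool = False         # isPeerReviewed
    is_published: Optional[str] = None     # isPublished: where first published
    is_evidenced_by: list[str] = field(default_factory=list)   # isEvidencedBy
    is_approved_by: list[str] = field(default_factory=list)    # isApprovedBy
    is_deprecated_by: list[str] = field(default_factory=list)  # isDeprecatedBy

    def is_deprecated(self) -> bool:
        """True once at least one entity has stopped using the statement."""
        return bool(self.is_deprecated_by)


# Hypothetical example values; only the assertion itself appears in the slides.
example = Nanopublication(
    assertion=("Dystrophin", "interacts with", "SNT3"),
    asserted_by="example-curator",
    created_on=datetime(2011, 2, 22, tzinfo=timezone.utc),
)

if __name__ == "__main__":
    print(example.assertion, example.is_peer_reviewed, example.is_deprecated())
```

Keeping the attributes separate from the assertion means that two sources making the same claim yield the same triple with different provenance, which is what allows evidence to be counted.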

Let’s call them ‘Nanopublications’

In: The anatomy of a nanopublication, Paul Groth, Andrew Gibson, Jan Velterop

http://iospress.metapress.com/content/ftkh21q50t521wm2/fulltext.html

[Diagram: system components]

• Evidence Store

• Data store 1, Data store 2, Data store n

• Import, Integration, Notification API

• Identity mapping service (IRS service layer)

• Reasoning / Integration

• Concept identifier store (IRS data layer). Also associating with semantic type. Not more!

• Linked Data Cache (Cardinal Assertions)

• Provenance Store

• Evidence Calculator

[Diagram: system components, with interfaces added]

• Evidence Store

• Data store 1, Data store 2, Data store n

• Import, Integration, Notification API

• Identity mapping service (IRS service layer)

• Reasoning / Integration

• User Interfaces / Web services

• Concept identifier store (IRS data layer). Also associating with semantic type. Not more!

• Linked Data Cache (Cardinal Assertions)

• Provenance Store

• Evidence Calculator

• Curation Interfaces

Publishers:

• Incorporate nanopublications in your content

• Help disseminate the interfaces (HTML as well as PDF)

Nanopublications are not only assertions

Nanopublications are also references

i.e. they can be cited: good for impact & acknowledgement

And... aren’t references open and free?

<rdf:Description rdf:about="http://www.nbic.nl/cwa/relation/#26277419#13817745">
  <cwa:typeRelation rdf:resource="http://predicate.conceptwiki.org/index.php/#2121378"/>
  <cwa:direction>1,2</cwa:direction>
  <cwa:strength>1.0</cwa:strength>
  <cwa:author rdf:resource="http://people.conceptwiki.org/index.php/#85094810"/>
  <cwa:provenance rdf:resource="http://article.conceptwiki.org/index.php/#121646370"/>
  <cwa:timestamp>1240641052059</cwa:timestamp>
  <cwa:annotated_by rdf:resource="http://people.conceptwiki.org/index.php/#43065817"/>
  <cwa:annotation rdf:resource="http://www.virusdb.org/viruses/av/Heliothis_virescens_insect"/>
</rdf:Description>
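To show that a fragment like this is machine-readable, the sketch below parses a cleaned-up copy of it (straight quotes, `rdf:resource` on the last element, and that element closed) with Python's standard XML parser. Several things here are assumptions: the `cwa` namespace URI is a placeholder because the slide does not declare one, the wrapping `rdf:RDF` element is added only so the fragment parses, and the timestamp is interpreted as milliseconds since the Unix epoch, which the slide does not state.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CWA_NS = "http://example.org/cwa#"  # placeholder: the real namespace is not on the slide

FRAGMENT = """
<rdf:Description rdf:about="http://www.nbic.nl/cwa/relation/#26277419#13817745">
  <cwa:typeRelation rdf:resource="http://predicate.conceptwiki.org/index.php/#2121378"/>
  <cwa:direction>1,2</cwa:direction>
  <cwa:strength>1.0</cwa:strength>
  <cwa:author rdf:resource="http://people.conceptwiki.org/index.php/#85094810"/>
  <cwa:provenance rdf:resource="http://article.conceptwiki.org/index.php/#121646370"/>
  <cwa:timestamp>1240641052059</cwa:timestamp>
  <cwa:annotated_by rdf:resource="http://people.conceptwiki.org/index.php/#43065817"/>
  <cwa:annotation rdf:resource="http://www.virusdb.org/viruses/av/Heliothis_virescens_insect"/>
</rdf:Description>
"""


def parse_fragment(fragment: str) -> dict[str, str]:
    """Return the fragment's properties as a flat dict keyed by local element name."""
    wrapped = f'<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cwa="{CWA_NS}">{fragment}</rdf:RDF>'
    description = ET.fromstring(wrapped).find(f"{{{RDF_NS}}}Description")
    record = {"about": description.get(f"{{{RDF_NS}}}about")}
    for child in description:
        name = child.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
        resource = child.get(f"{{{RDF_NS}}}resource")
        record[name] = resource if resource is not None else (child.text or "").strip()
    return record


record = parse_fragment(FRAGMENT)
# Assumption: the timestamp counts milliseconds since 1970-01-01 UTC.
created = datetime.fromtimestamp(int(record["timestamp"]) / 1000, tz=timezone.utc)

if __name__ == "__main__":
    print(record["typeRelation"], record["strength"], created.date())
```

Under the millisecond assumption the timestamp falls on 25 April 2009.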

The whole picture

Even if you don’t have all the detail

Nanopublications (i.e. references) can also be used to reason

protein A

protein X

‘Publications’ yield ‘data’

Exposing the ‘unknown knowns’
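One simple form such reasoning could take is linking two concepts that are never mentioned together but share an intermediary. The sketch below illustrates that idea only and is an assumption about what the slide depicts: "protein B" is an invented intermediary, the function name is hypothetical, and real systems would weigh evidence rather than accept every indirect link.

```python
from collections import defaultdict
from itertools import combinations


def indirect_candidates(assertions: list[tuple[str, str, str]]) -> set[tuple[str, str]]:
    """Return pairs that share an interaction partner but have no direct assertion."""
    neighbours: dict[str, set[str]] = defaultdict(set)
    for subject, predicate, obj in assertions:
        if predicate == "interacts with":
            neighbours[subject].add(obj)
            neighbours[obj].add(subject)

    candidates = set()
    for partners in list(neighbours.values()):
        # Every pair of partners of the same intermediary is a candidate link.
        for first, second in combinations(sorted(partners), 2):
            if second not in neighbours.get(first, set()):
                candidates.add((first, second))
    return candidates


# "protein B" is a made-up intermediary; the slide only names protein A and protein X.
ASSERTIONS = [
    ("protein A", "interacts with", "protein B"),
    ("protein B", "interacts with", "protein X"),
]

if __name__ == "__main__":
    print(indirect_candidates(ASSERTIONS))
```

Each candidate is a hypothesis to test, not a finding: the two assertions may come from papers that no single reader has seen side by side.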

What is the use of water?

What is the use of information?

°Celsius

Ice

Water

The climate in the scholarly communication world is clearly hotting up

Thank you

jan.velterop at acknowledgeconnect.com

velterop at conceptweballiance.org