Post on 23-Jan-2016
The consequences for STM publishing
ASA – London – 22 February 2011
Jan Velterop – ACKnowledge Ltd.
What does ‘Nanopublication’ mean for STM publishing?
ACKnowledge
GAME OVER!!!
Publishing ≠ knowledge transfer
Publishing = knowledge transfer
Why this change?
Record Keeping
Knowledge Transfer
Different interests
Why do scientists publish?
The Record of Science
1. The record*: “keeping the minutes of science”
*Picture inspired by Geoffrey Bilder of CrossRef
2. Credit in the ego-system; the acknowledge economy*
*‘Acknowledge economy’ coined by Geoffrey Bilder of CrossRef
1 + 2 = the interface with officialdom
3. Transfer of information and knowledge
3 = the interface with science
Are the requirements for all three the same?
What we have may be good for the record and for credit...
...but is it satisfactory for the transfer of knowledge?
There is just too much to read
“Information consumes the attention of its recipients...
Datarrhoea?
Publicatarrh?
...hence a wealth of information creates a poverty of attention”
Herbert Simon
Should we have to make choices about what to read and what not?
Can we, truly?
How would we choose anyway?
Photo by: flickr - Rainbirder
Shouldn’t we take in ALL the knowledge in our area?
As well as satisfy the academic desire to avoid reading?
A new article in PubMed every 36 seconds
Scientists are struggling to make sense of the expanding scientific literature.
Corie Lok asks whether computational tools can do the hard work for them.
In 2002, when he began to make the transition from basic cell biology to research into Alzheimer’s disease, Virgil Muresan found himself all but overwhelmed by the sheer volume of literature on the disease. He and his wife, Zoia, both now at the University of Medicine and Dentistry of New Jersey in Newark, were hoping to test an idea that they had developed about the formation of the protein plaques in the brains of people with Alzheimer’s disease. But, as newcomers to the field, they were finding it almost impossible to figure out whether their hypothesis was consistent with existing publications. “It’s really difficult to be up to date with so much being published,” says Virgil Muresan.
And it’s a challenge that is increasingly facing researchers in every field. The 19 million citations and abstracts covered by the US National Library of Medicine’s PubMed search engine include nearly 830,000 articles published in 2009, up from some 814,000 in 2008 and around 772,000 in 2007. That growth rate shows no signs of abating, especially as emerging countries such as China and Brazil continue to ratchet up their research.
The Muresans, however, were able to make use of Semantic Web Applications in Neuromedicine (SWAN), one of a new generation of online tools designed to help researchers zero in on the papers most relevant to their interests, uncover connections and gaps that might not otherwise be obvious, and test and generate new hypotheses. “If you think about how much effort and money we put into just Alzheimer’s disease research, it is surprising that people don’t put more effort into harvesting the published knowledge,” says Elizabeth Wu, SWAN’s project manager. SWAN attempts to help researchers harvest that knowledge by providing a curated, browseable online repository of hypotheses in Alzheimer’s disease research.
The hypothesis that the Muresans put into SWAN, for example, was that plaque formation begins when amyloid-β, the major component of brain plaques, forms seeds in the terminal regions of cells in the brainstem that then nucleate the plaques in the other parts of the brain into which the terminals reach. SWAN provides a visual, colour-coded display of the relationships between the hypotheses, as derived from the published literature, and shows where they may agree or conflict. The connections revealed by SWAN led the Muresans to new mouse-model experiments designed to strengthen their hypothesis. “SWAN has advanced our research, and focused it in a certain direction but also broadened it to other directions,” says Virgil Muresan.
The use of computers to help researchers drink from the literature firehose dates back to the early 1960s and the first experiments with techniques such as keyword searching. More recent efforts include the striking ‘maps of science’ that cluster papers together on the basis of how often they cite one another, or by similarities in the frequencies of certain keywords. As fascinating as these maps can be, however, they don’t get at the semantics of the papers: the fact that they are talking about specific entities such as genes and proteins, and making assertions about those entities (such as gene X regulates gene Y). The extraction of this kind of information is much harder to automate, because computers are notoriously poor at understanding what they are reading.
Even so, informaticians and biologists are working together more and making considerable progress, says Maryann Martone, the chairwoman of the Society for Neuroscience’s neuroinformatics committee. Recently, a number of companies and academic researchers have begun to create tools that are useful for scientists, using various mixtures of automated analysis and manual curation (see ‘Power tools’, page 418).
Deeper meaning
The goal of these tools is to help researchers analyse and integrate the literature more efficiently than they can do through their own reading, to hone in on the most fruitful experiments to do and to make new predictions of gene functions, say, or drug side effects. The first step towards that goal is for the text- or semantic-mining tool to recognize key terms, or entities, such as genes and proteins.
For example, academic publisher Elsevier, headquartered in Amsterdam, has piloted Reflect in two recent online issues of its journal Cell. The technology was developed at the European Molecular Biology Laboratory in Heidelberg, Germany, and won Elsevier’s Grand Challenge 2009 competition for new tools that improve the communication and use of scientific information. Reflect automatically recognizes and highlights the names of genes, proteins and small molecules in the Cell articles. Users clicking on a highlighted term will see a pop-up box containing information related to that term, such as sequence data and molecular structures, along with links to the sources of the data. Reflect obtains this information from its dictionary of millions of proteins and small molecules.
Such ‘entity recognition’ can be done fairly accurately by many mining tools today. But other tools take on the tougher challenge of recognizing relationships between the entities. Researchers from Leiden University and Erasmus University in Rotterdam, both in the Netherlands, have developed software called Peregrine, and used it to predict an undocumented interaction between two proteins:
“A new entrant in the field would need 19 years and 202 days to read all the relevant literature”
(Assumption: reading 5 articles an hour, 8 hours a day, 5 days a week, 50 weeks a year)
Ergo: nobody can have a comprehensive overview
• Shared knowledge between scientists is an illusion
• The chance that a specialist reading one paper a day will read a particular paper is only 1 in 8.9
• The chance that a colleague elsewhere will read the same paper in a given year is only 1 in 79
Fraser, A.G. and Dunstan, F.D. (2010), ‘On the impossibility of being expert’. BMJ 2010; 341:c6815, doi: 10.1136/bmj.c6815.
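The reading-rate assumption quoted above can be worked through in a few lines. This is only a sketch of the arithmetic: the slide does not say how many articles make up "all the relevant literature", nor whether the "202 days" are calendar days or reading days, so the backlog figure below rests on the labelled assumption that they are reading days.

```python
# Reading rate under the assumption stated on the slide.
ARTICLES_PER_HOUR = 5
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 50

articles_per_day = ARTICLES_PER_HOUR * HOURS_PER_DAY           # 40 articles
reading_days_per_year = DAYS_PER_WEEK * WEEKS_PER_YEAR          # 250 days
articles_per_year = articles_per_day * reading_days_per_year    # 10,000 articles


def implied_backlog(years: int, extra_days: int) -> int:
    """Articles read in `years` full reading years plus `extra_days` more days.

    Assumption (not stated in the source): the extra days are reading days
    of 40 articles each, not calendar days.
    """
    return years * articles_per_year + extra_days * articles_per_day


print(articles_per_year)         # 10000
print(implied_backlog(19, 202))  # 198080
```

Under that reading, "19 years and 202 days" corresponds to a backlog of roughly 200,000 articles for a single field.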
Current publishing: needle transport
Photo by Stewsnews
There is no way we can read everything – not even when it’s highly relevant
Create an overview first, perhaps?
And then home in on detail?
Back to the question: what does it all mean for publishing?
Content is King
Or “was” perhaps?
Though the emphasis is more and more on service, content still counts
Science literature is full of assertions
Most ‘concrete’ ones have the form of ‘triples’:
Subject Predicate Object
Concepts
Some examples of ‘triples’:
CAPN3_00265    has pathogenicity    LGMD2A
Dystrophin     interacts with       SNT3
Calpain-3      has GO annotation    proteasome endopeptidase activity
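As a rough illustration of how such triples might be held in code, the sketch below stores each one as a small named tuple and looks up every statement that mentions a given concept. The `Triple` type and the `about` helper are hypothetical names introduced here, and the pairing of the slide's terms into three triples follows the order in which they appear, which is an assumption.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One assertion: two concepts joined by a predicate (hypothetical type)."""
    subject: str
    predicate: str
    obj: str


# Pairing assumed from the order of the terms on the slide.
TRIPLES = [
    Triple("CAPN3_00265", "has pathogenicity", "LGMD2A"),
    Triple("Dystrophin", "interacts with", "SNT3"),
    Triple("Calpain-3", "has GO annotation", "proteasome endopeptidase activity"),
]


def about(concept: str, triples: List[Triple]) -> List[Triple]:
    """Return every triple in which the concept is the subject or the object."""
    return [t for t in triples if concept in (t.subject, t.obj)]


for t in about("SNT3", TRIPLES):
    print(t.subject, "|", t.predicate, "|", t.obj)
```

Because every triple has the same three-part shape, a query like this works across assertions from any source without knowing anything about the articles they came from.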
Identifier: d63dd9a2-5c8c-11df-b0cb-001517ac506c
d0e6b292-5c61-11df-b0cb-001517ac506c
dcf809bb-1a01-468a-a316-3b08de22dd46
b2437eb4-5ec4-11df-b0cb-001517ac506c
Concepts: unambiguously identified
‘Assertions’
Dystrophin interacts with SNT3
Adding Attributes to Assertions
Examples of attributes:
assertedBy - states which entity asserted (i.e. created) the statement
curatedBy - states that a specified entity has curated the statement
isPeerReviewed - states that this statement has been peer reviewed
isPublished - states where this statement was first published
isEvidencedBy - states that another statement, Y, should be considered evidence for this statement X
createdOn - states the date/time that the statement was created
hasAuthor - states who claims authorship of the statement
isApprovedBy - states who approves of the statement
isDeprecatedBy - states that the statement is no longer in use by the entity in question
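One way to picture an assertion carrying these attributes is a small record type, sketched below. The class names, field types and example values ("example-lab", "example-database", the date) are placeholders chosen for illustration and are not taken from the source; only the attribute names mirror the list above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class Assertion:
    """A bare subject-predicate-object statement."""
    subject: str
    predicate: str
    obj: str


@dataclass
class AttributedAssertion:
    """An assertion plus the attributes listed on the slide (hypothetical layout)."""
    assertion: Assertion
    asserted_by: str                          # assertedBy
    created_on: datetime                      # createdOn
    has_author: Optional[str] = None          # hasAuthor
    is_published: Optional[str] = None        # isPublished: where first published
    is_peer_reviewed: bool = False            # isPeerReviewed
    curated_by: List[str] = field(default_factory=list)              # curatedBy
    is_approved_by: List[str] = field(default_factory=list)          # isApprovedBy
    is_deprecated_by: List[str] = field(default_factory=list)        # isDeprecatedBy
    is_evidenced_by: List[Assertion] = field(default_factory=list)   # isEvidencedBy

    def in_use_by(self, entity: str) -> bool:
        """True unless the given entity has deprecated this statement."""
        return entity not in self.is_deprecated_by


# Placeholder values, for illustration only.
statement = AttributedAssertion(
    assertion=Assertion("Dystrophin", "interacts with", "SNT3"),
    asserted_by="example-lab",
    created_on=datetime(2011, 2, 22, tzinfo=timezone.utc),
)
statement.is_deprecated_by.append("example-database")

print(statement.in_use_by("example-lab"))       # True
print(statement.in_use_by("example-database"))  # False
```

Keeping the bare assertion separate from its attributes means the same statement can be asserted, curated or deprecated by different parties without changing the statement itself.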
Let’s call them ‘Nanopublications’
In: The anatomy of a nanopublication. Paul Groth, Andrew Gibson, Jan Velterop
http://iospress.metapress.com/content/ftkh21q50t521wm2/fulltext.html
[Architecture diagram; components:]
• Data store 1, Data store 2, ... Data store n
• Import, Integration, Notification API
• Identity mapping service (IRS service layer)
• Concept identifier store (IRS data layer). Also associating with semantic type. Not more!
• Reasoning / Integration
• Linked Data Cache (Cardinal Assertions)
• Provenance Store
• Evidence Store
• Evidence Calculator
[Architecture diagram repeated with two added components: User Interfaces / Web services, and Curation Interfaces]
Publishers:
• Incorporate nanopublications in your content
• Help disseminate the interfaces (HTML as well as PDF)
Nanopublications are not only assertions
Nanopublications are also references
i.e. they can be cited: good for impact & acknowledgement
And... aren’t references open and free?
<rdf:Description rdf:about="http://www.nbic.nl/cwa/relation/#26277419#13817745">
  <cwa:typeRelation rdf:resource="http://predicate.conceptwiki.org/index.php/#2121378"/>
  <cwa:direction>1,2</cwa:direction>
  <cwa:strength>1.0</cwa:strength>
  <cwa:author rdf:resource="http://people.conceptwiki.org/index.php/#85094810"/>
  <cwa:provenance rdf:resource="http://article.conceptwiki.org/index.php/#121646370"/>
  <cwa:timestamp>1240641052059</cwa:timestamp>
  <cwa:annotated_by rdf:resource="http://people.conceptwiki.org/index.php/#43065817"/>
  <cwa:annotation rdf:resource="http://www.virusdb.org/viruses/av/Heliothis_virescens_insect"/>
</rdf:Description>
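To give a feel for how a record like this could be read by software, here is a minimal sketch using Python's standard XML parser on a shortened copy of the fragment. Several things are assumptions: the `cwa` namespace URI is not given on the slide, so a placeholder is used; the fragment is wrapped in an `rdf:RDF` root so that the prefixes are declared; and the timestamp is interpreted as milliseconds since the Unix epoch.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
CWA_NS = "http://example.org/cwa#"  # placeholder: the real cwa namespace is not in the source

# Shortened copy of the slide's record, wrapped so the prefixes are declared.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:cwa="{CWA_NS}">
  <rdf:Description rdf:about="http://www.nbic.nl/cwa/relation/#26277419#13817745">
    <cwa:typeRelation rdf:resource="http://predicate.conceptwiki.org/index.php/#2121378"/>
    <cwa:direction>1,2</cwa:direction>
    <cwa:strength>1.0</cwa:strength>
    <cwa:timestamp>1240641052059</cwa:timestamp>
  </rdf:Description>
</rdf:RDF>"""


def q(namespace: str, tag: str) -> str:
    """Build the '{namespace}tag' form that ElementTree uses for qualified names."""
    return f"{{{namespace}}}{tag}"


root = ET.fromstring(DOC)
desc = root.find(q(RDF_NS, "Description"))

record = {
    "about": desc.get(q(RDF_NS, "about")),
    "predicate": desc.find(q(CWA_NS, "typeRelation")).get(q(RDF_NS, "resource")),
    "direction": desc.findtext(q(CWA_NS, "direction")),
    "strength": float(desc.findtext(q(CWA_NS, "strength"))),
    # Assumption: the timestamp counts milliseconds since 1970-01-01 UTC.
    "created": datetime.fromtimestamp(
        int(desc.findtext(q(CWA_NS, "timestamp"))) / 1000, tz=timezone.utc
    ),
}

for key, value in record.items():
    print(key, "=", value)
```

A dedicated RDF library would be the more usual choice for real data; the standard-library parser is used here only to keep the sketch self-contained.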
The whole picture
Even if you don’t have all the detail
nanopublications (i.e. references) can also be used to reason
protein A
protein X
‘Publications’ yield ‘data’
Exposing the ‘unknown knowns’
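A very simple form of such reasoning can be sketched as follows: if two concepts are never linked directly but share an intermediate partner, the pair is flagged as a candidate connection worth checking. This is an illustration only, not the method behind the slide; "protein B" is a hypothetical intermediate added for the example, and the `indirect_links` function is a name introduced here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def indirect_links(
    interactions: Iterable[Tuple[str, str]],
) -> Dict[Tuple[str, str], Set[str]]:
    """Map each pair with no direct assertion to the partners they share."""
    neighbours: Dict[str, Set[str]] = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)

    candidates: Dict[Tuple[str, str], Set[str]] = {}
    for middle, partners in list(neighbours.items()):
        for a, c in combinations(sorted(partners), 2):
            if c not in neighbours[a]:  # no direct assertion links a and c
                candidates.setdefault((a, c), set()).add(middle)
    return candidates


# Two separately published assertions; "protein B" is hypothetical.
known = [
    ("protein A", "protein B"),
    ("protein B", "protein X"),
]
print(indirect_links(known))
# {('protein A', 'protein X'): {'protein B'}}
```

Neither source assertion mentions the A–X pair, yet combining them surfaces it: knowledge that was already in the literature but that no single reader had put together.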
What is the use of water?
What is the use of information?
[Figure labels: °Celsius; Ice; Water]
Jan Velterop
The climate in the scholarly communication world is clearly hotting up
Thank you
jan.velterop at acknowledgeconnect com
velterop at conceptweballiance org