Velterop 2 a ssp arlington may 2015
Big Journal Literature
Big Usage
Jan Velterop – SSP – Arlington, May 28, 2015
11,135,542
More than 2 added every minute of 2014
Number of abstracts in PubMed
Information overload!
Is that overload?
Or rapidly increasing knowledge…
…making a world of difference that can change the course of scientific thought?
The purpose of scientific communication:
Dissemination of knowledge
Optimal dissemination for:
• maximal usefulness of scientific research results
• efficient, fast, and effective new knowledge creation & discovery
Efficient?
There’s too much and it’s impossible to read everything, even if you have access!
Lamp post research:
Looking merely at the literature that one can read, which is not necessarily all the literature that is potentially important to one’s research
Big Usage
But not in the way we’re used to
So, what to do?
Every problem has its solution
Possible strategies:
1. Publish a smaller number of papers
2. Accept that an ever smaller proportion of the available papers is actually being read
3. Capture the knowledge contained in all papers and map it in such a way that you can navigate that knowledge
Strategy 1 (publish a smaller number of papers): Maybe, but if it means less information, it’s ludicrous
Strategy 2 (accept that an ever smaller proportion of the available papers is actually being read): How to choose, though?
In any event: l’embarras du choix (an embarrassment of choice)
Strategy 3 (capture the knowledge contained in all papers and map it in such a way that you can navigate that knowledge): Yes! Helps to see trends and what to choose!
First create an overview…
…only then start digging
How might we create overviews?
“As the rate of publishing accelerates, the need for computational support to work out which articles to read, and how to interpret, reproduce and validate the claims they contain is growing.”
Quote from ‘Lazarus’: http://www.bbsrc.ac.uk/pa/grants/AwardDetails.aspx?FundingReference=BB/L005298/1
Extract Key Insights
Imagine you had a paper that concluded:
“On hot days, it turns out that aspirin decreases the chances of blood clots, but increases the chances of heart attack in humans; the effect wasn't observed in rats at all; simulations of dogs seem to suggest that the effect is present but independent of temperature unless the dog is accompanied by a human”
Key terms highlighted in that conclusion: hot days, aspirin, decreases, blood clots, increases, heart attack, humans, rats, dogs, temperature, dog, human
Significant concepts:
• [CHEMBL25] (aspirin)
• [EFO_0001702] ('temperature' from the experimental factors ontology)
• [Canis lupus familiaris]
• [Homo sapiens]
• [Mus musculus]
Headline Interactions (in the form of Triples):
• [ASPIRIN] [DECREASES] [THROMBOSIS]
• [ASPIRIN] [INCREASES] [MYOCARDIAL INFARCTION]
Add this to the article’s abstract (after it’s been validated by the author):
Most efficient: if publishers were to do this (it doesn’t cost much, and makes articles far more useful)
In case publishers don’t, alternative ways are being developed outside publishers’ control
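As a rough illustration of what such an addition to an abstract might look like in machine-readable form, the sketch below holds the significant concepts and headline triples from the aspirin example as plain data and renders them in the bracketed notation used on the slide. The `Triple` class and `render_annotation` function are hypothetical names chosen for this example; they are not part of Utopia Documents, Lazarus, or any publisher's system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One headline interaction: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Values copied from the slide; the structure around them is an assumption.
SIGNIFICANT_CONCEPTS = [
    "CHEMBL25",
    "EFO_0001702",
    "Canis lupus familiaris",
    "Homo sapiens",
    "Mus musculus",
]

HEADLINE_TRIPLES = [
    Triple("ASPIRIN", "DECREASES", "THROMBOSIS"),
    Triple("ASPIRIN", "INCREASES", "MYOCARDIAL INFARCTION"),
]


def render_annotation(concepts, triples):
    """Render concepts and triples as bracketed plain-text lines."""
    lines = ["Significant concepts:"]
    lines += [f"[{c}]" for c in concepts]
    lines.append("Headline interactions:")
    lines += [f"[{t.subject}] [{t.predicate}] [{t.obj}]" for t in triples]
    return "\n".join(lines)


annotation = render_annotation(SIGNIFICANT_CONCEPTS, HEADLINE_TRIPLES)
print(annotation)
```

Keeping the triples as structured records rather than free text is what would let a downstream tool compare them across articles.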
Currently: publishing data in articles equals burying data (R.I.P.)
Via Utopia Documents, LAZARUS ‘resurrects’ knowledge from being buried in articles:
• entities (‘concepts’, incl. synonyms, e.g. proteins)
• phrases, statements, assertions (e.g. triples)
• molecules (incl. Markush structure groups)
• graphs
• tables
http://utopiadocs.com
These are captured – with their provenance, e.g. DOI – in a ‘Knowledge Graph’ of their relationships
When assertions are captured, they are compared to the Knowledge Graph and labelled as ‘new’ (to the Graph) or ‘already found earlier’
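The capture-and-label step described here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Lazarus is actually implemented: the `KnowledgeGraph` class, its method names, and the DOIs are all made up for the example.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy store mapping each assertion (triple) to its provenance."""

    def __init__(self):
        # (subject, predicate, object) -> list of source identifiers, e.g. DOIs
        self._provenance = defaultdict(list)

    def add_assertion(self, subject, predicate, obj, doi):
        """Record an assertion and label it relative to the graph so far."""
        key = (subject, predicate, obj)
        # Membership test does not create the key in a defaultdict.
        label = "already found earlier" if key in self._provenance else "new"
        if doi not in self._provenance[key]:
            self._provenance[key].append(doi)
        return label

    def provenance(self, subject, predicate, obj):
        """Return the sources in which this assertion has been captured."""
        return list(self._provenance.get((subject, predicate, obj), []))


graph = KnowledgeGraph()
# The DOIs below are placeholders, not real articles.
first = graph.add_assertion("ASPIRIN", "DECREASES", "THROMBOSIS", "10.1234/example.1")
second = graph.add_assertion("ASPIRIN", "DECREASES", "THROMBOSIS", "10.1234/example.2")
print(first)   # new
print(second)  # already found earlier
```

Keeping every source against the assertion, rather than only the first, preserves the provenance trail even for statements that are no longer new.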
Should be interesting for the peer reviewer of a newly submitted article
“Lazarus to harness the crowd reading life-science articles to resurrect the swathes of legacy data buried in charts, tables, diagrams and free-text, to liberate processable data into a shared resource that benefits the community.”
“…activities currently carried out anyway by individuals for their own purposes (annotating, cross-referencing articles with databases, organising collections of articles).”
Works on any PDF, from paywalled and open sources alike
VHL protein binds to HIF-α which is ubiquitinated and tagged for degradation in the proteasome.
‘Assertions’ and ‘significant concepts’ extracted from articles (either by the publisher or by others, like Utopia’s LAZARUS), are added to a growing ‘knowledge graph’ which can be analysed for trends, clusters, areas of intensive activity, etc.
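One very simple way to look for "areas of intensive activity" in such a graph is to count how often each concept appears across the captured assertions. The sketch below assumes assertions are stored as (subject, predicate, object) tuples; the `BINDS` predicate and the `HIF-ALPHA` label are hypothetical encodings of the VHL sentence above, and real trend or cluster analysis would need far more than a frequency count.

```python
from collections import Counter

# Example assertions; the first two come from the aspirin example,
# the third is an assumed encoding of the VHL / HIF-alpha sentence.
assertions = [
    ("ASPIRIN", "DECREASES", "THROMBOSIS"),
    ("ASPIRIN", "INCREASES", "MYOCARDIAL INFARCTION"),
    ("VHL", "BINDS", "HIF-ALPHA"),
]


def concept_activity(triples):
    """Count how many assertions each concept takes part in."""
    counts = Counter()
    for subject, _predicate, obj in triples:
        counts[subject] += 1
        counts[obj] += 1
    return counts


activity = concept_activity(assertions)
most_active = activity.most_common(1)
print(most_active)
```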
Getting the picture from a large amount of data
What we need is information extracted from as many articles as possible
The more we have, the ‘sharper’ the knowledge picture
Getting a better picture from even more assertions
Homing in, i.e. making the choice of what to read in detail
It’s not just about finding information
It’s also, and possibly more, about the value & power of ‘recombinant knowledge’
BRAIN — Bio Relations And Intelligence Network
“Recombinant Knowledge”
Once researchers have identified the articles they really need to read, it should be made very easy to do so
Ergo, what publishers should do, too, is to make all articles available in all formats: HTML, XML, PDF and ePub, and even print, on demand.
Also on mobile devices
For instance:
Easier than you might think
(www.researchpad.co)
Build collection of favourites
Read full text
Inspect metrics
Share with others
In their words:
ResearchPad Launch Process
Project Definition
Branding
Publishing
Go Live
Turnaround Time - 8 weeks
Slide borrowed from:
What ResearchPad can do for publishers who want it, at no extra cost*, is to integrate a publisher’s content with anything from elsewhere that’s freely available with open access, so that this open access material can be accessed from within the publisher’s platform
* personal communication
The End