Posted on 10-Feb-2017
LINKED DATA: THE CHALLENGES AHEAD
A PERSONAL PERSPECTIVE
LEE HARLAND, SCIBITE LIMITED
@SCIBITELY
PRESENTED AT OPEN PHACTS MEETING, VIENNA, FEB 2016
http://openphacts.org http://scibite.com
CONTEXT FOR SLIDESHARE
• THIS TALK WAS PRESENTED AT THE OPEN PHACTS CLOSING MEETING IN VIENNA, FEB 2016, WHEN THE OPEN PHACTS INFRASTRUCTURE WAS OFFICIALLY HANDED OVER TO THE OPEN PHACTS FOUNDATION
• THE AIM OF THE TALK WAS TO REVISIT SOME OF THE KEY CHALLENGES OF THE ORIGINAL OPEN PHACTS PROJECT IN TODAY'S CONTEXT
• PLEASE VISIT HTTP://OPENPHACTS.ORG FOR INFORMATION ON THE OPEN PUBLIC-PRIVATE SEMANTICS-BASED PLATFORM FOR DRUG DISCOVERY, AND HELP SUPPORT THIS VALUABLE INITIATIVE!
• PLEASE VISIT HTTP://SCIBITE.COM FOR INFORMATION ON OUR HIGH-THROUGHPUT SEMANTIC TOOLS FOR INTEGRATING SCIENTIFIC DOCUMENTS WITH “BIG DATA” SOLUTIONS AND TEXT MINING!
IT STARTED IN 2009…
A meeting of several pharma companies to discuss key issues led to the Open PHACTS IMI call text
2009 COMPETITION
http://readwrite.com/2009/12/15/twitters_top_10_tech_trends_of_2009
Even with a lot of cash, success is not a given
AH… 2009… a time of massive growth in data science / big data
2016… So if we were planning Open PHACTS in 2016, what would be on my mind?
ACKNOWLEDGEMENTS
• BRYN WILLIAMS-JONES
• NICK LYNCH
• KIERA MCNEICE
• ANNA GAULTON
• ALL THOSE WHO GAVE SUGGESTIONS
• AND THE OPEN PHACTS CONSORTIUM, FOR A GREAT 5 YEARS AND FOR DELIVERING A UNIQUE SYSTEM
CHALLENGE #1
IF THIS IS NEWS TO YOU, YOU NEED TO STAY IN MORE
A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites”
Rather than labelling people parasites, let's build the tools to ensure credit for all involved!
…. Prior to this…
However, many of us who have actually conducted clinical research, managed clinical studies and data collection and analysis, and curated data sets have concerns about the details. The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters. Special problems arise if data are to be combined from independent studies and considered comparable. How heterogeneous were the study populations? Were the eligibility criteria the same….
Actually a very fair comment
RESEARCH REPRODUCIBILITY
See also http://reason.com/archives/2016/01/19/broken-science
This is really becoming a hot topic right now
……estimates for the reproducibility of preclinical research range from 51 percent to 89 percent. They estimate that at least half of all U.S. preclinical biomedical research funding—about $28 billion annually—is therefore squandered……
http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165
But is this just an “academic” problem, or should industry care?
http://www.cell.com/cell/abstract/S0092-8674%2809%2900316-X
Looks like a good lead; certainly some drug companies thought so
http://f1000research.com/articles/5-136/v1
Another paper that was a hot idea back in the day…
http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html
A first-of-a-kind analysis of Bayer's internal efforts to validate 'new drug target' claims now not only supports this view but suggests that 50% may be an underestimate; the company's in-house experimental data do not match literature claims in 65% of target-validation projects, leading to project discontinuation.
Industry should *really* care
#DATALAKE
DOES THE #DATALAKE LOOK LIKE THIS
OR THIS?
http://www.cafepress.co.uk/+metadata+t-shirts
http://explorer.openphacts.org/
https://www.w3.org/TR/prov-o/
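The PROV-O link above is the W3C ontology for recording provenance. As a minimal sketch of what "metadata and provenance" can mean in practice, the Python below serialises PROV-O statements for a single assertion as Turtle: what it was derived from, who it is attributed to, and when it was generated. The function name `provenance_turtle` and all `example.org` IRIs are hypothetical placeholders; the `prov:` terms are standard PROV-O.

```python
from datetime import datetime, timezone

# Standard PROV-O and XML Schema namespaces.
PREFIXES = (
    "@prefix prov: <http://www.w3.org/ns/prov#> .\n"
    "@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .\n"
)


def provenance_turtle(assertion: str, source: str, agent: str, generated_at: datetime) -> str:
    """Return Turtle recording what an assertion was derived from, who it is attributed to, and when."""
    # Normalise to UTC so the xsd:dateTime literal is unambiguous.
    timestamp = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        PREFIXES
        + f"\n<{assertion}> a prov:Entity ;\n"
        + f"    prov:wasDerivedFrom <{source}> ;\n"
        + f"    prov:wasAttributedTo <{agent}> ;\n"
        + f'    prov:generatedAtTime "{timestamp}"^^xsd:dateTime .\n'
        + f"\n<{source}> a prov:Entity .\n"
        + f"<{agent}> a prov:Agent .\n"
    )


if __name__ == "__main__":
    # Hypothetical example.org IRIs, for illustration only.
    print(provenance_turtle(
        "http://example.org/assertion/42",
        "http://example.org/dataset/assay-v1",
        "http://example.org/curator/alice",
        datetime(2016, 2, 10, 12, 0, tzinfo=timezone.utc),
    ))
```

Plain string building keeps the sketch dependency-free; a real pipeline would more likely use an RDF library and attach this provenance to every imported record.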
WHOA….
Quality?
http://lod-cloud.net/
CAN “DATA” PEOPLE HELP?
• Quality of the experiment
• Independent confirmations
• Negative assertions
• Literature “dead ends”
• Social commentary / sentiment analysis
• Marking questionable assertions
Reproducibility, Meta Data & Provenance
CHALLENGE #2
HERE’S A SPARQL QUERY
ONTOLOGIES ARE CRITICAL
ESSENTIAL FOR DATA DISCOVERY
2011
Here we talked about how industry needs open ontologies
BIOPORTAL UPDATES (OF 618 ONTOLOGIES)
[Chart: percentage of BioPortal ontologies updated in the last year, six months, three months and one month]
Only a third were updated in the last year
SCIENCE MOVES AT A DIFFERENT PACE
OK, so apples and oranges, but still, there’s a vast difference between the two
ONTOLOGY SUPPORT
The BioAssay Ontology (BAO) is an incredibly valuable resource; we need to support it!
IS IT TIME FOR A NEW STRATEGY?
We may never have “enough” resource, so what are the alternatives for sustaining ontologies?
PISTOIA ONTOLOGY WG
https://www.qmarkets.org/live/pistoia/
ON THE FLY URI
ORPHANET Rare/Orphan Disease
Dynamic Phenotype Network In RDF
Generated by TERMite PhenotypeFinder
Ack. Michael Hughes
http://scibite.com
Yellow = disease, pink = on-the-fly phenotype concept generated by text mining (not found in HPO)
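One way to picture an on-the-fly URI is sketched below, under stated assumptions: a text-mined label that matches a known ontology term reuses that term's URI, and anything else gets a provisional URI minted deterministically from a hash of the normalised label, so the same phrase always yields the same identifier. This is a hypothetical illustration, not how TERMite PhenotypeFinder works; `PROVISIONAL_NS`, `normalise` and `mint_uri` are invented names, and the lookup table is a toy.

```python
import hashlib
import re

# Hypothetical namespace for provisional concepts (not a real SciBite or HPO IRI space).
PROVISIONAL_NS = "http://example.org/provisional/phenotype/"


def normalise(label: str) -> str:
    """Lower-case and collapse runs of non-alphanumerics so trivial variants share one key."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def mint_uri(label: str, known: dict) -> str:
    """Return the ontology URI for a known label, else a stable provisional URI derived from the label."""
    key = normalise(label)
    if key in known:
        return known[key]
    # Hashing the normalised label means the same text always yields the same URI.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]
    return PROVISIONAL_NS + digest


if __name__ == "__main__":
    # Toy lookup table keyed by normalised label; HP_0001250 is the HPO term for seizure.
    known = {"seizure": "http://purl.obolibrary.org/obo/HP_0001250"}
    print(mint_uri("Seizure", known))
    print(mint_uri("Enlarged earlobes", known))
```

Deterministic minting means two independent runs over the same literature produce the same provisional URIs, which makes them easier to review and later map to a proper ontology term.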
We need some cool new ideas!
Reproducibility, Meta Data & Provenance
Ontology
CHALLENGE #3
139755-83-2
The CAS Registry Number: a standard, but not “open”, which inhibits certain projects
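The number above uses the CAS Registry Number layout, whose check-digit rule is public even though the registry behind it is not. A minimal Python sketch of that checksum follows: weight the digits 1, 2, 3, ... from the right (excluding the check digit), sum, and compare the result modulo 10 with the check digit. The function name `cas_is_valid` is hypothetical; note that passing the checksum only shows the string is well-formed, not which structure it identifies.

```python
import re

# CAS RN layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_is_valid(cas: str) -> bool:
    """Check the layout and check digit of a CAS Registry Number string."""
    match = CAS_PATTERN.fullmatch(cas)
    if not match:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost non-check digit.
    total = sum(weight * int(digit) for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check


if __name__ == "__main__":
    print(cas_is_valid("139755-83-2"))  # True: weighted sum is 172, and 172 mod 10 = 2
    print(cas_is_valid("139755-83-3"))  # False: wrong check digit
```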
AN OPEN SEMANTIC CHEMISTRY API
21,000,000 structures
Open PHACTS built a completely open semantic chemistry registry
http://www.slideshare.net/alasdair_gray/scientific-lenses-to-support-multiple-views-over-linked-chemistry-data
You can do some very fancy things with it
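The linked slides describe Open PHACTS “scientific lenses”, which, as we understand them, let a user choose which kinds of identifier equivalence to apply when linking chemistry records (for example, whether a salt form counts as the same compound). The toy Python sketch below models only that idea: each mapping carries a justification, a lens is a set of accepted justifications, and resolution is a breadth-first search over the accepted mappings. Every identifier, justification label and function name here is hypothetical; this is not the Open PHACTS Identity Mapping Service API.

```python
from collections import defaultdict, deque

# Each mapping links two identifiers and records why they are considered equivalent.
# Identifiers and justification labels are hypothetical placeholders.
MAPPINGS = [
    ("chembl:A", "chemspider:A", "exact_structure"),
    ("chembl:A", "drugbank:A-salt", "same_parent_compound"),
    ("chemspider:A", "pubchem:A", "exact_structure"),
    ("drugbank:A-salt", "pubchem:A-stereo", "stereo_undefined"),
]

STRICT = {"exact_structure"}
RELAXED = {"exact_structure", "same_parent_compound", "stereo_undefined"}


def resolve(identifier, lens, mappings=MAPPINGS):
    """Return all identifiers reachable from `identifier` via mappings whose justification is in `lens`."""
    graph = defaultdict(set)
    for left, right, justification in mappings:
        if justification in lens:
            graph[left].add(right)
            graph[right].add(left)
    seen = {identifier}
    queue = deque([identifier])
    while queue:  # breadth-first search over accepted equivalences
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


if __name__ == "__main__":
    print(sorted(resolve("chembl:A", STRICT)))
    print(sorted(resolve("chembl:A", RELAXED)))
```

With the strict lens only exact-structure matches are merged; the relaxed lens also pulls in the salt form and the stereo-undefined record. Making that choice explicit, rather than baking one notion of “same compound” into the data, is the point of a lens.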
http://www.inchi-trust.org/
Chemistry doesn’t stand still
And we need to support biologicals too!
https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1252
Excellent open-data success story!
HOW TO SEMANTICALLY ENCODE TACIT KNOWLEDGE?
http://blogs.sciencemag.org/pipeline/archives/2016/01/19/what-does-one-do-with-these
Some amazing knowledge is out there; how do we get it “into the graph”?
Reproducibility, Meta Data & Provenance
Ontology
Semantics 4 Therapeutics
CHALLENGE #4
The Biggest Challenge In Data Integration Is…
…The Data
OPEN PHACTS APPROACH
DATA TOOLING
The best approach: not doing either in isolation, but a dialog between both
CHATTER PLOT
data tech
Much more time was spent on issues with the data than the tech. Not saying the tech was easy!
WE BUILT SOME CUTTING EDGE TECH
Dynamic Identifier Resolution
Mixing SPARQL & web services
Production Deployment
API Centric Integration
Business Question-Focus
API Management
“App Store” Ecosystem
Semantic Chemistry
Nanopubs & Provenance
Cool, Friendly UIs
DRIVING FORWARD THROUGH DIRECT DIALOG
But every data conversation generated more and more questions!
DATA IS ALWAYS EVOLVING
ChEMBL 01, 2009
When OPS Started
By the time we really started making RDF!
And things didn’t stand still!
Today, the ChEMBL schema is unrecognizable from 2009. Open PHACTS is the way companies can work with providers to stay on top of data evolution!
Reproducibility, Meta Data & Provenance
Ontology
Semantics 4 Therapeutics
Data Evolution
CHALLENGE #5
PCSK9: AN “ICONIC EXAMPLE” OF TRANSLATIONAL MEDICINE IN THE GENOMICS ERA*
*http://www.nature.com/news/genetics-a-gene-of-rare-effect-1.12773
PCSK9 is an example of genomics, informatics and drug discovery coming together for real change
TRANSLATING PRECLINICAL DISCOVERIES
Researchers have found that removing a gene called USF1 protects mice against heart disease, diabetes and obesity
We don’t yet have a system where we can just jump in and explore the biology and chemistry of the USF1 pathways…
Reproducibility, Meta Data & Provenance
Ontology
Semantics 4 Therapeutics
Data Evolution
The Testable Disease Network
Conclusions: this is what I’d be arguing for in Open PHACTS 2016