Open PHACTS : Linked Data Future Challenges

Post on 10-Feb-2017

LINKED DATA: THE CHALLENGES AHEAD

A PERSONAL PERSPECTIVE

LEE HARLAND, SCIBITE LIMITED

@SCIBITELY

PRESENTED AT OPEN PHACTS MEETING, VIENNA, FEB 2016

SciBite

http://openphacts.org http://scibite.com

CONTEXT FOR SLIDESHARE

• THIS TALK WAS PRESENTED AT THE OPEN PHACTS CLOSING MEETING IN VIENNA FEB 2016 WHEN THE OPEN PHACTS INFRASTRUCTURE WAS HANDED OVER OFFICIALLY TO THE OPEN PHACTS FOUNDATION

• THE AIM OF THE TALK WAS TO DISCUSS SOME OF THE KEY CHALLENGES OF THE ORIGINAL OPEN PHACTS PROJECT BUT IN TODAY’S CONTEXT

• PLEASE VISIT HTTP://OPENPHACTS.ORG FOR INFORMATION ON THE OPEN PUBLIC-PRIVATE SEMANTICS-BASED PLATFORM FOR DRUG DISCOVERY AND HELP SUPPORT THIS VALUABLE INITIATIVE!

• PLEASE VISIT HTTP://SCIBITE.COM FOR INFORMATION ON OUR HIGH-THROUGHPUT SEMANTIC TOOLS FOR INTEGRATING SCIENTIFIC DOCUMENTS WITH “BIG DATA” SOLUTIONS AND TEXT MINING!

IT STARTED IN 2009…

A meeting of multiple pharma companies to discuss key issues led to the Open PHACTS IMI call text

2009 COMPETITION

http://readwrite.com/2009/12/15/twitters_top_10_tech_trends_of_2009

Even with a lot of cash, success is not a given

Ah… 2009 was a time of massive increase in data science / big data

2016… So if we were planning Open PHACTS in 2016, what would be on my mind?

ACKNOWLEDGEMENTS

• BRYN WILLIAMS-JONES

• NICK LYNCH

• KIERA MCNEICE

• ANNA GAULTON

• ALL THOSE WHO GAVE SUGGESTIONS

• AND THE OPEN PHACTS CONSORTIUM FOR A GREAT 5 YEARS AND FOR DELIVERING A UNIQUE SYSTEM

CHALLENGE #1

IF THIS IS NEWS TO YOU, YOU NEED TO STAY IN MORE

A second concern held by some is that a new class of research person will emerge — people who had nothing to do with the design and execution of the study but use another group’s data for their own ends, possibly stealing from the research productivity planned by the data gatherers, or even use the data to try to disprove what the original investigators had posited. There is concern among some front-line researchers that the system will be taken over by what some researchers have characterized as “research parasites”

Rather than labelling people parasites, let’s make the tools to ensure credit for all involved!

…. Prior to this…

However, many of us who have actually conducted clinical research, managed clinical studies and data collection and analysis, and curated data sets have concerns about the details. The first concern is that someone not involved in the generation and collection of the data may not understand the choices made in defining the parameters. Special problems arise if data are to be combined from independent studies and considered comparable. How heterogeneous were the study populations? Were the eligibility criteria the same….

Actually a very fair comment

RESEARCH REPRODUCIBILITY

See also http://reason.com/archives/2016/01/19/broken-science

This is really becoming a hot topic right now

…estimates for the irreproducibility of preclinical research range from 51 percent to 89 percent. They estimate that at least half of all U.S. preclinical biomedical research funding, about $28 billion annually, is therefore squandered…

http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002165

But is this just an “academic” problem, or should industry care?

http://www.cell.com/cell/abstract/S0092-8674%2809%2900316-X

Looks like a good lead; certainly some drug companies thought so

http://f1000research.com/articles/5-136/v1

Another paper that was a hot idea back in the day…

http://www.nature.com/nrd/journal/v10/n9/full/nrd3439-c1.html

A first-of-a-kind analysis of Bayer's internal efforts to validate 'new drug target' claims now not only supports this view but suggests that 50% may be an underestimate; the company's in-house experimental data do not match literature claims in 65% of target-validation projects, leading to project discontinuation.

Industry should *really* care

#DATALAKE

DOES THE #DATALAKE LOOK LIKE THIS

OR THIS?

http://www.cafepress.co.uk/+metadata+t-shirts

http://explorer.openphacts.org/

https://www.w3.org/TR/prov-o/

WHOA….

Quality?

http://lod-cloud.net/

CAN “DATA” PEOPLE HELP?

Quality Of The Experiment

Independent Confirmations

Negative Assertions

Literature “dead ends”

Social Commentary / Sentiment Analysis

Marking Questionable Assertions

Reproducibility, Meta Data & Provenance

CHALLENGE #2

HERE’S A SPARQL QUERY

ONTOLOGIES ARE CRITICAL

ESSENTIAL FOR DATA DISCOVERY

2011

Here we talked about how industry

needs open ontologies

BIOPORTAL UPDATES (OF 618 ONTOLOGIES)

[Bar chart: % of ontologies (0 to 100) updated in the last year, six months, three months, one month]

Only 1/3rd updated in the last year

SCIENCE MOVES AT A DIFFERENT PACE

OK so apples and oranges, but still, there’s a vast difference between the two

ONTOLOGY SUPPORT

BAO (the BioAssay Ontology) is an incredibly valuable resource; we need to support it!

IS IT TIME FOR A NEW STRATEGY?

We may never have “enough” resource, so what are the alternatives for sustaining ontologies?

PISTOIA ONTOLOGY WG

https://www.qmarkets.org/live/pistoia/

ON THE FLY URI

ORPHANET Rare/Orphan disease

Dynamic Phenotype Network In RDF

Generated by TERMite PhenotypeFinder

Ack. Michael Hughes

Yellow = disease, pink = on-the-fly phenotype concept generated by text mining (not found in HPO)
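One way to picture the "on the fly URI" idea is the minimal sketch below. It is not TERMite's actual method: the namespace `http://example.org/onthefly/phenotype/`, the toy `KNOWN_HPO` lookup and the function names are all assumptions made for illustration. A label found in the lookup resolves to its HPO URI; anything else gets a stable URI minted from a hash of the normalised label, so the same text-mined phrase always lands on the same node in the graph.

```python
import hashlib
import re

# Hypothetical namespace for minted concepts (an assumption, not a real SciBite URI).
ONTHEFLY_NS = "http://example.org/onthefly/phenotype/"
# Base of OBO-style PURLs, as used for HPO terms.
HPO_NS = "http://purl.obolibrary.org/obo/"

# Toy lookup standing in for a real HPO label index (illustrative only).
KNOWN_HPO = {"seizure": "HP_0001250"}


def normalise(label: str) -> str:
    """Lower-case and collapse whitespace so spelling variants share one key."""
    return re.sub(r"\s+", " ", label.strip().lower())


def phenotype_uri(label: str) -> str:
    """Return the HPO URI if the label is known, otherwise mint a stable URI."""
    key = normalise(label)
    if key in KNOWN_HPO:
        return HPO_NS + KNOWN_HPO[key]
    # Hashing the normalised label makes the minted URI deterministic.
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:12]
    return ONTHEFLY_NS + digest
```

Because the minted URI depends only on the normalised label, two documents mentioning "Brittle  Nails" and "brittle nails" would point at the same on-the-fly concept, which could later be mapped to an ontology term if one is added.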

We need some cool new ideas!

Reproducibility, Meta Data & Provenance

Ontology

CHALLENGE #3

139755-83-2

A standard, but not “open” – inhibits certain projects

AN OPEN SEMANTIC CHEMISTRY API

21,000,000 structures

Open PHACTS built a completely open semantic chemistry registry

http://www.slideshare.net/alasdair_gray/scientific-lenses-to-support-multiple-views-over-linked-chemistry-data

With which you can do some very fancy things

http://www.inchi-trust.org/

Chemistry doesn’t stand still

And we need to support biologicals too!

https://www.ebi.ac.uk/chembl/compound/inspect/CHEMBL1252

Excellent open-data success story!

HOW TO SEMANTICALLY ENCODE TACIT KNOWLEDGE?

http://blogs.sciencemag.org/pipeline/archives/2016/01/19/what-does-one-do-with-these

Some amazing knowledge out there, how do we get it “into the graph”?

Reproducibility, Meta Data & Provenance

Ontology

Semantics 4 Therapeutics

CHALLENGE #4

The Biggest Challenge In Data Integration Is…

…The Data

OPEN PHACTS APPROACH

DATA TOOLING

The best approach: not doing either in isolation, but a dialog between both

CHATTER PLOT

data tech

Much more time was spent on issues with the data than the tech. Not saying the tech was easy!

WE BUILT SOME CUTTING EDGE TECH

Dynamic Identifier Resolution

Mixing SPARQL & web services

Production Deployment

API Centric Integration

Business Question-Focus

API Management

“App Store” Ecosystem

Semantic Chemistry

Nanopubs & Provenance

Cool, Friendly UIs

DRIVING FORWARD THROUGH DIRECT DIALOG

But every data conversation generated more and more questions!

DATA IS ALWAYS EVOLVING

ChEMBL 01, 2009

When OPS Started

By the time we really started making RDF!

And things didn’t stand still!

Today, the ChEMBL schema is unrecognizable from 2009. Open PHACTS is the way companies can work with providers to stay on top of data evolution!

Reproducibility, Meta Data & Provenance

Ontology

Semantics 4 Therapeutics

Data Evolution

CHALLENGE #5

PCSK9: AN “ICONIC EXAMPLE” OF TRANSLATIONAL MEDICINE IN THE GENOMICS ERA*

*http://www.nature.com/news/genetics-a-gene-of-rare-effect-1.12773

PCSK9 is an example of genomics, informatics and drug discovery coming together for real change

TRANSLATING PRECLINICAL DISCOVERIES

Researchers have found that removing a gene called USF1 protects mice against heart disease, diabetes and obesity

We don’t yet have a system where we can just jump in and explore the biology and chemistry of the USF1 pathways….

Reproducibility, Meta Data & Provenance

Ontology

Semantics 4 Therapeutics

Data Evolution

The Testable Disease Network

Conclusions: this is what I’d be arguing for in Open PHACTS 2016