Scientific lenses to support multiple views over linked chemistry data

Alasdair J G Gray (alasdairjggray.co.uk, @gray_alasdair), Open PHACTS (openphacts.org, @open_phacts)

Description

When are two entries about a small molecule in different datasets the same? When they have the same drug name, the same chemical structure, or meet some other criterion? The choice depends upon the application to which the data will be put. However, existing Linked Data approaches provide a single global view over the data, with no way of varying the notion of equivalence to be applied. In this paper, we present an approach that enables applications to choose the equivalence criteria to apply between datasets, thus supporting multiple dynamic views over the Linked Data. For chemical data, we show that multiple sets of links can be automatically generated according to different equivalence criteria and published with semantic descriptions capturing their context and interpretation. This approach has been applied within a large-scale public-private data integration platform for drug discovery. To cater for different use cases, the platform allows the application of different lenses, which vary the equivalence rules to be applied based on the context and interpretation of the links.

Transcript of Scientific lenses to support multiple views over linked chemistry data

Page 1: Scientific lenses to support multiple views over linked chemistry data

Scientific lenses to support multiple views over linked Chemistry data

Alasdair J G Gray, @gray_alasdair

Open PHACTS, @open_phacts

Page 2

Multiple Identities

P12047X31045

P12047

GB:29384RS_2353

21 October 2014 Scientific Lenses – A. J. G. Gray 2

Page 3

Gleevec®: Imatinib Mesylate

Sources: DrugBank, ChemSpider, PubChem

Record labels: Imatinib Mesylate; Imatinib Mesylate; YLMAHDNUQAMNNX-UHFFFAOYSA-N

Page 4

Gleevec®: Imatinib Mesylate


Are these records the same? It depends upon your task!

Page 5

Example Use Cases

I need to perform an analysis: give me details of the active compound in Gleevec.

Which targets are known to interact with Gleevec?

Page 6

Structure Lens

I need to perform an analysis: give me details of the active compound in Gleevec.

Links applied: skos:exactMatch (InChI)

Equivalence scale: Strict (analysing) to Relaxed (browsing)

Page 7

Name Lens

Which targets are known to interact with Gleevec?

Links applied: skos:exactMatch (InChI) and two skos:closeMatch (Drug Name) links

Equivalence scale: Strict (analysing) to Relaxed (browsing)
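The difference between the two lenses can be sketched in a few lines of Python. This is a minimal illustration, not the Open PHACTS implementation: the record identifiers (`drugbank:rec1` and so on) and the placement of each link are hypothetical, and a lens is modelled simply as the set of link justifications it accepts.

```python
from collections import defaultdict

# Hypothetical linkset for the three Gleevec records:
# (record A, record B, SKOS predicate, justification).
LINKS = [
    ("chemspider:rec1", "pubchem:rec1", "skos:exactMatch", "InChI"),
    ("drugbank:rec1", "chemspider:rec1", "skos:closeMatch", "Drug Name"),
    ("drugbank:rec1", "pubchem:rec1", "skos:closeMatch", "Drug Name"),
]

# A lens is modelled here as the set of justifications whose links it applies.
LENSES = {
    "Structure Lens": {"InChI"},
    "Name Lens": {"InChI", "Drug Name"},
}


def equivalents(record: str, lens: str) -> set[str]:
    """Return all records connected to `record` by links the lens accepts."""
    accepted = LENSES[lens]
    neighbours = defaultdict(set)
    for a, b, _predicate, justification in LINKS:
        if justification in accepted:
            # Links are followed in both directions.
            neighbours[a].add(b)
            neighbours[b].add(a)
    # Depth-first traversal from the starting record.
    seen, stack = {record}, [record]
    while stack:
        for other in neighbours[stack.pop()]:
            if other not in seen:
                seen.add(other)
                stack.append(other)
    return seen


print(sorted(equivalents("drugbank:rec1", "Structure Lens")))
print(sorted(equivalents("drugbank:rec1", "Name Lens")))
```

Under the Structure Lens the DrugBank record stands alone; under the Name Lens all three records are returned. Following `skos:closeMatch` links transitively is a simplification made for this sketch, since SKOS does not declare `closeMatch` transitive.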

Page 8

What is a Scientific Lens?

• A lens defines a conceptual view over the data
• It specifies operational equivalence conditions

Consists of:
• Identifier (URI)
• Title (dct:title)
• Description (dct:description)
• Documentation link (dcat:landingPage)
• Creator (pav:createdBy)
• Timestamp (pav:createdOn)
• Equivalence rules (bdb:linksetJustification)
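A lens description with these fields can be published as RDF. The sketch below writes one as Turtle using the published namespaces for Dublin Core Terms, DCAT and PAV; the `bdb:` namespace URI and every example URI (lens, landing page, creator, justification) are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# dct, dcat, pav and xsd are the published namespaces; the bdb namespace is an
# assumption and should be checked against the BridgeDb vocabulary.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "dcat": "http://www.w3.org/ns/dcat#",
    "pav": "http://purl.org/pav/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "bdb": "http://vocabularies.bridgedb.org/ops#",
}


def literal(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


@dataclass
class Lens:
    uri: str  # identifier
    title: str  # dct:title
    description: str  # dct:description
    landing_page: str  # dcat:landingPage
    created_by: str  # pav:createdBy
    created_on: datetime  # pav:createdOn
    justifications: list[str] = field(default_factory=list)  # bdb:linksetJustification

    def to_turtle(self) -> str:
        if not self.justifications:
            raise ValueError("a lens needs at least one equivalence rule")
        lines = [f"@prefix {p}: <{ns}> ." for p, ns in PREFIXES.items()]
        lines += [
            "",
            f"<{self.uri}>",
            f"    dct:title {literal(self.title)} ;",
            f"    dct:description {literal(self.description)} ;",
            f"    dcat:landingPage <{self.landing_page}> ;",
            f"    pav:createdBy <{self.created_by}> ;",
            f"    pav:createdOn {literal(self.created_on.isoformat())}^^xsd:dateTime ;",
            "    bdb:linksetJustification "
            + " , ".join(f"<{j}>" for j in self.justifications)
            + " .",
        ]
        return "\n".join(lines)


# Hypothetical lens: every URI below is a placeholder.
structure_lens = Lens(
    uri="http://example.org/lens/structure",
    title="Structure Lens",
    description="Treats records as equivalent only if their InChIs match.",
    landing_page="http://example.org/docs/structure-lens",
    created_by="http://example.org/people/curator",
    created_on=datetime(2014, 10, 21, tzinfo=timezone.utc),
    justifications=["http://example.org/justification/inchi"],
)
print(structure_lens.to_turtle())
```

Listing several justification URIs would describe a more relaxed lens that accepts more kinds of links.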

Page 9

Lens Effects: Ibuprofen

• Ibuprofen consists of two equally active stereoisomers.
• Stereoisomers are not always represented in the data.
• Users wish to retrieve information for any stereoisomer.

CHEMBL427526, CHEMBL521, CHEMBL175

Page 10

Default Lens


Page 11

Stereoisomer Lens


Page 12

Mapping Generation

Example records: ops:OPS437281, ops:OPS380297, ops:OPS380292

Relationships between them:
• has_stereoundefined_parent [ci:CHEMINF_000456]
• is_stereoisomer_of [ci:CHEMINF_000461]

Other relationships:
• has part
• is tautomer of
• uncharged counterpart
• isotope…
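One way to approximate the two stereo relationships from structure alone compares standard InChIKeys. This heuristic is swapped in purely for illustration and is not necessarily how the platform's chemistry registration service computes links: records whose keys share the first (skeleton) block are grouped together, and a second block starting `UHFFFAOY` indicates no stereo or isotope layers. It ignores isotopic differences, and the record IDs and InChIKeys below are hypothetical placeholders. The `view` function then applies a lens as a filter over the generated links, as in the ibuprofen example.

```python
from itertools import combinations

HAS_STEREOUNDEFINED_PARENT = "ci:CHEMINF_000456"
IS_STEREOISOMER_OF = "ci:CHEMINF_000461"
# Second-block prefix produced when a standard InChI has no stereo/isotope layers.
NO_EXTRA_LAYERS = "UHFFFAOY"


def split_key(inchikey: str) -> tuple[str, str]:
    """Return (skeleton block, first 8 characters of the layer block)."""
    skeleton, layers, _protonation = inchikey.split("-")
    return skeleton, layers[:8]


def generate_links(records: dict[str, str]) -> list[tuple[str, str, str]]:
    """Derive (subject, relationship, object) links from record ID -> InChIKey."""
    groups: dict[str, list[tuple[str, str]]] = {}
    for rid, key in records.items():
        skeleton, layers = split_key(key)
        groups.setdefault(skeleton, []).append((rid, layers))

    links = []
    for members in groups.values():
        parents = [rid for rid, layers in members if layers == NO_EXTRA_LAYERS]
        defined = [(rid, layers) for rid, layers in members if layers != NO_EXTRA_LAYERS]
        for child, _ in defined:
            for parent in parents:
                links.append((child, HAS_STEREOUNDEFINED_PARENT, parent))
        for (a, la), (b, lb) in combinations(defined, 2):
            if la != lb:  # identical keys are duplicates, not stereoisomers
                links.append((a, IS_STEREOISOMER_OF, b))
    return links


def view(record: str, links, accepted: set[str]) -> set[str]:
    """Records one link away from `record` under a lens accepting `accepted`."""
    result = {record}
    for s, rel, o in links:
        if rel in accepted and record in (s, o):
            result.add(o if s == record else s)
    return result


# Hypothetical records and InChIKeys (not real ChEMBL data).
records = {
    "rec:parent": "SKELETONAAAAAA-UHFFFAOYSA-N",
    "rec:stereo1": "SKELETONAAAAAA-STEREOAASA-N",
    "rec:stereo2": "SKELETONAAAAAA-STEREOBBSA-N",
    "rec:other": "OTHERSKELETONA-UHFFFAOYSA-N",
}
links = generate_links(records)
stereo_lens = {HAS_STEREOUNDEFINED_PARENT, IS_STEREOISOMER_OF}
print(sorted(view("rec:stereo1", links, set())))
print(sorted(view("rec:stereo1", links, stereo_lens)))
```

A lens that accepts neither relationship keeps `rec:stereo1` on its own; the stereoisomer lens returns it together with its stereo-undefined parent and its sibling stereoisomer.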

Page 13

Explorer Screenshot


Page 14

Explorer Screenshot


Page 15

OPS Discovery Platform

Apps

Linked Data API (RDF/XML, TTL, JSON) and Domain Specific Services

Core Platform: Semantic Workflow Engine; Identity Resolution Service; Chemistry Registration, Normalisation & Q/C; Identifier Management Service; Indexing; Data Cache (Virtuoso Triple Store)

Example identifiers: P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”

Data sources (RDF, Nanopub and Db stores, each with a VoID description): Public Content, Commercial, Public Ontologies, User Annotations

Page 16

Lenses: Under the hood

Query Q, with lens L1: cw:979b545d-f9a9 cheminf:logd ?logd .

The Query Expander Service sends (cw:979b545d-f9a9, L1) to the Identity Mapping Service (BridgeDB), which holds the Profiles and Mappings and returns [cw:979b545d-f9a9, cs:2157, chembl:1280, db:db00945].

Expanded query Q’:

GRAPH <http://rdf.chemspider.com> {
?iri cheminf:logd ?logd .
FILTER (?iri = cw:979b545d-f9a9 || ?iri = cs:2157 || ?iri = chembl:1280 || ?iri = db:db00945 )
}
GRAPH <http://…

• IMS call adds overhead
• Call time below human perception [1]
• Can also be achieved through UNION

[1] C. Y. A. Brenninkmeijer, C. Goble, A. J. G. Gray, P. Groth, A. Loizou, and S. Pettifer, “Including Co-referent URIs in a SPARQL Query,” COLD2013, http://ceur-ws.org/Vol-1034/
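The rewriting step can be sketched as plain string manipulation. In this minimal sketch the Identity Mapping Service call is replaced by an in-memory dictionary holding the example mapping for lens L1; `map_uri`, `expand_filter` and `expand_union` are hypothetical helpers, not the BridgeDb or Query Expander API, and prefix declarations are omitted as on the slide. Both the FILTER form and the UNION alternative are generated.

```python
# Stand-in for the Identity Mapping Service: lens -> URI -> co-referent URIs.
# The single entry reproduces the example mapping for lens L1.
MAPPINGS = {
    "L1": {
        "cw:979b545d-f9a9": ["cw:979b545d-f9a9", "cs:2157", "chembl:1280", "db:db00945"],
    },
}


def map_uri(uri: str, lens: str) -> list[str]:
    """Co-referent URIs for `uri` under `lens`; the URI alone if it is unmapped."""
    return MAPPINGS.get(lens, {}).get(uri, [uri])


def expand_filter(uri: str, predicate: str, obj: str, lens: str) -> str:
    """Rewrite `uri predicate obj .` into a pattern over ?iri plus a FILTER."""
    test = " || ".join(f"?iri = {u}" for u in map_uri(uri, lens))
    return f"?iri {predicate} {obj} .\nFILTER ({test})"


def expand_union(uri: str, predicate: str, obj: str, lens: str) -> str:
    """Rewrite the same triple pattern as a UNION of one block per URI."""
    blocks = [f"{{ {u} {predicate} {obj} . }}" for u in map_uri(uri, lens)]
    return "\nUNION\n".join(blocks)


print(expand_filter("cw:979b545d-f9a9", "cheminf:logd", "?logd", "L1"))
print(expand_union("cw:979b545d-f9a9", "cheminf:logd", "?logd", "L1"))
```

Which form performs better depends on the triple store and the query; the sketch makes no claim either way.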

Page 17

API Hits

April 2013 – March 2014: 15.8 million
April 2014 – Sept 2014: 14 million
Total: 29.8 million

Page 18

Conclusions

• Scientific data is complex and messy
• Requires flexibility in linking
• Equivalence depends upon context
• Lenses provide support for operational equivalence
• Chemical structures support automatic computation of links with justifications

Page 19

Co-authors

Royal Society of Chemistry: Colin Batchelor, Karen Karapetyan, Jon Steele, Valery Tkachenko, Antony Williams

University of Manchester: Christian Brenninkmeijer, Ian Dunlop, Carole Goble, Steve Pettifer, Robert Stevens

Swiss Institute of Bioinformatics: Christine Chichester

European Bioinformatics Institute: Mark Davies, Anna Gaulton, John Overington

University of Vienna: Daniela Digles

Maastricht University: Chris Evelo, Andra Waagmeester, Egon Willighagen

VU University Amsterdam: Paul Groth, Antonis Loizou

Connected Discovery: Lee Harland

Page 20

Questions

Alasdair J G Gray, @gray_alasdair

Open PHACTS, @open_phacts

Demo at stall 33 this evening!

Page 21

Open PHACTS Data: over 2.7 billion triples

Source          Initial records   Triples         Properties
ChEMBL          1,481,473         304,360,749     77
DrugBank        19,628            517,584         74
UniProt         564,246           405,473,138     82
ENZYME          6,187             73,838          2
ChEBI           40,575            1,673,863       2
GeneOntology    38,137            2,447,682       26
GOA             661,232           1,765,622,393   15
ChemSpider      1,361,568         215,193,441     23
ConceptWiki     2,828,966         4,291,131       1
WikiPathways    946               1,949,074       34
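The headline figure can be checked directly against the table. This short sketch re-enters the table's numbers and sums the Triples column.

```python
# (initial records, triples, properties) per source, copied from the table above.
SOURCES = {
    "ChEMBL": (1_481_473, 304_360_749, 77),
    "DrugBank": (19_628, 517_584, 74),
    "UniProt": (564_246, 405_473_138, 82),
    "ENZYME": (6_187, 73_838, 2),
    "ChEBI": (40_575, 1_673_863, 2),
    "GeneOntology": (38_137, 2_447_682, 26),
    "GOA": (661_232, 1_765_622_393, 15),
    "ChemSpider": (1_361_568, 215_193_441, 23),
    "ConceptWiki": (2_828_966, 4_291_131, 1),
    "WikiPathways": (946, 1_949_074, 34),
}

# Sum the Triples column across all sources.
total_triples = sum(triples for _records, triples, _props in SOURCES.values())
print(f"Total triples: {total_triples:,}")  # Total triples: 2,701,602,893
```

The total of 2,701,602,893 is consistent with "over 2.7 billion triples"; GOA alone accounts for roughly 65% of it.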

Page 22

App Ecosystem: An “App Store”?

http://www.openphactsfoundation.org/apps.html

Explorer Explorer2 ChemBioNavigator Target Dossier Pharmatrek Helium

MOE Collector Cytophacts Utopia Garfield SciBite

KNIME Mol. Data Sheets PipelinePilot scinav.it Taverna


Page 23

Discovery Platform

Drug Discovery Platform

Apps

Domain API

Interactive responses

Production-quality integration platform

Method calls

Page 24

Linked Data API

Drug

Disease (1.4)

Pathway

Target

https://dev.openphacts.org/