DisGeNET: a discovery platform to support translational research and drug discovery

Post on 23-Jan-2018


DisGeNET-RDF: a GDA Linked Open Data resource

Janet Piñero, Núria Queralt-Rosinach, Àlex Bravo, Ferran Sanz and Laura I. Furlong

Integrative Biomedical Informatics Group, Research Programme on Biomedical Informatics; Hospital del Mar Medical Research Institute; Pompeu Fabra University

Acknowledgements

The authors thank the Open PHACTS partners, Michel Dumontier and the OpenLink staff for their input, collaboration and help. Funding: We received support from ISCIII-FEDER (PI13/00082, CP10/00524), from the IMI-JU under grant agreements nº 115002 (eTOX), nº 115191 (Open PHACTS), nº 115372 (EMIF) and nº 115735 (iPiE), resources of which are composed of financial contribution from the European Union's Seventh Framework Programme (FP7/2007-2013) and EFPIA companies’ in-kind contribution, and from the EU H2020 Programme 2014-2020 under grant agreements no. 634143 (MedBioinformatics) and no. 676559 (Elixir-Excelerate). The Research Programme on Biomedical Informatics (GRIB) is a node of the Spanish National Institute of Bioinformatics (INB).

DisGeNET: Disease-Gene NETwork of relations for discovery

DATA

DISCOVERY

KNOWLEDGE BASE

TOOLS FOR EXPLORATION AND ANALYSIS

Motivation: A better understanding of the genetic component of human diseases and of disease mechanisms is needed for translational research and for drug discovery and development.

Challenge: A major current bottleneck for knowledge discovery on the genetic component of diseases is that the information is fragmented. The vast amount of biomedical information about genotype-phenotype relations is distributed across several databases, is represented and annotated using different data models, vocabularies and standards, and is domain- and technology-specific, all of which hampers access, integration, analysis and interpretation.

Approach: The DisGeNET Discovery Platform [1] collects and integrates the available information on gene-disease associations (GDAs), covering the whole spectrum of human diseases and using standards for their annotation and representation.

DisGeNET in the LOD cloud for translational research

• DisGeNET + external multidomain sources in LOD.
• It is interlinked with other biomedical databases to answer scientific questions that need the interrogation of cross-domain resources.
• It aims to support the development of bioinformatic Semantic Web applications to extract key knowledge on the molecular mechanisms of diseases.

Implementation: The platform is composed of a knowledge base and a set of tools for data analysis and interpretation.

EVIDENCE-BASED DISCOVERY

CLINICIAN INTEROPERABILITY

METADATA

DATABASES & LITERATURE STANDARDS

INTEGRATION

OPEN

http://www.disgenet.org/

RESEARCHER CURATOR

BIOINFORMATICIAN & DEVELOPER

DISCOVERABILITY COMMUNITY USE

LARGE-SCALE EXTRACTION AND INTEGRATION

DIGITAL PUBLICATION, SHARING AND LINKING

Usage stats (Aug 2014-Aug 2015):
• 12,040 users, 22,696 sessions
• 14,494 downloads
• DisGeNET used in more than 20 publications, cited in more than 60 articles
• Other projects: PubAnnotation, OpenLifeData

Registered:
• biosharing
• OMICtools
• NeuroLex
• Datahub

Present in the Semantic Web:
• URI/RDF/nanopublications
• Machine-processable
• Semantic integration
• Links to the Linked Open Data (LOD) cloud
• Data analysis across domains

SEMANTIC WEB

What is the tissue expression pattern of the genes associated with obesity?

• Large-scale integration across domains

• 17,181 Genes • PANTHER class

• 14,610 Diseases • MeSH class

60% complex, 36% rare/Mendelian, and 4% infectious diseases

DO    MSH   OMIM  NCI   ORDO  ICD9
19    58    38    33    13    12

TRACK OF EVIDENCE

$S = W_{\mathrm{CURATED}} + W_{\mathrm{PREDICTED}} + W_{\mathrm{LITERATURE}}$

• Provenance (PubMed ID, source)

• DisGeNET score (evidence)
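The score above is a plain sum of three evidence components. The following minimal Python sketch only illustrates that additive structure. The poster does not say how each component is computed, so the class name `EvidenceWeights`, the function `disgenet_score` and the numeric values are hypothetical placeholders, not DisGeNET's actual weighting scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceWeights:
    """Three evidence components of one gene-disease association.

    How each component is derived is NOT given on the poster;
    the values used below are illustrative assumptions only.
    """
    curated: float
    predicted: float
    literature: float


def disgenet_score(w: EvidenceWeights) -> float:
    """Return S = W_CURATED + W_PREDICTED + W_LITERATURE."""
    return w.curated + w.predicted + w.literature


# Hypothetical component values for a single association.
example = EvidenceWeights(curated=0.5, predicted=0.25, literature=0.125)
score = disgenet_score(example)
print(score)
```

Keeping the three components separate until the final sum makes it easy to report which type of evidence contributes to a given association.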

• Web: http://www.disgenet.org/
• RDF: http://rdf.disgenet.org/
• SPARQL: http://rdf.disgenet.org/sparql/
• Open PHACTS API: https://dev.openphacts.org
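As an illustration of programmatic access, the sketch below prepares an HTTP GET request for the SPARQL endpoint listed above, following the generic SPARQL 1.1 Protocol convention of passing the query in a `query` parameter. The example query is a generic placeholder and is not taken from the poster. The helper names `build_request` and `run_query` are hypothetical, and whether the endpoint is still reachable at this address is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Endpoint address as printed on the poster.
SPARQL_ENDPOINT = "http://rdf.disgenet.org/sparql/"

# Generic placeholder query (assumption): fetch any five triples.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(query: str, endpoint: str = SPARQL_ENDPOINT) -> Request:
    """Prepare a GET request that asks for JSON-formatted SPARQL results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str) -> dict:
    """Send the query and decode the JSON response (needs network access)."""
    with urlopen(build_request(query), timeout=30) as response:
        return json.load(response)


# Building the request does not contact the server.
request = build_request(EXAMPLE_QUERY)
print(request.full_url)
```

Calling `run_query(EXAMPLE_QUERY)` would perform the actual request; it is left out of the example so that the block runs without network access.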

ACCESS Open Database License: http://opendatacommons.org/licenses/odbl/1.0/

Downloads:
• Tab separated plain text
• SQLite
• RDF
• Trusty nanopublications

Web interface
SPARQL endpoint / Linked Data browser
Open PHACTS Discovery Platform
Nanopublication network
disGeNET2R R package

DIFFERENT USER PROFILES

AVAILABILITY

Metadata:
• data-item description
• dataset description

Programmatic access:
• Automatic analysis
• Higher speed
• Reduce error
• Share results
• Embed in workflows

REPRODUCIBILITY

Several formats and models

Transparency and Validation

SOURCES

Recent findings

429,111 Gene-Disease Associations

Sentence description

NORMALIZATION

HARMONIZATION

• NCBI Gene ID • UMLS CUIs.

DisGeNET association type ontology

INTEROPERABILITY

SYNTACTIC

COMMON IDs and ONTOLOGIES

SEMANTIC • 11 common ontologies in

• RDF [2]

• Nanopublications [3]

• GENE:

• DISEASE

STANDARDIZATION

Digital objects

DisGeNET association type ontology

Semanticscience Integrated Ontology (SIO) [4]

• Normalized Identification Scheme http://rdf.disgenet.org/resource/gda/ + ID
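The identification scheme above (base URI plus ID) can be sketched as a small helper. The base URI comes from the poster; the function name `gda_uri`, the validation rules and the example identifier are hypothetical, because the poster does not show the format of a real GDA ID.

```python
# Base URI for gene-disease association resources, as given on the poster.
GDA_BASE = "http://rdf.disgenet.org/resource/gda/"


def gda_uri(gda_id: str) -> str:
    """Build a GDA URI by appending an identifier to the base URI.

    The checks below are an assumption: they only reject identifiers
    that would obviously produce a malformed URI.
    """
    cleaned = gda_id.strip()
    if not cleaned:
        raise ValueError("GDA identifier must not be empty")
    if "/" in cleaned or any(ch.isspace() for ch in cleaned):
        raise ValueError(f"Unexpected character in GDA identifier: {gda_id!r}")
    return GDA_BASE + cleaned


# "EXAMPLE_ID_0001" is a made-up identifier used only for illustration.
example_uri = gda_uri("EXAMPLE_ID_0001")
print(example_uri)
```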

http://lod-cloud.net/; Aug 2014

4,962,315 RDF links to RDF datasets in the LOD

https://datahub.io/dataset/disgenet (more statistics)

LOD cloud RDFIZATION

METADATA

RDF

INTERLINKING

• Dataset (Open PHACTS + )

• Linksets (Open PHACTS + )

• Use Open PHACTS guidelines
• Dereferenceable URIs (primary or )
• SIO

OWL

• NCBI Gene ID • PANTHER Classification

• UMLS CUIs • MeSH Classification

• Data providers
• Disease annotation in the Open PHACTS Discovery Platform [5]

• OMIM included
• More than 20,000,000 triples

RDF SCHEMA METADATA INTERLINKING
• Linkset providers
• More than 70,000 linksets

FUTURE

New data:
• Disease-phenotype associations (HPO)
• New use cases
• New API calls

Score:
• Add to API calls

EXPLORER KNIME

More @ http://www.disgenet.org/web/DisGeNET/menu/rdf#sparql-queries-2

MAPPINGS TO OTHER DISEASE TERMINOLOGIES

DRUG

TARGET PATHWAY

DISEASE

DISEASE PHENOTYPE

DISEASE GENE

GDA

EVIDENCE SNP SCORE

Gene-disease association as entity

• Data item

• Dataset

<disease> <void:inDataset> <dgn-void:disease-dataset>
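The statement above links a data item to the dataset that contains it through the VoID property `void:inDataset`. The sketch below serializes such a statement as one N-Triples line. The VoID namespace is the standard one; the disease and dataset URIs are hypothetical stand-ins, because the poster does not expand the `dgn-void` prefix or show a concrete disease URI.

```python
# Standard VoID property that links a resource to its containing dataset.
VOID_IN_DATASET = "http://rdfs.org/ns/void#inDataset"


def n_triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize three absolute URIs as a single N-Triples statement."""
    return f"<{subject}> <{predicate}> <{obj}> ."


# Hypothetical URIs (assumptions), used only to show the pattern.
disease = "http://example.org/disease/EXAMPLE"
dataset = "http://example.org/dgn-void/disease-dataset"

line = n_triple(disease, VOID_IN_DATASET, dataset)
print(line)
```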

http://www.myexperiment.org/groups/1125.html

API

References
1. Piñero, J., Queralt-Rosinach, N., Bravo, À., Deu-Pons, J., Bauer-Mehren, A., Baron, M., … Furlong, L. I. (2015). DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database, 2015(0), bav028.
2. Queralt-Rosinach, N., Piñero, J., Bravo, À., Sanz, F., & Furlong, L. I. (2015). DisGeNET-RDF: harnessing the innovative power of the Semantic Web to explore the genetic basis of diseases (submitted).
3. Queralt-Rosinach, N., Kuhn, T., Chichester, C., Dumontier, M., Sanz, F., & Furlong, L. I. (2015). Publishing DisGeNET as Nanopublications. Semantic Web Journal (to appear), 1-10.
4. Dumontier, M., Baker, C. J., Baran, J., Callahan, A., Chepelev, L., Cruz-Toledo, J., … Hoehndorf, R. (2014). The Semanticscience Integrated Ontology (SIO) for biomedical research and knowledge discovery. Journal of Biomedical Semantics, 5(1), 2014.
5. Gray, A. J. G., Groth, P., Loizou, A., Askjaer, S., Brenninkmeijer, C., Burger, K., … Williams, A. J. (2014). Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. IOS Press. doi:10.3233/SW-2012-0088

• GDAs described by SIO

https://dev.openphacts.org

/disease/getTargets

http://rdf.disgenet.org/void-v3.0.0.ttl

Which compounds target proteins associated with Parkinson's disease or Alzheimer's disease?

DisGeNET in the Open PHACTS Discovery Platform for drug discovery and development