Semantics for eScience Susie Stephens, Principal Research Scientist, Eli Lilly.

Outline

• Introduction to the Semantic Web

• W3C’s Semantic Web for Health Care and Life Sciences Interest Group

• Semantic Web Solutions at Lilly

Introduction to the Semantic Web

Drivers for the Semantic Web

• Business models develop rapidly these days, so infrastructure that supports change is needed

• Organizations are increasingly forming and disbanding collaborations, so they need to be able to share data more easily

• Increasing need in pharma to be able to query across data silos

• Data is growing so quickly that individuals can no longer spot patterns in it unaided

• Increasing recognition of the benefits of collective intelligence

Characterizing the Semantic Web

• The Semantic Web is an interoperability technology

• An architecture for interconnected communities and vocabularies

• A set of interoperable standards for knowledge exchange

Creating a Web of Data

[Figure: data in various formats is converted into a graph representation, which applications then use]

Source: Ivan Herman
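A minimal Python sketch of the idea on this slide, assuming hypothetical `example.org` URIs and a simple "column name becomes predicate" rule: records from a CSV spreadsheet and from a JSON feed are both turned into (subject, predicate, object) triples. Because both sources mint the same URI for gene 1812, their statements merge into one graph by plain set union. The gene IDs come from the BioRDF results later in the deck; everything else is illustrative.

```python
import csv
import io
import json

# Hypothetical namespace; a real deployment would use its own stable URIs.
EX = "http://example.org/"


def csv_to_triples(text, id_column):
    """Turn each CSV row into triples: one per non-identifier column."""
    triples = set()
    for row in csv.DictReader(io.StringIO(text)):
        subject = EX + "gene/" + row[id_column]
        for column, value in row.items():
            if column != id_column:
                triples.add((subject, EX + column, value))
    return triples


def json_to_triples(text, id_key):
    """Turn a JSON list of flat records into triples using the same URI scheme."""
    triples = set()
    for record in json.loads(text):
        subject = EX + "gene/" + str(record[id_key])
        for key, value in record.items():
            if key != id_key:
                triples.add((subject, EX + key, value))
    return triples


# Two hypothetical sources describing overlapping genes in different formats.
spreadsheet = "id,symbol\n1812,DRD1\n154,ADRB2\n"
service = '[{"gene": 1812, "process": "adenylate cyclase activation"}]'

# Shared subject URIs mean the graphs merge with a simple union.
graph = csv_to_triples(spreadsheet, "id") | json_to_triples(service, "gene")
for triple in sorted(graph):
    print(triple)
```

The union works only because both converters agree on how to mint subject URIs. Agreeing on identifiers is what lets independently produced data join up.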

Mashing Data

Source: W3C

W3C’s Semantic Web for Health Care and Life Sciences Interest Group

Task Forces

• Terminology – Semantic Web representation of existing resources
  • Task lead - John Madden

• Scientific Discourse – building communities through networking
  • Task leads - Tim Clark, John Breslin

• Clinical Observations Interoperability – patient recruitment in trials
  • Task lead - Vipul Kashyap

• BioRDF – integrated neuroscience knowledge base
  • Task lead - Kei Cheung

• Linking Open Drug Data – aggregation of Web-based drug data
  • Task lead - Chris Bizer

• Other Projects: Clinical Decision Support, URI Workshop, Collaborations with CDISC & HL7

BioRDF: Integrating Heterogeneous Data

• Integration and analysis of heterogeneous data sets
  • Hypothesis, Genome, Pathways, Molecular Properties, Disease, etc.

[Figure: integrated data sets: NeuronDB, BAMS, NC Annotations, Homologene, SWAN, Entrez Gene, Gene Ontology, Mammalian Phenotype, PDSPki, BrainPharm, AlzGene, Antibodies, PubChem, MESH, Reactome, Allen Brain Atlas, Publications]

BioRDF: Looking for Targets for Alzheimer’s

• Signal transduction pathways are considered to be rich in “druggable” targets

• CA1 Pyramidal Neurons are known to be particularly damaged in Alzheimer’s disease

• Casting a wide net, can we find candidate genes known to be involved in signal transduction and active in Pyramidal Neurons?

Source: Alan Ruttenberg

BioRDF: SPARQL Query

Source: Alan Ruttenberg
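The query on this slide appears only as an image, so it is not reproduced here. The sketch below is a rough, hypothetical analogue written in plain Python. It evaluates the same kind of conjunctive pattern over a toy in-memory triple set: genes involved in signal transduction that are also expressed in CA1 pyramidal neurons. The predicates `involvedIn` and `expressedIn`, the gene identifiers, and the data are placeholders, not the BioRDF vocabulary or real results. A real system would send SPARQL to a triple store instead.

```python
# Hypothetical toy triples; identifiers and predicates are placeholders,
# not the BioRDF schema or real expression data.
TRIPLES = {
    ("gene:G1", "involvedIn", "process:signal_transduction"),
    ("gene:G2", "involvedIn", "process:signal_transduction"),
    ("gene:G3", "involvedIn", "process:lipid_transport"),
    ("gene:G1", "expressedIn", "cell:CA1_pyramidal_neuron"),
    ("gene:G3", "expressedIn", "cell:CA1_pyramidal_neuron"),
}


def match(pattern, triple, binding):
    """Unify one triple pattern with one triple under an existing binding.

    Terms starting with '?' are variables. Returns the extended binding,
    or None if the pattern and the triple disagree.
    """
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind a new variable; an already-bound one must keep its value.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def query(patterns, triples):
    """Evaluate a conjunction of triple patterns (a SPARQL-style basic graph pattern)."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# Genes in signal transduction AND expressed in CA1 pyramidal neurons.
patterns = [
    ("?gene", "involvedIn", "process:signal_transduction"),
    ("?gene", "expressedIn", "cell:CA1_pyramidal_neuron"),
]
candidates = sorted(b["?gene"] for b in query(patterns, TRIPLES))
print(candidates)
```

Each pattern narrows the bindings produced by the previous one. This is a naive nested-loop join. Production SPARQL engines answer the same kind of pattern using indexes.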

BioRDF: Results: Genes, Processes

Gene, Entrez Gene ID: process
DRD1, 1812: adenylate cyclase activation
ADRB2, 154: adenylate cyclase activation
ADRB2, 154: arrestin mediated desensitization of G-protein coupled receptor protein signaling pathway
DRD1IP, 50632: dopamine receptor signaling pathway
DRD1, 1812: dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813: dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917: G-protein coupled receptor protein signaling pathway
GNG3, 2785: G-protein coupled receptor protein signaling pathway
GNG12, 55970: G-protein coupled receptor protein signaling pathway
DRD2, 1813: G-protein coupled receptor protein signaling pathway
ADRB2, 154: G-protein coupled receptor protein signaling pathway
CALM3, 808: G-protein coupled receptor protein signaling pathway
HTR2A, 3356: G-protein coupled receptor protein signaling pathway
DRD1, 1812: G-protein signaling, coupled to cyclic nucleotide second messenger
SSTR5, 6755: G-protein signaling, coupled to cyclic nucleotide second messenger
MTNR1A, 4543: G-protein signaling, coupled to cyclic nucleotide second messenger
CNR2, 1269: G-protein signaling, coupled to cyclic nucleotide second messenger
HTR6, 3362: G-protein signaling, coupled to cyclic nucleotide second messenger
GRIK2, 2898: glutamate signaling pathway
GRIN1, 2902: glutamate signaling pathway
GRIN2A, 2903: glutamate signaling pathway
GRIN2B, 2904: glutamate signaling pathway
ADAM10, 102: integrin-mediated signaling pathway
GRM7, 2917: negative regulation of adenylate cyclase activity
LRP1, 4035: negative regulation of Wnt receptor signaling pathway
ADAM10, 102: Notch receptor processing
ASCL1, 429: Notch signaling pathway
HTR2A, 3356: serotonin receptor signaling pathway
ADRB2, 154: transmembrane receptor protein tyrosine kinase activation (dimerization)
PTPRG, 5793: transmembrane receptor protein tyrosine kinase signaling pathway
EPHA4, 2043: transmembrane receptor protein tyrosine kinase signaling pathway
NRTN, 4902: transmembrane receptor protein tyrosine kinase signaling pathway
CTNND1, 1500: Wnt receptor signaling pathway

Many of the genes are related to Alzheimer's disease (AD) through gamma-secretase (presenilin) activity.

Source: Alan Ruttenberg

LODD: Introduction

[Figure: Things in data sources A to E connected by typed links, consumed by Linked Data browsers, Linked Data mashups and search engines]

Use Semantic Web technologies to:
1. publish structured data on the Web
2. set links between data from one data source to data within other data sources

Source: Chris Bizer
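The two steps listed on this slide (publish, then link) can be sketched in Python as follows. This is a minimal illustration that assumes two things: hypothetical base URIs under `example.org`, and a naive case-insensitive name match as the linking rule. Real Linked Data publishing and link discovery use more robust identifiers and matching. The output is N-Triples, and `rdfs:label` and `owl:sameAs` are the standard RDFS and OWL terms.

```python
# Standard vocabulary URIs (RDFS label, OWL sameAs).
LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def literal(text):
    """Quote a string as an N-Triples literal (escapes backslash and quote only)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def publish(base, names):
    """Step 1: mint a URI for each record and describe it with a label."""
    uris = {name: f"<{base}{i}>" for i, name in enumerate(names)}
    lines = [f"{uri} <{LABEL}> {literal(name)} ." for name, uri in uris.items()]
    return uris, lines


def link(uris_a, uris_b):
    """Step 2: naive linker emitting owl:sameAs when names match case-insensitively."""
    index = {name.lower(): uri for name, uri in uris_b.items()}
    return [
        f"{uri} <{SAME_AS}> {index[name.lower()]} ."
        for name, uri in uris_a.items()
        if name.lower() in index
    ]


# Two hypothetical drug data sets; the base URIs are placeholders.
drugs_a, lines_a = publish("http://example.org/source-a/drug/", ["Aspirin", "Ibuprofen"])
drugs_b, lines_b = publish("http://example.org/source-b/drug/", ["ibuprofen", "Paracetamol"])
links = link(drugs_a, drugs_b)
print("\n".join(lines_a + lines_b + links))
```

Publishing and linking are kept separate because links can be produced later, possibly by a third party, without changing either published data set.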

LODD: Potential Links between Data Sets

Source: Chris Bizer

LODD: Potential questions to answer

• Physicians and Pharmacists
  • What are alternative drugs for a given indication (disease)?
  • What are equivalent drugs (the generic version of a brand name, or the chemical name of an active ingredient)?
  • Are there ongoing clinical trials for a drug?

• Patients
  • What background information is available about a drug?
  • What are the contraindications of a drug?
  • Which alternative drugs are available?
  • What are the results of clinical trials for a drug?

• Pharmaceutical Companies
  • Which other companies have drugs in similar areas?
  • Which companies have a similar therapeutic focus?

Source: Chris Bizer
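Once drug data is linked, the first question above (alternative drugs for a given indication) reduces to a join on shared objects. The sketch below makes several assumptions: a hypothetical `hasIndication` predicate and placeholder drug and disease identifiers. It is not the LODD vocabulary.

```python
from collections import defaultdict

# Hypothetical (drug, predicate, indication) triples; identifiers are placeholders.
TRIPLES = [
    ("drug:A", "hasIndication", "disease:D1"),
    ("drug:B", "hasIndication", "disease:D1"),
    ("drug:C", "hasIndication", "disease:D2"),
    ("drug:B", "hasIndication", "disease:D2"),
]


def alternatives(drug, triples, predicate="hasIndication"):
    """Return the other drugs that share at least one indication with `drug`."""
    # Index: indication -> every drug with that indication.
    drugs_by_indication = defaultdict(set)
    for subject, pred, obj in triples:
        if pred == predicate:
            drugs_by_indication[obj].add(subject)
    # Collect drugs reachable through any indication of the given drug.
    result = set()
    for subject, pred, obj in triples:
        if subject == drug and pred == predicate:
            result |= drugs_by_indication[obj]
    result.discard(drug)
    return sorted(result)


print(alternatives("drug:A", TRIPLES))
print(alternatives("drug:B", TRIPLES))
```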

LODD: Linked Version of ClinicalTrials.gov

• Total number of triples: 6,998,851

• Number of Trials: 61,920

• RDF links to other data sources: 177,975

• Links to:
  • DBpedia and YAGO (from interventions and conditions)
  • GeoNames (from locations)
  • Bio2RDF.org's PubMed (from references)

Source: Chris Bizer
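Dividing the slide's totals by the number of trials gives a rough sense of scale. These are simple averages computed from the three figures above. The slide does not report how triples or links are distributed across trials, and the last ratio assumes each RDF link is counted as one of the triples.

```latex
\frac{6\,998\,851\ \text{triples}}{61\,920\ \text{trials}} \approx 113\ \text{triples per trial},
\qquad
\frac{177\,975\ \text{links}}{61\,920\ \text{trials}} \approx 2.9\ \text{links per trial},
\qquad
\frac{177\,975}{6\,998\,851} \approx 2.5\%
```

On these averages, outbound links are a small fraction of the data set. They are nonetheless what connect the trials to DBpedia, YAGO, GeoNames and PubMed.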

Semantic Web Solutions at Lilly

Implementations at Lilly

• Integration of Clinical and Pathways Data

• Competitive Intelligence

• Experimental Metadata

• Discovery Metadata

Discovery Metadata: Goals

• Integrate master data throughout the discovery process to enable information sharing/integration for the scientific community

• Model key relationships between master data classes

• Provide the ability to integrate disparate data sets more quickly than the normal warehouse paradigm typically allows

• Create a re-usable and sustainable semantic implementation

• Allow for user-driven, manual curation of key data relationships

Source: Phil Brooks

Discovery Metadata: Ontology

[Figure: ontology with inputs from SAP, REFDB, NCBI, GSM, Manual Curation and Legacy sources]

Source: Phil Brooks

Discovery Metadata: Architecture

[Figure: layered architecture.
APPS: Application 1, Application 2, Application 3, …
SOA: SOA Layer/Enterprise Service Bus (Web Services, Visualizers, Data Access Components), Authentication
DATA: queried with SQL and SPARQL; Source Model 1 to Source Model 4, Local Assertions, Top Level Ontology, Provenance
Inputs: Spreadsheets, RDBMS, Other Sources, Source…, loaded via ETL and Other Tools]

Source: Phil Brooks
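The architecture keeps separate source models, local assertions and provenance alongside a top-level ontology. One common way to record which source asserted what is to load each source into its own named graph, as SPARQL datasets do. The toy in-memory store below illustrates that idea under stated assumptions. The graph names echo the sources on the ontology slide, while the `ex:` identifiers and the class itself are hypothetical and do not represent the design used at Lilly.

```python
from collections import defaultdict


class QuadStore:
    """Toy store that files every triple under the named graph (source) it came from."""

    def __init__(self):
        self.graphs = defaultdict(set)

    def load(self, graph, triples):
        """Add triples to one named graph, e.g. one source model."""
        self.graphs[graph].update(triples)

    def triples(self, sources=None):
        """Union of the selected graphs; all graphs when sources is None."""
        names = list(self.graphs) if sources is None else sources
        return set().union(*(self.graphs.get(name, set()) for name in names))

    def provenance(self, triple):
        """Sorted names of every graph that asserts the triple."""
        return sorted(name for name, ts in self.graphs.items() if triple in ts)


# Hypothetical graph names and identifiers; not Lilly's actual source models.
SYMBOL = ("ex:gene/1812", "ex:symbol", "DRD1")
CURATED = ("ex:gene/1812", "ex:relevantTo", "ex:project/P1")

store = QuadStore()
store.load("source:NCBI", {SYMBOL})
store.load("source:ManualCuration", {SYMBOL, CURATED})  # a local assertion

print(store.provenance(SYMBOL))
print(len(store.triples()), len(store.triples(["source:NCBI"])))
```

Keeping sources in separate graphs lets a query either span everything or be restricted to trusted sources. A curated assertion can also be traced back or withdrawn without touching the source data.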

External Collaborations

• RDF Access to Relational Databases - Chris Bizer, Eric Prud'hommeaux
  • Scalability testing of relational to RDF mapping approaches

• End User Semantic Web Authoring - David Karger
  • Enhancing the scalability and robustness of the Exhibit and Potluck tools

• Scientist-Driven Semantic Integration of Knowledge in Alzheimer's Disease - Tim Clark, June Kinoshita
  • Project to develop an integrated knowledge infrastructure for the neuromedical research community, pairing rich digital semantic context with the ever-growing digital scientific content on the web

• Provenance Collection and Management - Carole Goble, Beth Plale
  • Project to develop a metadata taxonomy for global data at Lilly that enables the rapid integration of data and mining/analysis algorithms into dataflows that support clinical and discovery decisions

• W3C’s Health Care and Life Sciences Interest Group
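The first collaboration concerns mapping relational databases to RDF. The slide does not say which mapping tools were tested. The sketch below uses Python's built-in `sqlite3` and a simplified mapping loosely in the spirit of the W3C Direct Mapping: each row becomes a subject URI derived from its key, and each other column becomes a predicate. It is not D2RQ or any tool from the collaboration, and the base URI, table and rows are hypothetical.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical base URI for minted identifiers


def table_to_triples(conn, table, key):
    """Map one relational table to triples, loosely like a direct mapping.

    Each row becomes a subject URI built from its key column, and each other
    non-NULL column becomes a (predicate, literal) pair. `table` and `key` are
    interpolated into SQL, so they must come from trusted configuration.
    """
    cursor = conn.execute(f"SELECT * FROM {table} ORDER BY {key}")
    columns = [description[0] for description in cursor.description]
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key]}"
        for column, value in record.items():
            if column != key and value is not None:
                yield (subject, f"{BASE}{table}#{column}", value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trial (id INTEGER PRIMARY KEY, title TEXT, phase TEXT)")
conn.executemany(
    "INSERT INTO trial VALUES (?, ?, ?)",
    [(1, "Study A", "II"), (2, "Study B", None)],  # hypothetical rows
)
triples = list(table_to_triples(conn, "trial", "id"))
for triple in triples:
    print(triple)
```

NULLs are skipped rather than emitted because RDF has no null value. A missing column simply means no triple for that property.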

Conclusion

• Many Semantic Web solutions are being explored within the health care and life sciences community

• Lilly is seeing tangible benefits from the Semantic Web in multiple projects

• The Semantic Web provides a flexible framework for data integration
  • Incremental adoption of technology
  • Flexibility to integrate unanticipated data sets
  • Link existing silos together

• Lilly is setting up open collaborations in this space

• Try out LSG