linked data for biopharma 14FEB2013


Page 1: linked data for biopharma 14FEB2013

The Path to Linked Data in BioPharma

Tom Plasterer, PhD.
integrated informatics Semantic Framework Lead (i2SF)
Integrated R&D Informatics and Knowledge Management

Page 2

Abstract

As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adoption of the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.

Page 3

Blockbuster ‘Patent Cliff’ Gives Way to Personalized Approach
Drivers & Solutions

Blockbuster Patent Cliff

Growth of Generics

Mergers & Acquisitions

Personalized Medicine
• Pharmacogenetics
• Biomarkers

Evaluate Pharma World Preview 2018
From: http://www.liv.ac.uk/pharmacogenetics/
American Action Forum; Primer: The Pharmaceutical Industry (Han Zhong | Updated June 2012)
IMAP Pharma & Biotech Industry Global Report 2011

R&D | RDI

Page 4

Where do the new opportunities arise?
Inside & Outside

Build from within
• Nurture ‘best in class’ programs
• Kill early
• Repositioning

Mergers & Acquisitions
• Partner or Buy?
• Integrate cultures & technology
• Is the disruption worth it?

Pre-Competitive Consortiums
• How much can be shared and still be useful?
• Who is driving?

Finding ‘KOLs’
• Aggressive Regional Partnerships (Pfizer's Centers for Therapeutic Innovation)
• Co-locate near Academic Centers of Excellence (Novartis)
• Cherry pick (GSK, AZ, others)

Page 5

Distributed Data in a Monolithic Environment
Managing Silos

Partitioned By Content
• Regulated Systems vs. Discovery

Partitioned By Geography & Organization
• US, EU, ASIAPAC

Data Formats
• RDB, Excel, Text, RSS, RDF?

Warehouses & Service Oriented Architecture
• Steps in the right direction?

Collaborative Environment
• eRooms, Sharepoint, Yammer, ‘Lync’ vs. Twitter, Google Docs, Skype

Standards?
• Vendor specific or open?
• Mixed Bag

Where are the ‘smarts’
• UI? Services?
• Metadata?

Page 6

Requirements of The Informatics Landscape
Maximal Agility

Must span the entire drug development lifecycle
o and back (post-market surveillance to discovery)

Must support large and very heterogeneous data
o single nucleotide polymorphisms to countries

Will change as new science emerges & new regulations come into play
o Medline just under 1M articles/year

Must be able to work with multiple, international regulatory bodies
o Emerging markets

Partners, customers and collaborators will change
o and will have divergent technical aptitudes

Must be able to interoperate with precompetitive consortia
o Can they perform common tasks for the community?

Must be able to work with legacy data
o Lots of unmined gems here!

Page 7

Linked Data Uses the Web Data Format
RDF: Resource Description Framework

• Resources:
- Represent things on the web (web pages, information resources)
- Represent things NOT on the web (people, places: Non-Information Resources)
- Can represent anything at all
- Named using URIs (usually)
- May not have a name (Blank Nodes)
- “nouns”
- (Subjects or Objects)

• Literal Values
- Are values to work with and show users
- Can be just a string of text (Plain Literals)
- Can have a language assigned to them using ISO codes
- Can have a specific datatype assigned to them (Typed Literals)

• Predicates:
- Relationships between Resources
- Named using URIs
- “Verbs”
- Described in Schema (or vocabularies, or ontologies)

[Figure: RDF Triple. Subject: PT3445, Predicate: participatesIn, Object: CT5877]
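The triple on this slide can be written down concretely. What follows is a minimal sketch in plain Python, not the presenter's tooling: the `http://example.org/id/` namespace is a hypothetical stand-in, the `rdfs:label` statement is added only to show a language-tagged literal, and the string escaping covers quotes and backslashes only. Reading PT3445 as a patient and CT5877 as a clinical trial is a guess; the slide gives only the identifiers.

```python
from typing import NamedTuple, Union

BASE = "http://example.org/id/"  # hypothetical namespace, not from the slides
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


class URI(NamedTuple):
    """A resource named by a URI (a 'noun' or a 'verb')."""
    value: str

    def n3(self) -> str:
        return f"<{self.value}>"


class Literal(NamedTuple):
    """A literal value: plain, language-tagged, or typed."""
    value: str
    lang: str = ""      # ISO language code, e.g. "en"
    datatype: str = ""  # datatype URI for typed literals

    def n3(self) -> str:
        escaped = self.value.replace("\\", "\\\\").replace('"', '\\"')
        text = f'"{escaped}"'
        if self.lang:
            return f"{text}@{self.lang}"
        if self.datatype:
            return f"{text}^^<{self.datatype}>"
        return text


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def n3(self) -> str:
        # One statement in N-Triples style: subject predicate object .
        return f"{self.subject.n3()} {self.predicate.n3()} {self.obj.n3()} ."


pt3445 = URI(BASE + "PT3445")
ct5877 = URI(BASE + "CT5877")
participates_in = URI(BASE + "participatesIn")

triples = [
    Triple(pt3445, participates_in, ct5877),
    # Illustrative label, not on the slide.
    Triple(ct5877, URI(RDFS_LABEL), Literal("Example trial", lang="en")),
]

for t in triples:
    print(t.n3())
```

Keeping subject, predicate and object as separate typed values mirrors the slide's split between resources ("nouns"), predicates ("verbs") and literals.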

Page 8

What’s Needed?
Linked Data!

[Figure: LOD Cloud 2011]
http://thedatahub.org/group/lodcloud

Page 9

The 5 Stars of Open Linked Data
W3C/TBL Guidance

★ Make your stuff available on the web (any format)

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use a non-proprietary format (e.g. CSV instead of Excel)

★★★★ Use URLs to identify things, so that people can point at your stuff

★★★★★ Link your data to other people’s data to provide context

http://www.w3.org/DesignIssues/LinkedData.html
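The step from three stars to five can be sketched in a few lines. This is an illustrative example only: the CSV columns, the `http://example.org/compound/` URL space and the choice of `rdfs:label` and `owl:sameAs` as predicates are all assumptions made for the sketch, not something the slides prescribe.

```python
import csv
import io

# Hypothetical 3-star input: a non-proprietary CSV export.
CSV_TEXT = """compound_id,name,same_as
C001,aspirin,http://dbpedia.org/resource/Aspirin
C002,example compound,
"""

BASE = "http://example.org/compound/"  # hypothetical URL space
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def rows_to_triples(text):
    """Turn CSV rows into (subject, predicate, object) tuples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        # 4 stars: give each thing a URL so others can point at it.
        subject = BASE + row["compound_id"]
        triples.append((subject, RDFS_LABEL, row["name"]))
        # 5 stars: link to someone else's data when a match is known.
        if row["same_as"]:
            triples.append((subject, OWL_SAME_AS, row["same_as"]))
    return triples


for triple in rows_to_triples(CSV_TEXT):
    print(triple)
```

Rows without a known external match still earn four stars; the fifth depends on having something to link to.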

Page 10

The 5 Stars of [Open] Closed Linked Data
W3C/TBL Guidance

★ Make your stuff available on the [web] intranet (any format)

★★ Make it available as structured data (e.g. Excel instead of image scan of a table)

★★★ Use a non-proprietary format (e.g. CSV instead of Excel)

★★★★ Use URLs to identify things, so that people can point at your stuff

★★★★★ Link your data to other people’s data to provide context

http://www.w3.org/DesignIssues/LinkedData.html

Page 11

Towards a Linked Data Architecture

[Figure: architecture diagram. Components: Central Identity Management (Active & Partial PURLs); Semantic Visualization; Vocabulary Server; Catalogues, Mapping, Queries; RDF + Tagging; Search; Triplestores; Content (Structured, Semi-Structured, Unstructured)]

http://research.vocab.azlinkeddata.com/id/DOID/2841
http://humandiseaseontology.astrazeneca.net/DOID/2841

Page 12

Choosing Linked Vocabularies
Current LOD Cloud Adoption

Vocabulary prefix | Vocabulary link | Number of usages in data sets
dc | http://purl.org/dc/elements/1.1/ | 92 (31.19 %)
foaf | http://xmlns.com/foaf/0.1/ | 81 (27.46 %)
skos | http://www.w3.org/2004/02/skos/core# | 58 (19.66 %)
geo | http://www.w3.org/2003/01/geo/wgs84_pos# | 25 (8.47 %)
xhtml | http://www.w3.org/1999/xhtml/vocab# | 19 (6.44 %)
akt | http://www.aktors.org/ontology/portal# | 17 (5.76 %)
bibo | http://purl.org/ontology/bibo/ | 14 (4.75 %)
mo | http://purl.org/ontology/mo/ | 13 (4.41 %)
vcard | http://www.w3.org/2006/vcard/ns# | 10 (3.39 %)
sioc | http://rdfs.org/sioc/ns# | 10 (3.39 %)
cc | http://creativecommons.org/ns# | 8 (2.71 %)
geonames | http://www.geonames.org/ontology# | 6 (2.03 %)

http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms
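The prefix and link columns are what make short names such as `foaf:name` resolvable. A minimal sketch of that expansion, using a few of the namespaces listed on this slide, is below; the `expand` helper is a name chosen for this example, and the terms in the usage lines (`foaf:name`, `skos:prefLabel`, `dc:title`) are well-known members of those vocabularies rather than anything taken from the slide.

```python
# Prefix-to-namespace pairs copied from the table on this slide.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
}


def expand(curie):
    """Expand a prefixed name like 'foaf:name' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("foaf:name"))
print(expand("skos:prefLabel"))
print(expand("dc:title"))
```

Reusing a widely adopted prefix means the expanded URI is one that other datasets already use, which is where the linking value comes from.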

Page 13

The 5 Stars of Open Linked Vocabularies
Bernard Vatant (Mondeca) Guidance

★ Publish your vocabulary on the Web at a stable URI

★★ Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)

★★★ Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes

★★★★ Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation

★★★★★ Link to other vocabularies by re-using elements rather than re-inventing

http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html
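The four-star item relies on content negotiation: the same namespace URI serves a browser an HTML page and serves a tool an RDF file, depending on the HTTP `Accept` header. The sketch below shows the selection logic only, under simplifying assumptions: the file names are hypothetical, wildcard subtypes such as `text/*` are not handled, and it falls back to HTML where a real server might answer 406 Not Acceptable.

```python
# Hypothetical representations of one vocabulary, keyed by media type.
REPRESENTATIONS = {
    "text/html": "vocab.html",
    "text/turtle": "vocab.ttl",
    "application/rdf+xml": "vocab.rdf",
}


def parse_accept(header):
    """Return (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            if field.startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media, q))
    # sorted() is stable, so ties keep the order given in the header.
    return sorted(prefs, key=lambda pair: -pair[1])


def negotiate(header, default="text/html"):
    """Pick the media type to serve for a given Accept header."""
    for media, q in parse_accept(header):
        if q <= 0:
            continue
        if media in REPRESENTATIONS:
            return media
        if media == "*/*":
            return default
    return default  # simplification: a strict server could return 406


print(negotiate("text/turtle"))
print(negotiate("text/html;q=0.5, text/turtle"))
```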

Page 14

Domain Specific Vocabularies
Linked Open Vocabularies, NCBO

http://labs.mondeca.com/dataset/lov/index.html

http://bioportal.bioontology.org/

Page 15

Building Linked Data Applications

[Figure: workflow diagram. Steps shown:]
• Capture Business Questions and Sources
• Domain Expert Concept Map
• Interact with RDF answer in a Faceted Browser
• Build Formal Ontology (Reuse Vocabularies!)
• Model Business Questions (SPARQL)
• Challenge with Linked Data
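Modeling a business question in SPARQL comes down to writing triple patterns with variables and letting the store find matching statements. The sketch below imitates a single pattern match in plain Python so the idea can be run without a triplestore; it is a stand-in for SPARQL, not an implementation of it. The `ex:` identifiers other than PT3445 and CT5877 are invented, and the question itself ("who participates in CT5877?") is an assumed example.

```python
# Toy data: (subject, predicate, object) statements with made-up identifiers.
TRIPLES = [
    ("ex:PT3445", "ex:participatesIn", "ex:CT5877"),
    ("ex:PT3446", "ex:participatesIn", "ex:CT5877"),
    ("ex:PT3447", "ex:participatesIn", "ex:CT6001"),
]


def match(pattern, triples):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding


# Roughly: SELECT ?who WHERE { ?who ex:participatesIn ex:CT5877 }
question = ("?who", "ex:participatesIn", "ex:CT5877")
for row in match(question, TRIPLES):
    print(row["?who"])
```

A real query would join several such patterns; the single-pattern case is enough to show how a question maps onto the shape of the data.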

Page 16

Improving Internal Interoperability

Scientists, Clinicians, Informaticists can now freely interoperate as:

The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise. The Persistent URLs are used to connect resources found in multiple locations.

The vocabulary server provides a way of harmonizing concepts across different domains.
o Where possible, public vocabularies are used
o Where not, they’re extended
o We don’t want to develop and maintain vocabularies
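The role of the PURL server can be illustrated with a small lookup: a persistent URL stays fixed while the location it redirects to can change. The sketch below models exact and partial (prefix-based) redirects in memory. It is an assumption-laden illustration, not the deployed system: the two DOID URLs appear on the architecture slide, but which one redirects to the other is assumed here, and `resolve` is a name chosen for this example.

```python
# Exact redirects: one persistent URL maps to one target (none defined here).
EXACT = {}

# Partial redirects: a persistent prefix maps to a target prefix, and the
# remainder of the path is carried over. The direction is an assumption.
PARTIAL = {
    "http://research.vocab.azlinkeddata.com/id/DOID/":
        "http://humandiseaseontology.astrazeneca.net/DOID/",
}


def resolve(purl):
    """Return the current target for a persistent URL, or None if unknown."""
    if purl in EXACT:
        return EXACT[purl]
    # Try the longest registered prefix first so specific rules win.
    for prefix in sorted(PARTIAL, key=len, reverse=True):
        if purl.startswith(prefix):
            return PARTIAL[prefix] + purl[len(prefix):]
    return None


print(resolve("http://research.vocab.azlinkeddata.com/id/DOID/2841"))
```

If the ontology moves to another host, only the table entry changes; every document that cites the persistent URL keeps working.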

Page 17

Inside/Outside Disappears
External | Internal

[Figure: architecture diagram extended with external sources. Components: Central Identity Management (Active & Partial PURLs); Semantic Visualization; Vocabulary Server; Catalogues, Mapping, Queries; RDF + Tagging; Triplestores; RESTful APIs; Vendor Content; Consortium Content; Vendor Content APIs; Content (Structured, Semi-Structured, Unstructured)]

Page 18

Unstructured Content
(or most of the data out there…)

Giving Structure to Unstructured Content
o Entity Recognition
o Use of common vocabularies
o Schemas
o Domain-Specific Content? Open BEL? TMO?
o Compatibility of text indices with triplestores & middleware tools

Encouraging Publishers to Structure Content
o How can this be ‘monetized’ so they don’t lose their ROI?
o What about interoperability & persistence?
o Can this be mandated via funding agencies?
o RDFa to start?

Publishers or ‘Re-publishers’
o Thomson-Reuters
o Elsevier
o Ingenuity
o Open up vocabularies (thanks, Cortellis!)
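Entity recognition against a common vocabulary can be sketched as a dictionary lookup that tags each mention with a concept URI. Production systems use far richer methods; this example only shows the shape of the output. The lexicon entries and the `http://example.org/vocab/` URIs are invented for illustration.

```python
import re

# Hypothetical vocabulary: surface form -> concept URI (both made up).
LEXICON = {
    "asthma": "http://example.org/vocab/disease/asthma",
    "aspirin": "http://example.org/vocab/compound/aspirin",
}

# Longest terms first so multi-word entries would win over their substrings.
_TERMS = sorted(LEXICON, key=len, reverse=True)
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in _TERMS) + r")\b",
    re.IGNORECASE,
)


def tag_entities(text):
    """Return (start, end, mention, uri) for each vocabulary term found."""
    return [
        (m.start(), m.end(), m.group(0), LEXICON[m.group(0).lower()])
        for m in PATTERN.finditer(text)
    ]


for hit in tag_entities("Aspirin use in asthma patients"):
    print(hit)
```

Once mentions carry URIs, the tagged text can be joined with structured sources that use the same vocabulary.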

Page 19

Pre-Competitive Consortia

Open PHACTS (Innovative Medicines Initiative)

Pistoia Alliance

W3C Health Care & Life Sciences Interest Group

National Center for Biomedical Ontology (NCBO)

Open BEL (Biological Expression Language)

Page 20

Open PHACTS (Open Pharmacological Space)
• EU/EFPIA Innovative Medicines Initiative (IMI) project

Key Points

Large scale data integration
o Focused on pharmacology
o We integrate so you don’t have to
o Dealing with multiple identifiers for the same concept
o Always up-to-date
o State of the art and industrial strength

Flexible and adaptable
o Dynamic schema-less approach; rapidly incorporate new datasets
o Queries are adaptive, based on scientific profiles (e.g. chemist or biologist)
o Use-case driven & tested by users in industry and academia

Great APIs for building apps
o JSON REST-style APIs
o Also supports XML, Turtle, etc.
o Chemistry services
o Exemplars show how to take advantage of the platform
o Clear licensing details for all data in the system

Focus On Data Quality
o Provenance is critical: know where every data point comes from
o Google-style indexing; Data providers keep their own data
o Chemistry Standardization: enhancing chemistry connectivity
o Working with data providers to expose and enhance their data

From: Open PHACTS Architecture - Building the extensible platform (EuroQSAR 2012 in Vienna, 30.08.2012)
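One generic way to deal with multiple identifiers for the same concept is to record pairwise equivalences and group them into sets. The sketch below uses a union-find structure for that. It is offered as an illustration of the problem only; nothing on the slide says Open PHACTS works this way, and the `sourceA:` style identifiers are invented.

```python
class IdentifierMap:
    """Groups identifiers that have been declared equivalent (union-find)."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Unknown identifiers start as their own one-member group.
        self._parent.setdefault(ident, ident)
        while self._parent[ident] != ident:
            # Path halving keeps later lookups short.
            self._parent[ident] = self._parent[self._parent[ident]]
            ident = self._parent[ident]
        return ident

    def link(self, a, b):
        """Record that identifiers a and b denote the same concept."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_concept(self, a, b):
        return self._find(a) == self._find(b)


ids = IdentifierMap()
ids.link("sourceA:123", "sourceB:XYZ")  # hypothetical cross-references
ids.link("sourceB:XYZ", "sourceC:9")

print(ids.same_concept("sourceA:123", "sourceC:9"))
print(ids.same_concept("sourceA:123", "sourceD:0"))
```

Equivalence here is transitive by construction, which is a modeling choice; whether two sources really mean the same thing can depend on the scientific profile asking.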

Page 21

W3C HCLS

The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.

Activities:
o Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
o Implement proof-of-concept demonstrations and industry-ready code.
o Document guidelines to accelerate the adoption of the technology.
o Disseminate information about the group's work at government, industry, academic events and by participating in community initiatives.

Use Cases/Domains
o Drug Discovery
o Electronic Lab Notebooks
o Comparator Arm Data
o Patient Data Ownership
o Biotech Acquisition
o Supply Chain Automation
o Web Integration
o Bio-surveillance
o Co-development

CDISC2RDF: Making Clinical Data Interchange Standards Consortium (CDISC) available as RDF
• Roche, AZ, TopQuadrant, Vrije Universiteit, Amsterdam
• More at CSHALS in two weeks

http://www.w3.org/blog/hcls/

Page 22

Pleas & Future Directions

Prognostications

RDF Content Farms
o Vendors: Someone will figure out how to monetize this
o Consortia: Who ‘Owns’ this?
o Government in Health Care & Life Sciences; can we learn from the EPA? open.gov?

Shrinking Pharma
o Smaller (or virtual) footprint
o Back to first principles: what do we do best?
o More modeling & Simulation
o Rise of the informaticist…

Community Help

Resist Silos
o Where is your data? Where is it likely to be in 5, 10 years?
o A single triplestore with all ETL-streams leading to an RDF ‘data warehouse’ is another silo
o Building on top of ‘standards+’ may lead to silos

Need to follow & influence emergence of standards if you have a ‘horse in the race’

Support (business focused) Consortiums
o We’re doing the same job many, many times

Page 23

Thank You
Listeners & Molecular Med TRI-CON 2013 Organizers