Linked Data for Biopharma

description

As BioPharma adapts to incorporate nimble networks of suppliers, collaborators, and regulators, the ability to link data is critical for dynamic interoperability. Adopting the linked data paradigm allows BioPharma to focus on its core business: delivering valuable therapeutics in a timely manner.

Transcript of Linked Data for Biopharma

1. Tom Plasterer, PhD
integrated informatics Semantic Framework Lead (i2SF)
The Path to Linked Data in BioPharma
Integrated R&D Informatics and Knowledge Management

2. R&D | RDI
Blockbuster Patent Cliff Gives Way to Personalized Approach
Drivers & Solutions
Blockbuster Patent Cliff
Growth of Generics
Mergers & Acquisitions
Personalized Medicine
Pharmacogenetics
Biomarkers
American Action Forum; Primer: The Pharmaceutical Industry (Han Zhong | Updated June 2012)
IMAP Pharma & Biotech Industry Global Report 2011
Evaluate Pharma World Preview 2018
From: http://www.liv.ac.uk/pharmacogenetics/

3. Where do the new opportunities arise?
Inside & Outside
Nurture best in class programs
Kill early
Repositioning
Build from within
Partner or Buy?
Integrate cultures & technology
Is the disruption worth it?
Mergers & Acquisitions
How much can be shared and still be useful?
Who is driving?
Pre-Competitive Consortiums
Aggressive Regional Partnerships (Pfizer's Centers for Therapeutic Innovation)
Co-locate near Academic Centers of Excellence (Novartis)
Cherry pick (GSK, AZ, others)
Finding KOLs

4. Distributed Data in a Monolithic Environment
Managing Silos
Regulated Systems vs. Discovery
Partitioned By Content
US, EU, ASIAPAC
Partitioned By Geography & Organization
RDB, Excel, Text, RSS, RDF?
Data Formats
Steps in the right direction?
Warehouses & Service Oriented Architecture
eRooms, SharePoint, Yammer, Lync vs. Twitter, Google Docs, Skype
Collaborative Environment
Vendor specific or open?
Mixed Bag
Standards? UI? Services? Metadata?
Where are the smarts?

5.
Requirements of The Informatics Landscape
Must span the entire drug development lifecycle
  o and back (post-market surveillance to discovery)
Must support large and very heterogeneous data
  o single nucleotide polymorphisms to countries
Will change as new science emerges & new regulations come into play
  o Medline just under 1M articles/year
Must be able to work with multiple, international regulatory bodies
  o Emerging markets
Partners, customers and collaborators will change
  o and will have divergent technical aptitudes
Must be able to interoperate with precompetitive consortia
  o Can they perform common tasks for the community?
Must be able to work with legacy data
  o Lots of unmined gems here!
Maximal Agility

6. What's Needed? Linked Data!
LOD Cloud 2011
http://thedatahub.org/group/lodcloud

7. The 5 Stars of Open Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
Make your stuff available on the web (any format)
Make it available as structured data (e.g. Excel instead of image scan of a table)
Use a non-proprietary format (e.g. CSV instead of Excel)
Use URLs to identify things, so that people can point at your stuff
Link your data to other people's data to provide context

8. The 5 Stars of Closed (rather than Open) Linked Data
W3C/TBL Guidance
http://www.w3.org/DesignIssues/LinkedData.html
Make your stuff available on the intranet rather than the web (any format)
Make it available as structured data (e.g. Excel instead of image scan of a table)
Use a non-proprietary format (e.g. CSV instead of Excel)
Use URLs to identify things, so that people can point at your stuff
Link your data to other people's data to provide context

9.
Towards a Linked Data Architecture
[Architecture diagram; labels in extraction order]
Catalogues, Mapping, Queries
RDF
Active & Partial PURLs
Central Identity Management
Structured Triplestores
http://research.vocab.astrazeneca.com/id/DOID/2841
http://humandiseaseontology.astrazeneca.net/DOID/2841
Semantic Visualization
Semi-Structured
Unstructured
Content + Tagging
Vocabulary Server
Search

10. Choosing Linked Vocabularies
Current LOD Cloud Adoption

Vocabulary prefix  Vocabulary link                           Number of usages in data sets
dc                 http://purl.org/dc/elements/1.1/          92 (31.19 %)
foaf               http://xmlns.com/foaf/0.1/                81 (27.46 %)
skos               http://www.w3.org/2004/02/skos/core#      58 (19.66 %)
geo                http://www.w3.org/2003/01/geo/wgs84_pos#  25 (8.47 %)
xhtml              http://www.w3.org/1999/xhtml/vocab#       19 (6.44 %)
akt                http://www.aktors.org/ontology/portal#    17 (5.76 %)
bibo               http://purl.org/ontology/bibo/            14 (4.75 %)
mo                 http://purl.org/ontology/mo/              13 (4.41 %)
vcard              http://www.w3.org/2006/vcard/ns#          10 (3.39 %)
sioc               http://rdfs.org/sioc/ns#                  10 (3.39 %)
cc                 http://creativecommons.org/ns#            8 (2.71 %)
geonames           http://www.geonames.org/ontology#         6 (2.03 %)

http://www4.wiwiss.fu-berlin.de/lodcloud/state/#terms
Vocabulary Server

11. The 5 Stars of Open Linked Vocabularies
Bernard Vatant (Mondeca) Guidance
http://blog.hubjects.com/2012/02/is-your-linked-data-vocabulary-5-star_9588.html
Publish your vocabulary on the Web at a stable URI
Provide human-readable documentation and basic metadata (e.g. creator, publisher, date of creation, last modification, version number)
Provide labels and descriptions, if possible in several languages, to make your vocabulary usable in multiple linguistic scopes
Make your vocabulary available via its namespace URI, both as a formal file and human-readable documentation, using content negotiation
Link to other vocabularies by re-using elements rather than re-inventing

12.
Domain Specific Vocabularies
Linked Open Vocabularies, NCBO
http://labs.mondeca.com/dataset/lov/index.html
http://bioportal.bioontology.org/

13. Building Linked Data Applications
Capture Business Questions and Sources
Domain Expert Concept Map
Build Formal Ontology
Reuse Vocabularies!
Challenge with Linked Data Model Business Questions (SPARQL)
Interact with RDF answer in a Faceted Browser

14. Improving Internal Interoperability
Scientists, Clinicians, Informaticists can now freely interoperate as:
The PURL server provides a central identity management authority for resources that are of value (need to persist) across the enterprise.
The Persistent URLs are used to connect resources found in multiple locations
The vocabulary server provides a way of harmonizing concepts across different domains
  o Where possible, public vocabularies are used
  o Where not, they're extended
  o We don't want to develop and maintain vocabularies

15. Inside/Outside Disappears
[Architecture diagram; labels in extraction order]
Structured Vendor Content
Consortium Content
RESTful APIs
Catalogues, Mapping, Queries
RDF
Structured Triplestores
Semi-Structured
Unstructured
Content + Tagging
External
Internal
Active & Partial PURLs
Central Identity Management
Semantic Visualization
Vocabulary Server

16. Unstructured Content
Giving Structure to Unstructured Content
  o Entity Recognition
  o Use of common vocabularies
  o Schemas
  o Domain-Specific Content? Open BEL? TMO?
  o Compatibility of text indices with triplestores & middleware tools
Encouraging Publishers to Structure Content
  o How can this be monetized so they don't lose their ROI?
  o What about interoperability & persistence?
  o Can this be mandated via funding agencies?
  o RDFa to start?
Publishers or Re-publishers
  o Thomson-Reuters
  o Ingenuity
  o Open up vocabularies (or most of the data out there)

17.
Pre-Competitive Consortia
Open PHACTS (Innovative Medicines Initiative)
Pistoia Alliance
W3C Health Care & Life Sciences Interest Group
National Center for Biomedical Ontology (NCBO)
Open BEL (Biological Expression Language)

18. Open PHACTS (Open Pharmacological Space)
EU/EFPIA Innovative Medicines Initiative (IMI) project
From: Open PHACTS Architecture - Building the extensible platform (EuroQSAR 2012 in Vienna, 30.08.2012)

19. W3C HCLS
The mission of the Semantic Web Health Care and Life Sciences Interest Group (HCLS IG) is to develop, advocate for, and support the use of Semantic Web technologies across health care, life sciences, clinical research and translational medicine.
http://www.w3.org/blog/hcls/
Activities:
  o Continue to develop high level (e.g. TMO) and architectural (e.g. SWAN) vocabularies.
  o Implement proof-of-concept demonstrations and industry-ready code.
  o Document guidelines to accelerate the adoption of the technology.
  o Disseminate information about the group's work at government, industry, academic events and by participating in community initiatives.
Use Cases/Domains
  o Drug Discovery
  o Electronic Lab Notebooks
  o Comparator Arm Data
  o Patient Data Ownership
  o Biotech Acquisition
  o Supply Chain Automation
  o Web Integration
  o Bio-surveillance
  o Co-development

20. Pleas & Future Directions
Prognostications
RDF Content Farms
Vendors: Someone will figure out how to monetize this
Consortia: Who Owns this?
Government in Health Care & Life Sciences; can we learn from the EPA? open.gov?
Shrinking Pharma
Smaller (or virtual) footprint
  o Back to first principles: what do we do best?
More modeling & Simulation
Rise of the informaticist
Community Help
Resist Silos
Where is your data? Where is it likely to be in 5, 10 years?
A single triplestore with all ETL streams leading to an RDF data warehouse is another silo
  o Building on top of standards+ may lead to silos
Need to follow & influence emergence of standards if you have a horse in the race
Support (business focused) Consortiums
We're doing the same job many, many times

21. Thank You
Listeners & Molecular Med TRI-CON 2013 Organizers
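
As a rough sketch of the identity and vocabulary ideas on slides 9, 10 and 14, the Python below builds a persistent URL for a resource and expands vocabulary prefixes into full URIs. The namespace URIs are copied from the slide 10 table, and the PURL pattern is inferred from the example URL on slide 9. The function names (`expand`, `mint_purl`, `triple`), the choice of `skos:prefLabel` as predicate, and the placeholder label are assumptions made for illustration; none of them come from the talk.

```python
# Namespace URIs copied from the slide 10 vocabulary table.
PREFIXES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Assumed PURL base, inferred from the example URL on slide 9.
PURL_BASE = "http://research.vocab.astrazeneca.com/id/"


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'skos:prefLabel' into a full URI."""
    prefix, _, local = prefixed_name.partition(":")
    if prefix not in PREFIXES or not local:
        raise ValueError(f"unknown or malformed prefixed name: {prefixed_name!r}")
    return PREFIXES[prefix] + local


def mint_purl(vocabulary: str, local_id: str) -> str:
    """Build a persistent URL from a vocabulary code and a local identifier."""
    return f"{PURL_BASE}{vocabulary}/{local_id}"


def triple(subject_uri: str, predicate: str, literal: str) -> str:
    """Serialize one statement in N-Triples form with a plain literal object."""
    escaped = literal.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{subject_uri}> <{expand(predicate)}> "{escaped}" .'


# Hypothetical usage: the label is a placeholder, not real ontology content.
disease = mint_purl("DOID", "2841")
print(triple(disease, "skos:prefLabel", "example label"))
```

Because every system would refer to the resource by the same persistent URL, statements produced in different places can be merged on that identifier, which is the role slide 14 assigns to the PURL server.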