RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to...
![Page 1: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/1.jpg)
Building an RDF representation of the ChEMBL Database
RDF Workshop
Mark Davies ChEMBL Group, Technical Lead
30/04/2014
![Page 2: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/2.jpg)
Overview
• Brief introduction to ChEMBL database
• Approaches to mapping relational data to RDF data model based on ChEMBL experience
• New features in ChEMBL RDF (version 18)
• Future ChEMBL plans
![Page 3: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/3.jpg)
• Open access database for drug discovery
• Freely available (searchable and downloadable)
• Content:
• Bioactivity data manually extracted from the primary medicinal chemistry literature from journals such as J. Med. Chem.
• Subset of data from PubChem
• Deposited data e.g. neglected disease screening, GSK kinase set
• Bioactivity data is associated with a biological target and a chemical structure
• Compounds are stored in a structure searchable format
• Protein targets are linked to protein sequences in UniProt
• Updated regularly with new data
• Secure searching (https://www.ebi.ac.uk/chembldb)
What is ChEMBL?
![Page 4: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/4.jpg)
ChEMBL
https://www.ebi.ac.uk/chembl/
• ChEMBL 18 Release
• 1,359,508 compounds
• 12,419,715 activities
• 1,042,374 assays
• 9,414 targets
• 53,298 documents
• 19 bioactivity sources
• 6 compound-only sources
![Page 5: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/5.jpg)
Compound Target Activity Assay Ref
What does ChEMBL data look like?
![Page 6: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/6.jpg)
How can I access ChEMBL data?
Website Web Services
Widgets Downloads
Virtual Machine (myChEMBL)
Semantic Web
![Page 7: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/7.jpg)
ChEMBL + Semantic Web
• The creation of the RDF version of ChEMBL is funded by the Open PHACTS project - http://www.openphacts.org/
• Migrate the ChEMBL relational data model to RDF based data model – ‘triplify’ everything
• RDF generation is part of official ChEMBL release process
• Identify and use ontologies important in the field of bioactivity data
• ChEMBL RDF to be made available through EBI RDF Platform
![Page 8: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/8.jpg)
Goals of the ChEMBL RDF conversion
• Responding to the demands of the community
• Academic and more recently industry
• Semantic data conversion and querying
• Reasoning/inferencing - providing a starting point for the community
• Ensure the conversion is part of the ChEMBL release cycle
• ChEMBL data model is still evolving so almost impossible for external efforts to keep up to speed with changes
![Page 9: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/9.jpg)
ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/
ChEMBL RDF
Compound Bioactivity Assay Target Ref
![Page 10: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/10.jpg)
Conversion process
ChEMBL RDF Schema ChEMBL Relational Schema
![Page 11: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/11.jpg)
Migrating a relational data model
• Two approaches were used to convert the ChEMBL relational model to an RDF based model
• Approach 1: Semi-automated using the D2RQ Platform
• Approach 2: Manual model building
Where to start?
What tools to use?
What is your goal?
Who is the audience?
Will it be made available to the public?
How often will it be updated?
Which ontologies to use?
Do you write your own ontology?
Which format to use?
![Page 12: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/12.jpg)
Approach 1: Semi-automated using the D2RQ Platform
![Page 13: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/13.jpg)
D2RQ platform overview
• Query a non-RDF database using SPARQL
• Access the content of the database as Linked Data over the Web
• Create custom dumps of the database in RDF formats for loading into an RDF store
• Access information in a non-RDF database using the Apache Jena API
http://d2rq.org/
![Page 14: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/14.jpg)
ChEMBL relational schema
• 3 core domains
• Compound
• Activity
• Target
• 52 tables (52 primary keys :-))
• 341 columns
• 4 data types (40 if length, scale and precision included)
• Many indexes, constraints, triggers, etc.
![Page 15: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/15.jpg)
D2RQ: Prerequisites
• Download the software from http://d2rq.org/
• Java 1.5 or higher
• Oracle users will need to download database driver
![Page 16: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/16.jpg)
D2RQ: Example usage
• Aims
• Create RDF representation of your relational data model
• Run and test SPARQL queries against your database
• Online data access and representation
• Process
• Generate “D2R Mapping File”
• Start up D2R Server using “D2R Mapping File”
• Refine model
• Supported databases
• Oracle, SQL Server, PostgreSQL, MySQL, HSQLDB,…
http://d2rq.org/
![Page 17: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/17.jpg)
D2RQ: Mapping File Creation
• Command used to generate D2R Mapping File (you just need a database connection string):
$> ./generate-mapping -o example_d2r_mapping.ttl \
     -u <USER> \
     -p <PASSWORD> \
     -d oracle.jdbc.driver.OracleDriver jdbc:oracle:thin:@<SERVER>:<PORT>:<DATABASE>
• Command above will inspect database and create RDF based definitions of tables and relationships between tables
• Possible to skip/restrict schemas/tables/columns with additional argument – useful for Oracle
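The inspection step can be pictured with a small sketch. It is not D2RQ's implementation: it reads the schema of an in-memory SQLite database (chosen only so the example is self-contained) and derives one class name per table and one property name per column, in the TABLE_COLUMN style that appears in the generated mapping file. The function name `derive_vocab_terms` is hypothetical, and the `CHEMBL_18` schema prefix is omitted because SQLite has no schemas.

```python
import sqlite3


def derive_vocab_terms(conn):
    """Return {table: [property names]} using a TABLE_COLUMN naming style."""
    terms = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        terms[table] = [f"vocab:{table}_{column}" for column in columns]
    return terms


# A stand-in for the ACTION_TYPE table shown in the mapping file example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ACTION_TYPE ("
    "ACTION_TYPE TEXT PRIMARY KEY, DESCRIPTION TEXT, PARENT_TYPE TEXT)"
)

vocab_terms = derive_vocab_terms(conn)
for table, properties in vocab_terms.items():
    print(f"vocab:{table} (class)")
    for prop in properties:
        print(f"  {prop} (property)")
```

Names derived this way mirror the database one-to-one, which is why the mapping usually needs refining before the RDF is meaningful to outside users.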
![Page 18: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/18.jpg)
D2RQ: Mapping file example
@prefix map: <#> .
@prefix db: <> .
@prefix vocab: <vocab/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix jdbc: <http://d2rq.org/terms/jdbc/> .

map:database a d2rq:Database;
    d2rq:jdbcDriver "oracle.jdbc.driver.OracleDriver";
    d2rq:jdbcDSN "jdbc:oracle:thin:@<SERVER>:<PORT>:<DATABASE>";
    d2rq:username "<USER>";
    d2rq:password "<PASSWORD>";
    .

# Table CHEMBL_18.ACTION_TYPE
map:CHEMBL_18_ACTION_TYPE a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "CHEMBL_18/ACTION_TYPE/@@CHEMBL_18.ACTION_TYPE.ACTION_TYPE|urlify@@";
    d2rq:class vocab:CHEMBL_18_ACTION_TYPE;
    d2rq:classDefinitionLabel "CHEMBL_18.ACTION_TYPE";
    .
map:CHEMBL_18_ACTION_TYPE__label a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property rdfs:label;
    d2rq:pattern "ACTION_TYPE #@@CHEMBL_18.ACTION_TYPE.ACTION_TYPE@@";
    .
map:CHEMBL_18_ACTION_TYPE_ACTION_TYPE a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property vocab:CHEMBL_18_ACTION_TYPE_ACTION_TYPE;
    d2rq:propertyDefinitionLabel "ACTION_TYPE ACTION_TYPE";
    d2rq:column "CHEMBL_18.ACTION_TYPE.ACTION_TYPE";
    .
map:CHEMBL_18_ACTION_TYPE_DESCRIPTION a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property vocab:CHEMBL_18_ACTION_TYPE_DESCRIPTION;
    d2rq:propertyDefinitionLabel "ACTION_TYPE DESCRIPTION";
    d2rq:column "CHEMBL_18.ACTION_TYPE.DESCRIPTION";
    .
map:CHEMBL_18_ACTION_TYPE_PARENT_TYPE a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:CHEMBL_18_ACTION_TYPE;
    d2rq:property vocab:CHEMBL_18_ACTION_TYPE_PARENT_TYPE;
    d2rq:propertyDefinitionLabel "ACTION_TYPE PARENT_TYPE";
    d2rq:column "CHEMBL_18.ACTION_TYPE.PARENT_TYPE";
    .
![Page 19: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/19.jpg)
D2R Server
• Command used to start D2R server (assuming you have generated mapping file):
$> ./d2r-server example_d2r_mapping.ttl
• SPARQL endpoint and explorer
• Browsing database contents
• Resolvable URIs
• Content negotiation
• Downloading contents of BLOBs/CLOBs
• Serving the vocabulary
• Publishing metadata
http://d2rq.org/
![Page 20: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/20.jpg)
D2R Server
http://d2rq.org/
![Page 21: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/21.jpg)
D2R Server
http://d2rq.org/
![Page 22: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/22.jpg)
(Quick) D2RQ Demo
![Page 23: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/23.jpg)
D2RQ: Data modeling
• It is possible to model data and also create more meaningful class and property names
• Approach 1
• Edit the mapping file using advanced features of the D2RQ query language: http://d2rq.org/d2rq-language
• Approach 2
• Create mapping file based on restricted set of database objects e.g. users, schemas, views, materialised views – modeling within the database
![Page 24: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/24.jpg)
D2RQ: Optimisations
• Review D2R server deployment
• Increase D2R server heap space
• Review configurations settings, such as page sizes, resultset limits
• Use latest built-in D2RQ optimisations, by specifying d2rq:useAllOptimizations (or --fast flag on server startup)
• Use D2RQ’s dump-rdf command to export RDF representation of database
• Exported RDF can then be imported into a triplestore, e.g. Virtuoso
![Page 25: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/25.jpg)
D2RQ: Limitations
• General limitations
• Integration of multiple databases not possible – achieved within the database
• Read only access – update extension is available
• Limited inferencing available
• Named graphs not supported
• Users tend to end up creating ‘weird’ mapping files
• Database models are often not perfect or clean, which complicates mapping file creation process
Just mapping to RDF is not enough
![Page 26: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/26.jpg)
Approach 2: Manual model building
![Page 27: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/27.jpg)
Approach 2 outline
• Build a basic ontology and instantiate it with data
• Steps:
• Define entities in data source
• Define relationships between entities
• Define properties of the entities
• Identify and use external ontologies
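The first three steps can be sketched in a few lines of Python. This is a minimal illustration, not the ChEMBL conversion code: the instance namespaces under `example.org` and the column names in `row` are hypothetical, while the `cco` namespace and the property names are the ones used elsewhere in these slides.

```python
CCO = "http://rdf.ebi.ac.uk/terms/chembl#"  # namespace given in the slides
# Hypothetical instance namespaces, used here only for illustration.
ASSAY_NS = "http://example.org/chembl/assay/"
TARGET_NS = "http://example.org/chembl/target/"


def literal(text):
    """Quote a plain string literal with minimal Turtle escaping."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def assay_to_turtle(row):
    """Turn one relational assay row (a dict) into Turtle statements."""
    subject = f"<{ASSAY_NS}{row['chembl_id']}>"
    statements = [
        # Step 1: the entity becomes a typed instance of a class.
        (subject, "a", f"<{CCO}Assay>"),
        # Step 2: a foreign key becomes an object property.
        (subject, f"<{CCO}hasTarget>", f"<{TARGET_NS}{row['target_id']}>"),
        # Step 3: plain columns become datatype properties.
        (subject, f"<{CCO}assayType>", literal(row["assay_type"])),
        (subject, f"<{CCO}organismName>", literal(row["organism"])),
    ]
    return [" ".join(statement) + " ." for statement in statements]


row = {
    "chembl_id": "CHEMBL615672",
    "target_id": "CHEMBL612910",
    "assay_type": "Functional",
    "organism": "Mus musculus",
}
turtle_lines = assay_to_turtle(row)
print("\n".join(turtle_lines))
```

In practice an RDF library would handle serialisation and escaping; plain string building is used here only to keep the three steps visible.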
![Page 28: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/28.jpg)
Model building considerations
• Review available technologies/languages
• Your preferred language may not have good RDF support/libraries
• RDF data formats, e.g. RDF/XML (.rdf), Turtle (.ttl), N3 (.n3)
• All are interchangeable, but some are considered more readable and offer reduced size
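To make the readability and size point concrete, the sketch below writes the same triple once with full IRIs (N-Triples style) and once with prefixed names (Turtle style). The `ex_assay` prefix and its `example.org` namespace are hypothetical; only the `cco` namespace comes from these slides.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PREFIXES = {
    "cco": "http://rdf.ebi.ac.uk/terms/chembl#",
    "ex_assay": "http://example.org/chembl/assay/",  # hypothetical namespace
}


def compact(iri):
    """Shorten an IRI to a Turtle prefixed name where a prefix is known."""
    if iri == RDF_TYPE:
        # 'a' is only valid in predicate position; this sketch only meets it there.
        return "a"
    for prefix, namespace in PREFIXES.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return f"<{iri}>"


triple = (
    PREFIXES["ex_assay"] + "CHEMBL615672",
    RDF_TYPE,
    PREFIXES["cco"] + "Assay",
)

ntriples_line = " ".join(f"<{iri}>" for iri in triple) + " ."
turtle_line = " ".join(compact(iri) for iri in triple) + " ."

print(ntriples_line)
print(turtle_line)
```

The Turtle line also needs its `@prefix` declarations once per file, so the saving grows with the number of triples.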
![Page 29: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/29.jpg)
ChEMBL relational schema revisited
Molecules
Targets
References
Activities
Assays
Binding Sites
MOAs
Drugs
![Page 30: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/30.jpg)
Substance Activity
Assay Document
Target
Target-Component Source Journal
Protein-Classification
Bio-Component
ChEMBL Entities/Classes
ChEMBL 17 classes
![Page 31: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/31.jpg)
• An OWL based ontology used to define ChEMBL classes
• OWL snippet used to define ChEMBL assay:
• Tools available to help build and write ontologies, e.g. Protégé and TopBraid Composer
ChEMBL class definition
@prefix : <http://rdf.ebi.ac.uk/terms/chembl#> .

:Assay
    rdf:type owl:Class ;
    rdfs:label "ChEMBL Assay Class"^^xsd:string ;
    rdfs:subClassOf :ChEMBL .
![Page 32: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/32.jpg)
Entity RDF representation
• Entity names become classes, which allow you to type your data
• ‘chembl_assay:’ and ‘cco:’ are prefixes and ‘a’ is a Turtle shorthand for rdf:type
chembl_assay:CHEMBL615672 a cco:Assay .
![Page 33: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/33.jpg)
Substance Activity
Assay Document
Target
Target-Component Source Journal
Protein-Classification
Bio-Component
ChEMBL Entity relationships
![Page 34: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/34.jpg)
Relationship RDF representation
• Relationships defined between instances of your entities are object properties
chembl_assay:CHEMBL615672 a cco:Assay ;
    cco:hasTarget chembl_target:CHEMBL612910 ;
    cco:hasActivity chembl_activity:CHEMBL_ACT_227195 .
![Page 35: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/35.jpg)
ChEMBL assay attributes
• Identify attributes from database you want to include in RDF model
• Map attribute types e.g. integers, strings, Booleans
• Some attributes map to external resources/ontologies – see later
• Denormalisation of relational data, e.g. FKs
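Mapping attribute types can be sketched as a function from a database value to a typed RDF literal. This is an illustrative helper, not part of the ChEMBL pipeline: the name `to_literal` and the sample values are hypothetical, and only the XSD namespace is standard.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"


def to_literal(value):
    """Render a Python value as a Turtle literal with a matching XSD datatype."""
    # bool must be tested before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return f'"{str(value).lower()}"^^<{XSD}boolean>'
    if isinstance(value, int):
        return f'"{value}"^^<{XSD}integer>'
    if isinstance(value, float):
        return f'"{value!r}"^^<{XSD}double>'
    # Everything else is treated as a plain string with minimal escaping.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


# Hypothetical attribute values of the kinds named above.
samples = ["Functional", 42, True, 5.4]
for sample in samples:
    print(to_literal(sample))
```

Deciding the datatype once, in one place, keeps literals consistent across classes, which matters later when queries compare or filter on them.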
![Page 36: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/36.jpg)
Attribute RDF representation
• Attributes you define for your classes are datatype properties
• Good practice to add an rdfs:label to all instances
chembl_assay:CHEMBL615672 a cco:Assay ;
    cco:hasTarget chembl_target:CHEMBL612910 ;
    cco:hasActivity chembl_activity:CHEMBL_ACT_227195 ;
    rdfs:label "CHEMBL615672" ;
    cco:chemblId "CHEMBL615672" ;
    cco:assayType "Functional" ;
    cco:assayCellType "3LL cell line" ;
    cco:organismName "Mus musculus" .
![Page 37: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/37.jpg)
More examples ChEMBL Entity properties
Substance
Target
Activity
TopBraid Composer (Free Edition)
![Page 38: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/38.jpg)
Mapping to external ontologies
• Examples of ontologies/taxonomies mapped to in ChEMBL RDF include:
• BioAssay Ontology (BAO)
• ChEBI
• Chemical Information Ontology (CHEMINF)
• Bibliographic Ontology
• Unit Ontology (UO)
• QUDT Ontology
• Semanticscience Integrated Ontology (SIO)
• Cell Line Ontology (CLO)
• Experimental Factor Ontology (EFO)
![Page 39: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/39.jpg)
External ontologies/taxonomies
• Identification of relevant external ontologies
• Community consensus + recommendations
• BioPortal - https://bioportal.bioontology.org/
• Ontology Lookup Service - https://www.ebi.ac.uk/ontology-lookup/
![Page 40: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/40.jpg)
Substance Activity
Assay Document
Target
Target-Component Source Journal
Protein-Classification
Bio-Component
ChEMBL assay data
![Page 41: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/41.jpg)
ChEMBL assay annotation
• The assay is the central component of the ChEMBL data model
• Current model not ideal
• Single category - binding, functional, ADMET, physchem
• Unstructured/free text used to describe assay
• Many assay parameters not captured – although often not available
• Ontologies are now being used to improve ChEMBL assay annotations - ChEMBL_17 onwards
• Mappings to BAO bioassays, assay_format, endpoints
• http://bioassayontology.org
![Page 42: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/42.jpg)
BioAssay Ontology
Bioassay parent class, 92 descendant classes
How do we map to all these BAO assay classes?
![Page 43: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/43.jpg)
External ontology mapping process
• In many cases mapping is straightforward
• Use common bridging identifier e.g. UniProt
• Simple text based conversion e.g. units - actually units not so straightforward in ChEMBL
• Some mappings require complex rules e.g. assay details
• Multiple database parameters
• Complex text processing
• Manual curation
• Tools available to assist with mapping process
• BioPortal Annotator (http://bioportal.bioontology.org/annotator)
• Zooma (http://www.ebi.ac.uk/fgpt/zooma/)
• ChEMBL Assay Annotator
![Page 44: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/44.jpg)
BioPortal Annotator
ChEMBL Assay Description
Restricted to Ontology interest (optional)
Results
API available http://data.bioontology.org/documentation
http://bioportal.bioontology.org/annotator
![Page 45: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/45.jpg)
BioPortal Annotator Example
• CHEMBL2213497 assay description
• More information here:
https://www.ebi.ac.uk/chembl/assay/inspect/CHEMBL2213497
Now use BioPortal Annotator to annotate…
“Induction of apoptosis in human Jurkat T cells overexpressing Neo assessed as loss in mitochondrial membrane potential at 30 ug/ml after 36 hrs by DiO6-based flow cytometry (Rvb = 5.4%)”
![Page 46: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/46.jpg)
ChEMBL Assay Annotator
• ChEMBL Assay Annotator developed by Samuel Croset
• Aim is to map ChEMBL assays to BAO assay classes
• ‘Tailored’ mapping rules developed
![Page 47: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/47.jpg)
External mapping representation
• In this example defining assay_format:
chembl_assay:CHEMBL615672 a cco:Assay ;
    cco:hasTarget chembl_target:CHEMBL612910 ;
    cco:hasActivity chembl_activity:CHEMBL_ACT_227195 ;
    rdfs:label "CHEMBL615672" ;
    cco:chemblId "CHEMBL615672" ;
    cco:assayType "Functional" ;
    cco:assayCellType "3LL cell line" ;
    cco:organismName "Mus musculus" ;
    bao:BAO_0000205 bao:BAO_0000219 .
BAO_0000205 = has_assay_format, BAO_0000219 = “Cell based”
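A rule-based mapping of this kind might look like the sketch below. The single rule (an assay with a recorded cell type is treated as cell based) is a hypothetical illustration and not the ChEMBL Assay Annotator's actual rule set; the BAO namespace IRI is an assumption, the second assay record is invented, and only the two term identifiers come from the slide.

```python
BAO = "http://www.bioassayontology.org/bao#"  # assumed namespace IRI
HAS_ASSAY_FORMAT = BAO + "BAO_0000205"  # has_assay_format
CELL_BASED = BAO + "BAO_0000219"  # "Cell based"


def assay_format(assay):
    """Return a BAO assay-format IRI, or None when no rule applies."""
    # Hypothetical rule: a recorded cell type implies a cell-based format.
    if assay.get("cell_type"):
        return CELL_BASED
    return None


assays = [
    {"chembl_id": "CHEMBL615672", "cell_type": "3LL cell line"},
    {"chembl_id": "EXAMPLE_NO_CELL", "cell_type": None},  # invented record
]

mapped = {}
needs_curation = []
for assay in assays:
    term = assay_format(assay)
    if term is None:
        # Nothing matched: leave the assay for manual curation.
        needs_curation.append(assay["chembl_id"])
    else:
        mapped[assay["chembl_id"]] = term

for chembl_id, term in mapped.items():
    print(f"{chembl_id} <{HAS_ASSAY_FORMAT}> <{term}> .")
print("Needs curation:", needs_curation)
```

Keeping the unmapped records in a separate list makes the hand-off to manual curation explicit instead of silently dropping them.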
![Page 48: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/48.jpg)
ChEMBL Core Ontology (CCO)
• The skeleton schema used to store ChEMBL classes, object properties and datatype properties
• The file is also RDF, so it can be queried independently of any instances
• ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/18.0/cco.ttl.gz
• Namespace: http://rdf.ebi.ac.uk/terms/chembl#
• Initial focus on Substance (Molecule) and Target Classification
• In future an additional mapping file may be provided, which maps/aligns ChEMBL classes and properties to external resources
![Page 49: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/49.jpg)
ChEMBL Core Ontology (CCO)
Classes Target Classes Substance Classes
![Page 50: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/50.jpg)
ChEMBL RDF schema
https://www.ebi.ac.uk/rdf/documentation/chembl
![Page 51: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/51.jpg)
The (raw) end result
![Page 52: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/52.jpg)
Querying ChEMBL data
• Need to load files into triplestore (Virtuoso Open Source)
ChEMBL Data
External Ontos
e.g. BAO
CCO
ChEMBL Triplestore
ChEMBL SPARQL
Interface/LD Browser
http://www.ebi.ac.uk/rdf/services/chembl/sparql
Reactome Triplestore
UniProt Triplestore
Bio2RDF
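Once the files are loaded, the endpoint can be queried over HTTP. The sketch below only builds the request URL, following the standard SPARQL protocol convention of passing the query in a `query` parameter, and does not send it. The endpoint address is the one shown on the slide and may no longer be live; the query reuses the `cco` terms from the assay examples, and whether it returns rows depends on how the literals are stored.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://www.ebi.ac.uk/rdf/services/chembl/sparql"  # from the slide

QUERY = """
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
SELECT ?assay ?cellType
WHERE {
  ?assay a cco:Assay ;
         cco:assayType "Functional" ;
         cco:assayCellType ?cellType .
}
LIMIT 10
"""


def build_request_url(endpoint, query):
    """Encode a SPARQL query as a GET request URL."""
    return endpoint + "?" + urlencode({"query": query.strip()})


request_url = build_request_url(ENDPOINT, QUERY)
print(request_url)
```

Passing the resulting URL to `urllib.request.urlopen`, typically with an `Accept` header such as `application/sparql-results+json`, would run the query against a live endpoint.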
![Page 53: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/53.jpg)
Using ‘external’ RDF data sources
• Questions to think about when using external RDF data sources
• Who creates the resource and its RDF representation?
• When was the resource last updated?
• When was the RDF last updated?
• Does the data model make sense?
• Do basic queries work?
• Are there shared entities and ontologies?
• Any data licensing issues?
![Page 54: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/54.jpg)
VoID can help
• VoID = Vocabulary of Interlinked Datasets
• Acts as a bridge between publishers and users
• EBI RDF resources provide a VoID (just an extra RDF file)
• Information contained in VoID
• Creation timestamps
• Publisher details
• Versioning
• Ontologies/vocabularies used
• Licensing
• Data formats available and where they live (not just RDF)
• More complex information such as Subsets and Linksets
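A VoID description is ordinary RDF, so a small one can be rendered from a template. The sketch below is not the ChEMBL VoID file: the dataset IRI under `example.org` and the title string are hypothetical, while the triple count, vocabulary namespace, SPARQL endpoint and dump location are the values quoted elsewhere in these slides. Publisher, licence, version and timestamps would be added the same way using Dublin Core terms.

```python
VOID_TEMPLATE = """@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<{dataset}> a void:Dataset ;
    dcterms:title "{title}" ;
    void:triples {triples} ;
    void:vocabulary <{vocabulary}> ;
    void:sparqlEndpoint <{endpoint}> ;
    void:dataDump <{dump}> .
"""

void_ttl = VOID_TEMPLATE.format(
    dataset="http://example.org/void#chembl_rdf",  # hypothetical dataset IRI
    title="ChEMBL RDF",  # hypothetical title string
    triples=409989782,
    vocabulary="http://rdf.ebi.ac.uk/terms/chembl#",
    endpoint="http://www.ebi.ac.uk/rdf/services/chembl/sparql",
    dump="ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/18.0/cco.ttl.gz",
)
print(void_ttl)
```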
![Page 55: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/55.jpg)
Quick look at the ChEMBL VoID…
![Page 56: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/56.jpg)
Model building recommendations
• Technology review
• URIs should resolve and be future proofed
• Ensure the correct external namespaces are being used
• Add rdfs:label to everything
• Consider using identifiers for ontology names instead of textual descriptions
• As a ‘small’ descriptive ontology grows, consistent naming conventions can break down
• Not used for CCO, but may consider future format switch e.g. CCO_000001 = ChEMBL Activity, CCO_000002 = ChEMBL Assay and so on
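The identifier scheme mentioned above can be sketched as follows. This is only an illustration of the idea, assuming sequential numbering and a six-digit width as in the `CCO_000001` example; the `mint_ids` helper is hypothetical and is not how CCO terms are actually generated.

```python
def mint_ids(labels, prefix="CCO", width=6):
    """Assign sequential opaque identifiers to human-readable term labels."""
    return {
        f"{prefix}_{number:0{width}d}": label
        for number, label in enumerate(labels, start=1)
    }


terms = mint_ids(["ChEMBL Activity", "ChEMBL Assay"])
for identifier, label in terms.items():
    # The label would be published as rdfs:label on the opaque identifier.
    print(identifier, "rdfs:label", repr(label))
```

Because the identifier carries no meaning, a term can be renamed by changing its label without changing its URI.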
![Page 57: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/57.jpg)
Technology stack
• Triple Processing
• Groovy
• OpenRDF Sesame Java API (http://www.openrdf.org)
• Rapper – useful command line utility
• Triplestore/Storage
• Virtuoso Open Source Edition 6.1.7 (Upgrade to Version 7 planned)
• Raw .ttl files available to download from ChEMBL FTP site ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBL-RDF/
• Domain/class specific .ttl files created – helps processing and loading
![Page 58: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/58.jpg)
New features ChEMBL 18 RDF
• More data, now 409,989,782 triples
• New types of data
• Binding sites, cell lines, mechanism of action
• New properties
• Molecule hierarchy mappings
• Target complex mappings
• Assay parameters
• Improved mappings of assay_format to the BAO ontology, e.g. biochemical, physicochemical, cell-based, …
• Some example queries now follow ->
![Page 59: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/59.jpg)
Example query 1
• Get all human cell-lines:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?cellLine ?cellName
WHERE {
  ?cellLine a cco:CellLine ;
    cco:taxonomy <http://identifiers.org/taxonomy/9606> ;
    rdfs:label ?cellName .
}
http://tinyurl.com/odqulmq
![Page 60: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/60.jpg)
Example query 2
• Get all compounds that have been tested in a cell-based (bao:BAO_0000219) toxicity assay in HepG2 cells:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?mol ?assayDesc
WHERE {
  ?mol ?p cco:Substance ;
    cco:substanceType ?moltype ;
    cco:hasActivity ?activity .
  ?activity cco:hasAssay ?assay .
  ?assay cco:assayCellType 'HepG2' ;
    cco:assayType 'Toxicity' ;
    bao:BAO_0000205 bao:BAO_0000219 ;
    dcterms:description ?assayDesc .
}
http://tinyurl.com/oyttvlr
![Page 61: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/61.jpg)
Example query 3
• Get all concentration response assays (bao:BAO_0002162) for monoamine receptor targets (CHEMBL_PC_1266):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT distinct ?assay ?assayDesc
WHERE {
  <http://rdf.ebi.ac.uk/resource/chembl/protclass/CHEMBL_PC_1266> cco:hasTargetDescendant ?target .
  ?target cco:hasAssay ?assay .
  ?assay cco:hasActivity ?activity ;
    dcterms:description ?assayDesc .
  ?activity bao:BAO_0000208 ?endpoint .
  ?endpoint rdfs:subClassOf bao:BAO_0002162 .
}
http://tinyurl.com/o6qg8uk
![Page 62: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/62.jpg)
Example query 4
• Get the number of ADME assays carried out in organism-based (bao:BAO_0000218) format for FDA approved drugs:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?molname ?mol (count(distinct ?assay) as ?assay_count)
WHERE {
  ?assay a cco:Assay ;
    cco:assayType 'ADME' ;
    bao:BAO_0000205 bao:BAO_0000218 .
  ?assay cco:hasActivity ?activity .
  ?activity cco:hasMolecule ?mol .
  ?mol cco:highestDevelopmentPhase 4 ;
    rdfs:label ?molname .
}
GROUP BY ?molname ?mol
ORDER BY DESC(count(distinct ?assay))
http://tinyurl.com/psu5442
![Page 63: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/63.jpg)
Example query 5
• Get all cell-lines that have been used in physical property assays (bao:BAO_0002128):
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?cellLine ?assay
WHERE {
  ?cellLine a cco:CellLine ;
    cco:isCellLineForAssay ?assay .
  ?assay cco:hasActivity ?activity .
  ?activity bao:BAO_0000208 ?endpoint .
  ?endpoint rdfs:subClassOf bao:BAO_0002128 .
}
http://tinyurl.com/ojytly3
![Page 64: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/64.jpg)
Example query 6
• Get all Protein Kinase (CHEMBL_PC_1100) inhibitor binding sites:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bao: <http://www.bioassayontology.org/bao#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>

SELECT ?target ?bindingSite ?siteName ?inhibitor
WHERE {
  ?bindingSite a cco:BindingSite ;
    cco:bindingSiteName ?siteName ;
    cco:hasTarget ?target .
  <http://rdf.ebi.ac.uk/resource/chembl/protclass/CHEMBL_PC_1100> cco:hasTargetDescendant ?target .
  ?target rdfs:label ?targetName ;
    cco:isTargetForMechanism ?mechanism .
  ?mechanism cco:mechanismActionType 'INHIBITOR' ;
    cco:mechanismDescription ?mechanismDesc ;
    cco:hasMolecule ?molecule .
  ?molecule rdfs:label ?inhibitor .
}
http://tinyurl.com/onr2yto
![Page 65: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/65.jpg)
Future Plans: SureChEMBL
• In December 2013, EMBL-EBI acquired SureChem, a leading ‘chemistry patent mining’ product, from Digital Science, Macmillan Group
• SureChem provides a live (updated daily) view of chemical patent space
• Rebranded as SureChEMBL
https://www.surechembl.org
![Page 66: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/66.jpg)
Open PHACTS extension
• Open PHACTS project is keen to include patent data in future extensions to the project
• ENSO approved - funding to include SureChEMBL data in Open PHACTS
• RDF conversion, target indexing and API development
• EBI-RDF project benefits from RDF conversion
• SureChEMBL is updated daily, compared to quarterly ChEMBL updates
• Interesting challenge for us: creating exports and systems for loading SureChEMBL
![Page 67: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/67.jpg)
Open PHACTS Platform
[Architecture diagram. Recoverable labels: Apps; Linked Data API (RDF/XML, TTL, JSON); Domain Specific Services; Core Platform; Semantic Workflow Engine; Data Cache (Virtuoso Triple Store); Identity Resolution Service; Identifier Management Service; Chemistry Registration, Normalisation & Q/C; Indexing; example identifiers P12374, EC2.43.4, CS4532, “Adenosine receptor 2a”; data sources (Db, VoID, Nanopub): Public Content, Commercial, Public Ontologies, User Annotations]
(slide author: Lee Harland)
![Page 68: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/68.jpg)
Summary
• Review of the ChEMBL database
• Two approaches used to model ChEMBL data
• Approach 2 used to build RDF representation of ChEMBL
• New features included in ChEMBL_18 release
• Model enhancements
• More data
• Plans for the future
• Patents
• Open PHACTS
![Page 69: RDF Workshop - BioMedBridges · Migrating a relational data model • Two approaches were used to convert the ChEMBL relational model to an RDF based model • Approach 1: Semi-automated](https://reader036.fdocuments.net/reader036/viewer/2022070911/5fafe56e8f48d51b5c6efc45/html5/thumbnails/69.jpg)
Acknowledgements
Groups and people involved in the RDF representation of ChEMBL include:
ChEMBL Group
• Anna Gaulton
• Samuel Croset
• John Overington
Open PHACTS
• Alasdair Gray
• Antonis Loizou
• Lee Harland
• Egon Willighagen
EBI-RDF Group
• Andy Jenkinson
• Simon Jupp
• James Malone