Page 1: Pipeline for automated structure-based classification in the ChEBI ontology

Pipeline for automated structure-based classification in the ChEBI ontology

Janna Hastings

Coordinator, Cheminformatics and Metabolism

www.ebi.ac.uk/chebi

ACS Symposium on Chemical Ontologies, Taxonomies and Schemas. Dallas, 16 March 2014

Page 2: Pipeline for automated structure-based classification in the ChEBI ontology

Chemical Entities of Biological Interest

• Freely available online, available for download in full

• Low molecular weight, i.e. no proteins

• Definitions, relationships, hierarchy

• E.g. metabolites, drugs, pesticides

• 38,215 entries last release

Page 3: Pipeline for automated structure-based classification in the ChEBI ontology

What does ChEBI provide?

• Chemical structures and visualisations

• Names and synonyms: caffeine; 1,3,7-trimethylxanthine; methyltheobromine

• Chemical data: Formula: C8H10N4O2; Charge: 0; Mass: 194.19

• Ontology – classifications: metabolite; CNS stimulant; trimethylxanthines

• Links to more information: MSDchem: CFF; KEGG DRUG: D00528; PubMed citations

• Chemical informatics:

InChI=1/C8H10N4O2/c1-10-4-9-6-5(10)7(13)12(3)8(14)11(6)2/h4H,1-3H3

SMILES CN1C(=O)N(C)c2ncn(C)c2C1=O
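The chemical data shown for caffeine can be cross-checked with a little arithmetic. The following is a minimal sketch, not ChEBI's own code: the function names are hypothetical and the atomic masses are standard rounded average values. It parses the formula C8H10N4O2 and sums the atomic masses.

```python
import re

# Rounded standard average atomic masses; only the elements needed here.
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}


def parse_formula(formula: str) -> dict:
    """Parse a flat formula such as 'C8H10N4O2' into element counts.

    Hypothetical helper: no brackets, charges or isotopes are handled.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def average_mass(formula: str) -> float:
    """Sum average atomic masses over all atoms in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


caffeine = "C8H10N4O2"
print(parse_formula(caffeine))            # {'C': 8, 'H': 10, 'N': 4, 'O': 2}
print(round(average_mass(caffeine), 2))   # 194.19
```

The result agrees with the mass of 194.19 listed on the slide.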

Page 4: Pipeline for automated structure-based classification in the ChEBI ontology

Example ChEBI entry page

Page 5: Pipeline for automated structure-based classification in the ChEBI ontology

Example entry page (continued)

Page 6: Pipeline for automated structure-based classification in the ChEBI ontology

Example entry page (continued)

Page 7: Pipeline for automated structure-based classification in the ChEBI ontology

Structure-based classification in ChEBI

Page 8: Pipeline for automated structure-based classification in the ChEBI ontology

Challenges with manual classification

• May be incomplete

• May be inconsistent

• Difficult to maintain (even with extensive use of computationally expensive automatic validations)

• Blocks automatic loading of otherwise high-quality externally annotated chemical data into ChEBI (as no classification available)

Page 9: Pipeline for automated structure-based classification in the ChEBI ontology

SOCO (SMARTS, OWL)

Leonid Chepelev, Michel Dumontier, collaborators

• Given a training set of classified molecules, examine structures for consensus features across all (using fragmentation and feature detection)

• Capture features hierarchically

• Use OWL to classify

Chepelev et al. BMC Bioinformatics 2012 13:3   doi:10.1186/1471-2105-13-3
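The consensus-feature idea described for SOCO can be illustrated roughly as a set intersection. This is only a sketch under loud assumptions: the feature labels and training data are invented, and a plain subset test stands in for the OWL reasoning that SOCO actually uses.

```python
# Hypothetical training data: class name -> feature sets of its member molecules.
# The feature labels are illustrative strings, not SOCO's real fragments.
TRAINING = {
    "carboxylic acid": [
        {"C(=O)O", "CH3"},
        {"C(=O)O", "benzene ring"},
        {"C(=O)O", "CH3", "OH"},
    ],
    "alcohol": [
        {"OH", "CH3"},
        {"OH", "CH2"},
    ],
}


def consensus_features(members):
    """Return the features present in every member of a class."""
    return set.intersection(*members) if members else set()


def classify(features):
    """Assign every class whose consensus features all occur in the molecule.

    A subset test is an assumption standing in for OWL classification.
    """
    return [
        name
        for name, members in TRAINING.items()
        if consensus_features(members) <= features
    ]


print(consensus_features(TRAINING["carboxylic acid"]))  # {'C(=O)O'}
print(classify({"C(=O)O", "NH2"}))                      # ['carboxylic acid']
```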

Page 10: Pipeline for automated structure-based classification in the ChEBI ontology

Limitations of SOCO

• No support for negation

• Only “min” (at least) counting supported, not max or exact. Thus, dicarboxylic acid is_a monocarboxylic acid (Every two-legged human is also a one-legged human in the sense that they have at least one leg…)

• SMARTS is powerful – but not very human-readable. ChEBI is intended for consumption by human biologists and chemists. E.g. SMARTS for the class of aliphatic amines: [$([NH2][CX4]),$([NH]([CX4])[CX4]),$([NX3]([CX4])([CX4])[CX4])]

Can we do better at making definitions accessible?
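The counting limitation above can be made concrete with a small sketch. The class table and function names are hypothetical; the only point is the difference between "at least n" and "exactly n" semantics.

```python
# Hypothetical class table: class name -> required number of carboxy groups.
CLASSES = {"monocarboxylic acid": 1, "dicarboxylic acid": 2}


def matches_min(count: int, n: int) -> bool:
    """'min' semantics: at least n groups."""
    return count >= n


def matches_exact(count: int, n: int) -> bool:
    """'exact' semantics: precisely n groups."""
    return count == n


def classify(carboxy_count: int, rule) -> list:
    """Return all classes whose count requirement is met under the given rule."""
    return [name for name, n in CLASSES.items() if rule(carboxy_count, n)]


two_groups = 2  # e.g. a molecule with two carboxy groups
print(classify(two_groups, matches_min))    # ['monocarboxylic acid', 'dicarboxylic acid']
print(classify(two_groups, matches_exact))  # ['dicarboxylic acid']
```

With only "min" counting, the two-group molecule is also reported as a monocarboxylic acid, which is the unwanted inference described on the slide.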

Page 11: Pipeline for automated structure-based classification in the ChEBI ontology

A new pipeline for automated structure-based ontology classification in ChEBI

[Pipeline diagram]

• Definitions (OWL) → OWL Parser => logical cheminformatics definitions

• ChEBI structures

• Novel structure → Matching → Candidate classes → Ranking → Best classes: save is_a relations
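The matching and ranking stages of the pipeline might look roughly like the sketch below. Everything here is an assumption made for illustration: the toy hierarchy, the dictionary representation of a structure, and in particular the ranking criterion (depth in the hierarchy, keeping only the most specific matches), which the slides do not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ClassDefinition:
    name: str
    parent: Optional[str]          # is_a parent in the toy hierarchy
    test: Callable[[dict], bool]   # logical definition applied to a structure


# Hypothetical definitions over a toy structure record.
DEFINITIONS = [
    ClassDefinition("molecular entity", None, lambda m: True),
    ClassDefinition("organic molecular entity", "molecular entity",
                    lambda m: m["has_carbon"]),
    ClassDefinition("organic ion", "organic molecular entity",
                    lambda m: m["has_carbon"] and m["charge"] != 0),
]
BY_NAME = {d.name: d for d in DEFINITIONS}


def depth(d: ClassDefinition) -> int:
    """Number of is_a steps from the class up to the root."""
    steps = 0
    while d.parent is not None:
        d = BY_NAME[d.parent]
        steps += 1
    return steps


def match(structure: dict) -> list:
    """Matching: every class whose definition the structure satisfies."""
    return [d for d in DEFINITIONS if d.test(structure)]


def best_classes(candidates: list) -> list:
    """Ranking (assumed criterion): deepest first, dropping matched parents."""
    parents = {d.parent for d in candidates}
    ranked = sorted(candidates, key=depth, reverse=True)
    return [d for d in ranked if d.name not in parents]


def classify(name: str, structure: dict) -> list:
    """Return the is_a relations that would be saved for a novel structure."""
    return [(name, "is_a", d.name) for d in best_classes(match(structure))]


acetate = {"has_carbon": True, "charge": -1}
print(classify("acetate", acetate))  # [('acetate', 'is_a', 'organic ion')]
```

Keeping only the most specific matches avoids saving is_a relations that the hierarchy already implies.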

Page 12: Pipeline for automated structure-based classification in the ChEBI ontology

Human-readable definitions, mapped to structures in ChEBI knowledgebase

thiadiazoles: molecular_entity and has_part some ( 1,2,3-thiadiazole or 1,2,4-thiadiazole or 1,2,5-thiadiazole or 1,3,4-thiadiazole )

diterpenoid: organic_molecular_entity and has_part exactly 2 terpenoid

organic ion: organic_molecular_entity and ( has_charge some int[>0] or has_charge some int[<0] )

monocyclic compound: molecular_entity and has_cycles value "1"^^int

• Logical operators

• Counts (min, max and exact)

• Properties

• Parts
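To show how definitions built from parts, counts, properties and logical operators could be evaluated, here is a minimal sketch. It is not the ChEBI implementation: the combinator names, the dictionary layout of a molecule, and the toy molecule itself are all hypothetical, and in the real pipeline parts would be detected from structures rather than listed by hand.

```python
from typing import Callable

Test = Callable[[dict], bool]


def is_a(kind: str) -> Test:
    return lambda m: kind in m["types"]


def has_part_some(*parts: str) -> Test:
    """Parts: at least one of the named parts is present."""
    return lambda m: any(m["parts"].get(p, 0) >= 1 for p in parts)


def has_part_exactly(n: int, part: str) -> Test:
    """Counts: the named part occurs exactly n times."""
    return lambda m: m["parts"].get(part, 0) == n


def prop(name: str, check: Callable[[int], bool]) -> Test:
    """Properties: a numeric property satisfies the given check."""
    return lambda m: check(m[name])


def all_of(*tests: Test) -> Test:
    return lambda m: all(t(m) for t in tests)


def any_of(*tests: Test) -> Test:
    return lambda m: any(t(m) for t in tests)


DEFINITIONS = {
    "thiadiazoles": all_of(
        is_a("molecular_entity"),
        has_part_some("1,2,3-thiadiazole", "1,2,4-thiadiazole",
                      "1,2,5-thiadiazole", "1,3,4-thiadiazole"),
    ),
    "diterpenoid": all_of(
        is_a("organic_molecular_entity"), has_part_exactly(2, "terpenoid")
    ),
    "organic ion": all_of(
        is_a("organic_molecular_entity"),
        any_of(prop("charge", lambda c: c > 0), prop("charge", lambda c: c < 0)),
    ),
    "monocyclic compound": all_of(
        is_a("molecular_entity"), prop("cycles", lambda n: n == 1)
    ),
}


def matching_classes(molecule: dict) -> list:
    return [name for name, test in DEFINITIONS.items() if test(molecule)]


# Hypothetical pre-computed description of a molecule.
toy = {
    "types": {"molecular_entity", "organic_molecular_entity"},
    "parts": {"1,2,4-thiadiazole": 1},
    "charge": 0,
    "cycles": 1,
}
print(matching_classes(toy))  # ['thiadiazoles', 'monocyclic compound']
```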

Page 13: Pipeline for automated structure-based classification in the ChEBI ontology

Planned integration into ChEBI tools

• ChEBI internal data loader and bulk submissions

• ChEBI online submission tool

Pre-population of matched classes

Page 14: Pipeline for automated structure-based classification in the ChEBI ontology

Acknowledgements – Thanks!

ChEBI team:

Christoph Steinbeck, Gareth Owen, Adriano Dekker, Namrata Kale, Steve Turner, Venkatesh Muthukrishnan

Collaborators:

Colin Batchelor, RSC; Lian Duan, ETH; Leonid Chepelev, Ottawa; Michel Dumontier, Stanford; Despoina Magka, Oxford; Ilinca Tudose and John May, EBI

Funding:

BBSRC “Continued development of ChEBI towards better usability for the systems biology and metabolic modelling communities” BB/K019783/1

Page 15: Pipeline for automated structure-based classification in the ChEBI ontology

Questions?

Thank you for listening!

[email protected]
