AMBIT Software for Data Management and (Q)SAR Applications Nina Jeliazkova Bulgarian Academy of...

AMBIT Software for Data Management and (Q)SAR Applications Nina Jeliazkova Bulgarian Academy of Sciences Institute for Parallel Processing Sofia Bulgaria E-mail [email protected] Joanna Jaworska Central Product Safety Procter and Gamble Belgium


Page 1: AMBIT Software for Data Management and (Q)SAR Applications Nina Jeliazkova Bulgarian Academy of Sciences Institute for Parallel Processing Sofia Bulgaria.

AMBIT Software for Data Management and (Q)SAR Applications

Nina Jeliazkova

Bulgarian Academy of Sciences, Institute for Parallel Processing, Sofia, Bulgaria. E-mail: [email protected]

Joanna Jaworska

Central Product Safety, Procter and Gamble, Belgium


QSAR2006 8-12 May, Lyon AMBIT is available online at http://ambit.acad.bg

Introduction: why AMBIT?
• Limited free, publicly accessible, methodologically transparent software was identified as one of the roadblocks to broader use of in-silico methods (ICCA Workshop in Setubal 2002, OECD).
• Realization that efficient use of existing information on chemicals requires better ways of:
  • storage: standardized formats, computer-automated verification of structures, capability to store large amounts of data;
  • taking advantage of the rapidly evolving field of data mining and extraction of relevant information.


Content
• Overview of AMBIT functional modules
• Technology choice and software capabilities
• Demonstration of the current state
  • Web application: online similarity search
  • Standalone applications:
    • Ambit Database Tools: descriptor search, experimental data search, similarity search, Verhaar classification scheme
    • AmbitDiscovery: applicability domain, grouping by different methods


Software overview
• Database
• Search engine: search by CAS, SMILES, name; substructure search; similarity search (EM9-1a,b, 2, 3)
• Data import and export, format conversions (EM9-1, 2, 3)
• Applicability domain (EM9-1a)
• Similarity assessment (EM9-1b)
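The search engine accepts CAS numbers as queries. As an illustration of one preprocessing step a client could run before such a lookup, the sketch below validates the CAS check digit using the standard CAS rule: the last digit equals the sum of the other digits, each multiplied by its position counted from the right, modulo 10. This is not taken from AMBIT; the name `is_valid_cas` is hypothetical.

```python
import re

# Hypothetical helper, not part of AMBIT: validate a CAS Registry Number
# of the form NNNNNNN-NN-N (2 to 7 digits, 2 digits, 1 check digit).
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")

def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)  # every digit except the check digit
    check = int(match.group(3))
    # Weight each digit by its position counted from the right (1, 2, 3, ...).
    total = sum(i * int(d) for i, d in enumerate(reversed(digits), start=1))
    return total % 10 == check

print(is_valid_cas("7732-18-5"))  # water -> True
print(is_valid_cas("7732-18-4"))  # wrong check digit -> False
```

Rejecting malformed identifiers early keeps typos from turning into silent "no hit" results in the database search.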


AMBIT Database Today

Not restricted to these datasets! Any dataset can be imported! (e.g. DSSTox, AQUIRE, LLNA dataset …)


AMBIT: more about the internals…
• Open source, relying on open standards
• Modular approach
• Standalone and web versions
• Implemented in Java, i.e.
  • platform independent (the same application runs on Windows, Unix, Mac …)
  • suitable for web applications
• The cheminformatics functionality relies on the open source Java library The Chemistry Development Kit (CDK), http://cdk.sourceforge.net/
• The software is based on a relational database management system (RDBMS)
  • allows much faster and more convenient access to the data than flat text files
  • our choice is the MySQL database (www.mysql.com), the most popular open source relational database
• Chemical Markup Language (CML)
  • an acknowledged method of encoding chemical data in XML
  • being adopted by a large number of chemical organisations, from government through commercial to academia
  • the choice of CML for the internal format makes the database independent of the software used to access it, in contrast to some proprietary solutions
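Because CML is plain XML, any XML-aware program can read the stored structures, which is what makes the database independent of the software accessing it. A minimal sketch, assuming a hand-written CML record for ethanol (heavy atoms only) in the CML schema namespace; real AMBIT records may carry more elements and attributes, and `read_cml` is a hypothetical helper using only the Python standard library.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"

# Hand-written CML for ethanol (heavy atoms only), for illustration.
ETHANOL_CML = f"""<molecule xmlns="{CML_NS}" id="ethanol">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
  </bondArray>
</molecule>"""

def read_cml(text: str) -> tuple[dict[str, str], list[tuple[str, str, str]]]:
    """Parse a CML molecule into an atom map (id -> element) and a bond list."""
    root = ET.fromstring(text)
    ns = {"cml": CML_NS}  # prefix used in the search paths below
    atoms = {
        a.get("id"): a.get("elementType")
        for a in root.iterfind("cml:atomArray/cml:atom", ns)
    }
    bonds = []
    for b in root.iterfind("cml:bondArray/cml:bond", ns):
        first, second = b.get("atomRefs2").split()
        bonds.append((first, second, b.get("order")))
    return atoms, bonds

atoms, bonds = read_cml(ETHANOL_CML)
print(atoms)  # {'a1': 'C', 'a2': 'C', 'a3': 'O'}
print(bonds)  # [('a1', 'a2', '1'), ('a2', 'a3', '1')]
```

No chemistry toolkit is needed to get at the connection table, which illustrates the point about avoiding proprietary storage formats.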


AMBIT: information stored
• Structures, stored internally in (compressed) CML format, allowing transparent and easy storage of 1D, 2D or 3D representations (including mixtures)
• Multiple 3D structures per compound
• Identifiers (SMILES, InChI, CAS or other registry numbers; an unlimited number of arbitrary identifiers and synonyms)
• Inventory indicator
• Descriptors (an unlimited number of arbitrary descriptors)
• Experimental data (flexible templates for experimental data)
• QSAR models
• Literature references
• Fingerprints and atom environments for fast substructure and similarity search
• Other information generated in order to accelerate specific queries

The complete documentation of the AMBIT database is available at http://ambit.acad.bg/docs
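Structures are stored as compressed CML. The slides do not say which compression AMBIT uses; the sketch below assumes zlib from the Python standard library and uses hypothetical helper names, only to show that XML markup shrinks well and round-trips losslessly before it goes into a binary database column.

```python
import zlib

def compress_cml(cml: str) -> bytes:
    """Compress a CML string for storage in a binary (BLOB) column."""
    return zlib.compress(cml.encode("utf-8"), 9)

def decompress_cml(blob: bytes) -> str:
    """Restore the original CML text from its compressed form."""
    return zlib.decompress(blob).decode("utf-8")

# A repetitive CML fragment: repeated tags are what makes XML compress well.
atom_xml = "".join(f'<atom id="a{i}" elementType="C"/>' for i in range(1, 41))
cml = f'<molecule id="m1"><atomArray>{atom_xml}</atomArray></molecule>'

blob = compress_cml(cml)
print(len(cml), "characters ->", len(blob), "bytes")
```

Compression is lossless, so the stored record remains a complete, software-independent CML document once decompressed.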


AMBIT database schema
• Descriptors repository
• Compounds repository
• QSAR models repository
• Experimental results repository
• Users repository
• Literature references repository
• Queries
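The repositories are linked through the compound they describe. The sketch below shows, under stated assumptions, how such links allow a single query to join identifiers, experimental results and literature references. Table and column names are hypothetical and much simpler than AMBIT's real schema (see http://ambit.acad.bg/docs), SQLite stands in for MySQL so the example runs without a server, and the measurement row is a placeholder, not real data.

```python
import sqlite3

# Hypothetical, heavily simplified tables; not AMBIT's actual schema.
SCHEMA = """
CREATE TABLE compound   (id INTEGER PRIMARY KEY, cml BLOB);
CREATE TABLE identifier (compound_id INTEGER REFERENCES compound(id),
                         kind TEXT, value TEXT);
CREATE TABLE literature (id INTEGER PRIMARY KEY, citation TEXT);
CREATE TABLE experiment (compound_id INTEGER REFERENCES compound(id),
                         literature_id INTEGER REFERENCES literature(id),
                         endpoint TEXT, value REAL, unit TEXT);
"""

def build_demo_db() -> sqlite3.Connection:
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO compound (id, cml) VALUES (1, NULL)")
    # Several identifiers and synonyms can point at the same compound.
    con.executemany(
        "INSERT INTO identifier VALUES (?, ?, ?)",
        [(1, "CAS", "71-43-2"), (1, "SMILES", "c1ccccc1"), (1, "NAME", "benzene")],
    )
    con.execute("INSERT INTO literature VALUES (1, 'placeholder reference')")
    # Placeholder measurement for the demo, not real data.
    con.execute("INSERT INTO experiment VALUES (1, 1, 'demo endpoint', 1.0, 'mg/L')")
    return con

def experiments_for_cas(con: sqlite3.Connection, cas: str) -> list[tuple]:
    """Join identifiers, experiments and literature for one CAS number."""
    return con.execute(
        """SELECT e.endpoint, e.value, e.unit, l.citation
           FROM identifier i
           JOIN experiment e ON e.compound_id = i.compound_id
           JOIN literature l ON l.id = e.literature_id
           WHERE i.kind = 'CAS' AND i.value = ?""",
        (cas,),
    ).fetchall()

con = build_demo_db()
print(experiments_for_cas(con, "71-43-2"))
```

Keeping identifiers in their own table is one way to support an unlimited number of synonyms per compound without changing the compound table.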


AMBIT: selected functionalities
• Input/output of chemical compounds, descriptors, experimental data and QSAR models (many file formats)
• Search: simple search (CAS, SMILES, chemical name); descriptor search; experimental data search; substructure and similarity search
• Grouping: Verhaar classification scheme; similarity (see J. Jaworska's presentation tomorrow)
• QSAR: applicability domain assessment
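Stored fingerprints serve both search types. For substructure search they act as a screen: if a bit set in the query is missing from a candidate, the candidate cannot contain the query, so only the survivors need an exact graph match. For similarity search they give the Tanimoto coefficient. A minimal sketch using Python integers as bitsets; the bit positions are made up, whereas real fingerprints would be computed by a cheminformatics library such as CDK.

```python
def bits_to_int(bits: set[int]) -> int:
    """Pack a set of 'on' bit positions into a Python int used as a bitset."""
    value = 0
    for b in bits:
        value |= 1 << b
    return value

def may_contain(query_fp: int, target_fp: int) -> bool:
    """Substructure screen: every bit set in the query must be set in the target.

    False rules the target out; True only means an exact substructure
    match still has to be run.
    """
    return (query_fp & target_fp) == query_fp

def tanimoto(a: int, b: int) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two bitsets."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # convention assumed here: two empty fingerprints are identical
    return bin(a & b).count("1") / union

# Made-up fingerprints for illustration.
query = bits_to_int({1, 5})
target_a = bits_to_int({1, 3, 5, 8})
target_b = bits_to_int({1, 2, 8})

print(may_contain(query, target_a))   # True: passes the screen
print(may_contain(query, target_b))   # False: bit 5 is missing
print(tanimoto(target_a, target_b))   # 2 shared bits / 5 bits in total = 0.4
```

The screen is cheap bitwise arithmetic, which is why precomputed fingerprints speed up substructure queries over large tables.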


AMBIT Online – Similarity search


AMBIT Online - Query result


Links to other databases - KEGG


Information about QSAR models


AMBIT Database Tools: standalone application


AMBIT user interface. Example: search by descriptor ranges
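A search by descriptor ranges returns the compounds whose descriptor values all fall inside user-given intervals. The sketch below is one way to express this in SQL, assuming a hypothetical `descriptor(compound_id, name, value)` table and made-up values; SQLite stands in for MySQL, and this is not AMBIT's actual query.

```python
import sqlite3

def search_by_ranges(con: sqlite3.Connection,
                     ranges: dict[str, tuple[float, float]]) -> list[int]:
    """Return ids of compounds whose descriptors all lie in the given [low, high] ranges."""
    if not ranges:
        return []
    clauses = " OR ".join("(name = ? AND value BETWEEN ? AND ?)" for _ in ranges)
    params: list = []
    for name, (low, high) in ranges.items():
        params.extend([name, low, high])
    # A compound qualifies only if it matched one row per requested descriptor.
    sql = (
        "SELECT compound_id FROM descriptor WHERE " + clauses +
        " GROUP BY compound_id HAVING COUNT(DISTINCT name) = ? ORDER BY compound_id"
    )
    return [row[0] for row in con.execute(sql, [*params, len(ranges)])]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE descriptor (compound_id INTEGER, name TEXT, value REAL)")
# Made-up descriptor values for three hypothetical compounds.
con.executemany(
    "INSERT INTO descriptor VALUES (?, ?, ?)",
    [(1, "logP", 1.5), (1, "MW", 120.0),
     (2, "logP", 4.2), (2, "MW", 150.0),
     (3, "logP", 2.0), (3, "MW", 310.0)],
)
print(search_by_ranges(con, {"logP": (1.0, 3.0), "MW": (100.0, 200.0)}))  # [1]
```

Storing descriptors as (compound, name, value) rows lets the database hold an unlimited number of arbitrary descriptors, at the cost of the GROUP BY step when several ranges are combined.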


AMBIT Discovery: software for applicability domain and grouping

Methods:
• Descriptor space
  • ranges
  • Euclidean distance
  • city-block distance
  • probability density
  • options: threshold, preprocessing (e.g. PCA), center, more…
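The descriptor-space methods can be sketched in a few lines. Below, the ranges method accepts a query only if every descriptor lies within the training minimum and maximum; the distance methods compare the query's distance to the training centroid with a threshold. The threshold used here (the largest centroid distance in the training set) is an assumption chosen for illustration, not necessarily AMBIT's default, and the probability-density method and PCA preprocessing are not shown.

```python
import math
from typing import Sequence

Vector = Sequence[float]

def in_ranges(training: list[Vector], query: Vector) -> bool:
    """Ranges method: the query must lie within the training min/max on every descriptor."""
    return all(min(col) <= q <= max(col) for col, q in zip(zip(*training), query))

def centroid(training: list[Vector]) -> list[float]:
    """Mean of each descriptor over the training set."""
    return [sum(col) / len(col) for col in zip(*training)]

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def city_block(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))

def in_distance_domain(training: list[Vector], query: Vector, metric=euclidean) -> bool:
    """Distance method: query-to-centroid distance must not exceed the
    largest training-to-centroid distance (threshold choice assumed here)."""
    center = centroid(training)
    threshold = max(metric(x, center) for x in training)
    return metric(query, center) <= threshold

# Made-up two-descriptor training set.
training = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0]]
print(in_ranges(training, [2.5, 11.5]))           # True
print(in_ranges(training, [4.0, 11.0]))           # False: first descriptor out of range
print(in_distance_domain(training, [2.0, 11.0]))  # True: the query is the centroid
print(in_distance_domain(training, [2.0, 20.0], metric=city_block))  # False
```

Ranges describe an axis-aligned box, while the distance methods describe a sphere (Euclidean) or diamond (city-block) around the centre, so the same query can be in domain for one method and outside for another.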

• Structural similarity
  • fingerprints
    • consensus fingerprint + Tanimoto distance
    • consensus fingerprint + missing fragments
  • atom environments
    • consensus atom environments + Hellinger distance
  • kNN + Tanimoto distance
  • ranking

Results from several methods can be combined.
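The structural-similarity measures can be sketched as follows. Two points are assumptions, not statements about AMBIT's implementation: the consensus fingerprint is taken to be the bitwise OR of the training fingerprints (a bit set in at least one training compound), and atom environments are represented as hypothetical count distributions compared with the Hellinger distance. Fingerprints are Python integers used as bitsets, with made-up bits.

```python
import math
from collections import Counter

def tanimoto(a: int, b: int) -> float:
    union = bin(a | b).count("1")
    return 1.0 if union == 0 else bin(a & b).count("1") / union

def consensus(fps: list[int]) -> int:
    """Assumed here: bits set in at least one training fingerprint (bitwise OR)."""
    result = 0
    for fp in fps:
        result |= fp
    return result

def missing_fragments(query: int, cons: int) -> int:
    """Number of query bits never seen in the training set."""
    return bin(query & ~cons).count("1")

def knn_similarity(query: int, fps: list[int], k: int = 3) -> float:
    """Mean Tanimoto similarity to the k most similar training fingerprints."""
    sims = sorted((tanimoto(query, fp) for fp in fps), reverse=True)[:k]
    return sum(sims) / len(sims)

def hellinger(p: Counter, q: Counter) -> float:
    """Hellinger distance between two non-empty count distributions (normalised to 1)."""
    tp, tq = sum(p.values()), sum(q.values())
    s = sum((math.sqrt(p[k] / tp) - math.sqrt(q[k] / tq)) ** 2 for k in set(p) | set(q))
    return math.sqrt(s / 2)

training = [0b0111, 0b0110, 0b1100]  # made-up training fingerprints
query = 0b10011                      # bits 0, 1 and 4
cons = consensus(training)           # 0b1111
print(missing_fragments(query, cons))            # 1 (bit 4)
print(tanimoto(query, cons))                     # 0.4
print(knn_similarity(query, training, k=2))      # (0.5 + 0.25) / 2 = 0.375
print(hellinger(Counter({"a": 1}), Counter({"b": 1})))  # disjoint distributions: 1.0
```

Each function returns one number per query, so the scores can be thresholded separately and the verdicts combined, for example by requiring agreement between methods.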


AMBIT Discovery: data visualisation


AMBIT Discovery: results (exported to an MS Excel file)


Similarity based on mechanistic understanding

Verhaar H.J.M., Van Leeuwen C., Hermens J.L.M., Classifying Environmental Pollutants. 1: Structure-Activity Relationships for Prediction of Aquatic Toxicity, Chemosphere, Vol. 25, No. 4, pp. 471-491, 1992

Verhaar scheme: 34 rules, 5 classes
• Class 1: narcosis or baseline toxicity
• Class 2: less inert compounds
• Class 3: unspecific reactivity
• Class 4: compounds and groups of compounds acting by a specific mechanism
• Class 5: not possible to classify according to these rules


Verhaar scheme implementation
Modular approach; can be used:
• within AMBIT Database Tools
• as an extension to ToxTree (http://ecb.jrc.it/qsar/toxtree)
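A modular rule-based classifier can be kept independent of its host by passing the rules in as data: the same engine can then be embedded in a database tool or in another application. The sketch below shows only that structure, with a first-match rule chain and a fallback to class 5 ("not possible to classify"). It is not the AMBIT or ToxTree API; the molecule representation (element counts) is an assumption, and the single rule is a toy placeholder, NOT one of the 34 Verhaar rules.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical molecule representation for this sketch: element symbol -> count.
Molecule = dict[str, int]

@dataclass(frozen=True)
class Rule:
    name: str
    verhaar_class: int
    applies: Callable[[Molecule], bool]

UNCLASSIFIED = 5  # Class 5: not possible to classify according to the rules

def classify(mol: Molecule, rules: list[Rule]) -> tuple[int, str]:
    """Return the class of the first rule that applies, or class 5 if none does."""
    for rule in rules:
        if rule.applies(mol):
            return rule.verhaar_class, rule.name
    return UNCLASSIFIED, "no rule applied"

# Toy placeholder rule for illustration only; NOT one of the 34 Verhaar rules.
toy_rules = [
    Rule("toy: only C and H", 1, lambda m: set(m) <= {"C", "H"}),
]

print(classify({"C": 6, "H": 6}, toy_rules))          # (1, 'toy: only C and H')
print(classify({"C": 2, "H": 3, "N": 1}, toy_rules))  # (5, 'no rule applied')
```

Returning the name of the matching rule alongside the class keeps the classification traceable, which matters when the same engine runs inside different host applications.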


Summary
• Many tools have been developed, and we are working on their seamless integration.
• Both the standalone and the web application are in beta and are being extensively tested.
• Synergies with other projects:
  • LRI Cefic gold standard BCF database will be stored in AMBIT
  • LRI Cefic biotransformation database will be able to communicate with AMBIT BCF
  • ECB Cramer rules software for TTC (human health): ToxTree
  • Fraunhofer Institute subchronic toxicity database (human health)
• Approaches to similarity assessment will be further extended and tested in the context of category development / read-across (ECB-funded project).
• Open source software lowers the user barrier, facilitates dissemination activities and enables the reproducibility of models and results.


Acknowledgment

This work is funded by CEFIC LRI EEM-9, "Building blocks for a future (Q)SAR decision support system: databases, applicability domain and structure conversions".


The Chemistry Development Kit (http://cdk.sourceforge.net)
• CDK is a freely available open source Java library for structural chemo- and bioinformatics.
• Originated in, and is hosted by, the Research Group for Molecular Informatics at Cologne University’s Bioinformatics Center.
• Maintained and enhanced by more than 20 developers from both academic and industrial institutions all over the world.
• Used in more than 10 different academic and industrial projects worldwide.
• Provides methods for many common tasks in molecular informatics:
  • SMILES parsing and generation
  • substructure searching
  • 2D and 3D rendering of chemical structures
  • I/O routines (format conversions)
  • 3D builder
  • QSAR module, etc.


Thank you!

Questions?