Cheminformatics, QSAR and drug design Unit 24 BIOL221T: Advanced Bioinformatics for Biotechnology...

Page 1: Cheminformatics, QSAR and drug design Unit 24 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD.

Cheminformatics, QSAR and drug design

Unit 24

BIOL221T: Advanced Bioinformatics for Biotechnology

Irene Gabashvili, PhD

Page 2:

References

Special thanks to Tobias Kind (UC Davis Genome Center, Fiehnlab Metabolomics) and other cheminformatics/metabolomics experts for their slides used in this lecture

Page 3:

What is it?

Cheminformatics: the application of informatics to problems in the field of chemistry, for chemical screening and analysis in drug discovery

<Structure-Based> Drug design: the design of a drug molecule based on knowledge of the target protein (or nucleic acid) structure

QSAR (Quantitative Structure-Activity Relationship): the relationship between the structure of a chemical and its pharmacological activity
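As an illustration of the QSAR idea, the sketch below fits a one-descriptor linear model, activity = slope * descriptor + intercept, by ordinary least squares. This is only a minimal sketch: the function names are made up for this example, and the numbers are synthetic toy values (not measurements) chosen to lie on a straight line. Real QSAR models typically use many descriptors and proper validation.

```python
from statistics import mean


def fit_linear_qsar(descriptor, activity):
    """Least-squares fit of activity = slope * descriptor + intercept."""
    x_bar, y_bar = mean(descriptor), mean(activity)
    sxx = sum((x - x_bar) ** 2 for x in descriptor)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(descriptor, activity))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Predict the activity of a new compound from its descriptor value."""
    slope, intercept = model
    return slope * x + intercept


# Synthetic toy numbers, NOT experimental data: a made-up descriptor
# (think of something like logP) and a made-up activity (like pIC50).
toy_descriptor = [1.0, 2.0, 3.0, 4.0]
toy_activity = [5.5, 6.0, 6.5, 7.0]

model = fit_linear_qsar(toy_descriptor, toy_activity)
print("slope, intercept:", model)
print("predicted activity at descriptor 2.5:", predict(model, 2.5))
```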

Page 4:

SELECTING THE BEST TARGETS

Disease-association doesn’t make a protein a target - requires validation as point of intervention in pathway

Having good biological rationale doesn’t make a protein tractable to chemistry (druggable)

[Diagram labels: Bioinformatics; Cheminformatics; Disease; Target Selection; Target; Target Validation Process; Drug Discovery Process; Leads; Clinic]

Page 5:

Cheminformatics

[Figure: Genome Data, Target Structure, Lead Hypotheses. Shows a DNA sequence fragment (ctgacaagtatgaaaacaacaagctgattg tccgcagagggcagtctttctatgtgcaga ttgacctcagtcgtc) and chemical structure drawings]

Page 6:

Cheminformatics

Identify chemical compounds: establish compound-IDs

Identify the various structures which a given compound can adopt in various chemical environments (add structure IDs)

Associate and store computational and experimental data/results with corresponding compounds

Map and analyze in IPA or any Cheminformatics software:

http://www.netsci.org/Resources/Software/Cheminfo/

http://www.akosgmbh.de/chemoinformatics_software.htm

http://www.rdchemicals.com/chemistry-software/

http://www.chemaxon.com/

Page 7:

Dealing with compounds in “Nature’s Way”

• it’s not just about ligands and docking!
− although that’s still what garners most of the attention

• and it’s not just about “tautomers”!
− must also consider protonation state
− must also consider stereochemical issues
− must also consider conformational issues

• it’s about being able to automatically use the same structures in silico as Mother Nature uses for a compound in the real world
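To make the protonation-state point concrete, here is a minimal sketch based on the Henderson-Hasselbalch relation, [A-]/[HA] = 10^(pH - pKa), for a single monoprotic acid. The function name and the pKa value of 4.8 are assumptions chosen for illustration; real compounds often have several coupled ionizable groups, which this sketch does not handle.

```python
def fraction_deprotonated(pka, ph):
    """Fraction of a monoprotic acid HA present as A- at the given pH.

    Uses Henderson-Hasselbalch: [A-]/[HA] = 10 ** (pH - pKa).
    """
    ratio = 10.0 ** (ph - pka)
    return ratio / (1.0 + ratio)


# Hypothetical acid with pKa 4.8, evaluated in three different environments.
for ph in (2.0, 4.8, 7.4):
    print(f"pH {ph}: fraction deprotonated = {fraction_deprotonated(4.8, ph):.4f}")
```

The same compound is almost entirely neutral at low pH and almost entirely ionized at physiological pH, so the structure used in silico should match the environment being modeled.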

Page 8:

Stereochemical Issues: Proto-Invertible Atoms & Bonds

Tautomeric transforms can change stereochemistry

Protonation/deprotonation can change stereochemistry

Protomeric transforms can change stereochemistry

Page 9:

Terminology for some “new” concepts

• two types of stereo-centers: truly chiral atoms and bonds

• stereomers: different stereochemical isomers (hence, different chemical compounds)

• two types of proto-centers: acid/base & tautomeric D/A pairs

• protomers: different protonation states and/or tautomeric states of a single given compound

• protomeric state: refers to both protonation state and tautomeric state of a given protomer

• protomeric transform: protomeric-state(i) → protomeric-state(j)

• proto-stereomers: different stereomers of protomers of a given compound which differ ONLY with respect to chiralities of invertible or proto-invertible (pseudo-chiral) centers

• proto-stereo-conformers: different 3D conformations of the proto-stereomers of a given compound

Page 10:

Terminology for some “new” concepts

• proto-stereomers: different stereomers of protomers of a given compound which differ ONLY with respect to chiralities of invertible or proto-invertible (pseudo-chiral) centers

• proto-stereo-conformers: different 3D conformations of the proto-stereomers of a given compound

• 2D-MetaStructure of a compound: the set of all proto-stereomers of a given compound; i.e., set of all 2.5D connection tables which could be achieved by and which should be associated with a given compound

• 3D-MetaStructure of a compound: the set of all proto-stereo-conformers of a given compound; i.e., set of all 3D conformations of all 2.5D connection tables which could be achieved by and which should be associated with a given compound
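The terminology above can be sketched as a small data model: one compound owns many proto-stereomers (its 2D-MetaStructure), and each proto-stereomer owns many conformers (together, the 3D-MetaStructure). The class names, field names, and IDs below are hypothetical and only illustrate the one-to-many relationships; they are not taken from any particular software.

```python
from dataclasses import dataclass, field


@dataclass
class ProtoStereomer:
    structure_id: str  # hypothetical ID of one 2.5D connection table
    conformer_ids: list = field(default_factory=list)  # IDs of its 3D conformers


@dataclass
class Compound:
    compound_id: str  # the single identifier everything ties back to
    proto_stereomers: list = field(default_factory=list)

    def metastructure_2d(self):
        """All 2.5D structures associated with this compound."""
        return [ps.structure_id for ps in self.proto_stereomers]

    def metastructure_3d(self):
        """All 3D conformers of all proto-stereomers of this compound."""
        return [cid for ps in self.proto_stereomers for cid in ps.conformer_ids]


# Made-up IDs, purely for illustration.
compound = Compound(
    "CMPD-1",
    [
        ProtoStereomer("CMPD-1/s1", ["CMPD-1/s1/c1", "CMPD-1/s1/c2"]),
        ProtoStereomer("CMPD-1/s2", ["CMPD-1/s2/c1"]),
    ],
)
print(compound.metastructure_2d())
print(compound.metastructure_3d())
```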

Page 11:

Example: Ricin Inhibitors - Pterins

ProtoPlex generates 4 neutral tautomeric forms (plus additional charged protomers)

[Figure: structures of the four neutral tautomers Pterin(1), Pterin(2), Pterin(3), and Pterin(4); ionized protomers not shown]

receptor-bound tautomer (protomer) may not be the protomer most prevalent in solution

Page 12:

Example: Ricin Inhibitors - Pterins

“A tautomer of pterin that is not in the low energy form in either the gas phase or in aqueous solution has the best interaction with the enzyme.”

S. Wang, et al., Proteins, 31, 33-41 (1998)

Pterin(1) protomer is preferred in both gas and aqueous soln

Pterin(3) protomer is preferred in receptor binding site

[Figure: Pterin(3) in the receptor binding site, with residues Val81, Gly121, Tyr123, Ser176, and Arg180 labeled. Redrawn from Wang, et al., Proteins, 31, 33-41 (1998)]

[Figure: structures of Pterin(1), Pterin(2), Pterin(3), and Pterin(4); ionized protomers not shown]

Page 13:

Example: Barbiturate Matrix Metalloproteinase Inhibitors

ProtoPlex generates 5 neutral tautomeric forms (plus additional charged protomers)

[Figure: structures of the five neutral tautomers: Enol Form (A), Enol Form (B), Keto Form, Di-Enol Form (D), and Di-Enol Form (E); ionized protomers not shown]

• the receptor-bound tautomer (protomer) might not be the keto protomer which is most prevalent in aqueous solution

• which protomer does the receptor prefer?

• which protomer(s) will be used for vHTS?

Page 14:

Example: Barbiturate Matrix Metalloproteinase Inhibitors

“The enol form (A) of the barbiturate is thus favored by the protein matrix over the tautomeric keto form, which dominates in solution.”

H. Brandstetter, et al., J. Biol. Chem., 276(20), 17405-17412 (2001)

[Figure: barbiturate inhibitor in the binding site, with Zn+2, P1', P2', and residues Ala160, Ala161, Glu198, Pro217, Asn218, and Tyr219 labeled. Redrawn from Brandstetter, et al., J. Biol. Chem.]

Page 15:

Example: effect of crystal environment

Two different protomers observed in the SAME unit cell!

“Coexistence of both histidine tautomers in the solid state and stabilisation of the unfavoured N-H form by intramolecular hydrogen bonding: crystalline L-His-Gly hemihydrate” T. Steiner and G. Koellner, Chem. Commun., 1997, 1207.

Protomeric transform was induced by intramolecular interaction which was induced by a conformational change which was induced by intermolecular interactions.

Page 16:

QSPR motives for adopting “Nature’s Way”

• better ADME and other SPR and QSPR models
− protomeric state of a “solute” depends on the chemical potential presented by the surrounding “solvent” or molecular environment (often different than aqueous soln)
− partition coefficients (two solvent environments to consider)
− permeability coefficients (depend on donor-phase and membrane)
− solubilities (depend on crystalline and solvent environments)
− melting points (crystal packing can favor unusual protomeric forms)
− need to “select” protomeric forms according to user-specs

• better models, better decisions
− about what to screen
− about which “hits” to promote to “leads”
− about route of administration and/or formulation
− about which leads to promote to candidacy

Page 17:

Cheminformatic motives for adopting “Nature’s Way”

• better storage of data
− measured properties of a compound should be associated with the compound (with notations re: experimental conditions)
− predicted properties “of a compound” should be associated with (stored under) the particular structure used for the prediction
− that structure, in turn, should be associated with the compound
− need a unique identifier that can tie any proto-stereomeric structure to the compound to which it corresponds

• better use of data
− enable “data-mining” of both measured and computed data
− discard wet HTS data? save for future “data-mining?”
− discard virtual HTS data? save for future “data-mining?”
− better (more robust) results when searching for compounds, data, structures, and substructures

[Slide comment, Bryan Koontz, 3/16/2004: Too many words. Need to simplify. * Compounds can adopt different structures in Nature * A specific structure used in computer-based prediction might not be the one Nature prefers * (graphic of predicted props -- compounds, measured props -- structure) * Cheminformatics software must enable a more accurate representation of compounds as Nature sees them]

Page 18:

Business & IP motives

companies must be able to recognize when two different structures correspond to the same compound!

need a canonically unique identifier that can tie any proto-stereomeric structure to the compound to which it corresponds

Page 19:

Business & IP motives for adopting “Nature’s Way”

• companies allocate resources for compounds, not structures
− resource-related decisions (what should we purchase, synthesize, screen?) should be based on compounds, not structures
− to properly manage corporate inventories
− to avoid costly, unintended duplications (acquisitions and screening)
− to avoid far more costly failure to screen active compounds for which the representative (DB) structures were predicted to be inactive

• companies own & intend to patent cmpds, not structures
− offensive and defensive “Freedom To Operate” strategies are far stronger when all structures of patented compounds are considered
− failure to realize that a competitor’s “novel compound” is merely a different structure of your patented compound can cost $billions
− at least one acknowledged example already exists!!

Page 20:

Example Nature’s Way Protocol

[Flow diagram labels: Database; Raw, 2D Input; CompoundFilter; Filtered, 2D Input; ProtoPlex; Multiple, 2D Protomers; StereoPlex; Multiple, 2.5D Proto-Stereomers; 2D App.; Confort; Multiple, 3D Proto-Stereo-Conformers; vHTS]

For each compound …
– Many Proto-Stereomers
– One 2D-MetaStructure
– Many Proto-Stereo-Conformers
– One 3D-MetaStructure

• associate structure-based data with corresponding structure of each compound pulled from DB

Page 21:

StereoPlex

• for general purposes, provides user-controlled “multiplexing” of all truly chiral, invertible, and proto-invertible stereocenters
− addresses atom-centered (R/S) and bond-centered (E/Z) chirality
− automatically excludes “stereochemical junk” (e.g., 254 out of 256 combinations of R’s and S’s for chiral, substituted cubane)
− outputs a user-specified number of stereomers selected according to a user-specified priority rule
− multiplexing unspecified stereocenters ensures that CADD results don’t suffer due to (necessarily) “random” stereochemistry introduced when converting from 2D to 3D (a concept we introduced in 1986)
− multiplexing specified stereocenters provides “stereochemical diversity” for vHTS applications, just as important as “structural diversity”

• for “Nature’s Way” purposes, provides user-controlled “multiplexing” of all invertible & proto-invertible stereocenters
− yields proto-stereomers
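The “multiplexing” idea can be sketched as a plain combinatorial enumeration of R/S labels: n stereocenters give 2^n label combinations, which is where the 256 for eight centers comes from. This is a generic illustration and not StereoPlex’s algorithm. The function name and the `keep` predicate are hypothetical; a real tool would use chemical knowledge (symmetry, ring constraints) to reject “stereochemical junk”.

```python
from itertools import product


def multiplex_stereocenters(n_centers, keep=None, limit=None):
    """Enumerate R/S label assignments for n_centers stereocenters.

    keep  : optional predicate; assignments for which it returns False are
            dropped (a stand-in for a real chemical-validity filter)
    limit : optional cap on the number of assignments returned
    """
    results = []
    for labels in product("RS", repeat=n_centers):
        if limit is not None and len(results) >= limit:
            break
        if keep is not None and not keep(labels):
            continue
        results.append("".join(labels))
    return results


# Eight stereocenters give 2**8 label combinations before any filtering.
print(len(multiplex_stereocenters(8)))

# An arbitrary example filter (NOT a chemical rule): first center fixed as R.
print(len(multiplex_stereocenters(8, keep=lambda labels: labels[0] == "R")))

# A user-specified number of outputs.
print(multiplex_stereocenters(3, limit=2))
```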

[Slide comment, Bryan Koontz: I think we should put these "product detail slides" as backup slides; we can use them if necessary detail is required]

Page 22:

ProtoPlex

• identifies and ensures that invertible and proto-invertible (pseudo-chiral) atoms and bonds are not labeled as chiral
− essential for canonically unique compound identification

• can output a “normalized” protomer based on a user-specified selection rule
− useful for generating input for certain CADD or QSPR applications
− useful for implementing corporate “drawing rules” for preferred representation at registration time

• can output a user-specified number of protomers selected according to a user-specified priority rule
− useful for limiting the types as well as the numbers of protomers considered and used for various CADD purposes

• offers rational protomer-naming options

Page 23:

ProtoPlex

• under development since 1999
− achieving chemical and cheminformatic robustness is not easy!
− benefited from feedback received from large pharma collaborators

• can generate all plausible protomers by exhaustively “multiplexing” the corresponding protomeric transforms
− simultaneously addresses all acid/base and tautomeric transforms
− simultaneity is critically important for cheminformatic robustness
− automatically excludes implausible “protochemical junk”

• generates output in a canonically unique protomer-order, and each protomer is expressed in a canonically unique atom-order

• can output canonically unique protomer selected/based on an Optive Standard canonical Normalization (OSN) rule
− resulting OSN protomer yields canonically unique compound ID

Page 24:

Protomer enumeration is a non-trivial task!

• don’t want to enumerate “implausible” protomers

• don’t want to miss any “plausible” protomers

• we must adjust our preconceptions regarding “plausible” but … we must still consider the energy required for the protomeric transforms; i.e., we must not consider energetically implausible protomers

• we need to consider protomers within a user-specified E-window, analogous to the E-window concept used when considering conformers

• meanwhile, use heuristics (rules)
− most programs use relatively simple heuristics
− ProtoPlex uses very detailed heuristics
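The E-window idea can be sketched as a simple filter: keep every protomer whose energy lies within a user-specified window above the lowest-energy protomer. The function name, protomer names, and energy values below are hypothetical placeholders (any consistent energy unit works); how the energies would be computed is outside the scope of this sketch.

```python
def within_energy_window(energies, window):
    """Return (name, relative_energy) pairs within `window` of the minimum.

    energies : dict mapping a protomer name to its energy
    window   : user-specified E-window, in the same unit as the energies
    """
    if not energies:
        return []
    e_min = min(energies.values())
    kept = [
        (name, e - e_min)
        for name, e in energies.items()
        if e - e_min <= window
    ]
    # Lowest-energy protomers first.
    return sorted(kept, key=lambda item: item[1])


# Made-up relative energies, for illustration only.
toy_energies = {"protomer_a": 0.0, "protomer_b": 2.1, "protomer_c": 9.7}
print(within_energy_window(toy_energies, window=5.0))
```

A wider window admits more protomers, including ones that are unfavorable in solution but might still be preferred by a receptor, as in the pterin and barbiturate examples.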

Page 25:

Example duplicates found via OSN representation

tautomeric duplicates:

[Figure: three pairs of structures, each shown as “structure vs. structure”, illustrating tautomeric duplicates]
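Duplicate detection via a canonical representation can be sketched as grouping registry entries by a normalized key. In the sketch below the “normalizer” is a hand-written lookup table, a stand-in for a real canonical normalization such as the OSN rule (whose details are not given in these slides). The registry IDs and compound labels are hypothetical. The two SMILES strings for 2-hydroxypyridine and 2-pyridone are a well-known tautomer pair.

```python
from collections import defaultdict

# Hand-written stand-in for a canonical normalizer: maps each input
# structure (as SMILES) to a label for the compound it belongs to.
TOY_NORMALIZED = {
    "Oc1ccccn1": "cmpd-pyridone",     # 2-hydroxypyridine (enol-like form)
    "O=C1C=CC=CN1": "cmpd-pyridone",  # 2-pyridone (keto-like form)
    "c1ccccc1": "cmpd-benzene",
}


def toy_canonical_key(structure):
    """Look up the normalized compound label for a structure."""
    return TOY_NORMALIZED[structure]


def find_duplicates(records, canonical_key):
    """Group (record_id, structure) pairs by canonical key; return groups > 1."""
    groups = defaultdict(list)
    for record_id, structure in records:
        groups[canonical_key(structure)].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


registry = [
    ("REG-001", "Oc1ccccn1"),
    ("REG-002", "O=C1C=CC=CN1"),
    ("REG-003", "c1ccccc1"),
]
print(find_duplicates(registry, toy_canonical_key))
```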

Page 26:

Computer Aided Molecular Design (CAMD) software: it seems so obvious ...

• if CAMD doesn’t use the same structures as used by Mother Nature, we greatly reduce the chance of making reliable predictions

• if we go to the trouble of performing calculations and predictions based on structures, it seems silly not to store the results in an easily retrievable manner

• the fundamental technology required already exists

• pharmaceutical industry is already moving in this direction
− increasing emphasis and reliance on vHTS and QSAR methods
− increasing concern regarding IP issues and competitive strategies
− former Optive collaborators already using NW components

• some barriers to broad adoption/implementation but those barriers are certainly not insurmountable

Page 27:

How is cheminformatics related to other topics of this course?

• Cheminformatics & Mass Spectrometry
• Cheminformatics & Protein Structure
• Metabolomics

Page 28:

http://www.peptideatlas.org/ : Mass spectral search of peptides

For example, search for IPI00645064 (also supported in IPA) or VSFLSALEEYTK

Page 29:

How to search molecules

• Exact search
• Substructure search
• Similarity search
• Ligand search

[Figure: query structure sketch with atom labels N, N and L[O,Cl]]
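Similarity search is commonly implemented by comparing molecular fingerprints with the Tanimoto coefficient: the number of shared “on” bits divided by the number of bits set in either molecule. The sketch below assumes fingerprints are already available as sets of bit positions. The molecule names and bit sets are made up, and generating real fingerprints needs a cheminformatics toolkit, which is not shown.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen for this sketch
    return len(a & b) / len(a | b)


def similarity_search(query_bits, library, threshold):
    """Return (name, score) hits with score >= threshold, best first."""
    hits = [(name, tanimoto(query_bits, bits)) for name, bits in library.items()]
    hits = [hit for hit in hits if hit[1] >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Made-up fingerprints, for illustration only.
toy_library = {
    "mol_a": {1, 2, 3, 4},
    "mol_b": {1, 2, 3},
    "mol_c": {7, 8},
}
print(similarity_search({1, 2, 3}, toy_library, threshold=0.7))
```

With this representation an exact fingerprint match is simply a score of 1.0. True exact and substructure searches work on the molecular graph itself, with fingerprints typically used only as a fast pre-filter.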

Page 30:

Searching Molecules on PubChem

Go to PubChem Structure Search

18 million compound DB (++)

Page 31:

CAS SciFinder

• 33 million molecules and 60 million peptides/proteins
• largest reaction DB (14 million reactions) and literature DB
• substructure and similarity search of structures
• a must for chemists and biochemists/biologists
• no bulk download, no good import/export, no link-outs

Page 32:

Structure search in SciFinder

Retrieved 4000 papers

(refine search only MS and MALDI)

Page 33:

MS Cheminformatics Notes

There are different search types for mass spectral data: similarity search, reverse search, neutral loss search, MS/MS search

There are large libraries for electron impact spectra (EI) from GC-MS
There are no large open/commercial libraries for spectra from LC-MS

For creation of mass spectral libraries a holistic approach is important
Mass spectral trees can give further information (MSE or MSn)

There are different types of searching structures: exact search, similarity search, substructure search

Before you start a research project, create target lists of possible candidates
Collect mass spectra or structures in libraries with references
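Two of the search types named above can be sketched in a few lines. A spectral similarity search can score a query against a library spectrum with a cosine (normalized dot-product) similarity over matching peaks, and a neutral loss search looks at the mass differences between the precursor and its fragments. This is a generic illustration with made-up peak lists. It does not reproduce the scoring used by NIST MS Search or any other specific tool, and the function names are hypothetical.

```python
import math


def cosine_similarity(spectrum_a, spectrum_b):
    """Cosine similarity of two spectra given as {nominal m/z: intensity}."""
    shared = set(spectrum_a) & set(spectrum_b)
    dot = sum(spectrum_a[mz] * spectrum_b[mz] for mz in shared)
    norm_a = math.sqrt(sum(i * i for i in spectrum_a.values()))
    norm_b = math.sqrt(sum(i * i for i in spectrum_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def neutral_losses(precursor_mz, fragment_mzs):
    """Mass differences between the precursor and each lighter fragment."""
    return [precursor_mz - mz for mz in fragment_mzs if mz < precursor_mz]


# Made-up peak lists, for illustration only.
query = {50: 10.0, 77: 100.0, 105: 40.0}
library_entry = {51: 5.0, 77: 90.0, 105: 50.0}
print(round(cosine_similarity(query, library_entry), 3))
print(neutral_losses(200, [182, 155, 210]))
```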

Page 34:

MS-cheminformatics Links

High-resolution mass spectral database: http://www.massbank.jp/

http://fields.scripps.edu/sequest/

http://allured.stores.yahoo.net/idofesoilbyg.html (fragrances, terpenoid mass spectra SE-52 column + RIs)

http://kanaya.naist.jp/DrDMASS/DrDMASSInstruction.pdf

http://mmass.biographics.cz/

http://pubchem.ncbi.nlm.nih.gov/omssa/

Page 35:

Sample exercises:

1) Go to PubChem or ChemSpider and perform the 3 different structure searches using benzene; report the number of results (use the sketch function to draw benzene: a 6-membered ring with 3 aromatic bonds)

2) Download NIST MS Search and perform the 3 different mass spectral searches on cocaine (download the JCAMP-DX file from NIST)

3) Use Instant-JChem from the last course session and create a local demo database with PubChem data. Perform 3 different structure searches with benzene by double-clicking on the structure search field. Report the number of results.

Additional task for proteomics candidates:
4) Download the NIST peptide search and perform a search on the given examples

Page 36:

Example Chemical Informatics Topics

• representation of chemical compounds
• representation of chemical reactions
• chemical data, databases, and data sources
• searching chemical structures
• calculation of structure descriptors
• methods for chemical data analysis

“Molecular Informatics, the Data Grid, and an Introduction to eScience”

“Bridging Bioinformatics and Chemical Informatics”

Page 37:

Next lecture: STRUCTURE-BASED METHODS FIND MANY HOMOLOGUES (AND PUTATIVE TARGETS) NOT DETECTABLE FROM SEQUENCE SIMILARITY

[Figure: % sequence identity scale (100%, 30%, 0) for the test sequence AHHLDRPGHNMCEAGFWQPILL, with regions labeled Standard Approaches and Advanced Approaches]

Biochemical function and druggability defined by 3D structure, not sequence - structure is better conserved