Informatics in Drug Discovery - evqfm.com.br · Phases of Drug Discovery Enabling Science &...

Post on 30-Jul-2018


Informatics in Drug Discovery

Tudor I. Oprea
Division of Biocomputing
University of New Mexico School of Medicine
toprea@salud.unm.edu

The University of New Mexico, Division of Biocomputing
Copyright © Tudor I. Oprea, 2008. All rights reserved

Outline
• Phases of Drug Discovery
• Target Identification
– What is Bioinformatics?
• Hit Identification
– What is Cheminformatics?
• Lead Identification
– The GPR30 agonist; systems chemical biology
• Lead Optimization
– Structure-based design against Influenza
• The Importance of Accurate Information

Palestrante
• Palestra: the first definition of the word comes from ancient times (ant.):
• "lugar para exercícios de ginástica na Grécia e em Roma" (a place for practicing exercises and gymnastics in Greece and Rome)
• So a palestrante is a practitioner of exercises
– in this case, of speech and mind
[Figure: the "Boxer Vase" from Hagia Triada in Crete]

Phases of Drug Discovery

[Figure: pipeline diagram. Stages: Target Identification; Hit Identification; Lead & Probe Identification; Lead/Probe Optimization; Concept Testing; Development for Launch; Launch. Milestones: Target; Lead; Clinical Candidate; file for eIND/IND; receive NDA. Inputs along the pipeline: Enabling Science & Technology (emerging technologies); Predictive ADME/Tox and Safety Assessment (front-loading risk); Therapeutic Input (clinical insights); Stringent Criteria.]

Preclinical Drug Discovery Paradigm

[Figure: Targets (identification, validation) and Leads (identification; design, make, test) lead to Candidate Drugs. Labels on the target side: genome, proteome, disease area, biological effects. Labels on the lead side: hit ID, ongoing TV, fast-followers, intellectual property coverage. Supporting disciplines: Bioinformatics, Cheminformatics, Medical Informatics.]

Target Identification in Preclinical Discovery

[Progress bar: Target Identification; Hit Identif.; Lead Identif.; Lead Optim.; Clinical Candidate]
[Figure labels: Production; Identification (human genetics, mouse genetics); Validation]

The key in target identification is mass production of pure protein for structural studies.

What is Bioinformatics?
• Historically, bioinformatics was related to sequence analysis for genes & proteins
• It tried to find patterns in multiple sequences, and to understand how sequence information relates to genes, proteins, cellular substructures (organelles), cells and tissues, as well as whole (micro)organisms.
• Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.
• Its ultimate goal is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned

Human Genome Map

Use this website to search the human genome. No prior experience required.

http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi

Query: Color Blindness

Two hits on chromosomes 7 and X (opsin)

Continuing the search…
• We searched GenBank for ‘human opsin’ and found >150 hits. We selected NT_025965 (Homo sapiens chromosome X)
• We then searched for ‘opsin 1’ in the hit list, and found an mRNA, which was in turn translated into the following protein sequence:

• MAQQWSLQRLAGRHPQDSYEDSTQSSIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSVWMIFVVTASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETVIASTISIVNQVSGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIVGIAFSWIWSAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCIIPLAIIMLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMIFAYCVCWGPYTFFACFAAANPGYAFHPLMAALPAYFAKSATIYNPVIYVFMNRQFRNCILQLFGKKVDDGSELSSASKTEVSSVSSVS

• This sequence was submitted to BLAST
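The mRNA-to-protein step mentioned above can be sketched in a few lines. This is a minimal illustration using the standard genetic code; the nucleotide fragment is hypothetical (chosen only so that it encodes the first nine residues of the protein shown above) and is not the actual GenBank record.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA (or cDNA) string in frame 0, stopping at the first stop codon."""
    seq = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical fragment, for illustration only.
hypothetical_fragment = "AUGGCCCAGCAGUGGAGCCUCCAAAGGUAA"
print(translate(hypothetical_fragment))
```

In practice the translated sequence would come directly from the database record; the sketch only shows what "translated into a protein sequence" means computationally.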

BLAST / PDB Results
• Our query was limited to PDB (Protein Data Bank)
• Putative conserved domain (7 TM) has been detected
• Five sequences (3 proteins from the opsin family, 2 fragments) were identified

1F88: X-ray structure of bovine rhodopsin

Front view (left) and trans-membrane view (right) of the prototype for 7-TM protein models for GPCRs (G-protein coupled receptors)
[Figure labels: e.c.; t.m.; i.c.]

GPCRs are targets of ~35% of the known drugs

Statistical distribution of Affymetrix U133-A gene chip probes produced by BPQs after 18 hr incubation with MCF-10A cells

• This analysis provided Scott Burchiel with a small subset of genes that could be analyzed in more detail. Experiments indicated that 3,6-BPQ induces dioxin-response elements more than 1,6-BPQ.
[Figure panels: 3,6-BPQ; 1,6-BPQ]

S.W. Burchiel et al., Toxicol. Applied Pharmacol 2007, 221:203-214

Top Genes Expressed after BPQ Exposure

Probe | Description | Gene | Chromosome | LocusLink
210519_s_at | NAD(P)H dehydrogenase, quinone 1 | NQO1 | 16q22.1 | 1728
209160_at | Aldo-keto reductase family 1, member C3 (3-alpha hydroxysteroid dehydrogenase, type II) | AKR1C3 | 10p15-p14 | 8644
202436_s_at | cytochrome P450, family 1, subfamily B, polypeptide 1 | CYP1B1 | 2p21 | 1545
211653_x_at | Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) | AKR1C2 | 10p15-p14 | 1646
209699_x_at | Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) | AKR1C2 | 10p15-p14 | 1646
216594_x_at | Aldo-keto reductase family 1, member C1 (dihydrodiol dehydrogenase 1; 20-alpha (3-alpha)-hydroxysteroid dehydrogenase) | AKR1C1 | 10p15-p14 | 1645
204151_x_at | Aldo-keto reductase family 1, member C1 (dihydrodiol dehydrogenase 1; 20-alpha (3-alpha)-hydroxysteroid dehydrogenase) | AKR1C1 | 10p15-p14 | 1645

Browse Pathways for free at Biocarta.com

S.W. Burchiel et al., Toxicol. Applied Pharmacol 2007, 221:203-214

Tryptophan Metabolism (KEGG)
http://www.biocarta.com/pathfiles/tryptophanPathway.asp
[Figure: pathway map with CYP1B1 highlighted]

CYP1B1 Homology Model
• Built from X-ray structure of CYP2B4
• Partially optimized with 3,6-BPQ
[Structure: Benzo[a]pyrene-3,6-dione, CAS # 3067-14-9]

Modern Technologies in Preclinical Discovery

[Progress bar: Target Identification; Hit Identif.; Lead Identif.; Lead Optim.; Clinical Candidate]

High Throughput Synthesis and Screening: Synthetic Compounds, Natural Products

From single compound synthesis to combinatorial chemistry

Compound management
• Handles both liquids & solids
• Robots designed for fast reformatting & subsetting (“cherry picking”)
• Cheminformatics-based selection for diversity
• Compound storage is fully automated

Courtesy of Thomas Kühler

What is Cheminformatics?
• Molecular models are widely used in biosciences ranging from analytical chemistry & biochemistry to immunology & toxicology
• Cheminformatics integrates data via computer-assisted manipulation of chemical structures
• Chemical inventory & compound registration are vital to cheminformatics, but it is their combination with other theoretical tools, linked to physical (organic) chemistry, toxicity, etc. that brings unique capabilities in the area of bioactivity discovery.
• Other names used by different people that relate to cheminformatics and computational chemistry: computer-aided drug design; quantum dynamics; biopolymer modeling; molecular / chemical diversity; virtual screening; etc.

Understanding Modeling

[Figure: the same image stored in RGB format (550 Kb; raw rows such as "1 1 60 0 0", "1 2 60 0 0", "1 3 10 1 9", "1 4 5 2 2", ...) and in JPG format (25 Kb)]
• Data compression rarely leads to understanding
• Complexity reduction (e.g., modeling) leads to interpretation

Structure-Property Correlations

[Figure: chemical system + biological system; experimental knowledge (Ki, metabolism, toxicity, ...; pH, stability, solubility, pKa, ...); commercial candidates (pharmaceuticals, flavors, additives, etc.)]
Modified from G. Cruciani

3D-Descriptors

$E_{tot} = E_{el} + E_{LJ} + E_{hb} + E_{ent}$

[Figures: a series of slides illustrating 3D descriptors]
Modified from G. Cruciani

Structure-Based Drug Design
M von Itzstein et al., Nature, 363:418-423, 1993

The Many Faces of Caffeine
O=C1N(C)C(=O)N(C)C(=C12)N=CN2C
CN1C(=O)N(C)C(=O)c(c12)n(C)cn2
Cover art for “Chemoinformatics in Drug Discovery”, Wiley VCH 2005
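The two SMILES strings above are different line notations for caffeine. A quick sanity check, sketched below with only the standard library, is to confirm that both strings contain the same heavy atoms. The tokenizer is a deliberate simplification (it ignores bracket atoms, charges and isotopes), and equal composition is a necessary but not sufficient test; proving that two SMILES denote the same molecule requires canonicalization with a cheminformatics toolkit.

```python
import re
from collections import Counter

# Organic-subset atoms only; bracket atoms such as [NH+] are NOT handled.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles: str) -> Counter:
    """Count heavy atoms in a simple SMILES string (aromatic 'c' counts as 'C')."""
    return Counter(token.capitalize() for token in ATOM_PATTERN.findall(smiles))


smiles_a = "O=C1N(C)C(=O)N(C)C(=C12)N=CN2C"
smiles_b = "CN1C(=O)N(C)C(=O)c(c12)n(C)cn2"

print(heavy_atom_counts(smiles_a))
print(heavy_atom_counts(smiles_a) == heavy_atom_counts(smiles_b))
```

Both strings give C8 N4 O2, consistent with the molecular formula of caffeine (C8H10N4O2).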

Modern Technologies in Preclinical Discovery

[Progress bar: Target Identif.; Hit Identif.; Lead Identif.; Lead Optim.; Clinical Candidate]

Combinatorial & Medicinal Chemistry
Toxicogenomics
[Structure: generic scaffold with substituents R1, R2, R3, R4]

GPR30 – A Novel Estrogen Target
• Prossnitz et al. @ UNM identified a fully functional intracellular GPCR (bound to the endoplasmic reticulum)
• Binds estradiol
• Tamoxifen is an agonist – may explain cancer relapses on tamoxifen therapy
[Figure: 17βE2-Alexa 546]

Revankar CM et al., Science 2005, 307:1625

Discovery of a Potent GPR30 Agonist

Query: Estradiol
• 40% 2D similarity (MDL, Daylight)
• 40% ROCS (3D similarity from OpenEye)
• 20% ALMOND (pharmacophore fingerprint similarity from Molecular Discovery)

Applied to 10,000 molecules; of these, 100 were tested.

One got lucky…

[Figure: chemical structures]

Bologa C & Revankar C et al., Nature Chem. Biol. 2006, 2: 207-212
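One way to read the 40/40/20 split above is as a weighted consensus of three similarity scores against the estradiol query, followed by ranking and selection of the top compounds for testing. The sketch below assumes exactly that interpretation; the function names, the compound names and the similarity values are hypothetical, and each score is assumed to be already scaled to the range 0 to 1.

```python
def consensus_score(sim_2d, sim_rocs, sim_almond, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of 2D, 3D shape (ROCS) and pharmacophore (ALMOND) similarities."""
    w_2d, w_rocs, w_almond = weights
    return w_2d * sim_2d + w_rocs * sim_rocs + w_almond * sim_almond


def select_top(library, n):
    """Return the names of the n compounds with the highest consensus score."""
    ranked = sorted(library, key=lambda item: consensus_score(*item[1]), reverse=True)
    return [name for name, _ in ranked[:n]]


# Hypothetical library: (name, (2D similarity, ROCS similarity, ALMOND similarity)).
hypothetical_library = [
    ("cmpd_A", (0.82, 0.75, 0.60)),
    ("cmpd_B", (0.30, 0.90, 0.90)),
    ("cmpd_C", (0.50, 0.40, 0.20)),
]

print(select_top(hypothetical_library, 2))
```

With a real library the same ranking would be applied to about 10,000 molecules and the top 100 sent for testing, as described on the slide.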

High Throughput Cheminformatics

[Figure: three panels with example structures: Computational Chemistry - (Q)SAR; Combinatorial Chemistry; Chemical Space Navigation. Chemical information + biological information + structural information]

T.I. Oprea, Curr Opin Chem Biol, 2002, 6, 384-389

Needles in the Haystack…

Density of Actives in HTS Space
Mostly Haystack, But Some Needles
[Figure: actives per target, for targets T1 to T53]

Spotting Frequent Hitters
[Figure: actives per target, for targets T1 to T53, with one frequent hitter highlighted]
MW = 817; ClogP = 5.7; PSA = 269; Lipinski score = 3; 2 × -N=N-; 2 × -SO3H
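The "Lipinski score" quoted for the frequent hitter can be read as the number of Rule of Five violations, which is an assumption about how the slide defines it. A minimal sketch follows; MW and ClogP are taken from the slide, while the hydrogen-bond donor and acceptor counts are hypothetical placeholders, since the slide does not list them.

```python
def lipinski_violations(mw, clogp, hbd, hba):
    """Count Rule of Five violations: MW > 500, ClogP > 5, donors > 5, acceptors > 10."""
    return sum([mw > 500, clogp > 5, hbd > 5, hba > 10])


# MW and ClogP from the slide; hbd and hba are hypothetical values for illustration.
frequent_hitter = {"mw": 817, "clogp": 5.7, "hbd": 2, "hba": 14}
print(lipinski_violations(**frequent_hitter))
```

A compound with two azo groups and two sulfonic acid groups, as on the slide, is typical of dye-like molecules that show up as actives across many unrelated targets.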

Modern Technologies in Preclinical Discovery

[Progress bar: Target Identif.; Hit Identif.; Lead Identif.; Lead Optim.; Clinical Candidate]

Transgenic animals
• Clinically relevant disease models
• Study metabolism & toxicology in “human”-ized conditions

Structure-based Drug Design
• Zanamivir (inhaled; Relenza®), first anti-influenza drug (1999)
• Oseltamivir (orally available prodrug, Tamiflu®) launched in 2001

Neuraminidase Inhibitors for Influenza
• X-ray structure guided rational design
• GRID suggested replacing -OH with basic functionality
• Physical properties not amenable to oral delivery
• GSK markets this as Relenza
• First drug for influenza

[Figure: lead molecule (IC50 = 8600 nM) and zanamivir (IC50 = 5 nM)]

Slide modified from Andy Davis (AstraZeneca R&D Charnwood)

M von Itzstein et al., Nature, 363:418-423, 1993
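The two IC50 values on this slide (8600 nM and 5 nM) are easier to compare on a logarithmic scale. The sketch below converts IC50 in nM to pIC50 (the negative base-10 logarithm of the molar IC50) and reports the fold improvement; assigning 8600 nM to the lead and 5 nM to zanamivir follows the reading of the figure and is an assumption.

```python
import math


def pic50_from_nM(ic50_nM: float) -> float:
    """pIC50 = -log10(IC50 in mol/L); 1 nM = 1e-9 M, so pIC50 = 9 - log10(IC50 in nM)."""
    return 9.0 - math.log10(ic50_nM)


def fold_improvement(ic50_before_nM: float, ic50_after_nM: float) -> float:
    """How many times more potent the optimized compound is."""
    return ic50_before_nM / ic50_after_nM


lead_nM, optimized_nM = 8600.0, 5.0
print(round(pic50_from_nM(lead_nM), 2), round(pic50_from_nM(optimized_nM), 2))
print(fold_improvement(lead_nM, optimized_nM))
```

The change corresponds to a gain of a little over three log units in potency.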

Gilead Neuraminidase Inhibitors
• Zwitterion not amenable to oral delivery
• Ethyl ester (oseltamivir): good oral absorption and duration
• Marketed as Tamiflu, first oral drug for influenza

[Figure: structures annotated with IC50 = 1 nM and IC50 = 150 nM; residue Glu 276 labeled]

J. Am. Chem. Soc. 1997, 119, 681; J. Med. Chem. 1998, 41, 2451

Slide modified from Andy Davis (AstraZeneca R&D Charnwood)

…Relenza vs Tamiflu
• Both are potent neuraminidase inhibitors
• Relenza: zanamivir is delivered via Diskhaler®
• Tamiflu: simple tablet formulation
– De-esterified in plasma, with long plasma T½
• Tamiflu (marketed by Roche)
– took 65% U.S. market share from Relenza in 7 weeks
• Q1/Q2 2002 sales, Relenza vs Tamiflu
– Relenza market share fallen to 10%
– GSK quoted reason: “Slowness of the US to adopt inhalation therapies”

Slide modified from Andy Davis (AstraZeneca R&D Charnwood)

Inhalation to Overcome Low Bioavailability

Ligand-Based Virtual Screening
Binding mode of two agonists (red) and two antagonists (blue)
B. Edwards et al., Mol Pharmacol 2005 68:1301-1310

Target-Based Virtual Screening
• Started with the NMR-solved bound conformation of a known peptide inhibitor; applied to existing X-ray structure (docking)
• Used 2D/3D searches in CDL library, using peptide info
Work by Richard Larson, Cristian Bologa, Tudor Oprea (UNM)

High Throughput Drug Discovery

[Figure: Genomics (macromolecules, genes) and CombiChem (chemical diversity) feed High Throughput Screening (HTS); identify “actives”; optimize via (Q)SAR; animal testing (pharmacokinetics, pharmacodynamics, toxicology); clinical testing]

Informatics in Drug Discovery

[Figure labels: Patient(s); Disease; Gene(s); Target(s); Molecular Assemblies; Screening; Pathway Analysis; Lead(s); ADME/Tox Optimization; Candidate Drug(s); Clinical Trials; Optimal Delivery; Optimal Dose; Improved Therapy. Supporting disciplines: Bioinformatics, Cheminformatics, Medical Informatics]

Research Sites are Often Worlds Apart
• For example, AstraZeneca preclinical discovery:
• 4500 people working in 11 sites across 4 continents
• Truly global organization
• Emphasis on local, existing strengths, difficult to re-balance
• Internal and external collaborations need to be maximized

[Map labels: Södertälje; Lund; Mölndal; Alderley; Reims; Charnwood; Wilmington; Boston; Montreal; Bangalore, India; Brisbane, Australia]

The Task: Drug Hunting
• Large pharma houses cover mostly major disease areas
– Gastrointestinal (Gastro Esophageal Reflux / Irritable Bowel Syndrome)
– Cancer (Solid tumours)
– Respiratory (Asthma/Chronic Obstructive Pulmonary Disease)
– Inflammation (Autoimmune)
– Cardiovascular (Thrombosis)
– Infection (Genomics targets)
– CNS (Neurology)
– Pain (Analgesia)
• Focused on civilized (first) world health problems
• Not here to cure mankind, but to make a profit
• If projected sales are below $1.3 billion, will not launch
• Often need to bow to market pressure
• “Me too” has been the most effective strategy so far.

Traditional Drug Discovery

[Figure: two panels with axes Potency and ADME]

T. I. Oprea. Molecules, 7:51-62, 2002

The Traditional Approach: Analysis
• Historically, medicinal chemistry efforts started from a lead structure with (typically) poor ADME properties and micromolar affinity.
• Initial synthetic efforts were aimed at improving the binding affinity.
• Once medium- to low-nanomolar affinity was achieved, ADME properties were optimized.
• A similar strategy is being pursued in virtual screening, whereby ADME filters are applied post-docking, in an effort to trim down the large number of “virtual hits”.

T. I. Oprea. Molecules, 7:51-62, 2002

The Problem with the Traditional Approach
• One has to preserve the molecular determinants responsible for potency, while modifying the structure in order to achieve good ADME properties.
• This often results in reduced binding affinity, and the process may require several iterations before optimal structures are found
• To a large extent, in vivo screening for good ADME properties can be introduced in order to avoid radical drops in affinity. This strategy is currently used in many pharmaceutical companies.
• However, the in vivo ADME screening strategy cannot cope with increasingly large numbers of compounds.

T. I. Oprea. Molecules, 7:51-62, 2002

A Solution From the Literature

Apply ADME filters during post-HTS analysis or virtual screening: PSA, LogD7.4, Solubility, Rule of Five, etc.

[Figure: two panels with axes Potency and ADME]

T. I. Oprea. Molecules, 7:51-62, 2002

The Literature Approach: Analysis
• This approach yields good ADME properties without considering the receptor affinity.
• It is based on the expectation that, out of thousands (or more) compounds, some will display good affinity.
• This approach is a more advanced implementation of the “computational alert” procedure of Chris Lipinski.
• One has to preserve the molecular determinants responsible for DMPK properties, while modifying the structure in order to achieve good potency.
• This may often result in reduced ADME properties, but the process of finding optimal structures is less time-consuming.

T. I. Oprea. Molecules, 7:51-62, 2002

The Problem with the Literature Approach
• To a large extent, this strategy rules out potential problems with passive permeability, but is unlikely to capture active mechanisms unless more advanced filters are introduced. This approach is very popular in several pharmaceutical companies.
• However, the ADME-filtering strategy cannot cope with the situation where NO HITS are found, because it decreases the probability of detecting hits outside the predefined ADME property space.
• There is a clear need to design an approach that would allow the simultaneous optimization of both receptor binding affinity and ADME properties.

T. I. Oprea. Molecules, 7:51-62, 2002

The “Convergent” Approach

[Figure: two panels with axes Potency and ADME; regions labeled “Ignore” and “Investigate in more detail”]

T. I. Oprea. Molecules, 7:51-62, 2002

The “Convergent” Approach: Analysis
• This strategy requires simultaneous optimisation of both binding affinity and ADME properties. An integrated software framework to address this problem is required.
• The real difficulty consists in addressing relevant properties in a consistent manner. For example: PSA, hydrophobicity and H-bonds are important in both binding and permeability, but the question is, in what proportion are they contributing in each case?
• In the “convergent” strategy, one would simultaneously monitor changes that would influence binding affinity and ADME properties. This may result in progressive increments in affinity and ADME properties, making this process less time-consuming.

T. I. Oprea. Molecules, 7:51-62, 2002

The Problem with the “Convergent” Approach
• An integrated software framework that allows interactive user-input should allow a rapid convergence towards interesting compounds.
• This strategy relies on the accuracy of the experimental data used to derive the QSAR models to predict affinity and ADME (the old problem… how good is your model?).
• However, this framework cannot substitute creative thinking, serendipity and good research!
• This approach is limited by the unexpected (novel targets, novel chemistry) and by inherent errors (20-30%).

T. I. Oprea. Molecules, 7:51-62, 2002

Reliability of Biological Data (1)

[Figure: CoMFA model (external prediction); predicted pIC50 vs. actual pIC50, both axes from -2 to 3]
HIV-1 protease inhibitors: N = 36, Q2 = 0.814, KM = 0.064 mM

Oprea et al., in Ghose & Viswanadhan, Dekker 2001, 233-266

Reliability of Biological Data (2)

[Figure: CoMFA model (external prediction); predicted pIC50 vs. actual pIC50, both axes from -1 to 3]
HIV-1 protease inhibitors: N = 36, Q2 = 0.487, KM = 2.0 mM

Oprea et al., in Ghose & Viswanadhan, Dekker 2001, 233-266
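The Q2 values quoted on the two CoMFA slides (0.814 vs. 0.487) summarize how well predicted pIC50 tracks measured pIC50. A common definition, assumed here, is Q2 = 1 - PRESS/SS, where PRESS is the sum of squared prediction errors and SS is the sum of squared deviations of the measured values from their mean. Conventions differ on which mean is used for external test sets, so this is a sketch of one variant and not necessarily the one used in the cited work. The data values are hypothetical.

```python
def q2(actual, predicted):
    """Predictive r-squared: 1 - PRESS / SS, using the mean of the actual values."""
    mean_actual = sum(actual) / len(actual)
    press = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - press / ss


# Hypothetical pIC50 values, for illustration only.
actual_pic50 = [-1.0, 0.0, 1.0, 2.0, 3.0]
predicted_pic50 = [-0.5, 0.2, 0.8, 1.7, 2.6]
print(round(q2(actual_pic50, predicted_pic50), 3))
```

A Q2 of 1 means perfect prediction, and a Q2 of 0 means the model does no better than predicting the mean for every compound.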

Reliability of Biological Data (3)

%HIA Model (Training set)

Human intestinal absorption:
1: Sulfasalazine (%HIA is 12, not 65)
2: Sulfapyridine (%HIA is 93)
3: 5-aminosalicylic acid
Bacterial azo bond reduction occurs in the intestine
[Scheme: structure 1 is cleaved into structures 2 + 3]

Oprea & Gottfries, J. Mol. Graphics Mod. 1999, 17, 261-274

Reliability of Biological Data (4)

Oral drug absorption: N = 16, Q2 = 0.44 (%HIA), Q2 = 0.71 (Caco)

This sigmoidal relationship is frequently described in the literature

[Figure: %HIA model (training set), score plot of u[1] vs. t[1]. Compound labels as truncated in the plot: Acetyls, Alpreno, Atenolo, Cortico, Dexamet, Felodip, Hydroco, Mannito, Metopro, Olsalaz, Practol, Propran, Salicyl, Sulfasa, Testost, Warfari. Annotations: azo bond reduction in the large intestine; paracellular transport]

…Any model left?!

Oprea & Gottfries, J. Mol. Graphics Mod. 1999, 17, 261-274

Reliability of Biological Data (5)

There is poor agreement in terms of clearance data: over 42% of the compounds differ by more than 30%

[Bar chart: % of compared compounds vs. difference between sources]
>10 %: 71.3
>30 %: 42.6
>100 %: 13.9
>1000 %: 3.0

Sources: Goodman and Gilman 1996 vs. Avery’s 1997
Data: Clearance (202 drugs)
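A comparison like the one on this slide can be reproduced for any two compilations that report the same property. The sketch below assumes the percent difference is taken relative to the smaller of the two values (the slide does not state its definition), and the clearance pairs are hypothetical.

```python
def percent_difference(value_a: float, value_b: float) -> float:
    """Absolute difference as a percentage of the smaller value (an assumed definition)."""
    return abs(value_a - value_b) / min(value_a, value_b) * 100.0


def percent_exceeding(pairs, thresholds=(10, 30, 100, 1000)):
    """For each threshold, the percentage of compounds whose two values differ by more."""
    diffs = [percent_difference(a, b) for a, b in pairs]
    return {t: 100 * sum(d > t for d in diffs) / len(diffs) for t in thresholds}


# Hypothetical clearance values (source 1, source 2), same units.
hypothetical_pairs = [(10, 10.5), (10, 12), (10, 15), (10, 25), (1, 20)]
print(percent_exceeding(hypothetical_pairs))
```

Running the same tally on the 202 drugs shared by the two sources would give the four bars of the chart.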

Reliability of Chemical Data (1)

Reference | Comment (published and corrected structure drawings omitted)
JMC 37-476, chart 1 | rolipram: incorrect N atom position
JMC 43-2217, chart 1 | A-85380: incorrect ring size
JMC 43-2217, chart 1 & JMC 36-2645 | tropisetron: extra methyl group
JMC 43-2217, chart 1 | DAU-6285: missing methoxy; N instead of O
JMC 37-758, chart 1 | Ro-15-4513: methyl group missing
JMC 37-787, figure 1 | epalrestat: E/Z configuration, E instead of Z

Reliability of Chemical Data (2)

[Figure: the structure labeled "Carisoprodol" in the Merck Index, 13th ed., #1854, next to the correct carisoprodol structure]

Disclaimer: The above error has been corrected in the Merck Index 14th edition. In general, the Merck Index is a reliable source of information.

Reliability of Chemical Data (3)
• Chirality: what chemists can interpret, computers cannot always interpret (the “above/below the plane” convention must be strictly enforced)
[Figure: the same structure drawn in a form that is not machine-readable and in a machine-readable form]
• Missing/altered atoms/substituents – overall error rate above 9%
– Incorrectly drawn or written structures (3.4%); incorrect molecular formula or molecular weight (3.4%);
– Unspecified binding position for substituents or ambiguous numbering scheme for the heterocyclic backbone (0.91%);
– Structures with the incorrect backbone (0.71%);
– Incorrect generic names or chemical names (0.24%);
– Incorrect biological activity (0.34%);
– Incorrect references (0.2%).

Reliability of Structural Data
• Photoactive yellow protein from E. halophila
– First structure, 1PHY, was wrong
– Subsequently corrected at higher resolution
1PHY: 1989, 2.4 Å
2PHY: 1995, 1.4 Å

A Davis, S Teague, G Kleywegt, Angew. Chem. Int. Ed. 2003

Reliability of Structural Data (2)
“Where there is no chicken wire, there’s no electrons... atoms”
• 1FQH now withdrawn from PDB

A Davis, S Teague, G Kleywegt, Angew. Chem. Int. Ed. 2003

Reliability of Structural Data (3)
• NH2 & O can’t be distinguished from density, as they are isoelectronic (e.g. glutamine, asparagine)
• PDBREPORT suggests 15% in the Protein Data Bank are likely incorrect
• N/C cannot normally be distinguished from density (e.g. histidine)
[Figure: alternative side-chain orientations for glutamine/asparagine and histidine]

Where do I work?

The Importance of Accurate Information
• With one source, we have information
• With two sources, we can have confirmation or confusion
• With several sources, knowledge emerges
• Accurate information is important – hence we tend to “trust” certain newspapers, TV stations or scientific journals (the peer-review system regulates that).
• If something is really important to you (*) then consult multiple sources and verify that your assumptions are correct
• (*) e.g., who is on my PhD committee; what’s my girlfriend’s birthday; what are the side-effects for the medicine my parents are taking; is this synthesis route correct? etc.

The Attrition Rate in Drug Discovery

HTS: 1,000,000
HTS Hits: 2,000
HTS Actives: 1,200
Lead Series: 50-200
Drug Candidates: 1
Drug: 0.1

[Figure annotations along the funnel: increased knowledge and value; increased risk of failure; increased experimental error rate]
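The funnel above can be turned into stage-to-stage survival fractions with a few lines of arithmetic. The sketch below pairs stage labels with counts in the order they appear in the transcript, which is an assumption about the original figure, and uses the lower bound (50) of the quoted 50-200 range for lead series.

```python
# Stage counts as transcribed; 50 is the lower bound of the "50-200" range.
funnel = [
    ("HTS", 1_000_000),
    ("HTS Hits", 2_000),
    ("HTS Actives", 1_200),
    ("Lead Series", 50),
    ("Drug Candidates", 1),
    ("Drug", 0.1),
]


def stage_yields(stages):
    """Fraction of items surviving each transition between consecutive stages."""
    return [
        (f"{name_a} -> {name_b}", count_b / count_a)
        for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:])
    ]


for transition, fraction in stage_yields(funnel):
    print(f"{transition}: {fraction:.4g}")

overall = funnel[-1][1] / funnel[0][1]
print(f"overall: {overall:.1e}")
```

On these numbers, roughly one screened compound in ten million ends up as a drug.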

The Optimization Response in Drug Discovery

[Figure: two surfaces over chemical space with druggability on the vertical axis (0 to 0.9): a multiple response surface and a “Fuji-Yama” landscape; regions labeled “Hits”, “Leads” and “Drugs”]

Considering a Career in Molecular Sciences
• Science is not a democracy: just because everyone believes something does not make it true
• Choose topics that open new possibilities, not those that (may) lead to dead-ends; don’t get stuck with a single technology
• Always go to the source (original publication), as information is sometimes selectively presented
• Make sure you understand the basics – try to explain what you do to a 5-yr old. If you manage, you have grasped the concepts
• Stay away from “fashionable” science: just because it might get you funded, it does not mean it’s science
• Make sure to give credit where credit is due…
• …but do not be afraid to claim what’s yours (protect your ideas)

Some Take Home Messages
• Nothing is what it seems: verify what you see, doubt what you find, and always get independent confirmation of your observations. That way, once you are sure about your findings, you are ready to defend your results (e.g., GPR30 is still very contested by others)
• Don’t be afraid to say I DO NOT KNOW; omniscient beings are not of this world (think Buddha, Jesus, and other enlightened beings)
• Although you do not know, be ready to learn
• Focus on problem-solving skills; they are more important than static learning & memory
• Always find ways to reward creativity and out-of-the-box thinking
• People are 100x more important than equipment – as you progress in your career, you will find that people are the most important asset
• If someone steals your ideas (this happens all the time) remember that this is a subtle form of flattery (so is envy). Focus on generating new ideas, and do not turn the stolen-idea situation into an obsession (this will block your creativity)
• Learn where your fear comes from: deal with it inside, and do not take it out on other people. Fear leads to anger, anger leads to violence
• Express yourself freely and creatively. The Universe is a friendly place.

Garland Marshall’s Definition of Luck