Post on 30-Jul-2018
Informatics in Drug Discovery
Tudor I. Oprea
Division of Biocomputing
University of New Mexico School of Medicine
toprea@salud.unm.edu
The University of New Mexico, Division of Biocomputing
Copyright © Tudor I. Oprea, 2008. All rights reserved
Outline
• Phases of Drug Discovery
• Target Identification
– What is Bioinformatics?
• Hit Identification
– What is Cheminformatics?
• Lead Identification
– The GPR30 agonist; systems chemical biology
• Lead Optimization
– Structure-based design against Influenza
• The Importance of Accurate Information
Palestrante
• Palestra:
• The first definition of the word comes from ancient times (ant.):
• lugar para exercícios de ginástica na Grécia e em Roma.
• (A place for gymnastic exercises in Greece and Rome)
The "Boxer Vase" from Hagia Triada in Crete
• So a Palestrante is a practitioner of exercises
– in this case, of Speech and Mind
Phases of Drug Discovery
[Figure: drug discovery pipeline]
Inputs: Enabling Science & Technology / Emerging Technologies; Predictive ADME/Tox, Safety Assessment / Front-loading Risk; Therapeutic Input / Clinical Insights; Stringent Criteria
Stages: Target Identification; Hit Identification; Lead & Probe Identification; Lead/Probe optimization; Concept Testing; Development for launch; Launch
Milestones: Target; Lead; Clinical Candidate; File for: eIND, IND; Receive: NDA
Preclinical Drug Discovery Paradigm
[Figure: preclinical discovery workflow]
Targets: Identification; Validation
Leads: Design; Make; Test
Other labels: Genome; Proteome; Disease Area; Biol. Effects; Hit ID; Ongoing TV; Fast-followers; Intellectual Property coverage; Candidate Drugs
Disciplines: Bioinformatics; Cheminformatics; Medical Informatics
Target Identification in Preclinical Discovery
[Stage bar: Target Identification | Hit Identif. | Lead Identif. | Lead optim. | Clinical Candidate]
Steps: Identification (human genetics, mouse genetics); Production; Validation
The key in target identification is mass production of pure protein for structural studies
What is Bioinformatics?
• Historically, bioinformatics was related to sequence analysis for genes & proteins
• It tried to find patterns in multiple sequences, and understand how sequence information relates to genes, proteins, cellular substructures (organelles), cells and tissues, as well as whole (micro)organisms.
• Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline.
• Its ultimate goal is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned
Human Genome Map
Use this website to search the human genome. No prior experience required.
http://www.ncbi.nlm.nih.gov/mapview/map_search.cgi
Query: Color Blindness
Two hits on chromosomes 7 and X (opsin)
Continuing the search…
• We searched GenBank for ‘human opsin’ and found >150 hits. We selected NT_025965 (Homo sapiens chromosome X)
• We then searched for ‘opsin 1’ in the hit list, and found an mRNA, which was in turn translated into the following protein sequence:
• MAQQWSLQRLAGRHPQDSYEDSTQSSIFTYTNSNSTRGPFEGPNYHIAPRWVYHLTSVWMIFVVTASVFTNGLVLAATMKFKKLRHPLNWILVNLAVADLAETVIASTISIVNQVSGYFVLGHPMCVLEGYTVSLCGITGLWSLAIISWERWLVVCKPFGNVRFDAKLAIVGIAFSWIWSAVWTAPPIFGWSRYWPHGLKTSCGPDVFSGSSYPGVQSYMIVLMVTCCIIPLAIIMLCYLQVWLAIRAVAKQQKESESTQKAEKEVTRMVVVMIFAYCVCWGPYTFFACFAAANPGYAFHPLMAALPAYFAKSATIYNPVIYVFMNRQFRNCILQLFGKKVDDGSELSSASKTEVSSVSSVS
• This sequence was submitted to BLAST
BLAST / PDB Results
• Our query was limited to PDB (Protein Data Bank)
• Putative conserved domain (7 TM) has been detected:
• Five sequences (3 proteins from the opsin family, 2 fragments) were identified:
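The 7-TM domain above was reported by BLAST. As a much cruder, independent check, a hydropathy scan can flag candidate membrane-spanning stretches in a protein sequence. The sketch below is not the method used on the slide: it assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cutoff of 1.6, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values per amino acid (one-letter codes).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window along the sequence."""
    values = [KD[aa] for aa in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def hydrophobic_segments(seq, window=19, cutoff=1.6):
    """Merge overlapping windows at or above the cutoff into (start, end) spans.

    Positions are 1-based and inclusive. The window and cutoff are
    illustrative assumptions, not values taken from the slides.
    """
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score < cutoff:
            continue
        start, end = i + 1, i + window
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments


# Toy input: a polar stretch, a hydrophobic stretch, another polar stretch.
toy = "D" * 20 + "L" * 25 + "K" * 20
print(hydrophobic_segments(toy))
```

Passing the opsin sequence from the previous slide to `hydrophobic_segments` would list its hydrophobic stretches. Adjacent helices may merge into one span with a window this wide, so the count is only a rough guide.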
1F88: X-ray structure of bovine rhodopsin
Front view (left) and trans-membrane view (right) of the prototype for 7-TM protein models for GPCRs (G-protein coupled receptors)
[Figure labels: e.c. (extracellular), i.c. (intracellular), t.m. (transmembrane)]
GPCRs are targets of ~35% of the known drugs
Statistical distribution of Affymetrix U133-A gene chip probes produced by BPQs after 18 hr incubation with MCF-10A cells.
• This analysis provided Scott Burchiel with a small subset of genes that could be analyzed in more detail. Experiments indicated that 3,6-BPQ induces dioxin-response elements more than 1,6-BPQ.
[Figure panels: 3,6-BPQ; 1,6-BPQ]
S.W. Burchiel et al., Toxicol. Applied Pharmacol 2007, 221:203-214
Top Genes Expressed after BPQ Exposure
Browse Pathways for free at Biocarta.com
LocusLink | Chromosome | Gene | Description | Probe
1728 | 16q22.1 | NQO1 | NAD(P)H dehydrogenase, quinone 1 | 210519_s_at
8644 | 10p15-p14 | AKR1C3 | Aldo-keto reductase family 1, member C3 (3-alpha hydroxysteroid dehydrogenase, type II) | 209160_at
1545 | 2p21 | CYP1B1 | cytochrome P450, family 1, subfamily B, polypeptide 1 | 202436_s_at
1646 | 10p15-p14 | AKR1C2 | Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) | 211653_x_at
1646 | 10p15-p14 | AKR1C2 | Aldo-keto reductase family 1, member C2 (dihydrodiol dehydrogenase 2; bile acid binding protein; 3-alpha hydroxysteroid dehydrogenase, type III) | 209699_x_at
1645 | 10p15-p14 | AKR1C1 | Aldo-keto reductase family 1, member C1 (dihydrodiol dehydrogenase 1; 20-alpha (3-alpha)-hydroxysteroid dehydrogenase) | 216594_x_at
1645 | 10p15-p14 | AKR1C1 | Aldo-keto reductase family 1, member C1 (dihydrodiol dehydrogenase 1; 20-alpha (3-alpha)-hydroxysteroid dehydrogenase) | 204151_x_at
S.W. Burchiel et al., Toxicol. Applied Pharmacol 2007, 221:203-214
Tryptophan Metabolism (KEGG)
http://www.biocarta.com/pathfiles/tryptophanPathway.asp
CYP1B1
CYP1B1 Homology Model
Built from X-ray structure of CYP2B4
Partially optimized with 3,6-BPQ
[Structure: Benzo[a]pyrene-3,6-dione, CAS # 3067-14-9]
Modern Technologies in Preclinical Discovery
[Stage bar: Target Identification | Hit Identif. | Lead Identif. | Lead optim. | Clinical Candidate]
High Throughput Synthesis and Screening
Synthetic Compounds; Natural Products
Compound management
• Handles both liquids & solids
• Robots designed for fast reformatting & subsetting (“cherry picking”)
• Cheminformatics-based selection for diversity
• Compound storage is fully automated
From single compound synthesis to combinatorial chemistry
Courtesy of Thomas Kühler
What is Cheminformatics?
• Molecular models are widely used in biosciences ranging from analytical chemistry & biochemistry to immunology & toxicology
• Cheminformatics integrates data via computer-assisted manipulation of chemical structures
• Chemical inventory & compound registration are vital to cheminformatics, but it is their combination with other theoretical tools, linked to physical (organic) chemistry, toxicity, etc. that brings unique capabilities in the area of bioactivity discovery.
• Other names used by different people that relate to cheminformatics and computational chemistry: computer-aided drug design; quantum dynamics; biopolymer modeling; molecular / chemical diversity; virtual screening; etc...
Understanding Modeling
[Figure: the same image stored as raw RGB data (550 Kb) and as JPG (25 Kb)]
• Data compression rarely leads to understanding
• Complexity reduction (e.g., modeling) leads to interpretation
Structure-Property Correlations
Chemical system; Biological system
Commercial candidates (pharmaceuticals, flavors, additives, etc.)
Experimental knowledge: Ki, Metabolism, Toxicity, ...; pH, Stability, Solubility, pKa, ...
Modified from G. Cruciani
3D-Descriptors
[Figures: 3D descriptor illustrations]
$E_{tot} = E_{el} + E_{LJ} + E_{hb} + E_{ent}$
Modified from G. Cruciani
Structure-Based Drug Design
M von Itzstein et al., Nature, 363:418 - 423, 1993
O=C1N(C)C(=O)N(C)C(=C12)N=CN2C
CN1C(=O)N(C)C(=O)c(c12)n(C)cn2
The Many Faces of Caffeine
Cover art for “Chemoinformatics in Drug Discovery”, Wiley-VCH 2005
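The two SMILES strings above look different but are meant to describe the same molecule, caffeine. A quick sanity check is to confirm that both contain the same heavy atoms. The sketch below uses a deliberately small regular expression that only understands unbracketed atoms of the SMILES organic subset; the helper name is hypothetical. Equal atom counts are necessary but not sufficient for identity, since isomers also match, so a real comparison would canonicalize both strings with a cheminformatics toolkit.

```python
import re
from collections import Counter

# Matches unbracketed atoms of the SMILES "organic subset".
# Two-letter symbols come first so "Cl" is not read as C followed by l.
ATOM = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def heavy_atom_counts(smiles):
    """Count heavy atoms per element, folding aromatic (lowercase) atoms
    into their element. Bracket atoms such as [NH4+] are NOT handled."""
    return Counter(sym.capitalize() for sym in ATOM.findall(smiles))


caffeine_a = "O=C1N(C)C(=O)N(C)C(=C12)N=CN2C"
caffeine_b = "CN1C(=O)N(C)C(=O)c(c12)n(C)cn2"

print(heavy_atom_counts(caffeine_a))
print(heavy_atom_counts(caffeine_b))
print(heavy_atom_counts(caffeine_a) == heavy_atom_counts(caffeine_b))
```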
Modern Technologies in Preclinical Discovery
[Stage bar: Target Identif. | Hit Identif. | Lead Identif. | Lead optim. | Clinical Candidate]
Combinatorial & Medicinal Chemistry
Toxicogenomics
[Structure: scaffold with substituents R1, R2, R3", R4"]
GPR30 – A Novel Estrogen Target
• Prossnitz et al. @ UNM identified a fully functional intracellular GPCR (bound to the endoplasmic reticulum)
• Binds Estradiol
• Tamoxifen is an agonist – may explain cancer relapses to Tamoxifen therapy
[Figure: 17βE2-Alexa 546]
Revankar CM et al., Science 2005, 307:1625
[Figure: chemical structures]
Discovery of a Potent GPR30 Agonist
Query: Estradiol
• 40% 2D Similarity (MDL, Daylight)
• 40% ROCS (3D Similarity from OpenEye)
• 20% ALMOND (pharmacophore fingerprint similarity from Molecular Discovery)
Applied to 10,000 molecules; of these, 100 were tested.
One got lucky…
Bologa C & Revankar C et al., Nature Chem. Biol. 2006, 2: 207-212
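The 40/40/20 weighting above can be read as a weighted consensus score used to rank the library before picking compounds for testing. The sketch below takes only the three weights from the slide. It assumes each method yields a similarity in [0, 1] that is combined linearly; the slide does not say whether raw scores or ranks were combined, and the molecule ids and values are hypothetical.

```python
# Weights taken from the slide; everything else here is illustrative.
WEIGHTS = {"sim2d": 0.40, "rocs": 0.40, "almond": 0.20}


def consensus_score(scores, weights=WEIGHTS):
    """Weighted sum of per-method similarities, each assumed to lie in [0, 1]."""
    return sum(weights[name] * scores[name] for name in weights)


def top_candidates(library, n):
    """Return the n molecule ids with the highest consensus score."""
    ranked = sorted(
        library,
        key=lambda mol_id: consensus_score(library[mol_id]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical similarity values against the estradiol query.
library = {
    "mol_A": {"sim2d": 0.82, "rocs": 0.75, "almond": 0.60},
    "mol_B": {"sim2d": 0.35, "rocs": 0.90, "almond": 0.80},
    "mol_C": {"sim2d": 0.20, "rocs": 0.30, "almond": 0.95},
}
print(top_candidates(library, n=2))
```

With a 10,000-molecule library, `top_candidates(library, n=100)` would correspond to the 100 compounds selected for testing.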
High Throughput Cheminformatics
[Figure: example structures grouped under three headings]
• Computational Chemistry - (Q)SAR
• Combinatorial Chemistry
• Chemical Space Navigation
Chemical Information + Biological Information + Structural Information
T.I. Oprea, Curr Opin Chem Biol, 2002, 6, 384-389
Needles in the Haystack…
Mostly Haystack, But Some Needles
[Figure: actives per target, for targets T1 through T53]
Density of Actives in HTS Space
[Figure: actives per target, for targets T1 through T53]
Spotting Frequent Hitters
[Structure: example frequent hitter containing 2 x -N=N- and 2 x -SO3H]
MW = 817
ClogP = 5.7
PSA = 269
Lipinski score = 3
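Assuming the "Lipinski score" above counts Rule of Five violations, the quoted properties already account for two of the three: molecular weight above 500 and ClogP above 5. A minimal sketch of the count follows. The hydrogen-bond donor and acceptor counts are not given on the slide, so the call leaves them at zero and reports only the violations that follow from the stated values.

```python
def rule_of_five_violations(mw, logp, h_donors, h_acceptors):
    """Count violated Rule of Five criteria:
    MW > 500, logP > 5, H-bond donors > 5, H-bond acceptors > 10."""
    checks = [mw > 500, logp > 5, h_donors > 5, h_acceptors > 10]
    return sum(checks)


# MW and ClogP come from the slide. Donor and acceptor counts are not
# given there, so they are set to 0 here and contribute no violations.
print(rule_of_five_violations(mw=817, logp=5.7, h_donors=0, h_acceptors=0))
```

The third violation implied by the slide would then have to come from the donor or acceptor count.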
Modern Technologies in Preclinical Discovery
[Stage bar: Target Identif. | Hit Identif. | Lead Identif. | Lead optim. | Clinical Candidate]
Transgenic animals
• Clinically relevant disease models
• Study metabolism & toxicology in “human”-ized conditions
Structure-based Drug Design
• Zanamivir (inhaled; Relenza®), first anti-influenza drug (1999)
• Oseltamivir (orally available prodrug, Tamiflu®) launched in 2001
[Figure: chemical structure]
Neuraminidase Inhibitors for Influenza
• X-ray structure guided rational design
• GRID-suggested replacing -OH with basic functionality
• Physical properties not amenable for oral delivery
• GSK markets this as Relenza
• First drug for influenza
[Structures: lead molecule (IC50 = 8600 nM) and Zanamivir (IC50 = 5 nM)]
Slide modified from Andy Davis (AstraZeneca R&D Charnwood)
M von Itzstein et al., Nature, 363:418 - 423, 1993
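Potency gains like the one on this slide are easier to compare on the logarithmic pIC50 scale, which is also the scale used in the QSAR plots later in the talk. The sketch below converts the two quoted IC50 values, reading 8600 nM as the lead molecule and 5 nM as zanamivir. That pairing is inferred from the slide layout and the function name is hypothetical.

```python
import math


def pic50_from_nM(ic50_nM):
    """pIC50 = -log10(IC50 in mol/L); the input is in nanomolar."""
    return -math.log10(ic50_nM * 1e-9)


lead_nM, zanamivir_nM = 8600, 5  # IC50 values quoted on the slide

print(round(pic50_from_nM(lead_nM), 2))
print(round(pic50_from_nM(zanamivir_nM), 2))
print(lead_nM / zanamivir_nM)  # fold improvement in potency
```

Replacing the hydroxyl with a basic group therefore corresponds to a gain of a little over three log units.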
Gilead Neuraminidase Inhibitors
• Zwitterion not amenable for oral delivery
• Ethyl ester (oseltamivir) good oral absorption, duration
• Marketed as Tamiflu, first oral drug for influenza
[Structures: IC50 = 1 nM; IC50 = 150 nM; residue label Glu 276]
J. Am. Chem. Soc. 1997, 119, 681; J. Med. Chem. 1998, 41, 2451
Slide modified from Andy Davis (AstraZeneca R&D Charnwood)
Inhalation to Overcome Low Bioavailability
…Relenza vs Tamiflu
• Both are potent neuraminidase inhibitors
• Relenza: Zanamivir is delivered via Diskhaler®
• Tamiflu: simple tablet formulation
– De-esterified in plasma, with long plasma T½
• Tamiflu (marketed by Roche)
– took 65% U.S. market-share from Relenza in 7 weeks
• Q1/Q2 2002 sales, Relenza vs Tamiflu
– Relenza market share fallen to 10%
– GSK quoted reason: “Slowness of the US to adopt inhalation therapies”
Slide modified from Andy Davis (AstraZeneca R&D Charnwood)
Ligand-Based Virtual Screening
Binding mode of two agonists (red) and two antagonists (blue)
B. Edwards et al., Mol Pharmacol 2005 68:1301-1310
Target-Based Virtual Screening
• Started with NMR-solved bound of known peptide inhibitor; applied to existing X-ray structure (docking)
• Used 2D/3D searches in CDL library, using peptide info
Work by Richard Larson, Cristian Bologa, Tudor Oprea (UNM)
High Throughput Drug Discovery
[Figure: workflow]
Macromolecules; Genes; Chemical Diversity
Genomics; CombiChem
High Throughput Screening (HTS)
Identify “actives”; Optimize
(Q)SAR
Animal Testing
Pharmacokinetics, pharmacodynamics, toxicology
Clinical Testing
Informatics in Drug Discovery
[Figure: diagram linking Disease; Gene(s); Target(s); Screening; Pathway Analysis; Molecular Assemblies; Lead(s); ADME/Tox Optimization; Candidate Drug(s); Clinical Trials; Optimal Delivery; Optimal Dose; Improved Therapy; Patient(s)]
Disciplines: Bioinformatics; Cheminformatics; Medical Informatics
Research Sites are Often Worlds Apart
• For example, AstraZeneca preclinical discovery:
• 4500 people working in 11 sites across 4 continents
• Truly global organization
• Emphasis on local, existing strengths, difficult to re-balance
• Internal and external collaborations need to be maximized
[Map labels: Södertälje; Lund; Mölndal; Alderley; Reims; Charnwood; Wilmington; Boston; Montreal; Bangalore, India; Brisbane, Australia]
The Task: Drug Hunting
• Large pharma houses cover mostly major disease areas
– Gastrointestinal (Gastro Esophageal Reflux / Irritable Bowel Syndrome)
– Cancer (Solid tumours)
– Respiratory (Asthma/Chronic Obstructive Pulmonary Disease)
– Inflammation (Autoimmune)
– Cardiovascular (Thrombosis)
– Infection (Genomics targets)
– CNS (Neurology)
– Pain (Analgesia)
• Focused on civilized (first) world health problems
• Not here to cure mankind, but to make a profit
• If projected sales below $1.3 billion, will not launch
• Often need to bow to market pressure
• “Me too” has been the most effective strategy so far.
Traditional Drug Discovery
[Figure: two Potency vs. ADME plots]
T. I. Oprea. Molecules, 7:51-62, 2002
The Traditional Approach: Analysis
• Historically, medicinal chemistry efforts started from a lead structure with (typically) poor ADME properties and micromolar affinity.
• Initial synthetic efforts were aimed at improving the binding affinity.
• Once medium- to low-nanomolar affinity was achieved, ADME properties were optimized.
• A similar strategy is being pursued in virtual screening, whereby ADME filters are applied post-docking, in an effort to trim down the large number of “virtual hits”.
T. I. Oprea. Molecules, 7:51-62, 2002
The Problem with the Traditional Approach
• One has to preserve the molecular determinants responsible for potency, while modifying the structure in order to achieve good ADME properties.
• This often results in reduced binding affinity, and the process may require several iterations before optimal structures are found
• To a large extent, in vivo screening for good ADME properties can be introduced in order to avoid radical drops in affinity. This strategy is currently used in many pharmaceutical companies.
• However, the in vivo ADME screening strategy cannot cope with increasingly large numbers of compounds.
T. I. Oprea. Molecules, 7:51-62, 2002
A Solution From the Literature
[Figure: two Potency vs. ADME plots]
Apply ADME filters during post-HTS analysis or virtual screening: PSA, LogD7.4, Solubility, Rule of Five, etc.
T. I. Oprea. Molecules, 7:51-62, 2002
The Literature Approach: Analysis
• This approach yields good ADME properties without considering the receptor affinity.
• It is based on the expectation that, out of thousands (or more) compounds, some will display good affinity.
• This approach is a more advanced implementation of the “computational alert” procedure of Chris Lipinski.
• One has to preserve the molecular determinants responsible for DMPK properties, while modifying the structure in order to achieve good potency.
• This may often result in reduced ADME properties, but the process of finding optimal structures is less time-consuming.
T. I. Oprea. Molecules, 7:51-62, 2002
The Problem with the Literature Approach:
• To a large extent, this strategy rules out potential problems with passive permeability, but is unlikely to capture active mechanisms unless more advanced filters are introduced. This approach is very popular in several pharmaceutical companies.
• However, the ADME-filtering strategy cannot cope with the situation where NO HITS are found, because it decreases the probability of detecting hits outside the predefined ADME properties space.
• There is a clear need to design an approach that would allow the simultaneous optimization of both receptor binding affinity and ADME properties.
T. I. Oprea. Molecules, 7:51-62, 2002
The “Convergent” Approach
[Figure: two Potency vs. ADME plots, with regions labeled “Ignore” and “Investigate in more detail”]
T. I. Oprea. Molecules, 7:51-62, 2002
The “Convergent” Approach: Analysis
• This strategy requires simultaneous optimisation of both binding affinity and ADME properties. An integrated software framework to address this problem is required.
• The real difficulty consists in addressing relevant properties in a consistent manner. For example: PSA, hydrophobicity and H-bonds are important in both binding and permeability - but the question is, in what proportion are they contributing in each case?
• In the “convergent” strategy, one would simultaneously monitor changes that would influence binding affinity and ADME properties. This may result in progressive increments in affinity and ADME properties, making this process less time-consuming.
T. I. Oprea. Molecules, 7:51-62, 2002
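The slides do not specify an algorithm for the simultaneous monitoring described above. One common way to treat two objectives at once, shown here purely as an illustration, is to keep the compounds that are not dominated on both axes (a Pareto front) and investigate those in more detail. The compound ids, the pIC50 values and the 0-1 ADME desirability score are all hypothetical.

```python
def dominates(a, b):
    """True if compound a is at least as good as b on both axes and
    strictly better on at least one. Higher is better for both scores."""
    return (
        a["potency"] >= b["potency"]
        and a["adme"] >= b["adme"]
        and (a["potency"] > b["potency"] or a["adme"] > b["adme"])
    )


def pareto_front(compounds):
    """Compounds that no other compound dominates."""
    return [
        c for c in compounds
        if not any(dominates(other, c) for other in compounds)
    ]


# Hypothetical compounds: potency as pIC50, ADME as a 0-1 desirability score.
compounds = [
    {"id": "cpd1", "potency": 8.5, "adme": 0.2},
    {"id": "cpd2", "potency": 7.0, "adme": 0.7},
    {"id": "cpd3", "potency": 6.5, "adme": 0.6},
    {"id": "cpd4", "potency": 5.5, "adme": 0.9},
]
print([c["id"] for c in pareto_front(compounds)])
```

Here `cpd3` drops out because `cpd2` is better on both axes. The remaining compounds represent different trade-offs between potency and ADME rather than a single winner.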
The Problem with the “Convergent” Approach:
• An integrated software framework that allows interactive user-input should allow a rapid convergence towards interesting compounds.
• This strategy relies on the accuracy of the experimental data used to derive the QSAR models to predict affinity and ADME (the old problem… how good is your model?).
• However, this framework cannot substitute creative thinking, serendipity and good research!
• This approach is limited by the unexpected (novel targets, novel chemistry) and by inherent errors (20-30%).
T. I. Oprea. Molecules, 7:51-62, 2002
Reliability of Biological Data (1)
[Figure: CoMFA Model (External prediction); Predicted pIC50 vs. Actual pIC50, both axes from -2 to 3]
HIV-1 Protease Inhibitors: N = 36; Q2 = 0.814; KM = 0.064 mM
Oprea et al., in Ghose & Viswanadhan, Dekker 2001, 233-266
Reliability of Biological Data (2)
[Figure: CoMFA Model (External prediction); Predicted pIC50 vs. Actual pIC50, both axes from -1 to 3]
HIV-1 Protease Inhibitors: N = 36; Q2 = 0.487; KM = 2.0 mM
Oprea et al., in Ghose & Viswanadhan, Dekker 2001, 233-266
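The two slides above report Q2 = 0.814 and Q2 = 0.487 for models built on the same 36 inhibitors. Assuming Q2 is the usual predictive statistic, 1 minus the ratio of squared prediction errors to the variance of the measured values, it can be computed as sketched below. The pIC50 numbers are made up for illustration and are not data from the cited study.

```python
def q2(actual, predicted, reference_mean=None):
    """Predictive q2 = 1 - PRESS / SS.

    PRESS is the sum of squared prediction errors; SS is the sum of squared
    deviations of the actual values from a reference mean. By default the
    mean of `actual` is used; for an external test set the training-set
    mean is often passed instead.
    """
    if reference_mean is None:
        reference_mean = sum(actual) / len(actual)
    press = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss = sum((y - reference_mean) ** 2 for y in actual)
    return 1.0 - press / ss


# Hypothetical pIC50 values, not data from the cited study.
actual = [-1.0, 0.0, 1.0, 2.0]
predicted = [-0.5, 0.0, 1.0, 1.5]
print(q2(actual, predicted))
```

Because PRESS is built from the measured values, noise in the biological data lowers the attainable Q2 even when the modeling procedure is unchanged.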
Reliability of Biological Data (3)
%HIA Model (Training set)
Human Intestinal Absorption:
1: Sulfasalazine (%HIA is 12, not 65)
2: Sulfapyridine (%HIA is 93)
3: 5-aminosalicylic acid
Bacterial azo bond reduction occurs in the intestine
[Figure: structures of compounds 1, 2 and 3; 1 is cleaved into 2 + 3]
Oprea & Gottfries, J. Mol. Graphics Mod. 1999, 17, 261-274
Reliability of Biological Data (4)
Oral Drug Absorption: N = 16; Q2 = 0.44 (%HIA); Q2 = 0.71 (Caco)
%HIA Model (Training set)
[Figure: score plot with axes t[1] and u[1]; points labeled Acetyls, Alpreno, Atenolo, Cortico, Dexamet, Felodip, Hydroco, Mannito, Metopro, Olsalaz, Practol, Propran, Salicyl, Sulfasa, Testost, Warfari; annotations: “Azo bond reduction in the large intestine”, “Paracellular Transport”]
This sigmoidal relationship is frequently described in the literature
…Any model left?!
Oprea & Gottfries, J. Mol. Graphics Mod. 1999, 17, 261-274
Reliability of Biological Data (5)
There is poor agreement in terms of clearance data: over 42% of the compounds differ by more than 30%
[Figure: bar chart, % of compared compounds vs. difference between sources]
Difference between sources | % of compared compounds
>10 % | 71.3
>30 % | 42.6
>100 % | 13.9
>1000 % | 3.0
Sources: Goodman and Gilman 1996 vs. Avery’s 1997. Data: Clearance (202 drugs)
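A comparison like the one above can be reproduced for any pair of data sources by computing a relative difference for each drug and counting how many exceed each threshold. The slide does not say how the percent difference was defined. The sketch below assumes it is taken relative to the smaller of the two values, and the clearance numbers are hypothetical.

```python
def percent_difference(a, b):
    """Relative difference in percent, taken against the smaller value.
    This definition is an assumption; the slide does not state one.
    Both values must be positive."""
    low, high = sorted((a, b))
    return (high - low) / low * 100.0


def share_exceeding(pairs, thresholds=(10, 30, 100, 1000)):
    """Percentage of value pairs whose difference exceeds each threshold."""
    diffs = [percent_difference(a, b) for a, b in pairs]
    return {
        t: 100.0 * sum(d > t for d in diffs) / len(diffs)
        for t in thresholds
    }


# Hypothetical clearance values (source 1, source 2) for five drugs.
pairs = [(10.0, 10.5), (4.0, 5.0), (1.0, 1.5), (2.0, 5.0), (0.5, 6.0)]
print(share_exceeding(pairs))
```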
Reliability of Chemical Data (1)
[The original table also showed the published and the corrected structure drawing for each entry]
Reference | Compound | Comment
JMC 37-476 chart 1 | rolipram | incorrect N atom position
JMC 43-2217 chart 1 | A-85380 | incorrect ring size
JMC 43-2217 chart 1 & JMC 36-2645 | tropisetron | extra methyl group
JMC 43-2217 chart 1 | DAU-6285 | missing methoxy; N instead of O
JMC 37-758 chart 1 | Ro-15-4513 | methyl group missing
JMC 37-787 figure 1 | epalrestat | E/Z configuration: E instead of Z
Reliability of Chemical Data (2)
[Figure: "Carisoprodol" as drawn in Merck Index 13th ed. #1854, next to the correct carisoprodol structure]
Disclaimer: The above error has been corrected in the Merck Index 14th edition. In general, the Merck Index is a reliable source of information.
Reliability of Chemical Data (3)
• Chirality: what chemists can interpret, computers are not always able to (the “above/below the plane” convention must be strictly enforced)
[Figure: the same structure drawn in a not machine-readable and in a machine-readable form]
• Missing/altered atoms/substituents – overall error rate above 9%
– Incorrectly drawn or written structures (3.4%); incorrect molecular formula or molecular weight (3.4%);
– Unspecified binding position for substituents or ambiguous numbering scheme for the heterocyclic backbone (0.91%);
– Structures with the incorrect backbone (0.71%);
– Incorrect generic names or chemical names (0.24%);
– Incorrect biological activity (0.34%);
– Incorrect references (0.2%).
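The individual error categories listed on this slide can be summed to check the quoted overall rate. The sketch below assumes the categories do not overlap, which the slide does not state. The 10,000-entry database is a hypothetical example.

```python
# Error categories and rates (%) as listed on the slide.
error_rates = {
    "incorrectly drawn or written structures": 3.4,
    "incorrect molecular formula or weight": 3.4,
    "unspecified substituent position / ambiguous numbering": 0.91,
    "incorrect backbone": 0.71,
    "incorrect generic or chemical names": 0.24,
    "incorrect biological activity": 0.34,
    "incorrect references": 0.2,
}

total = sum(error_rates.values())
print(f"total error rate: {total:.2f}%")

# Expected number of affected records in a hypothetical 10,000-entry
# database, assuming the categories do not overlap.
print(round(total / 100 * 10_000))
```

The sum comes to 9.2%, consistent with the "above 9%" figure.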
Reliability of Structural Data
• Photoactive yellow protein from E. halophila
– First structure, 1PHY, was wrong
– Subsequently corrected at higher resolution
1PHY: 1989, 2.4 Å
2PHY: 1995, 1.4 Å
A Davis, S Teague, G Kleywegt, Angew. Chem. Int. Ed. 2003
Reliability of Structural Data (2)
“Where there is no chicken wire, there’s no electrons..atoms”
• 1FQH now withdrawn from PDB
A Davis, S Teague, G Kleywegt, Angew. Chem. Int. Ed. 2003
Reliability of Structural Data (3)
• NH2 & O can’t be distinguished from density, as they are isoelectronic (e.g. glutamine, asparagine)
• PDBREPORT suggests 15% in the Protein Data Bank are likely incorrect
• N/C cannot normally be distinguished from density (e.g. histidine)
[Figure: alternative side-chain orientations]
Where do I work?
The Importance of Accurate Information
• With one source, we have information
• With two sources, we can have confirmation or confusion
• With several sources, knowledge emerges
• Accurate information is important – hence we tend to “trust” certain newspapers, TV stations or scientific journals (the peer-review system regulates that).
• If something is really important to you (*) then consult multiple sources and verify that your assumptions are correct
• (*) e.g., who is on my PhD committee; what’s my girlfriend’s birthday; what are the side-effects for the medicine my parents are taking; is this synthesis route correct? etc.
The Attrition Rate in Drug Discovery
Stage | Number
HTS | 1,000,000
HTS Hits | 2,000
HTS Actives | 1,200
Lead Series | 50-200
Drug Candidates | 1
Drug | 0.1
[Figure arrows: Increased knowledge and value; Increased risk of failure; Increased experimental error rate]
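The funnel above can be turned into stage-to-stage survival fractions, which makes the steepest drops explicit. The sketch below pairs the six stage labels with the six numbers in the order they appear on the slide. The 50-200 range for lead series is represented by its midpoint, an assumption made only to obtain a single number.

```python
# Stage counts as read from the slide; the 50-200 range for lead series is
# represented by its midpoint, which is an assumption made for illustration.
funnel = [
    ("HTS", 1_000_000),
    ("HTS Hits", 2_000),
    ("HTS Actives", 1_200),
    ("Lead Series", 125),
    ("Drug Candidates", 1),
    ("Drug", 0.1),
]


def stage_yields(stages):
    """Fraction surviving each transition between consecutive stages."""
    return [
        (f"{a} -> {b}", n_b / n_a)
        for (a, n_a), (b, n_b) in zip(stages, stages[1:])
    ]


for step, fraction in stage_yields(funnel):
    print(f"{step}: {fraction:.4%}")

overall = funnel[-1][1] / funnel[0][1]
print(f"overall: 1 drug per {1 / overall:,.0f} compounds screened")
```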
The Optimization Response in Drug Discovery
[Figure: “Fuji-Yama” Landscape, a multiple response surface of druggability (axis from 0 to 0.9) over chemical space, with regions labeled “Hits”, “Leads” and “Drugs”]
Considering a Career in Molecular Sciences
• Science is not a democracy: just because everyone believes something does not make it true
• Choose topics that open new possibilities, not those that (may) lead to dead-ends; don’t get stuck with a single technology
• Always go to the source (original publication), as information is sometimes selectively presented
• Make sure you understand the basics – try to explain what you do to a 5-yr old. If you manage, you have grasped the concepts
• Stay away from “fashionable” science: just because it might get you funded, it does not mean it’s science
• Make sure to give credit where credit is due…
• …but do not be afraid to claim what’s yours (protect your ideas)
Some Take Home Messages
• Nothing is what it seems: verify what you see, doubt what you find, and always get independent confirmation of your observations. For this reason, once you are sure about your findings you can be ready to defend your results (e.g., GPR30 still very contested by others)
• Don’t be afraid to say I DO NOT KNOW, omniscient beings are not of this world (think Buddha, Jesus, and other enlightened beings)
• Although you do not know, be ready to learn
• Focus on problem-solving skills, they are more important than static learning & memory
• Always find ways to reward creativity and out-of-the-box thinking
• People are 100x more important than equipment – as you progress in your career, you will find that people are the most important asset
• If someone steals your ideas (this happens all the time) remember that this is a subtle form of flattery (so is envy). Focus on generating new ideas, and do not turn the stolen-idea situation into an obsession (this will block your creativity)
• Learn where your fear comes from: deal with it inside, and do not take it out on other people. Fear leads to anger, anger leads to violence
• Express yourself freely and creatively. The Universe is a friendly place.
Garland Marshall’s Definition of Luck