Characterization of Pharmacophore Multiplet Fingerprints as Molecular Descriptors
Robert D. Clark, Tripos, Inc.
© 2004 Tripos, Inc.
Outline
• Background
  o history
  o mechanics
• Finding appropriate binning ranges
  o biased conformer generation
• Similarity measures
  o stochastic similarity
• Hypothesis generation
  o asymmetric similarity
• Conclusions
History of Pharmacophore Multiplets
• A.C. Good and I.D. Kuntz; J. Comput.-Aided Mol. Design 1995, 9, 373-379.
• X. Chen, A. Rusinko and S.S. Young; J. Chem. Inf. Comput. Sci. 1998, 38, 1054-1062.
• J.S. Mason, I. Morize, P.R. Menard, D.L. Cheney, C. Hulme and R.F. Labaudiniere; J. Med. Chem. 1999, 42, 3251-3264.
• M.J. McGregor and S.M. Muskal; J. Chem. Inf. Comput. Sci. 1999, 39, 569-574.
• H. Matter and T. Pötter; J. Chem. Inf. Comput. Sci. 1999, 39, 1211-1225.
• J.S. Mason and B.R. Beno; J. Mol. Graphics Mod. 2000, 18, 438-451.
• E. Abrahamian, P.C. Fox, L. Nærum, I.T. Christensen, H. Thøgersen and R.D. Clark; J. Chem. Inf. Comput. Sci. 2003, 43, 458-468.
Novo Nordisk / Tripos Tuplets Collaboration
• 2-year collaboration to develop and extend existing SYBYL triplet (PDT) technology
• Incorporate pair, triplet and quartet (‘Tuplet) technology
• Augmented ‘Tuplets and support for privileged substructures
• Conformers generated on-the-fly or retrieved
• Bitmaps created, stored and manipulated in compressed format
  o four 1.8 × 10^9-bit bitmaps stored as a ~80 kB file
  o 0.01-0.5 seconds/molecule
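Uncompressed, a single 1.8 × 10^9-bit bitmap would occupy about 225 MB, so an ~80 kB file for four of them implies that very few bits are set. The slide does not describe the compressed format. As a minimal sketch, assuming a sparse bitmap is stored as the sorted positions of its set bits, delta-encoded as variable-length integers (a stand-in technique, not necessarily the SYBYL format; `compress` and `decompress` are hypothetical names):

```python
from typing import Iterable, List


def compress(set_bits: Iterable[int]) -> bytes:
    """Store a sparse bitmap as varint-encoded gaps between set-bit positions."""
    out = bytearray()
    previous = -1
    for pos in sorted(set(set_bits)):
        gap = pos - previous  # >= 1 because positions are distinct and sorted
        previous = pos
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)  # low 7 bits + "more bytes" flag
            gap >>= 7
        out.append(gap)  # last byte of this gap, flag clear
    return bytes(out)


def decompress(data: bytes) -> List[int]:
    """Invert compress(): rebuild the sorted list of set-bit positions."""
    positions: List[int] = []
    previous, gap, shift = -1, 0, 0
    for byte in data:
        gap |= (byte & 0x7F) << shift
        if byte & 0x80:  # more bytes follow for this gap
            shift += 7
            continue
        previous += gap
        positions.append(previous)
        gap, shift = 0, 0
    return positions


# Four set bits in a notional 1.8e9-bit bitmap need only a few bytes.
bits = [5, 17, 1_000_000, 1_799_999_999]
blob = compress(bits)
print(len(blob), decompress(blob) == bits)
```

Storing gaps rather than absolute positions keeps most entries small, so each set bit typically costs one to a few bytes regardless of the nominal bitmap length.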
Type III antiarrhythmic: UK 66914
[Figure: structure of UK 66914 with its pharmacophoric features labelled: acceptor atoms, donor/acceptor atoms, a donor atom, hydrophobic centers and a positive nitrogen]
Multiplet Fingerprints
… 000010001010000000100100001110100001110000111000000000011001...
Indexing Triplets
[Figure: a triangle of features D, A and H with edge distance bins 2, 3 and 5; the vertex joining the longest and shortest edges is marked]
Bin: 5, 3, 2   Triplet: H-A-D
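The slide names an ordering cue (the vertex joining the longest and shortest edges) but not the full procedure. A simple alternative that also yields a unique key, shown here as a hypothetical sketch rather than the SYBYL rule, is to try every vertex ordering and keep the lexicographically largest (edge bins, features) tuple. The function `canonical_triplet` and its argument layout are illustrative assumptions.

```python
from itertools import permutations
from typing import Dict, FrozenSet, Sequence, Tuple

TripletKey = Tuple[Tuple[int, int, int], Tuple[str, str, str]]


def canonical_triplet(features: Sequence[str],
                      bins: Dict[FrozenSet[int], int]) -> TripletKey:
    """Order-independent key for a triangle of pharmacophoric features.

    features[i] is the feature type at vertex i (e.g. "D", "A", "H");
    bins[frozenset({i, j})] is the distance bin of the edge i-j.
    All 3! = 6 vertex orderings are tried and the largest key is kept.
    """
    keys = []
    for a, b, c in permutations(range(3)):
        # Walk the triangle a -> b -> c -> a, collecting edge bins in order.
        edge_bins = (bins[frozenset((a, b))],
                     bins[frozenset((b, c))],
                     bins[frozenset((c, a))])
        keys.append((edge_bins, (features[a], features[b], features[c])))
    return max(keys)


# D-A edge in bin 2, A-H in bin 5, D-H in bin 3; listing the vertices in a
# different order gives the same key.
k1 = canonical_triplet("DAH", {frozenset((0, 1)): 2, frozenset((1, 2)): 5,
                               frozenset((0, 2)): 3})
k2 = canonical_triplet("HDA", {frozenset((1, 2)): 2, frozenset((0, 2)): 5,
                               frozenset((0, 1)): 3})
print(k1, k1 == k2)
```

Brute force is cheap for triplets (6 orderings). For quartets there are 4! = 24 orderings, and because reflection preserves every edge length, distances alone cannot separate mirror-image tetrahedra; that is the chirality problem raised under Indexing Tetrahedra.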
Indexing Tetrahedra
Problems:
• Need a unique mapping
• Must deal with chirality
• Literally dozens of possible permutations
• Mapping must be based on bins and features
[Figure: example tetrahedra with vertices labelled A, B, C and D and edge bins 2, 3 and 4. One example is annotated "Plane of symmetry implies no chirality"; others are labelled "Chiral tetrahedra".]
Mapping Quartet Bits
Mapping for 7 bins and 3 features (D, A, H)
Edge-bin labels: ... 542333* ... 666665, 666666
Feature labels: ... DDDD, DDDA, DDDH, ..., HHHH
Bits: ...000001000000
Bitmap Size = 7^6 × 3^4 = 9,529,569 bits
*542333 specifies the + enantiomer; 245333 specifies the - enantiomer.
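Because mirror-image tetrahedra share all six edge lengths, handedness must come from the 3D coordinates. One standard way to recover it, assumed here rather than taken from the talk, is the sign of the signed volume (scalar triple product) of the four feature positions taken in canonical order; a mirror image flips the sign, which could then select the + or - enantiomer bit. `signed_volume` and `handedness` are hypothetical helper names.

```python
from typing import Sequence

Point = Sequence[float]


def signed_volume(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Six times the signed volume of tetrahedron p0-p1-p2-p3.

    Computed as the scalar triple product (p1-p0) . ((p2-p0) x (p3-p0)).
    """
    u = [p1[i] - p0[i] for i in range(3)]
    v = [p2[i] - p0[i] for i in range(3)]
    w = [p3[i] - p0[i] for i in range(3)]
    cross = (v[1] * w[2] - v[2] * w[1],
             v[2] * w[0] - v[0] * w[2],
             v[0] * w[1] - v[1] * w[0])
    return u[0] * cross[0] + u[1] * cross[1] + u[2] * cross[2]


def handedness(points: Sequence[Point], tol: float = 1e-9) -> str:
    """'+' or '-' for four ordered points; '0' if they are (nearly) coplanar."""
    vol = signed_volume(*points)
    if vol > tol:
        return "+"
    if vol < -tol:
        return "-"
    return "0"


tetra = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
mirror = [(x, y, -z) for x, y, z in tetra]  # reflect through the xy-plane
print(handedness(tetra), handedness(mirror))
```

Whether the sign matters depends on the labelled tetrahedron: if its features and bins admit a plane of symmetry, both orderings describe the same multiplet and one bit suffices.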
Distribution of Distances Between Features
[Figure: three histograms of frequency vs. edge length (Å, 0 to 15) for beta blockers, K+ channel openers and Type I antiarrhythmics]
Cumulative Distributions across Classes
[Figure: two panels, "100 Conformers by Class" and "1 Conformer by Class", plotting cumulative frequency vs. edge length (Å, 0 to 15) for estrogen antagonists, Type III antiarrhythmics, benzamides, phenothiazines, beta blockers, Type I antiarrhythmics and K channel openers]
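The talk does not spell out how bin boundaries were read from these curves. One hypothetical way to derive boundaries from a cumulative distribution is to cut it at equal-frequency quantiles, so each bin holds roughly the same number of inter-feature distances. The sketch below illustrates only that assumed technique; the sample is synthetic and the function names are illustrative.

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_frequency_edges(distances: Sequence[float], n_bins: int) -> List[float]:
    """Boundaries that split the sorted sample into n_bins equal-count groups."""
    ordered = sorted(distances)
    n = len(ordered)
    if n_bins < 1 or n < n_bins:
        raise ValueError("need at least one distance per bin")
    # The k-th boundary is the first value of the (k+1)-th group.
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]


def bin_counts(distances: Sequence[float], edges: Sequence[float]) -> List[int]:
    """Count distances per bin; bin i covers [edges[i-1], edges[i])."""
    counts = [0] * (len(edges) + 1)
    for d in distances:
        counts[bisect_right(edges, d)] += 1
    return counts


sample = [0.125 * i for i in range(1, 101)]  # synthetic edge lengths, 0.125-12.5 Å
edges = equal_frequency_edges(sample, 4)
print(edges, bin_counts(sample, edges))
```

With a uniform synthetic sample the boundaries come out evenly spaced; real distance distributions are peaked, so equal-frequency boundaries crowd together where distances are common.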
Effect of Biased Conformer Generation
[Figure: two panels, "100 CONFORT Conformers by Class" and "100 Systematic Search Conformers by Class", plotting cumulative frequency vs. edge length (Å, 0 to 15) for estrogen antagonists, Type III antiarrhythmics, benzamides, phenothiazines, beta blockers, Type I antiarrhythmics and K channel openers]
Hypothesis Fingerprint Creation
Binary compound fingerprints and their vector sum fingerprint:

Bit                DDD000  DDD001  DDA200  DAA210  DDH210  DAH331  DHH333  HHH433
Compound 1            0       1       0       0       0       1       1       0
Compound 2            0       1       0       1       0       0       1       0
Compound 3            1       0       0       1       0       1       1       1
Compound 4            0       1       1       1       0       0       1       0
Vector sum            1       3       1       3       0       2       4       1
Hypothesis Fingerprint Creation
Binary compound fingerprints, vector sum fingerprint, feature weights, bin weights and bit scores:

Bit                DDD111  DDD211  DDA311  DAA321  DDH321  DAH442  DHH444  HHH544
Compound 1            0       1       0       0       0       1       1       0
Compound 2            0       1       0       1       0       0       1       0
Compound 3            1       0       0       1       0       1       1       1
Compound 4            0       1       1       1       0       0       1       0
Vector sum            1       3       1       3       0       2       4       1
Feature weights       3       3       3       3       4       4       5       6
Bin weights           3       4       5       6       6      10      12      13
Bit score             9      36      15      54      24      80     240      78
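Weighting Bits for Hypothesis Generation
$$S_b = f_b \times \sum_{i=1}^{n_f} fw_i \times \sum_{j=1}^{n_d} dw_j$$
where S_b is the score for the bit, f_b is the frequency of the bit, fw_i is the weight of the i-th feature type and dw_j is the weight of the j-th distance bin (n_f features and n_d distances per multiplet).
[Figure: a triplet with features f1, f2, f3 and distances d1, d2, d3]
⇒ Construct a hypothesis from the highest-scoring bits.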
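A minimal sketch of the scoring step, assuming the weights implied by the worked example above: feature weights D = 1, A = 1, H = 2, and each distance bin weighted by its own 1-based bin number. Those weights are inferred from the table, and the helper names are hypothetical.

```python
from typing import Dict, List, Sequence

# Inferred from the worked example; not stated explicitly in the talk.
FEATURE_WEIGHT: Dict[str, int] = {"D": 1, "A": 1, "H": 2}


def bit_score(label: str, frequency: int) -> int:
    """S_b = f_b * (sum of feature weights) * (sum of distance-bin weights).

    label is e.g. "DAH442": three feature letters, then three 1-based bins;
    each bin is assumed to be weighted by its own number.
    """
    features, bins = label[:3], label[3:]
    fw = sum(FEATURE_WEIGHT[f] for f in features)
    dw = sum(int(b) for b in bins)
    return frequency * fw * dw


def vector_sum(fingerprints: Sequence[Sequence[int]]) -> List[int]:
    """Column-wise sum of binary compound fingerprints (bit frequencies)."""
    return [sum(column) for column in zip(*fingerprints)]


def hypothesis(labels: Sequence[str],
               fingerprints: Sequence[Sequence[int]], size: int) -> List[str]:
    """Keep the `size` highest-scoring bits as the hypothesis."""
    freqs = vector_sum(fingerprints)
    scores = {lab: bit_score(lab, f) for lab, f in zip(labels, freqs)}
    return sorted(labels, key=lambda lab: scores[lab], reverse=True)[:size]


labels = ["DDD111", "DDD211", "DDA311", "DAA321",
          "DDH321", "DAH442", "DHH444", "HHH544"]
compounds = [
    [0, 1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 1, 1, 1],
    [0, 1, 1, 1, 0, 0, 1, 0],
]
print(vector_sum(compounds))
print([bit_score(l, f) for l, f in zip(labels, vector_sum(compounds))])
print(hypothesis(labels, compounds, 3))
```

Under these weights the sketch reproduces every bit score in the table except DDH321. There, a frequency of 0 gives 0 rather than the 24 listed; 24 would correspond to a frequency of 1.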
Sanity Checker
$$S = N_t / N_{tn}$$
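Similarity Measures
• Tanimoto coefficient
$$t(A,B) = \frac{|pdt(A) \cap pdt(B)|}{|pdt(A) \cup pdt(B)|}$$
• Cosine coefficient
$$C_c(a,b) = \frac{|pdt(a) \cap pdt(b)|}{\sqrt{|pdt(a)| \times |pdt(b)|}}$$
• Stochastic cosine coefficient
$$s(A,B) = \frac{E[|pdt(A) \cap pdt^*(B)|]}{\sqrt{E[|pdt(A) \cap pdt^*(A)|] \times E[|pdt(B) \cap pdt^*(B)|]}}$$
where pdt(X) is the set of bits in the multiplet fingerprint of X, pdt*(X) is a fingerprint built from an independently generated set of conformers of X, and E[·] is the expectation over conformer sampling.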
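The three measures can be sketched with Python sets of bit positions. The stochastic cosine is estimated here by Monte Carlo: each molecule is represented by two or more replicate fingerprints, each assumed to come from an independently generated conformer set. Cross-molecule overlaps are averaged over all replicate pairs; self-overlaps are averaged only over distinct pairs, so pdt and pdt* are genuinely independent samples. This estimator and all function names are illustrative assumptions, not the Tripos implementation.

```python
from itertools import product
from math import sqrt
from statistics import mean
from typing import Sequence, Set

Fingerprint = Set[int]  # positions of the set bits


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Size of intersection over size of union (non-empty fingerprints)."""
    return len(a & b) / len(a | b)


def cosine(a: Fingerprint, b: Fingerprint) -> float:
    """Size of intersection over sqrt(|A| * |B|) (non-empty fingerprints)."""
    return len(a & b) / sqrt(len(a) * len(b))


def expected_overlap(xs: Sequence[Fingerprint],
                     ys: Sequence[Fingerprint]) -> float:
    """Mean overlap between replicates of two different molecules."""
    return mean(len(x & y) for x, y in product(xs, ys))


def expected_self_overlap(xs: Sequence[Fingerprint]) -> float:
    """Mean overlap between distinct replicates (i != j) of one molecule."""
    return mean(len(xs[i] & xs[j])
                for i in range(len(xs)) for j in range(len(xs)) if i != j)


def stochastic_cosine(a_reps: Sequence[Fingerprint],
                      b_reps: Sequence[Fingerprint]) -> float:
    """Monte Carlo estimate of s(A,B) from >= 2 replicates per molecule."""
    return expected_overlap(a_reps, b_reps) / sqrt(
        expected_self_overlap(a_reps) * expected_self_overlap(b_reps))


a = [{1, 2, 3}, {1, 2, 4}]  # two independent conformer samples of A
b = [{2, 3, 4}, {2, 3, 4}]  # B happens to give the same bits both times
print(tanimoto(a[0], b[0]), cosine(a[0], b[0]), stochastic_cosine(a, b))
```

If every replicate of a molecule is identical, the stochastic cosine reduces to the ordinary cosine. Conformational variability lowers the self-overlap terms in the denominator instead of being treated as dissimilarity.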
Effect of Conformer Count on Stochastic Cosine Similarity
[Figure: similarity (0 to 0.6) vs. maximum conformer count (0 to 1000); class and non-class similarity for estrogen antagonists, K channel openers and benzamides]
Effect of Conformer Count on Stochastic Cosine Discrimination
[Figure: discrimination ratio (0 to 14) vs. maximum conformer count (1 to 1000, log scale) for Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers and estrogen antagonists]
Discrimination and Similarity Measure
[Figure: two panels, simple cosine and Tanimoto, plotting discrimination ratio vs. maximum conformer count (1 to 1000, log scale) for Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers and estrogen antagonists]
Discrimination and Conformer Bias
[Figure: two panels, CONFORT and systematic search, plotting discrimination ratio (0 to 14) vs. maximum conformer count (1 to 1000, log scale) for Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers and estrogen antagonists]
Symmetric Similarity Measures
• Symmetric stochastic cosine
• Asymmetric stochastic cosine
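Symmetric stochastic cosine:
$$s(A,B) = \frac{E[|pdt(A) \cap pdt^*(B)|]}{\sqrt{E[|pdt(A) \cap pdt^*(A)|] \times E[|pdt(B) \cap pdt^*(B)|]}}$$
Asymmetric stochastic cosine (h: hypothesis; t: compound being tested):
$$s(h,t) = \frac{E[|pdt(h) \cap pdt^*(t)|]}{E[|pdt(h) \cap pdt^*(h)|]}$$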
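The asymmetric form normalizes only by the hypothesis self-overlap, so a compound is not penalized for setting bits outside the hypothesis. A minimal Monte Carlo sketch, assuming replicate fingerprints from independent conformer sets as in the symmetric case (a fixed hypothesis can be passed twice), with a hypothetical function name:

```python
from statistics import mean
from typing import Sequence, Set

Fingerprint = Set[int]  # positions of the set bits


def asymmetric_stochastic_cosine(h_reps: Sequence[Fingerprint],
                                 t_reps: Sequence[Fingerprint]) -> float:
    """Estimate s(h,t) = E[|pdt(h) & pdt*(t)|] / E[|pdt(h) & pdt*(h)|].

    Only the hypothesis appears in the denominator, so extra bits set by
    the test compound do not lower the score.
    """
    cross = mean(len(h & t) for h in h_reps for t in t_reps)
    self_h = mean(len(h_reps[i] & h_reps[j])
                  for i in range(len(h_reps))
                  for j in range(len(h_reps)) if i != j)
    return cross / self_h


hyp = [{1, 2, 3, 4}] * 2  # a fixed 4-bit hypothesis
test = [{1, 2, 9}, {1, 2, 3, 4, 5, 6}]  # two conformer samples of a compound
print(asymmetric_stochastic_cosine(hyp, test))
```

For a fixed hypothesis the denominator is just the number of hypothesis bits, so the score is the average fraction of the hypothesis that the compound's fingerprint covers.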
Effect of Hypothesis Size (Type III antiarrhythmics)
[Figure: average similarity (0 to 0.6) vs. bits in hypothesis (0 to 1000), within class and without class; panels labelled CONFORT / systematic search, 100 / 1000 conformers, and asymmetric stochastic cosine / symmetric cosine]
Conclusions
• Compression is cool
• Natural binning does make sense
  o 1.75, 3, 4, 5, 6, 7, 8, 8.75, 9.75, 10.75, 11.75, 13, 15, >15 Å
  o at least for triplets
• Systematic bias increases discrimination
  o rule-based conformational bias can be useful
  o caveat: it may limit lead-hopping
• More is not necessarily better
  o true in terms of conformation count
  o true in terms of multiplet hypothesis size
• A little asymmetry can be a good thing
• Compression is still cool
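Assigning a distance to one of these natural bins is a sorted-boundary lookup. A minimal sketch, assuming the listed values are bin boundaries with half-open bins [lo, hi), one bin below 1.75 Å and an open-ended bin at 15 Å and above (this reading and the names are assumptions):

```python
from bisect import bisect_right
from typing import List

# Boundaries (Å) from the Conclusions slide, read as half-open bins [lo, hi).
NATURAL_EDGES: List[float] = [1.75, 3, 4, 5, 6, 7, 8, 8.75,
                              9.75, 10.75, 11.75, 13, 15]
N_BINS = len(NATURAL_EDGES) + 1  # 14 bins under this reading


def distance_bin(d: float) -> int:
    """0-based bin index of an inter-feature distance in Å."""
    return bisect_right(NATURAL_EDGES, d)


for d in (1.0, 2.5, 8.8, 14.2, 22.0):
    print(d, distance_bin(d))
```

The bins are narrower (0.75 to 1 Å) in the mid range and wider at the short and long ends.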
www.tripos.com
Acknowledgements
Novo Nordisk A/S (Denmark)
  Lars Nærum*
  Henning Thøgersen*
Tripos, Inc.
  Edmond Abrahamian
  Peter Fox
  Trevor Heritage
May the multiplets be with you...
What a Protein “Sees”
(electrostatic field at 0.5 Å resolution, 80 and 30% contours)
What the Chemist Sees
[Figure: structures of a tetrahydrophthalimide (American Cyanamid) and a trifluorotoluidide pyrazole ether (Monsanto)]
Pharmacophoric Features
[Figure: the tetrahydrophthalimide and trifluorotoluidide pyrazole ether structures annotated with hydrophobic centers, hydrogen bond acceptors and a hydrogen bond donor]
Conformational Sampling*
*diverse conformers obtained using CONFORT
Mapping Multiplets
Mapping for 7 bins and 3 features (D, A, H)*
Edge-bin labels: 000 ... 532 ... 665, 666
Feature labels: DDD, DDA, DDH, ..., HHH
Bits: ...001 (1 bit)
Bitmap Size = 7^3 × 3^3 = 9261 bits
* Features are handled in the order supplied by the application.
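The bitmap sizes in this talk are bins^edges × features^vertices: 7^3 × 3^3 = 9261 for triplets and 7^6 × 3^4 = 9,529,569 for quartets. A minimal sketch of a matching mixed-radix bit index follows. It assumes the bin digits are the major digits and the feature digits the minor ones, with features numbered in the order supplied (D = 0, A = 1, H = 2). The real SYBYL layout may differ, and the names are hypothetical.

```python
from typing import Sequence

FEATURES = "DAH"  # feature order as supplied by the application
N_BINS = 7


def bitmap_size(n_edges: int, n_vertices: int,
                n_bins: int = N_BINS, n_features: int = len(FEATURES)) -> int:
    """Bits needed when every bin/feature combination gets its own bit."""
    return n_bins ** n_edges * n_features ** n_vertices


def bit_index(bins: Sequence[int], features: str) -> int:
    """Mixed-radix index: base-7 bin digits first, then base-3 feature digits."""
    index = 0
    for b in bins:
        if not 0 <= b < N_BINS:
            raise ValueError(f"bin {b} out of range")
        index = index * N_BINS + b
    for f in features:
        index = index * len(FEATURES) + FEATURES.index(f)  # ValueError if unknown
    return index


print(bitmap_size(3, 3), bitmap_size(6, 4))
print(bit_index([5, 3, 2], "HAD"))
```

This layout reserves a bit for every combination, including orderings that a canonical mapping never produces. That keeps the indexing trivial at the cost of unused bits, which compression then absorbs.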
Hypothesis Generation
Multiple methods implemented for hypothesis generation:
  o From a collection of known actives
  o From a user-defined UNITY® query
  o From a single-molecule pharmacophore map
     a) Single or multiple generated conformers
  o From user-specified residues in the receptor cavity
Privileged Substructures: Augmented Triplets
[Figure: an augmented triplet with vertices labelled HY, HY and AA, plus an added DS site]
@_AUGMENTED
# name        mnemonic  xref  weight  min_dist  max_dist
DONOR_SITE    DS        AA    3.0     2.5       3.5
.=NULL.
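The @_AUGMENTED record reads as a whitespace-separated table whose columns are named in the # header line. A small parser sketch follows. It assumes that layout and treats .=NULL. as an end-of-section marker; both are assumptions, since the actual SYBYL file grammar is not shown. `parse_augmented` is a hypothetical name.

```python
from typing import Dict, List

NUMERIC_COLUMNS = {"weight", "min_dist", "max_dist"}

SAMPLE = """\
@_AUGMENTED
# name mnemonic xref weight min_dist max_dist
DONOR_SITE DS AA 3.0 2.5 3.5
.=NULL.
"""


def parse_augmented(text: str) -> List[Dict[str, object]]:
    """Parse the @_AUGMENTED section into one dict per augmented feature."""
    columns: List[str] = []
    records: List[Dict[str, object]] = []
    in_section = False
    for raw in text.splitlines():
        line = raw.strip()
        if line == "@_AUGMENTED":
            in_section = True
        elif not in_section or not line:
            continue
        elif line == ".=NULL.":
            break  # assumed to close the section
        elif line.startswith("#"):
            columns = line[1:].split()  # header comment names the columns
        else:
            record: Dict[str, object] = {}
            for name, value in zip(columns, line.split()):
                record[name] = float(value) if name in NUMERIC_COLUMNS else value
            records.append(record)
    return records


print(parse_augmented(SAMPLE))
```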
Effect of Conformer Count on Cosine Coefficient Similarity
[Figure: similarity (0 to 0.6) vs. maximum conformer count (0 to 1000); class and non-class similarity for estrogen antagonists, K channel openers and benzamides]
[Figure: discrimination ratio (0 to 14) vs. maximum conformer count (1 to 1000, log scale) for Type I antiarrhythmics, Type III antiarrhythmics, phenothiazines, beta blockers, benzamides, K channel openers and estrogen antagonists]