David Evans, Eli Lilly, 'Field-Aligned Matched Pairs'
David Evans and George Papadatos
Lilly Research Centre, Erl Wood Manor, Windlesham, UK
22nd September 2011
• Discover new chemotypes
• Multiobjective space
• Isosteres in activity
• Improvements in properties
• Want to use multiple tools in same environment
• But understand what works when
• Open Source Workflow tool – main client is free
• But support is available and can integrate commercial vendors + in-house code as nodes
• Have released many Erl Wood nodes to KNIME community site
• http://tech.knime.org/community/erlwood
• Xedmin: XED minimization, 2D -> 3D
• Xedex: conformational analysis
• FieldAlign: flexible alignment of query molecules onto template
• FieldView: launches FieldView to view field points + energies + other data
• All nodes pass SDF
Company Confidential Copyright © 2008 Eli Lilly and Company
Process is more than just the database search
WHY?
• Don’t want to load all databases onto all users’ PCs
• Command-line search + SOAP Web Service (Apache Tomcat) + node
• Platform-independent communication + secure intranet!
• Read in pre-built hypothesis (MOE, Phase)
• Or sketch from template molecule
• Jmol-based visualizer
• Can also annotate and filter hits, aids manual inspection
Non-proprietary structure
Maximum Unbiased Validation (MUV) dataset
• 17 targets, 30 ligands and 15,000 decoys per target; source: PubChem bioactivity data.
• Wide-ranging targets: hormone receptors, kinases, proteases, GPCRs plus others (e.g. HSP90, HIV RT).
• Unbiased for chemical analogues as MUV ligands pre-clustered with 2D fingerprint
• 1.16 compounds per scaffold class
MUV: J. Chem. Inf. Model., 2009, 49 (2), 169-184.
How well do automated pharmacophore methods do compared to 2D methods?
• Have looked at whole molecule similarity
• Is there more data if we find fragments which maintain activity?
• Matched Molecular Pair analysis (MMP)
• Fragments compounds and finds pairs where only one fragment differs
The mining and statistical analysis of transformations and their impact on properties of interest (e.g. solubility or activity)
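[Table: left molecule, right molecule, transformation, ΔSolubility (mg/ml). Example transformations include H >> F and Br >> OCH3; listed ΔSolubility values are -0.8, +1.2 and +2.4. Structures and the row-by-row assignment are not recoverable from the transcript.]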
It used to be a slow and computationally expensive process...
• Pair-wise maximum common substructure extraction – O(N²)
Recently a much more efficient algorithm was published
1) Cleave all acyclic single bonds, one by one:
2) Index all the fragments (cf. book index):
3) Enumerate the values for each key:
Hussain and Rea (2010). J. Chem. Inf. and Model., 50 (3), 339-348.
(*in an automated and unsupervised way)
Wagener and Lommerse (2006). J. Chem. Inf. and Model., 46 (2), 677-685.
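The index and enumerate steps (2 and 3) can be sketched in a few lines of Python. This is a minimal illustration, not the Erl Wood node or the published implementation. It assumes step 1 has already been carried out by a cheminformatics toolkit and is supplied as (molecule ID, context, fragment) tuples from single cuts only; the SMILES-like strings and the function names `index_fragments` and `enumerate_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def index_fragments(cuts):
    """Step 2: group single-cut results by their shared context (the index key)."""
    index = defaultdict(list)
    for mol_id, context, fragment in cuts:
        index[context].append((mol_id, fragment))
    return index


def enumerate_pairs(index):
    """Step 3: any two entries under one key with different fragments form a pair."""
    pairs = []
    for context, entries in index.items():
        for (id_a, frag_a), (id_b, frag_b) in combinations(entries, 2):
            if id_a != id_b and frag_a != frag_b:
                pairs.append((id_a, id_b, f"{frag_a}>>{frag_b}", context))
    return pairs


# Hypothetical output of step 1 (one tuple per cleaved acyclic single bond).
cuts = [
    ("mol1", "[*]c1ccccc1", "[*]Br"),
    ("mol2", "[*]c1ccccc1", "[*]OC"),
    ("mol3", "[*]c1ccncc1", "[*]Br"),
]

pairs = enumerate_pairs(index_fragments(cuts))
print(pairs)
```

Because molecules are compared only within each index key, no pair-wise maximum common substructure search over the whole data set is needed. The sketch reports each pair in one direction only.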
Mol A >> Mol B
In: MolRegnos (IDs), structures (in RDKit format) and property values
Out: Matched pairs (left and right molecule, IDs, transformation, property values, ΔP, context, transformation atom count)
Available as an Erl Wood community contribution node
Find isosteres in chEMBL
chEMBL – Database of published medicinal chemistry activity data
– Using chEMBL_10, total >1,000,000 compounds
Use here just human protein kinase inhibitors
Quality assurance for chEMBL data (SQL statement)
• Med. chem. friendly compounds, parent structure, not downgraded, confidence score = 9, exact IC50 or Ki values only (converted to pIC50/pKi)
• ~14K data points
• Compare biological values coming from the same assay ID only
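The conversion to pIC50/pKi is the negative base-10 logarithm of the concentration in mol/L. A minimal sketch follows, assuming the input values are in nanomolar (the slides do not state the units); the function names `p_activity` and `delta_p` are hypothetical.

```python
import math


def p_activity(value_nM):
    """Convert an IC50 or Ki in nanomolar to pIC50/pKi = -log10(value in mol/L)."""
    if value_nM <= 0:
        raise ValueError("concentration must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(value_nM * 1e-9) = 9 - log10(value_nM).
    return 9.0 - math.log10(value_nM)


def delta_p(left_nM, right_nM):
    """Change in pIC50/pKi going from the left to the right molecule of a pair."""
    return p_activity(right_nM) - p_activity(left_nM)


print(p_activity(1000.0))      # a 1 micromolar compound
print(delta_p(1000.0, 100.0))  # right molecule is tenfold more potent
```

Working in log units makes the difference between two molecules of a pair independent of the absolute potency, which is what allows the differences to be pooled per transformation.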
Aggregate transformations; calculate and bin ΔpIC50s in 3 bins
• Good – Bad – Neutral (depending on a cut-off c = 0.4 log units)
• Each transformation has a neutral count
• Absolute value or percentage: NeutralCount%
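The binning and the NeutralCount% statistic can be sketched as follows. This is an illustration under stated assumptions rather than the workflow's actual code: "Good" is taken to mean ΔpIC50 above +c, "Bad" below -c, and a value exactly at the cut-off is counted as neutral; the example values and the names `bin_delta` and `summarise` are hypothetical.

```python
from collections import Counter


def bin_delta(delta, cutoff=0.4):
    """Assign one ΔpIC50 to a bin; |delta| <= cutoff is neutral (an assumption)."""
    if abs(delta) <= cutoff:
        return "neutral"
    return "good" if delta > 0 else "bad"


def summarise(deltas_by_transformation, cutoff=0.4):
    """Count the three bins per transformation and derive NeutralCount%."""
    summary = {}
    for transformation, deltas in deltas_by_transformation.items():
        if not deltas:
            continue  # nothing to aggregate for this transformation
        counts = Counter(bin_delta(d, cutoff) for d in deltas)
        summary[transformation] = {
            "good": counts["good"],
            "bad": counts["bad"],
            "neutral": counts["neutral"],
            "neutral_pct": 100.0 * counts["neutral"] / len(deltas),
        }
    return summary


# Hypothetical ΔpIC50 values observed for one transformation.
example = {"[*]Br>>[*]OC": [0.1, -0.2, 0.9, 0.3]}
result = summarise(example)
print(result)
```

A transformation with a high NeutralCount% rarely changes activity by more than the cut-off, which is the working definition of an isostere used here.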
chEMBL workflow outputs isosteric fragments
How similar are isosteres in 2D fingerprint space? In field space?
Could fields help us find unexpected isosteres?
• 1802 fragment pairs from chEMBL_10 kinase data set
• 481 with no rotatable bonds left or right
• Simplifies conformational analysis
• For each fragment pair
1. Swap attachment points for adamantyl
2. FieldAlign to get field similarity (use adamantyl to constrain overlay)
3. RDKit fingerprint similarity – topological Daylight-esque
4. Correct similarities for adamantyl
• Are there isosteric pairs with high field similarity but low RDKit similarity?
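Once each fragment pair carries a field similarity, an RDKit similarity and a NeutralCount%, the question above becomes a simple filter. The sketch below is illustrative only: the similarity thresholds (0.7 and 0.3), the record layout and the fragment labels are hypothetical, and the 60% NeutralCount% cut-off mirrors the filter used on the plots.

```python
from typing import NamedTuple


class FragmentPair(NamedTuple):
    transformation: str
    field_sim: float    # from FieldAlign, after the adamantyl correction
    rdkit_sim: float    # 2D fingerprint similarity, after the same correction
    neutral_pct: float  # NeutralCount% for the transformation


def unexpected_isosteres(pairs, min_field=0.7, max_rdkit=0.3, min_neutral_pct=60.0):
    """Keep mostly-neutral pairs that look alike in field space but not in 2D."""
    hits = [
        p for p in pairs
        if p.field_sim >= min_field
        and p.rdkit_sim <= max_rdkit
        and p.neutral_pct > min_neutral_pct
    ]
    # Rank by how much more similar the pair is in field space than in 2D.
    return sorted(hits, key=lambda p: p.field_sim - p.rdkit_sim, reverse=True)


# Hypothetical records; none of these numbers come from the talk.
pairs = [
    FragmentPair("fragA>>fragB", 0.85, 0.20, 75.0),
    FragmentPair("fragC>>fragD", 0.90, 0.80, 90.0),  # similar in 2D as well
    FragmentPair("fragE>>fragF", 0.75, 0.10, 40.0),  # not isosteric enough
    FragmentPair("fragG>>fragH", 0.72, 0.25, 66.0),
]

hits = unexpected_isosteres(pairs)
print([p.transformation for p in hits])
```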
[Plot: Field Sim vs. RDKit Sim, points sized by NeutralCount% (larger = more isosteric). Marked regions: pairs with high field similarity but low 2D similarity; pairs with high field and 2D similarity.]
[Plot: Field Sim vs. RDKit Sim, points sized by NeutralCount%, only pairs with >60% isosteric examples. Highlighted: thiophene -> phenol.]
[Plot: Field Sim vs. RDKit Sim, points sized by NeutralCount%, only pairs with >60% isosteric examples. Highlighted: imidazole -> morpholine?]
[Plot: Field Sim vs. RDKit Sim, points sized by NeutralCount%, only pairs with >60% isosteric examples. Highlighted: some small fragments.]
[Figure: WEE1 kinase, PDB 2I06 (non-proprietary structure, from PDB), with solvent-exposed and buried regions labelled.]
[Plot: Field Sim vs. RDKit Sim, points sized by NeutralCount%, only pairs with >60% isosteric examples. Highlighted: Me-tetrazole -> oxadiazole.]
[Plot: Field Sim vs. RDKit Sim, points sized by NeutralCount%, only pairs with >60% isosteric examples. Highlighted: thiophene -> phenol.]
[Figure: non-proprietary structure (from PDB).]
• 6299 data points from thermodynamic solubility assay
• 423 single-point transformations
• 215 transformations with no rotatable bonds
• Aggregate transformations; calculate and bin ΔlogS in 3 bins
• Good – Bad – Neutral (c = 0.3 log units)
• Are there transformations which increase solubility with low field similarity but high RDKit similarity?
[Plot: Field Sim vs. RDKit Sim, points sized by Good Count %, only transformations with >60% boosting examples. Highlighted: ring contraction + twist?]
[Plot: Field Sim vs. RDKit Sim, points sized by Good Count %, only transformations with >60% boosting examples. Highlighted: big boost from morpholine.]
• Can mine chEMBL data for non-obvious isosteres
• Will other data sets find more?
• Would like to improve workflow to make isostere data set for 3D similarity comparison
• Improve fragmentation/conformer/alignment handling?
• Need to include whole molecule?
• Need 3D binding site data as well to confirm isosterism?
• KNIME platform developing
• Virtual screening and evaluation environment
• Rapid experimentation with varied tools
• http://tech.knime.org/community/erlwood
George Papadatos
Juliette Pradon
Hina Patel
Nikolas Fechner
David Thorner
Michael Bodkin
KNIME, chEMBL + Cresset !
ROC curves for retrieval of >66% isosteric groups
• Field similarity performs better than RDKit
• But AUC = 0.68
• Workflow not optimized for this purpose
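For reference, the area under a ROC curve can be computed directly from scores and labels as the probability that a randomly chosen positive outscores a randomly chosen negative. The sketch below is a generic implementation, not the evaluation code behind the slide; it assumes a pair is labelled positive when its NeutralCount% exceeds 66%, and the scores and labels shown are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for s, is_pos in zip(scores, labels) if is_pos]
    negatives = [s for s, is_pos in zip(scores, labels) if not is_pos]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical similarity scores and isostere labels for five fragment pairs.
field_sims = [0.9, 0.8, 0.6, 0.4, 0.3]
is_isosteric = [True, False, True, False, False]

auc = roc_auc(field_sims, is_isosteric)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.68 indicates a modest enrichment.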