David Evans, Eli-Lilly, 'Field-Aligned Matched Pairs'

David Evans and George Papadatos

Lilly Research Centre, Erl Wood Manor, Windlesham, UK

22nd September 2011

• Discover new chemotypes

• Multiobjective space

• Isosteres in activity

• Improvements in properties

• Want to use multiple tools in same environment

• But understand what works when

• Open Source Workflow tool – main client is free

• But support is available, and commercial vendor tools + in-house code can be integrated as nodes

• Have released many Erl Wood nodes to KNIME community site

• http://tech.knime.org/community/erlwood

Xedmin, Xedex, FieldAlign and FieldView nodes (all nodes pass SDF):

• Xedmin: XED minimization, 2D -> 3D

• Xedex: conformational analysis

• FieldAlign: flexible alignment of query molecules onto template

• FieldView: launches FieldView to view field points + energies + other data


Process is more than just the database search

Why? Don’t want to load all databases onto all users’ PCs

• Command-line search

• SOAP Web Service (Apache Tomcat)

• Node

• Platform-independent communication + secure intranet!

• Read in pre-built hypothesis (MOE, Phase)

• Or sketch from template molecule

• Jmol-based visualizer

• Can also annotate and filter hits, which aids manual inspection

Non-proprietary structure

Maximum Unbiased Validation (MUV) dataset

• 17 targets, each with 30 ligands and 15,000 decoys; source: PubChem bioactivity data

• Wide-ranging targets: hormone receptors, kinases, proteases, GPCRs plus others (e.g. HSP90, HIV RT)

• Unbiased for chemical analogues, as MUV ligands are pre-clustered with a 2D fingerprint

• 1.16 compounds per scaffold class

MUV: J. Chem. Inf. Model., 2009, 49 (2), 169-184.

How well do automated pharmacophore methods do compared to 2D methods?

• Have looked at whole molecule similarity

• Is there more data if we find fragments which maintain activity?

• Matched Molecular Pair analysis (MMP)

• Fragments compounds and finds pairs where only one fragment differs

The mining and statistical analysis of transformations and their impact on properties of interest (e.g. solubility or activity)

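[Example table, structures not recovered. Columns: left molecule, right molecule, transformation, ΔSolubility (mg/ml). Transformations shown: H >> F, Br >> OCH3. ΔSolubility values shown: -0.8, +1.2, +2.4]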

It used to be a slow and computationally expensive process...

• Pair-wise maximum common substructure extraction – O(N²)

Recently a much more efficient algorithm was published


1) Cleave all acyclic single bonds, one by one:

2) Index all the fragments (cf. book index):

3) Enumerate the values for each key:

Hussain and Rea (2010). J. Chem. Inf. Model., 50 (3), 339-348.

(*in an automated and unsupervised way)

Wagener and Lommerse (2006). J. Chem. Inf. Model., 46 (2), 677-685.
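The index and enumerate steps (2 and 3) of the fragment-and-index scheme can be sketched in a few lines. This is a minimal illustration, not the published implementation or the Erl Wood node: the `single_cuts` dictionary is hypothetical, hand-written data standing in for the output of step 1, and the function names are invented for this example. A real workflow would generate the fragments with a cheminformatics toolkit and would typically also limit the size of the changing fragment.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical, hand-written single-cut fragmentations (step 1 output):
# for each molecule ID, every (context, fragment) pair obtained by cleaving
# one acyclic single bond. '*' marks the attachment point.
single_cuts = {
    "mol_A": [("*c1ccccc1", "*F"), ("*F", "*c1ccccc1")],
    "mol_B": [("*c1ccccc1", "*Cl"), ("*Cl", "*c1ccccc1")],
    "mol_C": [("*c1ccccc1", "*OC"), ("*OC", "*c1ccccc1")],
    "mol_D": [("*c1ccncc1", "*F"), ("*F", "*c1ccncc1")],
}


def build_index(cuts):
    """Step 2: index fragments by context, like a book index.

    Maps each context (key) to the list of (molecule ID, fragment) values.
    """
    index = defaultdict(list)
    for mol_id, pairs in cuts.items():
        for context, fragment in pairs:
            index[context].append((mol_id, fragment))
    return index


def enumerate_pairs(index):
    """Step 3: any two values stored under the same key form a matched pair."""
    pairs = []
    for context, entries in index.items():
        for (id_l, frag_l), (id_r, frag_r) in combinations(entries, 2):
            if frag_l != frag_r:
                pairs.append((id_l, id_r, f"{frag_l}>>{frag_r}", context))
    return pairs


matched_pairs = enumerate_pairs(build_index(single_cuts))
for pair in matched_pairs:
    print(pair)
```

Only molecules that share a key are ever compared, which is what avoids the all-against-all substructure comparison of the older approach.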

Mol A >> Mol B

In: MolRegnos (IDs), structures (in RDKit format) and property values

Out: Matched pairs (left and right molecule, IDs, transformation, property values, ΔP, context, transformation atom count)

Available as an Erl Wood community contribution node

Find isosteres in chEMBL

chEMBL – Database of published medicinal chemistry activity data

– Using chEMBL_10, total >1,000,000 compounds

Use here just human protein kinase inhibitors

Quality assurance for chEMBL data (SQL statement)

• Med. chem. friendly compounds, parent structure, not downgraded, confidence score = 9, exact IC50 or Ki values only (converted to pIC50/pKi); ~14K data points

• Compare biological values coming from the same assay ID only
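As a small worked example of the two rules above (conversion to pIC50/pKi, and comparison within one assay only), here is a minimal sketch. It assumes the activity values are given in nM, which the slides do not state; the record layout and the function names are hypothetical.

```python
import math


def p_activity(value_nm):
    """Convert an IC50 or Ki in nM to pIC50/pKi = -log10(value in mol/L)."""
    return 9.0 - math.log10(value_nm)


# Hypothetical records: (compound ID, assay ID, IC50 in nM).
records = [
    ("cpd_1", "assay_X", 100.0),
    ("cpd_2", "assay_X", 10.0),
    ("cpd_2", "assay_Y", 50.0),
]


def delta_p(left, right):
    """Return pIC50(right) - pIC50(left), or None if the assay IDs differ."""
    _, assay_l, value_l = left
    _, assay_r, value_r = right
    if assay_l != assay_r:
        return None  # values from different assays are not compared
    return p_activity(value_r) - p_activity(value_l)


print(delta_p(records[0], records[1]))  # same assay: a ΔpIC50 is returned
print(delta_p(records[0], records[2]))  # different assays: None
```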

Aggregate transformations; calculate and bin ΔpIC50s in 3 bins

• Good – Bad – Neutral (depending on a cut-off c = 0.4 log units)

• Each transformation has a neutral count

• Absolute value or percentage: NeutralCount%
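The binning and the NeutralCount% statistic can be sketched as follows. The observations are hypothetical, and two details are assumptions not stated on the slides: that "Good" means ΔpIC50 above +c and "Bad" below -c, and that values exactly at the cut-off count as neutral.

```python
from collections import Counter, defaultdict

CUTOFF = 0.4  # log units, the cut-off c quoted on the slide


def classify(delta, cutoff=CUTOFF):
    """Bin one ΔpIC50 value (assumed convention: positive = more active)."""
    if delta > cutoff:
        return "good"
    if delta < -cutoff:
        return "bad"
    return "neutral"


# Hypothetical (transformation, ΔpIC50) observations from matched pairs.
observations = [
    ("*F>>*Cl", 0.1), ("*F>>*Cl", -0.2), ("*F>>*Cl", 0.9),
    ("*Br>>*OC", -1.1), ("*Br>>*OC", 0.3),
]


def aggregate(obs):
    """Count good/bad/neutral per transformation and derive NeutralCount%."""
    counts = defaultdict(Counter)
    for transformation, delta in obs:
        counts[transformation][classify(delta)] += 1
    summary = {}
    for transformation, c in counts.items():
        total = sum(c.values())
        summary[transformation] = {
            "good": c["good"],
            "bad": c["bad"],
            "neutral": c["neutral"],
            "neutral_pct": 100.0 * c["neutral"] / total,
        }
    return summary


summary = aggregate(observations)
for transformation, stats in summary.items():
    print(transformation, stats)
```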

chEMBL workflow outputs isosteric fragments

How similar are isosteres in 2D fingerprint space?

In field space?

Could fields help us find unexpected isosteres?

• 1802 fragment pairs from chEMBL_10 kinase data set

• 481 with no rotatable bonds left or right

• Simplifies conformational analysis

• For each fragment pair

1. Swap attachment points for adamantyl

2. FieldAlign to get field similarity (use adamantyl to constrain overlay)

3. RDKit fingerprint similarity – topological, Daylight-esque

4. Correct similarities for adamantyl

• Are there isosteric pairs with high field similarity but low RDKit similarity?
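The closing question amounts to a filter over the (field similarity, 2D similarity) table. The sketch below is illustrative only: it assumes the 2D similarity is a Tanimoto coefficient over fingerprint on-bits (the slides do not name the metric), the thresholds `FIELD_MIN` and `FP_MAX` are hypothetical, the bit sets are made up, and the adamantyl correction of step 4 is left out because its method is not described.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient between two sets of fingerprint on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical rows: (pair name, field similarity, on-bits left, on-bits right).
pairs = [
    ("pair_1", 0.85, {1, 2, 3, 4}, {3, 4, 5, 6, 7, 8}),
    ("pair_2", 0.90, {1, 2, 3, 4}, {1, 2, 3, 5}),
    ("pair_3", 0.40, {1, 2}, {7, 8}),
]

FIELD_MIN = 0.7  # hypothetical threshold for "high field similarity"
FP_MAX = 0.3     # hypothetical threshold for "low 2D similarity"


def high_field_low_2d(rows, field_min=FIELD_MIN, fp_max=FP_MAX):
    """Return names of pairs that look similar in field space but not in 2D."""
    return [
        name
        for name, field_sim, bits_l, bits_r in rows
        if field_sim >= field_min and tanimoto(bits_l, bits_r) <= fp_max
    ]


print(high_field_low_2d(pairs))
```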

[Scatter plot: Field Sim vs RDKit Sim for fragment pairs; points sized by NeutralCount % (larger = more isosteric). Highlighted regions: pairs with high field similarity but low 2D similarity; pairs with high field and 2D similarity.]

[Scatter plot: Field Sim vs RDKit Sim; points sized by NeutralCount %; only pairs with >60% isosteric examples. Highlighted: Thiophene -> Phenol]

[Scatter plot: Field Sim vs RDKit Sim; points sized by NeutralCount %; only pairs with >60% isosteric examples. Highlighted: Imidazole -> Morpholine?]

[Scatter plot: Field Sim vs RDKit Sim; points sized by NeutralCount %; only pairs with >60% isosteric examples. Highlighted: some small fragments. Structure image: WEE1 kinase, PDB 2I06 (from PDB), with solvent-exposed and buried regions labelled.]

[Scatter plot: Field Sim vs RDKit Sim; points sized by NeutralCount %; only pairs with >60% isosteric examples. Highlighted: Me-tetrazole -> oxadiazole]

[Scatter plot: Field Sim vs RDKit Sim; points sized by NeutralCount %; only pairs with >60% isosteric examples. Highlighted: Thiophene -> phenol, with structure image (from PDB)]

• 6299 data points from thermodynamic solubility assay

• 423 single-point transformations

• 215 transformations with no rotatable bonds

• Aggregate transformations; calculate and bin ΔlogS in 3 bins

• Good – Bad – Neutral (c = 0.3 log units)

• Are there transformations which increase solubility with low field similarity but high RDKit similarity?

[Scatter plot: Field Sim vs RDKit Sim for solubility transformations; points sized by Good Count %; only those with >60% boosting examples. Highlighted: ring contraction + twist?]

[Scatter plot: Field Sim vs RDKit Sim for solubility transformations; points sized by Good Count %; only those with >60% boosting examples. Highlighted: big boost from morpholine]

• Can mine chEMBL data for non-obvious isosteres

• Will other data sets find more?

• Would like to improve workflow to make isostere data set for 3D similarity comparison

• Improve fragmentation/conformer/alignment handling?

• Need to include whole molecule?

• Need 3D binding site data as well to confirm isosterism?

• KNIME platform developing

• Virtual screening and evaluation environment

• Rapid experimentation with varied tools

• http://tech.knime.org/community/erlwood

George Papadatos

Juliette Pradon

Hina Patel

Nikolas Fechner

David Thorner

Michael Bodkin

KNIME, chEMBL + Cresset !

ROC curves for retrieval of >66% isosteric groups

• Field similarity performs better than RDKit

• But AUC = 0.68

• Workflow not optimized for this purpose
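For readers unfamiliar with the AUC figure quoted above, here is a minimal sketch of how a ROC AUC can be computed from similarity scores. It uses the rank interpretation of AUC (the probability that a randomly chosen positive scores higher than a randomly chosen negative), which may differ from how the original curves were produced. The data are hypothetical, and labelling a pair as positive when its NeutralCount% exceeds 66 is an assumption based on the slide title.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(score of a positive > score of a negative); ties count 0.5."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical fragment pairs: NeutralCount% and field similarity per pair.
neutral_pct = [80, 70, 90, 30, 50, 10]
field_sim = [0.9, 0.6, 0.8, 0.7, 0.4, 0.3]

# Assumed labelling: a pair counts as isosteric if NeutralCount% > 66.
labels = [pct > 66 for pct in neutral_pct]

auc = roc_auc(field_sim, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect retrieval, which puts the reported 0.68 in context.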
