Andy Pope
Platform Technology & Science, GlaxoSmithKline,
Collegeville PA, USA
SLAS 2012, San Diego February 4-8, 2012
Screening Heuristics & Chemical Property Bias - New directions for Lead Identification and Optimization
Screening Heuristics
Why Screening Heuristics?
1. Huge, complex datasets → screening wisdom? (customers)
2. Refining approaches/deliverables → success rates, attrition
Some available datasets inside GSK (GSK compounds + data >>10^6):
- HTS: >300
- Focused screens (FS): >200
- ELT: >150
- Target class profiling: >300
- Program profiling: >500
- FBDD: >20
- Marketed drugs etc.: >10^3
- Phys-chem / DMPK: >10^5
- Safety profiling: >50
[Diagram: hit ID compound profiles and structures + properties, each layered with descriptor metadata; public data (e.g. PubChem; literature, Connectivity Maps); other GSK data, e.g. genomic, bio-informatic, clinical]
300+ HTS Campaigns – 2004-11
[Chart: 2007-11 screens by assay technology (15 classes) and target class (13 classes), sized by count of screens]
Twin approaches to screening heuristics
1. Building Collective wisdom
- Capture, combine and share the experiences of screeners and the data from their screens
2. New “big” data analysis/ insights
- Look for data patterns in large aggregated datasets
e.g. (1): How well do different assay methods perform? What is the impact of screen quality, and what should be targeted in assay development? What policies do I need in place to have a high-quality screening process? Which assay technology works best?
e.g. (2): Do chemical properties influence the results of screens? How are screen results related between targets and assay methods? Which is the best method to use to discover hits? How are library properties reflected in the hits?
From SBS Virtual Seminar Series 2007 - HTS Module 1
Building Collective Wisdom – a simple example
Some questions:
- What actually happens in practice as Z' varies?
- What Z' should we be aiming for?
- Is this affected by the type of assay?
- What is the appropriate trade-off between cost, robustness and sensitivity?
- How are we doing?
[Charts: production failure rate (% of plates), cycle time (weeks/campaign), and statistical cut-off (% effect) vs. average Z' of the assay in HTS production, binned from 0.4-0.5 up to >0.8]
Z’ Heuristics
- Z' > 0.8 is ideal, > 0.7 acceptable
- Below Z' = 0.7, many aspects of performance degrade (e.g. failures, cycle times, false +ve/-ve, hit confirmation)
- Z' vs. "sensitivity" trade-off arguments may be based on false hunches
- Target & assay type do not make a major difference
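The Z' heuristics above rely on the standard Z' factor of Zhang, Chung and Oldenburg (1999), which the slides assume rather than define. A minimal sketch of its calculation from plate control wells:

```python
import statistics

def z_prime(pos_controls, neg_controls):
    """Z' factor (Zhang et al., 1999):
    Z' = 1 - 3 * (SD_pos + SD_neg) / |mean_pos - mean_neg|.
    Per the heuristics above, > 0.8 is ideal and > 0.7 acceptable."""
    mu_p, sd_p = statistics.mean(pos_controls), statistics.stdev(pos_controls)
    mu_n, sd_n = statistics.mean(neg_controls), statistics.stdev(neg_controls)
    return 1 - 3 * (sd_p + sd_n) / abs(mu_p - mu_n)
```

Tight, well-separated controls (small SDs relative to the window between the control means) give Z' close to 1; overlapping controls drive it toward zero or below.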
Properties, properties, properties…..
….But, do they affect screening data? ….are we selecting hits with the best properties?
….Bottom line: high cLogP (greasiness) is BAD. This needs to be fixed at the start, i.e. in hit ID, and it tends to creep up during Lead Op.
Do compound molecular properties impact how they behave in screens?
[Chart: hit rate (%) for compounds in each polar surface area (tPSA, Å²) bin; aggregate results from all 330 campaigns 2005-2010 with >500K tests]
Worked example: compounds with tPSA 80-85 Å² – 26M measured responses in this bin, 485k marked as "hit"; hit rate = 100 × (485k / 26M) = 1.86%.
Total polar surface area (tPSA) is defined as the surface sum over all polar atoms; tPSA < 60 Å² predicts brain penetration, while > 140 Å² predicts poor cell penetration.
"Hit" = % effect ≥ 3 × RSD of the sample population in that specific screen.
e.g. Compound total polar surface area (tPSA) makes no difference.
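The slide's hit definition (% effect at least 3 RSD above the sample population of that screen) and the per-bin hit-rate arithmetic can be sketched as follows; the data layout is a hypothetical simplification for illustration, not GSK's actual pipeline:

```python
import statistics
from collections import defaultdict

def mark_hits(effects, n_sd=3):
    """Mark samples as hits when % effect exceeds mean + n_sd * SD of the
    sample population in that screen (the statistical cut-off above)."""
    cut = statistics.mean(effects) + n_sd * statistics.stdev(effects)
    return [e >= cut for e in effects]

def hit_rate_by_bin(records, bin_width=5.0):
    """records: (property_value, is_hit) pairs, e.g. (tPSA, hit flag).
    Returns {bin_start: hit rate in %}, mirroring the slide's
    hit rate = 100 * hits / responses per property bin."""
    counts = defaultdict(lambda: [0, 0])   # bin -> [hits, total]
    for value, is_hit in records:
        b = bin_width * int(value // bin_width)
        counts[b][0] += is_hit
        counts[b][1] += 1
    return {b: 100.0 * h / n for b, (h, n) in counts.items()}
```

Scaling the slide's worked example down by 1000 (485 hits out of 26,000 responses in the 80-85 Å² bin) reproduces the quoted ~1.86% hit rate.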
Size Matters……
[Charts: % compounds per MW bin and cumulative % compounds vs. MW; hit rate (%) vs. molecular weight (MW). Middle 80% of compounds: MW 270-470. Only bins containing 1M or more records are shown.]
Overall hit rate rises 1.7-fold (1.50% to 2.62%) across the middle 80% of the screening deck, i.e. a 70% rise in hit rate from MW = 270 to MW = 470, and 3.3-fold (1.2% to 4.0%) across the full MW range.
Greasiness matters most……
[Charts: % compounds per ClogP bin and cumulative % compounds vs. ClogP; hit rate (%) vs. ClogP. Middle 80% of compounds: ClogP 1-5. Only bins containing 1M or more records are shown.]
Overall hit rate rises 2.9-fold (1.14% to 3.31%) across the middle 80% of the screening deck, i.e. from ClogP = 1 to ClogP = 5, and 4.1-fold (1.1% to 4.5%) across the full ClogP range.
HTS Promiscuity – cLogP
[Chart: inhibition frequency index* (%) vs. cLogP, with compounds hitting ~1 target at one end and compounds hitting >10% of targets at the other]
Note: compounds were required to have been run in ≥50 HTS, and to have yielded >50% effect in at least a single screen, to be included.
*Inhibition frequency index (IFI) = % of screens where the compound yielded >50% inhibition, where total screens run ≥ 50.
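The IFI footnote translates directly into code. The function below is an illustrative sketch, taking the results for one compound as a list with one % inhibition value per screen run:

```python
def inhibition_frequency_index(results, min_screens=50, threshold=50.0):
    """IFI per the footnote above: % of screens in which the compound
    yielded > threshold % inhibition, computed only for compounds run
    in at least min_screens screens."""
    if len(results) < min_screens:
        return None  # too few screens for a meaningful frequency
    active = sum(1 for r in results if r > threshold)
    return 100.0 * active / len(results)
```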
"Dark" Matter is small and polar
[Chart: molecular weight (Da) vs. cLogP for "dark matter" compounds]
– Compounds which have not yielded >50% effect even once in >50 screens
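The "dark matter" definition quoted above can be expressed as a simple predicate over a hypothetical list of per-screen % effect values for one compound:

```python
def is_dark_matter(results, min_screens=50, threshold=50.0):
    """'Dark matter' per the slide: a compound that has not yielded
    > threshold % effect even once, across more than min_screens screens."""
    return len(results) > min_screens and all(r <= threshold for r in results)
```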
Biases translate to full-curve follow-up and beyond
[Charts: % compounds tested vs. cLogP and vs. molecular weight, comparing single-shot (SS) testing, full-curve (FC) testing, and the FC – SS differential]
- Elevated testing of large, lipophilic compounds in the full-curve phase of HTS
- Reduced testing of small, polar compounds in the full-curve phase of HTS
Note: plots represent data from 402M single-concentration responses & 2.1M full-curve results.
Property biases in primary HTS hit marking are propagated forward to dose-response follow-up.
Property bias detection at an individual screen level
[Chart: hit rate as % of hit rate at cLogP = 3.5, vs. cLogP, for the screens with the largest response to cLogP]
Assay Technology vs. property bias
[Chart: hit rate as % of hit rate at cLogP = 3.5, vs. cLogP, by assay technology, normalized to the hit rate for that screen at the median collection cLogP value; colored by hit rate (%)]
e.g. No clear origins in any meta-data (assay technology, target class, screen quality etc.), but the effects are detectable even at the single-screen level.
[Charts: hit rate (%) vs. ClogP, rising from 1.28% to 3.80%; hit rate (%) vs. MW, from 2.14% to 2.27% – pretty flat]
Lipophilicity trends in PubChem HTS Data
Primary data from around 100 academic HTS campaigns obtained from PubChem BioAssay.
Lipophilicity – similar to GSK HTS; compound size – little effect.
GSK screening deck (>50 HTSs, 2.01M cpds) ClogP = 0.00835*MW – 0.058, R2 = 0.18
PubChem Compounds (405k) ClogP = 0.00554*MW + 0.97, R2 = 0.09
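The two quoted fits are least-squares regressions of ClogP on MW with their R² values. For reference, a minimal self-contained fit in the same form (slope, intercept, R²); this is a generic reimplementation, not the software used for the slide:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = a*x + b plus R^2, the form of the
    ClogP-vs-MW relationships quoted above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = my - a * mx                    # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    r2 = 1 - ss_res / ss_tot           # coefficient of determination
    return a, b, r2
```

The low R² values on the slide (0.18 and 0.09) mean MW explains little of the variance in ClogP, even though the trend is positive in both decks.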
[Chart: hit rate (% of compounds giving >50% inhibition at 10 uM) vs. ClogP]
Not just HTS… Lipophilicity trends in kinase focused set screens
Primary data from ~50 focused screen campaigns against protein kinases.
Lipophilicity and size – similar to GSK HTS.
[Charts: hit rate (% of compounds giving >50% inhibition at 10 uM) vs. MW and vs. ClogP]
Property        R², ± vs MW    R², ± vs ClogP
MW              1.00, +        0.21, +
ClogP           0.21, +        1.00, +
HAC             0.92, +        0.19, +
fCsp3           0.15, +        0.00
RotBonds        0.36, +        0.04, +
tPSA            0.16, +        0.08, -
Chiral          0.02, +        0.00
HetAtmRatio     0.02, -        0.34, -
Complexity      0.31, +        0.02, +
Flexibility     0.02, +        0.00
AromRings       0.22, +        0.16, +
HBA             0.11, +        0.10, -
HBD             0.01, +        0.02, -
Bias from other simple chemical properties?
[Chart: hit rate (%) vs. fraction of carbons that are sp3 (fCsp3)]
- Positive bias: cLogP, MW (HAC)
- Negative bias: fCsp3, flexibility
Improving hit marking – Property Biasing
[Charts: hit rate (%) vs. MW and vs. ClogP, comparing ordinary HTS hit marking with property-biased hit marking]
- More attractive properties – promote
- Less attractive properties – demote
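The slides do not specify how promotion/demotion is implemented, so the following is purely a hypothetical sketch of the idea: shift the statistical cut-off per compound by a penalty that grows with cLogP and MW (demoting large, greasy compounds) and becomes a bonus for small, polar ones. The reference points and weights are invented for illustration only:

```python
def property_biased_cutoff(base_cutoff, clogp, mw,
                           clogp_ref=3.0, mw_ref=370.0,
                           clogp_weight=2.0, mw_weight=0.02):
    """Hypothetical property-biased hit marking (the actual GSK method is
    not given on the slide). Compounds greasier/larger than the reference
    need a larger % effect to be marked as hits ('demote'); smaller, more
    polar compounds get a lower bar ('promote')."""
    penalty = clogp_weight * (clogp - clogp_ref) + mw_weight * (mw - mw_ref)
    return base_cutoff + max(penalty, -base_cutoff / 2)  # cap the promotion

def mark_hit(effect, base_cutoff, clogp, mw):
    """Hit if % effect clears the property-adjusted cut-off."""
    return effect >= property_biased_cutoff(base_cutoff, clogp, mw)
```

With the illustrative weights above, a 35% effect from a small, polar compound (cLogP 1, MW 300) is marked as a hit, while the same effect from a large, lipophilic one (cLogP 5, MW 500) is not, flattening the hit-rate-vs-property curves in the way the slide describes.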
[Chart: % compounds vs. response (% control), with the mean + 3 × RSD cut-off marked]
Evolving the screening collection…
GSK's Compound Collection Enhancement (CCE) strategy – moving the HTS deck towards decreased size and lipophilicity with the aim of improving chemical starting points.
[Charts: ClogP distribution (% of total compounds in HTS) for 2004, 2010, and the Δ 2010 vs. 2004; % compounds exceeding property limits (ClogP > 5, MW > 500) by year, through the new 2011 deck. Compounds tested in HTS test datasets.]
CCE acquisition, property bounds:
- 2004-05: Lipinski criteria (MW < 500, ClogP < 5)
- Most recently: MW < 360, ClogP < 3
- Inclusion of DPU lead-op cpds: MW < 500, ClogP < 5
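The acquisition property bounds quoted above amount to a simple filter. A sketch, with era labels of my own invention rather than GSK terminology:

```python
def passes_cce_bounds(mw, clogp, era="recent"):
    """Acquisition property bounds quoted on the slide.
    'lipinski' = the 2004-05 criteria; 'recent' = the tightened bounds."""
    bounds = {
        "lipinski": (500.0, 5.0),   # MW < 500, ClogP < 5
        "recent":   (360.0, 3.0),   # MW < 360, ClogP < 3
    }
    mw_max, clogp_max = bounds[era]
    return mw < mw_max and clogp < clogp_max
```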
Can property biases translate into lead optimization?
[Chart: Cell – Biochem pIC50 difference vs. binned cLogP; up = more potent in cell, down = more potent in biochem]
Biochemical target assay → cellular "mechanistic" target assay → rodent DMPK, efficacy model → med. chem.
Example from a current Lead Optimization program:
- Cellular activity favors cLogP > 4 – a directional "pull" to more lipophilic cpds?
- Good DMPK at cLogP < 3 – value of the cellular assay?
A "patient in a plate"… or "biochemistry in a (grease-selective) bag"!
Property bias in broad pharmacological profiling
[Chart: average % of assays giving IC50 ≤ 10 uM vs. binned ClogP, for marketed drugs and GSK terminated leads & candidates (n = ~400 and ~1000), and for GSK Lead Op. compounds 2009-11 (n = ~2500)]
Early safety cross-screening panel (eXP):
- GPCRs – 17
- Ion channels – 8
- Enzymes – 3
- Kinases – 4
- Nuclear receptors – 2
- Transporters – 3
- Phenotypic – 3 (Blue Screen, Cell Health, Phospholipidoses)
Kinome profiling – no impact of cLogP
[Charts: % inhibition values (>300 kinase assays) vs. binned ClogP and vs. a kinase structural classifier]
~400 kinase Lead Op compounds vs. 300 protein kinases
Conclusions
- Heuristic approaches allow both refinement of best practice and new insights.
- Standard screening processes favor the selection of lipophilic compounds:
  - a contributing factor in current issues with drug Lead/Candidate property-space occupancy;
  - improvement in screening collections and analysis methods can overcome this, BUT
  - all this effort is wasted if Lead Optimization pathways pull compounds back towards unfavorable property space!!
- The very large datasets generated from screening have considerable value beyond the lifetime of individual campaigns:
  - particularly crucial now that quality and cycle-time problems are largely solved;
  - many other examples exist beyond those shown here;
  - please go look for these effects in your data!
Acknowledgements
Pat Brady, Darren Green, Stephen Pickett, Sunny Hung, Subhas Chakravorty, Nicola Richmond, Jesus Herranz, Gonzalo Colmeranjo-Sanchez
…and numerous others who contributed to programs run by GSK 2004-2011…..
Tony Jurewicz, Glenn Hofmann, Stan Martens, Jeff Gross
Snehal Bhatt, Stuart Baddeley, James Chan, Sue Crimmin, Emilio Diez, Maite De Los Frailes, Bob Hertzberg, Deb Jaworski, Ricardo Macarron, Carl Machutta, Julio Martin-Plaza, Barry Morgan, Juan Antonio Mostacero, Dave Morris, Dwight Morrow, Mehul Patel, Amy Quinn, Geoff Quinique, Mike Schaber, Zining Wu, Ana Roa, and colleagues…
Screening & Compound Profiling