

A Novel SAR-Driven Approach for Identifying True High-Throughput Screening Hits

S. Frank Yan, Hayk Asatryan, Jing Li, Kaisheng Chen, and Yingyao Zhou

Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, CA 92121, USA

ChemAxon User Group Meeting, June 2006

Modern drug discovery relies heavily on large-scale high-throughput screening (HTS) to identify potential starting points for medicinal chemistry optimization. The typical “top X” activity cutoff method used to generate hits from large amounts of raw HTS data is intrinsically error-prone because single-dose HTS is noisy, which often leads to a large number of false positives. Here we propose a novel knowledge-based, SAR-driven statistical approach for primary HTS hit generation, using ChemAxon technology for clustering and chemical fingerprints. The method is also implemented with SciTegic Pipeline Pilot. In a proof-of-concept study on an in-house HTS campaign, the new approach proved more effective at identifying confirmed active compounds in diverse chemical scaffolds containing valuable SAR information, as demonstrated by a significantly improved confirmation rate compared to the traditional “top X” cutoff method.

A Proof-of-Concept Study

•HTS data from an internal project were analyzed, with results from secondary experiments serving as the benchmark. The 50,000 most active compounds were selected for analysis (HTS activity < ~0.76).

•Compound clusters and chemical fingerprints were generated using ChemAxon software.

OPI approach

Top X method

Scaffold-based Probability Score Alone Is Sufficient to Prioritize Hits

Confirmation rate for those selected compounds

Significant Structural Diversity in the Selected Hits

Some Scaffolds Picked by OPI

SIDXXXX645

SIDXXX4148 compounds selected, 5/6 confirmed active; mean = 0.05, stdev. = 0.46

SIDXXX5988 compounds selected, 7/7 confirmed active; mean = 0.05, stdev. = 0.18

28 compounds selected, 12/28 confirmed active; mean = 0.11, stdev. = 0.30

57 compounds selected, 31/36 confirmed active; mean = 0.31, stdev. = 0.09

SIDXXXX000

Great Improvement over the traditional “Top X” method

Advantages of OPI Hit-picking

•An individualized activity threshold for every cluster/scaffold instead of a one-size-fits-all cutoff

•Effective in eliminating experimental artifacts (particularly those in the high-activity region)

•Improved hit confirmation rate (85% vs. 55%)

•Hits are inherently analyzed on a cluster/scaffold basis and SAR information can be readily extracted, facilitating the hit-to-lead process

•Some level of library redundancy is required

Ontology-Based Pattern Identification* in Hit Selection

*Novel Statistical Approach for Primary High-Throughput Screening Hit Selection. S. Yan et al. J. Chem. Inf. Model. 45(6), 1784-1790, 2005

In silico gene function prediction using ontology-based pattern identification. Y. Zhou et al. Bioinformatics 21(7), 1237-1245, 2005

Guilt by association Structure–activity relationship

Goal: to automatically determine, for each cluster/scaffold, a subset of compounds that share not only similar structure but also similarly high HTS activity

1. Cluster all tested, QC-ed compounds (>1,000,000) from an HTS campaign and rank them by activity.

2. For a given cluster, select progressively more compounds by lowering the activity cutoff, computing the corresponding hypergeometric P-value at each cutoff.

3. Set the cutoff for this cluster where the P-value reaches its minimum, P0. Member compounds whose activities are higher than the cutoff are selected as potential hits and assigned the score P0.

4. Repeat steps 2 and 3 for all clusters.

5. Rank/select hits based on score P0 and HTS activity.
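The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: cluster assignments are taken as given (the poster derives them with ChemAxon tools), a lower activity value is assumed to mean a more active compound (as the poster's data suggest), the P-value is read as the upper tail of the hypergeometric distribution, and ties in activity are ignored. The names `hypergeom_tail` and `opi_pick` are hypothetical.

```python
from collections import defaultdict
from math import comb


def hypergeom_tail(N, n, m, k):
    """P(at least k cluster members among the m most active of N compounds),
    for a cluster of size n, if membership were unrelated to activity."""
    upper = min(n, m)
    hits = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return hits / comb(N, m)


def opi_pick(activities, clusters):
    """Return {cluster: (P0, cutoff, selected_ids)}.

    Assumption: a lower activity value means a more active compound.
    """
    ranked = sorted(activities, key=activities.get)       # most active first
    N = len(ranked)
    rank = {cid: i for i, cid in enumerate(ranked, start=1)}
    members = defaultdict(list)
    for cid in ranked:                                    # keeps activity order
        members[clusters[cid]].append(cid)

    picks = {}
    for cluster, ids in members.items():
        n = len(ids)
        best = None
        for k, cid in enumerate(ids, start=1):
            m = rank[cid]            # compounds at least as active as this cutoff
            p = hypergeom_tail(N, n, m, k)
            if best is None or p < best[0]:
                best = (p, activities[cid], ids[:k])
        picks[cluster] = best
    return picks


# Toy data (made up for illustration): ids starting with "a" form cluster A.
activities = {"a1": 0.10, "a2": 0.12, "b1": 0.30, "b2": 0.35, "b3": 0.40,
              "b4": 0.50, "b5": 0.60, "b6": 0.70, "a3": 0.80, "b7": 0.90}
clusters = {cid: cid[0].upper() for cid in activities}
p0, cutoff, hits = opi_pick(activities, clusters)["A"]
print(hits, cutoff, round(p0, 4))
```

In this toy set, a1 and a2 are picked for cluster A, while a3 (activity 0.80) is left out because extending the cutoff to include it raises the P-value. Exact binomial coefficients keep the sketch dependency-free; for a screen of realistic size, a library routine such as `scipy.stats.hypergeom.sf` would likely be a better choice.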

N compounds from HTS

A cluster of n compounds

m’

Cluster probability score P0 = min P(N,n,m,m’)

Increasingly select m compounds by lowering the activity cutoff

m’ compounds (P=P0) are selected as potential hits for this compound cluster/scaffold
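The poster does not spell out the form of P(N,n,m,m’). One plausible reading, assumed here rather than taken from the source, is the upper tail of the hypergeometric distribution: the chance that at least m’ of the n cluster members fall among the m most active of N compounds if cluster membership were unrelated to activity.

```latex
P(N, n, m, m') \;=\; \sum_{k=m'}^{\min(n,\,m)}
  \frac{\binom{n}{k}\,\binom{N-n}{m-k}}{\binom{N}{m}},
\qquad
P_0 \;=\; \min_{m}\, P(N, n, m, m')
```

Under this reading, m’ is the number of cluster members among the m compounds selected at a given cutoff, and the minimum is taken over all cutoffs.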

Lower activity, more compounds

Activity values shown in the figure: 0.12, 0.18, 0.23, 0.26, 0.41, 0.50, 0.19

Implementation Using Pipeline Pilot

The Hit-to-Lead Paradigm

Two important milestones that have fundamental, far-reaching effects

Bleicher et al. (2003) Nat. Rev. Drug Discov., 2, 369

“Cherry-Pick” the HTS Hits

A new approach to more effectively select primary hits is urgently needed!

Figure: histogram of # of compounds against activity (low activity to high activity), marked with an arbitrary activity cutoff; ~100 to ~5000.

In many real cases, the confirmation rate is low.
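For contrast, the traditional “top X” selection amounts to a single sort and slice. The sketch below is a minimal illustration, assuming that a lower activity value means a more active compound; the function name `top_x` and the toy data are hypothetical.

```python
def top_x(activities, x):
    """Return the ids of the x most active compounds (lower value = more active)."""
    return sorted(activities, key=activities.get)[:x]


# Toy data, made up for illustration.
activities = {"c1": 0.12, "c2": 0.67, "c3": 0.26, "c4": 0.41, "c5": 0.18}
hits = top_x(activities, 3)
print(hits)
```

The cutoff here depends only on rank, so a noisy singleton and a member of a consistently active scaffold are treated identically.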

The HTS Approach

•Initial HTS campaign: >1,000,000

•Quality control: 1,000,000

•Primary hit selection: 1,000

•Hit validation: 100

Figure: HTS assay activity by compound group, with cutoff markers and the traditional cutoff. Compound groups shown:

•Highly active singletons

•Scaffolds with good activity and good SAR

•Scaffolds with good activity but okay SAR

•Scaffolds with very bad SAR

•Scaffolds with okay activity but good SAR

Annotation: likely a false positive

Valuable SAR Is Immediately Caught for This Scaffold

Imidazopyridine

Legend: selected hits; not selected

Activity values shown in the figure: 0.12, 0.12, 0.16, 0.18, 0.18, 0.19, 0.23, 0.26, 0.41, 0.5, 0.51, 0.65, 0.67