

A Novel SAR-Driven Approach for Identifying True High-Throughput Screening Hits

S. Frank Yan, Hayk Asatryan, Jing Li, Kaisheng Chen, and Yingyao Zhou

Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, CA 92121, USA

ChemAxon User Group Meeting, June 2006

Modern drug discovery relies heavily on large-scale high-throughput screening (HTS) to identify potential starting points for medicinal chemistry optimization. The typical “top X” activity cutoff method used to generate hits from large amounts of raw HTS data is intrinsically error-prone because single-dose HTS is noisy, which often leads to a large number of false positives. Here we propose a novel knowledge-based, SAR-driven statistical approach for primary HTS hit generation, using ChemAxon technology for clustering and chemical fingerprints. The method is also implemented with SciTegic Pipeline Pilot. In a proof-of-concept study on an in-house HTS campaign, the new approach was more effective at identifying confirmed active compounds in diverse chemical scaffolds containing valuable SAR information, as demonstrated by a significantly improved confirmation rate compared with the traditional “top X” cutoff method.

A Proof-of-Concept Study

•HTS data from an internal project were used, with results from secondary experiments as the benchmark. The 50,000 most active compounds were selected for analysis (HTS activity < ~0.76).

•Compound clusters and fingerprints were generated using ChemAxon software.

Scaffold-based Probability Score Alone Is Sufficient to Prioritize Hits

[Figure: confirmation rate for the selected compounds; series: OPI approach, Top X method]

Significant Structural Diversity in the Selected Hits

Some Scaffolds Picked by OPI

SIDXXXX645

SIDXXX414: 8 compounds selected, 5/6 confirmed active; mean = 0.05, stdev. = 0.46

SIDXXX598: 8 compounds selected, 7/7 confirmed active; mean = 0.05, stdev. = 0.18

28 compounds selected, 12/28 confirmed active; mean = 0.11, stdev. = 0.30

57 compounds selected, 31/36 confirmed active; mean = 0.31, stdev. = 0.09

SIDXXXX000

Great Improvement over the traditional “Top X” method

Advantages of OPI Hit-picking

•An individualized activity threshold for every cluster/scaffold instead of a one-size-fits-all cutoff

•Effective in eliminating experimental artifacts (particularly those in the high-activity region)

•Improved hit confirmation rate (85% vs. 55%)

•Hits are inherently analyzed on a cluster/scaffold basis and SAR information can be readily extracted, facilitating the hit-to-lead process

•Some level of library redundancy is required

Ontology-Based Pattern Identification* in Hit Selection

*Novel Statistical Approach for Primary High-Throughput Screening Hit Selection. S. Yan et al. J. Chem. Inf. Model. 45(6), 1784-1790, 2005

In silico gene function prediction using ontology-based pattern identification. Y. Zhou et al. Bioinformatics 21(7), 1237-1245, 2005

Guilt by association Structure–activity relationship

Goal: automatically determine, for each cluster/scaffold, a subset of compounds that share not only similar structure but also similarly high HTS activity.

1. Cluster all tested, QC-ed compounds (>1,000,000) from an HTS campaign and rank them by activity.

2. For a given cluster, select progressively more compounds by lowering the activity cutoff, and compute the corresponding hypergeometric P-value at each cutoff.

3. The cutoff for this cluster is set where the P-value reaches its minimum, P0. Member compounds whose activities are higher than the cutoff are selected as potential hits and assigned the score P0.

4. Repeat steps 2 and 3 for all clusters.

5. Rank/select hits based on score P0 and HTS activity.
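The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' ChemAxon or Pipeline Pilot implementation: the function names (`opi_select`, `hypergeom_tail`), the `ClusterHits` record, and the toy data are all hypothetical. It assumes that a lower readout value means a more active compound (as suggested by the "HTS activity < ~0.76" selection in the proof-of-concept study), and that the P-value is the hypergeometric probability of finding at least as many cluster members among the compounds passing the cutoff as were actually observed.

```python
from bisect import bisect_right
from collections import defaultdict
from math import exp, lgamma
from typing import NamedTuple


class ClusterHits(NamedTuple):
    p0: float      # minimum hypergeometric P-value found for the cluster
    cutoff: float  # activity cutoff at which P0 was reached
    hits: list     # IDs of cluster members at or below that cutoff


def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b)."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)


def hypergeom_tail(N, n, m, k):
    """P(X >= k) when m of N compounds pass the cutoff and n are in the cluster."""
    lo = max(k, 0, m - (N - n))
    hi = min(n, m)
    log_denom = log_comb(N, m)
    total = sum(exp(log_comb(n, i) + log_comb(N - n, m - i) - log_denom)
                for i in range(lo, hi + 1))
    return min(total, 1.0)  # guard against floating-point overshoot


def opi_select(compounds):
    """compounds: iterable of (compound_id, cluster_id, activity) tuples.

    Assumption: a LOWER activity value means a MORE active compound.
    Returns {cluster_id: ClusterHits}.
    """
    compounds = list(compounds)
    all_sorted = sorted(a for _, _, a in compounds)
    N = len(all_sorted)
    clusters = defaultdict(list)
    for cid, cluster, activity in compounds:
        clusters[cluster].append((activity, cid))

    results = {}
    for cluster, members in clusters.items():
        members.sort()  # most active member first
        n = len(members)
        best = None
        for idx, (cutoff, _) in enumerate(members):
            # Evaluate a cutoff only once all tied members are included.
            if idx + 1 < n and members[idx + 1][0] == cutoff:
                continue
            m_prime = idx + 1                     # cluster members passing
            m = bisect_right(all_sorted, cutoff)  # all compounds passing
            p = hypergeom_tail(N, n, m, m_prime)
            if best is None or p < best.p0:
                best = ClusterHits(p, cutoff, [c for _, c in members[:m_prime]])
        results[cluster] = best
    return results


if __name__ == "__main__":
    # Hypothetical toy data: (compound_id, cluster_id, activity).
    toy = [("a1", "A", 0.05), ("a2", "A", 0.06), ("a3", "A", 0.07), ("a4", "A", 0.90),
           ("b1", "B", 0.10), ("b2", "B", 0.50), ("b3", "B", 0.60), ("b4", "B", 0.80),
           ("c1", "C", 0.20), ("c2", "C", 0.30), ("c3", "C", 0.40), ("c4", "C", 0.70)]
    # Step 5: rank clusters by their score P0 (smallest first).
    for cluster, r in sorted(opi_select(toy).items(), key=lambda kv: kv[1].p0):
        print(cluster, round(r.p0, 4), r.cutoff, r.hits)
```

In the toy data, cluster A has three tightly grouped, highly active members and one inactive outlier, so its P-value is smallest when exactly those three are selected. Cluster B's members are spread across the activity range and never reach a small P-value. Working in log space with `lgamma` keeps the binomial coefficients manageable when N is in the millions.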

[Schematic: OPI cutoff selection for one cluster. Labels: N compounds from HTS; a cluster of n compounds; m compounds are selected incrementally by lowering the activity cutoff (lower activity, more compounds); cluster probability score P0 = min P(N, n, m, m’); the m’ compounds at P = P0 are selected as potential hits for this compound cluster/scaffold. Example values shown: 0.12, 0.18, 0.23, 0.26, 0.41, 0.50, 0.19]
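The poster does not write out P(N, n, m, m’) explicitly. One plausible reading, offered here as an assumption rather than the authors' stated definition, is the standard hypergeometric tail probability: m is the number of compounds (out of N) that pass the current activity cutoff, and m’ is the number of those that belong to the cluster of n compounds.

```latex
P(N, n, m, m') \;=\; \sum_{k=m'}^{\min(n,\,m)}
  \frac{\binom{n}{k}\,\binom{N-n}{m-k}}{\binom{N}{m}},
\qquad
P_0 \;=\; \min_{\text{cutoff}} \; P(N, n, m, m')
```

Under this reading, a small P0 means the cluster is strongly over-represented among the most active compounds, which is unlikely to happen by chance.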

Implementation Using Pipeline Pilot

The Hit-to-Lead Paradigm

Two important milestones that have fundamental, far-reaching effects

Bleicher et al. (2003) Nat. Rev. Drug Discov., 2, 369

“Cherry-Pick” the HTS Hits

A new approach to more effectively select primary hits is urgently needed!

[Figure: distribution of HTS activity. Axes: activity (low activity to high activity) and # of compounds. Annotations: "An arbitrary activity cutoff"; "~100 to ~5000"]

In many real cases, the confirmation rate is often low

The HTS Approach

Initial HTS campaign: >1,000,000

Quality control: 1,000,000

Primary hit selection: 1,000

Hit validation: 100

[Figure: HTS assay activity by compound group. Compound groups shown: highly active singletons; scaffolds with good activity and good SAR; scaffolds with good activity but okay SAR; scaffolds with very bad SAR; scaffolds with okay activity but good SAR. Annotations: "cutoff"; "traditional cutoff"; "Likely a false positive"]

Valuable SAR Is Immediately Caught for This Scaffold

Imidazopyridine

[Figure: legend: selected hits, not selected. Values shown: 0.12, 0.12, 0.16, 0.18, 0.18, 0.19, 0.23, 0.26, 0.41, 0.5, 0.51, 0.65, 0.67]