Clinical Trial Designs for the Evaluation of Prognostic & Predictive Classifiers Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer Institute http://brb.nci.nih.gov


Validation = Fitness for Intended Use

Intended Uses

Prognostic biomarkers: measured before treatment to indicate long-term outcome for patients who are untreated or not receiving chemotherapy. Used to determine who does not need more treatment.

Predictive biomarkers: measured before treatment to identify who will benefit from a particular treatment.

Early detection biomarkers

Disease progression biomarkers

Prognostic Biomarkers in Oncology

Most gene expression signatures are developed as prognostic biomarkers.

Like numerous previously developed prognostic markers, most will never be used because they have not been demonstrated to be therapeutically relevant.

Most prognostic marker studies are not conducted with an intended use clearly in mind.

Most use a convenience sample of heterogeneous patients for whom tissue is available rather than patients selected for evaluating an intended use.

Prognostic Markers in Oncology

There is rarely attention to analytical validation

There is rarely a separate validation study that addresses medical utility

Without a defined intended use, validation is meaningless and impossible

Prognostic Biomarkers Can Have Medical Utility

Node Negative ER Positive Breast Cancer

Intended use is to identify patients who are likely to be cured by surgery/radiotherapy and hormonal therapy and are therefore unlikely to benefit from adjuvant chemotherapy

Oncotype Dx recurrence score

MammaPrint

Types of Validation

Analytical validation: accuracy in measurement of analyte; robustness and reproducibility

Clinical validation: correlation of score/classifier with clinical state or outcome

Medical utility: actionable; use results in patient benefit

Medical Utility

Benefits patient by improving treatment decisions

Depends on the context of use of the biomarker: treatment options and practice guidelines; other prognostic factors

Clinical validity vs medical utility

A prognostic signature for patients with breast cancer may correlate with outcome, but does it identify a set of patients who have such good outcome without chemotherapy that they do not require treatment?

A prognostic signature for patients with early NSCLC may correlate with outcome, but does it identify a set of patients who have poor outcome untreated and benefit from chemotherapy?

Developmental vs Validation Studies

Developmental studies screen candidate markers to develop biomarker scores or classifiers: train classifiers, optimize tuning parameters, set cut-off values for classification

Developmental studies often use cross-validation or split-sample validation to provide a preliminary estimate of the accuracy of the marker/classifier for predicting a clinical outcome

Developmental studies generally address clinical-validity (i.e. prediction accuracy), not medical utility
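As a minimal sketch of the cross-validation idea described for developmental studies, the Python below runs leave-one-out cross-validation on made-up data. The data, the `fit_cutoff` rule (midpoint between class means), and all function names are illustrative assumptions, not taken from the talk. The point it tries to show is that the cut-off is re-derived inside every fold, so the held-out patient never influences the classifier used to predict that patient.

```python
from statistics import mean

def fit_cutoff(train):
    """Hypothetical training rule: cut-off halfway between the class means."""
    low = mean(score for score, outcome in train if outcome == 0)
    high = mean(score for score, outcome in train if outcome == 1)
    return (low + high) / 2

def loocv_accuracy(data):
    """Leave-one-out CV: refit the cut-off without the held-out patient."""
    correct = 0
    for i, (score, outcome) in enumerate(data):
        train = data[:i] + data[i + 1:]          # everyone except patient i
        cutoff = fit_cutoff(train)               # training step repeated per fold
        predicted = 1 if score > cutoff else 0
        correct += predicted == outcome
    return correct / len(data)

# Made-up marker scores with a binary outcome (1 = event).
toy_data = [(s, 1 if s > 5 else 0) for s in range(1, 11)]
accuracy = loocv_accuracy(toy_data)
```

An estimate of this kind speaks only to prediction accuracy (clinical validity); it says nothing about whether acting on the classifier benefits patients.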

Developmental vs Validation Studies

Validation studies use previously developed, completely specified classifiers/scores

Validation studies should use analytically validated tests and focus on medical utility, not predictive accuracy

This often requires a prospective clinical trial

Marker Strategy Design

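[Figure: marker strategy design schematic; the standard of care (SOC) is chemotherapy]

$\Delta = \pi_{-}\,\Delta_{-}$

where $\pi_{-}$ is the proportion of test-negative patients and $\Delta_{-}$ is the treatment effect in test-negative patients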

Marker Strategy Design

Generally very inefficient because many patients in both randomization groups receive the same treatment

So inefficient as to be an insurmountable roadblock to validation of potentially valuable classifiers

Marker Strategy Design

Sometimes poorly informative

Not measuring the marker in the control group means that the merits of complex marker treatment strategies cannot be dissected

Requires a marker/signature to be used for determining treatment decisions, which may result in inferior outcome to the SOC

Marker Strategy Design

Data is not useful for evaluation of other markers or tests

Provides no information not provided by the test-all design

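Test-All Design

[Figure: test-all design schematic; the standard of care (SOC) is chemotherapy]

For survival data

$\text{events} = 4\,(z_{1-\alpha} + z_{1-\beta})^{2} / \Delta^{2}$

where $\Delta$ = log hazard ratio

For binary data

$\text{patients} \approx 4\,p(1-p)\,(z_{1-\alpha} + z_{1-\beta})^{2} / \Delta^{2}$

where $\Delta$ = difference in proportions and $p$ = proportion under the null hypothesis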

For the stratification design, detecting a 33% reduction in hazard in test-negative patients using C, with $\alpha = .05$, $\beta = .10$:

events = 263

patients = 2630 at a 10% event rate

For the marker strategy design

$\text{events} = 263 / \pi_{-}^{2}$

$\text{patients} = 2630 / \pi_{-}^{2}$

e.g. with $\pi_{-} = .5$, 1052 events and 10,520 patients
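The arithmetic behind these numbers can be sketched in Python using the survival-data formula, events = 4(z + z)^2 / (log hazard ratio)^2. Two things here are assumptions rather than statements from the slides: a 33% reduction is taken as a hazard ratio of 0.67, and the 0.05 significance level is treated as two-sided, which is the reading that reproduces the 263 events quoted. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def events_required(hazard_ratio, alpha=0.05, beta=0.10, two_sided=True):
    """Events needed: 4 * (z_alpha + z_beta)^2 / (log hazard ratio)^2."""
    tail = alpha / 2 if two_sided else alpha     # assumption: two-sided test
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(1 - beta)
    delta = math.log(hazard_ratio)
    return 4 * (z_alpha + z_beta) ** 2 / delta ** 2

def marker_strategy_scale(n, prop_test_negative):
    """Effect is diluted to pi_minus * delta, so n grows by 1 / pi_minus^2."""
    return n / prop_test_negative ** 2

# Stratification design restricted to test-negative patients.
events = math.ceil(events_required(hazard_ratio=0.67))    # 33% reduction
patients = round(events / 0.10)                            # 10% event rate

# Marker strategy design with half of patients test negative.
strategy_events = marker_strategy_scale(events, 0.5)
strategy_patients = marker_strategy_scale(patients, 0.5)
```

Under these assumptions the calculation gives 263 events and 2630 patients for the stratification design, and four times as many for the marker strategy design when half of the patients are test negative.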

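$\text{proportion reduction in hazard} = \dfrac{q_{\text{nochemo}} - q_{\text{chemo}}}{q_{\text{nochemo}}}$

where $q$ denotes failure rate.

For a 33% reduction in hazard:

If $q_{\text{nochemo}} = .06$, then $q_{\text{nochemo}} - q_{\text{chemo}} = .02$

If $q_{\text{nochemo}} = .09$, then $q_{\text{nochemo}} - q_{\text{chemo}} = .03$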
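To make the size of these absolute differences concrete, here is a small Python sketch that converts a proportional reduction into an absolute drop in failure rate. The function name is hypothetical, and the results are rounded to two decimals as on the slide.

```python
def absolute_reduction(q_nochemo, proportional_reduction=0.33):
    """Absolute drop in failure rate implied by a proportional reduction."""
    return q_nochemo * proportional_reduction

# Failure rates without chemotherapy taken from the two cases above.
diffs = {q: round(absolute_reduction(q), 2) for q in (0.06, 0.09)}
```

When the failure rate without chemotherapy is already low, even a one-third proportional reduction amounts to only a few percentage points in absolute terms.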

Targeted (Enrichment) Design

Using phase II data, develop predictor of response to new drug

[Figure: targeted design flow. Develop predictor of response to new drug; patients predicted responsive are randomized to new drug or control; patients predicted non-responsive go off study]

Evaluating the Efficiency of Targeted Design

Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004; Correction and supplement 12:3229, 2006.

Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

Relative efficiency of the targeted design depends on the proportion of patients who are test positive and on the effectiveness of the new drug (compared to control) for test negative patients

When less than half of patients are test positive and the drug has little or no benefit for test negative patients, the targeted design requires dramatically fewer randomized patients than the standard design in which the marker is not used
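A rough feel for this efficiency gain can be had from the Python sketch below. It is a simplification, not the full calculation in the cited papers: it assumes the treatment effect in an unselected trial is the prevalence-weighted average of the effects in test-positive and test-negative patients, and that the required sample size is proportional to 1/effect^2, as in the events formula for survival data. The function and parameter names are hypothetical.

```python
def randomized_ratio(prop_positive, effect_positive, effect_negative):
    """Approximate n_standard / n_targeted, assuming n scales as 1 / effect^2."""
    # Effect seen in an unselected trial: mixture of the two subgroups.
    overall = (prop_positive * effect_positive
               + (1 - prop_positive) * effect_negative)
    return (effect_positive / overall) ** 2

# 25% of patients test positive; effects are in arbitrary units.
no_benefit = randomized_ratio(0.25, 1.0, 0.0)     # drug does nothing in negatives
half_benefit = randomized_ratio(0.25, 1.0, 0.5)   # negatives get half the effect
```

In this sketch the advantage of the targeted design shrinks quickly as the drug's benefit in test-negative patients grows. The ratio counts randomized patients only; the targeted design must still screen roughly `1 / prop_positive` patients for each one randomized.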

Stratification Design for New Drug Development with Companion Diagnostic

Develop prospective analysis plan for evaluation of treatment effect and how it relates to biomarker

Type I error should be protected for multiple comparisons

Trial sized for evaluating treatment effect overall and in subsets defined by test

"Stratifying" (balancing) the randomization may be useful but is not a substitute for a prospective analysis plan.

Fallback Analysis Plan

Compare the new drug to the control overall for all patients, ignoring the classifier.

If $p_{\text{overall}} \le 0.01$, claim effectiveness for the eligible population as a whole

Otherwise perform a single subset analysis evaluating the new drug in the classifier + patients

If $p_{\text{subset}} \le 0.04$, claim effectiveness for the classifier + patients.
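The decision rule of the fallback plan can be written as a short Python sketch. The function name and the returned strings are hypothetical; the two thresholds are the ones stated above, and the p-values are assumed to come from the pre-specified overall and subset tests.

```python
def fallback_analysis(p_overall, p_subset,
                      overall_level=0.01, subset_level=0.04):
    """Apply the two-step fallback rule and return the claim it supports."""
    if p_overall <= overall_level:
        return "effective in all eligible patients"
    # Only reached when the overall test fails: one pre-specified subset test.
    if p_subset <= subset_level:
        return "effective in classifier-positive patients"
    return "no claim"

claim_overall = fallback_analysis(p_overall=0.005, p_subset=0.20)
claim_subset = fallback_analysis(p_overall=0.08, p_subset=0.02)
claim_none = fallback_analysis(p_overall=0.08, p_subset=0.06)
```

Splitting the significance level as 0.01 + 0.04 keeps the chance of any false claim at no more than 0.05 across the two tests.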

In some cases a trial with optimal structure for evaluating a new biomarker will have been previously performed and will have pre-treatment tumor specimens archived

Under certain conditions, a focused analysis based on specimens from the previously conducted clinical trial can provide highly reliable evidence for the medical utility of a prognostic or predictive biomarker

In some cases, it may be the only way of obtaining high level evidence

Prospective-Retrospective Study

Prospective-retrospective design: guidelines proposed by Simon, Paik, Hayes

1. Adequate archived tissue from an appropriately designed phase III clinical trial must be available on a sufficiently large number of patients that the appropriate biomarker analyses have adequate statistical power and that the patients included in the evaluation are clearly representative of the patients in the trial.

2. The test should be analytically and pre-analytically validated for use with archived tissue. Testing should be performed blinded to the clinical data.

3. The analysis plan for the biomarker evaluation should be completely specified in writing prior to the performance of the biomarker assays on archived tissue and should be focused on evaluation of a single completely defined classifier.

4. The results should be validated using specimens from a similar, but separate study involving archived tissues.