Use of the False Discovery Rate for Evaluating Clinical Safety Data
Joseph F. Heyse Devan V. Mehrotra
Clinical Biostatistics – Vaccines, Merck Research Laboratories
Blue Bell, PA
Third International Conference on Multiple Comparisons, Bethesda, MD, August 6, 2002
Heyse/MCP2002 bl 2
Acknowledgment
This research was in collaboration with the late Professor John Tukey (Princeton University).
Outline
Motivating example
Multiplicity issues
FWER and FDR
Proposal for flagging AEs
Summary of three examples
Concluding remarks
Introduction
Evaluation of safety is an important part of clinical trials of pharmaceutical and biological products.
Adverse experiences (AEs) can be categorized into three types:
– Tier 1: associated with specific hypotheses
– Tier 2: the set encountered as part of the trial safety evaluation
– Tier 3: rare spontaneous reports of serious events that require clinical evaluation
Our interest is primarily in Tier 2 AEs.
ICH Recommendations
ICH-E9 recommends descriptive statistical methods supplemented by confidence intervals
p-values useful to evaluate a specific difference of interest
If hypothesis tests are used, statistical adjustments for multiplicity to quantitate the Type I error are appropriate, but the Type II error is usually of more concern
p-values sometimes useful as a “flagging” device applied to a large number of safety variables to highlight differences worthy of further attention
Illustration: Multiplicity in Safety Assessment
A clinical trial compared the safety and immunogenicity of the combination vaccine COMVAX™* to its monovalent components.
1 of 92 safety comparisons revealed a higher rate of unusual high-pitched crying (UHPC) following the second dose of a three-dose series (6.7% vs. 2.3%, p=0.016).
No medical rationale for this finding was discovered, and a larger hypothesis-driven study was designed.
Comparable rates were observed following vaccination in this larger trial.
*COMVAX™ is a combination of Haemophilus b conjugate (HIB) and hepatitis B (HB) vaccines.
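A quick calculation shows why one nominally significant result among 92 comparisons is weak evidence on its own. The Python sketch below assumes, purely for illustration, that the 92 comparisons are independent and that none of them reflects a real difference; actual AE comparisons are correlated, so the numbers are only indicative.

```python
# Back-of-the-envelope multiplicity arithmetic for the COMVAX illustration.
# Simplifying assumption (not from the trial): the 92 comparisons are
# independent and every null hypothesis is true.
m = 92          # number of safety comparisons
alpha = 0.05    # unadjusted significance level per comparison

expected_false_positives = m * alpha          # E[# of p-values below alpha]
prob_at_least_one = 1 - (1 - alpha) ** m      # Pr(at least one p-value below alpha)

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"Pr(at least one p < {alpha}): {prob_at_least_one:.3f}")
```

Under these assumptions it prints about 4.6 expected false positives and a probability of about 0.991 that at least one comparison reaches p < 0.05.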
Motivating Example (MMRV* Vaccine)
Safety and immunogenicity vaccine trial.
Study population: healthy toddlers, 12-18 months of age
Group 1 = MMRV + PedvaxHIB on Day 0
Group 2 = MMR + PedvaxHIB on Day 0, followed by (optional) varicella vaccine on Day 42
*MMRV is a combination measles, mumps, rubella, varicella vaccine
Motivating Example (cont’d)
Safety follow-up (local and systemic reactions):
Group 1: Day 0-42 (N=148)
Group 2: Day 0-42 (N=148) and Day 42-84 (N=132)
Question: Is the safety profile different if the varicella component is given as part of a combination vaccine on Day 0 compared with giving it 6 weeks later as a monovalent vaccine?
AEs: Group 1 (Day 0-42) vs. Group 2 (Day 42-84)
Clinical AE Counts (“Tier 2” AEs)
# | BS | Adverse Experience | Grp 1 X1 (N1=148) | Grp 2 X2 (N2=132) | Diff (%) | p-value
1 | 01 | ASTHENIA / FATIGUE | 57 | 40 | 8.2 | .1673
2 | 01 | FEVER | 34 | 26 | 3.3 | .5606
3 | 01 | INFECTION, FUNGAL | 2 | 0 | 1.4 | .4998
4 | 01 | INFECTION, VIRAL | 3 | 1 | 1.3 | .6248
5 | 01 | MALAISE | 27 | 20 | 3.1 | .5248
6 | 03 | ANOREXIA | 7 | 2 | 3.2 | .1791
7 | 03 | CANDIDIASIS, ORAL | 2 | 0 | 1.4 | .4998
8 | 03 | CONSTIPATION | 2 | 0 | 1.4 | .4998
9 | 03 | DIARRHEA | 24 | 10 | 8.6 | .0289*
10 | 03 | GASTROENTERITIS, INFECTIOUS | 3 | 1 | 1.3 | .6248
11 | 03 | NAUSEA | 2 | 7 | -4.0 | .0889
12 | 03 | VOMITING | 19 | 19 | -1.6 | .7295
13 | 05 | LYMPHADENOPATHY | 3 | 2 | 0.5 | 1.0000
Clinical AE Counts (“Tier 2” AEs) - cont’d
# | BS | Adverse Experience | Grp 1 X1 (N1=148) | Grp 2 X2 (N2=132) | Diff (%) | p-value
14 | 06 | DEHYDRATION | 0 | 2 | -1.5 | .2214
15 | 08 | CRYING | 2 | 0 | 1.4 | .4998
16 | 08 | INSOMNIA | 2 | 2 | -0.2 | 1.0000
17 | 08 | IRRITABILITY | 75 | 43 | 18.1 | .0025*
18 | 09 | BRONCHITIS | 4 | 1 | 1.9 | .3746
19 | 09 | CONGESTION, NASAL | 4 | 2 | 1.2 | .6872
20 | 09 | CONGESTION, RESPIRATORY | 1 | 2 | -0.8 | .6033
21 | 09 | COUGH | 13 | 8 | 2.7 | .4969
22 | 09 | INFECTION, RESPIRATORY, UPPER | 28 | 20 | 3.8 | .4308
23 | 09 | LARYNGOTRACHEOBRONCHITIS | 2 | 1 | 0.6 | 1.0000
24 | 09 | PHARYNGITIS | 13 | 8 | 2.7 | .4969
25 | 09 | RHINORRHEA | 15 | 14 | -0.5 | 1.0000
26 | 09 | SINUSITIS | 3 | 1 | 1.3 | .6248
Clinical AE Counts (“Tier 2” AEs) - cont’d
# | BS | Adverse Experience | Grp 1 X1 (N1=148) | Grp 2 X2 (N2=132) | Diff (%) | p-value
27 | 09 | TONSILLITIS | 2 | 1 | 0.6 | 1.0000
28 | 09 | WHEEZING | 3 | 1 | 1.3 | .6248
29 | 10 | BITE/STING, NON-VENOMOUS | 4 | 0 | 2.7 | .1248
30 | 10 | ECZEMA | 2 | 0 | 1.4 | .4998
31 | 10 | PRURITUS | 2 | 1 | 0.6 | 1.0000
32 | 10 | RASH | 13 | 3 | 6.5 | .0209*
33 | 10 | RASH, DIAPER | 6 | 2 | 2.5 | .2885
34 | 10 | RASH, MEASLES/RUBELLA-LIKE | 8 | 1 | 4.6 | .0388*
35 | 10 | RASH, VARICELLA-LIKE | 4 | 2 | 1.2 | .6872
36 | 10 | URTICARIA | 0 | 2 | -1.5 | .2214
37 | 10 | VIRAL EXANTHEMA | 1 | 2 | -0.8 | .6033
38 | 11 | CONJUNCTIVITIS | 0 | 2 | -1.5 | .2214
39 | 11 | OTITIS MEDIA | 18 | 14 | 1.6 | .7109
40 | 11 | OTORRHEA | 2 | 1 | 0.6 | 1.0000
Multiplicity Issues - The Problem
Potential for too many false positive safety findings if the multiplicity problem is ignored (for “Tier 2” AEs).
This can muddy the interpretation of the safety profile of the vaccine/drug.
Multiplicity Issues - The Challenge
To develop a procedure for tackling multiplicity that:
Provides a proper balance between “no adjustment” and “too much adjustment”.
Is easy to automate/implement.
Familywise Error Rate (FWER)
Let $F = \{H_1, H_2, \ldots, H_m\}$ denote a family of $m$ hypotheses.
$\mathrm{FWER} = \Pr(\text{any true } H_i \in F \text{ is rejected})$.
We usually seek methods for which $\mathrm{FWER} \le \alpha$.
Benjamini & Hochberg (1995) argue that, in certain settings, requiring control of the FWER is often too conservative. They suggest controlling the “false discovery rate” instead, as a more powerful alternative.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300.
False Discovery Rate (FDR) (Benjamini & Hochberg)
$\mathrm{FDR} = E\left(\frac{V}{R}\right)$ = expected proportion of rejected null hypotheses which are incorrectly rejected. Define $V/R = 0$ if $R = 0$.
Hypotheses | Declared Insignificant | Declared Significant | Total
# of true $H_i$ | $U$ | $V$ | $m_0$
# of false $H_i$ | $T$ | $S$ | $m - m_0$
Total | $m - R$ | $R$ | $m$
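The definition can be made concrete by computing the realized proportion V/R for one set of decisions and averaging it over repetitions. This is a minimal Python sketch; the function names are hypothetical and the example decisions are made up.

```python
from typing import Iterable, Sequence, Tuple


def false_discovery_proportion(rejected: Sequence[bool], true_null: Sequence[bool]) -> float:
    """Realized V/R for one experiment, with V/R defined as 0 when R = 0."""
    r = sum(rejected)                                                      # R: declared significant
    v = sum(1 for rej, null in zip(rejected, true_null) if rej and null)  # V: true nulls rejected
    return v / r if r > 0 else 0.0


def estimated_fdr(experiments: Iterable[Tuple[Sequence[bool], Sequence[bool]]]) -> float:
    """Monte Carlo estimate of FDR = E(V/R): average V/R over repeated experiments."""
    props = [false_discovery_proportion(rej, null) for rej, null in experiments]
    return sum(props) / len(props)


# Four hypotheses; only H1 is a true null. H1, H2 and H4 are rejected, so V = 1 and R = 3.
print(false_discovery_proportion([True, True, False, True], [True, False, False, False]))
```

When every null hypothesis is true ($m_0 = m$), each rejection is false, so V/R equals 1 whenever anything is rejected and 0 otherwise; the FDR then reduces to Pr(V ≥ 1), which is the FWER.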
False Discovery Rate (FDR) (cont’d)
(Benjamini & Hochberg)
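$\mathrm{FDR} \le \mathrm{FWER}$, with equality if $m = m_0$. The effect of correlations on the FDR is an area of research.
Procedure: order the p-values $p_{(1)} \le p_{(2)} \le \cdots \le p_{(m)}$. Let $j$ be the largest index for which $p_{(j)} \le \frac{j}{m}\alpha$, and reject $H_{(1)}, H_{(2)}, \ldots, H_{(j)}$. This controls the FDR at $\frac{m_0}{m}\alpha \le \alpha$.
Adjusted p-values: $\tilde p_{(m)} = p_{(m)}$, and $\tilde p_{(j)} = \min\left(\tilde p_{(j+1)}, \frac{m}{j}\, p_{(j)}\right)$ for $j = m-1, \ldots, 1$.
Example
Unadjusted p-values: .0193 .0280 .2038 .4941
FDR-adjusted p-values: .0560 .0560 .2718 .4941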
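The adjusted p-value recursion above translates directly into code. The following Python sketch (the name `bh_adjust` is ours, not from the slides) computes the step-up adjusted values and returns them in the original order.

```python
from typing import List, Sequence


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) adjusted p-values, returned in the input order.

    With the p-values sorted as p_(1) <= ... <= p_(m):
        adj_(m) = p_(m)
        adj_(j) = min(adj_(j+1), (m / j) * p_(j)),  j = m-1, ..., 1
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # input indices, ascending p
    adjusted = [0.0] * m
    running_min = 1.0                                   # adjusted values never exceed 1
    for rank in range(m, 0, -1):                        # rank j runs from m down to 1
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


print([round(p, 4) for p in bh_adjust([0.0193, 0.0280, 0.2038, 0.4941])])
```

It prints [0.056, 0.056, 0.2717, 0.4941]. The slide shows .2718 for the third value; 4/3 × .2038 = .27173, so the last digit presumably reflects rounding of the displayed inputs. Applied to the eight body-system minima on the First Level FDR Adjustment slide, the same function returns .0200, .0771, .0771, .2952, .2952, .2952, .4281, and 1.0000.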
Proposal for Flagging AEs
We routinely summarize AEs by body system (BS).
$s$ body systems ($i = 1, 2, \ldots, s$)
$k_i$ AEs associated with body system $i$
$p_{ij}$ = between-group p-value for the $j$th AE within the $i$th BS (e.g., based on a two-tailed Fisher’s exact test)
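For a single AE, $p_{ij}$ compares $x_1/N_1$ with $x_2/N_2$. A self-contained Python sketch of a two-sided Fisher’s exact test follows; it uses the common convention of summing the probabilities of all tables no more likely than the observed one, which is an assumption about how the tabulated p-values were computed. (SciPy users could call `scipy.stats.fisher_exact` instead.)

```python
from math import comb


def fisher_exact_two_sided(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided Fisher's exact p-value comparing x1/n1 with x2/n2.

    Conditional on the total t = x1 + x2, the group-1 count is hypergeometric
    under H0. The two-sided p-value sums the probabilities of every table that
    is no more likely than the observed one.
    """
    t = x1 + x2
    denom = comb(n1 + n2, t)
    lo, hi = max(0, t - n2), min(t, n1)       # feasible group-1 counts
    probs = {k: comb(n1, k) * comb(n2, t - k) / denom for k in range(lo, hi + 1)}
    cutoff = probs[x1] * (1 + 1e-7)           # relative tolerance for floating-point ties
    return min(1.0, sum(p for p in probs.values() if p <= cutoff))


# Row 3 of the AE table: INFECTION, FUNGAL, 2/148 vs 0/132 (tabulated p-value .4998)
print(round(fisher_exact_two_sided(2, 148, 0, 132), 4))
```

It prints 0.4998, matching row 3; the same function gives .2214 for 0/148 vs 2/132 and .6033 for 1/148 vs 2/132, matching rows 14 and 20.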
Proposal for Flagging AEs (cont’d)
Step 1: Ignore AEs for which the total incidence is so low that a rejection even at the unadjusted 0.05 level is impossible.
Step 2: Among the remaining AEs, flag those for which the p-value achieves statistical significance after adjusting for multiplicity using a “Double FDR” approach.
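Step 1 depends only on the total number of subjects reporting an AE, because Fisher’s exact test conditions on that total. One way to implement the screen, sketched below in Python under that assumption, is to compute the smallest two-sided p-value attainable for a given total and drop the AE if it exceeds 0.05. The helper names are hypothetical.

```python
from math import comb


def min_attainable_p(total: int, n1: int, n2: int) -> float:
    """Smallest two-sided Fisher p-value possible when `total` subjects report the AE.

    For each feasible split of `total` between the groups, the two-sided p-value
    is the sum of all split probabilities no larger than that split's probability;
    the minimum over splits is the best case the data could ever produce.
    """
    denom = comb(n1 + n2, total)
    lo, hi = max(0, total - n2), min(total, n1)
    probs = [comb(n1, k) * comb(n2, total - k) / denom for k in range(lo, hi + 1)]
    return min(sum(q for q in probs if q <= p * (1 + 1e-7)) for p in probs)


def passes_step1(x1: int, x2: int, n1: int, n2: int, alpha: float = 0.05) -> bool:
    """Step 1 screen: keep the AE only if p <= alpha is attainable given x1 + x2."""
    return min_attainable_p(x1 + x2, n1, n2) <= alpha


# MMRV group sizes from the AE tables: N1 = 148, N2 = 132.
for total in range(1, 7):
    print(total, round(min_attainable_p(total, 148, 132), 4))
```

With N1 = 148 and N2 = 132, the smallest attainable p-value is about .1035 for a total of 3 and about .0482 for a total of 4, so in this sketch AEs reported by three or fewer subjects overall would be ignored. An AE such as INFECTION, VIRAL (3 vs. 1, p = .6248) still passes the screen, since a 0 vs. 4 split of the same total could have been significant.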
Double FDR Approach
Define $p^*_i = \min(p_{i1}, p_{i2}, \ldots, p_{ik_i})$, $1 \le i \le s$. This represents the strongest safety “signal” for body system $i$.
1st level FDR adjustment
– Apply the FDR adjustment to $p^*_1, p^*_2, \ldots, p^*_s$.
– Let $\tilde p^*_i$ = FDR-adjusted $p^*_i$.
2nd level FDR adjustment
– Within body system $i$, apply the FDR adjustment to $p_{i1}, p_{i2}, \ldots, p_{ik_i}$.
– Let $\tilde p_{ij}$ = FDR-adjusted $p_{ij}$.
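The two levels of adjustment can be sketched in Python as follows, using the MMRV p-values from the AE tables grouped by the BS column. `double_fdr` is a hypothetical helper name, and, as on the First Level FDR Adjustment slide, all 40 AEs are included here (no Step 1 screening).

```python
from typing import Dict, List, Sequence, Tuple


def bh_adjust(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def double_fdr(p_by_bs: Dict[str, Sequence[float]]) -> Tuple[Dict[str, float], Dict[str, List[float]]]:
    """Return (first-level adjusted p*_i, second-level adjusted p_ij) per body system."""
    systems = list(p_by_bs)
    p_star = [min(p_by_bs[bs]) for bs in systems]             # p*_i: strongest signal in BS i
    first = dict(zip(systems, bh_adjust(p_star)))              # 1st level: across body systems
    second = {bs: bh_adjust(p_by_bs[bs]) for bs in systems}    # 2nd level: within each BS
    return first, second


# Unadjusted p-values of all 40 MMRV AEs (AE tables above), grouped by body system.
mmrv = {
    "01 Body site unspecified": [.1673, .5606, .4998, .6248, .5248],
    "03 Digestive": [.1791, .4998, .4998, .0289, .6248, .0889, .7295],
    "05 Hematologic/lymphatic": [1.0],
    "06 Metabolic/immune": [.2214],
    "08 Nervous system": [.4998, 1.0, .0025],          # crying, insomnia, irritability
    "09 Respiratory": [.3746, .6872, .6033, .4969, .4308, 1.0, .4969, 1.0, .6248, 1.0, .6248],
    "10 Skin": [.1248, .4998, 1.0, .0209, .2885, .0388, .6872, .2214, .6033],
    "11 Special senses": [.2214, .7109, 1.0],
}
first, second = double_fdr(mmrv)
print({bs: round(p, 4) for bs, p in first.items()})
print([round(p, 4) for p in second["08 Nervous system"]])
```

The first dictionary reproduces the FDR-adjusted values on the First Level FDR Adjustment slide (e.g., .0200 for the nervous system and .0771 for skin and digestive), and the nervous-system second-level values come out as .7497, 1.0000, and .0075 for crying, insomnia, and irritability, as on the Second Level slide.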
Double FDR Approach (cont’d)
Proposed Flagging Rule
Flag AE($i$,$j$) if $\tilde p^*_i \le \alpha_1$ and $\tilde p_{ij} \le \alpha_2$.
What values of $\alpha_1$ and $\alpha_2$ should we use?
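The flagging rule is a simple conjunction of the two adjusted values. This Python sketch (hypothetical function name) applies it to the adjusted MMRV values reported on the First and Second Level FDR Adjustment slides; only the nervous-system second-level values are reported there, so the other body systems carry none.

```python
from typing import Dict, List, Tuple


def dfdr_flags(first_level: Dict[str, float],
               second_level: Dict[str, Dict[str, float]],
               alpha1: float, alpha2: float) -> List[Tuple[str, str]]:
    """Flag AE (i, j) when the adjusted p*_i <= alpha1 and the adjusted p_ij <= alpha2."""
    flags = []
    for bs, p_star_adj in first_level.items():
        if p_star_adj > alpha1:
            continue                               # body system i fails the 1st-level screen
        for ae, p_adj in second_level.get(bs, {}).items():
            if p_adj <= alpha2:
                flags.append((bs, ae))
    return flags


# FDR-adjusted values from the MMRV First/Second Level FDR Adjustment slides.
first_level = {
    "Nervous system": 0.0200, "Skin": 0.0771, "Digestive system": 0.0771,
    "Body site unspecified": 0.2952, "Special senses": 0.2952,
    "Metabolic / immune": 0.2952, "Respiratory": 0.4281,
    "Hematologic and lymphatic": 1.0000,
}
# Only the nervous-system second-level values are shown on the slides.
second_level = {
    "Nervous system": {"Irritability": 0.0075, "Crying": 0.7497, "Insomnia": 1.0000},
}
print(dfdr_flags(first_level, second_level, alpha1=0.05, alpha2=0.10))
```

With $\alpha_1 = .05$ and $\alpha_2 = .10$ it prints [('Nervous system', 'Irritability')]. Only the nervous system passes the first level at $\alpha_1 = .05$, so the missing second-level values do not matter here; for a larger $\alpha_1$, the skin and digestive second-level values would have to be supplied, because the sketch silently skips body systems without them.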
Choosing $\alpha_1$ and $\alpha_2$
Set $\alpha_2 = \alpha$ and use either (a) or (b) below for $\alpha_1$.
(a) Use resampling (non-parametric bootstrap) to determine the largest data-dependent $\alpha_1$ ($\le \alpha_2$) that ensures $\mathrm{FDR} \le \alpha$.
OR
(b) Choose $\alpha_1$ ($\le \alpha_2$) independent of the data. For example, let $\alpha_1 = \alpha_2/2$ or $\alpha_1 = \alpha_2$, and estimate the resulting FDR using resampling.
Resampling Procedure
Purpose
– To estimate the false discovery rates of the following:
NOADJ: No multiplicity adjustment; flag AE if unadjusted p < .05
FULLFDR($\alpha$): Full FDR adjustment (ignore BS grouping)
DFDR($\alpha_1$, $\alpha_2$): Double FDR adjustment for selected ($\alpha_1$, $\alpha_2$)
– To determine the largest $\alpha_1$ ($\le \alpha_2$) that guarantees $\mathrm{FDR} \le \alpha$ when using DFDR($\alpha_1$, $\alpha_2$).
Resampling Procedure (cont’d)
Details
1. Pool data from both treatment groups into a common population. Sample with replacement from this common population to simulate many repetitions of the original trial. This procedure:
a) simulates a true null situation (Group 1 = Group 2);
b) preserves the correlation structure of the original data.
2. Implement our proposal for flagging AEs using the NOADJ, FULLFDR($\alpha$), and DFDR($\alpha_1$, $\alpha_2$) approaches, and calculate the corresponding FDRs.
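The resampling steps can be sketched in Python as below. This is an illustration under several assumptions: the subject-level data are hypothetical (five AEs with made-up incidence rates, generated independently), Step 1 screening and the DFDR variant are omitted for brevity, and all function names are ours. Because each simulated trial is null, every flag is a false discovery, so the estimated FDR is the proportion of simulated trials with at least one flag.

```python
import random
from math import comb
from typing import Dict, List, Sequence


def fisher_two_sided(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided Fisher's exact p-value (sum of table probabilities <= observed)."""
    t = x1 + x2
    denom = comb(n1 + n2, t)
    probs = [comb(n1, k) * comb(n2, t - k) / denom
             for k in range(max(0, t - n2), min(t, n1) + 1)]
    observed = comb(n1, x1) * comb(n2, x2) / denom
    return min(1.0, sum(p for p in probs if p <= observed * (1 + 1e-7)))


def bh_adjust(p: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    out, running_min = [0.0] * m, 1.0
    for rank in range(m, 0, -1):
        running_min = min(running_min, p[order[rank - 1]] * m / rank)
        out[order[rank - 1]] = running_min
    return out


def resample_fdr(pooled: List[List[int]], n1: int, n2: int, n_sims: int = 200,
                 noadj_alpha: float = 0.05, fdr_alpha: float = 0.10,
                 seed: int = 1) -> Dict[str, float]:
    """Estimate the FDR of NOADJ and FULLFDR(fdr_alpha) under a simulated null.

    pooled: one 0/1 AE-indicator vector per subject, both groups combined.
    Whole subjects are resampled, so within-subject AE correlation is kept.
    """
    rng = random.Random(seed)
    n_aes = len(pooled[0])
    hits = {"NOADJ": 0, "FULLFDR": 0}
    for _ in range(n_sims):
        g1 = rng.choices(pooled, k=n1)        # sample with replacement for group 1
        g2 = rng.choices(pooled, k=n2)        # ... and for group 2
        p = [fisher_two_sided(sum(s[j] for s in g1), n1, sum(s[j] for s in g2), n2)
             for j in range(n_aes)]
        hits["NOADJ"] += any(pj < noadj_alpha for pj in p)
        hits["FULLFDR"] += any(q <= fdr_alpha for q in bh_adjust(p))
    return {name: count / n_sims for name, count in hits.items()}


# Hypothetical pooled data: 280 subjects, 5 AEs with made-up incidence rates.
data_rng = random.Random(0)
rates = [0.30, 0.20, 0.10, 0.05, 0.02]
pooled = [[int(data_rng.random() < r) for r in rates] for _ in range(280)]
print(resample_fdr(pooled, n1=148, n2=132))
```

DFDR($\alpha_1$, $\alpha_2$) would slot into the same loop by grouping the p-values by body system before applying the two-level rule. Resampling whole subjects, rather than each AE separately, is what keeps the within-subject correlation among AEs.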
MMRV Example - Resampling Results
Y = # of incorrectly flagged AEs*
Distribution of Y (%)
Method | Y = 0 | Y = 1 | Y = 2 | Y = 3 | FDR (%)
NOADJ | 48.8 | 33.0 | 12.9 | 5.3 | 51.2
FULLFDR(.10) | 95.2 | 4.0 | 0.6 | 0.2 | 4.8
DFDR(.02, .05) | 97.0 | 2.5 | 0.4 | 0.1 | 3.0
DFDR(.05, .05) | 91.2 | 7.3 | 1.1 | 0.4 | 8.8
DFDR(.05, .10) | 90.9 | 6.4 | 1.9 | 0.8 | 9.1
DFDR(.10, .10) | 79.8 | 13.0 | 5.2 | 2.0 | 20.2
* out of 40; 2000 simulations
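Because every resampled trial is a true null, the FDR column can be recovered from the distribution of Y: each row of percentages sums to 100, and V/R is 1 whenever Y ≥ 1. The short Python check below recomputes the FDR column from the transcribed percentages.

```python
# Distribution of Y (% of 2000 simulated null trials), transcribed from the MMRV table.
dist_y = {
    "NOADJ":         [48.8, 33.0, 12.9, 5.3],
    "FULLFDR(.10)":  [95.2, 4.0, 0.6, 0.2],
    "DFDR(.02,.05)": [97.0, 2.5, 0.4, 0.1],
    "DFDR(.05,.05)": [91.2, 7.3, 1.1, 0.4],
    "DFDR(.05,.10)": [90.9, 6.4, 1.9, 0.8],
    "DFDR(.10,.10)": [79.8, 13.0, 5.2, 2.0],
}

# In a simulated null trial every flagged AE is a false discovery, so V/R = 1 whenever
# Y >= 1 and 0 otherwise; the FDR is therefore Pr(Y >= 1) = 100% - Pr(Y = 0).
fdr_pct = {method: round(100.0 - pct[0], 1) for method, pct in dist_y.items()}
for method, fdr in fdr_pct.items():
    print(f"{method:14s} FDR = {fdr:.1f}%")
```

It reproduces the FDR column exactly (51.2, 4.8, 3.0, 8.8, 9.1, 20.2).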
MMRV Example - Resampling Results
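DFDR($\alpha_1$, $\alpha_2$): Estimated FDR (%)
$\alpha_1$ | $\alpha_2$ = 0.05 | $\alpha_2$ = 0.10 | $\alpha_2$ = 0.15
0.01 | 1.45 | 1.45 | 1.45
0.02 | 3.00 | 3.00 | 3.00
0.03 | 4.70 | 4.70 | 4.70
0.04 | 7.10 | 7.15 | 7.15
0.05 | 8.80 | 9.15 | 9.15
0.06 | | 11.70 | 11.70
0.07 | | 13.65 | 13.70
0.08 | | 16.35 | 16.50
0.09 | | 18.85 | 19.25
0.10 | | 20.25 | 21.30
0.11 | | | 24.25
0.12 | | | 25.60
0.13 | | | 27.75
0.14 | | | 29.90
0.15 | | | 31.25
Max. acceptable FDR ($\alpha$) | 5% | 10% | 15%
($\alpha_1$, $\alpha_2 = \alpha$) | (.03, .05) | (.05, .10) | (.07, .15)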
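Option (a) for choosing $\alpha_1$ amounts to a lookup in this grid: set $\alpha_2 = \alpha$ and take the largest $\alpha_1$ whose estimated FDR does not exceed $\alpha$. A Python sketch with the grid values transcribed from the slide (the names `GRID` and `largest_alpha1` are hypothetical):

```python
from typing import Dict, Optional

# Estimated FDR (%) of DFDR(alpha1, alpha2) for the MMRV example: GRID[alpha2][alpha1].
# Only alpha1 <= alpha2 was evaluated.
GRID: Dict[float, Dict[float, float]] = {
    0.05: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.10, 0.05: 8.80},
    0.10: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.15, 0.05: 9.15,
           0.06: 11.70, 0.07: 13.65, 0.08: 16.35, 0.09: 18.85, 0.10: 20.25},
    0.15: {0.01: 1.45, 0.02: 3.00, 0.03: 4.70, 0.04: 7.15, 0.05: 9.15,
           0.06: 11.70, 0.07: 13.70, 0.08: 16.50, 0.09: 19.25, 0.10: 21.30,
           0.11: 24.25, 0.12: 25.60, 0.13: 27.75, 0.14: 29.90, 0.15: 31.25},
}


def largest_alpha1(alpha: float, max_fdr_pct: float) -> Optional[float]:
    """Option (a): with alpha2 = alpha, pick the largest alpha1 whose estimated FDR <= max."""
    column = GRID[alpha]
    ok = [a1 for a1, fdr in column.items() if fdr <= max_fdr_pct]
    return max(ok) if ok else None


for alpha, pct in [(0.05, 5), (0.10, 10), (0.15, 15)]:
    print(alpha, largest_alpha1(alpha, pct))
```

It prints 0.03, 0.05, and 0.07 for the three maximum acceptable FDRs, matching the (.03, .05), (.05, .10), and (.07, .15) choices in the last row of the grid.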
First Level FDR Adjustment
Body System ID | Number of AE Types | Unadjusted p-value | FDR-Adjusted p-value
Nervous system | 3 | 0.0025 | 0.0200
Skin | 9 | 0.0209 | 0.0771
Digestive system | 7 | 0.0289 | 0.0771
Body site unspecified | 5 | 0.1673 | 0.2952
Special senses | 3 | 0.2214 | 0.2952
Metabolic / immune | 1 | 0.2214 | 0.2952
Respiratory | 11 | 0.3746 | 0.4281
Hematologic and lymphatic | 1 | 1.0000 | 1.0000
Second Level FDR Adjustment
Body System 08: Nervous System and Psychiatric
Adverse Experience | Unadjusted p-value | FDR-Adjusted p-value
Irritability | 0.0025 | 0.0075
Crying | 0.4998 | 0.7497
Insomnia | 1.0000 | 1.0000
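Summary of Three Examples: Flagged AEs
Trial (# of subjects) | # of AEs | No Multiplicity Adjustment | DFDR Adjustment, max. FDR 15% | DFDR Adjustment, max. FDR 10% | DFDR Adjustment, max. FDR 5%
PedvaxHIB (N=681) | 15 | FDR ~ 43%: Irritability, Upper Resp. Inf., Rash | $\alpha_1$=.07, $\alpha$=.15: Irritability, Upper Resp. Inf. | $\alpha_1$=.05, $\alpha$=.10: none | $\alpha_1$=.02, $\alpha$=.05: none
MMRV (N=280) | 40 | FDR ~ 51%: Irritability, Rash, M/R-like rash, Diarrhea | $\alpha_1$=.07, $\alpha$=.15: Irritability, Rash, M/R-like rash, Diarrhea | $\alpha_1$=.05, $\alpha$=.10: Irritability | Irritability
COMVAX (N=811) | 58 | FDR ~ 87%: Erythema, Rash, Rhinorrhea | none | none | none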
Concluding Remarks
The current approach of flagging AEs based on unadjusted p-values (or C.I.s) can result in excessive false-positive safety findings. These can cause undue concern for approval/labeling and can affect post-marketing commitments.
Under our proposal, the unadjusted p-values (or C.I.s) would still be reported. The Double FDR multiplicity adjustment is a method to facilitate the interpretation of the unadjusted p-values.
Concluding Remarks (cont’d)
Our proposal for tackling multiplicity will:
– substantially reduce the percentage of incorrectly flagged AEs.
– be better accepted if described a priori in the protocol/DAP rather than on a post-hoc basis.
– facilitate comparable interpretation of safety results across studies, with respect to Type I error.