On responder analyses when a continuous variable is dichotomized and measurement error is present
-
Upload
michael-kunz -
Category
Documents
-
view
212 -
download
0
Transcript of On responder analyses when a continuous variable is dichotomized and measurement error is present
On responder analyses when a continuous variable is dichotomized
and measurement error is present
Michael Kunz
Bayer Schering Pharma AG, 13342 Berlin, Germany
Received 25 March 2010, revised 23 July 2010, accepted 5 November 2010
In clinical studies results are often reported as proportions of responders, i.e. the proportion ofsubjects who fulfill a certain response criterion is reported, although the underlying variable ofinterest is continuous. In this paper, we consider the situation where a subject is defined as aresponder if the (error-free) continuous measurements post-treatment are below a certain fraction of(error-free) continuous measurements obtained pre-treatment. Focus is on the one-sample case, butan extension to the two-sample case is also presented. The bias of different estimates for the pro-portion of responders is derived and compared. In addition, an asymptotically unbiased ML-typeestimate for the proportion of responders is presented. The results are illustrated using data obtainedin a clinical study investigating pre-menstrual dysphoric disorder (PMDD).
Key words: Bias; Clinical relevance; Design of clinical studies; Responder analyses.
Supporting Information for this article is available from the author or on the WWW underhttp://dx.doi.org/10.1002/bimj.201000069.
1 Introduction
Measurements in clinical studies are often continuous variables. This holds not only e.g. for safetyvariables such as many laboratory variables, blood pressure, heart rate, etc., but also for efficacyvariables like e.g. tumor size in oncology studies or certain psychometric scales which can be treatedas continuous.
When analyzing clinical studies, subjects are often classified as responders or non-responderswhen a response-criterion is fulfilled or not. Often subjects are defined as responders if a continuousvariable of interest is below a certain threshold. One impact of this approach is a loss of statisticalpower, as e.g. reported by Cochran (1968), Fleiss (1986) and Fedorov et al. (2009). Oppenheimerand Kher (1999) show that such a procedure may be problematic as in the presence of measurementerror the resulting canonical non-parametric estimate for the proportion of responders is no longerunbiased in general. The effects of misclassification of data derived from a 2� 2 table on odds anddifferences in proportions have been investigated by e.g. Cochran (1968), Goldberg (1975), andNewell (1964).
In addition, an immediately evident weakness of a classification into responders and non-responders is the necessity to define a threshold. Even if this threshold was defined in advance –what e.g. in ICH E9 (1998) is considered as a prerequisite in the setting of a confirmatory clinicalstudy – it will often be difficult to justify and may turn out to be a matter of subsequent discussionsafter completion of the clinical study. Despite this property, responder analyses have a certainpopularity when illustrating results due to its apparent simplicity. Fedorov et al. (2009) regard
*Corresponding author: e-mail: [email protected]
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim
Biometrical Journal 53 (2011) 1, 137–155 DOI: 10.1002/bimj.201000069 137
responder analyses as a useful tool for reporting final results of clinical trials to a non-statisticalcommunity, while at the same time they recommend to avoid dichotomization on the data analysisstage. However, also among statisticians the role of responder analyses is controversially debated.Senn and Julious (2009) hold a position which is rather critical toward responder analyses. In ananswer to Senn (2003) Lewis (2004) adopts a position which is more in favor of this type of analyses.Issues in the causal interpretation of response in general depend on the sources of variation apartfrom treatment. As pointed out by Senn (2001, 2004), but also Kalow et al. (1998), to address thisquestion properly different designs like e.g. cross-over designs have to be chosen.
In many cases responder analyses are (ab)used to assess clinical relevance of study results after astatistical significant result has been obtained on the continuous variable of interest, refer alsoCPMP ‘Points to consider on Multiplicity Issues in Clinical Trials’ (2002) and CPMP ‘Guideline onthe Choice of the Non-Inferiority Margin’ (2006). For further insight into this topic refer e.g. thework of Kieser et al. (2004) and the references cited there.
As mentioned above, unbiased estimates of proportions of responders are not necessarilyavailable but in addition it is also known that on the subject level a substantial amount of subjects iserroneously classified as a responder, a non-responder, resp. These misclassifications may also resultin misleading further exploratory analyses, when the classification into responders and non-re-sponders is used to define subgroups.
However, responder analyses are nevertheless not only mentioned in the methodological guide-lines ICH E9 (1998), CPMP ‘Points to consider on Multiplicity Issues in Clinical Trials’ (2002), andCPMP ‘Guideline on the Choice of the Non-Inferiority Margin’ (2006), but also in several clinicalEMA guidelines. One example for the latter category of guidelines is the CPMP ‘Note for Guidanceon Clinical Investigation of medicinal products in the Treatment of Epileptic Disorders’ (2001),where a responder is a subject who has a reduction in seizure frequency of at least 50%. Accordingto this guideline, the responder criterion should even be used as the primary variable. An otherexample is the CPMP ‘Note for guidance on Clinical Investigation of Medicinal Products in theTreatment of Depression’ (2002), where for short-term clinical studies the improvement should alsobe expressed as the proportion of responders, where a 50% improvement on the usual rating scalesis accepted as a clinically relevant response. Refer also the CPMP ‘Guideline on Clinical In-vestigation of Medicinal Products used in Weight Control’ (2008), where responders are defined assubjects showing a weight loss of at least 10% at the end of a 12-month treatment period.
Further examples where responder definitions are introduced are the American College ofRheumatology (ACR) definitions of responders in clinical studies investigating rheumatism. Onepart of the ACR20 definition is the requirement that in three out of the five variables of physician’sand patient’s assessment of disease activity, patient’s assessment of pain and physical function, andlevels of an acute-phase reactant at least 20% improvement are present (ACR, 2007). A further areawhere responder definitions are used is the RECIST (Response Evaluation Criteria In Solid Tu-mors) criteria in oncology. According to these criteria a subject is defined as a (partial) responder,when the diameter of the sum of the largest diameter of the target lesions is 30% below the baselinevalue, refer Therasse et al. (2000).
In this paper the situation is considered, in which a subject is defined as a responder, if the (error-free) post-measurement has a value less than or equal to a certain fraction l, 0olo1, of the (error-free) pre-measurement. It is obvious that the pre-measurement and the post-measurement cannot beassumed to be independent. This setting is related to a situation considered by Irwig et al. (1990) inconnection with a work on exposure–response relations in an epidemiological setting.
However, although such responder definitions as mentioned above are used in practice the bias ofcorresponding estimates has to the knowledge of the author not yet been derived and discussed inthe statistical literature. One scope of this paper is to give further elucidation on the possible pitfallsthat may arise when responder definitions of the type mentioned above are introduced. In Section 3,the situation is described where exactly one pre- and exactly one post-measurement are availableand misclassifications on the subject level are addressed. In this section also an extension to the
138 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
two-sample case is given. In Section 4, the more general situation is considered where more than onepre- or more than one post-measurements are available. The bias of several different non-parametric(or better naive) estimates is derived and compared.
In Section 5, as an add-on an asymptotically unbiased parametric estimate for the proportion ofresponders is introduced. In Section 6, the estimates for proportions of responders discussed in thispaper have been applied to data of a clinical study investigating premenstrual dysphoric disorder(PMDD).
2 Notations and model assumptions
Let for treatment t, t5 1, 2, subject i, i5 1, y, n; j5 1, 2; Xijt denote the observed jth pre-measurement, Tit the true (or error-free) pre-measurement, Yijt the observed jth post-measurement,and Uit the true (or error-free) post-measurement. It is assumed that the following relations hold:
Xijt ¼Tit1Eijt;
Yijt ¼Uit1Fijt:
It is assumed throughout this paper that the error terms Eijt, Fijt, resp., are all independentlyidentically distributed (i.i.d) normal with expectation 0 and standard deviation spre, spost, resp., i.e.
PEijt ¼ Nð0;s2preÞ, P
Fijt ¼ Nð0;s2postÞ. It is assumed in addition that the Tit are i.i.d. normal, i.e.
PTit ¼ Nðm;s21Þ and that the Uit are i.i.d. normal, i.e. PUit ¼ Nðnt;s2
2Þ. The correlation between Tit
and Uit is assumed to be r for all i, t.This implies that both the pre-measurements Xijt, the post-measurements Yijt, resp., are
Nðm;s211s2
preÞ-distributed, Nðnt;s221s2
postÞ-distributed, resp.Let F denote the distribution function of the standard normal distribution, i.e. F (x)5P(Xrx),
xAR for a standard normal random variable X.Let A1, A2, B1, B2 denote independent N(0, 1)-distributed random variables. Following usual
terminology capital letters denote random variables. E denotes the expectation, V the variance, ands the standard deviation of a random variable.
Definitions for responders using the terminology defined in this section will be given in thefollowing sections.
Throughout this paper, the terminology measurement error is used. It should be kept in mind thatthe model assumptions also cover the situation where a subject has an individual mean, e.g. for thepre-measurements, but what is observed is a random variation around this individual mean. Thismodeling is e.g. adequate when a psychometric scale is used, where it is known that there is a ratherlarge intra-individual variation, as also reflected in the example from a clinical study presented inthis paper.
The terminology is chosen such that the two-sample case can also be considered. However, inlarge parts of this paper only the one-sample case is considered. When the one-sample case isconsidered the corresponding index t is not necessary and is therefore omitted from the formulas.
3 Impact of the measurement error on misclassifications on the subject level
In this section, the impact of the measurement error on the bias is investigated. In a first step, theone-sample case is considered, while in a second step similar considerations are applied to the two-sample case.
Biometrical Journal 53 (2011) 1 139
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
3.1 The one-sample case
The proportion of responders p, where p PðUi � l � TiÞ shall be estimated. First, the situation isconsidered where only one pre-measurement and only one post-measurement can be observed. Afirst (non-parametric) estimate pa for P(Uir l �Ti) could be the proportion of subjects for which therelation Yi1 � l � Xi1 holds, i.e.
pa
Pni¼1 1ðYi1�l�Xi1Þ
n:
Please note that subscript a is chosen to distinguish this estimate from other estimates considered insections below. Such an estimate is often chosen, as – due to the presence of the measurement error– only Yi1 � l � Xi1 can be observed, but it cannot be observed whether Uirl �Ti holds true.
It is obvious that there will be subjects, who are classified as responders based on the observedmeasurements Xi1 and Yi1, although they would have to be classified as non-responders, if the truemeasurements Ti and Ui were observed. Those subjects are usually defined as false positives, i.e.
FPa fYi1 � l � Xi1g \ fUi4l � Tig:
Analogously false negative, true positive, and true negative subjects are defined as
FNa fYi14l � Xi1g \ fUi � l � Tig;
TPa fYi1 � l � Xi1g \ fUi � l � Tig;
TNa fYi14l � Xi1g \ fUi4l � Tig:
Let
Z1 A1ðs2r� s1lÞ1s2A2
ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2
p� lspre B11spost B2;
Z2 A1ðs2r� s1lÞ1s2A2
ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2
p:
Remark 3.1.
(i) Z1 has the same distribution as the random variable Yi1� lXi1� (n� lm).(ii) Z2 has the same distribution as the random variable Ui� lTi� (n� lm).
Z�1 and Z�2 are defined as the standardized versions of Z1 and Z2, i.e. Z�1 ¼ Z1=sðZ1Þ andZ�2 ¼ Z2=sðZ2Þ. By definition Z�1 and Z�2 are N (0,1)-distributed. Straightforward calculationsestablish that the correlation c between Z�1 and Z�2 is c ¼ sðZ2Þ=sðZ1Þ. Note that for s(Z1) ands(Z2), the following relations hold:
sðZ1Þ ¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ
21s22ð1� r2Þ1l2s2
pre1s2post
q;
sðZ2Þ ¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ
21s22ð1� r2Þ
q:
These standard deviations depend on the unknown standard deviation s1 of the pre-measurement,the unknown standard deviation s2 of the post-measurement, the unknown correlation r betweenTi and Ui, the threshold l and the unknown standard deviations spre and spost of the measurementerrors.
Theorem 3.2. Using the notation introduced above and defining d n� lm the following holds:
(i) EðpaÞ � p ¼ F dsðZ2Þ
� �� F d
sðZ1Þ
� �(ii) pa has a positive bias, i.e. EðpaÞ � PðUi � lTiÞ40, d40:(iii) pa has a negative bias, i.e. EðpaÞ � PðUi � lTiÞo0, do0:(iv) pa has no bias, i.e. EðpaÞ ¼ PðUi � lTiÞ , d ¼ 0:
140 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Proof: Appendix.Theorem 3.2 (i) presents an explicit formula for the bias. This formula will usually be difficult to
use directly in practice, as the necessary standard deviations will often not be available. However,based on this formula the direction of the bias can immediately be derived, as the direction of thebias only depends on n, m, and l. It should be noted that n and m are much easier to estimate inpractice compared to the standard deviations, which may not be available when there is only onepre-measurement and only one post-measurement available.
It can also be seen from the formula that the bias is negative, when the true proportion ofresponders is greater than 0.5, while the bias is positive, when the true proportion of responders issmaller than 0.5. However, when applying these considerations on the direction of the bias to realdata one has to consider that for Theorem 3.2 one crucial assumption is that the measurement errorfollows a distribution which is symmetric around 0.
Theorem 3.2 is illustrated in Table 1 that displays the expected proportions of true responders, falsepositives (FPa), false negatives (FNa), and the bias (expected proportion of observed responders –expected proportion of true responders) for different threshold values l, different correlations betweenpre- and post-measurement r and different measurement errors s for the following scenario:
(i) Expectation of the pre-measurement Ti is EðTiÞ m ¼ 500.(ii) Expectation of the post-measurement Ui is EðUiÞ n ¼ 300.(iii) Standard deviations of pre-measurement and post-measurement are identicalffiffiffiffiffiffiffiffiffiffiffiffi
VðTiÞp
s1 ¼ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ
ps2 ¼ 40.
Table 1 was created using standard SAS-functions.The standard deviations s of the measurement error have been chosen as s 2 f20; 28:3; 34:6g.
This corresponds to intraclass coefficients of correlation VðTiÞ=VðXi1Þ between Ti and Xi1 of 0.8,0.67, 0.57. Note that the standard deviation s of the measurement error is assumed to be equal forthe pre- and post-measurements.
In this example, it is obvious that there is a substantial amount of misclassifications on the subject level.This may be critical, when these classifications are the basis for further exploratory subgroup-analyses.
From the proof of Theorem 3.2 it can easily be seen that not only in this example, but also ingeneral the bias increases, when the standard deviation s of the measurement error increases.
The examples presented in this paper indicate that in many cases the bias is smaller when thecorrelation coefficient r between the pre- and the post-measurement is smaller, while at the same time theother parameters for the distributions to be considered are constant. However, as can also be seen fromFig. 1 for the case l50.7, s520 this is not case in general, i.e. even when s15s2 and spre5spost canbe assumed, there is no monotone dependency between the correlation coefficient r and the induced bias.
The focus of this paper is on the bias of corresponding estimates when conducting responderanalyses. In this paper, we are always considering the case where complete data are available,although this is often not the case in clinical research. In the context of a study that is designed tofulfill regulatory requirements the preferred analysis set is almost always the Full Analysis Set. Thisimplies that in order to follow the Intention-To-Treat(ITT)-principle all subjects should be classifiedas a responder or non-responder, i.e. also those that do not have a measurement of the continuousvariable of interest. In the statistical literature several imputation techniques are described thatcould be used to impute the underlying continuous variable of interest. An introduction to the topicof missing data in clinical trials is e.g. given by Carpenter and Kenward (2007). However, as theoften applied estimates of response rates are biased due to the measurement error, it cannot beexpected that the presence of another form of bias will cause the overall bias to vanish. To theknowledge of the author there is no literature available where it is investigated how large the biascan be when in addition to the measurement error also missing data for the continuous variable ofinterest are present. A deeper investigation of the bias that missing data introduce in addition incase a continuous variable is dichotomized could be the basis of the further work.
Biometrical Journal 53 (2011) 1 141
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Table 1 Expected proportions of false positives (P(FPa)), false negatives (P(FNa)), true proportionof responders, and the bias PðFPaÞ � PðFNaÞ ¼ EðpaÞ � p for different correlations r between Ti
and Ui, different standard deviations s of the measurement error, and different thresholds l.(EðTiÞ ¼ 500;EðUiÞ ¼ 300;
ffiffiffiffiffiffiffiffiffiffiffiffiVðTiÞ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ
p¼ 40)
l r s P(FPa) P(FNa) P(FPa)1P(FNa) p EðpaÞ � p
0.5 0.1 20.0 0.057 0.028 0.086 0.122 0.029
28.3 0.087 0.035 0.122 0.122 0.052
34.6 0.110 0.038 0.148 0.122 0.072
0.3 20.0 0.059 0.025 0.084 0.100 0.033
28.3 0.090 0.030 0.121 0.100 0.060
34.6 0.115 0.033 0.148 0.100 0.082
0.5 20.0 0.059 0.021 0.080 0.074 0.038
28.3 0.093 0.025 0.118 0.074 0.069
34.6 0.120 0.026 0.146 0.074 0.094
0.7 20.0 0.058 0.014 0.072 0.046 0.043
28.3 0.095 0.017 0.112 0.046 0.078
34.6 0.124 0.018 0.142 0.046 0.107
0.9 20.0 0.051 0.006 0.058 0.017 0.045
28.3 0.092 0.007 0.099 0.017 0.085
34.6 0.125 0.007 0.132 0.017 0.118
0.6 0.1 20.0 0.077 0.077 0.154 0.500 0.000
28.3 0.101 0.101 0.203 0.500 0.000
34.6 0.117 0.117 0.234 0.500 0.000
0.3 20.0 0.084 0.084 0.168 0.500 0.000
28.3 0.110 0.110 0.219 0.500 0.000
34.6 0.126 0.126 0.252 0.500 0.000
0.5 20.0 0.094 0.094 0.188 0.500 0.000
28.3 0.121 0.121 0.241 0.500 0.000
34.6 0.137 0.137 0.273 0.500 0.000
0.7 20.0 0.108 0.108 0.216 0.500 0.000
28.3 0.136 0.136 0.271 0.500 0.000
34.6 0.151 0.151 0.303 0.500 0.000
0.9 20.0 0.133 0.133 0.265 0.500 0.000
28.3 0.159 0.159 0.318 0.500 0.000
34.6 0.173 0.173 0.346 0.500 0.000
0.7 0.1 20.0 0.032 0.062 0.094 0.859 �0.029
28.3 0.040 0.092 0.132 0.859 �0.053
34.6 0.044 0.116 0.159 0.859 �0.072
0.3 20.0 0.029 0.064 0.093 0.887 �0.036
28.3 0.035 0.098 0.133 0.887 �0.063
34.6 0.038 0.123 0.161 0.887 �0.086
0.5 20.0 0.023 0.066 0.089 0.920 �0.043
28.3 0.027 0.103 0.130 0.920 �0.077
34.6 0.029 0.132 0.161 0.920 �0.103
0.7 20.0 0.013 0.065 0.078 0.960 �0.052
28.3 0.015 0.107 0.122 0.960 �0.092
34.6 0.016 0.139 0.155 0.960 �0.124
0.9 20.0 0.002 0.051 0.053 0.995 �0.049
28.3 0.002 0.100 0.102 0.995 �0.098
34.6 0.002 0.138 0.140 0.995 �0.136
142 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
3.2 The two-sample case
In the previous section the bias of an estimate for the one-sample case was derived. In this section anextension to the two-sample case is given. As a generalization of p defined in the section before letpt PðUit � l � TitÞ denote the proportion of responders in treatment group t and
pat
Pni¼1 1ðYi1t�l�Xi1tÞ
n
the corresponding estimate. From Theorem 3.2, it can easily be seen after defining dt nt � lm thatunder the general assumptions made in this paper the following holds for the bias for the differenceof proportions of responders under treatment 1 and treatment 2:
Eðpa1Þ � Eðpa2Þ � ðp1 � p2Þ ¼ Fd1
sðZ2Þ
� �� F
d1
sðZ1Þ
� �
� Fd2
sðZ2Þ
� �� F
d2
sðZ1Þ
� �� �:
ð1Þ
Formula (1) is illustrated in Table 2 that presents for treatment t 2 f1; 2g the expected proportionof true responders for treatment t, the bias for treatment t (expected proportion of observed
-0.15
-0.1
-0.05
0
0.05
0.1
0.15
0.1 0.3 0.5 0.7 0.9
Correlation Coefficient ρ
Bia
s
λ=0.5; σ=20
λ=0.5; σ=28.3
λ=0.5; σ=34.6
λ=0.7; σ=20
λ=0.7; σ=28.3
λ=0.7; σ=34.6
Figure 1 Bias for the scenarios displayed in Table 1, i.e. expectation of the pre-mea-surement Ti is EðTiÞ m ¼ 500, expectation of the post-measurement Ui isEðUiÞ n ¼ 300, standard deviations of pre-measurement and post-measurement areidentical
ffiffiffiffiffiffiffiffiffiffiffiffiVðTiÞ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ
p¼ 40, s denotes the standard deviation of the measurement
error, which is assumed to be equal for pre- and post-measurements.
Biometrical Journal 53 (2011) 1 143
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
responders for treatment t – expected proportion of true responders for treatment t), and the biasfor the difference of true proportions of responders ðEðpa1Þ � Eðpa2Þ � ðp1 � p2ÞÞ for differentcorrelations r, different expectations n2 of treatment 2 and different measurement errors s for thefollowing scenario:
(i) Expectation of the pre-measurement Tit is EðTi1Þ ¼ EðTi2Þ m ¼ 500.(ii) Expectation of the post-measurement Ui1 of treatment 1 is EðUi1Þ n1 ¼ 280.(iii) Standard deviations of pre-measurement and post-measurement are all identicalffiffiffiffiffiffiffiffiffiffiffiffiffi
VðTi1Þp
¼ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi2Þ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi1Þ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi2Þ
ps1 ¼ 40.
(iv) Threshold l5 0.6.
Table 2 Expected true proportion of responders for treatments 1 and 2, p1 and p2, the biasesEðpa1Þ � p1 and Eðpa2Þ � p2, and the bias for the difference of true proportion responders Eðpa1Þ �Eðpa2Þ � ðp1 � p2Þ for different correlations r, expectations n2 of treatment 2 and different standarddeviations of the measurement error. (EðTi1Þ ¼ EðTi2Þ ¼ 500;
ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi1Þ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi2Þ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi1Þ
p¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffi
VðUi2Þp
¼ 40. Standard deviations of all measurement errors equal.)
r n2 s(Eijt) p1 Eðpa1Þ � p1 p2 Eðpa2Þ � p2 p1� p2 Eðpa1Þ � Eðpa2Þ � ðp1 � p2Þ
0.1 240 20.0 0.673 �0.019 0.911 �0.027 �0.238 0.009
30.0 0.673 �0.035 0.911 �0.056 �0.238 0.020
260 20.0 0.673 �0.019 0.815 �0.029 �0.142 0.010
30.0 0.673 �0.035 0.815 �0.055 �0.142 0.020
300 20.0 0.673 �0.019 0.500 0.000 0.173 �0.019
30.0 0.673 �0.035 0.500 0.000 0.173 �0.035
320 20.0 0.673 �0.019 0.327 0.019 0.347 �0.037
30.0 0.673 �0.035 0.327 0.035 0.347 �0.071
0.3 240 20.0 0.691 �0.024 0.933 �0.031 �0.242 0.006
30.0 0.691 �0.045 0.933 �0.063 �0.242 0.018
260 20.0 0.691 �0.024 0.841 �0.035 �0.150 0.011
30.0 0.691 �0.045 0.841 �0.067 �0.150 0.022
300 20.0 0.691 �0.024 0.500 0.000 0.191 �0.024
30.0 0.691 �0.045 0.500 0.000 0.191 �0.045
320 20.0 0.691 �0.024 0.309 0.024 0.383 �0.049
30.0 0.691 �0.045 0.309 0.045 0.383 �0.090
0.5 240 20.0 0.717 �0.034 0.957 �0.034 �0.240 0.000
30.0 0.717 �0.060 0.957 �0.070 �0.240 0.010
260 20.0 0.717 �0.034 0.874 �0.045 �0.157 0.011
30.0 0.717 �0.060 0.874 �0.083 �0.157 0.024
300 20.0 0.717 �0.034 0.500 0.000 0.217 �0.034
30.0 0.717 �0.060 0.500 0.000 0.217 �0.060
320 20.0 0.717 �0.034 0.283 0.034 0.434 �0.067
30.0 0.717 �0.060 0.283 0.060 0.434 �0.119
0.7 240 20.0 0.756 �0.051 0.981 �0.034 �0.225 �0.017
30.0 0.756 �0.086 0.981 �0.074 �0.225 �0.011
260 20.0 0.756 �0.051 0.917 �0.058 �0.161 0.007
30.0 0.756 �0.086 0.917 �0.106 �0.161 0.021
300 20.0 0.756 �0.051 0.500 0.000 0.256 �0.051
30.0 0.756 �0.086 0.500 0.000 0.256 �0.086
320 20.0 0.756 �0.051 0.244 0.051 0.512 �0.102
30.0 0.756 �0.086 0.244 0.086 0.512 �0.171
144 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Table 2 was created using standard SAS-functions. The standard deviations of the measurementerror have been chosen as s 2 f20; 30g. r has been chosen as r 2 f0:1; 0:3; 0:5; 0:7g. The expectationn2 of treatment 2 was chosen as n2 2 f240; 260; 300; 320g.
It is obvious that the absolute value of the bias in the two sample case, i.e. jEðpa1Þ � Eðpa2Þ �ðp1 � p2Þj is larger than each single absolute bias in the two treatment groups, if the biases Eðpa1Þ �p1 and Eðpa2Þ � p2 have different signs. However, Eðpa1Þ � Eðpa2Þ � ðp1 � p2Þ may be positive ornegative even if d1 and d2 have the same sign, as can be seen from Table 2 and Fig. 2.
4 A comparison of several non-parametric estimates for the proportion of
responders
The most important factor that contributes to the bias is the standard deviation of themeasurement error. In principle, this measurement error could be minimized by not havingonly one but several pre- and post-measurements within one subject and by definingestimates for the proportion of responders in analog to pa by taking for each subject themean of all available pre-, post-measurements, resp. However, in clinical practice it will not alwaysbe realistic to have several repeats of measurements within one subject. In the following theorem,the bias of four different estimates pa; pb; pc, and pd for the probability p ¼ PðUi � l � TiÞ iscompared:
(i) One pre-measurement, one post-measurement available. ~Xai Xi1 and ~Ya
i Yi1.(ii) Two pre-measurements, one post-measurement available. ~Xb
i ðXi11Xi2Þ=2 and~Ybi Yi1.
(iii) One pre-measurement, two post-measurements available. ~Xci Xi1 and ~Yc
i ðYi11Yi2Þ=2.(iv) Two pre-measurement, two post-measurements available. ~Xd
i ðXi11Xi2Þ=2 and~Ydi ðYi11Yi2Þ=2.
Theorem 4.1. Let spre ¼ spost s denote the measurement error of the pre- and post-mea-surements. Let d n� lm. Let
pb ¼
Pni¼1 1ð ~Yb
i�l� ~Xb
iÞ
n;
pc ¼
Pni¼1 1ð ~Yc
i�l� ~Xc
iÞ
n;
pd ¼
Pni¼1 1ð ~Yd
i�l� ~Xd
iÞ
n:
The following relations hold for the bias introduced by the estimates pa � pd :
(i) EðpbÞ � p ¼ F �d=
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffis2ðZ2Þ1
l2
211
� �s2
r� �� p
(ii) EðpcÞ � p ¼ F �d=ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffis2ðZ2Þ1 l21 1
2
s2
q� �� p
(iii) Eðpd Þ � p ¼ F �d=ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffis2ðZ2Þ1
12ð11l2Þs2
q� �� p
(iv) jEðpaÞ � pj � jEðpbÞ � pj � jEðpcÞ � pj � jEðpdÞ � pj:
Biometrical Journal 53 (2011) 1 145
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Proof: AppendixRemark 4.2
(i) From the proof, it can easily be seen that the inequalities in (iv) above are strict forn – lm 6¼ 0.
(ii) It should be noted that it is only assumed that the standard deviations of the measurementerrors are equal but not that the standard deviations of Ti and Ui are equal.
The bias of the estimates pb; pc; pd that use the measurements obtained from morethan exactly one pre- and one post-measurements can be derived using the same techniquesas were used to derive the bias of pa. The inequality in Theorem 4.1 (iv) shows that the biasdecreases to a greater extent when exactly one pre- and two post-measurements are availablecompared to the situation, when exactly two pre-measurements and exactly on post-measurementare available.
Bias for the two-sample case
-0.2
-0.15
-0.1
-0.05
0
0.05
0.1 0.3 0.5 0.7
Correlation ρ
Bia
s
E=240, σ=20
E=240, σ=30
E=260, σ=20
E=260, σ=30
E=300, σ=20
E=300, σ=30
E=320, σ=20
E=320, σ=30
Figure 2 Bias for the scenarios displayed in Table 2, i.e. expectation of the pre-mea-surement Tit is EðTi1Þ ¼ EðTi2Þ m ¼ 500; expectation of the post-measurement Ui1 oftreatment 1 is EðUiÞ n1 ¼ 280; standard deviations of pre-measurement and post-measurement are all identical
ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi1Þ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi2Þ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi1Þ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi2Þ
p¼ 40;
threshold l5 0.6. E ¼ EðUi2Þ n2 denotes the expectation of post-treatment undertreatment 2. s denotes the standard deviation of the measurement error.
146 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Theorem 4.1 is illustrated in Table 3 that shows the bias of the four estimates investigated inTheorem 4.1 for different correlations r between Ti and Ui, different standard deviations s, anddifferent thresholds l. The assumptions on the means and standard deviations are the same as forTable 4, i.e. m5 500, n5 300, s1 5s2 5 40.
5 An asymptotically unbiased ML-type estimate for the proportion of
responders
In this section an ML-type approach to calculate the proportion of responders p at least asymp-totically unbiased if two pre- and two post-measurements per subject are available is presented. Theapproach is rather general in the sense that it is neither a prerequisite that the standard deviations ofTi and Ui are equal, nor is it a prerequisite that the standard deviations of the measurement errors ofEij and Fij are equal.
Theorem 5.1. Let �Xi ðXi11Xi2Þ=2, �Yi ðYi11Yi2Þ=2 denote the individual arithmetic mean ofthe pre-, post-measurements, resp. All variables are assumed to be normally distributed. Let
m1
n
Xni¼1
�Xi; ð2Þ
n1
n
Xni¼1
�Yi; ð3Þ
s2pre
1
2
1
n
Xni¼1
ðXi2 � Xi1Þ2;
s2post
1
2
1
n
Xni¼1
ðYi2 � Yi1Þ2;
s21
1
n� 1
Xni¼1
ð �Xi � mÞ2 �1
2s2pre; ð4Þ
s22
1
n� 1
Xni¼1
ð �Yi � nÞ2 �1
2s2post; ð5Þ
r max �1; min 1;1
n�1
Pni¼1 ð
�Xi � mÞð �Yi � nÞs1s2
!� 1fs2
140g\fs2
240g
!( ); ð6Þ
mML Flm� nffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ðs2r� s1lÞ21s2
2ð1� r2Þq
0B@
1CA:
The following relation holds:
limn!1
mML ¼ PðUi � lTiÞ:
Proof: AppendixRemark 5.2
(i) The estimates s21 and s2
2 are unbiased estimates for s21 and s2
2 but are not 40 a.s. fornoN.
Biometrical Journal 53 (2011) 1 147
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Table 3 Bias for different estimates of the proportion of true responders p for different correlationsr between Ti and Ui, different standard deviations s of the measurement error, and differentthresholds l. (EðTiÞ ¼ 500;EðUiÞ ¼ 300;
ffiffiffiffiffiffiffiffiffiffiffiffiVðTiÞ
p¼
ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ
p¼ 40)
l r s p EðpaÞ � p EðpbÞ � p EðpcÞ � p Eðpd Þ � p
0.5 0.1 20.0 0.122 0.029 0.026 0.018 0.015
28.3 0.122 0.052 0.048 0.034 0.029
34.6 0.122 0.072 0.066 0.048 0.041
0.3 20.0 0.100 0.033 0.030 0.021 0.017
28.3 0.100 0.060 0.055 0.039 0.033
34.6 0.100 0.082 0.075 0.055 0.047
0.5 20.0 0.074 0.038 0.035 0.024 0.020
28.3 0.074 0.069 0.063 0.045 0.038
34.6 0.074 0.094 0.087 0.063 0.054
0.7 20.0 0.046 0.043 0.039 0.027 0.023
28.3 0.046 0.078 0.072 0.051 0.043
34.6 0.046 0.107 0.099 0.072 0.062
0.9 20.0 0.017 0.045 0.041 0.027 0.022
28.3 0.017 0.085 0.078 0.054 0.045
34.6 0.017 0.118 0.109 0.078 0.066
0.6 0.1 20.0 0.500 0.000 0.000 0.000 0.000
28.3 0.500 0.000 0.000 0.000 0.000
34.6 0.500 0.000 0.000 0.000 0.000
0.3 20.0 0.500 0.000 0.000 0.000 0.000
28.3 0.500 0.000 0.000 0.000 0.000
34.6 0.500 0.000 0.000 0.000 0.000
0.5 20.0 0.500 0.000 0.000 0.000 0.000
28.3 0.500 0.000 0.000 0.000 0.000
34.6 0.500 0.000 0.000 0.000 0.000
0.7 20.0 0.500 0.000 0.000 0.000 0.000
28.3 0.500 0.000 0.000 0.000 0.000
34.6 0.500 0.000 0.000 0.000 0.000
0.9 20.0 0.500 0.000 0.000 0.000 0.000
28.3 0.500 0.000 0.000 0.000 0.000
34.6 0.500 0.000 0.000 0.000 0.000
0.7 0.1 20.0 0.859 �0.029 �0.025 �0.020 �0.016
28.3 0.859 �0.053 �0.046 �0.038 �0.029
34.6 0.859 �0.072 �0.063 �0.053 �0.042
0.3 20.0 0.887 �0.036 �0.030 �0.025 �0.019
28.3 0.887 �0.063 �0.055 �0.045 �0.036
34.6 0.887 �0.086 �0.075 �0.063 �0.050
0.5 20.0 0.920 �0.043 �0.037 �0.030 �0.023
28.3 0.920 �0.077 �0.067 �0.055 �0.043
34.6 0.920 �0.103 �0.091 �0.077 �0.061
0.7 20.0 0.960 �0.052 �0.044 �0.035 �0.027
28.3 0.960 �0.092 �0.080 �0.066 �0.052
34.6 0.960 �0.124 �0.109 �0.092 �0.073
0.9 20.0 0.995 �0.049 �0.040 �0.031 �0.022
28.3 0.995 �0.098 �0.083 �0.066 �0.049
34.6 0.995 �0.136 �0.119 �0.098 �0.075
148 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
(ii) The estimates s2pre and s2
post are unbiased estimates for s2pre and s2
post.(iii) If s1 5s2 or spre 5spost can be assumed, more efficient estimates are possible.(iv) This approach does not allow to classify subjects on the subject level into responders or
non-responders.(v) The definition of r was chosen to ensure that r takes values between �1 and 1.
The estimate for the proportion of responders using the ML-type estimator introduced in The-orem 5.1 is asymptotically unbiased. A small simulation study was conducted to assess how fast theexpectation of ML-type estimator converges to the expected proportion of true responders.The simulation results for the situation considered in Table 1, i.e. m5 500, n5 300, s1 5s2 5 40 arepresented for a study with n5 10 subjects, 10 000 studies simulated. From these results,it can be concluded that the bias of the ML-type-estimate is rather small already for smallsample sizes.
6 Example: Illustration of responder analyses using a premenstrual
dysphoric disorder study
PMDD is a medical disorder characterized by debiliating mood and behavioral changes, and fre-quently, somatic complaints in the week preceeding menstruation. Yonkers et al. (2005) reportthe results of a parallel study of three cycles of treatment with an oral contraceptive or placebo.
Table 4 Expectation of the ML-type estimate pML, its standard error seðpMLÞ based on simulation,expectation of the true proportion of responders p and the corresponding theoretical standard errorse(p). (m5 500, n5 300, s1 5s2 5 40.)
l r s pML seðpMLÞ p se(p)
0.5 0.1 20.0 0.126 0.087 0.122 0.103
28.3 0.123 0.092 0.122 0.103
0.3 20.0 0.104 0.079 0.100 0.095
28.3 0.104 0.087 0.100 0.095
0.5 20.0 0.081 0.069 0.074 0.083
28.3 0.084 0.077 0.074 0.083
0.7 20.0 0.056 0.057 0.046 0.066
28.3 0.060 0.066 0.046 0.066
0.6 0.1 20.0 0.500 0.148 0.500 0.158
28.3 0.498 0.164 0.500 0.158
0.3 20.0 0.504 0.150 0.500 0.158
28.3 0.500 0.171 0.500 0.158
0.5 20.0 0.503 0.158 0.500 0.158
28.3 0.497 0.182 0.500 0.158
0.7 20.0 0.502 0.170 0.500 0.158
28.3 0.500 0.197 0.500 0.158
0.7 0.1 20.0 0.856 0.094 0.859 0.110
28.3 0.858 0.099 0.859 0.110
0.3 20.0 0.883 0.084 0.887 0.100
28.3 0.884 0.092 0.887 0.100
0.5 20.0 0.915 0.072 0.920 0.086
28.3 0.913 0.080 0.920 0.086
0.7 20.0 0.950 0.055 0.960 0.062
28.3 0.944 0.064 0.960 0.062
Biometrical Journal 53 (2011) 1 149
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
The primary target variable in this study is the DRSP (Daily Record of Severity of Problems)-score.The DRSP-score is a widely used psychometric scale for this disorder consisting of 21 individualitems. Each individual item had to be rated on a discrete six-point scale from 1 to 6, while smallvalues represent a more favorable health state compared to higher values. The questionaire was tobe completed every day, both during the pre-treatment- and the treatment cycles. To calculate theDRSP-score for each cycle, each of the 21 individual items was first averaged over the last 5 daysbefore menses and then the averages for the 21 individual items were summed up.
We are using the placebo data of this study to determine the proportion of responders whenapplying the four estimates for proportions of responders introduced in Sections 3 and 4 and theML-type estimate introduced in Section 5. Study participants had to record the DRSP-score first inup to three pre-treatment cycles. The results of the pre-treatment cycles were used to establish thediagnosis of PMDD. Subjects diagnosed with PMDD were eligible for randomization to threecycles of treatment (oral contraceptive or placebo). We are restricting the analysis to complete casesin the placebo group (n5 130), i.e. cases who had a DRSP-score in both pre-treatment cycles andtreatment cycle 2 and treatment cycle 3. We are in the following denoting these measurements aspre-measurement 1, pre-measurement 2, post-measurement 1, post-measurement 2, resp. Table 5shows some characteristics of the placebo-data to be analyzed, while a graphical display is given inFig. 3. It is obvious that the means for pre-measurement 1 and pre-measurement 2 do not differmuch. However, there is a rather high standard deviation for the difference pre-measurement 1minus pre-measurement 2, indicative for a large intra-individual variation, which was identified inthis paper as an important source of bias. In the terminology introduced in the previous sections thestandard deviation of 16.9 for the difference pre-measurement 1 minus pre-measurement 2 has to bedivided by
ffiffiffi2p
to obtain the estimate 12.0 for the standard deviation of the measurement error.Similar considerations hold true for post-measurement 1 and post-measurement 2.
A responder is defined as a subject who has a post-treatment value less than 50% of the baselinevalue. This responder definition is in line with a recently released CPMP draft guidance ontreatment of PMDD (2010). Table 6 presents the results of the different estimates introduced in thispaper using the response criterion of a 50% reduction of the DRSP-score.
In this example, it is obvious that there is a remarkable difference between the proportion ofresponders. As suggested from Theorem 4.1, the proportion of responders shows a declinepa4pb4pc4pd . All estimates pa; pb; pc, and pd are larger than the ML-type estimate. However, theresult for the ML-type estimate should also be judged with care, as for this estimate it has to beassumed that the data are normally distributed, which may not necessarily be the case for the post-measurement data.
7 Discussion
Responder analyses are a type of analysis that on the surface seems to be very easy to interpret, butwhen one digs deeper are potentially very misleading. As also mentioned by Oppenheimer et al.
Table 5 DRSP-scores in a PMDD-study (n5 130).
Measurement Mean Standard deviation
Pre-measurement 1 78.6 19.1
Pre-measurement 2 80.0 19.4
Pre-measurement 1–Pre-measurement 2 �1.5 16.9
Post-measurement 1 47.5 22.7
Post-measurement 2 47.3 24.8
Post-measurement 1–Post-measurement 2 0.2 19.1
150 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
(1999) and Senn (2004) these analyses have to be set in comparison to the negative statisticalproperties of such an approach.
In this paper the situation is considered, in which a subject is defined as a responder, if the(error-free) post-measurement has a value less than or equal to a certain fraction l, 0olo1, of the(error-free) pre-measurement when all continuous variables considered are assumed as normallydistributed. This explicit setting of a responder definition is to the knowledge of the author not yetpresent in the statistical literature.
For the conclusions on the sign of the bias it is important to keep in mind that it is alwaysassumed that the random variables considered, including the measurement errors, are normallydistributed.
One result obtained in this paper in the setting of a normal distributed measurement error is thatalso here considerable bias is present when canonical non-parametric (naive) estimates for pro-portions of responders are used. Especially, if the true proportion of responders is rather close to 0or rather close to 1 the bias can be substantial, while the bias is less pronounced when the trueproportion of responders is close to 0.5. However, this only holds true for the bias itself. Whenconsidering the subject level even a larger proportion of subjects is misclassified when the trueproportion of responders is close to 0.5 compared to the situation when the true proportion ofresponders is rather close to 0 or rather close to 1. Therefore especially when the observed pro-portion of responders is close to 0.5, the classification into responders and non-responders shouldnot form the basis of further exploratory subgroup analyses, as e.g. in the example presented in thispaper around 30% of all subjects may be misclassified.
Table 6 Proportions of subjects showing at least a 50% reduction in DRSP-score using differentresponder estimates, including the estimates for the calculation of the ML-type estimate accordingto Theorem 5.1.
pa pb pc pd ML-type estimate m n s2pre s2
post s21 s2
2 r
46.9 44.6 40.8 40.0 34.2 79.3 47.4 142.4 181.2 228.1 383.6 0.27
Figure 3 Summary results for DRSP-scores obtained in a PMDD-study.
Biometrical Journal 53 (2011) 1 151
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Focus of this paper was on the one-sample case. However, the two-sample case was also touched.It can easily be seen that if the true proportion of responders is above 50% in one sample while thetrue proportion of responders is below 50% in the other sample the biases introduced in the twosamples have different signs. It was shown via an example that even in case of normally distributedmeasurement errors there are situations where the expected proportion of responders is larger than50% in both treatment groups, but the bias for the difference of proportions may be positive in onesetting, but negative in another.
By having not one but several pre-measurements or having not one but several post-measure-ments the variance of the measurement error can be reduced. Therefore, it is not surprising that thebias of the resulting non-parametric estimate for the proportion of responders is smaller. It wasestablished that in the situation where the measurement errors of the pre- and post-measurementsare similar and only one pre-measurement and only one post-measurement is available the additionof exactly one further post-measurement reduces the bias to a greater extent than the addition ofexactly one further pre-measurement.
The results presented in this paper may e.g. be of interest, when comparing proportionsof responders obtained in several different studies. If such comparisons are intended it isrecommended to carefully assess whether the responder definitions are really identical orwhether the measurement error is reduced through several measurements. In connectionwith this it may also be advisable to assess whether the measurement errors in the two studies arecomparable. This may especially be advisable for small and large proportions of responders ob-served, as in these cases a small variation in the measurement error does not only affect the pointestimate itself, but also markedly increases the variance of the point estimate for the proportion ofresponders.
In addition an asymptotically unbiased ML-type estimate for the proportion of responders whentwo pre-measurements and two post-measurements within one subject are available was presented.
Acknowledgements The author is grateful to two anonymous referees and one anonymous Associate Editorfor helpful comments on a first version of this article.
Conflict of Interest
The PMDD-study reported in Section 6 was conducted by Berlex Inc. Today Berlex Inc. belongs toBayer HealthCare AG. The author is an employee of Bayer Schering Pharma AG, which belongs to BayerHealthCare AG.
Appendix : Proofs
Proof of Theorem 3.2. For the proof of (i) consider the following equation:
PðYi1 � lXi1Þ � p
¼PðYi1 � lXi1 � 0Þ � PðUi � lTi � 0Þ
¼PðZ1 � �dÞ � PðZ2 � �dÞ
¼Fd
sðZ2Þ
� �� F
d
sðZ1Þ
� �:
(ii) – (iv) follow then immediately from (i) and from s(Z2)os(Z1).Proof of Theorem 4.1. In analogy to the definitions of Z1 and Z2 introduced in Section 3 the
following random variables are defined:
152 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Z1;a Z1;
Z1;b A1ðs2r� s1lÞ1s2A2
ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2
p�
lffiffiffi2p sB11sB2;
Z1;c A1ðs2r� s1lÞ1s2A2
ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2
p� lsB11
1ffiffiffi2p sB2;
Z1;d A1ðs2r� s1lÞ1s2A2
ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2
p1
1ffiffiffi2p ð�lsB11sB2Þ;
Z2;a Z2;b Z2;c :¼Z2;d
A1ðs2r� s1lÞ1s2A2
ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2
p:
The standard deviations sðZa1Þ, sðZ
b1Þ, sðZc
1Þ, sðZd1 Þ resp. of Za
1, Zb1, Z
c1, and Zd
1 resp. are asfollows:
sðZa1Þ ¼ sðZ1Þ;
sðZb1Þ ¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ
21s22ð1� r2Þ1
l2
211
� �s2
s;
sðZc1Þ ¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ
21s22ð1� r2Þ1 l21
1
2
� �s2
s;
sðZd1 Þ ¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ
21s22ð1� r2Þ1
1
2ð11l2Þs2
r:
(i)–(iii) follow then immediately from the deviation of these standard deviations keeping in mindthat Z1;A has the same distribution as the random variable ~YA
i � l ~XAi � d for A 2 fa; b; c; dg.
It is obvious that
sðZ1;aÞ4sðZ1;bÞ4sðZ1;cÞ4sðZ1;dÞ:
(iv) As an example it is established that jEðpaÞ � pj � jEðpbÞ � pj. Without loss of generality, it canbe assumed that the bias is positive, i.e. that d40:
EðpaÞ � p� ðEðpbÞ � pÞ
¼ EðpaÞ � EðpbÞ
¼ P Z1;a � �d
� P Z1;b � �d
¼ Fd
sðZ1;bÞ
� �� F
d
sðZ1;aÞ
� �40:
Proof of Theorem 5.1. Straightforward calculations establish that Eðs2preÞ ¼ s2
pre andEðs2
postÞ ¼ s2post.
As Ti and Eij are independent it follows from �Xi ¼ Ti1ðEi11Ei2Þ=2 that
Varð �XiÞ ¼ VarðTiÞ11
2VarðEi1Þ ¼ s2
111
2s2pre:
As
E1
n� 1
Xn
i¼1ð �Xi � mÞ2
� �¼ Varð �XiÞ
Biometrical Journal 53 (2011) 1 153
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
it follows that E ðs21Þ ¼ VarðTiÞ. The proof that E ðs2
2Þ ¼ s22 is similar.
limn!1 EðrÞ ¼a:s:
r follows immediately from Covð �Xi; �YiÞ ¼ CovðTi;UiÞ.From
PðUi � lTiÞ ¼ Flm� nffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
ðs2r� s1lÞ21s2
2ð1� r2Þq
0B@
1CA
and as equations (2) – (6) establish at least asymptotically unbiased estimates for m, n, s21, s
22, and r,
Theorem 5.1 follows.
References
American College of Rheumatology (2007). A proposed revision to the ACR20: the hybrid measure ofAmerican College of Rheumatology Response. Arthritis and Rheumatism (Arthritis Care and Research)57, 193–202.
Carpenter, J. R. and Kenward, G. M. (2007). Missing data in randomised controlled trials – a practical guide.www.missingdata.org.uk
CPMP/EWP/566/98 Rev. 1 (2001) Note for guidance on clinical investigation of medicinal products in thetreatment of epileptic disorders. EMA, London. www.ema.europa.eu
CPMP/EWP/518/97 Rev. 1 (2002). Note for guidance on clinical investigation of medicinal products in thetreatment of depression. EMA, London. www.ema.europa.eu
CPMP/EWP/908/99 (2002). Points to consider on multiplicity issues in clinical trials. EMA: London.www.ema.europa.eu
CPMP/EWP/2158/99 (2006). Guideline on the choice of the non-inferiority margin. EMA: London. www.ema.europa.eu
CPMP/EWP/281/96 Rev. 1 (2008). Guideline on clinical investigation of medicinal products used in weightcontrol. EMA, London. www.ema.europa.eu
CPMP/EWP/607022/2009 (2010). Guideline on the treatment of premenstrual dysphoric disorder (PMDD) –draft. EMA: London. www.ema.europa.eu
Cochran, W. G. (1968). The errors of measurement in statistics. Technometrics 10, 637–666.Fedorov, V., Mannino, F. and Zhang, R. (2009). Consequences of dichotomization. Pharmaceutical Statistics
8, 50–61Fleiss, J. L. (1986). The Design of Measurements in Statistics. Wiley, New York.Goldberg, J. D. (1975). The effects of misclassification on the bias in the difference between two
proportions and the relative odds in the fourfold table. Journal of the American Statistical Association 70,561–567.
ICH (1998). E9 – Statistical Principles for Clinical Trials. www.ich.orgIrwig, L. M., Groeneveld, H. T. and Simpson, J. M. (1990). Correcting for measurement error in an exposure-
response relationship based on dichotomising a continuous dependent variable. Australian Journal ofStatistics 32, 261–269
Kalow, W., Tang, B. K. and Endrenyi, L. (1998). Hypothesis: comparisons of inter- and intra-individualvariations can substitute for twin studies in drug research. Pharmacogenetics 8, 50–61
Kieser, M., Rohmel, J. and Friede, T. (2004). Power and sample size determination when assessing the clinicalrelevance of trial results by responder analyses. Statistics in Medicine 23, 3287–3305.
Lewis, J. A. (2004). In defence of dichomtomy. Pharmaceutical Statistics 3, 77–79Newell, D. J. (1964). Errors in the interpretation of errors in epidemiology. American Journal of Public Health
54, 598–602.Oppenheimer, L. and Kher, U. (1999). The impact of measurement error on the comparison of two treatments
using a responder analysis. Statistics in Medicine, 18, 2177–2188.Senn, S. (2001). Individual therapy: New dawn or false dawn? Drug Information Journal 35, 1479–1494Senn, S. (2003). Disappointing Dichotomies. Pharmaceutical Statistics 2, 239–240Senn, S. (2004). Individual response to treatment: is it a valid assumption? British Medical Journal, 329,
966–968
154 M. Kunz: On responder analyses
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com
Senn, S. and Julious, S. (2009). Measurement in clinical trials: a neglected issue for statisticians? Statistics inMedicine 28, 3189–3209
Therasse, P., Arbuck, S. G., Eisenhaver, E. A., Wanders, J., Kaplan, R. S., Rubinstein, L., Verweij, J., VanGlabbeke, M., Van Oosterom, A. T., Christian, M. C. and Gwyther, S. G. (2000). New guidelines toevaluate response to treatment in solid tumors. Journal of the National Cancer Institute 92, 205–216.
Yonkers, K. A., Brown, C., Pearlstein, T. B., Foegh, M., Sampson-Landers, C. and Rapkin, A. (2005). Efficacyof a new low-dose oral contraceptive with drospirenone in premenstrual dysphoric disorder. Obstetrics andGynecology 106, 492–501.
Biometrical Journal 53 (2011) 1 155
r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com