On responder analyses when a continuous variable is dichotomized and measurement error is present

19
On responder analyses when a continuous variable is dichotomized and measurement error is present Michael Kunz Bayer Schering Pharma AG, 13342 Berlin, Germany Received 25 March 2010, revised 23 July 2010, accepted 5 November 2010 In clinical studies results are often reported as proportions of responders, i.e. the proportion of subjects who fulfill a certain response criterion is reported, although the underlying variable of interest is continuous. In this paper, we consider the situation where a subject is defined as a responder if the (error-free) continuous measurements post-treatment are below a certain fraction of (error-free) continuous measurements obtained pre-treatment. Focus is on the one-sample case, but an extension to the two-sample case is also presented. The bias of different estimates for the pro- portion of responders is derived and compared. In addition, an asymptotically unbiased ML-type estimate for the proportion of responders is presented. The results are illustrated using data obtained in a clinical study investigating pre-menstrual dysphoric disorder (PMDD). Key words: Bias; Clinical relevance; Design of clinical studies; Responder analyses. Supporting Information for this article is available from the author or on the WWW under http://dx.doi.org/10.1002/bimj.201000069. 1 Introduction Measurements in clinical studies are often continuous variables. This holds not only e.g. for safety variables such as many laboratory variables, blood pressure, heart rate, etc., but also for efficacy variables like e.g. tumor size in oncology studies or certain psychometric scales which can be treated as continuous. When analyzing clinical studies, subjects are often classified as responders or non-responders when a response-criterion is fulfilled or not. Often subjects are defined as responders if a continuous variable of interest is below a certain threshold. One impact of this approach is a loss of statistical power, as e.g. reported by Cochran (1968), Fleiss (1986) and Fedorov et al. (2009). Oppenheimer and Kher (1999) show that such a procedure may be problematic as in the presence of measurement error the resulting canonical non-parametric estimate for the proportion of responders is no longer unbiased in general. The effects of misclassification of data derived from a 2 2 table on odds and differences in proportions have been investigated by e.g. Cochran (1968), Goldberg (1975), and Newell (1964). In addition, an immediately evident weakness of a classification into responders and non- responders is the necessity to define a threshold. Even if this threshold was defined in advance – what e.g. in ICH E9 (1998) is considered as a prerequisite in the setting of a confirmatory clinical study – it will often be difficult to justify and may turn out to be a matter of subsequent discussions after completion of the clinical study. Despite this property, responder analyses have a certain popularity when illustrating results due to its apparent simplicity. Fedorov et al. (2009) regard *Corresponding author: e-mail: [email protected] r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim Biometrical Journal 53 (2011) 1, 137–155 DOI: 10.1002/bimj.201000069 137

Transcript of On responder analyses when a continuous variable is dichotomized and measurement error is present

On responder analyses when a continuous variable is dichotomized

and measurement error is present

Michael Kunz

Bayer Schering Pharma AG, 13342 Berlin, Germany

Received 25 March 2010, revised 23 July 2010, accepted 5 November 2010

In clinical studies results are often reported as proportions of responders, i.e. the proportion ofsubjects who fulfill a certain response criterion is reported, although the underlying variable ofinterest is continuous. In this paper, we consider the situation where a subject is defined as aresponder if the (error-free) continuous measurements post-treatment are below a certain fraction of(error-free) continuous measurements obtained pre-treatment. Focus is on the one-sample case, butan extension to the two-sample case is also presented. The bias of different estimates for the pro-portion of responders is derived and compared. In addition, an asymptotically unbiased ML-typeestimate for the proportion of responders is presented. The results are illustrated using data obtainedin a clinical study investigating pre-menstrual dysphoric disorder (PMDD).

Key words: Bias; Clinical relevance; Design of clinical studies; Responder analyses.

Supporting Information for this article is available from the author or on the WWW underhttp://dx.doi.org/10.1002/bimj.201000069.

1 Introduction

Measurements in clinical studies are often continuous variables. This holds not only e.g. for safetyvariables such as many laboratory variables, blood pressure, heart rate, etc., but also for efficacyvariables like e.g. tumor size in oncology studies or certain psychometric scales which can be treatedas continuous.

When analyzing clinical studies, subjects are often classified as responders or non-responderswhen a response-criterion is fulfilled or not. Often subjects are defined as responders if a continuousvariable of interest is below a certain threshold. One impact of this approach is a loss of statisticalpower, as e.g. reported by Cochran (1968), Fleiss (1986) and Fedorov et al. (2009). Oppenheimerand Kher (1999) show that such a procedure may be problematic as in the presence of measurementerror the resulting canonical non-parametric estimate for the proportion of responders is no longerunbiased in general. The effects of misclassification of data derived from a 2� 2 table on odds anddifferences in proportions have been investigated by e.g. Cochran (1968), Goldberg (1975), andNewell (1964).

In addition, an immediately evident weakness of a classification into responders and non-responders is the necessity to define a threshold. Even if this threshold was defined in advance –what e.g. in ICH E9 (1998) is considered as a prerequisite in the setting of a confirmatory clinicalstudy – it will often be difficult to justify and may turn out to be a matter of subsequent discussionsafter completion of the clinical study. Despite this property, responder analyses have a certainpopularity when illustrating results due to its apparent simplicity. Fedorov et al. (2009) regard

*Corresponding author: e-mail: [email protected]

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim

Biometrical Journal 53 (2011) 1, 137–155 DOI: 10.1002/bimj.201000069 137

responder analyses as a useful tool for reporting final results of clinical trials to a non-statisticalcommunity, while at the same time they recommend to avoid dichotomization on the data analysisstage. However, also among statisticians the role of responder analyses is controversially debated.Senn and Julious (2009) hold a position which is rather critical toward responder analyses. In ananswer to Senn (2003) Lewis (2004) adopts a position which is more in favor of this type of analyses.Issues in the causal interpretation of response in general depend on the sources of variation apartfrom treatment. As pointed out by Senn (2001, 2004), but also Kalow et al. (1998), to address thisquestion properly different designs like e.g. cross-over designs have to be chosen.

In many cases responder analyses are (ab)used to assess clinical relevance of study results after astatistical significant result has been obtained on the continuous variable of interest, refer alsoCPMP ‘Points to consider on Multiplicity Issues in Clinical Trials’ (2002) and CPMP ‘Guideline onthe Choice of the Non-Inferiority Margin’ (2006). For further insight into this topic refer e.g. thework of Kieser et al. (2004) and the references cited there.

As mentioned above, unbiased estimates of proportions of responders are not necessarilyavailable but in addition it is also known that on the subject level a substantial amount of subjects iserroneously classified as a responder, a non-responder, resp. These misclassifications may also resultin misleading further exploratory analyses, when the classification into responders and non-re-sponders is used to define subgroups.

However, responder analyses are nevertheless not only mentioned in the methodological guide-lines ICH E9 (1998), CPMP ‘Points to consider on Multiplicity Issues in Clinical Trials’ (2002), andCPMP ‘Guideline on the Choice of the Non-Inferiority Margin’ (2006), but also in several clinicalEMA guidelines. One example for the latter category of guidelines is the CPMP ‘Note for Guidanceon Clinical Investigation of medicinal products in the Treatment of Epileptic Disorders’ (2001),where a responder is a subject who has a reduction in seizure frequency of at least 50%. Accordingto this guideline, the responder criterion should even be used as the primary variable. An otherexample is the CPMP ‘Note for guidance on Clinical Investigation of Medicinal Products in theTreatment of Depression’ (2002), where for short-term clinical studies the improvement should alsobe expressed as the proportion of responders, where a 50% improvement on the usual rating scalesis accepted as a clinically relevant response. Refer also the CPMP ‘Guideline on Clinical In-vestigation of Medicinal Products used in Weight Control’ (2008), where responders are defined assubjects showing a weight loss of at least 10% at the end of a 12-month treatment period.

Further examples where responder definitions are introduced are the American College ofRheumatology (ACR) definitions of responders in clinical studies investigating rheumatism. Onepart of the ACR20 definition is the requirement that in three out of the five variables of physician’sand patient’s assessment of disease activity, patient’s assessment of pain and physical function, andlevels of an acute-phase reactant at least 20% improvement are present (ACR, 2007). A further areawhere responder definitions are used is the RECIST (Response Evaluation Criteria In Solid Tu-mors) criteria in oncology. According to these criteria a subject is defined as a (partial) responder,when the diameter of the sum of the largest diameter of the target lesions is 30% below the baselinevalue, refer Therasse et al. (2000).

In this paper the situation is considered, in which a subject is defined as a responder, if the (error-free) post-measurement has a value less than or equal to a certain fraction l, 0olo1, of the (error-free) pre-measurement. It is obvious that the pre-measurement and the post-measurement cannot beassumed to be independent. This setting is related to a situation considered by Irwig et al. (1990) inconnection with a work on exposure–response relations in an epidemiological setting.

However, although such responder definitions as mentioned above are used in practice the bias ofcorresponding estimates has to the knowledge of the author not yet been derived and discussed inthe statistical literature. One scope of this paper is to give further elucidation on the possible pitfallsthat may arise when responder definitions of the type mentioned above are introduced. In Section 3,the situation is described where exactly one pre- and exactly one post-measurement are availableand misclassifications on the subject level are addressed. In this section also an extension to the

138 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

two-sample case is given. In Section 4, the more general situation is considered where more than onepre- or more than one post-measurements are available. The bias of several different non-parametric(or better naive) estimates is derived and compared.

In Section 5, as an add-on an asymptotically unbiased parametric estimate for the proportion ofresponders is introduced. In Section 6, the estimates for proportions of responders discussed in thispaper have been applied to data of a clinical study investigating premenstrual dysphoric disorder(PMDD).

2 Notations and model assumptions

Let for treatment t, t5 1, 2, subject i, i5 1, y, n; j5 1, 2; Xijt denote the observed jth pre-measurement, Tit the true (or error-free) pre-measurement, Yijt the observed jth post-measurement,and Uit the true (or error-free) post-measurement. It is assumed that the following relations hold:

Xijt ¼Tit1Eijt;

Yijt ¼Uit1Fijt:

It is assumed throughout this paper that the error terms Eijt, Fijt, resp., are all independentlyidentically distributed (i.i.d) normal with expectation 0 and standard deviation spre, spost, resp., i.e.

PEijt ¼ Nð0;s2preÞ, P

Fijt ¼ Nð0;s2postÞ. It is assumed in addition that the Tit are i.i.d. normal, i.e.

PTit ¼ Nðm;s21Þ and that the Uit are i.i.d. normal, i.e. PUit ¼ Nðnt;s2

2Þ. The correlation between Tit

and Uit is assumed to be r for all i, t.This implies that both the pre-measurements Xijt, the post-measurements Yijt, resp., are

Nðm;s211s2

preÞ-distributed, Nðnt;s221s2

postÞ-distributed, resp.Let F denote the distribution function of the standard normal distribution, i.e. F (x)5P(Xrx),

xAR for a standard normal random variable X.Let A1, A2, B1, B2 denote independent N(0, 1)-distributed random variables. Following usual

terminology capital letters denote random variables. E denotes the expectation, V the variance, ands the standard deviation of a random variable.

Definitions for responders using the terminology defined in this section will be given in thefollowing sections.

Throughout this paper, the terminology measurement error is used. It should be kept in mind thatthe model assumptions also cover the situation where a subject has an individual mean, e.g. for thepre-measurements, but what is observed is a random variation around this individual mean. Thismodeling is e.g. adequate when a psychometric scale is used, where it is known that there is a ratherlarge intra-individual variation, as also reflected in the example from a clinical study presented inthis paper.

The terminology is chosen such that the two-sample case can also be considered. However, inlarge parts of this paper only the one-sample case is considered. When the one-sample case isconsidered the corresponding index t is not necessary and is therefore omitted from the formulas.

3 Impact of the measurement error on misclassifications on the subject level

In this section, the impact of the measurement error on the bias is investigated. In a first step, theone-sample case is considered, while in a second step similar considerations are applied to the two-sample case.

Biometrical Journal 53 (2011) 1 139

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

3.1 The one-sample case

The proportion of responders p, where p PðUi � l � TiÞ shall be estimated. First, the situation isconsidered where only one pre-measurement and only one post-measurement can be observed. Afirst (non-parametric) estimate pa for P(Uir l �Ti) could be the proportion of subjects for which therelation Yi1 � l � Xi1 holds, i.e.

pa

Pni¼1 1ðYi1�l�Xi1Þ

n:

Please note that subscript a is chosen to distinguish this estimate from other estimates considered insections below. Such an estimate is often chosen, as – due to the presence of the measurement error– only Yi1 � l � Xi1 can be observed, but it cannot be observed whether Uirl �Ti holds true.

It is obvious that there will be subjects, who are classified as responders based on the observedmeasurements Xi1 and Yi1, although they would have to be classified as non-responders, if the truemeasurements Ti and Ui were observed. Those subjects are usually defined as false positives, i.e.

FPa fYi1 � l � Xi1g \ fUi4l � Tig:

Analogously false negative, true positive, and true negative subjects are defined as

FNa fYi14l � Xi1g \ fUi � l � Tig;

TPa fYi1 � l � Xi1g \ fUi � l � Tig;

TNa fYi14l � Xi1g \ fUi4l � Tig:

Let

Z1 A1ðs2r� s1lÞ1s2A2

ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2

p� lspre B11spost B2;

Z2 A1ðs2r� s1lÞ1s2A2

ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2

p:

Remark 3.1.

(i) Z1 has the same distribution as the random variable Yi1� lXi1� (n� lm).(ii) Z2 has the same distribution as the random variable Ui� lTi� (n� lm).

Z�1 and Z�2 are defined as the standardized versions of Z1 and Z2, i.e. Z�1 ¼ Z1=sðZ1Þ andZ�2 ¼ Z2=sðZ2Þ. By definition Z�1 and Z�2 are N (0,1)-distributed. Straightforward calculationsestablish that the correlation c between Z�1 and Z�2 is c ¼ sðZ2Þ=sðZ1Þ. Note that for s(Z1) ands(Z2), the following relations hold:

sðZ1Þ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ

21s22ð1� r2Þ1l2s2

pre1s2post

q;

sðZ2Þ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ

21s22ð1� r2Þ

q:

These standard deviations depend on the unknown standard deviation s1 of the pre-measurement,the unknown standard deviation s2 of the post-measurement, the unknown correlation r betweenTi and Ui, the threshold l and the unknown standard deviations spre and spost of the measurementerrors.

Theorem 3.2. Using the notation introduced above and defining d n� lm the following holds:

(i) EðpaÞ � p ¼ F dsðZ2Þ

� �� F d

sðZ1Þ

� �(ii) pa has a positive bias, i.e. EðpaÞ � PðUi � lTiÞ40, d40:(iii) pa has a negative bias, i.e. EðpaÞ � PðUi � lTiÞo0, do0:(iv) pa has no bias, i.e. EðpaÞ ¼ PðUi � lTiÞ , d ¼ 0:

140 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Proof: Appendix.Theorem 3.2 (i) presents an explicit formula for the bias. This formula will usually be difficult to

use directly in practice, as the necessary standard deviations will often not be available. However,based on this formula the direction of the bias can immediately be derived, as the direction of thebias only depends on n, m, and l. It should be noted that n and m are much easier to estimate inpractice compared to the standard deviations, which may not be available when there is only onepre-measurement and only one post-measurement available.

It can also be seen from the formula that the bias is negative, when the true proportion ofresponders is greater than 0.5, while the bias is positive, when the true proportion of responders issmaller than 0.5. However, when applying these considerations on the direction of the bias to realdata one has to consider that for Theorem 3.2 one crucial assumption is that the measurement errorfollows a distribution which is symmetric around 0.

Theorem 3.2 is illustrated in Table 1 that displays the expected proportions of true responders, falsepositives (FPa), false negatives (FNa), and the bias (expected proportion of observed responders –expected proportion of true responders) for different threshold values l, different correlations betweenpre- and post-measurement r and different measurement errors s for the following scenario:

(i) Expectation of the pre-measurement Ti is EðTiÞ m ¼ 500.(ii) Expectation of the post-measurement Ui is EðUiÞ n ¼ 300.(iii) Standard deviations of pre-measurement and post-measurement are identicalffiffiffiffiffiffiffiffiffiffiffiffi

VðTiÞp

s1 ¼ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ

ps2 ¼ 40.

Table 1 was created using standard SAS-functions.The standard deviations s of the measurement error have been chosen as s 2 f20; 28:3; 34:6g.

This corresponds to intraclass coefficients of correlation VðTiÞ=VðXi1Þ between Ti and Xi1 of 0.8,0.67, 0.57. Note that the standard deviation s of the measurement error is assumed to be equal forthe pre- and post-measurements.

In this example, it is obvious that there is a substantial amount of misclassifications on the subject level.This may be critical, when these classifications are the basis for further exploratory subgroup-analyses.

From the proof of Theorem 3.2 it can easily be seen that not only in this example, but also ingeneral the bias increases, when the standard deviation s of the measurement error increases.

The examples presented in this paper indicate that in many cases the bias is smaller when thecorrelation coefficient r between the pre- and the post-measurement is smaller, while at the same time theother parameters for the distributions to be considered are constant. However, as can also be seen fromFig. 1 for the case l50.7, s520 this is not case in general, i.e. even when s15s2 and spre5spost canbe assumed, there is no monotone dependency between the correlation coefficient r and the induced bias.

The focus of this paper is on the bias of corresponding estimates when conducting responderanalyses. In this paper, we are always considering the case where complete data are available,although this is often not the case in clinical research. In the context of a study that is designed tofulfill regulatory requirements the preferred analysis set is almost always the Full Analysis Set. Thisimplies that in order to follow the Intention-To-Treat(ITT)-principle all subjects should be classifiedas a responder or non-responder, i.e. also those that do not have a measurement of the continuousvariable of interest. In the statistical literature several imputation techniques are described thatcould be used to impute the underlying continuous variable of interest. An introduction to the topicof missing data in clinical trials is e.g. given by Carpenter and Kenward (2007). However, as theoften applied estimates of response rates are biased due to the measurement error, it cannot beexpected that the presence of another form of bias will cause the overall bias to vanish. To theknowledge of the author there is no literature available where it is investigated how large the biascan be when in addition to the measurement error also missing data for the continuous variable ofinterest are present. A deeper investigation of the bias that missing data introduce in addition incase a continuous variable is dichotomized could be the basis of the further work.

Biometrical Journal 53 (2011) 1 141

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Table 1 Expected proportions of false positives (P(FPa)), false negatives (P(FNa)), true proportionof responders, and the bias PðFPaÞ � PðFNaÞ ¼ EðpaÞ � p for different correlations r between Ti

and Ui, different standard deviations s of the measurement error, and different thresholds l.(EðTiÞ ¼ 500;EðUiÞ ¼ 300;

ffiffiffiffiffiffiffiffiffiffiffiffiVðTiÞ

ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ

p¼ 40)

l r s P(FPa) P(FNa) P(FPa)1P(FNa) p EðpaÞ � p

0.5 0.1 20.0 0.057 0.028 0.086 0.122 0.029

28.3 0.087 0.035 0.122 0.122 0.052

34.6 0.110 0.038 0.148 0.122 0.072

0.3 20.0 0.059 0.025 0.084 0.100 0.033

28.3 0.090 0.030 0.121 0.100 0.060

34.6 0.115 0.033 0.148 0.100 0.082

0.5 20.0 0.059 0.021 0.080 0.074 0.038

28.3 0.093 0.025 0.118 0.074 0.069

34.6 0.120 0.026 0.146 0.074 0.094

0.7 20.0 0.058 0.014 0.072 0.046 0.043

28.3 0.095 0.017 0.112 0.046 0.078

34.6 0.124 0.018 0.142 0.046 0.107

0.9 20.0 0.051 0.006 0.058 0.017 0.045

28.3 0.092 0.007 0.099 0.017 0.085

34.6 0.125 0.007 0.132 0.017 0.118

0.6 0.1 20.0 0.077 0.077 0.154 0.500 0.000

28.3 0.101 0.101 0.203 0.500 0.000

34.6 0.117 0.117 0.234 0.500 0.000

0.3 20.0 0.084 0.084 0.168 0.500 0.000

28.3 0.110 0.110 0.219 0.500 0.000

34.6 0.126 0.126 0.252 0.500 0.000

0.5 20.0 0.094 0.094 0.188 0.500 0.000

28.3 0.121 0.121 0.241 0.500 0.000

34.6 0.137 0.137 0.273 0.500 0.000

0.7 20.0 0.108 0.108 0.216 0.500 0.000

28.3 0.136 0.136 0.271 0.500 0.000

34.6 0.151 0.151 0.303 0.500 0.000

0.9 20.0 0.133 0.133 0.265 0.500 0.000

28.3 0.159 0.159 0.318 0.500 0.000

34.6 0.173 0.173 0.346 0.500 0.000

0.7 0.1 20.0 0.032 0.062 0.094 0.859 �0.029

28.3 0.040 0.092 0.132 0.859 �0.053

34.6 0.044 0.116 0.159 0.859 �0.072

0.3 20.0 0.029 0.064 0.093 0.887 �0.036

28.3 0.035 0.098 0.133 0.887 �0.063

34.6 0.038 0.123 0.161 0.887 �0.086

0.5 20.0 0.023 0.066 0.089 0.920 �0.043

28.3 0.027 0.103 0.130 0.920 �0.077

34.6 0.029 0.132 0.161 0.920 �0.103

0.7 20.0 0.013 0.065 0.078 0.960 �0.052

28.3 0.015 0.107 0.122 0.960 �0.092

34.6 0.016 0.139 0.155 0.960 �0.124

0.9 20.0 0.002 0.051 0.053 0.995 �0.049

28.3 0.002 0.100 0.102 0.995 �0.098

34.6 0.002 0.138 0.140 0.995 �0.136

142 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

3.2 The two-sample case

In the previous section the bias of an estimate for the one-sample case was derived. In this section anextension to the two-sample case is given. As a generalization of p defined in the section before letpt PðUit � l � TitÞ denote the proportion of responders in treatment group t and

pat

Pni¼1 1ðYi1t�l�Xi1tÞ

n

the corresponding estimate. From Theorem 3.2, it can easily be seen after defining dt nt � lm thatunder the general assumptions made in this paper the following holds for the bias for the differenceof proportions of responders under treatment 1 and treatment 2:

Eðpa1Þ � Eðpa2Þ � ðp1 � p2Þ ¼ Fd1

sðZ2Þ

� �� F

d1

sðZ1Þ

� �

� Fd2

sðZ2Þ

� �� F

d2

sðZ1Þ

� �� �:

ð1Þ

Formula (1) is illustrated in Table 2 that presents for treatment t 2 f1; 2g the expected proportionof true responders for treatment t, the bias for treatment t (expected proportion of observed

-0.15

-0.1

-0.05

0

0.05

0.1

0.15

0.1 0.3 0.5 0.7 0.9

Correlation Coefficient ρ

Bia

s

λ=0.5; σ=20

λ=0.5; σ=28.3

λ=0.5; σ=34.6

λ=0.7; σ=20

λ=0.7; σ=28.3

λ=0.7; σ=34.6

Figure 1 Bias for the scenarios displayed in Table 1, i.e. expectation of the pre-mea-surement Ti is EðTiÞ m ¼ 500, expectation of the post-measurement Ui isEðUiÞ n ¼ 300, standard deviations of pre-measurement and post-measurement areidentical

ffiffiffiffiffiffiffiffiffiffiffiffiVðTiÞ

ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ

p¼ 40, s denotes the standard deviation of the measurement

error, which is assumed to be equal for pre- and post-measurements.

Biometrical Journal 53 (2011) 1 143

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

responders for treatment t – expected proportion of true responders for treatment t), and the biasfor the difference of true proportions of responders ðEðpa1Þ � Eðpa2Þ � ðp1 � p2ÞÞ for differentcorrelations r, different expectations n2 of treatment 2 and different measurement errors s for thefollowing scenario:

(i) Expectation of the pre-measurement Tit is EðTi1Þ ¼ EðTi2Þ m ¼ 500.(ii) Expectation of the post-measurement Ui1 of treatment 1 is EðUi1Þ n1 ¼ 280.(iii) Standard deviations of pre-measurement and post-measurement are all identicalffiffiffiffiffiffiffiffiffiffiffiffiffi

VðTi1Þp

¼ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi2Þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi1Þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi2Þ

ps1 ¼ 40.

(iv) Threshold l5 0.6.

Table 2 Expected true proportion of responders for treatments 1 and 2, p1 and p2, the biasesEðpa1Þ � p1 and Eðpa2Þ � p2, and the bias for the difference of true proportion responders Eðpa1Þ �Eðpa2Þ � ðp1 � p2Þ for different correlations r, expectations n2 of treatment 2 and different standarddeviations of the measurement error. (EðTi1Þ ¼ EðTi2Þ ¼ 500;

ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi1Þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi2Þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi1Þ

p¼ffiffiffiffiffiffiffiffiffiffiffiffiffiffi

VðUi2Þp

¼ 40. Standard deviations of all measurement errors equal.)

r n2 s(Eijt) p1 Eðpa1Þ � p1 p2 Eðpa2Þ � p2 p1� p2 Eðpa1Þ � Eðpa2Þ � ðp1 � p2Þ

0.1 240 20.0 0.673 �0.019 0.911 �0.027 �0.238 0.009

30.0 0.673 �0.035 0.911 �0.056 �0.238 0.020

260 20.0 0.673 �0.019 0.815 �0.029 �0.142 0.010

30.0 0.673 �0.035 0.815 �0.055 �0.142 0.020

300 20.0 0.673 �0.019 0.500 0.000 0.173 �0.019

30.0 0.673 �0.035 0.500 0.000 0.173 �0.035

320 20.0 0.673 �0.019 0.327 0.019 0.347 �0.037

30.0 0.673 �0.035 0.327 0.035 0.347 �0.071

0.3 240 20.0 0.691 �0.024 0.933 �0.031 �0.242 0.006

30.0 0.691 �0.045 0.933 �0.063 �0.242 0.018

260 20.0 0.691 �0.024 0.841 �0.035 �0.150 0.011

30.0 0.691 �0.045 0.841 �0.067 �0.150 0.022

300 20.0 0.691 �0.024 0.500 0.000 0.191 �0.024

30.0 0.691 �0.045 0.500 0.000 0.191 �0.045

320 20.0 0.691 �0.024 0.309 0.024 0.383 �0.049

30.0 0.691 �0.045 0.309 0.045 0.383 �0.090

0.5 240 20.0 0.717 �0.034 0.957 �0.034 �0.240 0.000

30.0 0.717 �0.060 0.957 �0.070 �0.240 0.010

260 20.0 0.717 �0.034 0.874 �0.045 �0.157 0.011

30.0 0.717 �0.060 0.874 �0.083 �0.157 0.024

300 20.0 0.717 �0.034 0.500 0.000 0.217 �0.034

30.0 0.717 �0.060 0.500 0.000 0.217 �0.060

320 20.0 0.717 �0.034 0.283 0.034 0.434 �0.067

30.0 0.717 �0.060 0.283 0.060 0.434 �0.119

0.7 240 20.0 0.756 �0.051 0.981 �0.034 �0.225 �0.017

30.0 0.756 �0.086 0.981 �0.074 �0.225 �0.011

260 20.0 0.756 �0.051 0.917 �0.058 �0.161 0.007

30.0 0.756 �0.086 0.917 �0.106 �0.161 0.021

300 20.0 0.756 �0.051 0.500 0.000 0.256 �0.051

30.0 0.756 �0.086 0.500 0.000 0.256 �0.086

320 20.0 0.756 �0.051 0.244 0.051 0.512 �0.102

30.0 0.756 �0.086 0.244 0.086 0.512 �0.171

144 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Table 2 was created using standard SAS-functions. The standard deviations of the measurementerror have been chosen as s 2 f20; 30g. r has been chosen as r 2 f0:1; 0:3; 0:5; 0:7g. The expectationn2 of treatment 2 was chosen as n2 2 f240; 260; 300; 320g.

It is obvious that the absolute value of the bias in the two sample case, i.e. jEðpa1Þ � Eðpa2Þ �ðp1 � p2Þj is larger than each single absolute bias in the two treatment groups, if the biases Eðpa1Þ �p1 and Eðpa2Þ � p2 have different signs. However, Eðpa1Þ � Eðpa2Þ � ðp1 � p2Þ may be positive ornegative even if d1 and d2 have the same sign, as can be seen from Table 2 and Fig. 2.

4 A comparison of several non-parametric estimates for the proportion of

responders

The most important factor that contributes to the bias is the standard deviation of themeasurement error. In principle, this measurement error could be minimized by not havingonly one but several pre- and post-measurements within one subject and by definingestimates for the proportion of responders in analog to pa by taking for each subject themean of all available pre-, post-measurements, resp. However, in clinical practice it will not alwaysbe realistic to have several repeats of measurements within one subject. In the following theorem,the bias of four different estimates pa; pb; pc, and pd for the probability p ¼ PðUi � l � TiÞ iscompared:

(i) One pre-measurement, one post-measurement available. ~Xai Xi1 and ~Ya

i Yi1.(ii) Two pre-measurements, one post-measurement available. ~Xb

i ðXi11Xi2Þ=2 and~Ybi Yi1.

(iii) One pre-measurement, two post-measurements available. ~Xci Xi1 and ~Yc

i ðYi11Yi2Þ=2.(iv) Two pre-measurement, two post-measurements available. ~Xd

i ðXi11Xi2Þ=2 and~Ydi ðYi11Yi2Þ=2.

Theorem 4.1. Let spre ¼ spost s denote the measurement error of the pre- and post-mea-surements. Let d n� lm. Let

pb ¼

Pni¼1 1ð ~Yb

i�l� ~Xb

n;

pc ¼

Pni¼1 1ð ~Yc

i�l� ~Xc

n;

pd ¼

Pni¼1 1ð ~Yd

i�l� ~Xd

n:

The following relations hold for the bias introduced by the estimates pa � pd :

(i) EðpbÞ � p ¼ F �d=

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffis2ðZ2Þ1

l2

211

� �s2

r� �� p

(ii) EðpcÞ � p ¼ F �d=ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffis2ðZ2Þ1 l21 1

2

s2

q� �� p

(iii) Eðpd Þ � p ¼ F �d=ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffis2ðZ2Þ1

12ð11l2Þs2

q� �� p

(iv) jEðpaÞ � pj � jEðpbÞ � pj � jEðpcÞ � pj � jEðpdÞ � pj:

Biometrical Journal 53 (2011) 1 145

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Proof: AppendixRemark 4.2

(i) From the proof, it can easily be seen that the inequalities in (iv) above are strict forn – lm 6¼ 0.

(ii) It should be noted that it is only assumed that the standard deviations of the measurementerrors are equal but not that the standard deviations of Ti and Ui are equal.

The bias of the estimates pb; pc; pd that use the measurements obtained from morethan exactly one pre- and one post-measurements can be derived using the same techniquesas were used to derive the bias of pa. The inequality in Theorem 4.1 (iv) shows that the biasdecreases to a greater extent when exactly one pre- and two post-measurements are availablecompared to the situation, when exactly two pre-measurements and exactly on post-measurementare available.

Bias for the two-sample case

-0.2

-0.15

-0.1

-0.05

0

0.05

0.1 0.3 0.5 0.7

Correlation ρ

Bia

s

E=240, σ=20

E=240, σ=30

E=260, σ=20

E=260, σ=30

E=300, σ=20

E=300, σ=30

E=320, σ=20

E=320, σ=30

Figure 2 Bias for the scenarios displayed in Table 2, i.e. expectation of the pre-mea-surement Tit is EðTi1Þ ¼ EðTi2Þ m ¼ 500; expectation of the post-measurement Ui1 oftreatment 1 is EðUiÞ n1 ¼ 280; standard deviations of pre-measurement and post-measurement are all identical

ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi1Þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiVðTi2Þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi1Þ

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiVðUi2Þ

p¼ 40;

threshold l5 0.6. E ¼ EðUi2Þ n2 denotes the expectation of post-treatment undertreatment 2. s denotes the standard deviation of the measurement error.

146 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Theorem 4.1 is illustrated in Table 3 that shows the bias of the four estimates investigated inTheorem 4.1 for different correlations r between Ti and Ui, different standard deviations s, anddifferent thresholds l. The assumptions on the means and standard deviations are the same as forTable 4, i.e. m5 500, n5 300, s1 5s2 5 40.

5 An asymptotically unbiased ML-type estimate for the proportion of

responders

In this section an ML-type approach to calculate the proportion of responders p at least asymp-totically unbiased if two pre- and two post-measurements per subject are available is presented. Theapproach is rather general in the sense that it is neither a prerequisite that the standard deviations ofTi and Ui are equal, nor is it a prerequisite that the standard deviations of the measurement errors ofEij and Fij are equal.

Theorem 5.1. Let �Xi ðXi11Xi2Þ=2, �Yi ðYi11Yi2Þ=2 denote the individual arithmetic mean ofthe pre-, post-measurements, resp. All variables are assumed to be normally distributed. Let

m1

n

Xni¼1

�Xi; ð2Þ

n1

n

Xni¼1

�Yi; ð3Þ

s2pre

1

2

1

n

Xni¼1

ðXi2 � Xi1Þ2;

s2post

1

2

1

n

Xni¼1

ðYi2 � Yi1Þ2;

s21

1

n� 1

Xni¼1

ð �Xi � mÞ2 �1

2s2pre; ð4Þ

s22

1

n� 1

Xni¼1

ð �Yi � nÞ2 �1

2s2post; ð5Þ

r max �1; min 1;1

n�1

Pni¼1 ð

�Xi � mÞð �Yi � nÞs1s2

!� 1fs2

140g\fs2

240g

!( ); ð6Þ

mML Flm� nffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

ðs2r� s1lÞ21s2

2ð1� r2Þq

0B@

1CA:

The following relation holds:

limn!1

mML ¼ PðUi � lTiÞ:

Proof: AppendixRemark 5.2

(i) The estimates s21 and s2

2 are unbiased estimates for s21 and s2

2 but are not 40 a.s. fornoN.

Biometrical Journal 53 (2011) 1 147

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Table 3 Bias for different estimates of the proportion of true responders p for different correlationsr between Ti and Ui, different standard deviations s of the measurement error, and differentthresholds l. (EðTiÞ ¼ 500;EðUiÞ ¼ 300;

ffiffiffiffiffiffiffiffiffiffiffiffiVðTiÞ

ffiffiffiffiffiffiffiffiffiffiffiffiVðUiÞ

p¼ 40)

l r s p EðpaÞ � p EðpbÞ � p EðpcÞ � p Eðpd Þ � p

0.5 0.1 20.0 0.122 0.029 0.026 0.018 0.015

28.3 0.122 0.052 0.048 0.034 0.029

34.6 0.122 0.072 0.066 0.048 0.041

0.3 20.0 0.100 0.033 0.030 0.021 0.017

28.3 0.100 0.060 0.055 0.039 0.033

34.6 0.100 0.082 0.075 0.055 0.047

0.5 20.0 0.074 0.038 0.035 0.024 0.020

28.3 0.074 0.069 0.063 0.045 0.038

34.6 0.074 0.094 0.087 0.063 0.054

0.7 20.0 0.046 0.043 0.039 0.027 0.023

28.3 0.046 0.078 0.072 0.051 0.043

34.6 0.046 0.107 0.099 0.072 0.062

0.9 20.0 0.017 0.045 0.041 0.027 0.022

28.3 0.017 0.085 0.078 0.054 0.045

34.6 0.017 0.118 0.109 0.078 0.066

0.6 0.1 20.0 0.500 0.000 0.000 0.000 0.000

28.3 0.500 0.000 0.000 0.000 0.000

34.6 0.500 0.000 0.000 0.000 0.000

0.3 20.0 0.500 0.000 0.000 0.000 0.000

28.3 0.500 0.000 0.000 0.000 0.000

34.6 0.500 0.000 0.000 0.000 0.000

0.5 20.0 0.500 0.000 0.000 0.000 0.000

28.3 0.500 0.000 0.000 0.000 0.000

34.6 0.500 0.000 0.000 0.000 0.000

0.7 20.0 0.500 0.000 0.000 0.000 0.000

28.3 0.500 0.000 0.000 0.000 0.000

34.6 0.500 0.000 0.000 0.000 0.000

0.9 20.0 0.500 0.000 0.000 0.000 0.000

28.3 0.500 0.000 0.000 0.000 0.000

34.6 0.500 0.000 0.000 0.000 0.000

0.7 0.1 20.0 0.859 �0.029 �0.025 �0.020 �0.016

28.3 0.859 �0.053 �0.046 �0.038 �0.029

34.6 0.859 �0.072 �0.063 �0.053 �0.042

0.3 20.0 0.887 �0.036 �0.030 �0.025 �0.019

28.3 0.887 �0.063 �0.055 �0.045 �0.036

34.6 0.887 �0.086 �0.075 �0.063 �0.050

0.5 20.0 0.920 �0.043 �0.037 �0.030 �0.023

28.3 0.920 �0.077 �0.067 �0.055 �0.043

34.6 0.920 �0.103 �0.091 �0.077 �0.061

0.7 20.0 0.960 �0.052 �0.044 �0.035 �0.027

28.3 0.960 �0.092 �0.080 �0.066 �0.052

34.6 0.960 �0.124 �0.109 �0.092 �0.073

0.9 20.0 0.995 �0.049 �0.040 �0.031 �0.022

28.3 0.995 �0.098 �0.083 �0.066 �0.049

34.6 0.995 �0.136 �0.119 �0.098 �0.075

148 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

(ii) The estimates s2pre and s2

post are unbiased estimates for s2pre and s2

post.(iii) If s1 5s2 or spre 5spost can be assumed, more efficient estimates are possible.(iv) This approach does not allow to classify subjects on the subject level into responders or

non-responders.(v) The definition of r was chosen to ensure that r takes values between �1 and 1.

The estimate for the proportion of responders using the ML-type estimator introduced in The-orem 5.1 is asymptotically unbiased. A small simulation study was conducted to assess how fast theexpectation of ML-type estimator converges to the expected proportion of true responders.The simulation results for the situation considered in Table 1, i.e. m5 500, n5 300, s1 5s2 5 40 arepresented for a study with n5 10 subjects, 10 000 studies simulated. From these results,it can be concluded that the bias of the ML-type-estimate is rather small already for smallsample sizes.

6 Example: Illustration of responder analyses using a premenstrual

dysphoric disorder study

PMDD is a medical disorder characterized by debiliating mood and behavioral changes, and fre-quently, somatic complaints in the week preceeding menstruation. Yonkers et al. (2005) reportthe results of a parallel study of three cycles of treatment with an oral contraceptive or placebo.

Table 4 Expectation of the ML-type estimate pML, its standard error seðpMLÞ based on simulation,expectation of the true proportion of responders p and the corresponding theoretical standard errorse(p). (m5 500, n5 300, s1 5s2 5 40.)

l r s pML seðpMLÞ p se(p)

0.5 0.1 20.0 0.126 0.087 0.122 0.103

28.3 0.123 0.092 0.122 0.103

0.3 20.0 0.104 0.079 0.100 0.095

28.3 0.104 0.087 0.100 0.095

0.5 20.0 0.081 0.069 0.074 0.083

28.3 0.084 0.077 0.074 0.083

0.7 20.0 0.056 0.057 0.046 0.066

28.3 0.060 0.066 0.046 0.066

0.6 0.1 20.0 0.500 0.148 0.500 0.158

28.3 0.498 0.164 0.500 0.158

0.3 20.0 0.504 0.150 0.500 0.158

28.3 0.500 0.171 0.500 0.158

0.5 20.0 0.503 0.158 0.500 0.158

28.3 0.497 0.182 0.500 0.158

0.7 20.0 0.502 0.170 0.500 0.158

28.3 0.500 0.197 0.500 0.158

0.7 0.1 20.0 0.856 0.094 0.859 0.110

28.3 0.858 0.099 0.859 0.110

0.3 20.0 0.883 0.084 0.887 0.100

28.3 0.884 0.092 0.887 0.100

0.5 20.0 0.915 0.072 0.920 0.086

28.3 0.913 0.080 0.920 0.086

0.7 20.0 0.950 0.055 0.960 0.062

28.3 0.944 0.064 0.960 0.062

Biometrical Journal 53 (2011) 1 149

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

The primary target variable in this study is the DRSP (Daily Record of Severity of Problems)-score.The DRSP-score is a widely used psychometric scale for this disorder consisting of 21 individualitems. Each individual item had to be rated on a discrete six-point scale from 1 to 6, while smallvalues represent a more favorable health state compared to higher values. The questionaire was tobe completed every day, both during the pre-treatment- and the treatment cycles. To calculate theDRSP-score for each cycle, each of the 21 individual items was first averaged over the last 5 daysbefore menses and then the averages for the 21 individual items were summed up.

We are using the placebo data of this study to determine the proportion of responders whenapplying the four estimates for proportions of responders introduced in Sections 3 and 4 and theML-type estimate introduced in Section 5. Study participants had to record the DRSP-score first inup to three pre-treatment cycles. The results of the pre-treatment cycles were used to establish thediagnosis of PMDD. Subjects diagnosed with PMDD were eligible for randomization to threecycles of treatment (oral contraceptive or placebo). We are restricting the analysis to complete casesin the placebo group (n5 130), i.e. cases who had a DRSP-score in both pre-treatment cycles andtreatment cycle 2 and treatment cycle 3. We are in the following denoting these measurements aspre-measurement 1, pre-measurement 2, post-measurement 1, post-measurement 2, resp. Table 5shows some characteristics of the placebo-data to be analyzed, while a graphical display is given inFig. 3. It is obvious that the means for pre-measurement 1 and pre-measurement 2 do not differmuch. However, there is a rather high standard deviation for the difference pre-measurement 1minus pre-measurement 2, indicative for a large intra-individual variation, which was identified inthis paper as an important source of bias. In the terminology introduced in the previous sections thestandard deviation of 16.9 for the difference pre-measurement 1 minus pre-measurement 2 has to bedivided by

ffiffiffi2p

to obtain the estimate 12.0 for the standard deviation of the measurement error.Similar considerations hold true for post-measurement 1 and post-measurement 2.

A responder is defined as a subject who has a post-treatment value less than 50% of the baselinevalue. This responder definition is in line with a recently released CPMP draft guidance ontreatment of PMDD (2010). Table 6 presents the results of the different estimates introduced in thispaper using the response criterion of a 50% reduction of the DRSP-score.

In this example, it is obvious that there is a remarkable difference between the proportion ofresponders. As suggested from Theorem 4.1, the proportion of responders shows a declinepa4pb4pc4pd . All estimates pa; pb; pc, and pd are larger than the ML-type estimate. However, theresult for the ML-type estimate should also be judged with care, as for this estimate it has to beassumed that the data are normally distributed, which may not necessarily be the case for the post-measurement data.

7 Discussion

Responder analyses are a type of analysis that on the surface seems to be very easy to interpret, butwhen one digs deeper are potentially very misleading. As also mentioned by Oppenheimer et al.

Table 5 DRSP-scores in a PMDD-study (n5 130).

Measurement Mean Standard deviation

Pre-measurement 1 78.6 19.1

Pre-measurement 2 80.0 19.4

Pre-measurement 1–Pre-measurement 2 �1.5 16.9

Post-measurement 1 47.5 22.7

Post-measurement 2 47.3 24.8

Post-measurement 1–Post-measurement 2 0.2 19.1

150 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

(1999) and Senn (2004) these analyses have to be set in comparison to the negative statisticalproperties of such an approach.

In this paper the situation is considered, in which a subject is defined as a responder, if the(error-free) post-measurement has a value less than or equal to a certain fraction l, 0olo1, of the(error-free) pre-measurement when all continuous variables considered are assumed as normallydistributed. This explicit setting of a responder definition is to the knowledge of the author not yetpresent in the statistical literature.

For the conclusions on the sign of the bias it is important to keep in mind that it is alwaysassumed that the random variables considered, including the measurement errors, are normallydistributed.

One result obtained in this paper in the setting of a normal distributed measurement error is thatalso here considerable bias is present when canonical non-parametric (naive) estimates for pro-portions of responders are used. Especially, if the true proportion of responders is rather close to 0or rather close to 1 the bias can be substantial, while the bias is less pronounced when the trueproportion of responders is close to 0.5. However, this only holds true for the bias itself. Whenconsidering the subject level even a larger proportion of subjects is misclassified when the trueproportion of responders is close to 0.5 compared to the situation when the true proportion ofresponders is rather close to 0 or rather close to 1. Therefore especially when the observed pro-portion of responders is close to 0.5, the classification into responders and non-responders shouldnot form the basis of further exploratory subgroup analyses, as e.g. in the example presented in thispaper around 30% of all subjects may be misclassified.

Table 6 Proportions of subjects showing at least a 50% reduction in DRSP-score using differentresponder estimates, including the estimates for the calculation of the ML-type estimate accordingto Theorem 5.1.

pa pb pc pd ML-type estimate m n s2pre s2

post s21 s2

2 r

46.9 44.6 40.8 40.0 34.2 79.3 47.4 142.4 181.2 228.1 383.6 0.27

Figure 3 Summary results for DRSP-scores obtained in a PMDD-study.

Biometrical Journal 53 (2011) 1 151

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Focus of this paper was on the one-sample case. However, the two-sample case was also touched.It can easily be seen that if the true proportion of responders is above 50% in one sample while thetrue proportion of responders is below 50% in the other sample the biases introduced in the twosamples have different signs. It was shown via an example that even in case of normally distributedmeasurement errors there are situations where the expected proportion of responders is larger than50% in both treatment groups, but the bias for the difference of proportions may be positive in onesetting, but negative in another.

By having not one but several pre-measurements or having not one but several post-measure-ments the variance of the measurement error can be reduced. Therefore, it is not surprising that thebias of the resulting non-parametric estimate for the proportion of responders is smaller. It wasestablished that in the situation where the measurement errors of the pre- and post-measurementsare similar and only one pre-measurement and only one post-measurement is available the additionof exactly one further post-measurement reduces the bias to a greater extent than the addition ofexactly one further pre-measurement.

The results presented in this paper may e.g. be of interest, when comparing proportionsof responders obtained in several different studies. If such comparisons are intended it isrecommended to carefully assess whether the responder definitions are really identical orwhether the measurement error is reduced through several measurements. In connectionwith this it may also be advisable to assess whether the measurement errors in the two studies arecomparable. This may especially be advisable for small and large proportions of responders ob-served, as in these cases a small variation in the measurement error does not only affect the pointestimate itself, but also markedly increases the variance of the point estimate for the proportion ofresponders.

In addition an asymptotically unbiased ML-type estimate for the proportion of responders whentwo pre-measurements and two post-measurements within one subject are available was presented.

Acknowledgements The author is grateful to two anonymous referees and one anonymous Associate Editorfor helpful comments on a first version of this article.

Conflict of Interest

The PMDD-study reported in Section 6 was conducted by Berlex Inc. Today Berlex Inc. belongs toBayer HealthCare AG. The author is an employee of Bayer Schering Pharma AG, which belongs to BayerHealthCare AG.

Appendix : Proofs

Proof of Theorem 3.2. For the proof of (i) consider the following equation:

PðYi1 � lXi1Þ � p

¼PðYi1 � lXi1 � 0Þ � PðUi � lTi � 0Þ

¼PðZ1 � �dÞ � PðZ2 � �dÞ

¼Fd

sðZ2Þ

� �� F

d

sðZ1Þ

� �:

(ii) – (iv) follow then immediately from (i) and from s(Z2)os(Z1).Proof of Theorem 4.1. In analogy to the definitions of Z1 and Z2 introduced in Section 3 the

following random variables are defined:

152 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Z1;a Z1;

Z1;b A1ðs2r� s1lÞ1s2A2

ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2

p�

lffiffiffi2p sB11sB2;

Z1;c A1ðs2r� s1lÞ1s2A2

ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2

p� lsB11

1ffiffiffi2p sB2;

Z1;d A1ðs2r� s1lÞ1s2A2

ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2

p1

1ffiffiffi2p ð�lsB11sB2Þ;

Z2;a Z2;b Z2;c :¼Z2;d

A1ðs2r� s1lÞ1s2A2

ffiffiffiffiffiffiffiffiffiffiffiffiffi1� r2

p:

The standard deviations sðZa1Þ, sðZ

b1Þ, sðZc

1Þ, sðZd1 Þ resp. of Za

1, Zb1, Z

c1, and Zd

1 resp. are asfollows:

sðZa1Þ ¼ sðZ1Þ;

sðZb1Þ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ

21s22ð1� r2Þ1

l2

211

� �s2

s;

sðZc1Þ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ

21s22ð1� r2Þ1 l21

1

2

� �s2

s;

sðZd1 Þ ¼

ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðs2r� s1lÞ

21s22ð1� r2Þ1

1

2ð11l2Þs2

r:

(i)–(iii) follow then immediately from the deviation of these standard deviations keeping in mindthat Z1;A has the same distribution as the random variable ~YA

i � l ~XAi � d for A 2 fa; b; c; dg.

It is obvious that

sðZ1;aÞ4sðZ1;bÞ4sðZ1;cÞ4sðZ1;dÞ:

(iv) As an example it is established that jEðpaÞ � pj � jEðpbÞ � pj. Without loss of generality, it canbe assumed that the bias is positive, i.e. that d40:

EðpaÞ � p� ðEðpbÞ � pÞ

¼ EðpaÞ � EðpbÞ

¼ P Z1;a � �d

� P Z1;b � �d

¼ Fd

sðZ1;bÞ

� �� F

d

sðZ1;aÞ

� �40:

Proof of Theorem 5.1. Straightforward calculations establish that Eðs2preÞ ¼ s2

pre andEðs2

postÞ ¼ s2post.

As Ti and Eij are independent it follows from �Xi ¼ Ti1ðEi11Ei2Þ=2 that

Varð �XiÞ ¼ VarðTiÞ11

2VarðEi1Þ ¼ s2

111

2s2pre:

As

E1

n� 1

Xn

i¼1ð �Xi � mÞ2

� �¼ Varð �XiÞ

Biometrical Journal 53 (2011) 1 153

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

it follows that E ðs21Þ ¼ VarðTiÞ. The proof that E ðs2

2Þ ¼ s22 is similar.

limn!1 EðrÞ ¼a:s:

r follows immediately from Covð �Xi; �YiÞ ¼ CovðTi;UiÞ.From

PðUi � lTiÞ ¼ Flm� nffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi

ðs2r� s1lÞ21s2

2ð1� r2Þq

0B@

1CA

and as equations (2) – (6) establish at least asymptotically unbiased estimates for m, n, s21, s

22, and r,

Theorem 5.1 follows.

References

American College of Rheumatology (2007). A proposed revision to the ACR20: the hybrid measure ofAmerican College of Rheumatology Response. Arthritis and Rheumatism (Arthritis Care and Research)57, 193–202.

Carpenter, J. R. and Kenward, G. M. (2007). Missing data in randomised controlled trials – a practical guide.www.missingdata.org.uk

CPMP/EWP/566/98 Rev. 1 (2001) Note for guidance on clinical investigation of medicinal products in thetreatment of epileptic disorders. EMA, London. www.ema.europa.eu

CPMP/EWP/518/97 Rev. 1 (2002). Note for guidance on clinical investigation of medicinal products in thetreatment of depression. EMA, London. www.ema.europa.eu

CPMP/EWP/908/99 (2002). Points to consider on multiplicity issues in clinical trials. EMA: London.www.ema.europa.eu

CPMP/EWP/2158/99 (2006). Guideline on the choice of the non-inferiority margin. EMA: London. www.ema.europa.eu

CPMP/EWP/281/96 Rev. 1 (2008). Guideline on clinical investigation of medicinal products used in weightcontrol. EMA, London. www.ema.europa.eu

CPMP/EWP/607022/2009 (2010). Guideline on the treatment of premenstrual dysphoric disorder (PMDD) –draft. EMA: London. www.ema.europa.eu

Cochran, W. G. (1968). The errors of measurement in statistics. Technometrics 10, 637–666.Fedorov, V., Mannino, F. and Zhang, R. (2009). Consequences of dichotomization. Pharmaceutical Statistics

8, 50–61Fleiss, J. L. (1986). The Design of Measurements in Statistics. Wiley, New York.Goldberg, J. D. (1975). The effects of misclassification on the bias in the difference between two

proportions and the relative odds in the fourfold table. Journal of the American Statistical Association 70,561–567.

ICH (1998). E9 – Statistical Principles for Clinical Trials. www.ich.orgIrwig, L. M., Groeneveld, H. T. and Simpson, J. M. (1990). Correcting for measurement error in an exposure-

response relationship based on dichotomising a continuous dependent variable. Australian Journal ofStatistics 32, 261–269

Kalow, W., Tang, B. K. and Endrenyi, L. (1998). Hypothesis: comparisons of inter- and intra-individualvariations can substitute for twin studies in drug research. Pharmacogenetics 8, 50–61

Kieser, M., Rohmel, J. and Friede, T. (2004). Power and sample size determination when assessing the clinicalrelevance of trial results by responder analyses. Statistics in Medicine 23, 3287–3305.

Lewis, J. A. (2004). In defence of dichomtomy. Pharmaceutical Statistics 3, 77–79Newell, D. J. (1964). Errors in the interpretation of errors in epidemiology. American Journal of Public Health

54, 598–602.Oppenheimer, L. and Kher, U. (1999). The impact of measurement error on the comparison of two treatments

using a responder analysis. Statistics in Medicine, 18, 2177–2188.Senn, S. (2001). Individual therapy: New dawn or false dawn? Drug Information Journal 35, 1479–1494Senn, S. (2003). Disappointing Dichotomies. Pharmaceutical Statistics 2, 239–240Senn, S. (2004). Individual response to treatment: is it a valid assumption? British Medical Journal, 329,

966–968

154 M. Kunz: On responder analyses

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com

Senn, S. and Julious, S. (2009). Measurement in clinical trials: a neglected issue for statisticians? Statistics inMedicine 28, 3189–3209

Therasse, P., Arbuck, S. G., Eisenhaver, E. A., Wanders, J., Kaplan, R. S., Rubinstein, L., Verweij, J., VanGlabbeke, M., Van Oosterom, A. T., Christian, M. C. and Gwyther, S. G. (2000). New guidelines toevaluate response to treatment in solid tumors. Journal of the National Cancer Institute 92, 205–216.

Yonkers, K. A., Brown, C., Pearlstein, T. B., Foegh, M., Sampson-Landers, C. and Rapkin, A. (2005). Efficacyof a new low-dose oral contraceptive with drospirenone in premenstrual dysphoric disorder. Obstetrics andGynecology 106, 492–501.

Biometrical Journal 53 (2011) 1 155

r 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim www.biometrical-journal.com