Selection of Controls in Case-Control Studies · Abbreviation: RDO, random digit dialing. 1...

13
American Journal of Epidemiology Vol. 135, No 9 Copyright C 1992 by The Johns Hopkins University School of Hygiene and Public Health Printed in U.S.A. All rights reserved Selection of Controls in Case-Control Studies II. Types of Controls Shotom Wacholder, 1 Debra T. Sllverman, 1 Joseph K. McLaughlin, 1 and Jack S. Mandel 2 Types of control groups are evaluated using the principles described in paper 1 of the series, "Selection of Controls in Case-Control Studies" (S. Wacholder et al. Am J Epidemiol 1992;135:1019-28). Advantages and disadvantages of population controls, neighborhood controls, hospital or registry controls, medical practice controls, friend controls, and relative controls are considered. Problems with the use of deceased controls and proxy respondents are discussed. Am J Epidemiol 1992;135:1029-41. bias (epidemiology); epidemidogic methods; retrospective studies In this paper, we apply the comparability principles of study base, deconfounding, and comparable accuracy presented in our pre- vious paper (1) to the practical problem of choosing a control group. A number of choices for sources of controls are discussed and evaluated within the framework of the principles. We also offer specific suggestions that we believe are useful in choosing con- trols. POPULATION CONTROLS In a study with a primary base, where the focus is the disease experience of a popula- tion during a specified time interval in a defined geographic area, randomly sampled controls from that population satisfy the study base principle. More complex sam- pling schemes, such as frequency matching or cluster sampling (2, 3), are also appropri- ate, as long as the analysis properly accounts Received for publication May 8, 1991, and in final form February 11, 1992. Abbreviation: RDO, random digit dialing. 1 Biostatistjcs Branch, National Cancer Institute, Be- thesda, MD. 2 Department of Environmental and Occupational Health, School of Public Health, University of Minnesota, Minneapolis, MN. Reprint requests to Dr. Shotom Wacholder, Biostatisttcs Branch, National Cancer Institute, 6130 Executive Blvd., EPN 403, Rockville, MD 20892. for the sampling plan. When a roster iden- tifying all members of the base is available, controls can be selected simply as a random sample from that roster as in a nested case- control study (2) or a case-cohort study (2). When the probability of case identifica- tion among members of a primary base de- pends on a variable, the study base principle is violated and there can be selection bias, unless control selection depends proportion- ally on values of that variable. For example, when the probability of disease diagnosis depends on access to medical care, a hospital control series with similar dependence on access might be more appropriate than pop- ulation controls (4). Or, in a case-control study of occupational risk factors using pop- ulation controls, investigators might con- sider excluding cases diagnosed at smaller hospitals in the catchment region for logistic reasons. If these hospitals tend to serve rural communities, however, urban occupations may be overrepresented among the cases. An alternative type of control, such as hos- pital controls, or stratification by geographic factors in the design or analysis may alleviate this problem. Advantages of population controls There are a number of advantages of pop- ulation controls. 1029 at McGill University Libraries on December 2, 2010 aje.oxfordjournals.org Downloaded from

Transcript of Selection of Controls in Case-Control Studies · Abbreviation: RDO, random digit dialing. 1...

American Journal of Epidemiology Vol. 135, No 9Copyright C 1992 by The Johns Hopkins University School of Hygiene and Public Health Printed in U.S.A.All rights reserved

Selection of Controls in Case-Control Studies

II. Types of Controls

Shotom Wacholder,1 Debra T. Sllverman,1 Joseph K. McLaughlin,1 and Jack S. Mandel2

Types of control groups are evaluated using the principles described in paper 1 ofthe series, "Selection of Controls in Case-Control Studies" (S. Wacholder et al. Am JEpidemiol 1992;135:1019-28). Advantages and disadvantages of population controls,neighborhood controls, hospital or registry controls, medical practice controls, friendcontrols, and relative controls are considered. Problems with the use of deceasedcontrols and proxy respondents are discussed. Am J Epidemiol 1992;135:1029-41.

bias (epidemiology); epidemidogic methods; retrospective studies

In this paper, we apply the comparabilityprinciples of study base, deconfounding, andcomparable accuracy presented in our pre-vious paper (1) to the practical problem ofchoosing a control group. A number ofchoices for sources of controls are discussedand evaluated within the framework of theprinciples. We also offer specific suggestionsthat we believe are useful in choosing con-trols.

POPULATION CONTROLS

In a study with a primary base, where thefocus is the disease experience of a popula-tion during a specified time interval in adefined geographic area, randomly sampledcontrols from that population satisfy thestudy base principle. More complex sam-pling schemes, such as frequency matchingor cluster sampling (2, 3), are also appropri-ate, as long as the analysis properly accounts

Received for publication May 8, 1991, and in final formFebruary 11, 1992.

Abbreviation: RDO, random digit dialing.1 Biostatistjcs Branch, National Cancer Institute, Be-

thesda, MD.2 Department of Environmental and Occupational

Health, School of Public Health, University of Minnesota,Minneapolis, MN.

Reprint requests to Dr. Shotom Wacholder, BiostatisttcsBranch, National Cancer Institute, 6130 Executive Blvd.,EPN 403, Rockville, MD 20892.

for the sampling plan. When a roster iden-tifying all members of the base is available,controls can be selected simply as a randomsample from that roster as in a nested case-control study (2) or a case-cohort study (2).

When the probability of case identifica-tion among members of a primary base de-pends on a variable, the study base principleis violated and there can be selection bias,unless control selection depends proportion-ally on values of that variable. For example,when the probability of disease diagnosisdepends on access to medical care, a hospitalcontrol series with similar dependence onaccess might be more appropriate than pop-ulation controls (4). Or, in a case-controlstudy of occupational risk factors using pop-ulation controls, investigators might con-sider excluding cases diagnosed at smallerhospitals in the catchment region for logisticreasons. If these hospitals tend to serve ruralcommunities, however, urban occupationsmay be overrepresented among the cases.An alternative type of control, such as hos-pital controls, or stratification by geographicfactors in the design or analysis may alleviatethis problem.

Advantages of population controls

There are a number of advantages of pop-ulation controls.

1029

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

1030 Wacholder et al.

Same study base. Selection of populationcontrols from a primary base ensures thatthe controls are drawn from the same sourcepopulation as the case series (the study baseprinciple (I)). The mechanisms for samplingfrom a population are similar to those com-monly used in survey research.

Exclusions. The definition of the base canencompass the exclusions, e.g., not being inthe catchment area at the appropriate time,being a previous case of the disease understudy, or not being at risk of the diseaseunder study, such as women who have hada hysterectomy in a uterine cancer study.

Extrapolation to base. The distribution ofexposures in the controls can be readily ex-trapolated to the base for purposes such ascalculations of absolute or attributable risk(5) or to learn about the distribution ofexposures in the population; for example, adetailed diet questionnaire could be used tostudy differences in food consumption be-tween blacks and whites. By contrast, a hos-pital control series that is appropriate for astudy using cases drawn from the same hos-pital cannot be used for estimating attribut-able risk in a population without makingassumptions about the representativeness ofthe case series in that population.

Disadvantages of population controls

Population controls can be inappropriatewhen there is incomplete case ascertainmentor when even approximate random sam-pling of the study base is impossible becauseof nonresponse or inadequacies of the sam-pling frame (6). When case ascertainmentis incomplete and probability of ascertain-ment depends on the factors being studied,hospital-based or other controls may be pref-erable (7).

Population controls have some other dis-advantages.

Inconvenience. Sampling from the popu-lation instead of using a more readily avail-able series, such as other hospitalized pa-tients, can be less convenient and moreexpensive.

Recall bias. Differences between cases andhealthy controls can lead to violation of the

comparable accuracy principle. Despite aninterviewer's best efforts to have the subject'sresponse refer to the period before disease,the responses by a previously hospitalizedcase may reflect modifications in exposuredue to the disease itself, such as drinking lesscoffee or alcohol after an ulcer, or due tochanges in perception of past habits afterbecoming ill.

Less motivation. Population controls maybe less motivated to cooperate than hospitalcontrols.

Selection of population controls when aroster exists

Selection of population controls is sim-plest when there is a complete listing of thestudy base. It is useful if the roster has atelephone number or at least an address withwhich to make contact with the subject.Rosters that have been used include thefollowing: annual residence lists, compiledby law in some areas such as Massachusetts(8) and several European and Asian coun-tries (9); birth certificate records for studiesof disease in children (10); Health Care Fi-nancing Administration files for Medicarerecipients, with coverage of about 98 percentof the United States population aged 65years and over (11); and electoral lists pre-pared for each election by a door-to-doorsurvey in such countries as Great Britainand Canada. It is important to rememberthat, when the cases are identified by amethod other than follow-up of subjects onthe roster, such as through a disease registry,cases who are not included on the roster,such as those who are not citizens whenusing electoral lists, should be excluded fromthe study.

Selection of population controls when noroster exists

When no roster exists, it can be difficultto ensure that every eligible subject in thestudy base has the same chance of selection.Bias can be induced when methods rely oncontacts with a household, either by tele-phone (12) or in person, and only one eligi-ble control per household is selected. Con-

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

Types of Control Groups 1031

sider a study of a disease in children wherecontrols must be within 2 years of age of thecase to which they are matched and onlyone control per household will be selected.A child with a sibling of similar age is lesslikely to be selected as a control than onewith no siblings. This violates the study baseprinciple and, since cases are selected regard-less of the ages of other family members, canlead to bias in estimating the effects of vari-ables related to family size (12). Independentdetermination of whether to select each sub-ject would eliminate this problem, or morethan one control can be selected from thesame home, and the possible dependence intheir responses can be accounted for in theanalysis (3).

Random digit dialing. Random digit dial-ing (RDD) can be used to select populationcontrols when no roster exists. RDD and itsvariants generate sets of telephone numberswithout relying on a directory that wouldnot have new or unpublished numbers (13).The aim of RDD is to ensure that eachresidential number has an equal chance ofselection, while minimizing the number ofphone calls to nonresidential or inappro-priate numbers (13).

Investigators have flexibility in the detailsof RDD (13, 14). In the standard method(13), a random sample is drawn from work-ing sets of telephone exchanges provided bythe telephone company, i.e., the first severalnumbers of the complete telephone number,typically eight of ten (including area code)in the United States. The number is thencompleted with two random numbers. Thecomplete number is dialed to determinewhether it is a working residential phone. Ifit is, a predetermined number of calls aremade to that exchange; if not, the exchangeis discarded. The extra steps are included inorder to reduce the numbers of calls to ex-changes that have relatively few residentialnumbers.

Does RDD satisfy the study base principleor can it generate a selection bias? It is easyto show that each phone number in the areahas the same chance of being reached. Theprobability that an exchange is selected isproportional to the number of working

numbers in the exchange, but the probabilitythat a particular number is chosen from thatexchange is inversely proportional to thenumber of working numbers in the ex-change. However, the goal in control selec-tion is a random sample of eligible subjects,not of telephone numbers. Incompletephone coverage, residences that can bereached by more than one phone number,more than one person in the household whois eligible to be a control, and nonresponsecan all lead to possible selection bias, unlessaccommodation is made in the design oranalysis. Stratification on numbers of tele-phone lines and eligible residents in thehousehold can alleviate some of the prob-lems. Advances in technology, such as an-swering machines and call forwarding, haveadded complications to this method.

The first contact with a household is mostoften used for screening and to obtain acensus of the household. Information on theaddress of the house and on the name, age,sex, and race of each household member isobtained. Based on the responses, a sam-pling frame is generated, and a random sam-ple of identified eligible subjects is selectedto be controls. These individuals can thenbe contacted by letter, by telephone, or inperson for interview. When the samplingscheme is simple, such as when it is basedsimply on age and sex, the census and theinterview itself can be done in a one-stepprocess (14, 15), thereby reducing the overallpercentage of refusal.

When telephone coverage is low, RDDwill miss a substantial proportion of subjectsand can result in biased estimates of theeffects of exposures related to telephone cov-erage, such as socioeconomic status. Whiletelephone coverage in the United States is93 percent, it is lower for residents of theSouth, for blacks, and for the poor, in 1986,only 56 percent of Southern blacks living inhouseholds with yearly income below$5,000 had telephones (16). In the UnitedStates, the difference in the proportion ofsmokers in households with phones and inthose without phones is 1 percent, thoughthis difference is 4 percent among those withincome below $5,000 (16). Incomplete tele-

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

1032 Wachdder et al.

phone coverage is often, therefore, a smallerproblem than nonresponse and refusal.

The clustering of exchanges in RDD re-sults in a sample that is not random, sincenot every possible subset of the populationcan be chosen. This clustering does not biasestimates of effects but can lead to overlyoptimistic estimates of the precision of pointestimates unless addressed in the analysis(3).

Sometimes a variation of RDD is usedwith controls selected using the exchange ofthe case in order to generate subjectsmatched on factors that are difficult to mea-sure but are believed to be related to geo-graphic area (17,18). It is not clear, however,whether "neighborhood matching" wouldsatisfy the study base criterion. Would con-trols selected to have similar phone numbersactually become cases (e.g., be admitted tothe study hospital, if the cases were a hospitalseries) had they been diagnosed with disease?Moreover, the extent to which such a pro-cedure matches on area or region has notbeen empirically demonstrated. Matchingon primary care practice, discussed below,may be a better approach in some situations.

RDD can be expensive and time-consum-ing when targeting subgroups of the popu-lation. For example, an average of almost35 households must be screened to identifyone black male, 64 households to identifyone Hispanic male between 20 and 29 yearsof age (19), and almost 70 to identify onemale aged 75 or older (13). Hence, somestudies in the United States use Health CareFinancing Administration rosters as a sub-stitute to identify controls over age 65.

RDD is an option for identifying controlsfrom a particular ethnic group who tend tobe clustered in certain neighborhoods be-cause less effort is expended in nonproduc-tive exchanges (13). The Donnelley database (Donnelley Marketing InformationServices, Stanford, Connecticut), which clas-sifies exchanges by race and income, canalso be used for stratification to identifyblacks and Hispanics more cost effectively(19); it is not so helpful, however, for iden-tifying groups such as Asian Americanswhose residential patterns are less concen-

trated geographically (19). However, thistechnique may violate the study base prin-ciple and lead to bias when the exposuresare related to cultural factors associated withthe diversity of the subject's neighborhood,since eligible subjects living in heteroge-neous areas may be underrepresented in thecontrol group. For example, Asian Ameri-cans living in so-called Chinatowns are likelyto have life-styles and diets different fromthose living in an ethnically heterogeneousneighborhood.

Neighborhood controls. Population con-trols can also be selected using residences,rather than telephone numbers, as the sam-pling unit. This strategy can be particularlyuseful when telephone coverage is low. Inarea probability sampling, controls are se-lected randomly from a roster of residences,perhaps obtained from a recent census.However, creating a roster when one is notavailable can be extremely expensive.

Instead of using a random sample from aroster of residences, "neighborhood con-trols" are typically selected from residencesin the same city block or other geographicarea as the case, in an attempt to reduce thevariability of factors such as access to med-ical care and socioeconomic status. Forneighborhood controls to satisfy the studybase principle, one must consider the baseas divided into geographically defined strata.Use of neighborhood controls in a study witha secondary base may not satisfy the princi-ple; for example, a neighbor who would notbe admitted to the same hospital under thecircumstances that led to admission of thecase would be outside the base. Thus, biascould result if the source of cases is a reli-giously affiliated hospital and the neighbor-hood is religiously heterogeneous.

Neighborhood controls are usually chosendeterministically (nonrandomly) within astratum defined geographically. If the selec-tion is not random, one must rely on anassumption that the selection process is in-dependent of the exposure, which is equiv-alent to the exposure distribution being thesame as that in the study base (1, 20, 21).To protect that independence, the inter-viewer should not be given the flexibility to

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

Types of Control Groups 1033

choose which house to select. Instead, a par-ticular algorithm, such as the one depictedin reference 22 or perhaps based on a reversedirectory (sorted by address), should be usedin order to avoid bias arising from inter-viewer selection of residences that appearmore likely to cooperate.

Ideally, the neighborhood control shouldhave been a resident of the house when theindex case was diagnosed. Controls who re-cently moved into a neighborhood and arechosen to match cases from the neighbor-hood diagnosed several years earlier shouldbe excluded since they are outside the studybase. Excluding controls who have movedinto the neighborhood since diagnosis of thecase reduces the study base problem but doesnot solve it, since people who moved out ofthe neighborhood will still be missed (I).Whenever cases are diagnosed several yearsbefore control selection, use of current resi-dents (or any other current roster) raises thepossibility of distortion of the distributionsof factors associated with mortality and mi-gration, particularly if socioeconomic or eth-nic characteristics of the neighborhood havechanged. Old reverse directories, visits tolong-term residents, property tax records,old plat maps, and registers of deeds can beused to historically reconstruct neighbor-hoods (23) in order to find neighborhoodcontrols contemporaneous with the case.

Neighborhood controls have two main ad-vantages. 1) Control selection does not re-quire the existence of a roster or use of atelephone, and 2) confounding factors asso-ciated with neighborhood may be balancedbetween cases and controls.

Thus, neighborhood controls can be anattractive alternative for studies with a pri-mary base when no roster of the populationis available or, possibly, for studies wherethe cases are obtained from hospital lists.The disadvantages of neighborhood controlsinclude the potential for not satisfying thestudy base principle, particularly in studieswith a secondary base; the high cost associ-ated with contacting each potential control(24); the use of the household as the sam-pling unit, as in selection of controls bytelephone; and the difficulty in documenting

nonresponse, since one does not know thenumber of eligible subjects in homes forwhich there is no response. In addition, theremay be overmatching on the study exposurebecause of similarities between cases andcontrols from the same neighborhood onexposures related to residence, particularlyin buildings with more than one householdunit. These multiunit dwellings, especiallyapartment buildings, present additionalproblems because of the difficulty of enum-erating all household units within the build-ing for sampling. Moreover, access to thebuildings themselves can be a problem inmany large cities.

HOSPITAL OR DISEASE REGISTRYCONTROLS

When a list of admissions or dischargesfrom a hospital or clinic is the source of thecases, the same list can be used as the sourceof controls, too. The points below regardinghospital controls also apply to disease regis-try controls, drawn, for example, from atumor or malformation registry. One attrac-tion of hospital controls is that one canreasonably assume that patients admitted tothe same hospital as the cases are membersof the same (secondary) base (1, 20). Themost serious danger with hospital controls isthat choosing subjects with other diseasesmay jeopardize the assumption of represent-ativeness of exposure (1, 20, 21), namely,that the distribution of the exposures understudy in the controls is the same as that in arandom sample from the base that producedthe cases. This is equivalent to assuming thatthere is no relation between exposure andthe diagnoses used to determine inclusion ofcontrols. For example, use of other womenwho undergo dilatation and curettage ascontrols for a study of endometrial cancer(25) probably meets the assumption regard-ing membership in the secondary base butfails the representativeness of exposure as-sumption when estrogen use is related toconditions indicating dilatation and curett-age (26-28).

The representativeness of exposure as-sumption is not quite as difficult to meet as

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

1034 Wacholder et al.

it may seem because it must hold only withinstrata used in the analysis or conditionallyon factors for which adjustment will bemade in the analysis (1, 20). Thus, for ex-ample, stratification by sex eliminates biasfor an exposure even if the exposure is as-sociated with a control disease uncondition-ally, as long as it is unassociated with thecontrol disease separately for males and fe-males.

Hospital or registry controls are usuallymore appropriate than are population con-trols if a sizable fraction of diseased subjectsin the base will not become cases in thestudy and if the ones who do have differentexposures from those who do not. For ex-ample, multiple sclerosis patients referred toan academic center about 60 miles (about100 km) away were found to have demo-graphic and severity characteristics differentfrom those of other multiple sclerosis pa-tients at the center from the same area whowere not referred (29). It would be difficultto reflect this heterogeneous referral patternusing population controls. Use of hospitalcontrols from another disease with a similarreferral pattern might provide more assur-ance that all subjects share the same studybase; alternatively, stratifying by geographyor by referral status might be effective.

Use of a hospital control series consistingof subjects with a disease the outward man-ifestations of which are identical to those ofthe disease of interest can eliminate onesource of selection bias. When differentialdiagnosis is made on these subjects, the oneswith the index disease become cases and theones with the "imitation" disease becomecontrols. If the imitation disease is unrelatedto the exposure of interest, these controlswould be appropriate; Miettinen (20, p. 79)describes them as "ideal."

Hospital controls have other advantages.Comparable quality of information. A ma-

jor advantage is that generally hospital con-trols are more comparable to cases withrespect to quality of information, since theytoo have been ill and hospitalized. However,careful consideration of the environmentwhere information is gathered, the content

and phrasing of the questions, and the dis-eases to be included in the control series isneeded, since simply being sick does notnecessarily entail comparable accuracy andavoidance of recall bias. Selecting hospitalcontrols with conditions that are believed tolead to similar errors in recall may alleviatesome of the problems that cause this formof information bias. For example, in a studyof birth-related risk factors for testicular can-cer in men treated at a military or tertiarycare hospital, controls were age-matchedmen with other cancers, presumed to beunrelated to the study exposures, at the samehospitals (30). Since subjects' mothers pro-vided information on the key exposures,which occurred during early childhood, theuse of hospital controls was particularly ap-propriate because it ensured that the sons ofall the mothers interviewed for the study hadhad malignancies (30). Further, the studyhospitals drew patients from across theUnited States, so these controls were likelyto have referral patterns similar to those ofthe cases (30).

Convenience. Hospital controls may bethe most convenient choice when controlswill be asked to provide bodily fluids or toundergo a physical examination, as whenlooking for dysplastic nevi in a study of skinmelanoma.

In addition to the need to satisfy the rep-resentativeness of exposure assumptionsnoted above, hospital controls have otherdifficulties.

Different catchments. Even when controlsare identified from the same registry or hos-pital as the cases, the catchments for differ-ent diseases within the same hospital maybe different (31), violating the study baseprinciple. For example, an urban teachinghospital associated with a medical centermay provide primary medical care to poorpeople in the neighborhood and also serveas a tertiary referral center providing sophis-ticated services for certain medical condi-tions. Restricting the study base to peopleliving in the vicinity of the hospital canalleviate the problem (20, 21) but may re-duce the number of cases substantially.

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

Types of Control Groups 1035

Stratification by distance between hospitaland residence or by referral status might bean effective alternative.

Berkson's bias. If the study exposure isrelated to the risk of being hospitalized forthe control disease, the exposure distributionin the series may not reflect the base. Forexample, diabetics are more likely to beadmitted to the hospital with heart diseasethan are nondiabetics, which could biasstudies focusing on diet. This is an exampleof Berkson's bias, which is caused by selec-tion of subjects into a study differentially onfactors related to exposure (32).

Composition of a hospital control series

We believe that the best strategy regardingthe selection of diseases to form a hospital-or disease registry-based control group is toexclude from the control series all conditionslikely to be related to exposure (20, 33). Thepayoff for the extra effort in the study designwill be more confidence in the validity ofthe results. If there is an association, subjectsadmitted to the hospital for the disease needto be excluded from the control series (34);however, a previous history of the diseaseshould not be grounds for exclusion, unlessthe exclusion is also applied to cases (35).These exclusion rules apply regardless ofwhether the association is positive or nega-tive, causal or not.

In theory, a possible association of expo-sure with a control disease should be assessedafter controlling for confounders includedin the analysis. If adjustment for a con-founder eliminates a crude association be-tween exposure and a potential control dis-ease, adjustment for that same confounderin the analysis of the study will eliminatethe bias caused by using that disease as asource of controls.

Similarly, an association between the con-trol disease and a confounder is acceptable,if the effect of the confounder is controlledin the analysis. Also, patients with any dis-ease that cannot be clearly distinguishedfrom the study disease should be excluded

from the control series to reduce bias due tomisclassification of disease.

If there is complete confidence that a sin-gle disease is unrelated to the exposure ofinterest, the entire control series may beselected from among patients with thatdisease. However, only rarely is there con-vincing evidence that the assumption of in-dependence of the study exposure and acontrol disease is satisfied. Therefore, inclu-sion of patients with several diseases mini-mizes potential bias if any one disease turnsout to be related to exposure (36, 37). Whenrelated diseases, such as other cancers for astudy of a particular cancer or other peri-natal outcomes for a study of birth defects,are used, the possibility of information biasmay be reduced (38, 39). Again, however,any of the diseases that are related to theexposure (based on a priori knowledge)should be excluded (33). Overall, we rec-ommend using more diseases rather thanfewer; this protects the investigators iflater evidence links one or more of the con-trol diseases positively or negatively to anexposure.

CONTROLS FROM A MEDICALPRACTICE

Choosing controls from the primary med-ical practice of the cases can be a usefulstrategy when it is otherwise difficult to findcontrols who are comparable to cases onaccess to medical care or referral to special-ized clinics. For example, medical practicecontrols may be more appropriate than hos-pital controls when cases are drawn from anurban teaching hospital, because potentialsubjects admitted to this hospital may bemixtures of poor clinic patients and high-socioeconomic-status private patients in fardifferent ratios from those of the case series.

Controls selected from the same medicalpractices as the cases are drawn from theappropriate secondary base (I, 20), if onecan make the assumption that two patientsin the same primary care practice with thesame presentation would follow the samepathway through the medical care system. Adisadvantage of medical practice controls is

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

1036 Wacholder et al.

the extra complexity entailed by a randomselection process for controls within severaldifferent practices.

The study base principle can be jeopard-ized with medical practice controls, sincethe exposure distribution for controls maynot be the same as that in the study base, aswhen patients choose a physician for reasonsrelating to particular conditions that arethemselves related to exposure. Similarly, ifthose conditions lead to modification of ex-posure subsequent to symptoms or treat-ment, the study base principle can be vio-lated, unless the timing of the changes canbe ascertained. For example, medical prac-tice controls were used in a study that re-ported a positive association between coffeedrinking and pancreatic cancer risk (40).Because the controls were patients with gas-trointestinal disorders, some of them hadconditions that were either caused by coffeedrinking or treated by removing coffee fromthe diet, and thus the level of coffee drinkingwas not representative of the study base. Themagnitude of the bias introduced by inclu-sion of such controls is dependent on theproportion of controls with diet-alteringconditions and the relations of these diseasesto coffee consumption (41). Control condi-tions associated with a confounder do notneed to be excluded, if there will be adjust-ment for the confounder in the analysis.Thus, smoking-related conditions not re-lated to coffee drinking might be includedin the control series (34).

FRIEND CONTROLS

Friends of cases may be a more conven-ient and inexpensive source of controls thanare other alternatives. Controls can be se-lected from a list of friends or associatesobtained from the case at little extra effortwhile the case is being interviewed. Friendsmay be likely to use the medical system insimilar ways. Moreover, biases due to socialclass are reduced since usually the case andfriend control will be of a similar socioeco-nomic background.

Nonetheless, we have strong reservations

about the use of friend controls. A possibletheoretical justification of friend controls isthat the base is divided into mutually exclu-sive "friendship strata" and that the expo-sure of a friend control is representative ofthat of the friendship stratum. Alternativejustifications for friend controls are as anonrandom sample from the base, if friendswill all be in the same study base, or, if not,as an indirect way to probe the base (1, 20).None of these rationales is very persuasive.It is unrealistic to believe that the study baseis divided into mutually exclusive friendshipstrata and that the controls are selected fromonly within the case's stratum (42). Even ifthis were true, the control selection withinthe stratum is deterministic and possiblyrelated to exposure (1, 42). The credibilityof representativeness of exposure is low forfactors related to sociability, such as gregar-iousness or, possibly, smoking, diet, or al-cohol consumption, because sociable peopleare more likely to be selected as controlsthan are loners (42).

"Friendly control" bias was suspected ina case-control study of patients with insulin-dependent diabetes mellitus, where friendcontrols were designated by the parents ofcases (43). Insulin-dependent diabetes mel-litus cases were found to be more likely tohave learning problems, to have few friends,to dislike school, and to have recent illnessin the family. While these findings could bedue to true risk factors or to recall bias, theyare more likely to be due to selection bias,since children perceived to have problemsmay be less likely to be identified as friendsand, therefore, as controls (44). Further,there was some evidence suggesting that par-ents gave names of children from familieswith "mainstream" social characteristics(44).

A less serious problem is that the use offriend controls can lead to overmatching,since friends tend to be similar with regardto life-style and occupational exposures ofinterest, as in a study of head trauma andseizures (45); the loss of efficiency due toovermatching depends on how strongly headtrauma is correlated among friends (e.g.,

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

Types of Control Groups 1037

motorcycle racers and boxers) and howclosely it is related to gregariousness.

These problems can be alleviated to someextent by asking cases for the names of sev-eral friends and choosing controls randomlyfrom the list or by asking for names ofassociates rather than friends (31,46) so thatthe control will not be the case's closest, andperhaps most sociable, friend. However,those on the list will still tend to be moresociable than is a loner who is not on any-one's list (but can become a case), and thereis no reason to believe that the extra friendsnamed will have different characteristicsfrom those who would be named on ashorter list (47). In addition, some cases maynot be willing to provide names of friends(48), increasing nonresponse.

Despite serious shortcomings, friend con-trols may be useful in some exceptional cir-cumstances, such as in a study of exposuresunrelated to friendship characteristics, as islikely in a study of a genetically determinedmetabolic disorder (48, 49).

RELATIVE CONTROLS

The choice of relative controls is moti-vated by the deconfounding principle, notthe study-base principle (1). When geneticfactors confound the effect of exposure,blood relatives of the case have been used asa source of controls (50) in an attempt tomatch on genetic background. Spousal andsibship relationships form strata and meetthe reciprocity requirement (1, 42), but thetheoretical justification for other relatives ismore tenuous. Spouses might be a suitablecontrol group if matching on adult environ-mental risk factors is sought. When siblingcontrols are used in studies of the associationbetween genetic markers and the risk ofcancer, confounding by factors related toethnicity is minimized (51); however, casesand controls may be overmatched on a va-riety of genetic and environmental factorsthat are not risk factors but are related tothe exposure under study. For example, ef-fects of risk factors associated with familysize cannot be assessed in a study using

sibling controls because of overmatching(31).

Trade-offs in using relative controls areillustrated in a study of the association be-tween tonsillectomy and Hodgkin's disease(52) that used two control groups, siblingsand spouses, to control for socioeconomicstatus in childhood and adulthood, respec-tively. A higher risk for tonsillectomy wasfound with spousal controls than with sib-ling controls, suggesting either positive con-founding by childhood socioeconomic statusor negative confounding by adult socioeco-nomic status.

THE CASE SERIES AS THE SOURCEOF CONTROLS

An individual can serve as his own controlfor a study of an acute event when the effectof an exposure is transient (53), such as theeffect of a possible triggering activity onmyocardial infarction. The impact of theexposure is evaluated by comparing the pro-portions of events occurring during the pu-tative period of elevated risk and the pro-portions of time each individual has been atelevated risk. This "case-crossover" design(53) can be thought of as a case-controldesign where each stratum consists of a sin-gle individual (or as a cohort study withmany noninformative strata). The studybase principle is clearly satisfied. Althoughthere is no possibility of between-subjectconfounding, a second exposure that tendsto occur at the same time or at differenttimes from the study exposure can causeconfounding (53). This design has the ad-vantages that only patients need to be stud-ied (53) and that recurrences can be handledeasily.

However, for studies of chronic diseaseswhere the main focus is on more stable time-dependent covariates, the use of a studyseries of cases only, as might be found in adisease registry, requires a complete and ac-curate exposure history and the strong as-sumption that the exposure of interest isunrelated to overall mortality (54). This

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

1038 Wacholder et al.

study design may also have lower powerthan more conventional studies (54).

PROXY RESPONDENTS ANDDECEASED CONTROLS

Interviews with proxy respondents areoften used when subjects are deceased or toosick to answer questions or for persons withperceptual or cognitive disorders (55). Be-cause proxy respondents will tend to be usedmore often for cases than for healthy con-trols, violation of the comparable accuracyprinciple is likely. Surrogates, particularlyspouses and children, generally provide ac-curate responses for broad categories of ex-posure information, although more detailedinformation is usually less reliable (56-60).For some variables, such as cigarette smok-ing, and consumption of coffee and alcohol,spouses and children are remarkably accu-rate, even when compared with reinter-viewed living subjects (61, 62). Proxies mayeven provide better information than theindex subjects, such as in nutritional studiesamong older subjects, where a subject's wifemay have prepared much of her husband'sfood (63).

When feasible, reducing the time intervalbetween diagnosis and interview of cases canreduce the number of proxy interviews re-quired. When information is obtained froma surrogate because the case is dead, using aliving control sampled properly from thebase can violate the comparable accuracyprinciple. However, insisting on a dead con-trol (64) violates the study base principle,since the base consists of living subjects andsubjects who die represent a special samplefrom that base. In order to use dead controls,one needs to assume representativeness ofexposure (1), that the dead controls have thesame distribution of exposure variables asdoes the base. This assumption has beendemonstrated to be incorrect for a numberof personal behavior variables, including useof tobacco and alcohol (65), even afterdeaths from causes believed to be associatedwith the study exposure are excluded (66).

Interviews with surrogates of appropri-

ately selected living controls do not makeaccuracy fully comparable since the controlsare still alive while the cases are deceased,and responses by their surrogates may beinfluenced by factors associated with thesubject's death (67). Nonetheless, validationstudies (68, 69), in which the responses of aproportion of the living subjects and theirproxies are obtained, can be used to reducethe bias due to errors from proxy responses.

In studies with deceased cases, the use ofproxy interviews for appropriately selectedlive controls is usually preferable to the useof dead controls, particularly if the studyexposure is likely to be associated with over-all mortality. The advisability of insisting ona proxy interview for a live control dependson what information will be obtained fromthe interview. When exposure is assesseddirectly, comparable accuracy for cases andcontrols in an interview designed primarilyto elicit information on confounders doesnot necessarily reduce the bias in the esti-mate of the effect of exposure (1, 67, 70);therefore, a proxy interview for the controlmay not help. Using proxy interviews forlive controls should be considered only when1) information about a key study exposureis to be obtained by interview and 2) a proxyreport for the case is likely to be substantiallyless accurate than the control's self-reportabout the key study exposure.

Controls in proportionate mortalitystudies

A proportionate mortality study can beviewed as a case-control study with controlsobtained from a registry consisting of deaths(71). The underlying assumption is that thedistribution of the exposure under studyamong subjects who died from other causesis the same as that in the base, which consistsof living persons only. Just as in otherregistry-based studies, deaths from causesrelated to the study exposure must be ex-cluded from the control series (21). Thus,this kind of study may not be suitable forinvestigating exposures such as smoking thatare risk factors for causes of death repre-

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

Types of Control Groups 1039

senting a high proportion of mortality. Oneadvantage of the approach is that a rosterfor the eligible controls can be establishedconveniently; any absences from the basetypically will not lead to selection bias, sincethe efficiency of the system for registeringdeaths from most causes is unlikely to varysubstantially with cause of death. However,errors in attribution of cause of death dooccur, for example for AIDS, suicide, orcirrhosis of the liver, resulting in misclassi-fication bias and over- or underexclusion ofsubjects.

NUMBER OF CONTROL GROUPS

Some researchers have suggested choosingmore than one control group (72, 73). Itcertainly is reassuring when the results areconcordant across control series. However,when the results are discordant (25, 27, 52),the investigators must decide which result is"correct" and essentially discard the other.We therefore believe that doubt is not a goodbasis for choosing an additional controlgroup. Rather, the best strategy usually is todecide which series is preferable at the designstage.

Multiple control groups might be helpfulwhen each serves a different purpose, aswhen each control group provides the abilityto control for a particular confounder. Inthis situation, the second control group canact as a form of replication.

REFERENCES

1. Wacholder S, McLaughlin JK, Silverman DT, etal. Selection of controls in case-control studies. I.Principles. Am J Epidemiol 1992,135:1019-28.

2. Wacholder S, Silverman DT, McLaughlin JK, etal. Selection of controls in case-control studies. III.Design options. Am J Epidemiol 1992; 135:1042-50.

3. Graubard B, Fears TR, Gail MH. Effects of clustersampling on epidemiologic analysis in population-based case-control studies. Biometrics 1989;45:1053-71.

4. Cole P. Introduction. In: Breslow NE, Day NE,eds. Statistical methods in cancer research. Vol 1.The analysis of case-control studies. Lyon: Inter-national Agency for Research on Cancer, 1980:14—40. (IARC scientific publication no. 32).

5. Whittemore AS. Estimating attributable risk fromcase-control studies. Am J Epidemiol 1983; 117:76-85.

6. Gail M, Lubin JH, Silverman DT. Elements ofdesign in epidemiologic studies. In: DeLisi C, Ei-senfeld J, eds. Statistical methods in cancer epide-miology. Amsterdam: North Holland PublishingCo, 1985:313-23.

7. Savitz DA, Pearce N. Control selection with incom-plete case ascertainment. Am J Epidemiol 1988;127:1109-17.

8. Cole P, Monson RR, Haning H, et al. Smokingand cancer of the lower urinary tract. N Engl JMed 1971,284:129-34.

9. Adami HO, Rimsten A, Stenkvist B, et al. Repro-ductive history and risk of breast cancer. Cancer1978;41:747-57.

10. Gold EB, Diener MD, Szklo M. Parental occupa-tions and cancer in children. J Occup Med 1982;24:578-84.

11. Hatten J. Medicare's common denominator thecovered population. Health Care Finan Rev 1980;2:53-64.

12. Greenberg ER. Random digit dialing for controlselection. A review and a caution on its use instudies of childhood cancer. Am J Epidemiol 1990;131:1-5.

13. Waksberg J. Sampling methods for random digitdialing. J Am Stat Assoc 1978;73:4O-6.

14. Harlow BL, Davis S. Two one-step methods forhousehold screening and interviewing using ran-dom digit dialing. Am J Epidemiol 1988; 127:857-63.

15. Harlow B, Hartge P. Telephone household screen-ing and interviewing. Am J Epidemiol 1983; 117:632-3.

16. Thornberry TO, Massey JT. Trends in UnitedStates telephone coverage across time andsubgroups. In: Groves RM, Biemer PB, Lyberg LE,et al., eds. Telephone survey methodology. NewYork: John Wiley & Sons, Inc, 1988:25-49.

17. Ward EM, Kramer S, Meadows AT. The efficacyof random digit dialing in selecting matched con-trols for a case-control study of pediatric cancer.Am J Epidemiol 1984; 120:582-91.

18. Robison LL, Daigle A. Control selection usingrandom digit dialing for cases of childhood cancer.Am J Epidemiol 1984; 120:164-6.

19. Mohadjer L. Stratification of prefix areas for sam-pling rare populations. In: Groves RM, Biemer PB,Lyberg LE, et al., eds. Telephone survey method-ology. New York: John Wiley & Sons, Inc, 1988:161-73.

20. Miettinen OS. Theoretical epidemiology: principlesof occurrence research in medicine. New York:John Wiley & Sons, Inc, 1985.

21. Miettinen OS. The "case-control" study: valid se-lection of subjects. J Chronic Dis 1985;38:543-8.

22. Cohen BH. Family patterns of longevity and mor-tality. In: Neel JV, Shaw MW, Schull WJ, eds.Genetics and the epidemiology of chronic diseases.Washington, DC: US Department of Health, Edu-cation, and Welfare, 1963:237-63. (DHEW publi-cation no. 1163).

23. Massey FJ, Bernstein FS, O'Fallon WM, et al.Vasectomy and health. JAMA 1984;252:1023-9.

24. Vernick LJ, Vernick SL, Kuller KH. Selection of

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

1040 Wacholder et al.

neighborhood controls: logistics and fieldwork. JChronic Dis 1984;37:177-82.

25. Horwitz RI, Feinstein AR. Alternative analyticmethods for case-control studies of estrogens andendomctrial cancer. N Engl J Med 1978;299:1089-94.

26. Hutchison GB, Rothman KJ. Correcting a bias. NEngl J Med 1978;299:1129-30.

27. Hulka BS, Grimson RC, Greenberg BG, et al."Alternative" controls in a case-control study ofendometrial cancer and exogenous estrogen. Am JEpidemiol 1980; 112:376-87.

28. Rothman KJ. Modern epidemiology. Boston: Lit-tle, Brown & Company, 1986.

29. Nelson LM, Franklin GM, Hamman RF, et al.Referral bias in multiple sclerosis research. J ClinEpidemiol 1988;41:187-92.

30. Brown LM, Pottern LM, Hoover RN. Prenatal andperinatal risk factors for testicular cancer. CancerRes 1986;46:4812-16.

31. MacMahon B, Pugh TF. Epidemiology: principlesand methods. Boston: Little, Brown & Company,1970.

32. Flanders WD, Boyle CA, Boring JR. Bias associatedwith differential hospitalization rates in incidentcase-control studies. J Clin Epidemiol 1989;42:395-401.

33. Wacholder S, Silverman DT. Re: "Case-controlstudies using other diseases as controls: problemsof excluding exposure-related diseases." (Letter).Am J Epidemiol 1990;132:1017-18.

34. MacMahon B, Yen S, Trichopoulos D, et al. Coffeeand cancer of the pancreas. (Letter). N Engl J Med1981:304:1605-6.

35. Lubin JH, Hartge P. Excluding controls: misappli-cations in case-control studies. Am J Epidemiol1984; 120:791-3.

36. Jick H, Vessey MP. Case-control studies in theevaluation of drug-induced illness. Am J Epidemiol1978:107:1-7.

37. Axelson O. The case-referent study—some com-ments on its structure, merits, and limitations.Scand J Work Environ Health 1985;11:207-13.

38. Linet MS, Brookmeyer R. Use of cancer controlsin case-control cancer studies. Am J Epidemiol1987;125:1-11.

39. Smith AH, Pearce NE, Callas PW. Cancer case-control studies with other cancers as controls. Int JEpidemiol 1988; 17:298-306.

40. MacMahon B, Yen S, Trichopoulos D, et al. Coffeeand cancer of the pancreas. N Engl J Med 1981;304:630-3.

41. Silverman DT, Hoover RN, Swanson GM, et al.The prevalence of coffee drinking among hospital-ized and population-based control groups. JAMA1983:249:1877-80.

42. Robins J, Pike M. The validity of case-controlstudies with nonrandom selection of controls.Epidemiology 1990; 1:273-84.

43. Siemiatycki J, Colle S, Campbell S, et al. Case-control study of insulin-dependent (type I) diabetesmellitus. Diabetes Care 1989; 12:209-16.

44. Siemiatycki J. Friendly control bias. J Clin Epide-miol 1989;42:687-8.

45. Hochberg F, Toniolo P, Cole P. Head trauma andseizures as risk factors of glioblastoma. Neurology1984;34:1511-14.

46. Kelsey JL, Thompson WD, Evans AS. Methods inobservational epidemiology. New York: OxfordUniversity Press, 1986.

47. Thompson WD. Nonrandom yet unbiased. Epide-miology 1990; 1:262-5.

48. Shaw GL, Tucker MA, Kase RG, et al. Problemsascertaining friend controls in a case-control studyof lung cancer. Am J Epidemiol 1991 ;133:63—6.

49. Flanders WD, Austin H. Possibility of selectionbias in matched case-control studies using friendcontrols. Am J Epidemiol 1986; 124:150-3.

50. Goldstein AM, Hodge SE, Haile RW. Selectionbias in case-control studies using relatives as thecontrols. Int J Epidemiol 1989; 18:985-9.

51. Petrakis NL, King MC. Genetic markers and can-cer epidemiology. Cancer 1977,39:1861-6.

52. Gutensohn N, Li F, Johnson R, et al. Hodgkin'sdisease, tonsillectomy, and family size. N Engl JMed 1975;292:22-5.

53. Maclure M. The case-crossover design: a methodfor studying transient effects on the risk of acuteevents. Am J Epidemiol 1991;133:144-53.

54. Prentice RL, Vollmer WM, Kalbfleisch JD. On theuse of the case series to identify disease risk factors.Biometrics 1984;40:445-58.

55. Nelson LM, Longstreth WT Jr, KoepseU TD, et al.Proxy respondents in epidemiologic research. Epi-demiol Rev 199O;12:71-86.

56. Rogot E, Reid DD. The validity of data from nextof kin in studies of mortality among migrants. IntJ Epidemiol 1951;4:51-4.

57. Kolonel LN, Hirohata T, Nomura AMY. Ade-quacy of survey data collected from substitute re-spondents. Am J Epidemiol 1977; 106:476-84.

58. Marshall J, Priorc R, Haughey B, ct al. Spouse-subject interviews and the reliability of diet studies.Am J Epidemiol 1980; 112:675-83.

59. Lerchen ML, Samet JM. An assessment of thevalidity of questionnaire responses provided by asurviving spouse. Am J Epidemiol 1986; 123:481-9.

60. Blot WJ, McLaughlin JK. Practical issues in thedesign and conduct of case-control studies: use ofnext-of-kin interviews. In: Blot WJ, Hirayama T,Hoel DG, eds. Statistical methods in cancer epide-miology. Hiroshima: Radiation Effects ResearchFoundation, 1985:49-62.

61. McLaughlin JK, Dietz MS, Mehl ES, et al. Relia-bility of surrogate information on cigarette smokingby type of informant. Am J Epidemiol 1987; 126:144-6.

62. McLaughlin JK, Mandel JS, Mehl ES, et al. Reli-ability of next-of-kin and self-respondents for cig-arette, coffee, and alcohol consumption. Epidemi-ology 1990;l:408-12.

63. Samet JM. Surrogate sources of dietary informa-tion. In: Willett W, ed. Nutritional epidemiology.New York: Oxford University Press, 1990:133-42.

64. Gordis L. Should dead cases be matched to deadcontrols? Am J Epidemiol 1982;115:1-5.

65. McLaughlin JK, Blot WJ, Mehl ES, et al. Problemsin the use of dead controls in case-control studies.I. General results. Am J Epidemiol 1985,121:131-9.

66. McLaughlin JK, Blot WJ, Mehl ES, et al. Problemsin the use of dead controls in case-control studies.II. Effect of excluding certain causes of death. Am

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from

Types of Control Groups 1041

J Epidemiol 1985;122:485-94. 70.67. Walker AM, Velema JP, Robins JM. Analysis of

case-control data derived in part from proxy re-spondents. Am J Epidemiol 1988; 127:905-14. 71.

68. Armstrong BG, Whittemore AS, Howe GR. Analy-sis of case-control data with covariate measurementerror application to diet and colon cancer. Stat 72.Med 1989;8:1151-65.

69. Armstrong BG. The effects of measurement errorson relative risk regressions. Am J Epidemiol 1990; 73.132:1176-84.

Greenland S. The effect of misclassification in thepresence of covariates. Am J Epidemiol 1980; 112:564-9.Miettinen OS, Wang J-D. An alternative to theproportionate mortality ratio. Am J Epidemiol1981;114:144-8.Ibrahim MA, Spitzer WO. The case-control study:the problem and the prospect J Chronic Dis 1979;32:139-44.The case-control study. (Editorial). Br Med J 1979;2:884-5.

at McG

ill University Libraries on D

ecember 2, 2010

aje.oxfordjournals.orgD

ownloaded from