1 Precision and Validity: Selection Bias Dr. Jørn Olsen Epi 200B January 26 and 28, 2010.
2
Bias and confounding (Last, Dictionary)
Bias: Deviation of results or inference from truth, or processes leading to such deviations. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.
3
Bias and confounding (Last, Dictionary)
Confounding: A situation in which the effects of two processes are not separated.
Confounder, confounding factor, confounding variable: poor terms, because confounding is study specific. No variable is always a confounder.
4
Bias and confounding (Last, Dictionary)
Selection bias: caused by the way subjects are selected into the study or because there are selective losses of subjects prior to data analyses.
In a cohort study the first type of selection bias can often be described as selection leading to more or less confounding.
5
Selection Bias
Selection as a design problem: healthy worker selection, Berkson bias
Most problematic: non-responders in case-control studies, loss to follow-up
6
Survey

                   N      %      %      %      %
Non-responders    400     40     -      -      -
Smokers           200     20   33.3    60     20
Non-smokers       400     40   66.7    40     80
All              1000   100%   100%   100%   100%
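The percentages in the survey table can be reproduced with a short Python sketch. It assumes (an interpretation, not stated on the slide) that the third percentage column describes responders only and that the last two columns are the extremes in which all or none of the non-responders smoke. The function name is hypothetical.

```python
def smoking_prevalence_bounds(smokers, non_smokers, non_responders):
    """Return (observed, upper, lower) smoking prevalence in percent."""
    total = smokers + non_smokers + non_responders
    # Prevalence among responders only.
    observed = 100 * smokers / (smokers + non_smokers)
    # Assumed extreme: every non-responder is a smoker.
    upper = 100 * (smokers + non_responders) / total
    # Assumed extreme: no non-responder is a smoker.
    lower = 100 * smokers / total
    return observed, upper, lower


observed, upper, lower = smoking_prevalence_bounds(200, 400, 400)
print(f"responders: {observed:.1f}%, possible range: {lower:.0f}% to {upper:.0f}%")
```

With 40% non-response, the true prevalence could lie anywhere between the two bounds, which is why a survey is so sensitive to selection.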
7
Follow-up study: complete cohort

E      N      D
+    1000    200
-    1000    100

RR = 2.0, RD = 0.10
8
50% refuse to take part in the study

E      N      D
+     500    100
-     500     50

RR = 2.0, RD = 0.10
[Diagram: nodes E, D and S]
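A minimal Python sketch of why refusal that is unrelated to both E and D leaves the estimates unchanged. The function name is hypothetical; the counts are those of the complete cohort, the cohort with 50% refusal, and the cohort in which only the not exposed are halved.

```python
def risk_ratio_and_difference(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Return (RR, RD) for a cohort table."""
    risk_exposed = cases_exposed / n_exposed
    risk_unexposed = cases_unexposed / n_unexposed
    return risk_exposed / risk_unexposed, risk_exposed - risk_unexposed


complete = risk_ratio_and_difference(200, 1000, 100, 1000)
half_refuse = risk_ratio_and_difference(100, 500, 50, 500)
unexposed_halved = risk_ratio_and_difference(200, 1000, 50, 500)

for label, (rr, rd) in [("complete cohort", complete),
                        ("50% refuse", half_refuse),
                        ("not exposed halved", unexposed_halved)]:
    print(f"{label}: RR = {rr:.1f}, RD = {rd:.2f}")
```

As long as participation does not depend on disease status within an exposure group, the risks in each group, and hence RR and RD, are preserved.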
9
E      N      D
+    1000    200
-     500     50

PR = 2.0, RD = 0.10
[Diagrams: two structures with nodes E, D and S]
Is unlikely at baseline since they do not know D.
[Diagram: nodes E, D, S and C]
but is more likely. C could be SES.
10
Most likely
[Diagram: nodes S, E, D and C]
In cohort studies, selection may cause confounding, or perhaps more likely reduce confounding. Poor health and poor social conditions may correlate with selection.
Conditioning on S would open an E-C path and induce confounding that was not present before.
11
Large cohorts seldom recruit more than 50%
DNBC about 30%: half of the GPs participated, and 60% of those invited accepted the invitation
Selection bias? Yes, if used as a survey. But when making internal comparisons?
12
Table 2. RORs Based on Adjusted* ORs in the Source Population and Among Participants
Ref: Nohr et al. Epidemiology 2006;17:413-8
13
Internal comparison, counterfactual guidelines
RR = 2 for this cohort
External validity, generalization
For the source population? For all in the future? For other ethnic groups, etc.
14
Selection bias in a cohort study is mainly related to loss to follow-up.
Reason to expect selection bias? Will “intention to treat” solve the problem? Not when estimating effect size, but it may be OK when testing H0.
RCT of a pain killer, randomization:
Drug, N = 100: 40 lost to follow-up
Placebo, N = 100: 5 lost to follow-up
15
Follow-up study

E      D    not D      All
+    150     9850   10,000
-     50     9950   10,000

RR = 3.0
16
Now 20% loss to follow-up among exposed and 10% among not exposed

E      D    not D      All
+    120     7880    8,000
-     45     8955    9,000

RR = 3.0
17
Suppose we got:

E      D    not D      All
+    140     7860    8,000
-     40     8960    9,000

RR = 3.9
How could this happen?
When is it likely?
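One way to explore these questions is to apply a retention fraction to each cell of the complete cohort. The Python sketch below is illustrative only: the function name is hypothetical, and the second set of retention fractions is simply back-calculated from the counts shown (140/150, 7860/9850, 40/50, 8960/9950).

```python
def rr_after_loss(counts, retained):
    """counts and retained are dicts keyed by (exposed, diseased)."""
    kept = {cell: counts[cell] * retained[cell] for cell in counts}

    def risk(exposed):
        cases = kept[(exposed, True)]
        return cases / (cases + kept[(exposed, False)])

    return risk(True) / risk(False)


full_cohort = {(True, True): 150, (True, False): 9850,
               (False, True): 50, (False, False): 9950}

# Loss depends on exposure only: 20% among exposed, 10% among not exposed.
loss_by_exposure = {(True, True): 0.8, (True, False): 0.8,
                    (False, True): 0.9, (False, False): 0.9}

# Loss depends on exposure AND outcome (fractions back-calculated from the table).
loss_by_exposure_and_outcome = {(True, True): 140 / 150, (True, False): 7860 / 9850,
                                (False, True): 40 / 50, (False, False): 8960 / 9950}

rr_exposure_only = rr_after_loss(full_cohort, loss_by_exposure)
rr_both = rr_after_loss(full_cohort, loss_by_exposure_and_outcome)
print(f"loss by exposure only: RR = {rr_exposure_only:.1f}")
print(f"loss by exposure and outcome: RR = {rr_both:.1f}")
```

The RR is distorted only when the retention fraction differs between diseased and non-diseased within an exposure group.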
18
Source population

E      D    not D    Total
+      A      B       N1
-      C      D       N0

Study population

E      D    not D    All
+      a      b       n1
-      c      d       n0

Selection bias if (A/N1)/(C/N0) ≠ (a/n1)/(c/n0)
19
Does condom use protect against STDs?
What is the source population for such a study?
20
A case-control study samples cases from an STD clinic and controls from the catchment area of the clinic. Any problems with that?
Results could be like this:
Males with infected partners

Condom use    cases    controls
Yes            100       200
No             600       600

OR = 0.5

No requirement for infected partners

Condom use    cases    controls
Yes            100       100
No             600       600

OR = 1.0
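The two odds ratios follow from the cross-product of each table. A small Python sketch (the function name is hypothetical):

```python
def odds_ratio(exposed_cases, exposed_controls, unexposed_cases, unexposed_controls):
    """Cross-product odds ratio for a 2x2 case-control table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


# Exposure = condom use (Yes); counts read from the two tables.
or_infected_partners = odds_ratio(100, 200, 600, 600)
or_no_requirement = odds_ratio(100, 100, 600, 600)
print(or_infected_partners, or_no_requirement)
```

Only the exposure distribution among controls differs between the two tables, which is why the choice of control group decides whether a protective effect is seen.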
21
E      D    not D
+     20     10
-     80     90
All  100    100

OR = (20/80)/(10/90) = 2.25

E      D    not D
+     20      5
-     40     45
All   60     50

OR = (20/40)/(5/45) = 4.50

[Diagram: nodes E, D and S]
Selection bias is often a problem in a case-control study
22
Response rates

E      D    not D
+    100%    50%
-     50%    50%
23
Response rates
OR(responders) = OR(true) x OR(response rates)
4.50 = 2.25 x (100/50)/(50/50)
When would we expect this pattern?
When would we expect the opposite?
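The multiplicative relation can be checked numerically. In this Python sketch the cell order (exposed cases, exposed controls, unexposed cases, unexposed controls) and the helper name are choices made for illustration.

```python
def cross_product_ratio(a, b, c, d):
    """(a*d)/(b*c) for cells ordered: exposed cases, exposed controls,
    unexposed cases, unexposed controls."""
    return (a * d) / (b * c)


source_cells = (20, 10, 80, 90)          # everyone who was invited
response_rates = (1.0, 0.5, 0.5, 0.5)    # fraction of each cell that responds
responder_cells = tuple(n * r for n, r in zip(source_cells, response_rates))

or_true = cross_product_ratio(*source_cells)
or_response = cross_product_ratio(*response_rates)
or_responders = cross_product_ratio(*responder_cells)
print(or_true, or_response, or_responders)
```

If the response rates form a cross-product ratio of 1 (for example, all cases respond at one rate and all controls at another), the observed OR is unbiased even when overall response is low.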
24
Selections of relevance for designs
Berkson’s bias
Diseases may be correlated in hospital patients but not in the population
Population of 100,000
30% asthma: 30,000
10% bronchitis: 10,000
0.3 x 0.1 = 0.03: 3000 have both diseases
25
[Diagram: population of 100,000; asthma 30,000; bronchitis 10,000; both diseases 3000]
26
Selections of relevance for designs
In the hospital, let’s assume 40% of asthma patients get hospitalized, and 60% of patients with bronchitis
27,000 asthma only: 10,800 in hospital
7000 bronchitis only: 4200 in hospital
3000 with both diseases: 2280 in hospital (0.4 + 0.6 – 0.4 x 0.6 = 0.76)
Those with both diseases are thus overrepresented in hospital data. The 2 diseases will look as if they are associated, but they are not; those with both diseases just have a higher probability of being hospitalized.
A “Berkson-like” bias could be seen for other factors that influence hospitalization rates or diagnostic probabilities.
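A minimal Python sketch of the Berkson arithmetic, under the assumptions used in the example: the two diseases are independent in the population and each disease leads to hospitalization independently of the other. Variable names are hypothetical.

```python
population = 100_000
p_asthma, p_bronchitis = 0.30, 0.10
h_asthma, h_bronchitis = 0.40, 0.60   # hospitalization probabilities

# Independent diseases in the population.
both = population * p_asthma * p_bronchitis
asthma_only = population * p_asthma - both
bronchitis_only = population * p_bronchitis - both

# Either disease can lead to admission, so for both: 1 - (1 - 0.4)(1 - 0.6).
h_both = h_asthma + h_bronchitis - h_asthma * h_bronchitis

hosp_both = both * h_both
hosp_asthma_only = asthma_only * h_asthma
hosp_bronchitis_only = bronchitis_only * h_bronchitis

# Share of asthma patients who also have bronchitis.
share_population = both / (both + asthma_only)
share_hospital = hosp_both / (hosp_both + hosp_asthma_only)
print(f"population: {share_population:.1%}, hospital: {share_hospital:.1%}")
```

Bronchitis is more common among hospitalized asthma patients than among asthma patients in the population, although the diseases were generated independently.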
27
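Selections of relevance for designs
[Diagram: in the population, asthma 30,000, bronchitis 10,000, both 3000; in hospital, asthma 13,080 (10,800 + 2280), bronchitis 6480 (4200 + 2280), both 2280]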
28
Smoking    HBP       CVD +   CVD -   CVD risk
+ (100)    + (20)      6      14      (30%)
           - (80)      8      72      (10%)
- (100)    + (20)      2      18      (10%)
           - (80)      4      76      (5%)

[Diagram: Smoking ? HBP, CVD]
CVD risk is highest for those with high blood pressure and for smokers
Estimates of the association between smoking and HBP before or after exclusion of patients with CVD
OR: smoking exposure odds ratios for HBP
29
Smoking, HBP, CVD (counts as on slide 28)
No exclusion of CVD
OR = (20/20)/(80/80) = 1
30
Be careful when excluding diseases from the study if they are in the causal pathway, or if they are causally linked to the end point of your study.
31
Smoking, HBP, CVD (counts as on slide 28)
Use CVD as controls and exclude them from the case group
OR = (14/18)/(8/4) = 0.39
32
Smoking, HBP, CVD (counts as on slide 28)
Use CVD as controls and include them in the case group
OR = (20/20)/(8/4) = 0.50
33
Smoking, HBP, CVD (counts as on slide 28)
Exclude CVD patients from the control group but not from the case group
OR = (20/20)/(72/76) = 1.06
34
Smoking, HBP, CVD (counts as on slide 28)
Exclude them from both groups
OR = (14/18)/(72/76) = 0.82
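The five exclusion schemes can be compared side by side. This Python sketch takes the counts from the smoking/HBP/CVD tree; the dictionary layout, function names and scheme labels are choices made for illustration. Cases are people with HBP, and the exposure odds are the odds of smoking. Computed directly from the counts, the last scheme, (14/18)/(72/76), comes to about 0.82.

```python
# Counts keyed by (smoker, hbp, cvd), read from the tree on the slides.
COUNTS = {
    (True, True, True): 6, (True, True, False): 14,
    (True, False, True): 8, (True, False, False): 72,
    (False, True, True): 2, (False, True, False): 18,
    (False, False, True): 4, (False, False, False): 76,
}


def smoking_odds(select):
    """Odds of smoking among people for whom select(hbp, cvd) is true."""
    smokers = sum(n for (s, h, c), n in COUNTS.items() if s and select(h, c))
    non_smokers = sum(n for (s, h, c), n in COUNTS.items() if not s and select(h, c))
    return smokers / non_smokers


def exposure_odds_ratio(is_case, is_control):
    return smoking_odds(is_case) / smoking_odds(is_control)


schemes = {
    "no exclusion": (lambda h, c: h, lambda h, c: not h),
    "CVD as controls, excluded from cases": (lambda h, c: h and not c,
                                             lambda h, c: c and not h),
    "CVD as controls, included in cases": (lambda h, c: h,
                                           lambda h, c: c and not h),
    "CVD excluded from controls only": (lambda h, c: h,
                                        lambda h, c: not h and not c),
    "CVD excluded from both groups": (lambda h, c: h and not c,
                                      lambda h, c: not h and not c),
}

results = {name: exposure_odds_ratio(*pair) for name, pair in schemes.items()}
for name, value in results.items():
    print(f"{name}: OR = {value:.2f}")
```

Smoking and HBP are unrelated in these data (20 of 100 in both groups), so every value other than 1 is produced by the selection scheme alone.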
35
Using hospital controls to replace population controls is bias prone (this example is extreme, though). Controls should provide the exposure distribution in the population that gave rise to the cases.
Do not take into consideration diseases that follow this pattern: [Diagram: Smoking, HBP, CVD]
Only: [Diagram: smoking, HBP, CVD], and only if smoking is not causing CVD
Exclusion of persons with an exposure related condition from one group but not from the other introduces a threat to validity (although one of these estimates was close to 1).
Exclusion of such cases for both groups can cause bias (unless the selection criteria are confounders).
36
Healthy worker selection
Is a conceptual problem when designing the study, a violation of the counterfactual ideal
Indicates that SMR values for workers who perform physically demanding jobs tend to be less than 100. The reason is that the comparisons we make are biased. The population at large includes people with chronic diseases (and high mortality) who cannot perform a physically demanding job. “The sick population effect” or “the stupid investigator effect”
37
[Figure: mortality rate (MR) by age for the population and for the exposed; SMR = 80]
38
Selection operates into the workforce at recruitment and out of the workforce over time.
Unemployment is associated with suicide risk: causal or bias?
How can this be studied?
39
Selection Bias-Publication Bias
Decision making depends upon the combined evidence (e.g. Cochrane reviews), not just one study.
But is the source population for meta-analyses biased?
40
Selection Bias-Publication Bias
Researchers may decide not to submit based on results
Editors may decide to review or reject based on results
Reviewers may decide to recommend publication based on results
Editors may make final conclusions based on results
All of this leads to a biased source population for reviews and meta-analyses
41
Selection Bias-Publication Bias
Example: Panayiotis et al. JNCI 2005;97:1043-1055.
Association between TP53 (tumor suppressor protein) and risk of death in patients with head and neck cancers
42
Selection Bias-Publication Bias, Fig. 1
43
Selection Bias-Publication Bias, Fig. 2
44
Selection Bias-Publication Bias, Fig. 3
45
External validity?
In an etiologic study the aim is to formulate abstract hypotheses in relation to the factors under study.
The hypotheses are abstract in the sense that they are not tied to a specific population but aim to formulate a general scientific theory.
Internal validity
External validity
46
Estrogen exposure (more than 0.3 mg estrogen/d for at least 6 months) and cancer of the endometrium (N Engl J Med 1978; 299: 1089-94).
Cases: All post-menopausal gynaecological cancer patients at Yale-New Haven Medical Center 1974-1976.
Controls: Mainly patients with cancer of the cervix (60) or the ovary (43), matched for age and race.
47
E      Cases    Controls
+        35         4
-        84       115
All     119       119

OR = 12.0 (95% c.l. 4.1-35.0); χ² = 29.5
48
Incl. all postmenopausal women with bleedings.
Cases: Same cancer patients.
Controls: Women with bleedings, but no cancer of the endometrium, matched for age and race.
49
E      Cases    Controls
+        44        23
-       105       126
All     149       149

OR = 2.3 (95% c.l. 1.3-4.1); χ² = 8.46
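The odds ratios and approximate confidence limits for both control groups can be recomputed from the cell counts. The Python sketch below uses Woolf's logit method, which is an assumption: the slides do not say how the limits were obtained, and the original matched analysis may differ slightly. The function name is hypothetical.

```python
import math


def odds_ratio_with_limits(exposed_cases, exposed_controls,
                           unexposed_cases, unexposed_controls, z=1.96):
    """Return (OR, lower, upper) using Woolf's logit-based confidence limits."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


cancer_controls = odds_ratio_with_limits(35, 4, 84, 115)
bleeding_controls = odds_ratio_with_limits(44, 23, 105, 126)
for label, (est, lo, hi) in [("other gynaecological cancers as controls", cancer_controls),
                             ("women with bleedings as controls", bleeding_controls)]:
    print(f"{label}: OR = {est:.1f} ({lo:.1f} to {hi:.1f})")
```

The estimate drops from about 12 to about 2.3 when the control group changes, which illustrates how strongly the result depends on how controls are selected.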
50
Horwitz et al. continued the discussion and presented new data in Lancet 1981;2:66-8.
In the abstract they state (shortened and modified)
“In this study, to determine the frequency with which endometrial cancer escapes detection, all necropsies on 8998 eligible women showed previously unsuspected endometrial cancer in 24 of them. The estimated rate of undetected cancer 27/10,000 is two to five times higher than the detection rate of 5/10,000 noted by the Connecticut State Tumor Registry.”
Comments?
51
Two types of endometrial cancer: A (diagnosed), B (undetected)
A woman of 45 years of age would have a lifetime risk (until 80) of type A cancer
5/10,000 x 35 = 175/10,000
Better
1 - e^(-5/10,000 x 35) = 174/10,000
The proportion of type B cases would be 27/(27 + 174) = 13.4%
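A short Python sketch of the two risk calculations, assuming a constant yearly rate of 5/10,000 over 35 years (variable names are hypothetical). The exponential formula gives about 173.5 per 10,000, which the slide rounds to 174.

```python
import math

rate_per_year = 5 / 10_000      # detected (type A) endometrial cancer
years = 35                      # from age 45 to age 80
undetected_risk = 27 / 10_000   # type B, from the necropsy series

# Simple approximation: rate times time.
linear_risk = rate_per_year * years
# Cumulative risk under a constant rate: 1 - exp(-rate * time).
cumulative_risk = 1 - math.exp(-rate_per_year * years)

share_type_b = undetected_risk / (undetected_risk + cumulative_risk)
print(f"linear: {linear_risk * 10_000:.0f}/10,000")
print(f"exponential: {cumulative_risk * 10_000:.1f}/10,000")
print(f"type B share: {share_type_b:.1%}")
```

The comparison of 27/10,000 with a yearly rate of 5/10,000 is misleading; set against the lifetime risk, undetected cancers are a small minority.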
52
The most frequent and serious problem of selection bias in case-control studies is non-responders.
And an equal proportion of non-responding cases and controls is NOT a guarantee against selection bias.
The question is whether there is an equal selection of exposed cases and exposed controls.
53
The most serious selection problem in a follow-up study is loss to follow-up.
“If in doubt, stay out”
54
Sensitivity Analysis
55
Cohort – 10 years of follow-up

Smoking      N    Loss to follow-up    Lung cancer at end of follow-up
+         1000          200                       80
-         1000          100                       10

RR = (80/800)/(10/900) = 9.0
56
Sensitivity approach: Lung cancer risk among lost to follow-ups

Smokers           Non-smokers   Comments                                             RR
1/10              1/90          As for followed-up                                   9.0
1/10              2/90          Underestimate risk for non-smokers                   8.2
0                 1/90          Overestimate risk among smokers                      7.3
0                 2/90          Underestimate risk for non-smokers                   6.6
0 (worst case)    1.0           All non-smokers lost to follow-up get lung cancer    0.7
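The sensitivity table can be recomputed by adding the assumed cases among those lost to follow-up to the observed cases. In this Python sketch the function name is hypothetical and the counts are those of the 10-year cohort (80 cases among 800 smokers followed, 200 lost; 10 cases among 900 non-smokers followed, 100 lost). Exact arithmetic gives about 7.2 and 6.5 for the third and fourth scenarios, slightly below the tabulated 7.3 and 6.6, which presumably reflect rounding of the expected case counts.

```python
def rr_with_assumed_risks(risk_lost_smokers, risk_lost_non_smokers):
    """RR for the full cohort given assumed risks among those lost to follow-up."""
    smoker_cases = 80 + 200 * risk_lost_smokers
    non_smoker_cases = 10 + 100 * risk_lost_non_smokers
    return (smoker_cases / 1000) / (non_smoker_cases / 1000)


scenarios = [
    (1 / 10, 1 / 90),  # same risks as among those followed up
    (1 / 10, 2 / 90),
    (0, 1 / 90),
    (0, 2 / 90),
    (0, 1.0),          # worst case: every lost non-smoker gets lung cancer
]
rrs = [rr_with_assumed_risks(s, n) for s, n in scenarios]
for (s, n), rr in zip(scenarios, rrs):
    print(f"lost smokers {s:.3f}, lost non-smokers {n:.3f}: RR = {rr:.1f}")
```

Only the extreme worst case reverses the association; under all moderate assumptions the RR stays far above 1.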
57
Selection Bias Main Points
Selection of people into the study produces bias under the following conditions, among others.
A. Selection bias in the design
1. Cross-sectional study: The sampling strategy does not produce a representative sample of the target population
58
Selection Bias Main Points, cont.
2. Cohort study/case-control study: The not exposed are too far away from the counterfactual ideal. The not exposed do not provide the disease occurrence expected among the exposed had they not been exposed, and stratification or statistical control will not be sufficient to produce unbiased estimates of effects.
Examples: healthy worker selection and many other poorly designed studies.
59
Selection Bias Main Points, cont.
B. Selection bias in the conduct of the study: non-responders, loss to follow-up.
1. The cross-sectional study – the response rate may correlate with what you want to estimate, which would lead to a biased estimate of its prevalence.
Risk of selection bias is high.
60
Selection Bias Main Points, cont.
2. The cohort study – non-response at baseline will usually not correlate directly with both the exposure and the (unknown) endpoint, but selection at baseline will often change the confounder structure (will correlate with exposure). Loss to follow-up may correlate with both the exposure and the endpoint and lead to bias.
Give higher priority to compliance with follow-up than to recruitment at baseline. Loss to follow-up will often cause bias in the randomized trial (intention to treat analysis).
61
Selection Bias Main Points, cont.
3. The case-control study – Non-response may well correlate with both the exposure and the endpoint since both are known at recruitment to the study. Keeping response rates high should be given high priority and the specific aim of the study should not be disclosed (IRB may not accept this procedure).
62
Selection Bias Main Points, cont.
Selection bias is a serious problem and should be avoided if possible. Often it is not possible and its magnitude and possible impact should be investigated.
63
Steps to avoid bias related to non-responders
Keep non-response as low as possible, especially in surveys and case-control studies
Try to get some information on non-responders, at best for E and D, but also on confounders
Analyse data according to the time of responding
Do sensitivity analyses
Do follow-up studies (incl RCTs)
64
So, the first concern in an etiologic study is that of VALIDITY (FREEDOM FROM BIAS, at least known bias).
Internal validity: validity of the inferences drawn in relation to the members of the study population.
External validity: validity of the inferences as they extend outside the study population.