Precision and Validity: Information Bias. Dr. Jørn Olsen, Epi 200B, January 21 and 26, 2010.
1
Precision and Validity: Information Bias
Dr. Jørn Olsen, Epi 200B
January 21 and 26, 2010
2
Bias and confounding (Last, Dictionary)
Bias: Deviation of results or inference from truth, or processes leading to such deviations. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.
3
Bias and confounding (Last, Dictionary)
Confounding: A situation in which the effects of two processes are not separated.
Confounder, confounding factor, confounding variable: poor terms, because confounding is study specific. No variable is always a confounder.
Dictionary; IEA/Last:
Information bias (observational bias):
A flaw in measuring exposure or outcome data that results in different quality (accuracy) of information between comparison groups
5
Information Bias and Other Method Problems
Information: exposures, end points, confounders, modifiers
For discrete variables: classification error/misclassification
Differential/non-differential information bias
6
Data accuracy
Data are almost never 100% accurate
Coding errors, measurement errors
We ask questions that cannot be answered correctly, e.g. "exposed to ETS (environmental tobacco smoke) last year"
7
Non-differential: does not depend upon the value of other variables
Example: diagnosing has the same sensitivity and specificity among exposed and non-exposed. Or, exposure is reported with the same sensitivity and specificity among cases and controls
Non-differential misclassification is better than differential
Non-differential misclassification can often be achieved in follow-up studies
Exposures are recorded prior to disease occurrence
Diseases may be recorded by doctors who do not ask about exposures
9
Recall bias: misclassification of the exposure
A serious problem in case-control studies or cross-sectional studies based upon recall
10
Recall bias
Hungarian case-control surveillance of congenital abnormalities (Epidemiology 2001; 12: 461-66.)
Drug use: self-reported data (interview, memory aids) compared with the log-book (medicine prescribed by ANC doctors)

                          Log-book
Self-reported drug use    Yes    No
Yes                       a      b
No                        c      d

Sensitivity = a/(a+c)
Specificity = d/(b+d)
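The two formulas above can be sketched in a few lines of Python. This is only an illustration, assuming the log-book is treated as the reference and that a, b, c, d are the cell counts of the 2x2 table; the function name and the counts are hypothetical and are not taken from the Hungarian study.

```python
def sensitivity_specificity(a, b, c, d):
    """Sensitivity and specificity of self-report against the log-book.

    Assumed cell layout (log-book taken as the reference):
      a = self-report yes, log-book yes   b = self-report yes, log-book no
      c = self-report no,  log-book yes   d = self-report no,  log-book no
    """
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity, specificity


# Hypothetical counts, chosen only to illustrate the calculation.
sens, spec = sensitivity_specificity(a=28, b=18, c=72, d=882)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```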
11
A low sensitivity is expected if mothers provide a complete recall since only ANC prescribed drugs are in the log book.
12
Short-term drugs
Case status Sensitivity Specificity
All cases 0.16 0.98
Severe 0.21 0.98
Visible 0.18 0.98
Controls 0.28 0.98
13
Long-term drugs
Case status Sensitivity Specificity
All 0.25 0.97
Severe 0.16 0.95
Visible 0.29 0.97
Controls 0.46 0.97
14
What to do to reduce differential information bias?
Use blinding if possible: "blind till it hurts" (Cochrane).
Use of hospital controls may, in some cases, help to reduce information bias.
The disease used to identify the comparison group must NOT be associated with the exposure under study (must not be a cause or a preventive factor).
15
For case-control studies
First study is important
No disclosure of study hypothesis
Use biomarkers of exposure if possible
Use secondary data collected prior to the disease
Use neutral interviewers
16
Differential misclassification of the endpoint: sometimes a problem in follow-up studies
17
Is this follow-up study vulnerable to differential misclassification of DVT?
Exposure   DVT   Obs time
OC +       a     t+
OC -       c     t-
18
Follow-up studies are usually less vulnerable to differential recall bias because the exposure is recorded before the end point, but knowing the hypothesis may introduce bias if the exposure is a suspected cause of the disease under study.
Blind the clinicians, if possible.
19
It is often stated that non-differential misclassification leads to bias towards no association (RR = IRR = OR = 1, RD = IRD = 0)
The first argument for that was provided by Bross in the 1950s.
Non-differential misclassification is not the same as random misclassification (random is only non-differential in the long run).
Random misclassification (blinding) can be very differential by chance in a small study.
20
                          True smoking
          Recorded smoking   +      -
Lung c    +                  TPl    FPl
          -                  FNl    TNl
Ref       +                  TPr    FPr
          -                  FNr    TNr

P = proportion of smokers: Pl and Pr
l = lung cancer, r = reference
21
TP = P x sens
FN = P x (1-sens)
FP = (1-P) (1-spec)
TN = (1-P) spec
If we take an interest in the difference between $P_l$ and $P_r$: $D = P_l - P_r$
(normally we would take an interest in the exposure odds, for example)
23
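We are only able to estimate $\hat{P}_l$ and $\hat{P}_r$, and then $\hat{D} = \hat{P}_l - \hat{P}_r$
Here TP, FP and FN are used as proportions: TP = sens, FN = 1 - sens, FP = 1 - spec
$\hat{P}_l = P_l \, TP_l + (1 - P_l) \, FP_l$
$\hat{P}_r = P_r \, TP_r + (1 - P_r) \, FP_r$
In case of non-differential misclassification: $FP_l = FP_r = FP$ and $FN_l = FN_r = FN$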
24
Then $\hat{D} = D \, (1 - (FN + FP))$ (check it out)
Meaning $\hat{D} \neq D$ if $FN + FP \neq 0$ (sens + spec < 2)
$FN + FP < 1.0$: $|\hat{D}| < |D|$ (but same sign)
$FP + FN = 1.0$: $\hat{D} = 0$ (like flipping a coin)
$FN + FP = 2$: $\hat{D} = -D$ (coding!)
Also true for ORs
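A quick way to "check it out" is to compute both sides of the relation numerically. The sketch below assumes FN = 1 - sens and FP = 1 - spec are the same in both groups (non-differential misclassification); the smoking proportions and the sens/spec values are hypothetical.

```python
def observed_proportion(p_true, sens, spec):
    """Recorded proportion of smokers: true positives plus false positives."""
    return p_true * sens + (1 - p_true) * (1 - spec)


# Hypothetical true smoking proportions among lung cancer cases (l) and the reference group (r).
p_l, p_r = 0.60, 0.30
sens, spec = 0.9, 0.8
fn, fp = 1 - sens, 1 - spec

d_true = p_l - p_r
d_observed = observed_proportion(p_l, sens, spec) - observed_proportion(p_r, sens, spec)
d_formula = d_true * (1 - (fn + fp))

print(f"true D = {d_true:.2f}, observed D = {d_observed:.2f}, D(1 - (FN + FP)) = {d_formula:.2f}")
```

With these inputs the observed difference is attenuated but keeps its sign, as the slide states for FN + FP < 1.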
25
Non-differential misclassification of a dichotomous variable will, in most cases, bias values towards no association (but there are other sources of error in a study and the combined effect may be away from the null)
Non-differential misclassification of a variable with more than two categories can cause bias away from the null, but mainly in rather unusual situations
Misclassification of a confounder can cause bias in any direction.
26
When estimating relative effect measures, a high specificity is wanted.

True cohort data
Exp   N        D     Non-D    RR
+     20,000   400   19,600   2.0
-     10,000   100    9,900

If sensitivity is 0.8 but specificity is 1
Exp   N        D     RR
+     20,000   320   2.0
-     10,000    80
27
If sensitivity is 1 but specificity is 0.80
Exp   N        D                     RR
+     20,000   400 + 3920 = 4320     1.04
-     10,000   100 + 1980 = 2080
28
If sensitivity is 0.8 and specificity is 0.9
Exp   N        D                                   RR
+     20,000   400 x 0.8 + 19600 x 0.10 = 2280     1.07
-     10,000   100 x 0.8 + 9900 x 0.10 = 1070
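The cohort examples on the last three slides can be reproduced with a short script. This is a minimal sketch that assumes the same sensitivity and specificity apply to exposed and unexposed (non-differential misclassification of the disease); the function name is only illustrative.

```python
def observed_rr(n_exp, d_exp, n_unexp, d_unexp, sens, spec):
    """Relative risk based on recorded cases when disease is misclassified non-differentially."""

    def recorded_cases(n, d):
        # true cases that are detected + non-diseased wrongly recorded as cases
        return d * sens + (n - d) * (1 - spec)

    risk_exp = recorded_cases(n_exp, d_exp) / n_exp
    risk_unexp = recorded_cases(n_unexp, d_unexp) / n_unexp
    return risk_exp / risk_unexp


# True cohort from the slides: 400/20,000 exposed and 100/10,000 unexposed develop disease.
scenarios = [(1.0, 1.0), (0.8, 1.0), (1.0, 0.8), (0.8, 0.9)]
results = {s: observed_rr(20_000, 400, 10_000, 100, *s) for s in scenarios}
for (sens, spec), rr in results.items():
    print(f"sens = {sens}, spec = {spec}: RR = {rr:.2f}")
```

Only the loss of specificity pulls the RR towards 1; a sensitivity below 1 with perfect specificity leaves the ratio unchanged.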
29
The corresponding case-cohort studies would produce the following (similar) results (if done right in this situation as a case-cohort study).
Exp   Cases   Controls   OR
+     400     333.33
-     100     166.66
All   500     500        2.0
30
The corresponding case-cohort studies would produce the following (similar) results
Exp   Cases   Controls   OR
+     320     266.66
-      80     133.33
All   400     400        2.0
31
Exp   Cases   Controls   OR
+     4320    4266.66
-     2080    2133.33
All   6400    6400       1.04
32
Exp   Cases   Controls   OR
+     2280    2233
-     1070    1117
All   3350    3350       1.07
33
If we get a reference pathologist to eliminate all FP cases, we would get (for the last table)
Exp   Cases                Controls   OR
+     2280 - 1960 = 320    266.66
-     1070 - 990 = 80      133.33
All   400                  400        2.0
34
Adjusting for misclassification is possible if sens and spec are known
Diagn   D+            D-                All
+       P x sens      (1-P)(1-spec)     P̂
-       P(1-sens)     (1-P) spec        1 - P̂
All     P             1-P

$\hat{P} = P \cdot sens + (1 - P)(1 - spec)$
$\hat{P} = P \cdot sens + 1 - spec - P + P \cdot spec$
$\hat{P} + spec - 1 = P \, (sens + spec - 1)$
$P = (\hat{P} + spec - 1) / (sens + spec - 1)$
36
Example
sens = 0.44, spec = 0.94; based upon comparison with a "gold standard" (clinical diagnosing)
Sex Questionnaire – bronchitis
+ - All
M 350 1427 1777
F 277 1787 2064
RP = (350/1777) / (277/2064) = 1.47
37
Exp P (M) = (350/1777 + 0.94 - 1) / (0.44 + 0.94 - 1) = 0.360 (640 with the disease)
Exp P (F) = (277/2064 + 0.94 - 1) / (0.44 + 0.94 - 1) = 0.195 (403 with the disease)
RP = (640/1777) / (403/2064) = 1.85
In case of differential misclassification, use sex-specific sens and spec
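The bronchitis example can be recomputed directly from the correction formula P = (P̂ + spec - 1) / (sens + spec - 1). The sketch below assumes the same sens and spec for men and women, as in the slide; the function name is illustrative.

```python
def corrected_prevalence(p_observed, sens, spec):
    """Back-calculate the true prevalence from the recorded prevalence."""
    return (p_observed + spec - 1) / (sens + spec - 1)


sens, spec = 0.44, 0.94

p_obs_m, p_obs_f = 350 / 1777, 277 / 2064
p_m = corrected_prevalence(p_obs_m, sens, spec)
p_f = corrected_prevalence(p_obs_f, sens, spec)

print(f"observed RP  = {p_obs_m / p_obs_f:.2f}")
print(f"corrected P  = {p_m:.3f} (M), {p_f:.3f} (F)")
print(f"corrected RP = {p_m / p_f:.2f}")
```

The correction only works when sens + spec differs from 1, and it can return values outside 0 to 1 if the assumed sens and spec do not fit the data.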
38
Misclassification of a confounder may bias a result in any direction (Greenland & Robins. Am J Epidemiol 1985;122:495-506).
Let this be the true data:
39
E   C   Cases   Controls   OR
+   +   100     200        2.0
    -    25     100
-   +    20      40        2.0
    -   100     400

The confounder has an effect (OR = 2)
The exposure has no effect (OR = 1)
40
Now assume exposure and disease status are recorded without error. Only the confounder is non-differentially misclassified (sens = 0.8 and spec = 0.9). We then get:
E   C   Cases   Controls   OR
+   +   82.5    170        1.48
    -   42.5    130
-   +   26       72        1.41
    -   94      368
41
When stratifying on the confounder
True data
C   E   Cases   Controls   OR
+   +   100     200        1.0
    -    20      40
-   +    25     100        1.0
    -   100     400
42
Misclassified data
C   E   Cases   Controls   OR
+   +   82.5    170        1.34
    -   26       72
-   +   42.5    130        1.28
    -   94      368
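The stratified tables can be recomputed from the true counts. This is a sketch under the slide's assumptions (exposure and disease recorded without error, confounder recorded with sens = 0.8 and spec = 0.9 in every group); the helper names and the dictionary layout are illustrative only. The ORs it prints are calculated from the cell counts.

```python
def misclassify(c_pos, c_neg, sens, spec):
    """Recorded (C+, C-) counts after non-differential misclassification of the confounder."""
    recorded_pos = c_pos * sens + c_neg * (1 - spec)
    recorded_neg = c_pos * (1 - sens) + c_neg * spec
    return recorded_pos, recorded_neg


def odds_ratio(exp_cases, exp_controls, unexp_cases, unexp_controls):
    return (exp_cases * unexp_controls) / (exp_controls * unexp_cases)


# True counts from the slides: (exposure, group) -> (C+, C-)
true_counts = {
    ("E+", "cases"): (100, 25),
    ("E+", "controls"): (200, 100),
    ("E-", "cases"): (20, 100),
    ("E-", "controls"): (40, 400),
}
recorded = {key: misclassify(*counts, sens=0.8, spec=0.9) for key, counts in true_counts.items()}


def exposure_or(counts, stratum):
    """Exposure-disease OR within one confounder stratum (0 = C+, 1 = C-)."""
    return odds_ratio(
        counts[("E+", "cases")][stratum],
        counts[("E+", "controls")][stratum],
        counts[("E-", "cases")][stratum],
        counts[("E-", "controls")][stratum],
    )


for stratum, label in enumerate(["C+", "C-"]):
    print(f"{label}: true OR = {exposure_or(true_counts, stratum):.2f}, "
          f"OR with misclassified confounder = {exposure_or(recorded, stratum):.2f}")
```

Although the exposure has no effect in either true stratum, stratifying on the misclassified confounder leaves residual confounding in both recorded strata.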
43
Misclassification is likely if we ask for sensitive data (alcohol intake), if we ask for data that cannot be easily recalled, such as diet, if the relevant time window is short (teratology), if we give little attention to the data collection, or perhaps if we give too much attention to the data collection.
44
Regression towards the mean. Misclassification for a group of people because we oversample large random errors. This selection leads to misclassification.
Measured IQ: $\widehat{IQ} = IQ + \varepsilon$
Σε = 0 for all in the study but not for those selected from extreme parts of the distribution (Σε > 0 for those selected from the upper tail). Their measured IQs may be unusual because their IQs are unusual or because their measurement errors were large, or both. In a new round of measuring IQ one would expect Σε to be zero (at least closer to 0).
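A small simulation can make the selection effect visible. This is a sketch only: the distributions (true IQ with mean 100 and SD 15, independent measurement error with SD 10), the sample size, the seed and the "top 10%" selection rule are all assumptions chosen for illustration.

```python
import random

rng = random.Random(2010)  # fixed seed so the run is reproducible
n = 20_000

true_iq = [rng.gauss(100, 15) for _ in range(n)]
error_1 = [rng.gauss(0, 10) for _ in range(n)]  # errors at the first measurement
error_2 = [rng.gauss(0, 10) for _ in range(n)]  # independent errors at the second

measured_1 = [t + e for t, e in zip(true_iq, error_1)]
measured_2 = [t + e for t, e in zip(true_iq, error_2)]

# Select the people whose FIRST measurement lies in the top 10%.
cutoff = sorted(measured_1)[int(0.9 * n)]
selected = [i for i in range(n) if measured_1[i] >= cutoff]


def mean(values):
    return sum(values) / len(values)


mean_error_all = mean(error_1)
mean_error_selected = mean([error_1[i] for i in selected])
mean_first = mean([measured_1[i] for i in selected])
mean_second = mean([measured_2[i] for i in selected])

print(f"mean error, everyone:          {mean_error_all:.2f}")
print(f"mean error, selected group:    {mean_error_selected:.2f}")
print(f"selected group, 1st measuring: {mean_first:.1f}")
print(f"selected group, 2nd measuring: {mean_second:.1f}")
```

The errors average about zero in the whole sample but are clearly positive among those selected, so the same group scores lower, on average, when measured again.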
45
Regression towards the mean comes in many different forms. Assume you want to predict PTB (preterm birth) and collect data on a number of potential risk factors.
You select those that have the highest RR and claim you can predict 60% of PTB using these markers. When you apply these 'predictors' in a new data source, you are in for a disappointment. Why?
46
Misclassification has an impact on estimates of effect sizes and power
A smaller study with better quality data may be preferable to a large study with poor quality data
Use blinding to avoid differential misclassification
Estimate misclassification/repeated measures
47
Capture – recapture to estimate completeness of recording (the degree of underreporting).
If you have two different data sources (parental reporting of febrile seizures and hospitalizations for febrile seizures), you may be able to estimate the actual coverage of each data source
48
The arguments come from biologists and go like this: You want to know the number of salmon in a given lake; you can empty the lake and count all the salmon. Or
1. You catch some salmon (M1) in the lake, give them a mark and throw them back into the lake
2. You make another catch of salmon (M2) and note how many had the mark, i.e. were also caught in the first catch (M3)
3. Now you know M1, M2 and M3 and you are ready to estimate the total number of salmon in the lake, N.
49
$P_1$ (first catch) $= M_1 / N$
$P_2$ (second catch) $= M_2 / N$
$M_3 = N \times P_1 \times P_2 = N \times (M_1 / N) \times (M_2 / N)$
$M_3 = (M_1 \times M_2) / N$
$N = (M_1 \times M_2) / M_3$
50
Say, in our study, we had parental reports for 100 children with FS and 75 hospital reports.
Our estimate of the total number of children with FS in the study would be (if 50 were registered with FS both places)
(100 x 75)/50 = 150
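The same estimate can be written as a small function. This is a minimal sketch of the two-source estimator derived above; it assumes, as the derivation does, that being captured by one source is independent of being captured by the other. The function and variable names are illustrative.

```python
def capture_recapture_estimate(m1, m2, m3):
    """Estimate the total N from two sources with m1 and m2 records, m3 of them in both."""
    if m3 == 0:
        raise ValueError("no overlap between the two sources; N cannot be estimated")
    return m1 * m2 / m3


# Febrile seizures: 100 parental reports, 75 hospital reports, 50 children in both.
n_hat = capture_recapture_estimate(m1=100, m2=75, m3=50)
coverage_parents = 100 / n_hat
coverage_hospital = 75 / n_hat

print(f"estimated N = {n_hat:.0f}")
print(f"coverage: parents {coverage_parents:.0%}, hospital {coverage_hospital:.0%}")
```

If the sources are positively dependent (for example, parents report hospitalized seizures more readily), the overlap is inflated and N is underestimated.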
51
Other Problems
52
In cross-sectional studies, we do not know what came first
CVD – anxiety, stress, high blood pressure
But temporal ambiguity may also exist in longitudinal studies
53
Many diseases have a long preclinical phase before they are diagnosed. If they have an impact on E during the preclinical phase, reverse causation may be a problem. Example: exposure to selenium and breast cancer.
54
Repeated events like in reproductive epidemiology may produce other problems.
55
Example from reproductive epidemiology: Howards et al. Epidemiology 2007;18:544-51
Women often have more than one child
Reproductive failures often repeat themselves
Reproductive failures may impact exposure
Example: smoking women who get a child with CA may stop smoking when they plan a new pregnancy.
How to analyze data?
DAG 1: No adjustment for O0 needed when analyzing E1→O1
56
DAG 2
Now a backdoor path E1←E0→O0→O1
adjustment for E0 or O0
57
DAG 3
Now 2 backdoor paths: E1←E0→O0→O1 and E1←O0→O1. Adjusting for O0 blocks both paths
58
DAG 4A
59
DAG 4B
60
Add covariate Ca that causes the exposures and Cb that causes the endpoints
Including O0 opens the path E1←Ca→E0→O0←Cb→O1, because O0 is a collider on this path; adding Ca, E0 and Cb solves the problem
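Whether a single path is open or blocked for a given adjustment set can be checked mechanically. The sketch below is a simplified rule for one path only, not a full d-separation algorithm: a path is blocked if it contains a non-collider that is adjusted for, or a collider that is not adjusted for. It deliberately ignores the case of adjusting for a descendant of a collider, and the path encoding (a list of nodes plus a list of arrows) is a hypothetical convention chosen for this example.

```python
def path_is_blocked(nodes, arrows, adjusted):
    """Check one path under an adjustment set.

    nodes:    variables along the path, e.g. ["E1", "Ca", "E0", ...]
    arrows:   arrows[i] is "->" or "<-" between nodes[i] and nodes[i + 1]
    adjusted: set of variables that are conditioned on
    Simplification: descendants of colliders are not considered.
    """
    for i in range(1, len(nodes) - 1):
        is_collider = arrows[i - 1] == "->" and arrows[i] == "<-"
        is_adjusted = nodes[i] in adjusted
        if is_collider and not is_adjusted:
            return True  # an unadjusted collider blocks the path
        if not is_collider and is_adjusted:
            return True  # an adjusted chain or fork node blocks the path
    return False


# The path E1 <- Ca -> E0 -> O0 <- Cb -> O1 from the slide.
nodes = ["E1", "Ca", "E0", "O0", "Cb", "O1"]
arrows = ["<-", "->", "->", "<-", "->"]

for adjusted in [set(), {"O0"}, {"O0", "Ca"}, {"O0", "E0"}, {"O0", "Cb"}]:
    status = "blocked" if path_is_blocked(nodes, arrows, adjusted) else "open"
    print(f"adjusting for {sorted(adjusted) or 'nothing'}: {status}")
```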
61
DAG 5
62
Now 2 backdoor paths from E1 to O1: E1←O0←Cb→O1 and E1←Ca→E0→O1, and O0 is a collider
Ca and Cb would control this path
63
Studies on diseases that are part of a screening program
No protective effect of fruit and vegetables on breast cancer. The study did not take screening into consideration.
if women who like fruits and vegetables more often take part in screening and screening is not considered in the analysis
bias in the early phase of screening?
bias under steady state?
and if this had been colon cancer?
64
The ecological fallacy at the individual level
Many exposures come in packages: diet, air pollution, welding fumes, coffee
Often, measurements are made at the aggregated level: carrots, coffee, etc. (more than just β-carotene and caffeine)
65
Conclusion
Make data as accurate as possible; this is also true for confounders.
Avoid differential misclassification (blinding)
Estimate sensitivity and specificity of key variables if possible
Avoid low specificity when measuring ratios (RR, IRR, OR)
Do sensitivity analyses