Precision and Validity: Information Bias
Dr. Jørn Olsen, Epi 200B
January 21 and 26, 2010


Bias and confounding (Last, Dictionary)

Bias: Deviation of results or inference from truth, or processes leading to such deviations. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.


Confounding: A situation in which the effects of two processes are not separated.

Confounder, confounding factor, confounding variable: a poor term, because confounding is study-specific. No variable is always a confounder.


Dictionary (IEA/Last):

Information bias (observational bias):

A flaw in measuring exposure or outcome data that results in different quality (accuracy) of information between comparison groups.


Information Bias and Other Method Problems

Information: exposures, end points, confounders, modifiers

For discrete variables: classification error/misclassification

Differential/non-differential information bias


Data accuracy

Data are almost never 100% accurate

Coding errors and measurement errors occur

We ask questions that cannot be answered correctly, e.g. "Were you exposed to environmental tobacco smoke (ETS) last year?"


Non-differential: the misclassification does not depend on the values of other variables

Example: the diagnosis has the same sensitivity and specificity among exposed and non-exposed. Or, exposure is reported with the same sensitivity and specificity among cases and controls.


Non-differential misclassification is better than differential misclassification

Non-differential misclassification can often be achieved in follow-up studies

Exposures are recorded prior to disease occurrence

Diseases may be recorded by doctors who do not ask about exposures


Recall bias: misclassification of the exposure

A serious problem in case-control studies or cross-sectional studies based on recall


Recall bias

Hungarian case-control surveillance of congenital abnormalities (Epidemiology 2001;12:461-66)

Drug use: self-reported data (interview, memory aids) compared with the log-book, i.e. medicine prescribed by ANC (antenatal care) doctors

                      Log-book Yes   Log-book No
Self-reported Yes     a              b
Self-reported No      c              d

Sensitivity = a/(a+c)

Specificity = d/(b+d)
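A minimal Python sketch of the two validity measures, reading the table with the log-book as the reference standard (columns) and self-report as the test (rows). The function name `validity` and the cell counts in the usage lines are hypothetical, for illustration only; they are not the Hungarian data.

```python
def validity(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Sensitivity and specificity of self-report against the log-book.

    a: self-report yes, log-book yes    b: self-report yes, log-book no
    c: self-report no,  log-book yes    d: self-report no,  log-book no
    """
    sensitivity = a / (a + c)  # share of log-book users who report the drug
    specificity = d / (b + d)  # share of log-book non-users who report no drug
    return sensitivity, specificity


# Hypothetical counts, for illustration only
sens, spec = validity(a=20, b=10, c=80, d=890)
print(f"sensitivity={sens:.2f} specificity={spec:.3f}")
```

Both measures are conditional on the reference column, which is why a and c (not a and b) enter the sensitivity.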


A low sensitivity is expected if mothers provide complete recall, since only ANC-prescribed drugs are in the log-book.


Short-term drugs

Case status Sensitivity Specificity

All cases 0.16 0.98

Severe 0.21 0.98

Visible 0.18 0.98

Controls 0.28 0.98


Long-term drugs

Case status Sensitivity Specificity

All 0.25 0.97

Severe 0.16 0.95

Visible 0.29 0.97

Controls 0.46 0.97
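To see why the gap between cases and controls matters, the sketch below applies the short-term-drug figures for all cases (sensitivity 0.16, specificity 0.98) and for controls (0.28, 0.98) to a hypothetical situation with no true association: an assumed true exposure prevalence of 10% in both groups. The prevalence and the helper names are assumptions made for illustration; only the sensitivities and specificities come from the table.

```python
def recorded_exposure(p: float, sens: float, spec: float) -> float:
    """Expected proportion recorded as exposed when the true prevalence is p."""
    return p * sens + (1 - p) * (1 - spec)


def odds(p: float) -> float:
    return p / (1 - p)


TRUE_PREVALENCE = 0.10  # hypothetical, identical in cases and controls (true OR = 1)

p_cases = recorded_exposure(TRUE_PREVALENCE, sens=0.16, spec=0.98)
p_controls = recorded_exposure(TRUE_PREVALENCE, sens=0.28, spec=0.98)
observed_or = odds(p_cases) / odds(p_controls)
print(f"observed OR = {observed_or:.2f}")  # below 1 although the true OR is 1
```

Under these assumptions the observed OR is about 0.73: differential sensitivity alone produces an apparent protective effect.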


What to do to reduce differential information bias?

Use blinding if possible: "blind till it hurts" (Cochrane).

Use of hospital controls may, in some cases, help to reduce information bias.

The disease used to identify the comparison group must NOT be associated with the exposure under study (the exposure must be neither a cause of nor a preventive factor for that disease).


For case-control studies:

The first study is important

No disclosure of the study hypothesis

Use biomarkers of exposure if possible

Use secondary data collected prior to the disease

Use neutral interviewers


Differential misclassification of the end point: sometimes a problem in follow-up studies


Is this follow-up study of oral contraceptives (OC) vulnerable to differential misclassification of deep vein thrombosis (DVT)?

Exposure   DVT cases   Observation time
OC+        a           t+
OC-        c           t-
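A hypothetical illustration of how the incidence rate ratio, (a/t+)/(c/t-), can be distorted: suppose the true DVT rate is the same in both groups (true IRR = 1), but clinicians who know a woman uses OCs look harder for DVT, so the diagnosis has a higher sensitivity among users. All counts, person-years and sensitivities below are invented for the sketch, and false-positive diagnoses are assumed away.

```python
def incidence_rate_ratio(a: float, t_exp: float, c: float, t_unexp: float) -> float:
    """IRR = (a / t+) / (c / t-), person-time in the denominators."""
    return (a / t_exp) / (c / t_unexp)


# Hypothetical true data: same DVT rate in OC users and non-users (true IRR = 1)
true_dvt_oc, years_oc = 50, 100_000
true_dvt_no_oc, years_no_oc = 50, 100_000

# Hypothetical detection: DVT is found more often when the clinician knows about OC use
sens_oc, sens_no_oc = 0.9, 0.6  # no false-positive diagnoses assumed

observed_irr = incidence_rate_ratio(
    true_dvt_oc * sens_oc, years_oc,
    true_dvt_no_oc * sens_no_oc, years_no_oc,
)
print(f"observed IRR = {observed_irr:.2f}")  # 1.50 although the true IRR is 1
```

Blinding the diagnosing clinicians to OC use is meant to keep the two sensitivities equal.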


Follow-up studies are usually less vulnerable to differential recall bias because the exposure is recorded before the end point, but knowing the hypothesis may introduce bias if the exposure is a suspected cause of the disease under study.

Blind the clinicians, if possible.


It is often stated that non-differential misclassification leads to bias towards no association (RR = IRR = OR = 1, RD = IRD = 0).

The first argument for this was provided by Bross in the 1950s.

Non-differential misclassification is not the same as random misclassification (random misclassification is only non-differential in the long run).

Random misclassification (e.g., under blinding) can, by chance, be strongly differential in a small study.
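A small simulation sketch of the last point, under assumed values: exposure is recorded with the same sensitivity (0.8) in cases and controls, and every record is an independent random draw. The realised sensitivities in the two groups then differ by chance, and much more so in small groups. The sensitivity, group sizes, number of repetitions and seed are arbitrary choices.

```python
import random
import statistics


def realised_sensitivity(n_truly_exposed: int, sens: float, rng: random.Random) -> float:
    """Share of truly exposed subjects recorded as exposed in one simulated study."""
    hits = sum(rng.random() < sens for _ in range(n_truly_exposed))
    return hits / n_truly_exposed


def mean_gap(n: int, sens: float = 0.8, reps: int = 500, seed: int = 1) -> float:
    """Average |case sensitivity - control sensitivity| over simulated studies."""
    rng = random.Random(seed)
    gaps = [
        abs(realised_sensitivity(n, sens, rng) - realised_sensitivity(n, sens, rng))
        for _ in range(reps)
    ]
    return statistics.mean(gaps)


for n in (10, 100, 1000):
    print(n, round(mean_gap(n), 3))  # gap shrinks as the groups grow
```

With 10 truly exposed subjects per group the two realised sensitivities typically differ by more than 0.1; with 1,000 the difference is close to 0.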


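Recorded vs. true smoking status, for lung cancer cases (l) and the reference group (r):

Lung cancer cases     True smoker   True non-smoker
Recorded smoker       TPl           FPl
Recorded non-smoker   FNl           TNl

Reference group       True smoker   True non-smoker
Recorded smoker       TPr           FPr
Recorded non-smoker   FNr           TNr

P = proportion of smokers; Pl and Pr refer to lung cancer cases (l) and the reference group (r)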


TP = P x sens

FN = P x (1-sens)

FP = (1-P) (1-spec)

TN = (1-P) spec


If we are interested in the difference between Pl and Pr, D = Pl – Pr

(normally we would be interested in, for example, the exposure odds)


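We are only able to estimate Pl and Pr (as P̂l and P̂r), and hence D = Pl – Pr (as D̂). In case of non-differential misclassification, FPl = FPr = FP and FNl = FNr = FN. Here TP, FP and FN are used as rates (TP = sens = 1 - FN, FP = 1 - spec):

$$\hat{P}_l = P_l\,TP_l + (1 - P_l)\,FP_l$$

$$\hat{P}_r = P_r\,TP_r + (1 - P_r)\,FP_r$$

$$\hat{D} = \hat{P}_l - \hat{P}_r$$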


Then $\hat{D} = D\,(1 - (FN + FP))$ (check it out)

Meaning $\hat{D} \neq D$ if FN + FP > 0 (sens + spec < 2)

FN + FP < 1.0: $|\hat{D}| < |D|$ (but same sign)

FN + FP = 1.0: $\hat{D} = 0$ (like flipping a coin)

FN + FP = 2: $\hat{D} = -D$ (coding!)

Also true for ORs
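A quick numerical check of $\hat{D} = D\,(1 - (FN + FP))$, in the spirit of "check it out". The true proportions of smokers (0.6 and 0.3) and the error rates are arbitrary assumptions; FN = 1 - sens and FP = 1 - spec are treated as rates shared by both groups (non-differential misclassification).

```python
def recorded_share(p: float, fn: float, fp: float) -> float:
    """P-hat = P x (1 - FN) + (1 - P) x FP, with FN and FP as error rates."""
    return p * (1 - fn) + (1 - p) * fp


def observed_difference(p_l: float, p_r: float, fn: float, fp: float) -> float:
    """D-hat = P-hat_l - P-hat_r under non-differential misclassification."""
    return recorded_share(p_l, fn, fp) - recorded_share(p_r, fn, fp)


p_l, p_r = 0.6, 0.3  # hypothetical true proportions of smokers
true_d = p_l - p_r
for fn, fp in [(0.0, 0.0), (0.2, 0.1), (0.6, 0.4), (1.0, 1.0)]:
    d_hat = observed_difference(p_l, p_r, fn, fp)
    formula = true_d * (1 - (fn + fp))
    print(f"FN+FP={fn + fp:.1f}  D-hat={d_hat:+.3f}  D(1-(FN+FP))={formula:+.3f}")
```

The rows show the four cases listed above: no bias, attenuation with the same sign, D̂ = 0, and a reversed sign.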


Non-differential misclassification of a dichotomous variable will, in most cases, bias values towards no association (but there are other sources of error in a study, and the combined effect may be away from the null)

Non-differential misclassification of a variable with more than two categories can cause bias away from the null, but mainly in rather unusual situations

Misclassification of a confounder can cause bias in any direction.


When estimating relative effect measures, a high specificity is wanted.

True cohort data:

Exp   N        D     No D     RR
+     20,000   400   19,600   2.0
-     10,000   100   9,900

If sensitivity is 0.8 but specificity is 1:

Exp   N        D     RR
+     20,000   320   2.0
-     10,000   80


If sensitivity is 1 but specificity is 0.80:

Exp   N        D                     RR
+     20,000   400 + 3920 = 4320     1.04
-     10,000   100 + 1980 = 2080


If sensitivity is 0.8 and specificity is 0.9:

Exp   N        D                                  RR
+     20,000   400 x 0.8 + 19600 x 0.10 = 2280    1.07
-     10,000   100 x 0.8 + 9900 x 0.10 = 1070
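The three scenarios can be reproduced with a short sketch that applies the same sensitivity and specificity to both exposure groups of the cohort example (20,000 exposed with 400 cases, 10,000 unexposed with 100 cases). The function name is an illustrative choice.

```python
def observed_rr(sens: float, spec: float) -> float:
    """RR after non-differential misclassification of disease in the cohort example."""
    n_exp, n_unexp = 20_000, 10_000  # cohort sizes from the example
    d_exp, d_unexp = 400, 100        # true cases (true RR = 2.0)

    def recorded_cases(n: int, d: int) -> float:
        # true cases that are detected + healthy people wrongly recorded as cases
        return d * sens + (n - d) * (1 - spec)

    risk_exp = recorded_cases(n_exp, d_exp) / n_exp
    risk_unexp = recorded_cases(n_unexp, d_unexp) / n_unexp
    return risk_exp / risk_unexp


for sens, spec in [(1.0, 1.0), (0.8, 1.0), (1.0, 0.8), (0.8, 0.9)]:
    print(f"sens={sens} spec={spec} RR={observed_rr(sens, spec):.2f}")
```

Lower sensitivity alone leaves the RR at 2.0, while even a modest loss of specificity pulls it close to 1, because the 29,500 non-cases vastly outnumber the 500 cases and supply most of the false positives.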


The corresponding case-cohort studies would produce the following (similar) results (if done right in this situation as a case-cohort study). True data:

Exp   Cases   Controls   OR
+     400     333.33     2.0
-     100     166.67
All   500     500


With sensitivity 0.8 and specificity 1, the corresponding case-cohort study gives:

Exp   Cases   Controls   OR
+     320     266.67     2.0
-     80      133.33
All   400     400


With sensitivity 1 and specificity 0.8:

Exp   Cases   Controls   OR
+     4320    4266.67    1.04
-     2080    2133.33
All   6400    6400


With sensitivity 0.8 and specificity 0.9:

Exp   Cases   Controls   OR
+     2280    2233       1.07
-     1070    1117
All   3350    3350
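A sketch of why the case-cohort OR reproduces the cohort RR in these tables, assuming (as the tables do) expected rather than randomly sampled control counts: the controls are drawn from the whole cohort, cases included, in number equal to the cases, so their exposed:unexposed split always mirrors 20,000:10,000. The function name and default arguments are illustrative.

```python
def case_cohort_or(cases_exp: float, cases_unexp: float,
                   n_exp: int = 20_000, n_unexp: int = 10_000) -> float:
    """OR with as many controls as cases, drawn from the full cohort (expected counts)."""
    n_controls = cases_exp + cases_unexp
    controls_exp = n_controls * n_exp / (n_exp + n_unexp)
    controls_unexp = n_controls * n_unexp / (n_exp + n_unexp)
    return (cases_exp / cases_unexp) / (controls_exp / controls_unexp)


print(round(case_cohort_or(400, 100), 2))    # true data
print(round(case_cohort_or(320, 80), 2))     # sens 0.8, spec 1
print(round(case_cohort_or(4320, 2080), 2))  # sens 1, spec 0.8
print(round(case_cohort_or(2280, 1070), 2))  # sens 0.8, spec 0.9
```

Because the control ratio is fixed at 2, the OR equals the ratio of recorded case counts divided by 2, which is exactly the recorded RR in the cohort tables.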


If we get a reference pathologist to eliminate all FP cases, we would get (for the last table):

Exp   Cases                 Controls   OR
+     2280 - 1960 = 320     266.67     2.0
-     1070 - 990 = 80       133.33
All   400                   400


Adjusting for misclassification is possible if sens and spec are known

Diagn   D+           D-               All
+       P x sens     (1-P)(1-spec)    P̂
-       P(1-sens)    (1-P)spec        1-P̂
All     P            1-P              1


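$$\hat{P} = P\,\text{sens} + (1 - P)(1 - \text{spec})$$

$$\hat{P} = P\,\text{sens} + 1 - \text{spec} - P + P\,\text{spec}$$

$$\hat{P} = P\,(\text{sens} + \text{spec} - 1) + 1 - \text{spec}$$

$$P = \frac{\hat{P} + \text{spec} - 1}{\text{sens} + \text{spec} - 1}$$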


Example

sens = 0.44, spec = 0.94, based on comparison with a "gold standard" (clinical diagnosis)

        Questionnaire: bronchitis
Sex     +      -       All
M       350    1427    1777
F       277    1787    2064

RP = (350/1777) / (277/2064) = 1.47


Corrected P (M) = (350/1777 + 0.94 – 1) / (0.44 + 0.94 – 1) = 0.360 (640 with the disease)

Corrected P (F) = (277/2064 + 0.94 – 1) / (0.44 + 0.94 – 1) = 0.195 (403 with the disease)

RP = (640/1777) / (403/2064) = 1.85

In case of differential misclassification, use sex-specific sens and spec.
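The bronchitis example can be recomputed with a short sketch of the back-calculation P = (P̂ + spec - 1)/(sens + spec - 1), using the questionnaire counts and sens = 0.44, spec = 0.94. The function name and the guard against sens + spec ≤ 1 are illustrative additions.

```python
def corrected_prevalence(p_hat: float, sens: float, spec: float) -> float:
    """Back-calculate the true P from the recorded P-hat."""
    if sens + spec <= 1:
        raise ValueError("correction requires sens + spec > 1")
    return (p_hat + spec - 1) / (sens + spec - 1)


SENS, SPEC = 0.44, 0.94  # questionnaire vs clinical diagnosis

n_m, n_f = 1777, 2064
p_m = corrected_prevalence(350 / n_m, SENS, SPEC)
p_f = corrected_prevalence(277 / n_f, SENS, SPEC)

print(f"observed RP  = {(350 / n_m) / (277 / n_f):.2f}")
print(f"corrected P: M = {p_m:.3f} ({p_m * n_m:.0f} cases), F = {p_f:.3f} ({p_f * n_f:.0f} cases)")
print(f"corrected RP = {p_m / p_f:.2f}")
```

For differential misclassification, call `corrected_prevalence` with sex-specific values of sens and spec. If a recorded prevalence is below 1 - spec, the formula returns a negative value, a sign that the assumed sens and spec do not fit the data.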


Misclassification of a confounder may bias a result in any direction (Greenland & Robins, Am J Epidemiol 1985;122:495-506). Let this be the true data:


E   C   Cases   Controls   OR
+   +   100     200        2.0
+   -   25      100
-   +   20      40         2.0
-   -   100     400

The confounder has an effect (OR = 2)

The exposure has no effect (OR = 1)


Now assume that exposure and disease status are recorded without error and only the confounder is non-differentially misclassified (sens = 0.8, spec = 0.9). We then get:

E   C   Cases   Controls   OR
+   +   82.5    170        1.48
+   -   42.5    130
-   +   26      72         1.41
-   -   94      368


When stratifying on the confounder, true data:

C   E   Cases   Controls   OR
+   +   100     200        1.0
+   -   20      40
-   +   25      100        1.0
-   -   100     400


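Misclassified data:

C   E   Cases   Controls   OR
+   +   82.5    170        1.34
+   -   26      72
-   +   42.5    130        1.28
-   -   94      368

Stratifying on the misclassified confounder leaves residual confounding: neither stratum-specific OR is 1.0.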
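A sketch that recomputes the misclassified tables from the true data: each true confounder count is split into recorded C+ and C- using sens = 0.8 and spec = 0.9, separately within each exposure level and for cases and controls, and the ORs are recalculated from these expected counts. The dictionary layout and helper names are illustrative choices.

```python
def misclassify(n_pos: float, n_neg: float, sens: float, spec: float) -> tuple[float, float]:
    """Expected recorded (positive, negative) counts for a binary variable."""
    recorded_pos = n_pos * sens + n_neg * (1 - spec)
    recorded_neg = n_pos * (1 - sens) + n_neg * spec
    return recorded_pos, recorded_neg


def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """OR = (a x d) / (b x c): a, b index-positive/negative cases; c, d the same for controls."""
    return (a * d) / (b * c)


# True data: counts by (exposure, confounder)
cases = {("E+", "C+"): 100, ("E+", "C-"): 25, ("E-", "C+"): 20, ("E-", "C-"): 100}
controls = {("E+", "C+"): 200, ("E+", "C-"): 100, ("E-", "C+"): 40, ("E-", "C-"): 400}

# Misclassify only the confounder, within each exposure level
obs_cases, obs_controls = {}, {}
for e in ("E+", "E-"):
    for true_counts, recorded in ((cases, obs_cases), (controls, obs_controls)):
        recorded[(e, "C+")], recorded[(e, "C-")] = misclassify(
            true_counts[(e, "C+")], true_counts[(e, "C-")], sens=0.8, spec=0.9
        )

# Confounder-disease OR within each exposure level
for e in ("E+", "E-"):
    print(e, round(odds_ratio(obs_cases[(e, "C+")], obs_cases[(e, "C-")],
                              obs_controls[(e, "C+")], obs_controls[(e, "C-")]), 2))

# Exposure-disease OR within each recorded confounder stratum (true value 1.0)
for c in ("C+", "C-"):
    print(c, round(odds_ratio(obs_cases[("E+", c)], obs_cases[("E-", c)],
                              obs_controls[("E+", c)], obs_controls[("E-", c)]), 2))
```

The expected stratum-specific exposure ORs come out at about 1.34 (C+) and 1.28 (C-), although the true value in both strata is 1.0.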


Misclassification is likely if we ask for sensitive data (alcohol intake), if we ask for data that cannot easily be recalled, such as diet, if the relevant time window is short (teratology), if we give little attention to the data collection, or perhaps if we give too much attention to it.


Regression towards the mean: selecting a group of people on extreme measured values oversamples large random errors, and this selection leads to misclassification.

$\widehat{IQ} = IQ + \varepsilon$ (measured IQ = true IQ + measurement error)

Σε = 0 for all in the study, but not for those selected from extreme parts of the distribution (Σε > 0 for those selected at the high end). Their measured IQs may be unusual because their true IQs are unusual, because their measurement errors were large, or both. In a new round of measuring IQ one would expect Σε to be zero (or at least closer to 0).
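A simulation sketch of the IQ argument, under assumed values: true IQ is Normal(100, 15), each measurement adds an independent Normal(0, 7.5) error, and people are selected on the top 2% of their first measurement. The distributions, selection cut-off and seed are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(2010)  # arbitrary seed

# Hypothetical population: true IQ ~ N(100, 15), measurement error ~ N(0, 7.5)
true_iq = [rng.gauss(100, 15) for _ in range(100_000)]
first = [iq + rng.gauss(0, 7.5) for iq in true_iq]
second = [iq + rng.gauss(0, 7.5) for iq in true_iq]

# Select people with an extreme first measurement (top ~2%)
cutoff = sorted(first)[int(0.98 * len(first))]
selected = [i for i, x in enumerate(first) if x >= cutoff]

mean_first = statistics.mean(first[i] for i in selected)
mean_second = statistics.mean(second[i] for i in selected)
mean_error_first = statistics.mean(first[i] - true_iq[i] for i in selected)

print(f"first measurement:  {mean_first:.1f}")
print(f"second measurement: {mean_second:.1f}")  # lower: closer to the population mean
print(f"mean error among selected, first round: {mean_error_first:.1f}")  # clearly > 0
```

The selected group's first-round errors sum to well above zero, and their second measurement falls back towards 100 although their true IQs have not changed.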


Regression towards the mean comes in many different forms. Assume you want to predict preterm birth (PTB) and collect data on a number of potential risk factors.

You select those with the highest RR and claim you can predict 60% of PTB using these markers. When you apply these 'predictors' to a new data source, you are in for a disappointment. Why?


Misclassification has an impact on estimates of effect sizes and power

A smaller study with better-quality data may be preferable to a large study with poor-quality data

Use blinding to avoid differential misclassification

Estimate misclassification, e.g. with repeated measures


Capture-recapture can be used to estimate the completeness of recording (the degree of underreporting).

If you have two different data sources (parental reporting of febrile seizures and hospitalizations for febrile seizures), you may be able to estimate the actual coverage of these data sources.


The arguments come from biologists and go like this: you want to know the number of salmon in a given lake. You can empty the lake and count all the salmon, or

1. You catch some salmon (M1) in the lake, give them a mark and throw them back into the lake

2. You make another catch of salmon (M2) and note how many had the mark (were caught in the first catch), M3

3. Now you know M1, M2 and M3, and you are ready to estimate the total number of salmon in the lake, N.


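P1 (probability of being in the first catch) = M1/N

P2 (probability of being in the second catch) = M2/N

If the two catches are independent, the expected number caught in both is

$$M_3 = N \times P_1 \times P_2 = N \times \frac{M_1}{N} \times \frac{M_2}{N} = \frac{M_1 \times M_2}{N}$$

and therefore

$$N = \frac{M_1 \times M_2}{M_3}$$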


Say, in our study, we had parental reports for 100 children with febrile seizures (FS) and 75 hospital reports. If 50 children were registered with FS in both places, our estimate of the total number of children with FS in the study would be

(100 x 75)/50 = 150
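A minimal sketch of the two-source estimate N = M1 x M2 / M3 (the Lincoln-Petersen estimator), applied to the febrile seizure numbers, together with the implied coverage of each source. The function name is illustrative; the estimate assumes that being recorded in one source is independent of being recorded in the other.

```python
def capture_recapture(m1: int, m2: int, m3: int) -> float:
    """Estimate the total N = M1 x M2 / M3 from two independent sources."""
    if m3 == 0:
        raise ValueError("no overlap between the sources: N cannot be estimated")
    return m1 * m2 / m3


# M1 = parental reports, M2 = hospital reports, M3 = children in both
n_hat = capture_recapture(m1=100, m2=75, m3=50)
print(n_hat)  # estimated total number of children with FS
print(f"parental coverage {100 / n_hat:.0%}, hospital coverage {75 / n_hat:.0%}")
```

If the sources are positively dependent (for example, parents of hospitalized children being more likely to report), the overlap M3 is inflated and N is underestimated.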


Other Problems


In cross-sectional studies, we do not know what came first

Example: CVD and anxiety, stress, high blood pressure

But temporal ambiguity may also exist in longitudinal studies


Many diseases have a long preclinical phase before they are diagnosed. If the disease affects E during the preclinical phase, reverse causation may be a problem. Example: exposure to selenium and breast cancer.


Repeated events, as in reproductive epidemiology, may produce other problems.


Example from reproductive epidemiology: Howard et al. Epidemiology 2007;18:544-51

Women often have more than one child, and reproductive failures often repeat themselves

Reproductive failures may affect exposure. Example: smoking women who have a child with a CA (congenital abnormality) may stop smoking when they plan a new pregnancy. How should the data be analyzed?

DAG 1: no adjustment for O0 is needed when analyzing E1→O1


DAG 2

Now there is a backdoor path E1←E0→O0→O1

Adjustment for E0 or O0 blocks it


DAG 3

Now there are 2 backdoor paths, E1←E0→O0→O1 and E1←O0→O1; adjusting for O0 blocks both paths


DAG 4A


DAG 4B


Add a covariate Ca that causes the exposures and a covariate Cb that causes the end point

Including O0 opens the path E1←Ca→E0→O0←Cb→O1 (O0 is a collider on this path); adding Ca, E0 and Cb solves the problem
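A minimal sketch of the path-blocking rules used on these slides: a non-collider on a path blocks it when adjusted for, while a collider blocks it unless the collider itself or one of its descendants is adjusted for. The edge lists are assumptions: for DAG 3, only the arrows named in its two backdoor paths plus E1→O1; for the Ca/Cb example, only the arrows in the path E1←Ca→E0→O0←Cb→O1. Paths are enumerated by brute force, which is adequate for DAGs this small.

```python
def neighbours(edges):
    """Undirected adjacency of the DAG's skeleton."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def descendants(edges, node):
    """All nodes reachable from node along directed edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def backdoor_paths(edges, exposure, outcome):
    """Simple paths from exposure to outcome whose first edge points into the exposure."""
    adj, edge_set, found, stack = neighbours(edges), set(edges), [], [[exposure]]
    while stack:
        path = stack.pop()
        for nxt in adj.get(path[-1], ()):
            if nxt in path:
                continue
            if nxt == outcome:
                found.append(path + [nxt])
            else:
                stack.append(path + [nxt])
    return [p for p in found if (p[1], p[0]) in edge_set]


def is_blocked(edges, path, adjusted):
    """Apply the blocking rules to every intermediate node on the path."""
    edge_set = set(edges)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if collider:
            if node not in adjusted and not (descendants(edges, node) & adjusted):
                return True
        elif node in adjusted:
            return True
    return False


# DAG 3 (assumed edges): E0->E1, E0->O0, O0->E1, O0->O1, E1->O1
dag3 = [("E0", "E1"), ("E0", "O0"), ("O0", "E1"), ("O0", "O1"), ("E1", "O1")]
paths = backdoor_paths(dag3, "E1", "O1")
for adjusted in (set(), {"E0"}, {"O0"}):
    print(sorted(adjusted), [is_blocked(dag3, p, adjusted) for p in paths])

# Only the edges of the path E1<-Ca->E0->O0<-Cb->O1
ca_cb = [("Ca", "E1"), ("Ca", "E0"), ("E0", "O0"), ("Cb", "O0"), ("Cb", "O1")]
(path,) = backdoor_paths(ca_cb, "E1", "O1")
for adjusted in (set(), {"O0"}, {"O0", "Ca"}):
    print(sorted(adjusted), "blocked" if is_blocked(ca_cb, path, adjusted) else "open")
```

In DAG 3, adjusting for O0 blocks both backdoor paths while E0 blocks only one. In the Ca/Cb path, the collider O0 keeps the path closed until O0 is adjusted for; adding Ca (or E0, or Cb) closes it again.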


DAG 5


Now there are 2 backdoor paths from E1 to O1, E1←O0←Cb→O1 and E1←Ca→E0→O1, and O0 is a collider

Ca and Cb would control these paths


Studies on diseases that are part of a screening program

No protective effect of fruit and vegetables on breast cancer. The study did not take screening into consideration.

What if women who like fruit and vegetables take part in screening more often, and screening is not considered in the analysis?

Bias in the early phase of screening?

Bias under steady state?

And what if this had been colon cancer?


The ecological fallacy at the individual level

Many exposures come in packages: diet, air pollution, welding fumes, coffee

Often, measurements are made at the aggregated level: carrots, coffee, etc. (which contain more than just β-carotene and caffeine)


Conclusion

Make data as accurate as possible; this is also true for confounders.

Avoid differential misclassification (blinding)

Estimate sensitivity and specificity of key variables if possible

Avoid low specificity when estimating ratio measures (RR, IRR, OR)

Do sensitivity analyses