1 Precision and Validity: Selection Bias Dr. Jørn Olsen Epi 200B January 26 and 28, 2010.

1

Precision and Validity: Selection Bias

Dr. Jørn Olsen, Epi 200B

January 26 and 28, 2010

2

Bias and confounding (Last, Dictionary)

Bias: Deviation of results or inference from truth, or processes leading to such deviations. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth.

3

Bias and confounding (Last, Dictionary)

Confounding: A situation in which the effects of two processes are not separated.

Confounder, confounding factor, confounding variable: poor terms, since confounding is study specific. No variable is always a confounder.

4

Bias and confounding (Last, Dictionary)

Selection bias: Bias caused by the way subjects are selected into the study, or by selective losses of subjects prior to data analysis.

In a cohort study, the first type of selection bias can often be described as selection leading to more or less confounding.

5

Selection Bias

Selection as a design problem: healthy worker selection, Berkson bias

Most problematic: non-responders in case-control studies, loss to follow-up

6

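Survey (1000 invited)

Group            N     % of invited   % of responders   % if all non-responders smoke   % if no non-responder smokes
Non-responders   400   40             -                 -                               -
Smokers          200   20             33.3              60                              20
Non-smokers      400   40             66.7              40                              80
All             1000   100%           100%              100%                            100%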
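The last two columns bound the true smoking prevalence among all 1000 invited: one assumes every non-responder smokes, the other that none does. A minimal Python sketch of that bound calculation; the function name `prevalence_bounds` is hypothetical.

```python
def prevalence_bounds(n_invited, n_pos, n_neg):
    """Prevalence among responders, plus lower and upper bounds for all invited.

    n_pos / n_neg: responders with / without the characteristic (e.g. smoking).
    Lower bound assumes no non-responder has it; upper bound assumes all do.
    """
    n_responders = n_pos + n_neg
    n_nonresponders = n_invited - n_responders
    among_responders = n_pos / n_responders
    lower = n_pos / n_invited
    upper = (n_pos + n_nonresponders) / n_invited
    return among_responders, lower, upper


# Survey above: 1000 invited, 200 responding smokers, 400 responding non-smokers
resp, low, high = prevalence_bounds(1000, 200, 400)
print(f"smokers among responders: {resp:.1%}; all invited: {low:.0%} to {high:.0%}")
```

With 40% non-response, the responders' 33.3% is compatible with any prevalence between 20% and 60% among those invited; only information on the non-responders can narrow that range.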

7

Follow-up study: complete cohort

E   N      D
+   1000   200
-   1000   100

RR = 2.0, RD = 0.10

8

50% refuse to take part in the study

E   N     D
+   500   100
-   500   50

RR = 2.0, RD = 0.10

[Diagram: E, D, selection S]

9

E   N      D
+   1000   200
-   500    50

RR = 2.0, RD = 0.10

[Diagram: E, D, S]

[Diagram: E, D, S] This selection pattern is unlikely at baseline, since participants do not know D.

[Diagram: E, D, S, C] This pattern is more likely; C could be SES.
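In the three tables of this section, selection changes the number of people in each exposure row but not the risk of D within a row, so RR and RD stay the same. A short Python check; the helper `rr_rd` and the row labels are hypothetical names.

```python
def rr_rd(cases_exp, n_exp, cases_unexp, n_unexp):
    """Risk ratio and risk difference from cohort counts."""
    risk1 = cases_exp / n_exp
    risk0 = cases_unexp / n_unexp
    return risk1 / risk0, risk1 - risk0


# (exposed cases, exposed N, unexposed cases, unexposed N)
tables = {
    "complete cohort": (200, 1000, 100, 1000),
    "50% refuse in every row": (100, 500, 50, 500),
    "half of the unexposed refuse": (200, 1000, 50, 500),
}
for label, counts in tables.items():
    rr, rd = rr_rd(*counts)
    print(f"{label}: RR = {rr:.1f}, RD = {rd:.2f}")
```

Selection that depends on E alone (or on neither E nor D) is harmless here because, within each exposure row, cases and non-cases are retained in the same proportion.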

10

Most likely:

[Diagram: S, E, D, C]

In cohort studies, selection may cause confounding, but perhaps more likely reduces it. Poor health and poor social conditions may correlate with selection.

Conditioning on S would open an E-C path and induce confounding that was not present before.
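To see how conditioning on S can induce confounding, here is a minimal expected-count sketch in Python. The structure and every probability are assumptions chosen for illustration: E and C are independent in the source cohort, E has no effect on D, C raises the risk of D, and both E and C raise the probability of participating (S), so S is a collider between E and C.

```python
from itertools import product

# Hypothetical structure and numbers (assumptions for illustration only):
# E and C are independent in the source cohort, C raises the risk of D,
# E has no effect on D, and both E and C raise the probability of taking
# part (S), so S is a collider on the path E -> S <- C.
P_E, P_C = 0.5, 0.5
RISK_D_BY_C = {0: 0.05, 1: 0.20}
P_S = {(0, 0): 0.1, (1, 0): 0.6, (0, 1): 0.6, (1, 1): 0.7}  # keyed by (E, C)


def expected_rr(participants_only):
    """Expected E-D risk ratio in the whole cohort or among participants (S = 1)."""
    persons = {0: 0.0, 1: 0.0}
    cases = {0: 0.0, 1: 0.0}
    for e, c in product((0, 1), repeat=2):
        weight = (P_E if e else 1 - P_E) * (P_C if c else 1 - P_C)
        if participants_only:
            weight *= P_S[(e, c)]
        persons[e] += weight
        cases[e] += weight * RISK_D_BY_C[c]
    return (cases[1] / persons[1]) / (cases[0] / persons[0])


print(f"whole cohort: RR = {expected_rr(False):.2f}")  # no confounding: 1.00
print(f"participants: RR = {expected_rr(True):.2f}")   # C now differs by E
```

Among participants, the unexposed are more likely to have C (they needed C to be selected), so the null effect of E shows up as an RR of about 0.73.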

11

Large cohorts seldom recruit more than 50%.

Danish National Birth Cohort (DNBC): about 30%; half of the GPs participated, and 60% of those invited accepted the invitation.

Selection bias? Yes, if the cohort is used as a survey. But when making internal comparisons?

12

Table 2. RORs Based on Adjusted* ORs in the Source Population and Among Participants

Ref: Nohr et al. Epidemiology 2006;17:413-8

13

Internal comparison, counterfactual guidelines

RR = 2 for this cohort

External validity, generalization: For the source population? For all in the future? For other ethnic groups, etc.?

14

Selection bias in a cohort study is mainly related to loss to follow-up.

Reason to expect selection bias? Will “intention to treat” solve the problem? Not when estimating the effect size, but it may be acceptable when testing H0.

RCT of a painkiller, with randomization:
Drug, N = 100: 40 lost to follow-up
Placebo, N = 100: 5 lost to follow-up

15

Follow-up study

E   D     not D   All
+   150   9850    10,000
-   50    9950    10,000

RR = 3.0

16

Now 20% loss to follow-up among exposed and 10% among not exposed

E   D     not D   All
+   120   7880    8,000
-   45    8955    9,000

RR = 3.0

17

Suppose we got:

E   D     not D   All
+   140   7860    8,000
-   40    8960    9,000

RR = 3.9

How could this happen?

When is it likely?
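The three follow-up tables can be compared directly. In the second, the loss depends on E but is unrelated to D within each exposure group, so each risk and the RR are preserved. In the third, the loss also depends on D: only 10 of the 150 exposed cases are lost (20% would be 30), while 10 of the 50 unexposed cases are lost (10% would be 5). A short Python check; `risk_ratio` and the labels are hypothetical names.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Cumulative-incidence ratio from counts among those followed up."""
    return (cases_exposed / n_exposed) / (cases_unexposed / n_unexposed)


# (exposed cases, exposed followed up, unexposed cases, unexposed followed up)
tables = {
    "complete follow-up": (150, 10_000, 50, 10_000),
    "loss depends on E only": (120, 8_000, 45, 9_000),
    "loss depends on E and D": (140, 8_000, 40, 9_000),
}
for label, counts in tables.items():
    print(f"{label}: RR = {risk_ratio(*counts):.1f}")
```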

18

Source population

E   D   not D   Total
+   A   B       N1
-   C   D       N0

Study population

E   D   not D   All
+   a   b       n1
-   c   d       n0

Selection bias if $\frac{a/n_1}{c/n_0} \neq \frac{A/N_1}{C/N_0}$

19

Does condom use protect against STDs?

What is the source population for such a study?

20

A case-control study samples cases from an STD clinic and controls from the catchment area of the clinic. Any problems with that?

Results could be like this:

Males with infected partners
Condom use   Cases   Controls
Yes          100     200
No           600     600
OR = 0.5

No requirement for infected partners
Condom use   Cases   Controls
Yes          100     100
No           600     600
OR = 1.0
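The same cases compared with two different control groups give two different odds ratios. A short Python check of both panels; `odds_ratio` is a hypothetical helper, with condom use treated as the exposure.

```python
def odds_ratio(exp_cases, unexp_cases, exp_controls, unexp_controls):
    """Exposure odds ratio: odds of exposure in cases / odds in controls."""
    return (exp_cases / unexp_cases) / (exp_controls / unexp_controls)


# Condom use = exposure; counts from the two panels above
or_infected = odds_ratio(100, 600, 200, 600)  # controls: males with infected partners
or_any = odds_ratio(100, 600, 100, 600)       # controls: no requirement for infected partners
print(f"OR = {or_infected:.1f} vs OR = {or_any:.1f}")
```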

21

E     D     not D
+     20    10
-     80    90
All   100   100

$\mathrm{OR} = \frac{20/80}{10/90} = 2.25$

Responders:

E     D     not D
+     20    5
-     40    45
All   60    50

$\mathrm{OR} = \frac{20/40}{5/45} = 4.50$

[Diagram: E, D, S]

Selection bias is often a problem in a case-control study.

22

Response rates

E   D      not D
+   100%   50%
-   50%    50%

23

Response rates

$\mathrm{OR}_{\text{responders}} = \mathrm{OR}_{\text{true}} \times \mathrm{OR}_{\text{response rates}}$

$4.50 = 2.25 \times \frac{100/50}{50/50}$

When would we expect this pattern?

When would we expect the opposite?
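The factorization holds exactly: multiplying each cell of a 2x2 table by its own response rate multiplies the cross-product ratio by the cross-product ratio of the response rates. A minimal Python check with the numbers above; the helper and dictionary names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Cross-product OR: a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls."""
    return (a * d) / (b * c)


true_counts = {"a": 20, "b": 10, "c": 80, "d": 90}
response = {"a": 1.0, "b": 0.5, "c": 0.5, "d": 0.5}  # response rate in each cell
responders = {k: true_counts[k] * response[k] for k in true_counts}

or_true = odds_ratio(**true_counts)
or_resp = odds_ratio(**responders)
or_response_rates = odds_ratio(**response)
print(f"{or_resp:.2f} = {or_true:.2f} x {or_response_rates:.2f}")
```

Equal response among cases and controls overall is not enough: here 60% of cases and 50% of controls respond, yet the bias comes from exposed cases responding more than unexposed cases, while among controls exposure makes no difference.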

24

Selections of relevance for designs

Berkson’s bias

Diseases may be correlated in hospital patients but not in the population.

Population of 100,000: 30% have asthma (30,000) and 10% have bronchitis (10,000). If the diseases are independent, 0.3 × 0.1 = 0.03, so 3,000 have both diseases.

25

[Diagram: population of 100,000; 30,000 with asthma and 10,000 with bronchitis, overlapping in 3,000 with both]

26

Selections of relevance for designs

In the hospital, let’s assume that 40% of asthma patients and 60% of bronchitis patients get hospitalized.

27,000 with asthma only: 10,800 in hospital
7,000 with bronchitis only: 4,200 in hospital
3,000 with both diseases: 2,280 in hospital

$0.4 + 0.6 - 0.4 \times 0.6 = 0.76$

Thus patients with both diseases are overrepresented in hospital data, and the two diseases will look as if they are associated although they are not; those with both diseases simply have a higher probability of being hospitalized.

A “Berkson-like” bias could be seen for other factors that influence hospitalization rates or diagnostic probabilities.

27

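Selections of relevance for designs

[Diagram: in the population, 30,000 with asthma, 10,000 with bronchitis, 3,000 with both; in hospital, 13,080 with asthma, 6,480 with bronchitis, 2,280 with both]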
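The hospital counts follow from the population counts and the admission probabilities (for asthma in hospital, 10,800 + 2,280 = 13,080). A minimal Python sketch, assuming as the slides do that the two diseases are independent in the population and lead to hospitalization independently; variable names are hypothetical.

```python
# Numbers from the slides; like the slides, assume the two diseases occur
# independently in the population and lead to hospitalization independently.
N = 100_000
P_ASTHMA, P_BRONCHITIS = 0.30, 0.10
P_HOSP_ASTHMA, P_HOSP_BRONCHITIS = 0.40, 0.60

both = round(N * P_ASTHMA * P_BRONCHITIS)         # 3,000
asthma_only = round(N * P_ASTHMA) - both          # 27,000
bronchitis_only = round(N * P_BRONCHITIS) - both  # 7,000

# Hospitalized if either disease leads to admission: 1 - (1 - 0.4)(1 - 0.6) = 0.76
p_hosp_both = 1 - (1 - P_HOSP_ASTHMA) * (1 - P_HOSP_BRONCHITIS)

hosp_both = round(both * p_hosp_both)                              # 2,280
hosp_asthma_only = round(asthma_only * P_HOSP_ASTHMA)              # 10,800
hosp_bronchitis_only = round(bronchitis_only * P_HOSP_BRONCHITIS)  # 4,200

# Proportion of asthma patients who also have bronchitis
share_population = both / (asthma_only + both)
share_hospital = hosp_both / (hosp_asthma_only + hosp_both)
print(f"bronchitis among asthma patients: population {share_population:.1%}, "
      f"hospital {share_hospital:.1%}")
```

Among asthma patients, the share with bronchitis rises from 10% in the population to about 17% in hospital, although the diseases are independent by construction.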

28

Smoking, HBP, CVD

Smoking   N     HBP   N    CVD +   CVD -   CVD risk
+         100   +     20   6       14      30%
                -     80   8       72      10%
-         100   +     20   2       18      10%
                -     80   4       76      5%

[Diagram: Smoking ? HBP, CVD]

CVD risk is highest for those with high blood pressure and for smokers.

Estimates of the association between smoking and HBP before or after exclusion of patients with CVD.

OR: smoking exposure odds ratios for HBP

29

Smoking, HBP, CVD (same data as on slide 28)

No exclusion of CVD

$\mathrm{OR} = \frac{20/20}{80/80} = 1$

30

Be careful when excluding diseases from the study if they are in the causal pathway, or if they are causally linked to the end point of your study.

31

Smoking, HBP, CVD (same data as on slide 28)

Use CVD as controls and exclude them from the case group

$\mathrm{OR} = \frac{14/18}{8/4} = 0.39$

32

Smoking, HBP, CVD (same data as on slide 28)

Use CVD as controls and include them in the case group

$\mathrm{OR} = \frac{20/20}{8/4} = 0.50$

33

Smoking, HBP, CVD (same data as on slide 28)

Exclude CVD patients from the control group but not from the case group

$\mathrm{OR} = \frac{20/20}{72/76} = 1.06$

34

Smoking, HBP, CVD (same data as on slide 28)

Exclude them from both groups

$\mathrm{OR} = \frac{14/18}{72/76} = 0.82$
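All five odds ratios in this sequence come from the same tree; only the rule for who may be a case (HBP) or a control (no HBP) changes. A Python sketch that recomputes them; the names `TREE`, `count` and `smoking_or` are hypothetical. Excluding CVD from both groups works out to (14/18)/(72/76) ≈ 0.82.

```python
# (CVD yes, CVD no) in each smoking/HBP branch, from the tree above
TREE = {
    ("smoker", "hbp"): (6, 14), ("smoker", "no_hbp"): (8, 72),
    ("nonsmoker", "hbp"): (2, 18), ("nonsmoker", "no_hbp"): (4, 76),
}


def count(smoking, hbp, subset):
    """People in one branch; subset is 'cvd', 'no_cvd' or 'all'."""
    with_cvd, without_cvd = TREE[(smoking, hbp)]
    return {"cvd": with_cvd, "no_cvd": without_cvd, "all": with_cvd + without_cvd}[subset]


def smoking_or(case_subset, control_subset):
    """Smoking OR with HBP patients as cases and people without HBP as
    controls, keeping only the given CVD subset in each group."""
    case_odds = count("smoker", "hbp", case_subset) / count("nonsmoker", "hbp", case_subset)
    control_odds = (count("smoker", "no_hbp", control_subset)
                    / count("nonsmoker", "no_hbp", control_subset))
    return case_odds / control_odds


designs = {
    "no exclusion of CVD": ("all", "all"),
    "CVD controls, CVD excluded from cases": ("no_cvd", "cvd"),
    "CVD controls, CVD kept among cases": ("all", "cvd"),
    "CVD excluded from controls only": ("all", "no_cvd"),
    "CVD excluded from both groups": ("no_cvd", "no_cvd"),
}
for label, (cases, controls) in designs.items():
    print(f"{label}: OR = {smoking_or(cases, controls):.2f}")
```

Smoking has no effect on HBP in these data (20% in both groups), so every OR that differs from 1 is produced by the selection rule alone.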

35

Using hospital controls to replace population controls is bias-prone (although this example is extreme). Controls should provide the exposure distribution in the population that gave rise to the cases.

Do not take into consideration diseases that follow this pattern: [Diagram: Smoking, HBP, CVD]. Only: smoking, HBP, and only if smoking is not causing CVD.

Exclusion of persons with an exposure-related condition from one group but not from the other introduces a threat to validity (although one of these estimates was close to 1).

Exclusion of such persons from both groups can cause bias (unless the selection criteria are confounders).

36

Healthy worker selection

This is a conceptual problem when designing the study: a violation of the counterfactual ideal.

The term refers to the observation that SMR values for workers who perform physically demanding jobs tend to be less than 100. The reason is that the comparison we make is biased: the population at large includes people with chronic diseases (and high mortality) who cannot perform a physically demanding job. “The sick population effect” or “the stupid investigator effect”.

37

[Figure: mortality rate (MR) by age for the general population and the exposed; SMR = 80]
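The SMR compares the deaths observed among workers with those expected if the workers had the age-specific mortality of the general population (indirect standardization). A minimal Python sketch; the person-years, reference rates and observed deaths are hypothetical, chosen only so that the result matches the SMR of 80 in the figure.

```python
# Hypothetical worker cohort (assumption, not data from the lecture):
# person-years in three age groups, reference (general population) mortality
# rates per person-year, and the observed number of deaths among the workers.
person_years = [20_000, 10_000, 5_000]
reference_rates = [0.001, 0.004, 0.008]
observed_deaths = 80

# Deaths expected if the workers had the population's age-specific rates
expected_deaths = sum(py * rate for py, rate in zip(person_years, reference_rates))
smr = 100 * observed_deaths / expected_deaths
print(f"expected = {expected_deaths:.0f}, SMR = {smr:.0f}")
```

An SMR below 100 need not mean that the work is protective: the reference rates come from a population that includes people too sick to be employed.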

38

Selection operates into the workforce at recruitment and out of the workforce over time. Unemployment is associated with suicide risk: causal or bias?

How can this be studied?

39

Selection Bias: Publication Bias

Decision making depends upon the combined evidence (e.g., Cochrane reviews), not just one study.

But is the source population for meta-analyses biased?

40

Selection Bias: Publication Bias

Researchers may decide not to submit based on results.
Editors may decide to review or reject based on results.
Reviewers may decide to recommend publication based on results.
Editors may make final decisions based on results.

All of this leads to a biased source population for reviews and meta-analyses.

41

Selection Bias: Publication Bias

Example: Panayiotis et al. J Natl Cancer Inst 2005;97:1043-1055.

Association between TP53 (tumor suppressor protein) and risk of death in patients with head and neck cancers

42

Selection Bias: Publication Bias

Fig. 1

43


44


45

External validity?

In an etiologic study the aim is to formulate abstract hypotheses in relation to the factors under study.

The hypotheses are abstract in the sense that they are not tied to a specific population but aim to formulate a general scientific theory.

Internal validity

External validity

46

Estrogen exposure (more than 0.3 mg estrogen/d for at least 6 months) and cancer of the endometrium (N Engl J Med 1978;299:1089-94).

Cases: All post-menopausal gynaecological cancer patients at Yale-New Haven Medical Center 1974-1976.

Controls: Mainly patients with cancer of the cervix (60) or the ovary (43), matched for age and race.

47

E     Cases   Controls
+     35      4
-     84      115
All   119     119

OR = 12.0 (95% c.l. 4.1-35.0); χ² = 29.52

48

Inclusion: all postmenopausal women with bleeding.

Cases: Same cancer patients. Controls: Women with bleeding, but no cancer of the endometrium, matched for age and race.

49

E     Cases   Controls
+     44      23
-     105     126
All   149     149

OR = 2.3 (95% c.l. 1.3-4.1); χ² = 8.462
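Both odds ratios can be recomputed from the two tables. How the published intervals were obtained is not stated; as an assumption, this Python sketch uses Woolf's (logit) interval and an unmatched analysis. It reproduces 4.1-35.0 for the first table and gives about 1.3-4.0 for the second, close to the reported 1.3-4.1. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """OR with Woolf (logit) 95% CI, ignoring the matching.
    a = exposed cases, b = exposed controls, c = unexposed cases, d = unexposed controls."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se_log)
    upper = math.exp(math.log(or_) + z * se_log)
    return or_, lower, upper


# Controls with other gynaecological cancers (first analysis)
or1, lo1, hi1 = odds_ratio_ci(35, 4, 84, 115)
print(f"other cancers as controls: OR = {or1:.1f} ({lo1:.1f}-{hi1:.1f})")

# Controls with bleeding but no endometrial cancer (second analysis)
or2, lo2, hi2 = odds_ratio_ci(44, 23, 105, 126)
print(f"bleeding controls: OR = {or2:.1f} ({lo2:.1f}-{hi2:.1f})")
```

The same cases give an OR of 12 or 2.3 depending on the control group, which is the point of the comparison.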

50

Horwitz et al. continued the discussion and presented new data in Lancet 1981;2:66-8.

In the abstract they state (shortened and modified)

“In this study, to determine the frequency with which endometrial cancer escapes detection, all necropsies on 8998 eligible women showed previously unsuspected endometrial cancer in 24 of them. The estimated rate of undetected cancer 27/10,000 is two to five times higher than the detection rate of 5/10,000 noted by the Connecticut State Tumor Registry.”

Comments?

51

Two types of endometrial cancer: A, diagnosed; B, undetected

A woman 45 years of age would have a lifetime risk (until age 80) of type A cancer of

$\frac{5}{10{,}000} \times 35 = \frac{175}{10{,}000}$

Better:

$1 - e^{-(5/10{,}000) \times 35} \approx \frac{173}{10{,}000}$

The proportion of type B cases would be $\frac{27}{27 + 173} \approx 13.5\%$
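The two risk formulas can be checked numerically. A short Python sketch, assuming a constant annual detection rate of 5/10,000 over the 35 years from age 45 to 80 and treating the 27/10,000 necropsy figure as the undetected (type B) risk, as the argument does.

```python
import math

rate = 5 / 10_000          # annual rate of diagnosed (type A) cancer
years = 35                 # from age 45 to age 80
undetected = 27 / 10_000   # undetected (type B) cancer found at necropsy

approx_risk = rate * years              # cumulative rate, slightly too high
risk = 1 - math.exp(-rate * years)      # cumulative incidence with a constant rate
type_b_share = undetected / (undetected + risk)

print(f"cumulative rate: {approx_risk * 10_000:.0f}/10,000")
print(f"1 - exp(-rate * t): {risk * 10_000:.1f}/10,000")
print(f"type B share: {type_b_share:.1%}")
```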

52

The most frequent and serious problem of selection bias in case-control studies is non-response.

And an equal proportion of non-responding cases and controls is NOT a guarantee against selection bias.

The question is whether exposed and unexposed are selected in the same proportion among cases as among controls.

53

The most serious selection problem in a follow-up study is loss to follow-up.

“If in doubt, stay out”

54

Sensitivity Analysis

55

Cohort: 10 years of follow-up

Smoking   N      Loss to follow-up   End of follow-up   Lung cancer
+         1000   200                 800                80
-         1000   100                 900                10

$\mathrm{RR} = \frac{80/800}{10/900} = 9.0$

56

Sensitivity approach: lung cancer risk among those lost to follow-up

Smokers          Non-smokers   Comments                                            RR
1/10             1/90          As for followed-up                                  9.0
1/10             2/90          Underestimate risk for non-smokers                  8.2
0                1/90          Overestimate risk among smokers                     7.2
0                2/90          Underestimate risk for non-smokers                  6.5
0 (worst case)   1.0           All non-smokers lost to follow-up get lung cancer   0.7
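Each row imputes a lung cancer risk for those lost to follow-up (200 smokers, 100 non-smokers) and recomputes the 10-year risk over all 1000 people in each group. A Python sketch of that calculation, assuming this imputation scheme; `sensitivity_rr` is a hypothetical name. Worked exactly, the third and fourth rows give 80/11.1 ≈ 7.2 and 80/12.2 ≈ 6.5.

```python
def sensitivity_rr(risk_lost_smokers, risk_lost_nonsmokers):
    """10-year RR after imputing a lung cancer risk for those lost to follow-up.

    Smokers: 800 followed up (80 cases) + 200 lost; non-smokers: 900 followed
    up (10 cases) + 100 lost. Risks are computed over all 1000 in each group.
    """
    cases_smokers = 80 + 200 * risk_lost_smokers
    cases_nonsmokers = 10 + 100 * risk_lost_nonsmokers
    return (cases_smokers / 1000) / (cases_nonsmokers / 1000)


# (smokers label, risk, non-smokers label, risk), one tuple per table row
rows = [
    ("1/10", 1 / 10, "1/90", 1 / 90),
    ("1/10", 1 / 10, "2/90", 2 / 90),
    ("0", 0.0, "1/90", 1 / 90),
    ("0", 0.0, "2/90", 2 / 90),
    ("0", 0.0, "1.0", 1.0),
]
for s_label, r_s, ns_label, r_ns in rows:
    print(f"smokers {s_label}, non-smokers {ns_label}: RR = {sensitivity_rr(r_s, r_ns):.1f}")
```

Only the extreme last row reverses the association; plausible assumptions keep the RR well above 6.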

57

Selection Bias Main Points

Selection of people into the study produces bias under the following conditions, among others.

A. Selection bias in the design
1. Cross-sectional study: the sampling strategy does not produce a representative sample of the target population.

58

Selection Bias Main Points, cont.

2. Cohort study/case-control study: the unexposed are too far away from the counterfactual ideal. The unexposed do not provide the disease occurrence expected had the exposed not been exposed, and stratification or statistical control will not be sufficient to produce unbiased estimates of effects.

Examples: healthy worker selection, plus many other poorly designed studies.

59


B. Selection bias in the conduct of the study: non-responders, loss to follow-up.

1. The cross-sectional study: the response rate may correlate with what you want to estimate, which would lead to a biased estimate of its prevalence.

Risk of selection bias is high.

60


2. The cohort study: non-response at baseline will usually not correlate directly with both the exposure and the (unknown) endpoint, but selection at baseline will often change the confounder structure (it will correlate with exposure). Loss to follow-up may correlate with both the exposure and the endpoint and lead to bias.

Give higher priority to compliance with follow-up than to recruitment at baseline. Loss to follow-up will often cause bias in the randomized trial (intention-to-treat analysis).

61


3. The case-control study: non-response may well correlate with both the exposure and the endpoint, since both are known at recruitment to the study. Keeping response rates high should be given high priority, and the specific aim of the study should not be disclosed (an IRB may not accept this procedure).

62


Selection bias is a serious problem and should be avoided if possible. Often it is not possible, and then its magnitude and possible impact should be investigated.

63

Steps to avoid bias related to non-responders

Keep non-response as low as possible, especially in surveys and case-control studies

Try to get some information on non-responders: ideally on E and D, but also on confounders

Analyse data according to the time of responding

Do sensitivity analyses

Do follow-up studies (incl. RCTs)

64

So, the first concern in an etiologic study is that of VALIDITY (FREEDOM FROM BIAS, or at least from known bias).

Internal validity: validity of inferences drawn in relation to the members of the study population.

External validity: validity of the inferences as they extend beyond the study population.
