Confounding Lecture

Preparatory lecture

Confounding and Bias

Random Error 2

Introduction

• Most epidemiological studies measure disease frequency in two (or more) groups that differ only on the exposure of interest. The two measures of disease frequency are combined into a single measure of association – risk or rate ratio, odds ratio, risk or rate difference


• The next step is to evaluate whether the result that has been observed in the data is true, or whether the observed result is false and there is an alternate explanation. This is the process of assessing validity of a study result.

Accuracy vs. precision

Accuracy: obtaining results close to truth

[Figure: results of Survey 1, Survey 2 and Survey 3 compared with the real population value]


Precision: obtaining similar results with repeated measurement (may or may not be accurate)


Poor precision (from small sample size) with reasonable accuracy (without bias):


Good precision (from large sample size) with reasonable accuracy (without bias):


Good precision (from large sample size), but with poor accuracy (with bias):

In sum…

• Sampling error
– Difference between survey result and population value due to random selection of the sample
– Greater with smaller sample sizes
– Induces lack of precision

• Bias
– Difference between survey result and population value due to error in measurement, selection of a non-representative sample, or other factors
– Due to factors other than sample size
– Therefore, a large sample size cannot guarantee absence of bias
– Induces lack of accuracy, even with good precision
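The distinction between sampling error and bias can be illustrated with a small simulation. This is only a sketch under assumed values: the true prevalence of 0.30, the sample sizes, and the systematic shift of 0.10 are hypothetical and are not taken from the lecture.

```python
import random
import statistics

def run_surveys(true_prevalence, sample_size, bias=0.0, n_surveys=200, seed=0):
    """Return the prevalence estimated by each of n_surveys simulated surveys.

    bias is a hypothetical systematic shift in the probability that a sampled
    person is recorded as positive (for example, a faulty instrument).
    """
    rng = random.Random(seed)
    p = true_prevalence + bias
    estimates = []
    for _ in range(n_surveys):
        positives = sum(rng.random() < p for _ in range(sample_size))
        estimates.append(positives / sample_size)
    return estimates

TRUE_PREVALENCE = 0.30  # assumed population value

scenarios = {
    "small sample, no bias": run_surveys(TRUE_PREVALENCE, 50),
    "large sample, no bias": run_surveys(TRUE_PREVALENCE, 2000),
    "large sample, biased": run_surveys(TRUE_PREVALENCE, 2000, bias=0.10),
}

for name, estimates in scenarios.items():
    mean = statistics.mean(estimates)
    spread = statistics.stdev(estimates)
    print(f"{name}: mean = {mean:.3f}, spread (SD) = {spread:.3f}")
```

Under these assumptions, enlarging the sample shrinks the spread of the estimates but leaves the systematic shift untouched.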

Definitions

ERROR:
1. A false or mistaken result obtained in a study or experiment.
2. Random error is the portion of variation in measurement that has no apparent connection to any other measurement or variable, generally regarded as due to chance.
3. Systematic error often has a recognizable source, e.g., a faulty measuring instrument, or pattern, e.g., it is consistently wrong in a particular direction.

(Last)

Bias

• Deviation of results or inferences from the truth, or processes leading to such deviation. Any trend in the collection, analysis, interpretation, publication, or review of data that can lead to conclusions that are systematically different from the truth. (Last)

• A process at any stage of inference tending to produce results that depart systematically from true values (Fletcher)

What is meant by bias in research?

• Bias is the term used to describe differences between the study findings and truth

• “Any effect at any stage of investigation or inference tending to produce results that depart systematically from the true values (to be distinguished from random error)”

Bias

• Bias is a systematic error in an epidemiologic study that results in an incorrect estimation of the association between exposure and outcome

What can be wrong in the study?

Random error

Results in low precision of the epidemiological measure → the measure is not precise, but true

1. Imprecise measuring
2. Too small groups

Systematic errors (= bias)

Results in low validity of the epidemiological measure → the measure is not true

1. Selection bias
2. Information bias
3. Confounding

Random errors

Errors in epidemiological studies

[Figure: error plotted against study size, with curves for systematic error (bias) and random error (chance)]

Estimation

• When we measure an OR, we obtain a point estimate
– We will never know the true value

• The confidence interval indicates precision, or the amount of random error
– Wide interval → low precision
– Narrow interval → high precision

• OR = 4.5 (2.0 – 10)
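How the width of a confidence interval reflects random error can be sketched as follows. The counts are hypothetical, and the interval uses Woolf's log-based approximation, which is one common choice and not necessarily the method behind the figures quoted above.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (Woolf's log method) for a 2x2 table.

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts, chosen only for illustration.
small_study = odds_ratio_ci(20, 20, 20, 60)
large_study = odds_ratio_ci(200, 200, 200, 600)  # same proportions, 10x the size

for label, (or_, lower, upper) in [("small", small_study), ("large", large_study)]:
    print(f"{label} study: OR = {or_:.2f} ({lower:.2f} to {upper:.2f})")
```

Both studies give the same point estimate; only the interval narrows as the study grows.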

Classification of bias

There are three broad categories of bias:
• selection bias
• confounding
• measurement bias

Systematic error

•Does not decrease with increasing sample size

• Selection bias• Information bias• Confounding

Bias

Systematic deviations in study findings from the truth
– Results from errors in the collection, analysis, interpretation, publication, or review of data

Selection Bias

Error due to systematic difference between the characteristics of the people selected for a study and those who are not.

Selection bias

• Errors due to systematic differences in characteristics between those who are selected for study and those who are not. (Last; Beaglehole)

• Occurs when comparisons are made between groups of patients that differ in ways, other than the main factors under study, that affect the outcome under study. (Fletcher)


Examples of Selection bias

• Subjects: hospital cases under the care of a physician
• Excluded:
1. Those who die before admission (acute/severe disease)
2. Those not sick enough to require hospital care
3. Those without access due to cost, distance etc.
• Result: conclusions cannot be generalized
• Also known as ‘Ascertainment Bias’

(Last)

Ascertainment Bias

• Systematic failure to represent equally all classes of cases or persons supposed to be represented in a sample. This bias may arise because of the nature of the sources from which the persons come, e.g., a specialized clinic;

Case ascertainment

• Who is your case?
– Patient?
– Deceased person?

• What is the definition of the case?
– Cancer (clinically? Pathologically?)
– Virus carriers (asymptomatic patients) → you need to screen for the antibody

Who will be controls?

• Control ≠ non-case

– Controls must also be at risk of developing the disease in the future.

– In a case-control study of gastric cancer, a person who has had a gastrectomy cannot be a control.

– In a case-control study of car accidents, a person who does not drive a car cannot be a control.

Selection bias with ‘volunteers’

• Also known as ‘response bias’

• Systematic error due to differences in characteristics between those who choose or volunteer to take part in a study and those who do not

Selection bias with ‘Survival Cohorts’

• Patients are included in study because they are available, and currently have the disease

• For lethal diseases patients in survival cohort are the ones who are fortunate to have survived, and so are available for observation

• For remitting diseases patients are those who are unfortunate enough to have persistent disease

• Also known as ‘Available patient cohorts’

Selection bias due to ‘Loss to Follow-up’

• Also known as ‘Migration Bias’

• In nearly all large studies some members of the original cohort drop out of the study

• If drop-outs occur randomly, such that the characteristics of lost subjects in one group are on average similar to those who remain in the group, no bias is introduced

• But ordinarily the characteristics of the lost subjects are not the same

Healthy worker effect

• A phenomenon observed initially in studies of occupational diseases: workers usually exhibit lower overall death rates than the general population, because the severely ill and chronically disabled are ordinarily excluded from employment. Death rates in the general population may be inappropriate for comparison if this effect is not taken into account.

(Last)

Example…. ‘healthy worker effect ’

• Question: association b/w formaldehyde exposure and eye irritation

• Subjects: factory workers exposed to formaldehyde

• Bias: those who suffer most from eye irritation are likely to leave the job at their own request or on medical advice

• Result: remaining workers are less affected; association effect is diluted

Information Bias (Observation Bias, Measurement Bias)

Error due to systematic differences in the way data on exposure or outcome are obtained from the various groups, leading to misclassification of study subjects

Measurement bias

• Systematic error arising from inaccurate measurements (or classification) of subjects or study variables. (Last)

• Occurs when individual measurements or classifications of disease or exposure are inaccurate (i.e. they do not measure correctly what they are supposed to measure). (Beaglehole)

• Occurs if patients in one group stand a better chance of having their outcomes detected than those in another group. (Fletcher)

Measurement / (Mis) classification

• Exposure misclassification occurs when exposed subjects are incorrectly classified as unexposed, or vice versa

• Disease misclassification occurs when diseased subjects are incorrectly classified as non-diseased, or vice versa

(Norell)

Causes of misclassification

1. Measurement gap: gap between the measured and the true value of a variable
- Observer / interviewer bias
- Recall bias
- Reporting bias

2. Gap between the theoretical and empirical definition of exposure / disease

Example… ‘gap b/w definitions’

Theoretical definition

• Exposure: passive smoking – inhalation of tobacco smoke from other people’s smoking

• Disease: myocardial infarction – necrosis of the heart muscle tissue

Empirical definition

• Exposure: passive smoking – time spent with smokers (having smokers as room-mates)

• Disease: myocardial infarction – certain diagnostic criteria (chest pain, enzyme levels, signs on ECG)

Exposure misclassification – Non-differential

•Misclassification does not differ between cases and non-cases

•Generally leads to dilution of effect, i.e. bias towards RR=1 (no association)

Example…Non-differential Exposure Misclassification

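Exposure: X-ray exposure. Disease: breast cancer.

First table
                    Exposure present   Exposure absent   Total
Disease present            40                80            120
Total                   10000             40000          50000

RR = (40/10000) / (80/40000) = 2

Second table
                    Exposure present   Exposure absent   Total
Disease present            60                60            120
Total                   20000             30000          50000

RR = (60/20000) / (60/30000) = 1.5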
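The two risk ratios in this example can be reproduced with a short sketch. The 25% error rate is an assumption inferred from the counts (a quarter of the unexposed recorded as exposed, equally among diseased and non-diseased); the non-diseased counts are obtained by subtracting the cases from the column totals.

```python
def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """Risk ratio: risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)

def misclassify_unexposed(exposed, unexposed, fraction):
    """Move a fraction of truly unexposed subjects into the exposed column."""
    moved = unexposed * fraction
    return exposed + moved, unexposed - moved

# True counts from the X-ray / breast cancer example: (exposed, unexposed).
diseased = (40, 80)
non_diseased = (10000 - 40, 40000 - 80)

# Assumed error rate, applied equally to both groups (non-differential).
FRACTION = 0.25
d_exp, d_unexp = misclassify_unexposed(*diseased, FRACTION)
n_exp, n_unexp = misclassify_unexposed(*non_diseased, FRACTION)

true_rr = risk_ratio(40, 10000, 80, 40000)
observed_rr = risk_ratio(d_exp, d_exp + n_exp, d_unexp, d_unexp + n_unexp)
print(f"true RR = {true_rr:.2f}, observed RR = {observed_rr:.2f}")
```

The observed estimate moves from 2 towards 1, the dilution described above.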

An example of non-differential misclassification in an exposure variable

We want to compare mean blood pressure between cases and controls.

The blood pressure monitor is faulty and always reads 5 mmHg higher than the true value.

All subjects were measured with the same monitor.

→ No problem for internal comparison, because the same error applies to both groups

Exposure misclassification - Differential

• Misclassification differs between cases and non-cases

• Can bias the estimate in either direction: towards RR = 0 (negative / protective association), or towards RR = ∞ (strong positive association)

Example…Differential Exposure Misclassification

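Exposure: X-ray exposure. Disease: breast cancer.

First table
                    Exposure present   Exposure absent   Total
Disease present            40                80            120
Disease absent           9960             39920          49880
Total                   10000             40000          50000

RR = (40/10000) / (80/40000) = 2

Second table
                    Exposure present   Exposure absent   Total
Disease present            40                80            120
Disease absent          19940             29940          49880
Total                   19980             30020          50000

RR = (40/19980) / (80/30020) = 0.75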
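The direction of the bias depends on which group is misclassified, which a brief sketch can show. The first scenario matches the counts in this example (a quarter of the unexposed non-diseased recorded as exposed); the second, with the error among the diseased only, is a hypothetical counterpart added for contrast.

```python
def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """Risk ratio: risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)

def observed_rr(frac_diseased, frac_non_diseased):
    """RR after recording the given fractions of unexposed subjects as exposed."""
    d_exp, d_unexp = 40, 80          # true diseased counts
    n_exp, n_unexp = 9960, 39920     # true non-diseased counts
    d_moved = d_unexp * frac_diseased
    n_moved = n_unexp * frac_non_diseased
    d_exp, d_unexp = d_exp + d_moved, d_unexp - d_moved
    n_exp, n_unexp = n_exp + n_moved, n_unexp - n_moved
    return risk_ratio(d_exp, d_exp + n_exp, d_unexp, d_unexp + n_unexp)

rr_true = observed_rr(0.0, 0.0)
rr_non_diseased_only = observed_rr(0.0, 0.25)  # error among non-diseased only
rr_diseased_only = observed_rr(0.25, 0.0)      # hypothetical: error among diseased only

print(f"no misclassification:        RR = {rr_true:.2f}")
print(f"non-diseased misclassified:  RR = {rr_non_diseased_only:.2f}")
print(f"diseased misclassified:      RR = {rr_diseased_only:.2f}")
```

With the same error rate, one scenario pushes the estimate below 1 and the other inflates it well above the true value.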

Causes of Differential Exposure Misclassification

• Recall bias: systematic error due to differences in accuracy or completeness of recall of past events or experiences.

e.g. patients suffering from MI are more likely than controls to recall and report ‘lack of exercise’ in the past

• Measurement bias:

e.g. analysis of Hb by different methods (cyanmethemoglobin and Sahli's) in cases and controls

e.g. biochemical analysis of the two groups in two different laboratories, which give consistently different results

• Interviewer / observer bias: systematic error due to observer variation (failure of the observer to measure or identify a phenomenon correctly)

e.g. in patients with thrombo-embolism, looking for a history of OCP use more aggressively

Confounding

1. A relationship between the effects of two or more causal factors as observed in a set of data such that it is not logically possible to separate the contribution that any single causal factor has made to an effect (Last)

Confounding

Confounding occurs when another exposure exists in the study population (besides the one being studied) and is associated both with the disease and with the exposure being studied. If this extraneous factor, itself a determinant of or risk factor for the health outcome, is unequally distributed between the exposure subgroups, it can lead to confounding. (Beaglehole)

Confounding

Confounders are risk factors for the outcome.

Confounders are related to exposure of your interest.

Confounders are NOT in the process of causal relationship between the exposure and the outcome of your interest.

Example of a “not” confounder: pineal hormone is not a confounder

[Diagram: EMF → down-regulation of pineal hormone → breast cancer (causation?)]

EMF: electro-magnetic field

EMF exposure induces down-regulation of pineal hormone.

A decrease in pineal hormone may be a risk factor for breast cancer.

If EMF exposure causes breast cancer only through down-regulation of pineal hormone, then pineal hormone is a step on the causal pathway, not a confounder.

Examples … confounding

SMOKING → LUNG CANCER

Confounder: AGE

(If the average ages of the smoking and non-smoking groups are very different)

(As age advances, the chances of lung cancer increase)

COFFEE DRINKING → HEART DISEASE

Confounder: SMOKING

(Coffee drinkers are more likely to smoke)

(Smoking increases the risk of heart disease)

ALCOHOL INTAKE → MYOCARDIAL INFARCTION

Confounder: SEX

(Men are more at risk for MI)

(Men are more likely to consume alcohol than women)

Why do we have to consider confounding?

We want to know the “real” causal association, but a distorted relationship remains if you do not adjust for the effects of confounding factors.

Example … multiple biases

• Study: is there an association between regular exercise and risk of CHD?

• Methodology: employees of a plant were offered an exercise program; some volunteered, others did not. Coronary events were detected by regular voluntary check-ups, including a careful history, ECG, and checks of routine health records

• Result: the group that exercised had lower CHD rates

Biases operating

• Selection: volunteers might have had initial lower risk (e.g. lower lipids etc.)

• Measurement: exercise group had a better chance of having a coronary event detected since more likely to be examined more frequently

• Confounding: if exercise group smoked cigarettes less, a known risk factor for CHD

Methods for controlling Selection Bias

During study design
1. Randomization
2. Restriction
3. Matching

During analysis
1. Stratification
2. Adjustment
a) Simple / standardization
b) Multiple / multivariate adjustment

Randomization

• The only way to equalize all extraneous factors, or ‘everything else’ is to assign patients to groups randomly so that each has an equal chance of falling into the exposed or unexposed group

• Equalizes even those factors which we might not know about!

• But it is not always possible

Restriction

• Subjects chosen for study are restricted to only those possessing a narrow range of characteristics, to equalize important extraneous factors

Example… restriction

• Study: effect of age on prognosis of MI

• Restriction: Male / White / Uncomplicated anterior wall MI

• Important extraneous factors controlled for: sex / race / severity of disease

• Limitation: results not generalizable to females, non-white people, or those with complicated MI

• For example: “Babies who are breast-fed have less illness than babies who are bottle-fed.”

Which illnesses? How is feeding type defined? How large a difference in risk?

• A better example: “Babies who are exclusively breast-fed for three months or more will have a reduction in the incidence of hospital admissions for gastroenteritis of at least 30% over the first year of life.”

Matching - definition

• The process of making a study group and a comparison group comparable with respect to extraneous factors (Last)

• For each patient in one group there are one or more patients in the comparison group with the same characteristics, except for the factor of interest (Fletcher)

Types of Matching

• Caliper matching: process of matching the comparison group to the study group within a specific distance for a continuous variable (e.g., matching age to within 2 years)

• Frequency matching: the frequency distributions of the matched variable(s) are made similar in the study and comparison groups

• Category matching: matching the groups in broad classes such as relatively wide age ranges or occupational groups

• Matching is often done for age, sex, race, place of residence, severity of disease, rate of progression of disease, previous treatment received etc.

• Limitations:
- Controls for bias only for those factors involved in the match
- Usually not possible to match for more than a few factors because of the practical difficulties of finding patients that meet all matching criteria
- If categories for matching are relatively crude, there may be room for substantial differences between matched groups

Stratification

• The process of, or the result of, separating a sample into several sub-samples according to specified criteria such as age groups, socio-economic status etc. (Last)

• The effect of confounding variables may be controlled by stratifying the analysis of results

• After data are collected, they can be analyzed and results presented according to subgroups of patients, or strata, of similar characteristics (Fletcher)


Exposure: alcohol. Disease: MI.

Crude table
                 Alcohol present   Alcohol absent
MI present             140               100
Total                30000             30000

RR = (140/30000) / (100/30000) = 1.4

Stratified by sex
                 Alcohol present        Alcohol absent
                 Male      Female       Male      Female
MI present        120        20          60         40
Total           20000     10000       10000      20000

RR (M) = (120/20000) / (60/10000) = 1

RR (F) = (20/10000) / (40/20000) = 1
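The crude and stratum-specific risk ratios in this example can be checked with a minimal sketch. The dictionary layout and the helper names are illustrative choices, not part of the lecture.

```python
def risk_ratio(cases_exp, total_exp, cases_unexp, total_unexp):
    """Risk ratio: risk in the exposed divided by risk in the unexposed."""
    return (cases_exp / total_exp) / (cases_unexp / total_unexp)

# (MI cases, persons) by alcohol exposure within each sex.
strata = {
    "male":   {"exposed": (120, 20000), "unexposed": (60, 10000)},
    "female": {"exposed": (20, 10000),  "unexposed": (40, 20000)},
}

def collapse(strata, group):
    """Add up cases and persons for one exposure group across all strata."""
    cases = sum(s[group][0] for s in strata.values())
    persons = sum(s[group][1] for s in strata.values())
    return cases, persons

crude_rr = risk_ratio(*collapse(strata, "exposed"), *collapse(strata, "unexposed"))
stratum_rr = {
    sex: risk_ratio(*s["exposed"], *s["unexposed"]) for sex, s in strata.items()
}

print(f"crude RR = {crude_rr:.2f}")
for sex, rr in stratum_rr.items():
    print(f"RR among {sex}s = {rr:.2f}")
```

The crude association of 1.4 disappears within each sex, which is the pattern expected when sex confounds the alcohol and MI relationship.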

Standardization

A set of techniques used to remove as far as possible the effects of differences in age or other confounding variables when comparing two or more populations.

The method uses weighted averaging of rates specific for age, sex, or some other potentially confounding variable(s), according to some specified distribution of these variables.

(Last)

Example … direct standardization

HOSPITAL ‘A’

Preop risk    Pts     Deaths    %
High          500     30        6
Medium        400     16        4
Low           300     2         0.67
Total         1200    48        4

HOSPITAL ‘Std’

Preop risk    Pts     Rate (%)    Exp. deaths
High          400     6           24
Medium        400     4           16
Low           400     0.67        2.68
Total         1200                42.68 (3.6%)
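The expected deaths in the standard population can be recomputed from Hospital A's stratum-specific rates, as in the sketch below. Variable and function names are illustrative. Because the code uses the unrounded low-risk rate (2/300) rather than 0.67%, its total differs slightly from the 42.68 shown on the slide.

```python
# Hospital A: preoperative risk stratum -> (patients, deaths)
hospital_a = {"high": (500, 30), "medium": (400, 16), "low": (300, 2)}

# Standard population: patients per stratum
standard = {"high": 400, "medium": 400, "low": 400}

def crude_rate(observed):
    """Overall death rate, ignoring the risk mix of the patients."""
    patients = sum(p for p, _ in observed.values())
    deaths = sum(d for _, d in observed.values())
    return deaths / patients

def direct_standardization(observed, standard):
    """Apply the observed stratum-specific rates to the standard population."""
    expected_deaths = sum(
        standard[stratum] * deaths / patients
        for stratum, (patients, deaths) in observed.items()
    )
    return expected_deaths, expected_deaths / sum(standard.values())

crude = crude_rate(hospital_a)
expected, standardized = direct_standardization(hospital_a, standard)

print(f"crude death rate: {crude:.1%}")
print(f"expected deaths in standard population: {expected:.2f}")
print(f"standardized death rate: {standardized:.1%}")
```

The standardized rate is lower than the crude rate because Hospital A treats a larger share of high-risk patients than the standard population contains.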

Multivariate adjustment

• Simultaneously controlling the effects of many variables to determine the independent effects of one

• Can select from a large no. of variables a smaller subset that independently and significantly contributes to the overall variation in outcome, and can arrange variables in order of the strength of their contribution

• Only feasible way to deal with many variables at one time during the analysis phase

Examples… Multivariate adjustment

• CHD is the joint result of lipid abnormalities, HT, smoking, family history, DM, exercise, personality type.

• Start with 2x2 tables using one variable at a time

• Contingency tables, i.e. stratified analyses, examining the effect of one variable changed in the presence/absence of one or more variables

Dealing with measurement bias

1. Blinding
- Subject
- Observer / interviewer
- Analyser

2. Strict definition / standard definition for exposure / disease / outcome

3. Equal efforts to discover events in all the groups

Controlling confounding

• Similar to controlling for selection bias

• Use randomization, restriction, matching, stratification, standardization, multivariate analysis etc.

How can we solve the problem of confounding?

“Prevention” at study design
• Limitation
• Randomization in an intervention study
• Matching in a cohort study (but not in a case-control study)

“Treatment” at statistical analysis
• Stratification by a confounder
• Multivariate analysis

Error & Bias

• Error: random error

• Bias: systematic error
– differential misclassification
– non-differential misclassification

This is a problem!

EXAMPLES OF RANDOM ERROR, BIAS, MISCLASSIFICATION AND CONFOUNDING IN THE SAME STUDY:

STUDY: In a cohort study, babies of women who bottle feed and women who breast feed are compared, and it is found that the incidence of gastroenteritis, as recorded in medical records, is lower in the babies who are breast-fed.

EXAMPLE OF RANDOM ERROR

By chance, there are more episodes of gastroenteritis in the bottle-fed group in the study sample.

Or, also by chance, no difference in risk was found.

EXAMPLE OF RANDOM MISCLASSIFICATION

Lack of good information on feeding history results in some breast-feeding mothers being randomly classified as bottle-feeding, and vice versa.

EXAMPLE OF BIAS

The medical records of bottle-fed babies are less complete than those of breast-fed babies (perhaps bottle-fed babies go to the doctor less), and thus record fewer episodes of gastroenteritis in the bottle-fed group only. This is called bias because the observation itself is in error.

EXAMPLE OF CONFOUNDING

The mothers of breast-fed babies are of higher social class, and the babies thus have better hygiene, less crowding and perhaps other factors that protect against gastroenteritis. Less crowding and better hygiene are truly protective against gastroenteritis, but we mistakenly attribute their effects to breast feeding. This is called confounding because the observation is correct, but its explanation is wrong.

Prevention of Bias

• Sampling
• Sample size
• Study design
• Sources of data collection
• Methods of data collection
• Content of information

Selection bias

• Error because the association between exposure and disease is different for participants and non-participants in the study

• Errors in the
– procedures to select participants
– factors that influence participation

Examples of selection bias

• Self-selection bias
• Non-response
• Loss to follow-up

Self selection bias

• Selection bias is the distortion of statistics by the way in which a sample is selected.

• Self-selection bias is the distortion caused when the sample chooses itself: certain characteristics are over-represented because they correlate with willingness to be included.

http://moneyterms.co.uk/correlation/

Non response

• non-response occurs when certain questions in a survey are not answered by a respondent.

• non-response takes place also when a randomly sampled individual cannot be contacted or refuses to participate in a survey.

Sources of selection bias

Inappropriate selection of study subjects from the study population

– non-random selection of subjects from the same population

– selection of subjects from different or ill-defined study populations

– failure to locate people, or unwillingness of people to participate

– loss of persons from the study population because of the health outcome, e.g. selective survival

Example: selection bias

Suppose we would like to conduct a case–control study of the association between liver cancer and smoking. Cases (those identified as having liver cancer) could be all available individuals in all the hospitals in town during the year of the study. Controls (individuals without a history of liver cancer) would be recruited by local mass media advertisements, hence they would be volunteers. The study results would most probably show a strong association between smoking and liver cancer, not necessarily because smoking and liver cancer are related, but because the selection process was different for cases and controls. Although the cases were arguably sampled from the population at large, the controls were sampled from a population of volunteers!

Preventing selection bias

• Same selection criteria
• High response-rate
• High rate of follow-up

Information bias

• Error because the measurement of exposure or disease is different between the comparison groups.

• Errors in the
– procedures to measure exposure
– procedures to diagnose disease

Examples of information bias

• Diagnostic bias
• Recall bias
• Researcher influence

Measurement bias

Inaccurate measurement of study variables can lead to bias

Sources of inaccurate measurement:
• Subject error – error within the individual for any reason, e.g. imperfect recall of past exposures
• Instrument error – e.g. equipment not properly calibrated, wording of question
• Observer error – error in use of instrument or recording

Types of measurement error

Non-differential error
• the inaccuracies of measurement are the same among subgroups of subjects
• Non-differential measurement error in exposure and outcome will generally lead to bias towards finding no effect

Differential error
• the inaccuracies of measurement are different among subgroups of subjects
• can lead to bias towards or away from no effect

Misclassification

True
               Dog    No dog
TBE cases      20     20        OR = ad/bc = 3.0
Controls       20     60

Differential
               Dog    No dog
Controls       20     60

Non-differential
               Dog    No dog
Controls       28     52

Non-differential misclassification

• Same degree of misclassification in both cases and controls

• OR will be underestimated
– True value is higher

• If no causal effect found, ask:
– Could it be due to non-differential misclassification?

Preventing information bias

• Clear definitions
• Good measuring methods
• Blinding
• Standardised procedures
• Quality control

Minimising measurement bias

1. use valid, reliable tools to measure all study subjects
2. train staff and monitor their use of research tools
3. regular quality checks of research tools
4. blinding of study subjects and assessors
5. subjects in a case-control study unaware of the study hypothesis
6. consider a sub-study to determine validity and reliability of measurements

Confounding

It occurs when there is a confounder, which is independently associated with both exposure and disease.

[Diagram: Exposure → Disease, with Confounder linked to both]


Confounding

• defined as: a situation in which the measure of effect of exposure on disease is distorted because of the association of the study factor with other factors that influence the outcome.

• These other factors are called confounders

A variable can be a confounder if all the following conditions are met:

• It is associated with the exposure of interest (causally or not).

• It is causally related to the outcome.

• AND ... It is not part of the causal pathway from exposure to outcome

Confounding: Example

[Diagram: Alcohol → Lung cancer, with Smoking linked to both]

                Lung cancer    No lung cancer
Drinker              50              50
Non-drinker          50             150
Total               100             200

50% of cases are drinkers, but only 25% of controls are drinkers.

Therefore, it appears that drinking is strongly associated with lung cancer.

Confounding: example

Smoker
                Lung cancer    No lung cancer
Drinker              45              15
Non-drinker          30              10
Total                75              25

Non-smoker
                Lung cancer    No lung cancer
Drinker               5              35
Non-drinker          20             140
Total                25             175

Among smokers, 45/75 = 60% of lung cancer cases drink and 15/25 = 60% of controls drink.

Among non-smokers, 5/25 = 20% of lung cancer cases drink and 35/175 = 20% of controls drink.

Stratification: “Series of 2x2 tables”

Idea: Take a 2x2 table and break it into a series of smaller 2x2 tables (one table at each of J levels of the confounder yields J tables).

Example: in testing for an association between lung cancer and alcohol drinking (yes/no), separate smokers and non-smokers.

An Example

[Diagram: Maternal coffee consumption during pregnancy → Delivery of low birth weight infant?]

Example

All mothers
              Low Birth Weight    Normal Birth Weight
Coffee              170                   96
No Coffee            90                   88

Crude OR = (170)(88) / (96)(90) = 1.73

Smokers
              Low Birth Weight    Normal Birth Weight
Coffee              160                   16
No Coffee            80                    8

Stratum-specific OR = (160)(8) / (16)(80) = 1.00

Non-smokers
              Low Birth Weight    Normal Birth Weight
Coffee               10                   80
No Coffee            10                   80

Stratum-specific OR = (10)(80) / (80)(10) = 1.00

Evidence of Confounding

ORcrude = 1.73

ORsmokers = 1.00

ORnon-smokers = 1.00

The association between coffee consumption and having a low birth weight baby is confounded by smoking. This is demonstrated by the crude OR of 1.73 falling to 1.00 (no effect) within each smoking stratum.
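When the stratum-specific odds ratios agree, they are often pooled into a single adjusted estimate. The sketch below uses the Mantel-Haenszel estimator, which is a standard choice but is not named in the lecture; the function names and tuple layout are illustrative.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table: a, b = exposed with/without outcome; c, d = unexposed."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Pooled OR across strata, weighting each stratum by its size."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return numerator / denominator

# (coffee & LBW, coffee & normal, no coffee & LBW, no coffee & normal)
smokers = (160, 16, 80, 8)
non_smokers = (10, 80, 10, 80)

# Collapsing the strata gives back the crude table (170, 96, 90, 88).
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

crude_or = odds_ratio(*crude)
adjusted_or = mantel_haenszel_or([smokers, non_smokers])

print(f"crude OR = {crude_or:.2f}")
print(f"smokers OR = {odds_ratio(*smokers):.2f}")
print(f"non-smokers OR = {odds_ratio(*non_smokers):.2f}")
print(f"Mantel-Haenszel adjusted OR = {adjusted_or:.2f}")
```

A gap between the crude and the adjusted estimate, with similar odds ratios across strata, is the usual numerical sign of confounding.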

Strategy #1: Does the variable meet the criteria to be a confounder?

Hypothetical case-control study of risk factors for malaria. 150 cases, 150 controls; gender distribution.

             Cases    Controls
Males          88        68
Females        62        82
Total         150       150

Question: Is male gender causally related to the risk of malaria?

Yes

No

Further study is needed

OR = [88 x 82] ÷ [68 x 62] = 1.71

Confounder for a male gender-malaria association?

[Diagram: male gender → malaria?, with outdoor occupation as a possible confounder linked to both]

First criterion: Is the putative confounder associated with exposure?

              Males         Females
              N (%)         N (%)
Outdoor       68 (43.5)     13 (9.0)
Indoor        88            131
Total         156 (100)     144 (100)

Question: Is outdoor occupation associated with male gender?

Yes

No

OR = 7.8

Second criterion: Is the putative confounder associated with the outcome (case-control status)?

              Cases         Controls
              N (%)         N (%)
Outdoor       63 (42.0)     18 (12.0)
Indoor        87            132
Total         150 (100)     150 (100)

Question: Is outdoor occupation (or something for which this variable is a marker, e.g., exposure to mosquitoes) causally related to malaria?

Yes

No

OR = 5.3

Third criterion: Is the putative confounder in the causal pathway exposure → outcome?

[Diagram: male gender → outdoor occupation → malaria?]

Yes, it could be

Probably not

Yes, it could be

Probably not

Note: Judgment and knowledge about the socio-cultural context are critical to answer this question.

Question: Provided that:
• Crude association between male gender and malaria: OR = 1.71, and
• ... Outdoor occupation is more frequent among males, and
• ... Outdoor occupation is associated with greater risk of malaria …

What would be the expected magnitude of the association between male gender and malaria after controlling for occupation (i.e., assuming the same degree of outdoor occupation in males and females)?

The (adjusted) association estimate will be smaller than 1.71

The (adjusted) association estimate will be equal to 1.71

The (adjusted) association estimate will be greater than 1.71

Controlling confounding

In the design
• Restriction of the study
• Matching

In the analysis
• Restriction of the analysis
• Stratification
• Multivariable regression

Control confounding at the design stage

Strategy: Specification (“Include only non-smokers.”)
Advantages:
• Easily understood
Disadvantages:
• Limits generalizability
• May limit sample size

Strategy: Matching (“Match smoking status of cases and controls.”)
Advantages:
• Useful for eliminating the influence of strong constitutional confounders like age and sex
Disadvantages:
• Decision to match must be made at the design stage and can have irreversible adverse effects on analysis
• Time consuming
• Cannot analyze associations of matched variables with the outcome

Control confounding at the analysis stage

Strategy: Stratification (“Conduct analysis separately for smokers and non-smokers.”)
Advantages:
• Easily understood
• Reversible
Disadvantages:
• May be limited by sample size for each stratum
• Difficult to control for multiple confounders

Strategy: Statistical adjustment (“Conduct multivariate analysis controlling (adjusting) for smoking status.”)
Advantages:
• Multiple confounders can be controlled
• Reversible
Disadvantages:
• Needs advanced statistical techniques
• Results may be difficult to understand

Restriction

We study only mothers of a certain age

[Diagram: Many children → Down’s, studied only among 35-year-old mothers]

Matching

“Selection of controls to be identical to the cases with respect to distribution of one or more potential confounders”.

[Diagram: Many children → Down’s, with maternal age linked to both]

Multivariable regression

• Analyse the data in a statistical model that includes both the presumed cause and possible confounders

• Measure the odds ratio OR for each of the exposures, independent from the others

• Logistic regression is the most common model in epidemiology

Example miners exposure and lung cancer

• one group of miners exposed to the underground environment and the other group not exposed

• Hence any differences in lung cancer rate would be due to exposure to working underground

Example: selection bias and confounding

Bias occurs when the exposed and non-exposed groups have different risks of developing the outcome of interest for reasons other than being exposed. This can be due to selection bias or confounding, e.g. more underground workers smoke.

Confounding variables

In our study of miners:
1. smoking is an independent risk factor (cause) of the disease (lung cancer)
2. more underground miners smoke, i.e. smoking is unevenly distributed among the exposed and non-exposed
3. smoking is not on the causal pathway between exposure and disease

• When examining the relationship between an explanatory factor and an outcome, we are interested in identifying factors that may modify the factor's effect on the outcome (effect modifiers). We must also be aware of potential bias or confounding in a study because these can cause a reported association (or lack thereof) to be misleading. Bias and confounding are related to the measurement and study design. Let's define these terms:

• If the method used to select subjects or collect data results in an incorrect association,

THINK >> Bias!

• If an observed association is not correct because a different (lurking) variable is associated with both the potential risk factor and the outcome, while the potential risk factor is not itself a causal factor,

THINK >> Confounding!

• If an effect is real but the magnitude of the effect is different for different groups of individuals (e.g., males vs females or blacks vs whites),

THINK >> Effect modification!

A confounding factor

• is one that is associated with both the exposure and the disease (that is, it has an association with both the disease and the risk factor under study) and that may distort the relationship between the two and confound (confuse) the study results.

[Diagram: Exposure → Outcome, with a third variable linked to both]

Confounding

[Diagram: Coffee → CHD, with smoking linked to both]

Confounding

Smoking is correlated with coffee drinking and is a risk factor for CHD even among those who do not drink coffee

Confounding factor:

– Drinking coffee appears to cause CHD

– Drinking coffee may not be the cause of CHD; the association may instead arise because smokers are also coffee drinkers.
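The coffee, smoking and CHD argument above can be checked numerically. The sketch below uses hypothetical counts (invented purely for illustration) and assumed helper names `chd_risk` and `smoking_prevalence`. It checks that smoking is more common among coffee drinkers, that smoking raises CHD risk among people who do not drink coffee, and that the crude coffee and CHD association vanishes within each smoking stratum.

```python
# Hypothetical counts, invented for illustration only:
# (drinks_coffee, smokes) -> (number of people, number with CHD)
COUNTS = {
    (True, True): (300, 60),
    (False, True): (100, 20),
    (True, False): (200, 10),
    (False, False): (400, 20),
}

def chd_risk(groups):
    """CHD risk (cases / people) pooled over the listed (coffee, smoking) groups."""
    people = sum(COUNTS[g][0] for g in groups)
    cases = sum(COUNTS[g][1] for g in groups)
    return cases / people

def smoking_prevalence(drinks_coffee):
    """Proportion of smokers among coffee drinkers (True) or non-drinkers (False)."""
    smokers = COUNTS[(drinks_coffee, True)][0]
    non_smokers = COUNTS[(drinks_coffee, False)][0]
    return smokers / (smokers + non_smokers)

# 1. Smoking is correlated with coffee drinking.
prev_coffee = smoking_prevalence(True)
prev_no_coffee = smoking_prevalence(False)

# 2. Smoking is a risk factor even among those who do not drink coffee.
rr_smoking_no_coffee = chd_risk([(False, True)]) / chd_risk([(False, False)])

# 3. Crude versus stratum-specific risk ratios for coffee and CHD.
crude_rr = (chd_risk([(True, True), (True, False)])
            / chd_risk([(False, True), (False, False)]))
rr_smokers = chd_risk([(True, True)]) / chd_risk([(False, True)])
rr_non_smokers = chd_risk([(True, False)]) / chd_risk([(False, False)])

print(prev_coffee, prev_no_coffee)
print(rr_smoking_no_coffee, crude_rr, rr_smokers, rr_non_smokers)
```

In this made-up population the crude risk ratio for coffee is about 1.75, yet it is 1 among smokers and 1 among non-smokers, so the crude association is produced entirely by smoking.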


Confounding

Risk factor (independent variable): Coffee

Disease (dependent variable): CHD

Covariable (confounder): Smoking

Example:

• In a study of the association between tobacco smoking and lung cancer, age would be a confounding factor if the average ages of the non-smoking and smoking groups in the study population were very different, since lung cancer incidence increases with age.


Another example:

• The possible association between meat consumption and colon cancer may be due to other accompanying factors, such as decreased intake of vegetables or increased intake of fat, rather than the meat consumption itself.

Problem

• The annual report of POF Hospital for the year 2006 shows 200 cases of myocardial infarction, 35 cases of cholecystitis, 105 cases of pneumonia and 350 cases of acute gastroenteritis. The results of this report cannot be generalized to the total population of Faisalabad on account of:

a. Confounding bias

b. Memory bias

c. Selection bias

d. Berksonian bias

e. Interviewer’s bias

Key: d

• Mother’s education is therefore a potential confounding variable.

• In order to give a true picture of the relationship between bottle-feeding and diarrhea in children under two, the influence of mother’s education should be controlled.

• This could be addressed in the research design, e.g., by selecting only mothers with a specific level of education, or it could be taken into account during the analysis by examining the relation between bottle-feeding and diarrhea separately for mothers with different levels of education.
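One common way to combine such separate (stratified) analyses into a single adjusted estimate is the Mantel-Haenszel pooled odds ratio, which the slides do not name; it is brought in here only as an illustration. The sketch below applies it to hypothetical counts for two education levels (the numbers and the names `STRATA`, `crude_odds_ratio` and `mantel_haenszel_or` are assumptions of this example).

```python
# Hypothetical 2x2 tables, one per level of mother's education
# (counts invented for illustration only).
# Each tuple: (a, b, c, d) where
#   a = bottle-fed with diarrhea,     b = bottle-fed without diarrhea
#   c = not bottle-fed with diarrhea, d = not bottle-fed without diarrhea
STRATA = {
    "low education": (60, 40, 30, 40),
    "high education": (10, 40, 20, 160),
}

def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a = sum(t[0] for t in strata.values())
    b = sum(t[1] for t in strata.values())
    c = sum(t[2] for t in strata.values())
    d = sum(t[3] for t in strata.values())
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled OR: sum(a*d/n) / sum(b*c/n) over the strata."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator

crude_or = crude_odds_ratio(STRATA)
adjusted_or = mantel_haenszel_or(STRATA)
print(crude_or, adjusted_or)
```

With these invented counts the crude odds ratio is 3.5, while the odds ratio is 2 within each education level and the pooled estimate is also 2. Part of the crude association therefore reflects mother's education rather than bottle-feeding itself.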

A study was done to compare the lung capacity of coal miners to the lung capacity of farm workers. The researcher studied 200 workers of each type. Other factors that might affect lung capacity are smoking habits and exercise habits. The smoking habits of the two worker types are similar, but the coal miners generally exercise less than the farm workers.

1. Which of the following is the explanatory variable in this study?

a. Exercise

b. Lung capacity

c. Smoking or not

d. Occupation

2. Which of the following is a confounding variable in this study?

a. Exercise

b. Lung capacity

c. Smoking or not

d. Occupation

• Essential principles (features) of properly designed clinical trials:

• Control of variables surrounding the experimental subjects

• The investigator has control of the subjects, the intervention, outcome measurements, and sets the conditions under which the experiment is conducted. In particular, the investigator determines who will be exposed to the intervention and who will not. This selection is done in such a way that the comparison of outcome measure between the exposed and unexposed groups is as free of bias as possible.


• Randomization refers to the practice of assigning subjects to experimental or treatment groups in a completely random manner. Thus each subject has an equal chance of being placed in the experimental group. This avoids the potential bias of the researcher choosing subjects s/he feels would be most likely to benefit from the intervention for the intervention group, and a similar possible bias if the choice were left up to the subjects.

• Blinding refers to the practice in which the researcher remains uninformed and unaware of the identities of experimental and control groups throughout the period of experimentation and data gathering. Thus, the researcher can remain unbiased in judging the responses of any particular subject or group.

• When studies involve human subjects, it is important that the subjects also remain uninformed as to whether they have been placed in the experimental group (receiving the treatment) or control group (receiving the placebo). Such a procedure is referred to as double-blind (neither researcher nor subjects know who is receiving the treatment). This is important because some people begin to feel better if they believe they have received a treatment. Only at the end of the study would the ‘code’ (known by the statistician) be broken and the results analyzed according to who had been taking the drug and who had not.

• So, the gold standard design for clinical trials, i.e., the least prone to bias, is the randomized double-blind controlled trial.
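The random assignment described above can be sketched in a few lines. This is a minimal illustration of simple randomization into two equal-sized groups; the function name `randomize`, the subject labels and the seed are assumptions of the example, and real trials typically use more elaborate schemes with allocation concealment.

```python
import random

def randomize(subject_ids, seed=None):
    """Randomly split subjects into (treatment, control) groups.

    Every ordering of the subjects is equally likely, so neither the
    researcher nor the subjects influence who receives the intervention.
    """
    rng = random.Random(seed)  # a seed makes the allocation reproducible for audit
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical subject labels, for illustration only.
subjects = [f"S{i:02d}" for i in range(1, 21)]
treatment, control = randomize(subjects, seed=2024)
print(len(treatment), len(control))
```

In a double-blind trial the resulting allocation list would be held by someone outside the assessment team (the slides mention the statistician) until the code is broken at the end of the study.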
