Weon preconference pearce variation and causation

Variation and causation Neil Pearce London School of Hygiene and Tropical Medicine London, United Kingdom Variation and causation The analysis of variance and the analysis of causes Implications for data analysis Implications for causal inference

Transcript of Weon preconference pearce variation and causation

Page 1: Weon preconference pearce variation and causation

Variation and causation

Neil Pearce
London School of Hygiene and Tropical Medicine
London, United Kingdom

Variation and causation

• The analysis of variance and the analysis of causes
• Implications for data analysis
• Implications for causal inference

Page 2: Weon preconference pearce variation and causation

[Figure: two scatterplots of Weight (axis 160 to 240) against Height (axis 68 to 80), one with r=0.48 and one with r=0.29]

Page 3: Weon preconference pearce variation and causation

[Figure: 2x2 diagram of PKU gene (No/Yes) by phenylalanine in the diet (Low/High)]

                                 % cases caused   % variance explained
High phenylalanine in the diet   100%             0%
PKU gene                         100%             100%

What is a cause?

• A cause is something for which, if the exposed group had not been exposed, at least some of the cases of disease would not have occurred

• Defined in terms of counterfactuals/potential outcomes

• This may include some instances where the exposure causes some cases and prevents some others (e.g. alcohol could theoretically increase the risk of total mortality in young people, decrease the risk in old people, and have zero net effect in middle age)

• Does this mean that we have to be able to conceive of a corresponding intervention? If so, what does this mean for factors such as age, gender, genetics?

Page 4: Weon preconference pearce variation and causation

Does asbestos exposure modify the effect of smoking on lung cancer?

              Asbestos exposed   Asbestos non-exposed
Smokers       35/1000            10/1000
Non-smokers   5/1000             1/1000

[Figure: the same table shown with sufficient-cause diagrams]

U: 1/1000
U’ + S: 9/1000
U’’ + A: 5/1000
U’’’ + A + S: 21/1000
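One way to approach the question on this slide is to compare the effect of smoking within each asbestos stratum on two scales. The sketch below uses only the four risks from the table above; the dictionary keys and the function name are illustrative labels, not part of the original slides.

```python
# Lung cancer risks taken from the table above (cases per 1000).
risk = {
    ("smoker", "asbestos"): 35 / 1000,
    ("smoker", "no_asbestos"): 10 / 1000,
    ("non_smoker", "asbestos"): 5 / 1000,
    ("non_smoker", "no_asbestos"): 1 / 1000,
}


def smoking_effect(asbestos):
    """Return (risk ratio, risk difference per 1000) for smoking in one asbestos stratum."""
    r1 = risk[("smoker", asbestos)]
    r0 = risk[("non_smoker", asbestos)]
    return r1 / r0, (r1 - r0) * 1000


for stratum in ("asbestos", "no_asbestos"):
    rr, rd = smoking_effect(stratum)
    print(f"{stratum}: risk ratio = {rr:.1f}, risk difference = {rd:.0f} per 1000")
```

On the ratio scale the effect of smoking is smaller in the asbestos-exposed group, while on the difference scale it is larger, so whether asbestos "modifies" the effect depends on the scale chosen.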

Page 5: Weon preconference pearce variation and causation

[Figure: sufficient-cause diagram with components U, High Ph Diet, and PKU gene]

[Figure: 2x2 diagram of PKU gene (No/Yes) by phenylalanine in the diet (Low/High)]

                                 % cases caused   % variance explained
High phenylalanine in the diet   100%             0%
PKU gene                         100%             100%

Page 6: Weon preconference pearce variation and causation

[Figure: 2x2 diagram of PKU gene (No/Yes) by phenylalanine in the diet (Low/High)]

                                 % cases caused   % variance explained
High phenylalanine in the diet   100%             100%
PKU gene                         100%             0%

[Figure: 2x2 diagram of PKU gene (No/Yes) by phenylalanine in the diet (Low/High)]

                                 % cases caused   % variance explained
High phenylalanine in the diet   100%             ~50%
PKU gene                         100%             ~50%

Page 7: Weon preconference pearce variation and causation

Genetic diseases

• When the environmental component is universal but the genetic component varies, then we say that the condition is entirely genetic

• When the genetic component is universal but the environmental component varies then we say that the disease is entirely environmental

• In most instances, presumably, both the genetic factors and the environmental factors vary, and whether we label the disease as “genetic” or “environmental” depends on our current knowledge

Genetics, heritability and the environment

• “Except for some cases of trauma, it is fair to say that virtually every human illness has a hereditary component.” (Collins F. N Engl J Med 1999, 341: 28-37.)

• Virtually every human illness also has an environmental component.

• Thus, virtually every human illness is 100% genetic and 100% environmental (e.g. PKU)

• What is the heritability of lung cancer? What would be the heritability in a population where everyone smoked? (Rose)
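Rose's question can be illustrated with a deliberately simple sketch. It assumes a disease that occurs only when two independent binary factors (say, a gene and an exposure) are both present, and it uses the squared correlation between one factor and disease as the "% variance explained". The function name, the independence assumption, and the prevalences below are all hypothetical choices made for illustration.

```python
def variance_explained(p_factor, p_other):
    """Squared correlation between a binary factor and disease, assuming disease
    occurs only when both (independent) factors are present."""
    p_disease = p_factor * p_other
    var_factor = p_factor * (1 - p_factor)
    var_disease = p_disease * (1 - p_disease)
    if var_factor == 0 or var_disease == 0:
        return 0.0  # something that never varies cannot explain any variance
    # Disease implies the factor is present, so E[factor * disease] = p_disease.
    cov = p_disease - p_factor * p_disease
    return cov ** 2 / (var_factor * var_disease)


p_gene = 0.1  # hypothetical gene prevalence
for p_exposure in (0.2, 0.6, 1.0):
    gene_share = variance_explained(p_gene, p_exposure)
    exposure_share = variance_explained(p_exposure, p_gene)
    print(f"exposure prevalence {p_exposure:.1f}: "
          f"gene {gene_share:.0%}, exposure {exposure_share:.0%}")
```

When the exposure becomes universal, the gene accounts for all of the variance and the exposure for none, even though every case still requires both.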

Page 8: Weon preconference pearce variation and causation

Lewontin RC. The analysis of variance and the analysis of causes. Am J Hum Genet 1974; 26: 400-11.

• “The difficulties in the early history of genetics embodied in the pseudo-question of ‘nature versus nurture’ arose precisely because it … supposed that the phenotype of an individual could be the result of either environment or genotype, whereas we understand the phenotype to be the result of both.”

• “The analysis of causes in human genetics is meant to provide us with the basic knowledge we require for correct schemes of environmental modification and intervention… Analysis of variance can do neither of these because its results are a unique function of the present distribution of environment and genotypes.”

• “For example, if two men lay bricks to build a wall, we may quite fairly measure their contribution by counting the number laid by each; but if one mixes the mortar and the other lays the bricks, it would be absurd to measure their relative quantitative contributions by measuring the volume of bricks and of mortar. It is obviously even more absurd to … ascribe so many inches of a man’s height to his genes and so many to his environment.” [Lewontin RC. The analysis of variance and the analysis of causes. Am J Hum Genet 1974; 26: 400-11.]

• “Genetical science has outgrown the false antithesis between heredity and environment productive of so much futile controversy in the past.” [Hogben L. Nature and nurture. 1932]

Page 9: Weon preconference pearce variation and causation

The analysis of variance and the analysis of causes

• The points raised by Lewontin are logically the same as those raised by Greenland et al (1986, 1991) referencing Tukey (1954) that standardized regression coefficients and correlations are not good measures of effect because they are constructed from the standard deviation (and hence variance) of the exposure

• Even if one is interested in accounting for variation in the exposure distribution, variance is not the best measure of variation; traditional standardization (now termed ‘marginalization’) is the best way to do this
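The dependence of correlations and standardized coefficients on the exposure distribution can be sketched as follows. The example assumes a simple linear model, y = slope * x + error, with the same slope and residual SD in two populations that differ only in the SD of the exposure; all numeric values are hypothetical.

```python
import math


def correlation(slope, sd_exposure, sd_residual):
    """Correlation (equal to the standardized slope) implied by y = slope * x + error."""
    sd_outcome = math.sqrt((slope * sd_exposure) ** 2 + sd_residual ** 2)
    return slope * sd_exposure / sd_outcome


slope = 2.0         # hypothetical effect, identical in both populations
sd_residual = 10.0  # hypothetical residual SD, identical in both populations

for sd_exposure in (1.0, 3.0):
    r = correlation(slope, sd_exposure, sd_residual)
    print(f"SD of exposure = {sd_exposure}: slope = {slope}, correlation = {r:.2f}")
```

The unstandardized slope is the same in both populations, but the correlation rises with the spread of the exposure, which is why it is a poor measure of effect.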

Page 10: Weon preconference pearce variation and causation
Page 11: Weon preconference pearce variation and causation

[Figure: Balance Box]

Page 12: Weon preconference pearce variation and causation

Changes in physical activity over time

• “Leisure Time Physical Activity (LTPA) accounts for only a small part of total physical energy expenditure… although…LTPA has not decreased, and may even have increased moderately, this increase has not compensated for the substantial decline in occupational, household and transportation activities. Thus, overall physical activity in the population has decreased considerably”

[Hu FB. Obesity epidemiology. Oxford: 2008]

Who cut down the last tree?

• “What did the Easter Islander who cut down the last palm tree say as he was doing it?... [In fact] the changes in forest cover from year to year would have been almost undetectable... Only the oldest islanders thinking back to their childhoods ... could have recognised a difference. Gradually Easter Island’s trees became fewer, smaller, and less important... No one would have noticed the falling of the last little palm sapling”

[Jared Diamond. Collapse: how societies choose to fail or survive]

Page 13: Weon preconference pearce variation and causation

[Figure: 2x2 diagram of obesity genes (No/Yes) by exercise (High/Low)]

                   % cases caused   % variance explained
Obesity genes      100%             50%
Lack of exercise   100%             50%

[Figure: 2x2 diagram of obesity genes (No/Yes) by exercise (High/Low)]

                   % cases caused   % variance explained
Obesity genes      100%             100%
Lack of exercise   100%             0%

Page 14: Weon preconference pearce variation and causation

Variation and causation

• The analysis of variance and the analysis of causes
• Implications for data analysis
• Implications for causal inference

Measures of variation and measures of causation for smoking and lung cancer in two different populations

                       Population 1               Population 2
                       Smoking     Non-smoking    Smoking     Non-smoking
Lung cancer            1000/5000   100/5000       1800/9000   20/1000
Rate ratio             10.0                       10.0
Correlation            0.29                       0.14
% variance explained   8.4%                       2.0%
PAF                    82%                        89%
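The measures in this table can be recomputed from the counts. The sketch below treats the correlation as the phi coefficient of the 2x2 table and the % variance explained as its square, which is an assumption about how the slide's figures were obtained; it reproduces the tabulated values up to rounding.

```python
import math


def measures(cases_exp, n_exp, cases_unexp, n_unexp):
    """Return (rate ratio, correlation, fraction of variance explained, PAF)."""
    a, b = cases_exp, n_exp - cases_exp        # exposed: cases, non-cases
    c, d = cases_unexp, n_unexp - cases_unexp  # unexposed: cases, non-cases
    rate_ratio = (a / n_exp) / (c / n_unexp)
    # Phi coefficient: Pearson correlation between two binary variables.
    phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    overall_risk = (a + c) / (n_exp + n_unexp)
    # Population attributable fraction: share of all cases above the unexposed risk.
    paf = (overall_risk - c / n_unexp) / overall_risk
    return rate_ratio, phi, phi ** 2, paf


populations = {
    "Population 1": (1000, 5000, 100, 5000),
    "Population 2": (1800, 9000, 20, 1000),
}
for name, counts in populations.items():
    rr, r, r2, paf = measures(*counts)
    print(f"{name}: rate ratio = {rr:.1f}, correlation = {r:.2f}, "
          f"variance explained = {r2:.1%}, PAF = {paf:.0%}")
```

The rate ratio is identical in the two populations, while the correlation and the variance explained fall as smoking becomes more common.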

Page 15: Weon preconference pearce variation and causation

Measures of variation and measures of causation for smoking and lung cancer, by prevalence of smoking

[Figure: % variance explained and Population Attributable Fraction (PAF) %, both axes running from 0 to 100]

The analysis of variance, correlation and standardised regression coefficients

• Any standard effect measure (e.g. the relative risk for smoking and lung cancer, or the regression coefficient for the association between BMI and blood pressure) should be ‘generalisable’ and should not be dependent on the amount of variation in the exposure (e.g. smoking, BMI) in the population

• The ‘percentage of variance explained’ is not generalisable because it depends on the amount of variation in the exposure (e.g. smoking, BMI) in the population. It is therefore not possible to use this as an effect measure in an epidemiological study, or to extrapolate this from one population or time period to another

• The same issues apply to correlation coefficients, and analyses using standardised regression coefficients
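The points above can be sketched numerically by holding the effect fixed and varying only the prevalence of exposure. The defaults below (risk of 0.02 in non-smokers, rate ratio of 10) are borrowed from the two-population table earlier in the talk; the function name and the choice of squared correlation as "% variance explained" are assumptions made for illustration.

```python
def variance_explained_and_paf(prevalence, risk_unexposed=0.02, rate_ratio=10.0):
    """Return (fraction of variance explained, PAF) for a binary exposure."""
    risk_exposed = rate_ratio * risk_unexposed
    overall_risk = prevalence * risk_exposed + (1 - prevalence) * risk_unexposed
    var_exposure = prevalence * (1 - prevalence)
    var_disease = overall_risk * (1 - overall_risk)
    cov = var_exposure * (risk_exposed - risk_unexposed)
    # With no variation in exposure, nothing can be "explained" by it.
    r_squared = 0.0 if var_exposure == 0 else cov ** 2 / (var_exposure * var_disease)
    paf = prevalence * (rate_ratio - 1) / (1 + prevalence * (rate_ratio - 1))
    return r_squared, paf


for pct in range(0, 101, 10):
    r_squared, paf = variance_explained_and_paf(pct / 100)
    print(f"smoking prevalence {pct:3d}%: "
          f"variance explained = {r_squared:.1%}, PAF = {paf:.0%}")
```

The rate ratio is 10 on every row, yet the variance explained falls to zero when everyone smokes, which is exactly when the PAF is at its highest.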

Page 16: Weon preconference pearce variation and causation

Approaches to Modeling

“Traditional” approaches to modeling are based on the analysis of variance.
• They involve using models for prediction
• The aim is to achieve a model that “fits well” in that it explains most of the variance in the data
• The aim is also to achieve a model that is “parsimonious” in that it fits well with the minimum number of variables

Thus decisions on adding or deleting variables are based on:
• Statistical significance
• Goodness of fit

Stepwise regression

The problems of this approach include:
• It makes decisions based on the amount (%) of variance explained by each variable
• If a variable does not vary much, then the model will decide that it is not important (because it doesn’t explain much of the variation in the data)
  - E.g. if everyone smokes about the same amount
• If two variables are highly correlated, it is essentially random which of them will be selected
  - E.g. smoking and use of a cigarette lighter
• Stepwise regression does not evaluate confounders – a variable can be a strong confounder but may be ‘rejected’ by stepwise regression because it is not statistically significant, and a variable that is not a confounder may be included because it is statistically significant
  - E.g. if the exposed and non-exposed groups have the same percentage of smokers, then smoking will not be a confounder even if it is a strong risk factor for the disease

Page 17: Weon preconference pearce variation and causation

Approaches to Modeling

The analysis of causes requires a focus on the effect measure (RR or regression coefficient) and its CI, not on the correlation coefficient, % variance explained, or p-value

The ‘size’ of the main effect should be specified by the researcher (e.g. an increase of 2000 steps a day), not by the computer programme (e.g. one sd increase in exercise)

Decisions on adding potential confounders should be based on clear criteria (e.g. a priori knowledge and/or change-in-estimate)

Factors that are of major public health significance, or important confounders, may not be statistically significant because they do not vary sufficiently in the population under study
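A change-in-estimate check can be sketched by comparing the crude risk ratio with a Mantel-Haenszel risk ratio stratified by the potential confounder. All counts below are hypothetical, and the 10% threshold is a common convention assumed here rather than something stated in the talk.

```python
def crude_rr(strata):
    """Risk ratio after collapsing over strata.
    Each stratum is (cases_exposed, n_exposed, cases_unexposed, n_unexposed)."""
    a = sum(s[0] for s in strata)
    n1 = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    n0 = sum(s[3] for s in strata)
    return (a / n1) / (c / n0)


def mantel_haenszel_rr(strata):
    """Mantel-Haenszel summary risk ratio across strata."""
    num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata)
    den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata)
    return num / den


def changes_estimate(strata, threshold=0.10):
    """True if adjustment moves the risk ratio by more than the threshold."""
    adjusted = mantel_haenszel_rr(strata)
    return abs(crude_rr(strata) - adjusted) / adjusted > threshold


# Hypothetical data, strata = smokers then non-smokers.
# Smoking is more common among the exposed: it confounds.
unequal_smoking = [(40, 800, 10, 200), (2, 200, 8, 800)]
# Same percentage of smokers in both groups: a strong risk factor, but no confounding.
equal_smoking = [(50, 500, 25, 500), (10, 500, 5, 500)]

for label, strata in (("unequal", unequal_smoking), ("equal", equal_smoking)):
    print(label, round(crude_rr(strata), 2), round(mantel_haenszel_rr(strata), 2),
          changes_estimate(strata))
```

In the second data set smoking raises risk fivefold, yet adjusting for it leaves the estimate unchanged, so its statistical significance says nothing about whether it needs to be controlled.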

Causal Modelling

In causal modelling:
• A variable that “adds significantly” to the model may not be a confounder
• A variable that does not “add significantly” may be a confounder

Page 18: Weon preconference pearce variation and causation

Causal Modelling

In causal modelling:
• All potential confounders should be controlled if possible (there is in general no benefit from having a more parsimonious model)
• However, adding variables that are strongly correlated with exposure will result in multicollinearity making the model unstable (this occurs when two or more ‘independent’ variables are strongly correlated with each other, so it is difficult to separate their independent effects)
• Also, if we have too many variables in the model, there can be problems from sparse data (i.e. too many variables for the number of data points)
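The instability caused by multicollinearity can be illustrated with the variance inflation factor. The sketch assumes a linear model with just two predictors (the exposure and one covariate), in which case the factor depends only on their correlation; the correlations shown are hypothetical.

```python
import math


def variance_inflation_factor(r):
    """Variance inflation for a coefficient in a two-predictor linear model,
    where r is the correlation between the two predictors."""
    return 1.0 / (1.0 - r ** 2)


for r in (0.0, 0.5, 0.9, 0.99):
    vif = variance_inflation_factor(r)
    print(f"correlation with exposure = {r:.2f}: variance x {vif:.1f}, "
          f"standard error x {math.sqrt(vif):.1f}")
```

The confidence interval for the exposure effect widens slowly at moderate correlations and very quickly as the correlation approaches 1.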

Causal Modelling

Thus, in causal modelling, decisions on adding or deleting variables are based on the need to:
• Control confounding
• Avoid multicollinearity and sparse data

Interaction (effect measure modification) is of lesser concern unless there are strong a priori reasons to examine it

Page 19: Weon preconference pearce variation and causation

Variation and causation

• The analysis of variance and the analysis of causes
• Implications for data analysis
• Implications for causal inference
  – Population comparisons will continue to be important
  – The RCT paradigm is not always the most appropriate framework for epidemiology

The importance of population-level studies

• The population level is fundamental in epidemiology, just as weather systems (rather than molecular phenomena) are fundamental in climatology and macro-evolutionary processes are fundamental in evolutionary biology

• If we see differences in disease incidence/prevalence between populations, this shows that:
  – There is a public health problem that should be solved
  – There is a scientific question that can be solved

Page 20: Weon preconference pearce variation and causation

The importance of international comparisons

• Many of the major discoveries in cancer epidemiology followed the publication of “Cancer Incidence in Five Continents” in the 1950s and 1960s which generated new hypotheses about possible (population and individual) causes of cancer (e.g. diet and colorectal cancer, hepatitis B and liver cancer, HPV and cervical cancer)

• Many other important epidemiological hypotheses (e.g. the Barker Hypothesis) first arose from population comparisons and/or analyses of trends over time

• This is likely to continue to be the case (e.g. chemicals with known carcinogenic mechanisms are unlikely to be marketed, and therefore new carcinogens are likely to involve new mechanisms which will be “discovered” by population comparisons, rather than in the laboratory)

ISAAC Steering Committee (ISAAC 1998)

I Asher (chair), R Beasley, J Crane, E Mitchell, N Pearce, C Robertson, C Lai, JR Shah, N Aït-Khaled, G Anabwani, J Mallol, (F Martinez), B Björkstén, S Montefort, R Anderson, (M Burr), U Keil, D Strachan, E von Mutius, S Weiland, H Williams, S Foliaki

Page 21: Weon preconference pearce variation and causation

CENTRES REGISTERED FOR ISAAC PHASE THREE

[Map legend: Phase 3A centres; Phase 3B centres; Phase 1 centres not participating in Phase 3]

Wheeze in last 12 months, Phase One, 13-14 year age group (ISAAC 1998)

[Map legend: <5%; 5 to <10%; 10 to <20%; ≥20%]

Page 22: Weon preconference pearce variation and causation

Current wheeze v skin prick test for atopy

[Figure: forest plot of odds ratios with 95% confidence intervals (axis 0 to 16), by centre]

Western: UK, West Sussex; Oestersund; Sweden, Linkoeping; Valencia; Madrid; Cartagena; Spain, Almeria; Norway, Tromso; New Zealand, Hawkes Bay; Netherlands, Utrecht; Italy, Rome; Thessaloniki*; Greece, Athens; Munich; Germany, Dresden; Estonia, Tallinn; Combined Western

Non-Western: Turkey, Ankara; Palestine, Ramallah; India, Bombay; Ghana, Kintampo; Georgia, Tbilisi; Albania, Tirana; Combined Non-Western

* not considered in combined estimate due to heterogeneity

What does this mean for causal inference?

• Perhaps few implications specifically for causal inference, but many implications for the overall process of scientific discovery

• Some exposures are almost ubiquitous in some parts of the world and/or absent in others, and we will never discover their importance unless we do international comparisons

• Population comparisons will therefore continue to be an important part of the process of epidemiological research

• They can ‘prove’ nothing (because of problems with the ecologic fallacy, etc), but will continue to be the best source of new hypotheses that can then be tested in more focussed studies

• If epidemiology is about populations, then the RCT paradigm will not always be the most appropriate

Page 23: Weon preconference pearce variation and causation

Variation and causation

• The analysis of variance and the analysis of causes
• Implications for data analysis
• Implications for causal inference
  – Population comparisons will continue to be important
  – The RCT paradigm is not always the most appropriate framework for epidemiology

(Observational) epidemiology has had many successes

• Smoking and lung cancer
• Occupational cancer
• Cervical cancer
• etc

Page 24: Weon preconference pearce variation and causation

Observational epidemiology vs clinical trials

“To better understand causal effects, epidemiologists should put more effort into organizing large-scale randomized trials instead of traditional observational studies, which are inevitably crippled by confounding and other biases.”

[Konrad Jamrozik, 2005]

Which causal effects?

“The types of causal effects that can best be answered by large-scale randomized trials”

These are not necessarily...

“The causal effects that are most important or most interesting in public health and/or scientific terms”

Page 25: Weon preconference pearce variation and causation

Limitations of randomized trials

• Only certain questions can be asked
  – Climate change
  – Socioeconomic factors

Page 26: Weon preconference pearce variation and causation

Limitations of randomized trials

• Only certain questions can be asked
• Only simplistic questions get asked
  – The importance of context

The Importance of Context

The “populations” which epidemiologists study are not just collections of individuals which are conveniently grouped for the purposes of study, but are instead historical entities.

Every population has its own history, culture, organisation, and economic and social divisions which influence how and why people are exposed to particular factors, and how they respond.

Page 27: Weon preconference pearce variation and causation

The Importance of Context

• There were large numbers of deaths in indigenous people when the Pacific was colonised in the 19th century

• It is commonly assumed that epidemics of deaths due to infectious diseases affected all populations, but many populations actually experienced very few deaths

• One determinant of death from infectious disease was whether land was taken (and social systems disrupted)

• Snow also observed that overcrowding was an important determinant of death from cholera (in addition to the effects of water supply)

• The effects of many other risk factors (e.g. drug adverse reactions) may depend on the population (and health care system) in which the exposure is occurring

Limitations of randomized trials

• Only certain questions can be asked
• Only simplistic questions get asked
• The wrong questions get asked
  – Beta carotene

Page 28: Weon preconference pearce variation and causation

Levels of analysis: beta carotene and cancer

• Peto R, Doll R, et al. Can dietary beta-carotene materially reduce human cancer rates? Nature 1981; 290: 201-208.
  – “Human cancer risks are inversely correlated with (a) blood retinol and (b) dietary β-carotene... If dietary β-carotene is truly protective – which could be tested by controlled trials – there are a number of theoretical mechanisms whereby it may act.”

• Intervention studies were conducted with beta carotene
• Two out of three large trials showed an increased risk of lung cancer in the intervention group

Levels of analysis

“A question relevant to the etiology of cancer that is seldom asked is: What gets cancer - the genes, the cell, the organism, or perhaps even the population? The potential answers are not necessarily exclusive, even given reductionist tendencies and the genuine and justified excitement over discoveries in the molecular biology of cancer. Rather these are levels of explanation that may be more or less coherent within themselves but provide even more information when they exist in a framework provided by all of the explanatory modes”.

(Potter, 1992)

Page 29: Weon preconference pearce variation and causation

Limitations of randomized trials

• Only certain questions can be asked
• Only simplistic questions get asked
• The wrong questions get asked
• The right questions get asked, but the intervention doesn’t work
  – trials of health promotion

Health promotion: experience in developed countries

• North Karelia study
  – Local community support
  – Comprehensive approach
  – Mass media
  – Workplace
  – Primary care, hospitals, schools
  – Training programmes
  – Food shops and food industry

Page 30: Weon preconference pearce variation and causation

North Karelia: death rates during the course of the study in Finland

North Karelia risk factor trends: smoking

[Figure: N Karelia v Kuopio, men and women, by year (1972, 1977); vertical axis 0 to 60]

Ebrahim & Davey Smith, Int J Epi, 2001;30:201-5

Page 31: Weon preconference pearce variation and causation

North Karelia risk factor trends: cholesterol

[Figure: N Karelia v Kuopio, men and women, by year (1972, 1977, 1982); vertical axis 4 to 8]

Ebrahim & Davey Smith, Int J Epi, 2001;30:201-5

North Karelia risk factor trends: blood pressure

[Figure: N Karelia v Kuopio, men and women, by year (1972, 1977, 1982); vertical axis 70 to 95]

Ebrahim & Davey Smith, Int J Epi, 2001;30:201-5

Page 32: Weon preconference pearce variation and causation

Disappointing results replicated many times

• Cardiovascular community control programs
• Stanford Heart Disease Prevention Program
• Stanford Five-City project
• Minnesota Heart Health Program
• Pawtucket Heart Health Program
• Heart Beat Wales
• COMMIT study

Multiple risk factor intervention

The randomised controlled trial evidence

• Dietary modification
• Smoking cessation
• Increasing exercise
• +/- drug treatments

Page 33: Weon preconference pearce variation and causation

Systematic review of multiple risk factor interventions: effect on CHD mortality

Ebrahim and Davey Smith BMJ 1997 and Cochrane Library

Limitations of randomized trials

• Only certain questions can be asked
• Only simplistic questions get asked
• The wrong questions get asked
• The right questions get asked, but the intervention doesn’t work
• The right questions get asked, the intervention is appropriate, and it works
  – folic acid

Page 34: Weon preconference pearce variation and causation

Limitations of the RCT paradigm

• Works well for some things (e.g. smoking and lung cancer) but not so well for others
• Some study designs are regarded as better than others
• Some important questions don’t fit the paradigm (e.g. climate change, socioeconomic status)
• Lack of recognition of population context
• Difficulties of incorporating prior knowledge
• Lack of recognition of prior knowledge
• Lack of recognition of cycle of theory development and testing including ecologic and hypothesis generating studies, as well as analytical studies

Appropriate technology

• The danger for epidemiology is that an over-emphasis on the “merits” of randomized trials may restrict which hypotheses are acceptable for study

• The appropriateness of any research methodology depends on the phenomenon under study: its magnitude, setting, the current state of theory and knowledge, the availability of valid measurement tools, and the proposed uses of the information.

Page 35: Weon preconference pearce variation and causation

Many sciences are observational

• Astronomy/cosmology, Archaeology, Geology, Evolutionary biology

In some instances, these sciences may provide a better paradigm for epidemiology than is provided by RCTs

We don’t always need to think of causality in terms of potential interventions (e.g. we can’t do RCTs of earthquakes or the big bang)

Methodology for epidemiological studies in the 21st century

• Recognition of multi-level population context
• RCT paradigm isn’t always the most useful
• Not all important questions can be answered with cohort or case-control studies (e.g. climate change)
• Few questions can be answered with a single study
• Consideration of prior knowledge
• Analysis of causes rather than analysis of variation
• Causality isn’t just about interventions

Page 36: Weon preconference pearce variation and causation

Variation and causation

Neil Pearce
London School of Hygiene and Tropical Medicine
London, United Kingdom