Correlation Does Not Imply Causation


  • 8/22/2019 Correlation Does Not Imply Causation


    Wiki article

    Correlation does not imply causation

    Correlation does not imply causation is a phrase used in science and statistics to emphasize that a correlation between two variables does not necessarily imply that one causes the other.[1][2]

    Many statistical tests calculate correlation between variables. A few go further and try to estimate the likelihood of a true causal relationship; examples are the Granger causality test and convergent cross mapping.

    The counter assumption, that correlation proves causation, is considered a questionable cause logical fallacy in that two events occurring together are taken to have a cause-and-effect relationship. This fallacy is also known as cum hoc ergo propter hoc, Latin for "with this, therefore because of this", and "false cause". A similar fallacy, that an event that follows another was necessarily a consequence of the first event, is sometimes described as post hoc ergo propter hoc (Latin for "after this, therefore because of this").

    In a widely studied example, numerous epidemiological studies showed that women who were taking combined hormone replacement therapy (HRT) also had a lower-than-average incidence of coronary heart disease (CHD), leading doctors to propose that HRT was protective against CHD. But randomized controlled trials showed that HRT caused a small but statistically significant increase in risk of CHD. Re-analysis of the data from the epidemiological studies showed that women undertaking HRT were more likely to be from higher socio-economic groups (ABC1), with better-than-average diet and exercise regimens. The use of HRT and decreased incidence of coronary heart disease were coincident effects of a common cause (i.e. the benefits associated with a higher socioeconomic status), rather than cause and effect, as had been supposed.


    As with any logical fallacy, identifying that the reasoning behind an argument is flawed does not imply that the resulting conclusion is false. In the

    instance above, if the trials had found that hormone replacement therapy caused a decrease in coronary heart disease, but not to the degree

    suggested by the epidemiological studies, the assumption of causality would have been correct, although the logic behind the assumption would

    still have been flawed.




    1 Usage

    2 General pattern

    3 Examples of illogically inferring causation from correlation

    o 3.1 B causes A (reverse causation)

    o 3.2 A and B cause C which causes D (string of causation)

    o 3.3 A causes B and B causes A (bidirectional causation)

    o 3.4 Third factor C (the common-causal variable) causes both A and B

    4 Relation to the ecological fallacy

    5 Determining causation

    o 5.1 In academia

    o 5.2 Causality construed from counterfactual states

    o 5.3 Causality predicted by an extrapolation of trends

    6 Use of correlation as scientific evidence

    7 See also

    8 References

    9 External links


    Usage

    In logic, the technical use of the word "implies" means "to be a sufficient circumstance." This is the meaning intended by statisticians when they say causation is not certain. Indeed, p implies q has the technical meaning of logical implication: if p then q, symbolized as p → q. That is, "if circumstance p is true, then q necessarily follows." In this sense, it is always correct to say "Correlation does not imply causation."


    However, in casual use, the word "imply" loosely means "suggests" rather than "requires". The idea that correlation and causation are connected is certainly true; where there is causation, there is likely to be correlation. Indeed, correlation is used when inferring causation; the important point is that such inferences are made only after correlations are confirmed to be real and all causal relationships are systematically explored using large enough data sets.

    Edward Tufte, in a criticism of the brevity of "correlation does not imply causation," deprecates the use of "is" to relate correlation and causation (as in "Correlation is not causation"), arguing that the statement is inaccurate because it is incomplete.[1] While it is not the case that correlation is causation, simply stating their nonequivalence omits information about their relationship. Tufte suggests that the shortest true statement that can be made about causality and correlation is one of the following:[4]

    "Empirically observed covariation is a necessary but not sufficient condition for causality."

    "Correlation is not causation but it sure is a hint."

    General pattern

    For any two correlated events A and B, the following relationships are possible:

    A causes B;

    B causes A;

    A and B are consequences of a common cause, but do not cause each other;

    There is no connection between A and B; the correlation is coincidental.

    Less clear-cut correlations are also possible. For example, causality is not necessarily one-way; in a predator-prey relationship, predator numbers affect prey, but prey numbers, i.e. food supply, also affect predators.
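
    The relationships listed in this section can leave the same statistical footprint. The following is a minimal sketch, not a definitive model: it draws data from three hypothetical mechanisms (the function names, noise levels and seed are illustrative assumptions) and computes the Pearson coefficient for each. The noise in the common-cause case is chosen so that all three share the same theoretical correlation of 1/sqrt(2).

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(mechanism, n=5000, seed=0):
    """Draw n (a, b) pairs from one hypothetical causal mechanism."""
    rng = random.Random(seed)
    # Noise level chosen so the common-cause case has the same
    # theoretical correlation (1/sqrt(2)) as the two direct cases.
    shared_noise = math.sqrt(math.sqrt(2) - 1)
    a_vals, b_vals = [], []
    for _ in range(n):
        if mechanism == "a_causes_b":
            a = rng.gauss(0, 1)
            b = a + rng.gauss(0, 1)
        elif mechanism == "b_causes_a":
            b = rng.gauss(0, 1)
            a = b + rng.gauss(0, 1)
        elif mechanism == "common_cause":
            c = rng.gauss(0, 1)  # hidden third factor
            a = c + rng.gauss(0, shared_noise)
            b = c + rng.gauss(0, shared_noise)
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        a_vals.append(a)
        b_vals.append(b)
    return a_vals, b_vals


for name in ("a_causes_b", "b_causes_a", "common_cause"):
    a_vals, b_vals = simulate(name)
    print(f"{name:>13}: r = {pearson(a_vals, b_vals):.2f}")
```

    Because the coefficient is symmetric in its two arguments and close to the same value in all three cases, the number alone cannot say which mechanism produced the data.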

    The cum hoc ergo propter hoc logical fallacy can be expressed as follows:

    1. A occurs in correlation with B.

    2. Therefore, A causes B.


    In this type of logical fallacy, one makes a premature conclusion about causality after observing only a correlation between two or more factors. Generally, if one factor (A) is observed only to be correlated with another factor (B), it is sometimes taken for granted that A is causing B, even when no evidence supports it. This is a logical fallacy because there are at least five possibilities:

    1. A may be the cause of B.

    2. B may be the cause of A.

    3. Some unknown third factor C may actually be the cause of both A and B.

    4. There may be a combination of the above three relationships. For example, B may be the cause of A at the same time as A is the cause of B (contradicting that the only relationship between A and B is that A causes B). This describes a self-reinforcing system.

    5. The "relationship" is a coincidence, or so complex or indirect that it is more effectively called a coincidence (i.e. two events occurring at the same time that have no direct relationship to each other besides the fact that they are occurring at the same time). A larger sample size helps to reduce the chance of a coincidence, unless there is a systematic error in the experiment.
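
    The remark about sample size in the fifth possibility can be illustrated with a small simulation. This is only a sketch under assumed settings (standard normal data, a threshold of |r| >= 0.5, 1000 trials, and the hypothetical function name chance_correlation_rate): it counts how often two independently generated samples look correlated purely by chance.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def chance_correlation_rate(n, trials=1000, threshold=0.5, seed=0):
    """Fraction of trials in which two independent samples of size n
    show |r| >= threshold even though no relationship exists."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [rng.gauss(0, 1) for _ in range(n)]  # generated independently of xs
        if abs(pearson(xs, ys)) >= threshold:
            hits += 1
    return hits / trials


for sample_size in (8, 200):
    rate = chance_correlation_rate(sample_size)
    print(f"n = {sample_size:>3}: fraction with |r| >= 0.5 is {rate:.3f}")
```

    With very small samples a sizeable fraction of unrelated pairs clears the threshold, while with a few hundred observations almost none do. The simulation says nothing about systematic error, which a larger sample does not remove.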

    In other words, no conclusion can be made regarding the existence or the direction of a cause and effect relationship only from the fact that A and B are correlated. Determining whether there is an actual cause and effect relationship requires further investigation, even when the relationship between A and B is statistically significant, a large effect size is observed, or a large part of the variance is explained.

    Examples of illogically inferring causation from correlation


    B causes A (reverse causation)

    Example 1

    The more firemen fighting a fire, the bigger the fire is observed to be.

    Therefore firemen cause an increase in the size of a fire.


    In this example, the correlation between the number of firemen at a scene and the size of the fire does not imply that the firemen cause the fire. Firemen are sent according to the severity of the fire, and if there is a large fire, a greater number of firemen are sent; it is rather that fire causes firemen to arrive at the scene. So the above conclusion is false.

    Example 2

    The faster windmills are observed to rotate, the more wind is observed.

    Therefore wind is caused by the rotation of windmills. (Or, simply put: windmills, as their name indicates, are machines used to produce wind.)

    In this example, the correlation (simultaneity) between windmill activity and wind velocity does not imply that wind is caused by windmills. It is rather the other way around, as suggested by the fact that wind doesn't need windmills to exist, while windmills need wind to rotate. Wind can be observed in places where there are no windmills or non-rotating windmills. And there are good reasons to believe that wind existed before the invention of windmills.

    A and B cause C which causes D (string of causation)

    Lack of religion is associated with increased rates of depression.

    Therefore, lack of religion directly causes increased rates of depression.

    In this example, the correlation between lack of religion and depression does not imply that lack of religion causes depression. Depression is caused in part by how people are treated. Some cultures might be suspicious of people who have a lack of religion, so people who have a lack of religion are more likely to be discriminated against and to fall into depression. So the above conclusion is false.

    A causes B and B causes A (bidirectional causation)

    Increased pressure is associated with increased temperature.

    Therefore pressure causes temperature.

    The ideal gas law, PV = nRT, describes the direct relationship between pressure and temperature (along with other factors) to show that there is a direct correlation between the two properties. For a fixed volume and mass of gas, an increase in temperature will cause an increase in pressure; likewise, increased pressure will cause an increase in temperature. This demonstrates bidirectional causation. The conclusion that pressure causes temperature is true but is not logically guaranteed by the premise.

    Third factor C (the common-causal variable) causes both A and B

    Main article: Spurious relationship

    All these examples deal with a lurking variable, which is simply a hidden third variable that affects both of the correlated variables; for example, the fact that it is summer in Example 3. A difficulty often also arises where the third factor, though fundamentally different from A and B, is so closely related to A and/or B as to be confused with them or very difficult to scientifically disentangle from them (see Example 4).

    Example 1

    Sleeping with one's shoes on is strongly correlated with waking up with a headache.

    Therefore, sleeping with one's shoes on causes headache.

    The above example commits the correlation-implies-causation fallacy, as it prematurely concludes that sleeping with one's shoes on causes headache. A more plausible explanation is that both are caused by a third factor, in this case going to bed drunk, which thereby gives rise to a correlation. So the conclusion is false.

    Example 2

    Young children who sleep with the light on are much more likely to develop myopia in later life.

    Therefore, sleeping with the light on causes myopia.

    This is a scientific example that resulted from a study at the University of Pennsylvania Medical Center. Published in the May 13, 1999 issue of Nature,[5] the study received much coverage at the time in the popular press.[6] However, a later study at Ohio State University did not find that infants sleeping with the light on caused the development of myopia. It did find a strong link between parental myopia and the development of child myopia, also noting that myopic parents were more likely to leave a light on in their children's bedroom. In this case, the cause of both conditions is parental myopia, and the above-stated conclusion is false.

    Example 3

    As ice cream sales increase, the rate of drowning deaths increases sharply.

    Therefore, ice cream consumption causes drowning.

    The aforementioned example fails to recognize the importance of time and temperature in relationship to ice cream sales. Ice cream is sold during the hot summer months at a much greater rate than during colder times, and it is during these hot summer months that people are more likely to engage in activities involving water, such as swimming. The increased drowning deaths are simply caused by more exposure to water-based activities, not ice cream. The stated conclusion is false.

    Example 4

    A hypothetical study shows a relationship between test anxiety scores and shyness scores, with a statistical r value (strength of correlation) of +.59.[11]

    Therefore, it may be simply concluded that shyness, in some part, causally influences test anxiety.

    However, as encountered in many psychological studies, another variable, a "self-consciousness score," is discovered which has a sharper correlation (+.73) with shyness. This suggests a possible "third variable" problem. However, when three such closely related measures are found, it further suggests that each may have bidirectional tendencies (see "bidirectional causation," above), being a cluster of correlated values each influencing one another to some extent. Therefore, the simple conclusion above may be false.


    Example 5


    Since the 1950s, both the atmospheric CO2 level and obesity levels have increased sharply.

    Hence, atmospheric CO2 causes obesity.

    Richer populations tend to eat more food and consume more


    Example 6

    HDL ("good") cholesterol is negatively correlated with incidence of heart attack.

    Therefore, taking medication to raise HDL will decrease the chance of having a heart attack.

    Further research[12] has called this conclusion into question. Instead, it may be that other underlying factors, like genes, diet and exercise, affect both HDL levels and the likelihood of having a heart attack; it is possible that medicines may affect the directly measurable factor, HDL levels, without affecting the chance of heart attack.
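
    The third-factor pattern in the examples above (for instance Example 3, where summer heat drives both ice cream sales and drownings) can be sketched as a simulation. Everything numeric here is an assumption made for illustration: the temperature distribution, the coefficients and the noise levels are invented, and the helper names are hypothetical. The sketch compares the raw correlation with the partial correlation obtained after removing the linear effect of temperature from both variables.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """Residuals of ys after a least-squares straight-line fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [(y - my) - slope * (x - mx) for x, y in zip(xs, ys)]


def simulate_summer(n=5000, seed=1):
    """Invented data: temperature drives both series; neither drives the other."""
    rng = random.Random(seed)
    temperature = [rng.gauss(20, 5) for _ in range(n)]
    ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
    drownings = [0.5 * t + rng.gauss(0, 2) for t in temperature]
    return temperature, ice_cream, drownings


temperature, ice_cream, drownings = simulate_summer()
raw_r = pearson(ice_cream, drownings)
partial_r = pearson(residuals(ice_cream, temperature),
                    residuals(drownings, temperature))
print(f"raw correlation:               {raw_r:.2f}")
print(f"after removing temperature:    {partial_r:.2f}")
```

    In this invented setting the raw correlation is strong, but it largely disappears once the common cause is accounted for. This only works when the third factor has actually been measured, which is often not the case in practice.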

    Relation to the ecological fallacy

    There is a relation between this subject matter and the ecological fallacy, described in a 1950 paper by William S. Robinson.[13] Robinson shows that ecological correlations, where the statistical object is a group of persons (e.g. an ethnic group), do not show the same behaviour as individual correlations, where the objects of inquiry are individuals: "The relation between ecological and individual correlations which is discussed in this paper provides a definite answer as to whether ecological correlations can validly be used as substitutes for individual correlations. They cannot." (...) "(a)n ecological correlation is almost certainly not equal to its corresponding individual correlation."
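
    Robinson's point can be illustrated with synthetic data. The following is a sketch under invented assumptions, not a reproduction of his analysis: five hypothetical groups whose averages rise together, while inside every group the two measurements move in opposite directions. The correlation of group averages (the ecological correlation) is then compared with the correlation over all individuals.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_groups(n_groups=5, per_group=1000, seed=2):
    """Invented groups: centres rise together, individuals within a group do not."""
    rng = random.Random(seed)
    groups = []
    for g in range(n_groups):
        center = float(g)
        xs, ys = [], []
        for _ in range(per_group):
            dx = rng.gauss(0, 3)  # individual deviation from the group centre
            xs.append(center + dx)
            ys.append(center - 0.8 * dx + rng.gauss(0, 1))  # negative within-group slope
        groups.append((xs, ys))
    return groups


groups = simulate_groups()

# Ecological correlation: one (mean x, mean y) point per group.
group_mean_x = [sum(xs) / len(xs) for xs, _ in groups]
group_mean_y = [sum(ys) / len(ys) for _, ys in groups]
ecological_r = pearson(group_mean_x, group_mean_y)

# Individual correlation: every person is one data point.
all_x = [x for xs, _ in groups for x in xs]
all_y = [y for _, ys in groups for y in ys]
individual_r = pearson(all_x, all_y)

print(f"ecological (group-level) r: {ecological_r:.2f}")
print(f"individual-level r:         {individual_r:.2f}")
```

    In this constructed case the two correlations even have opposite signs, so the group-level figure would be a misleading substitute for the individual-level one.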


    Determining causation

    In academia

    Main articles: Causality and Causality (physics)

    The point of view that correlation implies causation may be regarded as a theory of causality which is somewhat inherent to the field of statistics. Within academia as a whole, the nature of causality is systematically investigated from several academic disciplines,


    In academia, there is a significant number of theories on causality; The Oxford Handbook of Causation (Beebee et al. 2009) encompasses 770 pages. Among the more influential theories within philosophy are Aristotle's Four



    Hume argued that causality is based on experience, and experience similarly based on the assumption that the future models the past, which in turn can only be based on experience, leading to circular logic. In conclusion, he asserted that causality is not based on actual reasoning: only correlation can actually be perceived.[15]


    Kant, according to Beebee et al., held that "a causal principle according to which every event has a cause, or follows according to a causal law, cannot be established through induction as a purely empirical claim, since it would then lack strict universality, or necessity".[16]

    Outside the field of philosophy, theories of causation can be identified in classical mechanics, statistical mechanics, spacetime theories, biology, social sciences,


    In order for a correlation to be established as causal within physics, it is normally understood that the cause and the effect must be connected through a local mechanism (cf. for instance the concept of impact) or a nonlocal mechanism (cf. the concept of field), in accordance with known laws of nature.

    From the point of view of thermodynamics, universal properties of causes as compared to effects have been identified through the Second law of thermodynamics, confirming the ancient, medieval and Cartesian[18] view that "the cause is greater than the effect" for the particular case of thermodynamic free energy. This, in turn, would appear to be challenged by popular interpretations of the concepts of nonlinear system and butterfly effect, in which small causes are regarded as able to cause large effects due to, respectively, unpredictability and an unlikely triggering of large amounts of potential energy.


    Causality construed from counterfactual states

    See also: Verificationism

    Intuitively, causation seems to require not just a correlation, but a counterfactual dependence. Suppose that a student performed poorly on a test and guesses that the cause was not having studied. To prove this, one thinks of the counterfactual: the same student writing the same test under the same circumstances but having studied the night before. If one could rewind history and change only one small thing (making the student study for the exam), then causation could be observed (by comparing version 1 to version 2). Because one cannot rewind history and replay events after making small controlled changes, causation can only be inferred, never exactly known. This is referred to as the Fundamental Problem of Causal Inference: it is impossible to directly observe causal effects.[19]

    A major goal of scientific experiments and statistical methods is to approximate as closely as possible the counterfactual state of the world.[20] For example, one could run an experiment on identical twins who were known to consistently get the same grades on their tests. One twin is sent to study for six hours while the other is sent to the amusement park. If their test scores suddenly diverged by a large degree, this would be strong evidence that studying (or going to the amusement park) had a causal effect on test scores. In this case, correlation between studying and test scores would almost certainly imply causation.

    Well-designed experimental studies replace equality of individuals, as in the previous example, by equality of groups. This is achieved by randomization of the subjects to two or more groups. Although not a perfect system, the likelihood of the groups being equal in all aspects rises with the number of subjects placed randomly in the treatment/placebo groups. From the significance of the difference between the effect of the treatment and that of the placebo, one can judge how plausible it is that the treatment has a causal effect on the disease. This is commonly summarized in statistical terms by the P-value, which is the probability of observing a difference at least as large as the one found if the treatment had no effect; it is not the probability that the treatment works.
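
    The effect of randomization can be sketched with a simulation loosely modelled on the hormone replacement example from the introduction. All numbers are invented assumptions: a hidden "health" variable lowers a risk score and also makes people more likely to choose the treatment, while the treatment itself is assumed to raise the score by 0.5. The function and constant names are hypothetical. The sketch compares the difference in mean risk between treated and untreated subjects when subjects select themselves and when a coin flip assigns them.

```python
import random

TRUE_EFFECT = 0.5  # assumed: the treatment slightly raises the risk score


def mean(values):
    return sum(values) / len(values)


def self_selected(health, rng):
    """Observational setting: healthier people tend to opt in."""
    return health + rng.gauss(0, 1) > 0


def randomized(health, rng):
    """Experimental setting: a coin flip decides, ignoring health."""
    return rng.random() < 0.5


def difference_in_means(assign, n=20000, seed=3):
    """Mean risk score of treated minus untreated under an assignment rule."""
    rng = random.Random(seed)
    treated_scores, control_scores = [], []
    for _ in range(n):
        health = rng.gauss(0, 1)  # hidden common cause
        treated = assign(health, rng)
        score = 10.0 - 2.0 * health + (TRUE_EFFECT if treated else 0.0)
        score += rng.gauss(0, 1)
        (treated_scores if treated else control_scores).append(score)
    return mean(treated_scores) - mean(control_scores)


observational = difference_in_means(self_selected)
experimental = difference_in_means(randomized)
print(f"observational estimate: {observational:+.2f}")
print(f"randomized estimate:    {experimental:+.2f}  (assumed true effect {TRUE_EFFECT:+.2f})")
```

    Under these assumptions the self-selected comparison makes a harmful treatment look protective, because the treated group is healthier to begin with. Random assignment breaks the link between health and treatment, so the estimate lands near the assumed effect.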


    Causality predicted by an extrapolation of trends

    See also: Inertia

    When experimental studies are impossible and only pre-existing data are available, as is usually the case for example in economics, regression analysis can be used. Factors other than the potential causative variable of interest are controlled for by including them as regressors in addition to the regressor representing the variable of interest. False inferences of causation due to reverse causation (or wrong estimates of the magnitude of causation due to the presence of bidirectional causation) can be avoided by using explanators (regressors) that are necessarily exogenous, such as physical explanators like rainfall amount (as a determinant of, say, futures prices), lagged variables whose values were determined before the dependent variable's value was determined, instrumental variables for the explanators (chosen based on their known exogeneity), etc. See Causality#Statistics and Economics. Spurious correlation due to mutual influence from a third, common, causative variable is harder to avoid: the model must be specified such that there is a theoretical reason to believe that no such underlying causative variable has been omitted from the model. In particular, underlying time trends of both the dependent variable and the independent (potentially causative) variable must be controlled for by including time as another independent variable.
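
    The last point, controlling for a shared time trend by adding time as a regressor, can be sketched as follows. The data are invented: two series that both drift upward over time but are otherwise unrelated. The small ols helper is a hypothetical, illustrative least-squares solver written for this sketch (normal equations solved by Gauss-Jordan elimination), not a substitute for a statistics library.

```python
import random


def ols(columns, y):
    """Least-squares coefficients of y on the given regressor columns.

    An intercept is added automatically; returns [intercept, coef_1, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) beta = X'y, stored as an augmented matrix.
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                factor = A[r][c] / A[c][c]
                A[r] = [a - factor * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


def simulate_trending_series(n=500, seed=4):
    """Invented data: x and y both trend upward, but y never depends on x."""
    rng = random.Random(seed)
    t = [float(i) for i in range(n)]
    x = [0.5 * ti + rng.gauss(0, 5) for ti in t]
    y = [0.3 * ti + rng.gauss(0, 5) for ti in t]
    return t, x, y


t, x, y = simulate_trending_series()
naive = ols([x], y)          # [intercept, coefficient on x]
controlled = ols([x, t], y)  # [intercept, coefficient on x, coefficient on t]
print(f"coefficient on x without time: {naive[1]:.2f}")
print(f"coefficient on x with time:    {controlled[1]:.2f}")
print(f"coefficient on time:           {controlled[2]:.2f}")
```

    Without the time regressor, x appears to have a substantial effect on y. Once time is included, the coefficient on x shrinks toward zero and the trend is attributed to time. This handles only a linear trend that the analyst has thought to include; an omitted common cause of another kind would still bias the estimate.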

    Use of correlation as scientific evidence

    Much of scientific evidence is based upon a correlation of variables: they are observed to occur together.

    Scientists are careful to point out that correlation does not necessarily mean causation. The assumption that A causes B simply because A correlates with B is often not accepted as a legitimate form of argument. However, sometimes people commit the opposite fallacy: dismissing correlation entirely, as if it does not suggest causation. This would dismiss a large swath of important scientific evidence.[21]

    In conclusion, correlation is a valuable type of scientific

    evidence in fields such as medicine, psychology, and

    sociology. But first correlations must be confirmed as real,

    and then every possible causative relationship must be

    systematically explored. In the end correlation can be used

    as powerful evidence for a cause and effect relationship

    between a treatment and benefit, a risk factor and a

    disease, or a social or economic factor and various

    outcomes. But it is also one of the most abused types of

    evidence, because it is easy and even tempting to come to

    premature conclusions based upon the preliminary

    appearance of a correlation.

    See also

    Affirming the consequent

    Chain reaction

    Confirmation bias


    Design of experiments

    Domino effect

    Ecological fallacy

    Four causes


    Mierscheid Law

    Normally distributed and uncorrelated does not imply independent


    Observational study

    Occam's razor

    Pirates and global warming



    References

    1. Tufte, Edward R. (2006). The Cognitive Style of PowerPoint: Pitching Out Corrupts Within. Cheshire, Connecticut: Graphics Press. p. 5. ISBN 0-9613921-5-0.

    2. Aldrich, John (1995). "Correlations Genuine and Spurious in Pearson and Yule". Statistical Science 10 (4): 364

    3. Lawlor DA, Davey Smith G, Ebrahim S (June 2004). "Commentary: the hormone replacement-coronary heart disease conundrum: is this the death of observational epidemiology?". Int J Epidemiol 33 (3): 464

    4. Tufte, Edward R. (2003). The Cognitive Style of PowerPoint. Cheshire, Connecticut: Graphics Press. p. 4. ISBN 0-9613921-5-0.

    5. Quinn GE, Shin CH, Maguire MG, Stone RA (May 1999). "Myopia and ambient lighting at night". Nature 399 (6732): 113

    6. CNN, May 13, 1999. Night-light may lead to

    7. Ohio State University Research News, March 9, 2000. Night lights don't lead to nearsightedness, study

    8. Zadnik K, Jones LA, Irvin BC, et al. (March 2000). "Myopia and ambient night-time lighting". Nature 404 (6774): 143

    9. Gwiazda J, Ong E, Held R, Thorn F (March 2000). "Myopia and ambient night-time lighting". Nature 404 (6774):

    10. Stone, J; et al., E; Held, R; Thorn, F (March 2000). "Myopia and ambient night-time lighting". Nature 404 (6774): 144. doi:10.1038/35004665

    11. Carducci, Bernard J. The Psychology of Personality: Viewpoints, Research, and Applications. 2nd Edition. Wiley-Blackwell: UK, 2009.

    12. Ornish, Dean. "Cholesterol: The good, the bad, and the truth" [1] (retrieved 3 June 2011)

    13. Robinson, W.S. (1950). "Ecological Correlations and the Behavior of Individuals". American Sociological Review 15 (3): 351

    14. Beebee et al. 2009

    15. David Hume (Stanford Encyclopedia of Philosophy)

    16. Beebee et al. 2009

    17. Beebee et al. 2009

    18. Lloyd, A.C., The principle that the cause is greater than its effect, Phronesis 21(2), 1976

    19. Paul W. Holland. 1986. "Statistics and Causal Inference". Journal of the American Statistical Association, Vol. 81, No. 396. (Dec., 1986), pp. 945

    20. Judea Pearl. 2000. Causality: Models, Reasoning, and Inference, Cambridge University Press.

    21. Novella. "Evidence in Medicine: Correlation and Causation". Science and Medicine. Science-Based

    External links

    "The Art and Science of cause and effect": a slide show and tutorial lecture by Judea Pearl

    Causal inference in statistics: An overview, by Judea Pearl (September 2009)
