4.3: Establishing Causation Both correlation and regression are very useful in describing the...


Page 1

4.3: Establishing Causation

Both correlation and regression are very useful in describing the relationship between two variables; however, they are first and foremost used to describe linear relationships. In addition, a few extreme observations can strongly influence the correlation r and the least-squares regression line (LSRL). Other errors can result from extrapolation, lurking variables, averaged data, or confusing association with causation.
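As a rough illustration of how much a single extreme observation can move r and the LSRL, the sketch below computes both on a small made-up data set, then again after adding one outlying point. The data values and the helper names `pearson_r` and `lsrl_slope` are assumptions chosen for this example only.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def lsrl_slope(xs, ys):
    """Slope of the least-squares regression line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Made-up, nearly linear data (illustrative only).
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
r_clean = pearson_r(xs, ys)
slope_clean = lsrl_slope(xs, ys)

# The same data plus one extreme observation far from the pattern.
xs_out = xs + [20]
ys_out = ys + [2.0]
r_out = pearson_r(xs_out, ys_out)
slope_out = lsrl_slope(xs_out, ys_out)

print(f"without outlier: r = {r_clean:.3f}, slope = {slope_clean:.3f}")
print(f"with outlier:    r = {r_out:.3f}, slope = {slope_out:.3f}")
```

With these numbers, one added point is enough to turn a strong positive association into a weak negative one, which is why a scatterplot should be checked before r is trusted.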

Page 2

Where Errors Can Occur

Extrapolation

Extrapolation is the use of a regression line or curve to predict values of y outside the range of the observed x-values. Few relationships, if any, are linear indefinitely. Hence, avoid extrapolating far beyond the data.
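A small sketch of why extrapolation is risky: the relationship below is assumed to be y = 10·sqrt(x), a curve chosen only for illustration. Over x = 1 to 10 it looks close to linear, so an LSRL fitted there predicts well inside that range and badly far outside it.

```python
import math

def fit_lsrl(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def true_y(x):
    """Hypothetical true relationship (an assumption for this example)."""
    return 10 * math.sqrt(x)

# Observed data cover only x = 1..10.
xs = list(range(1, 11))
ys = [true_y(x) for x in xs]
slope, intercept = fit_lsrl(xs, ys)

def predict(x):
    return intercept + slope * x

# Prediction error inside the observed range vs. far outside it.
err_inside = predict(5) - true_y(5)
err_outside = predict(100) - true_y(100)

print(f"error at x = 5:   {err_inside:.2f}")
print(f"error at x = 100: {err_outside:.2f}")
```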

Lurking Variables

A lurking variable isn’t included in the data, but has an effect on the relationship between the actual variables being studied:

-It may falsely suggest a strong relationship between the variables being studied, or

-It could obscure an existing relationship between the known variables.
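The second case, a lurking variable obscuring a real relationship, can be sketched with made-up numbers. Here group membership plays the role of the lurking variable: within each group y rises perfectly with x, but the groups sit at different levels, so pooling them hides (and here even reverses) the association. The groups and values are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Group A: low x-values, y shifted up by 10 (made-up data).
xs_a = [1, 2, 3, 4, 5]
ys_a = [x + 10 for x in xs_a]

# Group B: high x-values, no shift.
xs_b = [6, 7, 8, 9, 10]
ys_b = list(xs_b)

r_a = pearson_r(xs_a, ys_a)                      # within group A
r_b = pearson_r(xs_b, ys_b)                      # within group B
r_pooled = pearson_r(xs_a + xs_b, ys_a + ys_b)   # group ignored

print(f"group A: r = {r_a:.3f}")
print(f"group B: r = {r_b:.3f}")
print(f"pooled:  r = {r_pooled:.3f}")
```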

Using Averaged Data

Caution should always be exercised in drawing conclusions, because data often represent averages rather than individuals. Correlations based on averages are generally too high when applied to individuals.
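The effect of averaging can be seen in a simulation. The sketch below generates synthetic individuals with a noisy linear relationship, then replaces them with group averages (ten bins by x-value) and compares the two correlations. The sample size, noise level, and binning are all assumptions picked for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic individuals: y follows x, with a lot of individual scatter.
xs = [rng.uniform(0, 100) for _ in range(1000)]
ys = [x + rng.gauss(0, 40) for x in xs]
r_individuals = pearson_r(xs, ys)

# Group individuals into ten bins by x and keep only the bin averages.
bins = {}
for x, y in zip(xs, ys):
    key = min(int(x // 10), 9)
    bins.setdefault(key, []).append((x, y))

mean_xs = [sum(x for x, _ in pts) / len(pts) for pts in bins.values()]
mean_ys = [sum(y for _, y in pts) / len(pts) for pts in bins.values()]
r_averages = pearson_r(mean_xs, mean_ys)

print(f"r for individuals:    {r_individuals:.3f}")
print(f"r for group averages: {r_averages:.3f}")
```

Averaging cancels most of the individual scatter, so the averaged points hug the line much more tightly than any individual does.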

Page 3

Errors (continued)

Association and Causation

An association between variables simply means that changes in x and changes in y occur together. An association can reflect any of three possibilities:

-Causation, when changes in y are caused by changes in x.

-Common response, when changes in both x and y are in response to a separate variable not being studied.

-Confounding, when changes in y cannot be clearly attributed to any one variable because the effects of those variables cannot be distinguished from one another.

To detect causation as reliably as possible, use a controlled experiment, in which x is changed directly and lurking variables are held under control.
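One way to see why directly changing x helps is a simulation in which the true effect of x is known. In the sketch below, y is assumed to equal 2·x + 3·z plus noise, where z is a lurking variable. In the "observational" data x tends to move with z, so the fitted slope overstates the effect of x; in the "experiment" x is assigned at random, independent of z, and the slope lands near the true value of 2. The model and its coefficients are assumptions for illustration.

```python
import random

def lsrl_slope(xs, ys):
    """Slope of the least-squares regression line of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(0)  # fixed seed so the run is repeatable
N = 2000

def outcome(x, z):
    """Hypothetical model: true effect of x is 2, of lurking z is 3."""
    return 2.0 * x + 3.0 * z + rng.gauss(0, 1)

z = [rng.gauss(0, 1) for _ in range(N)]  # lurking variable

# Observational study: x tends to be high when z is high.
x_obs = [zi + rng.gauss(0, 1) for zi in z]
y_obs = [outcome(x, zi) for x, zi in zip(x_obs, z)]
slope_obs = lsrl_slope(x_obs, y_obs)

# Experiment: x is set at random, unrelated to z.
x_exp = [rng.gauss(0, 1) for _ in range(N)]
y_exp = [outcome(x, zi) for x, zi in zip(x_exp, z)]
slope_exp = lsrl_slope(x_exp, y_exp)

print(f"observational slope: {slope_obs:.2f}")
print(f"experimental slope:  {slope_exp:.2f}")
```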

Page 4

Examples

Causation, number of hours worked (in a minimum wage job) vs. money earned

Common response, SAT math score vs. SAT verbal score

Another example is the rates of ice cream consumption and murder, which exhibit a strong positive association. Which causes which; does eating ice cream cause murder or does murder make people eat ice cream? The answer is neither—increases in both ice cream consumption and murder are associated with hot weather. (from Wikipedia)

Confounding, homework grade vs. test grade (in this case, a possible confounding variable is work ethic)

An extraneous variable is a variable that MAY compete with the independent variable in explaining the outcome of a study. A confounding variable (also called a third variable) is a variable that DOES cause a problem, because it is empirically related to both the independent and the dependent variable. A confounding variable is a type of extraneous variable (the type that we know is a problem, rather than the type that might be a problem). Another example of confounding is violent movie exposure vs. acts of violence, where a possible confounding variable is predisposition to violence.
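The ice cream and murder example above is a case of common response, and it can be sketched with synthetic data. In the simulation below, temperature drives both variables and neither affects the other. The raw correlation between them is strong, but once the part of each variable explained by temperature is removed (by taking residuals from an LSRL on temperature), almost nothing is left. All coefficients and noise levels are assumptions, not real figures.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def residuals(xs, ys):
    """What is left of ys after subtracting its LSRL fit on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]

rng = random.Random(1)  # fixed seed so the run is repeatable

# Synthetic days: temperature is the common cause of both variables.
temp = [rng.uniform(0, 35) for _ in range(500)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temp]
crime = [0.5 * t + rng.gauss(0, 2) for t in temp]

r_raw = pearson_r(ice_cream, crime)

# Remove the temperature-driven part of each, then correlate the rest.
r_adjusted = pearson_r(residuals(temp, ice_cream), residuals(temp, crime))

print(f"raw r (ice cream vs. crime):       {r_raw:.3f}")
print(f"r after accounting for temperature: {r_adjusted:.3f}")
```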

Page 5

What if experimentation isn’t possible?

Without an experiment, the case for causation is stronger when these criteria are met:

-Strong Association

-Consistent Association

-Larger values of the explanatory variable are associated with stronger responses

-Alleged cause precedes effect in time

-Alleged cause is plausible

REMEMBER, we prefer an experiment to establish causation, but this isn’t always possible. Consider all options.