Correlation: Relationship between Variables. Statistical Relationships versus Deterministic...

  • date post

    21-Dec-2015


Page 1: Correlation: Relationship between Variables. Statistical Relationships versus Deterministic Relationships Deterministic: if we know the value of one variable,

Correlation: Relationship between Variables

Page 2:

Statistical Relationships versus Deterministic Relationships

Deterministic: if we know the value of one variable, we can determine the value of the other exactly, e.g. the relationship between the volume and the weight of water.

Statistical: natural variability exists in both measurements. Useful for describing what happens to a population or aggregate.

Page 3:

Measuring Strength Through Correlation

Correlation (also called the Pearson product-moment correlation or the correlation coefficient) is represented by the letter r.

A Linear Relationship

• Indicator of how closely the values fall to a straight line.

• Measures linear relationships only; that is, it measures how close the individual points in a scatterplot are to a straight line.
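The slide names r but does not show how it is computed. A minimal sketch of the usual product-moment calculation follows, assuming two equal-length lists of numbers; the function name pearson_r and the sample data are invented for illustration and do not come from the slides.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the two means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Made-up data for illustration only.
on_a_line = pearson_r([1, 2, 3, 4], [3, 5, 7, 9])        # points exactly on y = 2x + 1
curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])   # y = x**2, exact but not linear
print(on_a_line, curved)
```

The first pair lies exactly on a straight line, so r is 1. The second pair follows y = x**2 exactly, yet r is 0, which illustrates that r measures linear relationships only.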

Page 4:

Features of the Correlation Coefficient

• It indicates the strength of the relationship and whether there is a positive or negative relationship.

• The correlation coefficient is a number between -1 and 1.

• A positive correlation indicates that the variables increase together.

• A negative correlation indicates that as one variable increases, the other decreases.

• Correlation of +1/-1 indicates a perfect linear positive/negative relationship between the two variables; as one increases, the other increases/decreases at a constant rate (a deterministic linear relationship).

Page 5:

Features of the Correlation Coefficient

• Correlation of zero could indicate no linear relationship between the two variables, or that the best straight line through the data on a scatterplot is exactly horizontal.

• The closer the correlation is to 1 or -1, the stronger the relationship. The closer it is to 0, the weaker the relationship.

• Crude estimate: |r| > .5? Most likely a relationship. |r| < .3? Correlation essentially nonexistent. .3 < |r| < .5? Gray area!

• Correlations are unaffected if the units of measurement are changed. For example, the correlation between weight and height remains the same regardless of whether height is expressed in inches, feet or millimeters (as long as it isn’t rounded off).

Page 6:

What to discuss?

To discuss the relationship between two variables, we look at:

• Statistical Significance

• Strength of Correlation

Page 7:

Statistical Significance

A relationship is statistically significant if that relationship is stronger than 95% of the relationships we would expect to see just by chance.

Page 8:

Statistical Significance

What r value is statistically significant?

It depends on the size of the sample.

The following table indicates the lowest r value that would indicate a statistically significant relationship for a given sample size.

A more complete table is in the Excel file.

Critical Values for the Correlation Coefficient

Number of Points    95% Confidence
3                   0.997
4                   0.950
5                   0.878
6                   0.811
7                   0.754
8                   0.707
9                   0.666
10                  0.632
11                  0.602
12                  0.576
13                  0.553
14                  0.532
15                  0.514
16                  0.497
17                  0.482
18                  0.468
19                  0.456
20                  0.444
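The table can be applied mechanically. The sketch below copies the critical values from the table into a dictionary and compares |r| against the entry for the sample size; the names CRITICAL_R and is_significant are hypothetical, and treating a value exactly equal to the table entry as significant is an assumption based on the phrase "lowest r value".

```python
# Critical values copied from the table above (number of points -> minimum |r|).
CRITICAL_R = {
    3: 0.997, 4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754, 8: 0.707,
    9: 0.666, 10: 0.632, 11: 0.602, 12: 0.576, 13: 0.553, 14: 0.532,
    15: 0.514, 16: 0.497, 17: 0.482, 18: 0.468, 19: 0.456, 20: 0.444,
}


def is_significant(r, n_points):
    """Return True if |r| reaches the table's critical value for n_points."""
    if n_points not in CRITICAL_R:
        # The transcript's table only covers 3 to 20 points.
        raise ValueError(f"no critical value listed for {n_points} points")
    return abs(r) >= CRITICAL_R[n_points]


print(is_significant(0.70, 8))   # False: 0.70 is below 0.707
print(is_significant(0.70, 9))   # True: 0.70 is above 0.666
print(is_significant(0.90, 4))   # False: 0.90 is below 0.950
```

The sign of r is ignored because the table gives a single threshold for both positive and negative relationships. The last call shows that even a strong correlation can fall short of significance when there are only a few points.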

Page 9:

Two Warnings about Statistical Significance

• Even a minor relationship will achieve “statistical significance” if the sample is very large.

• A very strong relationship won’t necessarily achieve “statistical significance” if the sample is very small.

Page 10:

Example - Verbal SAT and GPA

The correlation coefficient is .485, indicating a moderate positive relationship.

Scatterplot of GPA and verbal SAT score.

Higher verbal SAT scores tend to indicate higher GPAs as well, but the relationship is nowhere close to being exact.

Page 11:

Example - Husbands’ and Wives’ Ages and Heights

Discuss the strength of the relationships. Are they positively or negatively correlated?

Scatterplot of British husbands’ and wives’ heights (in millimeters); r = .36

Scatterplot of British husbands’ and wives’ ages; r = .94

Page 12:

Example - Occupational Prestige and Suicide Rates

Correlation of .109. There is a weak positive relationship. If the outlier is removed, r drops to .018.

Plot of suicide rate versus occupational prestige for 36 occupations.

Page 13:

Statistical Relationships

What is the difference between correlation and regression?

Correlation: measures the strength of a relationship between two measurement variables.

Regression: uses the equation of the trendline to predict one measurement variable from another.
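To illustrate the regression half of this distinction, here is a minimal sketch that fits a least-squares trendline and uses its equation to predict one variable from the other. Ordinary least squares is assumed as the fitting method, since the slides do not specify one; the function names and the SAT and GPA values are hypothetical.

```python
def fit_trendline(x, y):
    """Least-squares line through the points; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x_new, slope, intercept):
    """Predicted value of y at x_new from the trendline equation."""
    return intercept + slope * x_new


# Hypothetical verbal SAT scores and GPAs, invented for illustration only.
sat = [450, 500, 550, 600, 650]
gpa = [2.4, 2.9, 2.7, 3.3, 3.4]

slope, intercept = fit_trendline(sat, gpa)
print(predict(580, slope, intercept))   # predicted GPA for a verbal SAT of 580
```

Correlation would summarize these five points with a single number describing how tightly they cluster around a line. Regression returns the line itself, which is what makes prediction possible.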