Correlation: How Strong Is the Linear Relationship? Lecture 50 Sec. 13.7 Mon, May 1, 2006.


The Correlation Coefficient

The correlation coefficient r is a number between –1 and +1.

It measures the direction and strength of the linear relationship.

If r > 0, then the relationship is positive.

If r < 0, then the relationship is negative.

The closer r is to +1 or –1, the stronger the relationship.

The closer r is to 0, the weaker the relationship.

Strong Positive Linear Association

[Scatterplot: y versus x]

In this display, r is close to +1.

Strong Negative Linear Association

[Scatterplot: y versus x]

In this display, r is close to –1.

Almost No Linear Association

[Scatterplot: y versus x]

In this display, r is close to 0.
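A minimal Python sketch of the three displays, using small invented data sets. The numbers and the helper name `pearson_r` are illustrative assumptions, not taken from the lecture. The helper computes r from deviations about the means, and the three calls show one case each: r close to +1, r close to –1, and r close to 0.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Invented illustrative data, not from the lecture.
x = [1, 2, 3, 4, 5]
y_rising = [2.1, 3.9, 6.2, 7.8, 10.1]    # points near an upward-sloping line
y_falling = [10.1, 7.8, 6.2, 3.9, 2.1]   # points near a downward-sloping line
y_patternless = [3, 1, 4, 1, 3]          # no linear trend

r_rising = pearson_r(x, y_rising)
r_falling = pearson_r(x, y_falling)
r_patternless = pearson_r(x, y_patternless)

print(f"rising:      r = {r_rising:.3f}")
print(f"falling:     r = {r_falling:.3f}")
print(f"patternless: r = {r_patternless:.3f}")
```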

Correlation vs. Cause and Effect

If the value of r is close to +1 or –1, that indicates that x is a good predictor of y.

It does not indicate that x causes y.

The correlation coefficient alone cannot be used to determine cause and effect.

There is good reason to believe that the size of a person’s waistline is a predictor of his performance on an algebra test (within the age range 0–21). Why?

However, increasing your waistline will not help you on an algebra test.

Similarly, avoiding algebra is not a good way to reduce your waistline.

“Third” Variables

The hidden third variable is age.

Age causes (to some extent) the waistline to increase.

Age causes (to some extent) a person to do better on an algebra test.
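The effect of a hidden third variable can be sketched in Python. All numbers below are invented for illustration and are not real measurements; the names `pearson_r`, `offsets`, and `people` are hypothetical. Waistline and test score are both built from age plus a small individual offset, and the offsets are chosen so that, at any fixed age, waistline and score are unrelated.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Invented numbers for illustration only; not real measurements.
ages = [6, 10, 14, 18]
offsets = [(-1, -1), (-1, 1), (1, -1), (1, 1)]  # (waist offset, score offset)

people = []  # each entry is (age, waist, score)
for age in ages:
    for d_waist, d_score in offsets:
        waist = 40 + 2 * age + d_waist   # waistline increases with age
        score = 5 * age + 3 * d_score    # test score increases with age
        people.append((age, waist, score))

# Correlation between waistline and score over everyone.
r_overall = pearson_r([p[1] for p in people], [p[2] for p in people])

# Correlation between waistline and score among people of the same age.
r_by_age = {}
for age in ages:
    group = [p for p in people if p[0] == age]
    r_by_age[age] = pearson_r([p[1] for p in group], [p[2] for p in group])

print(f"all ages together: r = {r_overall:.3f}")
for age, r in r_by_age.items():
    print(f"age {age} only: r = {r:.3f}")
```

In this constructed example the overall r is close to +1 even though, once age is held fixed, waistline says nothing about the score.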

Mixing Populations

Mixing nonhomogeneous groups can create a misleading correlation coefficient.

Suppose we gather data on the number of hours spent watching TV each week and the child’s reading level, for 1st-, 2nd-, and 3rd-grade students.

We may get the following results, suggesting a weak positive correlation.

[Scatterplot: reading level versus number of hours of TV]

However, if we separate the points according to grade level, we may see a different picture.

[Scatterplot: reading level versus number of hours of TV, with points marked as 1st grade, 2nd grade, or 3rd grade]

First-grade students by themselves may indicate negative correlation.

Second-grade students by themselves may also indicate negative correlation.

And third-grade students by themselves may indicate negative correlation.

[Scatterplot: reading level versus number of hours of TV, by grade]

So, why did the points in the aggregate indicate a positive relationship?
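One way such a picture could arise is sketched below in Python. The data are invented for illustration (they are not the points plotted in the lecture), and the sketch assumes that students in higher grades both watch more TV and read at a higher level. Within each grade, reading level falls as TV hours rise, yet the pooled data show a positive r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / sqrt(s_xx * s_yy)

# Invented data: (weekly hours of TV, reading level) for each grade.
grades = {
    "1st grade": ([2, 3, 4, 5], [1.6, 1.5, 1.2, 1.1]),
    "2nd grade": ([5, 6, 7, 8], [2.6, 2.4, 2.3, 2.0]),
    "3rd grade": ([8, 9, 10, 11], [3.5, 3.4, 3.1, 3.0]),
}

# Correlation within each grade taken by itself.
r_within = {g: pearson_r(hours, levels) for g, (hours, levels) in grades.items()}

# Correlation after pooling all three grades into one data set.
all_hours = [h for hours, _ in grades.values() for h in hours]
all_levels = [v for _, levels in grades.values() for v in levels]
r_all = pearson_r(all_hours, all_levels)

for g, r in r_within.items():
    print(f"{g}: r = {r:.3f}")
print(f"all grades pooled: r = {r_all:.3f}")
```

Here the upward shift from one grade to the next outweighs the downward trend inside each grade, so the sign of r flips when the groups are mixed.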

Calculating the Correlation Coefficient

There are many formulas for r.

The most basic formula is

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} $$

Another formula is

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - \left(\sum x\right)^2\right)\left(n\sum y^2 - \left(\sum y\right)^2\right)}} $$
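The two formulas for r give the same value. A short derivation sketch, using only the definitions $\bar{x} = \sum x / n$ and $\bar{y} = \sum y / n$ (this step is not shown in the lecture):

$$
\begin{aligned}
\sum (x - \bar{x})(y - \bar{y}) &= \sum xy - \bar{y}\sum x - \bar{x}\sum y + n\bar{x}\bar{y} = \sum xy - \frac{\sum x \sum y}{n}, \\
\sum (x - \bar{x})^2 &= \sum x^2 - \frac{\left(\sum x\right)^2}{n}, \\
\sum (y - \bar{y})^2 &= \sum y^2 - \frac{\left(\sum y\right)^2}{n}.
\end{aligned}
$$

Substituting these into the deviation formula and multiplying the numerator and the denominator by n (one factor of n goes into each term under the square root) yields the formula written in terms of the sums of x, y, x², y², and xy.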

Example

Consider again the data

x    y
2    3
3    5
5    9
6    12
9    16

Compute Σx, Σy, Σx², Σy², and Σxy.

x     y     x²    y²    xy
2     3     4     9     6
3     5     9     25    15
5     9     25    81    45
6     12    36    144   72
9     16    81    256   144

Totals: Σx = 25, Σy = 45, Σx² = 155, Σy² = 515, Σxy = 282.

Then compute r.

$$ r = \frac{5(282) - (25)(45)}{\sqrt{\left(5(155) - 25^2\right)\left(5(515) - 45^2\right)}} = \frac{285}{\sqrt{(150)(550)}} = 0.9922. $$
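The arithmetic in this example can be checked with a short Python sketch. It uses the five data points from the example and the formula for r written in terms of sums; the variable names are illustrative choices, not from the lecture.

```python
from math import sqrt

# Data from the example.
x = [2, 3, 5, 6, 9]
y = [3, 5, 9, 12, 16]
n = len(x)

# Column totals of the table.
sum_x = sum(x)
sum_y = sum(y)
sum_x2 = sum(v * v for v in x)
sum_y2 = sum(v * v for v in y)
sum_xy = sum(a * b for a, b in zip(x, y))

# r = (n*Sxy - Sx*Sy) / sqrt((n*Sx2 - Sx^2) * (n*Sy2 - Sy^2))
numerator = n * sum_xy - sum_x * sum_y
x_term = n * sum_x2 - sum_x ** 2
y_term = n * sum_y2 - sum_y ** 2
r = numerator / sqrt(x_term * y_term)

print("sums:", sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print("numerator:", numerator, "terms under the root:", x_term, y_term)
print(f"r = {r:.4f}")
```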

TI-83 – Calculating r

To calculate r on the TI-83:

First, be sure that Diagnostic is turned on.

Press CATALOG and select DiagnosticOn.

Then, follow the procedure that produces the regression line.

In the same window, the TI-83 reports r² and r.

Use the TI-83 to calculate r in the preceding example.

The Relationship Between b and r

It turns out that there is a simple relationship between the slope b of the regression line and the correlation coefficient r.

$$ b = r \cdot \frac{s_Y}{s_X} $$
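A sketch of why this relationship holds, assuming the usual least-squares slope and sample standard deviations (neither definition is restated in this lecture, and the shorthand $SS_{XX}$, $SS_{YY}$, $SS_{XY}$ is introduced here only for brevity). Write $SS_{XX} = \sum (x - \bar{x})^2$, $SS_{YY} = \sum (y - \bar{y})^2$, and $SS_{XY} = \sum (x - \bar{x})(y - \bar{y})$. Then

$$
b = \frac{SS_{XY}}{SS_{XX}}, \qquad
r = \frac{SS_{XY}}{\sqrt{SS_{XX}\, SS_{YY}}}, \qquad
s_X = \sqrt{\frac{SS_{XX}}{n - 1}}, \qquad
s_Y = \sqrt{\frac{SS_{YY}}{n - 1}},
$$

so that

$$
r \cdot \frac{s_Y}{s_X}
= \frac{SS_{XY}}{\sqrt{SS_{XX}\, SS_{YY}}} \cdot \sqrt{\frac{SS_{YY}}{SS_{XX}}}
= \frac{SS_{XY}}{SS_{XX}}
= b.
$$

Because $s_X$ and $s_Y$ are positive, b and r always have the same sign.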

In the previous example, we had s_X = 2.7386 and s_Y = 5.2440.

We also found b = 1.9.

Therefore, the correlation coefficient is

$$ r = b \cdot \frac{s_X}{s_Y} = (1.9)\,\frac{2.7386}{5.2440} = 0.9922. $$
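The same check can be run in Python. This is a sketch that assumes the least-squares slope formula and uses `statistics.stdev` for the sample standard deviations; the variable names are illustrative. It recomputes b, s_X, and s_Y from the example data and compares r obtained from the slope with r computed directly.

```python
from math import sqrt
from statistics import mean, stdev

# Data from the example.
x = [2, 3, 5, 6, 9]
y = [3, 5, 9, 12, 16]

x_bar, y_bar = mean(x), mean(y)
ss_xy = sum((a - x_bar) * (c - y_bar) for a, c in zip(x, y))
ss_xx = sum((a - x_bar) ** 2 for a in x)
ss_yy = sum((c - y_bar) ** 2 for c in y)

b = ss_xy / ss_xx          # least-squares slope (assumed formula)
s_x = stdev(x)             # sample standard deviation of x
s_y = stdev(y)             # sample standard deviation of y

r_from_slope = b * s_x / s_y             # rearranged from b = r * s_Y / s_X
r_direct = ss_xy / sqrt(ss_xx * ss_yy)   # deviation formula for r

print(f"b = {b:.4f}, s_X = {s_x:.4f}, s_Y = {s_y:.4f}")
print(f"r from slope = {r_from_slope:.4f}, r direct = {r_direct:.4f}")
```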