Correlation and Covariance
R. F. Riesenfeld (based on web slides by James H. Steiger)
CS5961 Comp Stat 2
Goals
⇨ Introduce the concepts of covariance and correlation
⇨ Develop computational formulas
R F Riesenfeld Sp 2010
Covariance
⇨ Variables may change in relation to each other
⇨ Covariance measures how much the movement in one variable predicts the movement in a corresponding variable
Smoking and Lung Capacity
⇨ Example: investigate relationship between cigarette smoking and lung capacity
⇨ Data: a sample group's reported smoking habits and their measured lung capacities
Smoking v Lung Capacity Data
N   Cigarettes (X)   Lung Capacity (Y)
1    0               45
2    5               42
3   10               33
4   15               31
5   20               29
Smoking and Lung Capacity
[Figure: scatter plot of Lung Capacity (Y) against Smoking (yrs); horizontal axis from -5 to 25, vertical axis from 20 to 50]
Smoking v Lung Capacity
⇨ Observe that as smoking exposure goes up, corresponding lung capacity goes down
⇨ Variables covary inversely
⇨ Covariance and Correlation quantify relationship
Covariance
⇨ Variables that covary inversely, like smoking and lung capacity, tend to appear on opposite sides of their group means. When smoking is above its group mean, lung capacity tends to be below its group mean.
⇨ The average product of deviations measures the extent to which the variables covary, that is, the degree of linkage between them
The Sample Covariance
⇨ As with the variance, for theoretical reasons the average is typically computed using (N - 1), not N. Thus,
\[
S_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})
\]
Calculating Covariance
Cigs (X)   Lung Cap (Y)
 0         45
 5         42
10         33
15         31
20         29
X̄ = 10    Ȳ = 36
Calculating Covariance
Cigs (X)   (X - X̄)   (X - X̄)(Y - Ȳ)   (Y - Ȳ)   Cap (Y)
 0         -10        -90                9         45
 5          -5        -30                6         42
10           0          0               -3         33
15           5        -25               -5         31
20          10        -70               -7         29
                      ∑ = -215
Covariance Calculation (2)
Evaluation yields,
\[
S_{xy} = \frac{1}{4}(-215) = -53.75
\]
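The deviation-table calculation can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `sample_covariance` and the variable names `cigs` and `lung_cap` are made up here, and the data are the five (X, Y) pairs from the table.

```python
def sample_covariance(xs, ys):
    """Sample covariance: sum of deviation products divided by (N - 1)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of (X_i - mean_x) * (Y_i - mean_y), the middle column of the table
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cross / (n - 1)


cigs = [0, 5, 10, 15, 20]        # Cigarettes (X)
lung_cap = [45, 42, 33, 31, 29]  # Lung Capacity (Y)

s_xy = sample_covariance(cigs, lung_cap)
print(s_xy)  # -53.75
```

The negative sign reflects the inverse relationship: above-average smoking pairs with below-average lung capacity.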
Covariance under Affine Transformation
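Let \(L_i = aX_i + b\) and \(M_i = cY_i + d\). Then,
\[
l_i = a x_i, \qquad m_i = c y_i,
\]
where lowercase letters denote deviations from the mean, \(u_i = U_i - \bar{U}\), and
\[
S_{LM} = \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i
\]
Evaluating, in turn, gives,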
Covariance under Affine Transf (2)
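\[
\begin{aligned}
S_{LM} &= \frac{1}{N-1} \sum_{i=1}^{N} l_i m_i \\
       &= \frac{1}{N-1} \sum_{i=1}^{N} (a x_i)(c y_i) \\
       &= ac \, \frac{1}{N-1} \sum_{i=1}^{N} x_i y_i
\end{aligned}
\]
Evaluating further,
\[
S_{LM} = ac \, S_{xy}
\]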
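A quick numerical check of this result, as a sketch only: the constants `a`, `b`, `c`, `d` below are arbitrary illustrative values (not from the slides), and `sample_covariance` is a hypothetical helper that implements the (N - 1) definition on the smoking data.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance: sum of deviation products divided by (N - 1)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

# Arbitrary affine maps L = aX + b and M = cY + d (illustrative constants)
a, b, c, d = 2.0, 7.0, -3.0, 1.5
L = [a * x + b for x in cigs]
M = [c * y + d for y in lung_cap]

s_xy = sample_covariance(cigs, lung_cap)
s_lm = sample_covariance(L, M)

# The shifts b and d drop out; only the scale factors a and c survive
print(s_lm, a * c * s_xy)
print(math.isclose(s_lm, a * c * s_xy))  # True
```

Because covariance scales with `a * c`, its magnitude depends on the units of measurement, which is the motivation for a unit-free measure.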
(Pearson) Correlation Coefficient r_xy
⇨ Like covariance, but uses Z-values instead of deviations. Hence, it is invariant under linear transformations of the raw data, up to a sign change when exactly one of the scale factors is negative.
\[
r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} z_{x_i} z_{y_i}
\]
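The Z-value form can be sketched as follows. This assumes the Z-values are computed with the sample standard deviation (the (N - 1) version), which is what makes the 1/(N - 1) average come out as the correlation; the helper names `z_scores` and `pearson_r` are made up for this illustration.

```python
import math


def z_scores(values):
    """Standardize values using the sample (N - 1) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / std for v in values]


def pearson_r(xs, ys):
    """Average product of Z-values, with (N - 1) in the denominator."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(p * q for p, q in zip(zx, zy)) / (len(xs) - 1)


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

r = pearson_r(cigs, lung_cap)
print(round(r, 4))  # -0.9615

# Rescaling with positive factors leaves r unchanged (illustrative constants)
r_scaled = pearson_r([2 * x + 7 for x in cigs], [3 * y + 1 for y in lung_cap])
print(math.isclose(r, r_scaled))  # True
```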
Alternative (common) Expression
\[
r_{xy} = \frac{s_{xy}}{s_x s_y}
\]
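To see why this agrees with the Z-value definition, substitute the Z-values directly. The step below assumes \(z_{x_i} = (X_i - \bar{X})/s_x\) and \(z_{y_i} = (Y_i - \bar{Y})/s_y\), with \(s_x\) and \(s_y\) the sample standard deviations:

\[
r_{xy} = \frac{1}{N-1} \sum_{i=1}^{N} \frac{X_i - \bar{X}}{s_x} \cdot \frac{Y_i - \bar{Y}}{s_y}
       = \frac{1}{s_x s_y} \cdot \frac{1}{N-1} \sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})
       = \frac{s_{xy}}{s_x s_y}
\]

For the smoking data, \(s_x = \sqrt{62.5} \approx 7.906\) and \(s_y = \sqrt{50} \approx 7.071\), so \(r_{xy} \approx -53.75 / (7.906 \times 7.071) \approx -0.9615\).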
Computational Formula 1
\[
s_{xy} = \frac{1}{N-1} \left[ \sum_{i=1}^{N} X_i Y_i - \frac{\sum_{i=1}^{N} X_i \, \sum_{i=1}^{N} Y_i}{N} \right]
\]
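A minimal sketch of this computational formula, using running sums only and never forming the deviations. The function name `covariance_from_sums` is hypothetical and the data are the smoking example.

```python
def covariance_from_sums(xs, ys):
    """Sample covariance from the sums of X, Y and XY (no deviations needed)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum_x * sum_y / n) / (n - 1)


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

# sum_xy = 1585, sum_x = 50, sum_y = 180, so (1585 - 1800) / 4
print(covariance_from_sums(cigs, lung_cap))  # -53.75
```

This form is convenient for hand calculation, but in floating point it subtracts two potentially large numbers, so it can lose precision when the means are large relative to the spread of the data.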
Computational Formula 2
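\[
r_{xy} = \frac{N \sum XY - \sum X \sum Y}{\sqrt{\left[ N \sum X^2 - \left( \sum X \right)^2 \right] \left[ N \sum Y^2 - \left( \sum Y \right)^2 \right]}}
\]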
Table for Calculating r_xy
      Cigs (X)   X^2    XY     Y^2    Cap (Y)
       0           0      0   2025    45
       5          25    210   1764    42
      10         100    330   1089    33
      15         225    465    961    31
      20         400    580    841    29
∑ =   50         750   1585   6680   180
Computing r_xy from Table
\[
r_{xy} = \frac{5(1585) - 50(180)}{\sqrt{\left[ 5(750) - 50^2 \right] \left[ 5(6680) - 180^2 \right]}}
       = \frac{7925 - 9000}{\sqrt{(3750 - 2500)(33400 - 32400)}}
\]
Computing Correlation
\[
r_{xy} = \frac{-1075}{\sqrt{(1250)(1000)}}
\]
\[
r_{xy} = -0.9615
\]
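The table-based calculation can be reproduced with a short Python sketch of Computational Formula 2. The function name `pearson_r_from_sums` is made up for this illustration; the intermediate values in the comments match the column sums in the table.

```python
import math


def pearson_r_from_sums(xs, ys):
    """Pearson r from the five column sums of the calculation table."""
    n = len(xs)
    sum_x = sum(xs)                              # 50
    sum_y = sum(ys)                              # 180
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # 1585
    sum_x2 = sum(x * x for x in xs)              # 750
    sum_y2 = sum(y * y for y in ys)              # 6680
    numerator = n * sum_xy - sum_x * sum_y       # 7925 - 9000 = -1075
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)  # 1250 * 1000
    )
    return numerator / denominator


cigs = [0, 5, 10, 15, 20]
lung_cap = [45, 42, 33, 31, 29]

r = pearson_r_from_sums(cigs, lung_cap)
print(round(r, 4))  # -0.9615
```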
Conclusion
⇨ r_xy = -0.96 indicates a very strong inverse linear relationship: in this sample, a heavier smoker is almost certain to have diminished lung capacity
⇨ Greater smoking exposure implies greater likelihood of lung damage
\[
r_{xy} = -0.96
\]
End Covariance & Correlation
Notes