Regression and Correlation GTECH 201 Lecture 18. ANOVA Analysis of Variance Continuation from...

Date posted: 20-Dec-2015


Page 1: Regression and Correlation GTECH 201 Lecture 18. ANOVA Analysis of Variance Continuation from matched-pair difference of means tests; but now for 3+ cases.

Regression and Correlation

GTECH 201 Lecture 18


ANOVA

Analysis of Variance

  • Continuation from matched-pair difference of means tests, but now for 3+ cases
  • We still check whether samples come from one or more distinct populations
  • Variance is a descriptive parameter
  • ANOVA compares group means and checks whether they differ sufficiently to reject H0


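ANOVA H0 and HA

H0: $\mu_1 = \mu_2 = \dots = \mu_k$ (all samples come from populations with the same mean)

HA: at least one population mean differs from the others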


ANOVA Test Statistic

$$F = \frac{MS_B}{MS_W}$$

$MS_B$ = between-group mean squares

$MS_W$ = within-group mean squares

Between-group variability is calculated in three steps:

1. Calculate overall mean as weighted average of sample means

2. Calculate between-group sum of squares

3. Calculate between-group mean squares ($MS_B$)


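Between-group Variability

1. Total or overall mean

$$\bar{X}_T = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{N}$$

2. Between-group sum of squares

$$SS_B = \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}_T\right)^2 = \sum_{i=1}^{k} n_i \bar{X}_i^2 - N \bar{X}_T^2$$

3. Between-group mean squares

$$MS_B = \frac{SS_B}{df_B} = \frac{SS_B}{k - 1}$$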
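The between-group sum of squares can be written in two equivalent ways: as a sum of weighted squared deviations, or as a shortcut form that is easier to compute by hand. A short derivation, using only the definition of the overall mean (so that $\sum n_i \bar{X}_i = N \bar{X}_T$ and $\sum n_i = N$), shows why they agree:

```latex
\begin{aligned}
SS_B &= \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}_T\right)^2 \\
     &= \sum_{i=1}^{k} n_i \bar{X}_i^2
        - 2 \bar{X}_T \sum_{i=1}^{k} n_i \bar{X}_i
        + \bar{X}_T^2 \sum_{i=1}^{k} n_i \\
     &= \sum_{i=1}^{k} n_i \bar{X}_i^2 - 2 N \bar{X}_T^2 + N \bar{X}_T^2 \\
     &= \sum_{i=1}^{k} n_i \bar{X}_i^2 - N \bar{X}_T^2
\end{aligned}
```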


Within-group Variability

1. Within-group sum of squares

$$SS_W = \sum_{i=1}^{k} (n_i - 1)\, s_i^2$$

2. Within-group mean squares

$$MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k}$$


Kruskal-Wallis Test

  • Nonparametric equivalent of ANOVA
  • Extension of Wilcoxon rank sum W test to 3+ cases
  • Average rank is $R_i / n_i$

Then the Kruskal-Wallis H test statistic is

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)$$

with $N = n_1 + n_2 + \dots + n_k$ = total number of observations, and

$R_i$ = sum of ranks in sample $i$
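As a rough illustration of the H statistic, the sketch below pools all observations, ranks them, sums the ranks per sample, and plugs the sums into the formula. The function name `kruskal_wallis_h` is hypothetical, tied values are given their average rank, and no tie correction is applied; those choices are assumptions, not something the slides specify.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of samples (no tie correction; an assumption)."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Map each distinct value to its rank; tied values share their average rank.
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    # Sum of R_i^2 / n_i over the k samples, where R_i is the rank sum of sample i.
    weighted = sum(sum(rank_of[v] for v in group) ** 2 / len(group) for group in groups)
    return 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)


h = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(round(h, 3))
```

For the three toy samples the rank sums are 6, 15 and 24 with N = 9, so H = (12/90)(12 + 75 + 192) - 30 = 7.2.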


ANOVA Example

House prices by neighborhood, in thousands of dollars

  A    B    C    D
175  151  127  174
147  183  142  182
138  174  124  210
156  181  150  191
184  193  180
148  205
     196


ANOVA Example, continued

Sample statistics

Group    n      X̄       s
A        6   158.00   17.83
B        7   183.29   17.61
C        5   144.60   22.49
D        4   189.25   15.48
Total   22   168.68   24.85

Now fill in the six steps of the ANOVA calculation


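The Six Steps

1. Overall mean

$$\bar{X}_T = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{N} = \frac{6(158.00) + 7(183.29) + 5(144.60) + 4(189.25)}{22} = 168.68$$

2. Between-group sum of squares

$$SS_B = \sum_{i=1}^{k} n_i \bar{X}_i^2 - N \bar{X}_T^2 = 6(158.00)^2 + 7(183.29)^2 + 5(144.60)^2 + 4(189.25)^2 - 22(168.68)^2 = 6769.394$$

3. Between-group mean squares

$$MS_B = \frac{SS_B}{df_B} = \frac{SS_B}{k - 1} = \frac{6769.394}{3} = 2256.465$$

4. Within-group sum of squares

$$SS_W = \sum_{i=1}^{k} (n_i - 1)\, s_i^2 = 5(17.83)^2 + 6(17.61)^2 + 4(22.49)^2 + 3(15.48)^2 = 6193.379$$

5. Within-group mean squares

$$MS_W = \frac{SS_W}{df_W} = \frac{SS_W}{N - k} = \frac{6193.379}{22 - 4} = 344.077$$

6. Test statistic

$$F = \frac{MS_B}{MS_W} = \frac{2256.465}{344.077} = 6.558, \qquad p = .003$$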
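The six steps can be checked with a few lines of Python. The sketch below recomputes the example from the raw house prices using only the standard library; the function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the lecture.

```python
from statistics import mean, stdev

# House prices (thousands of dollars) by neighborhood, from the example table.
prices = {
    "A": [175, 147, 138, 156, 184, 148],
    "B": [151, 183, 174, 181, 193, 205, 196],
    "C": [127, 142, 124, 150, 180],
    "D": [174, 182, 210, 191],
}


def one_way_anova(groups):
    """Return (SS_B, MS_B, SS_W, MS_W, F) by following the six steps."""
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    sds = [stdev(g) for g in groups]  # sample standard deviations (n - 1 divisor)
    k, n_total = len(groups), sum(sizes)

    grand_mean = sum(n * m for n, m in zip(sizes, means)) / n_total      # step 1
    ss_b = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))  # step 2
    ms_b = ss_b / (k - 1)                                                # step 3
    ss_w = sum((n - 1) * s ** 2 for n, s in zip(sizes, sds))             # step 4
    ms_w = ss_w / (n_total - k)                                          # step 5
    return ss_b, ms_b, ss_w, ms_w, ms_b / ms_w                           # step 6


ss_b, ms_b, ss_w, ms_w, f_stat = one_way_anova(list(prices.values()))
print(f"SS_B={ss_b:.3f}  MS_B={ms_b:.3f}  SS_W={ss_w:.3f}  MS_W={ms_w:.3f}  F={f_stat:.3f}")
```

Because the code works from unrounded group means and standard deviations, it should agree with the slide values to about three decimals. The p-value is not computed here: it requires the F distribution with (k - 1, N - k) degrees of freedom, which the standard library does not provide.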


Correlation

  • Co-relatedness between 2+ variables
  • As the values of one variable go up, those of the other change proportionally

Two step approach:

1. Graphically: scatterplot

2. Numerically: correlation coefficients


Is There a Correlation?


Scatterplots

Exploratory analysis


Pearson’s Correlation Index

Based on concept of covariance

$$CV_{XY} = \sum (X - \bar{X})(Y - \bar{Y})$$

$CV_{XY}$ = covariation between X and Y

$(X - \bar{X})$ = deviation of X from its mean

$(Y - \bar{Y})$ = deviation of Y from its mean

Pearson’s correlation coefficient

$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y}) / N}{S_X S_Y}$$
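A minimal sketch of Pearson's r, built from the covariation and the two standard deviations. It assumes that the standard deviations use the same divisor N as the covariance term in the numerator; using N - 1 in both places gives the same r. The function name `pearson_r` and the sample data are made up for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r as (covariation / N) divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n

    # Covariation: sum of products of the deviations from each mean.
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    # Standard deviations with divisor N (assumption: matches the N in the numerator).
    s_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return (covariation / n) / (s_x * s_y)


r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r, 4))
```

For this toy data the covariation is 6 and the variations of X and Y are 10 and 6, so r = 6 / sqrt(60), roughly 0.77.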


Sample and Population

r is the sample correlation coefficient

Applying the t distribution, we can infer the correlation for the whole population

Test statistic for Pearson’s r

$$t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$$
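The test statistic for r is a one-line computation. The sketch below uses a hypothetical helper `t_for_r`; the resulting t is compared against a t distribution with n - 2 degrees of freedom, which is the standard choice for this test.

```python
from math import sqrt


def t_for_r(r, n):
    """t statistic for a sample correlation r based on n paired observations."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


# Example with made-up values: r = 0.6 from n = 18 pairs.
t = t_for_r(0.6, 18)
print(round(t, 3))
```

With r = 0.6 and n = 18 the numerator is 0.6 x 4 and the denominator is 0.8, so t = 3.0 on 16 degrees of freedom.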


Correlation Example

Lake effect snow


Spearman’s Rank Correlation

Non-parametric alternative to Pearson

Logic similar to Kruskal-Wallis and Wilcoxon

Spearman’s rank correlation coefficient

$$r_s = 1 - \frac{6 \sum d^2}{N^3 - N}$$
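A small sketch of Spearman's coefficient: rank each variable, take the rank differences d, and apply the formula. It assumes there are no tied values, since the shortcut formula is exact only in that case; the function name `spearman_rs` and the data are illustrative.

```python
def spearman_rs(xs, ys):
    """Spearman's rank correlation via the sum of squared rank differences (no ties assumed)."""

    def ranks(values):
        # Rank 1 goes to the smallest value; ties are not handled (assumption).
        order = sorted(range(len(values)), key=lambda i: values[i])
        result = [0] * len(values)
        for rank, index in enumerate(order, start=1):
            result[index] = rank
        return result

    n = len(xs)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d_squared / (n ** 3 - n)


rs = spearman_rs([10, 20, 30, 40, 50], [1, 3, 2, 5, 4])
print(round(rs, 3))
```

Here the rank differences are 0, -1, 1, -1, 1, so the sum of squared differences is 4 and r_s = 1 - 24/120 = 0.8.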


Regression

In correlation we observe degrees of association but no causal or functional relationship

In regression analysis, we distinguish an independent from a dependent variable

Many forms of functional relationships:

  • bivariate or multivariate
  • linear or non-linear (curvilinear)


Graphical Representation

In correlation analysis either variable could be depicted on either axis

In regression analysis, the independent variable is always on the X axis

Bivariate relationship is described by a best-fitting line through the scatterplot


Least-Square Regression

Objective: minimize $\sum d_i^2$

$$Y = a + bX$$


Regression Equation

Y = a + bX

$$b = \frac{n \sum XY - \sum X \sum Y}{n \sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{\sum Y - b \sum X}{n}$$
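The slope and intercept formulas translate directly into code. The sketch below is a plain-Python illustration with a hypothetical function name `fit_line` and made-up data; it computes b first because a depends on it.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX, from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b


a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

The toy points lie exactly on Y = 1 + 2X, so the fit recovers a = 1 and b = 2. The denominator is zero when all X values are identical, in which case no line can be fitted.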


Strength of Relationship

How much is explained by the regression equation?


Coefficient of Determination

Total variation of Y (all the bucket water)

$$\sum y^2 = \sum (Y - \bar{Y})^2$$

Large ‘Y’ = dependent variable

Small ‘y’ = deviation of each value of Y from its mean

$$\sum y^2 = \sum y_e^2 + \sum y_u^2$$

e = explained; u = unexplained


Explained Variation

Ratio of square of covariation between X and Y to the variation in X

$$\sum y_e^2 = \frac{\left(\sum xy\right)^2}{\sum x^2}$$

where $\sum xy$ = covariation between X and Y

$\sum x^2$ = total variation of X

Coefficient of determination

$$r^2 = \frac{\sum y_e^2}{\sum y^2}$$
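To make the explained-variation ratio concrete, the sketch below computes the coefficient of determination from deviations about the means, where lowercase x and y denote deviations of X and Y from their means. The function name `coefficient_of_determination` and the data are illustrative only.

```python
def coefficient_of_determination(xs, ys):
    """r^2 as explained variation of Y divided by total variation of Y."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n

    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # covariation
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)                        # variation of X
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)                        # total variation of Y

    explained = sum_xy ** 2 / sum_x2  # explained variation of Y
    return explained / sum_y2


r2 = coefficient_of_determination([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2, 3))
```

For this data the covariation is 6, the variation of X is 10 and the total variation of Y is 6, so the explained variation is 3.6 and r² = 0.6.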


Error Analysis

$r^2$ tells us what proportion of the variation in Y is accounted for by the independent variable

This then allows us to infer the standard error of our estimate

$$SE = \sqrt{\frac{\sum y_u^2}{n - 2}}$$

which tells us, on average, how far off our prediction would be in measurement units
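A sketch of the standard error of the estimate, under the usual definition: the square root of the unexplained variation of Y divided by n - 2. The unexplained variation is taken as the total variation minus the explained variation. The function name `standard_error_of_estimate` and the data are assumptions for illustration.

```python
from math import sqrt


def standard_error_of_estimate(xs, ys):
    """Square root of unexplained variation of Y over n - 2 (usual definition; an assumption)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n

    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))  # covariation
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)                        # variation of X
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)                        # total variation of Y

    unexplained = sum_y2 - sum_xy ** 2 / sum_x2  # total minus explained variation
    return sqrt(unexplained / (n - 2))


se = standard_error_of_estimate([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(se, 4))
```

With total variation 6 and explained variation 3.6, the unexplained variation is 2.4, so the result is sqrt(2.4 / 3), about 0.89 in the units of Y.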