Education 793 Class Notes Joint Distributions and Correlation 1 October 2003.

Post on 17-Dec-2015


Education 793 Class Notes

Joint Distributions and Correlation

1 October 2003

Today’s Agenda

• Class and lab announcements

• Your questions?

• Joint distributions

• Correlation analysis to regression

Joint Distributions

• In correlational studies, the researcher is interested in questions about the relationship between two or more variables.

• How are scores on one variable associated with scores on another variable?

• A joint distribution is a distribution in which pairs of scores for each subject are recorded.

Graphical Representation

• Scatterplots of the (x,y)’s.

SticiGui: Scatterplots and Association

Definition:

Correlation - a measure of the strength of association between two variables.

Pearson Product-Moment Correlation: Measure of Association

• An index of how strongly two variables are associated when their scatterplot shows a linear relationship

• Values range from –1 to +1, with 0 indicating no linear relationship

• The average crossproduct of the standard scores of two variables

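• Computed as: $r_{xy} = \frac{\sum z_x z_y}{N}$ (with $N - 1$ in the denominator when the standard scores are based on the sample standard deviation)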

Important Properties

• Will underestimate curvilinear relationships

• As homogeneity increases, the correlation coefficient tends to decrease

• Size of sample does not affect size of correlation coefficient

• A positive association means that as X increases Y increases; a negative association means that as X increases Y decreases

• Correlation is just the standardized version of the covariance (it does not depend on the magnitudes of the standard deviations of x and y)
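The last property can be checked numerically. The sketch below uses made-up scores (the values of `x` and `y` are hypothetical, chosen only for illustration) and assumes the N - 1 form of the covariance and standard deviation. Re-expressing x in different units changes the covariance but leaves r unchanged.

```python
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance (N - 1 denominator)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson_r(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Hypothetical scores, not taken from the class data.
x = [2.0, 4.0, 5.0, 7.0, 9.0]
y = [1.0, 3.0, 2.0, 6.0, 8.0]

# Re-express x in different units (multiply every score by 100).
x_rescaled = [100 * v for v in x]

cov_original = covariance(x, y)
cov_rescaled = covariance(x_rescaled, y)
r_original = pearson_r(x, y)
r_rescaled = pearson_r(x_rescaled, y)

print(f"covariance: {cov_original:.3f} -> {cov_rescaled:.3f}")
print(f"r:          {r_original:.3f} -> {r_rescaled:.3f}")
```

The covariance grows by the same factor as the rescaling, while r stays the same, which is why r is easier to compare across variables measured on different scales.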

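Individual Contributions to r

Mean of x = 27.50; s = 17.08

Mean of y = 31.25; s = 18.87

[Figure: scatterplot with both axes running from 0 to 50, showing the points (5, 45), (25, 45), (45, 5), and (35, 30). The quadrant labels ++, +-, --, and -+ mark the signs of the deviations from the two means.]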
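A minimal sketch of how each point contributes to r, using the four points and summary statistics from this slide. It assumes the N - 1 form of the standard deviation (which reproduces s = 17.08 and s = 18.87) and therefore averages the crossproducts over N - 1 as well.

```python
from statistics import mean, stdev

# The four points shown on the slide.
points = [(5, 45), (25, 45), (45, 5), (35, 30)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]

mean_x, mean_y = mean(xs), mean(ys)
sd_x, sd_y = stdev(xs), stdev(ys)  # N - 1 denominator, matching the slide's s values

contributions = []
for x, y in points:
    z_x = (x - mean_x) / sd_x
    z_y = (y - mean_y) / sd_y
    contributions.append(z_x * z_y)
    print(f"({x:2d}, {y:2d})  z_x = {z_x:+.2f}  z_y = {z_y:+.2f}  product = {z_x * z_y:+.3f}")

# Averaging the crossproducts over N - 1 (to match the N - 1 standard deviations) gives r.
r = sum(contributions) / (len(points) - 1)
print(f"r = {r:.3f}")
```

Every crossproduct is negative here, because each point lies above the mean on one variable and below it on the other, so r comes out strongly negative (about -0.84). The points farthest from both means, such as (45, 5), contribute the most.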

Visualizing Correlations

[Figure: eight scatterplot panels, Plot A through Plot H.]

Squared Correlation Coefficient or Coefficient of Determination: $r_{xy}^2$

The coefficient of determination tells you what proportion (often reported as a percentage) of the variance in one set of scores is accounted for by knowing the other set of scores.

Shared Variance

$r_{xy}^2$ = shared variance / total variance
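As a quick worked illustration, take a correlation of .63 (the value that appears on the restricted-range slide; pairing it with this formula is only for illustration):

```latex
r_{xy} = .63 \quad\Rightarrow\quad r_{xy}^2 = (.63)^2 = .3969 \approx .40
```

Under that assumption, roughly 40% of the variance in one variable is shared with the other, and about 60% is not.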

Restricting Range

[Figure: Panel A, N = 255, R = .63]

The Impact of Restricted Range

[Figure: Panels B and C, N = 43, R = .17; N = 4, R = .10]
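The effect of restricting range can be reproduced with simulated data. This is a sketch under stated assumptions, not the class data: the scores are drawn from a bivariate normal distribution with a population correlation of .63 (borrowed from Panel A), and the sample size, seed, and cutoff are hypothetical choices.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and crossproducts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
rho = 0.63              # assumed population correlation
n = 5000                # hypothetical sample size

xs, ys = [], []
for _ in range(n):
    x = rng.gauss(0, 1)
    # y shares variance with x in proportion to rho, plus independent noise.
    y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
    xs.append(x)
    ys.append(y)

r_full = pearson_r(xs, ys)

# Restrict the range: keep only cases more than one SD above the mean on x.
kept = [(x, y) for x, y in zip(xs, ys) if x > 1.0]
r_restricted = pearson_r([x for x, _ in kept], [y for _, y in kept])

print(f"full range:       N = {n}, r = {r_full:.2f}")
print(f"restricted range: N = {len(kept)}, r = {r_restricted:.2f}")
```

The restricted subsample is more homogeneous on x, so its correlation comes out noticeably smaller than the full-range value even though the underlying relationship is the same.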

Correlation and Causality

Correlation does not equal causation

The higher the absolute value of a correlation, the stronger the relationship between two variables. Strength, though, does not explain the source of the relationship.

Causal Interpretation

Logical possibility | Symbolic representation | Causal explanation

1. | A → B | A causes B

2. | A ← B | B causes A

3. | A ← C → B | C causes both A and B

4. | D → C → A | D causes C which, in turn, causes A

   | D → A | D causes A directly
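Possibility 3 can be illustrated with a small simulation. This is a sketch with made-up data: A and B are each built from a common cause C plus independent noise, and neither influences the other. The variable names, seed, and sample size are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation computed from sums of squares and crossproducts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 5000                # hypothetical sample size

# C is the common cause; A and B each depend on C but not on each other.
c = [rng.gauss(0, 1) for _ in range(n)]
a = [ci + rng.gauss(0, 1) for ci in c]
b = [ci + rng.gauss(0, 1) for ci in c]

r_ab = pearson_r(a, b)

# Remove C's contribution (its coefficient is known to be 1 in this simulation).
a_without_c = [ai - ci for ai, ci in zip(a, c)]
b_without_c = [bi - ci for bi, ci in zip(b, c)]
r_ab_without_c = pearson_r(a_without_c, b_without_c)

print(f"r(A, B)                 = {r_ab:.2f}")
print(f"r(A, B) with C removed  = {r_ab_without_c:.2f}")
```

A and B are clearly correlated even though neither causes the other, and the association essentially vanishes once C's contribution is taken out. The size of r alone cannot distinguish this situation from possibilities 1 and 2.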

Extending Correlation to Regression

Goal:

To predict values of our dependent variable based on values of our independent variable(s) and our knowledge of the underlying relationship (measured by Pearson's r)

Requirements:

• Have data appropriate for computing r

• Be willing to specify the nature of the relationship (IV → DV)

Extending the Correlation

Aptitude and Performance

Creating the Prediction Equation

Calculating Y-hat

Predicting the DV

Residuals
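The contents of these slides did not survive in the transcript, so the following is only a sketch of the standard least-squares approach, assuming the course's prediction equation takes the usual form: slope = r(s_y / s_x), intercept = mean of y minus slope times mean of x. The aptitude and performance data are not available here, so the four points from the "Individual Contributions to r" slide stand in as the example.

```python
from statistics import mean, stdev

# Four (x, y) points reused from the earlier slide; x plays the role of the IV.
points = [(5, 45), (25, 45), (45, 5), (35, 30)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]
n = len(points)

mean_x, mean_y = mean(xs), mean(ys)
sd_x, sd_y = stdev(xs), stdev(ys)

# Pearson's r from the crossproducts of deviations (N - 1 form throughout).
r = sum((x - mean_x) * (y - mean_y) for x, y in points) / ((n - 1) * sd_x * sd_y)

# Prediction equation: y_hat = intercept + slope * x.
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# Predicted values and residuals (observed minus predicted).
y_hat = [intercept + slope * x for x in xs]
residuals = [y - yh for y, yh in zip(ys, y_hat)]

print(f"y_hat = {intercept:.2f} + ({slope:.3f}) * x")
for (x, y), yh, e in zip(points, y_hat, residuals):
    print(f"x = {x:2d}  y = {y:2d}  y_hat = {yh:6.2f}  residual = {e:+6.2f}")
```

The residuals sum to zero (up to rounding), which is a property of the least-squares line; their spread is what the standard error of estimate summarizes.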

Standard Error of Estimate

• A natural extension of the standard deviation
  – Deviations from the mean predicted value
  – Squared
  – Summed
  – Divide by N (or N-2 when estimating parameters)
  – This is an estimate of the error made when estimating y from x.

Formula for SE of Estimate

$$s_{y \cdot x}^2 = \frac{\sum (y - \hat{y})^2}{n - 2}$$

An alternative formula:

$$s_{y \cdot x}^2 = s_y^2 \left(1 - r_{xy}^2\right)$$

Since $r_{xy}^2$ is the proportion of variance in y predictable from x, $1 - r_{xy}^2$ is the proportion that is NOT predictable from x.

Hence, the error.
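The two formulas can be compared numerically. The sketch below reuses the four points from the "Individual Contributions to r" slide and assumes s_y is computed with N - 1, as on that slide. Under that assumption the two versions differ by the factor (n - 1)/(n - 2), which is large for four points but approaches 1 as n grows.

```python
import math
from statistics import mean, stdev

points = [(5, 45), (25, 45), (45, 5), (35, 30)]
xs = [p[0] for p in points]
ys = [p[1] for p in points]
n = len(points)

mean_x, mean_y = mean(xs), mean(ys)
sd_x, sd_y = stdev(xs), stdev(ys)  # N - 1 denominators

r = sum((x - mean_x) * (y - mean_y) for x, y in points) / ((n - 1) * sd_x * sd_y)
slope = r * sd_y / sd_x
intercept = mean_y - slope * mean_x

# Definitional formula: squared deviations from the predicted values, summed, divided by n - 2.
ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in points)
var_est_n2 = ss_resid / (n - 2)

# Alternative formula: variance of y times the proportion NOT predictable from x.
var_est_alt = sd_y ** 2 * (1 - r ** 2)

print(f"sum of squared residuals      = {ss_resid:.2f}")
print(f"definitional (n - 2) variance = {var_est_n2:.2f}")
print(f"alternative formula           = {var_est_alt:.2f}")
print(f"ratio                         = {var_est_n2 / var_est_alt:.3f}")
print(f"standard error of estimate    = {math.sqrt(var_est_n2):.2f}")
```

With only four points the ratio is (4 - 1)/(4 - 2) = 1.5, so the alternative formula is best read as a large-sample approximation to the n - 2 version.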

For Next Week

• Chapter 8 p. 211-225

• Chapter 10 p. 249-271