Correlation and Regression Quantitative Methods in HPELS 440:210.

Post on 23-Dec-2015



Correlation and Regression

Quantitative Methods in HPELS

440:210

Agenda

Introduction
The Pearson Correlation
Hypothesis Tests with the Pearson Correlation
Regression
Instat
Nonparametric versions

Introduction

Correlation: a statistical technique used to measure and describe the relationship between two variables

Direction of relationship: positive or negative

Form of relationship: linear, quadratic, ...

Degree of relationship: ranges from -1.0 (perfect negative) through 0.0 (none) to +1.0 (perfect positive)

Uses of Correlations

Prediction, validity, and reliability


The Pearson Correlation

Statistical notation (recall SS from ANOVA):

r = Pearson correlation
SP = sum of products of deviations
Mx = mean of X scores
SSx = sum of squares of X scores

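Pearson Correlation

Formula considerations (recall SS from ANOVA):

$SP = \sum (X - M_X)(Y - M_Y)$ (definitional formula)

$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}$ (computational formula)

$SS_X = \sum (X - M_X)^2$

$SS_Y = \sum (Y - M_Y)^2$

$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$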
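The four formulas above map directly onto a few lines of Python. This is a minimal sketch rather than a reference implementation; the function names are hypothetical, and the usage lines borrow the X and Y scores from the worked example that follows.

```python
import math


def sp_definitional(xs, ys):
    """SP = Σ(X − Mx)(Y − My): sum of products of deviations from the means."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP = ΣXY − (ΣX)(ΣY)/n: same value, no deviations needed."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


def sum_of_squares(values):
    """SS = Σ(X − M)²."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def pearson_r(xs, ys):
    """r = SP / √(SSx · SSy)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, n >= 2")
    return sp_definitional(xs, ys) / math.sqrt(sum_of_squares(xs) * sum_of_squares(ys))


x = [0, 10, 4, 8, 8]
y = [1, 3, 1, 2, 3]
print(sp_definitional(x, y), sp_computational(x, y), pearson_r(x, y))  # → 14.0 14.0 0.875
```

The computational form is convenient by hand because it skips the deviations; with very large scores in floating point, the definitional form is usually the numerically safer of the two.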

Pearson Correlation

Step 1: Calculate SP
Step 2: Calculate SS for the X and Y values
Step 3: Calculate r

Step 1: SP

ΣX = 30, ΣY = 10, n = 5 (Mx = 6, My = 2)

Definitional formula:
SP = Σ(X - Mx)(Y - My)
SP = (-6)(-1) + (4)(1) + (-2)(-1) + (2)(0) + (2)(1)
SP = 6 + 4 + 2 + 0 + 2 = 14

Computational formula:
ΣXY = (0)(1) + (10)(3) + (4)(1) + (8)(2) + (8)(3) = 0 + 30 + 4 + 16 + 24 = 74
SP = ΣXY - (ΣX)(ΣY)/n = 74 - [30(10)]/5 = 74 - 60 = 14

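Step 2: SSx and SSy

SSx = Σ(X - Mx)² = (-6)² + 4² + (-2)² + 2² + 2² = 36 + 16 + 4 + 4 + 4 = 64

SSy = Σ(Y - My)² = (-1)² + 1² + (-1)² + 0² + 1² = 1 + 1 + 1 + 0 + 1 = 4

Step 3: r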

r = SP / √(SSx · SSy)

r = 14 / √((64)(4)) = 14 / √256 = 14 / 16 = 0.875

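Interpretation of r

Correlation ≠ causality.

Restricted range: if the data do not represent the full range of scores, be wary of the correlation.

Outliers can have a dramatic effect (Figure 16.9).

Correlation and variability: the coefficient of determination (r²) is the proportion of variability in Y that can be predicted from its relationship with X.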
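Two of these points can be checked numerically. For the worked example, r² = 0.875² ≈ 0.766, so roughly 77% of the variability in Y is predictable from its relationship with X. The outlier effect is shown on made-up data (hypothetical, not from the source): five points with r = 0.8, then the same five points plus one extreme point.

```python
import math


def pearson_r(xs, ys):
    """r = SP / √(SSx · SSy) using deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Coefficient of determination for the worked example (r = 0.875)
r_example = 0.875
r_squared = r_example ** 2  # proportion of Y variability predictable from X

# Hypothetical data, invented only to illustrate a single outlier
x_clean = [1, 2, 3, 4, 5]
y_clean = [2, 1, 4, 3, 5]
r_clean = pearson_r(x_clean, y_clean)
r_outlier = pearson_r(x_clean + [10], y_clean + [0])  # add one extreme point (10, 0)

print(f"r^2 = {r_squared:.4f}")
print(f"r without outlier = {r_clean:.3f}, with outlier = {r_outlier:.3f}")
```

With the added point (10, 0), r drops from 0.8 to about -0.32: one score reverses the direction of the relationship.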


The Process

Step 1: State hypotheses
Non-directional:
H0: ρ = 0 (no population correlation)
H1: ρ ≠ 0 (population correlation exists)
Directional:
H0: ρ ≤ 0 (no positive population correlation)
H1: ρ > 0 (positive population correlation exists)

Step 2: Set criteria: α = 0.05

Step 3: Collect data and calculate the statistic r

Step 4: Make decision: reject or fail to reject H0

Example

Researchers are interested in determining if leg strength is related to jumping ability

Researchers measure leg strength with 1RM squat (lbs) and vertical jump height (inches) in 5 subjects (n = 5)

Step 1: State Hypotheses

Non-Directional

H0: ρ = 0

H1: ρ ≠ 0

Step 2: Set Criteria

Alpha (α) = 0.05

Critical Value:

Use Critical Values for Pearson Correlation Table

Appendix B.6 (p 697)

Information Needed:

df = n - 2 = 5 - 2 = 3

Alpha (α) = 0.05

Directional or non-directional? Non-directional (two-tailed)

Critical value = 0.878
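The table value can also be reproduced numerically. A minimal sketch, assuming (as the Pearson test does) that the scores are bivariate normal and that ρ = 0 under H0: in that case r has a sampling density proportional to (1 - r²)^((n - 4)/2), the two-tailed p-value is the area beyond |r|, and the critical value is the |r| whose tail area equals α. The function names are hypothetical, and the integral is approximated with Simpson's rule rather than looked up.

```python
import math


def _simpson(f, a, b, steps=2000):
    """Composite Simpson's rule on [a, b]; steps is forced to be even."""
    if steps % 2:
        steps += 1
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def p_value_two_tailed(r, n):
    """P(|R| >= |r|) when rho = 0 (bivariate normal data assumed).

    The null density of r is proportional to (1 - r^2)^((n - 4) / 2).
    Substituting r = sin(theta) turns it into cos(theta)^(n - 3),
    which is smooth and easy to integrate numerically.
    """
    if n < 3:
        raise ValueError("n must be at least 3")
    power = n - 3

    def density(theta):
        return math.cos(theta) ** power

    half_pi = math.pi / 2
    tail = _simpson(density, math.asin(min(abs(r), 1.0)), half_pi)
    half_total = _simpson(density, 0.0, half_pi)  # symmetric, so one side suffices
    return tail / half_total


def critical_r(n, alpha=0.05):
    """Smallest |r| with two-tailed p <= alpha, found by bisection (df = n - 2)."""
    lo, hi = 0.0, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if p_value_two_tailed(mid, n) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


print(f"critical r (n = 5, df = 3): {critical_r(5):.3f}")
print(f"p for r = 0.667, n = 5: {p_value_two_tailed(0.667, 5):.3f}")
```

critical_r(5) comes out at about 0.878, matching the tabled value for df = 3, and the p-value for r = 0.667 with n = 5 is about 0.22, well above α = 0.05.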

Step 3: Collect Data and Calculate Statistic

Data:

X (1RM squat, lbs)   Y (vertical jump, in)   XY
200                  25                      5000
180                  22                      3960
225                  27                      6075
300                  27                      8100
160                  25                      4000
ΣX = 1065            ΣY = 126                ΣXY = 27135

Calculate SP

SP = ΣXY - (ΣX)(ΣY)/n
SP = 27135 - [1065(126)]/5
SP = 27135 - 26838
SP = 297

Calculate SSx

X      X - Mx    (X - Mx)²
200    -13       169
180    -33       1089
225    12        144
300    87        7569
160    -53       2809
Mx = 213         SSx = 11780

Calculate SSy

Y      Y - My    (Y - My)²
25     -0.2      0.04
22     -3.2      10.24
27     1.8       3.24
27     1.8       3.24
25     -0.2      0.04
My = 25.2        SSy = 16.8


Calculate r

r = SP / √(SSx · SSy)
r = 297 / √((11780)(16.8))
r = 297 / √197904
r = 297 / 444.86
r = 0.667
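The deviation tables and the final r can be regenerated in a few lines, which is a handy way to catch arithmetic slips. A minimal sketch; the helper name `deviation_table` is hypothetical, and SP uses the computational formula as on the slide.

```python
import math

squat_lbs = [200, 180, 225, 300, 160]  # X: 1RM squat (lbs)
jump_in = [25, 22, 27, 27, 25]         # Y: vertical jump (inches)
n = len(squat_lbs)


def deviation_table(values):
    """Return the mean, rows of (score, deviation, squared deviation), and SS."""
    mean = sum(values) / len(values)
    rows = [(v, v - mean, (v - mean) ** 2) for v in values]
    return mean, rows, sum(sq for _, _, sq in rows)


# SP with the computational formula: ΣXY - (ΣX)(ΣY)/n
sum_xy = sum(x * y for x, y in zip(squat_lbs, jump_in))
sp = sum_xy - sum(squat_lbs) * sum(jump_in) / n

mx, x_rows, ss_x = deviation_table(squat_lbs)
my, y_rows, ss_y = deviation_table(jump_in)

for label, mean, rows, ss in (("X", mx, x_rows, ss_x), ("Y", my, y_rows, ss_y)):
    print(f"{label}    {label} - M    ({label} - M)^2")
    for value, dev, sq in rows:
        print(f"{value:<4} {dev:>7.1f} {sq:>11.2f}")
    print(f"M = {mean:g}, SS = {ss:g}\n")

r = sp / math.sqrt(ss_x * ss_y)
print(f"SP = {sp:g}, r = {r:.4f}")  # → SP = 297, r = 0.6676
```

To four decimals r is 0.6676, which rounds to 0.668; the slide truncates it to 0.667. Either way r falls short of the 0.878 critical value.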

Step 4: Make Decision

r = 0.667 < 0.878 (critical value)

Accept or reject? Because r does not exceed the critical value, fail to reject H0.


Regression

Recall the several uses of correlation: prediction, validity, and reliability.

Regression attempts to predict one variable based on information about the other variable.

Line of best fit

Regression

The line of best fit can be described with the following linear equation:

Y = bX + a, where:
Y = predicted Y value
b = slope of the line
X = any X value
a = intercept

Example: Y = bX + a, where:

Y = total cost (?)
b = cost per hour ($5)
X = number of hours (?)
a = membership cost ($25)

Y = 5X + 25

For X = 10 hours: Y = 5(10) + 25 = 50 + 25 = 75

For X = 30 hours: Y = 5(30) + 25 = 150 + 25 = 175
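The cost example is just the linear equation evaluated at two X values. A tiny sketch; the function and parameter names are hypothetical:

```python
def predicted_cost(hours, cost_per_hour=5, membership=25):
    """Y = bX + a, with slope b = cost per hour and intercept a = membership cost."""
    return cost_per_hour * hours + membership


for hours in (10, 30):
    print(f"{hours} hours -> ${predicted_cost(hours)}")  # → $75, then $175
```

Setting hours to 0 returns the intercept (the $25 membership cost), which is what a represents on the graph.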


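The line of best fit minimizes the distances of the points from the line (specifically, the sum of the squared vertical distances, which is why it is also called the least-squares line).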

Calculation of the Regression Line

Regression line = line of best fit = linear equation

$SP = \sum (X - M_X)(Y - M_Y)$

$SS_X = \sum (X - M_X)^2$

$b = \frac{SP}{SS_X}$

$a = M_Y - b M_X$

Example 16.14, p 557

$M_X = 5$, $M_Y = 6$

$SP = \sum (X - M_X)(Y - M_Y) = 16$

$SS_X = \sum (X - M_X)^2 = 10$

$b = \frac{SP}{SS_X} = \frac{16}{10} = 1.6$

$a = M_Y - b M_X = 6 - 1.6(5) = -2$

$Y = bX + a = 1.6X - 2$
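Example 16.14 supplies only summary values, so a sketch can work from those directly (the function name is hypothetical):

```python
def line_from_summary(sp, ss_x, mean_x, mean_y):
    """Slope b = SP / SSx and intercept a = My - b * Mx from summary statistics."""
    b = sp / ss_x
    a = mean_y - b * mean_x
    return b, a


b, a = line_from_summary(sp=16, ss_x=10, mean_x=5, mean_y=6)
sign = "+" if a >= 0 else "-"
print(f"Y = {b:g}X {sign} {abs(a):g}")  # → Y = 1.6X - 2
```

Plugging X = Mx = 5 into the result returns My = 6, a quick sanity check: the least-squares line always passes through the point (Mx, My).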


Instat – Correlation

Type the data from the sample into a column.

Label the column appropriately: choose "Manage" > "Column Properties" > "Name".

Choose "Statistics" > "Regression" > "Correlation".

Choose the appropriate variables to be correlated and click OK.

Interpret the p-value.

Instat – Regression

Type the data from the sample into a column.

Label the column appropriately: choose "Manage" > "Column Properties" > "Name".

Choose "Statistics" > "Regression" > "Simple".

Choose the appropriate variables for Response (Y) and Explanatory (X).

Check "significance test", "ANOVA table", and "Plots", then click OK.

Interpret the p-value.

Reporting Correlation Results

Information to include: the value of the r statistic, the sample size, and the p-value.

Example: A correlation of the data revealed that strength and jumping ability were not significantly related (r = 0.667, n = 5, p > 0.05).

Correlation matrices are used when the interrelationships of several variables are tested (Table 1, p 541).


Nonparametric Versions

Spearman rho: when at least one of the data sets is ordinal

Point-biserial correlation: when one set of data is ratio/interval and the other is dichotomous (e.g., male vs. female, success vs. failure)

Phi coefficient: when both data sets are dichotomous
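Spearman's rho is the Pearson correlation computed on ranks, with tied scores sharing the average of the ranks they occupy; likewise, the point-biserial and phi coefficients equal the Pearson r computed with the dichotomous variable(s) coded 0/1. A minimal sketch of the ranking route, with hypothetical function names, applied to the strength and jump data purely as an illustration:

```python
import math


def pearson_r(xs, ys):
    """r = SP / √(SSx · SSy) using deviation scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    ss_x = sum((x - mx) ** 2 for x in xs)
    ss_y = sum((y - my) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


def average_ranks(values):
    """Rank scores 1..n; tied scores get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # sorted positions start..end hold ranks start+1..end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rho = Pearson r applied to the ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


squat_lbs = [200, 180, 225, 300, 160]
jump_in = [25, 22, 27, 27, 25]
print(average_ranks(jump_in))  # → [2.5, 1.0, 4.5, 4.5, 2.5]
print(round(spearman_rho(squat_lbs, jump_in), 3))  # → 0.791
```

Ranking discards the size of the gaps between scores, so the extreme 300 lb squat carries no more weight than any other top-ranked score.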

Violation of Assumptions: Nonparametric Version (Friedman Test, not covered)

When to use the Friedman test:

Related-samples design with three or more groups

Scale of measurement assumption violation: ordinal data

Normality assumption violation: regardless of scale of measurement

Textbook Assignment

Problems: 5, 7, 10, 23 (with post hoc)