Correlations & Linear Regressions

Correlations & Linear Regressions Block 3

Transcript of Correlations & Linear Regressions

Page 1: Correlations & Linear Regressions

Correlations & Linear Regressions

Block 3

Page 2: Correlations & Linear Regressions

Question

• You can ask questions other than ‘Are two conditions different?’

• What relationship or association exists between two or more variables?
– Positively related: as x goes, so goes y.
– Negatively related: whatever x does, y does the opposite.
– No relationship.

Page 3: Correlations & Linear Regressions

Example of linear correlation

[Figure: scatter plot of Record Sales (thousands) and Advertising Budget (thousands of pounds).]

Page 4: Correlations & Linear Regressions

Covariance

• An association is indexed by covariance.
– Are changes in one variable met with a similar or opposite change in another variable?

• Variance: $s^2 = \frac{SS}{N-1}$
• Sum of squares: $SS = \sum (x_i - \bar{x})^2$
– We squared the error scores when looking for variance within one variable.
– If interested in the association between two variables, we multiply the error scores together.

Page 5: Correlations & Linear Regressions

Calculating Covariance

• If deviations from the mean go in the same direction for both variables, you’ll get a positive number.

• If deviations from the mean go in opposite directions (one negative, one positive), you’ll get a negative number.

$cov(x,y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{N-1}$
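The covariance formula can be sketched in a few lines of Python. This is a minimal illustration, not the deck's own material: the function name `covariance` and the data values are invented for the example.

```python
def covariance(x, y):
    """Sample covariance: summed cross-products of deviations, divided by N - 1."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Multiply each pair of error scores (deviations from the mean) together.
    cross_products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(cross_products) / (len(x) - 1)


# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print("cov(x, y)  =", covariance(x, y))                    # deviations mostly agree: positive
print("cov(x, -y) =", covariance(x, [-v for v in y]))      # deviations oppose: negative
```

Flipping the sign of one variable flips the sign of the covariance, which is the "same direction versus opposite direction" point made on the slide.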

Page 6: Correlations & Linear Regressions

Interpreting linear relations

• Correlation coefficient [r] = linear relationship between two variables.

• $r^2$ = proportion of common variation in the two variables (strength or magnitude of the relationship).

• Outliers? A single outlier can greatly influence the strength of a correlation.
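As a rough sketch of how r and $r^2$ are obtained: the slides do not spell out the formula, so this assumes the standard definition of Pearson's r as the covariance divided by the product of the two standard deviations. The function name `pearson_r` and the data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance standardised by both standard deviations.

    The N - 1 terms in the covariance and the standard deviations cancel,
    so only the raw sums of squares and cross-products are needed.
    """
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
print(f"r   = {r:.3f}")
print(f"r^2 = {r ** 2:.3f}  (proportion of shared variation)")
```

Unlike the covariance, r does not depend on the units of measurement, which is why it can be compared across data sets.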

Page 7: Correlations & Linear Regressions

Effect of outliers

One approach to dealing with outliers is to see if they are non-representative (i.e., at the far end of the normal distribution). If so, they should be removed.
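To see how much a single point can matter, the sketch below computes r with and without one extreme observation. The data and the outlier are invented for the illustration, and `pearson_r` is a hypothetical helper using the standard Pearson formula.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data with a clear positive trend.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

# The same data plus one extreme, made-up observation.
x_out = x + [20]
y_out = y + [-10]

r_without = pearson_r(x, y)
r_with = pearson_r(x_out, y_out)
print(f"r without outlier: {r_without:.3f}")
print(f"r with outlier:    {r_with:.3f}")
```

In this made-up case one point is enough to turn a strong positive correlation into a negative one, which is why a scatter plot is worth checking before interpreting r.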

Page 8: Correlations & Linear Regressions

Conducting the analysis

• Each variable gets a separate column.
• Create a scatter plot to get a visual impression of the data.
– Direction of relationship
– Strength of relationship
– Extreme values (outliers), which can greatly influence the correlation coefficient.

Page 9: Correlations & Linear Regressions

When doing t-tests or ANOVAs, especially repeated measures, each row was data from 1 person.

Each observation is its own data row.

No collapsing of data.

Page 10: Correlations & Linear Regressions

Types of correlations

• Bivariate correlation: between two variables
– Pearson’s correlation coefficient for parametric data (interval or ratio data)

• Partial correlation: relationship between two variables while ‘controlling’ the effect of one or more additional variables.
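A partial correlation with one control variable can be sketched from the three pairwise correlations. The slides do not give the formula, so this assumes the standard first-order partial correlation; the names `pearson_r` and `partial_r` and all data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def partial_r(x, y, z):
    """Correlation between x and y with the linear effect of z removed from both."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical data: x and y are both built from z plus unrelated noise,
# so most of their association comes from the third variable.
z = [1, 2, 3, 4, 5, 6, 7, 8]
x = [2, 1, 4, 3, 6, 5, 8, 7]
y = [2, 3, 2, 3, 6, 7, 6, 7]

raw = pearson_r(x, y)
controlled = partial_r(x, y, z)
print(f"r(x, y)             = {raw:.3f}")
print(f"r(x, y | z removed) = {controlled:.3f}")
```

Here the raw correlation is high, but it largely disappears once z is controlled for, the kind of pattern a partial correlation is meant to reveal.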

Page 11: Correlations & Linear Regressions
Page 12: Correlations & Linear Regressions
Page 13: Correlations & Linear Regressions
Page 14: Correlations & Linear Regressions

Partial Correlations

Page 15: Correlations & Linear Regressions
Page 16: Correlations & Linear Regressions

Drawing conclusions

• Correlations only inform us about a relationship between two or more variables.

• Not able to talk about directionality or causality.
– An increase in X does not CAUSE an increase in Y or vice versa. The cause could be an unmeasured third variable.
– We don’t know which variable is influencing and which is being influenced.

Page 17: Correlations & Linear Regressions

$R^2$

• By squaring our test statistic (r), we can tell how much of the total variance in the data for variable x is accounted for by the relationship with variable y.

• $R^2 = .238^2 = .057$ = 5.7% of variance. (94% of variability still unaccounted for!)

• For height x age: $.763^2 = .582$ = 58%
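The arithmetic on this slide can be checked directly. The helper name `shared_variance` is made up for the example; the two r values are the ones quoted above.

```python
def shared_variance(r):
    """Proportion of variance accounted for: the squared correlation."""
    return r * r


for label, r in [("first example", 0.238), ("height x age", 0.763)]:
    r2 = shared_variance(r)
    print(f"{label}: r = {r}, r^2 = {r2:.4f}, "
          f"accounted for = {r2 * 100:.1f}%, unaccounted = {(1 - r2) * 100:.1f}%")
```

A correlation can look respectable and still leave most of the variance unexplained, because squaring shrinks any r below 1.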

Page 18: Correlations & Linear Regressions

Non-parametric correlations

• Spearman’s Rho
– Ranks the data and then applies Pearson’s equation to ranks.

• Kendall’s Tau
– Preferred for small data sets with many tied rankings.

• Biserial correlations:
– When one variable is dichotomous
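Spearman's rho, as described above, is Pearson's equation applied to ranks. A minimal sketch follows, assuming tied values share the average of their ranks (a common convention that the slides do not state); the function names and data are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_yy = sum((yi - mean_y) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranked data."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical data: perfectly monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson r    = {pearson_r(x, y):.3f}")
print(f"Spearman rho = {spearman_rho(x, y):.3f}")
```

Because only the ordering matters after ranking, rho reaches 1 for any strictly increasing relationship, while Pearson's r stays below 1 when the relationship is curved.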

Page 19: Correlations & Linear Regressions
Page 20: Correlations & Linear Regressions

Regressions

• Correlations detect associations between two variables.
– Say nothing of causal relationships or directionality
– Can’t predict behavior on one variable given a value for another variable

• With regression models we can predict variable Y based on variable X.

Page 21: Correlations & Linear Regressions

Simple Linear Regressions

• A line is fit to the data (similar to the correlation line).
– Best line is one that produces the smallest sum of squares from regression line to data points.

• Evaluation based on improvement of prediction relative to using the mean or some other model.

Page 22: Correlations & Linear Regressions

Hypothetical Data

[Figure: scatter plot of the hypothetical data (axes labelled outcome variable and predictor variable), with the mean marked.]

Page 23: Correlations & Linear Regressions

Error from Mean

[Figure: scatter plot (axes labelled outcome variable and predictor variable), with the mean marked.]

Page 24: Correlations & Linear Regressions

Error from regression line

[Figure: scatter plot (axes labelled outcome variable and predictor variable), with the mean and the regression line marked.]

Page 25: Correlations & Linear Regressions

Regression Results

• The best regression line has the lowest sum of squared errors.

• Evaluation of the regression model is achieved via
– $R^2$ = tells you % of variance accounted for by the regression line (as with correlations)
– F = evaluates improvement of regression line compared to the mean as a model of the data.
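The comparison between the mean and the regression line can be sketched as follows. This assumes ordinary least squares for the fit and the usual F ratio for one predictor (model mean square over residual mean square, with 1 and N - 2 degrees of freedom), neither of which is written out on the slides. Function names and data are hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for one predictor."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def evaluate(x, y, intercept, slope):
    """Compare errors from the mean with errors from the regression line."""
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)  # error when the mean is the model
    ss_residual = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_model = ss_total - ss_residual               # improvement due to the line
    r_squared = ss_model / ss_total
    df_model, df_residual = 1, len(x) - 2
    f_ratio = (ss_model / df_model) / (ss_residual / df_residual)
    return ss_total, ss_residual, r_squared, f_ratio


# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

intercept, slope = fit_line(x, y)
ss_total, ss_residual, r_squared, f_ratio = evaluate(x, y, intercept, slope)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
print(f"SS from mean = {ss_total:.2f}, SS from line = {ss_residual:.2f}")
print(f"R^2 = {r_squared:.2f}, F = {f_ratio:.2f}")
```

The smaller the squared error left around the line relative to the squared error around the mean, the larger both $R^2$ and F become.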

Page 26: Correlations & Linear Regressions

Simple Linear Regression in SPSS

• Data input in SPSS as for correlation.
• Only one predictor (IV) and one outcome (DV) allowed.

• Coefficient table allows you to predict DV for new values of IV.

Page 27: Correlations & Linear Regressions
Page 28: Correlations & Linear Regressions
Page 29: Correlations & Linear Regressions

$R^2$ = proportion of variance accounted for by the regression (biased)

Adjusted $R^2$ = adjusts for bias

Is the model an improvement over the mean or over a prior model?

β = change in outcome resulting from a unit change in predictor

Tests null hypothesis for relationship between IV and DV

Correlation between expected y and y

Page 30: Correlations & Linear Regressions

Predicting New Values

• Equation for line:
– Y = outcome value
– X = predictor value
– β0 = intercept (the constant in the table; value of Y without predictors)
– β1 = slope of line (value for predictor)
– ε = residual (error)

$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
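Once the intercept and slope have been read from a coefficient table, predicting a new value is a single line of arithmetic. The coefficient values below are invented for illustration and do not come from any output in the deck; the residual term drops out because it is unknown for a new case.

```python
def predict(b0, b1, x_new):
    """Predicted outcome for a new predictor value: intercept + slope * x."""
    return b0 + b1 * x_new


# Hypothetical coefficients, as they might be read from a coefficient table.
b0 = 2.2   # intercept (the "constant" row)
b1 = 0.6   # slope (the row for the predictor)

for x_new in [0, 3, 10]:
    print(f"x = {x_new:>2} -> predicted y = {predict(b0, b1, x_new):.2f}")
```

Predictions for x values far outside the range of the original data rely on the line continuing to hold there, so they deserve extra caution.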

Page 31: Correlations & Linear Regressions

Multiple Regression

• Extends principles of simple linear regression to situations with multiple predictor variables.

• We seek to find the linear combination of predictors that correlates maximally with the outcome variable.

$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_n X_{ni} + \varepsilon_i$

(Each $\beta X$ term is one predictor: Predictor 1, Predictor 2, and so on.)

Page 32: Correlations & Linear Regressions

Multiple Regression, cont’d

• $R^2$ gives the % variance accounted for by the model consisting of the multiple predictors.

• t-tests tell you the independent contribution of each predictor in capturing the data.
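A small sketch of fitting two predictors at once, assuming ordinary least squares solved through the normal equations. This is an illustration rather than a description of what SPSS does internally, and the helper names `solve` and `fit_multiple` are hypothetical. The data are generated without noise from y = 1 + 2·x1 + 3·x2, so the recovered coefficients can be checked by eye.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: use the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_multiple(rows, y):
    """Least-squares coefficients [b0, b1, ..., bn] for rows of predictor values."""
    design = [[1.0] + list(row) for row in rows]  # leading 1 carries the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical noiseless data: y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

coefficients = fit_multiple(rows, y)
print("b0, b1, b2 =", [round(c, 6) for c in coefficients])
```

With real, noisy data the coefficients would not match any true values exactly, and the t-tests mentioned above would be needed to judge each predictor's contribution.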

Page 33: Correlations & Linear Regressions

Descriptives and other stats from here

Page 34: Correlations & Linear Regressions

Evaluation of model as a whole

Linear relationship of each factor to the dependent variable

$R^2$ was .33, now .44

Page 35: Correlations & Linear Regressions

Logistic Regression

• If you are instead interested in predicting class membership, or seeing how well your variables predict class membership, then you can do a Logistic Regression.

• You can use multiple factors as predictors, just like in multiple regression.
– You can also enter interactions between factors.
– Categorical and continuous factors can be combined, but you must tell SPSS which factors are continuous.
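To make "predicting class membership" concrete, here is a minimal one-predictor logistic regression. It is fitted by plain gradient ascent on the log-likelihood, a simple stand-in chosen for readability; statistical packages typically use a faster iterative maximum-likelihood routine. The function names, learning rate, step count and data are all assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(x, y, learning_rate=0.05, steps=5000):
    """Estimate intercept and slope by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(steps):
        # Error = observed class (0 or 1) minus predicted probability.
        errors = [yi - sigmoid(b0 + b1 * xi) for xi, yi in zip(x, y)]
        b0 += learning_rate * sum(errors) / n
        b1 += learning_rate * sum(e * xi for e, xi in zip(errors, x)) / n
    return b0, b1


def predict_probability(b0, b1, x_new):
    """Predicted probability of belonging to class 1."""
    return sigmoid(b0 + b1 * x_new)


# Hypothetical data: class 1 becomes more likely as x increases, with some overlap.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(x, y)
for x_new in [1, 4.5, 8]:
    p = predict_probability(b0, b1, x_new)
    print(f"x = {x_new}: P(class 1) = {p:.2f} -> predicted class {int(p >= 0.5)}")
```

The model outputs a probability between 0 and 1; turning it into a class requires a cutoff, and 0.5 is used here only as a conventional choice.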

Page 36: Correlations & Linear Regressions

Summary

• Correlations tell you about the relationship between 2 variables.

• Regressions allow you to predict an outcome variable; whether a causal inference is justified still depends on the study design.

• Logistic regressions are good for predicting group membership.

• Next, I’ll tell you about some new developments in my world of stats, including more sophisticated regression techniques and a method to compare changes to data over time.

Page 37: Correlations & Linear Regressions

Informative Websites

• http://faculty.vassar.edu/lowry/VassarStats.html

• http://www.uwsp.edu/psych/stat/

• http://www.richland.edu/james/lecture/m170/

• http://calcnet.mth.cmich.edu/org/spss/index.htm for SPSS videos!

• Or, just google the test you are interested in.