1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative...

42
1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce [email protected]
  • date post

    18-Dec-2015
  • Category

    Documents

  • view

    220
  • download

    2

Transcript of 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative...

Page 1: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

1

Lecture 8Relationships between Scale variables: Regression Analysis

Graduate SchoolQuantitative Research Methods

Gwilym [email protected]

Page 2: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

2

Notices: Register

Page 3: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

3

Plan:

1. Linear & Non-linear Relationships 2. Fitting a line using OLS 3. Inference in Regression 4. Ommitted Variables & R2 5. Types of Regression Analysis 6. Properties of OLS Estimates 7. Assumptions of OLS 8. Doing Regression in SPSS

Page 4: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

4

1. Linear & Non-linear relationships between variables

Often of greatest interest in social science is investigation into relationships between variables:– is social class related to political perspective?– is income related to education?– is worker alienation related to job monotony?

We are also interested in the direction of causation, but this is more difficult to prove empirically:– our empirical models are usually structured

assuming a particular theory of causation

Page 5: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

5

Relationships between scale variables

The most straight forward way to investigate evidence for relationship is to look at scatter plots:– traditional to:

• put the dependent variable (I.e. the “effect”) on the vertical axis

– or “y axis”

• put the explanatory variable (I.e. the “cause”) on the horizontal axis

– or “x axis”

Page 6: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

6

Scatter plot of IQ and Income:

IQ

1601401201008060

INC

OM

E

40000

30000

20000

10000

Page 7: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

7

We would like to find the line of best fit:

IQ

1601401201008060

INC

OM

E

40000

30000

20000

10000

bxay ˆ

line of slope

intercept

where,

b

ya

IQbaINCOME

Page 8: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

8

What does the output mean?

Page 9: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

9

Sometimes the relationship appears non-linear:

IQ2

3002001000

INC

OM

E

40000

30000

20000

10000

Page 10: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

10

… and so a straight line of best fit is not always very satisfactory:

IQ2

3002001000

INC

OM

E

40000

30000

20000

10000

Page 11: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

11

Could try a quadratic line of best fit:

IQ2

3002001000

INC

OM

E

40000

30000

20000

10000

Page 12: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

12

But we can simulate a non-linear relationship by first transforming one of the variables:

IQ2

3002001000

INC

OM

E

40000

30000

20000

10000

IQ2

3002001000

INC

_S

Q

1000000000

800000000

600000000

400000000

200000000

0

Page 13: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

13

IQ2

3002001000

INC

OM

E

40000

30000

20000

10000

IQ2_LN

5.85.65.45.25.04.84.64.44.2

INC

OM

E

40000

30000

20000

10000

Page 14: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

14

… or a cubic line of best fit:(overfitted?)

IQ2

3002001000

INC

OM

E

40000

30000

20000

10000

Page 15: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

15

Or could try two linear lines:“structural break”

IQ2

3002001000

INC

OM

E

40000

30000

20000

10000

Page 16: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

16

2. Fitting a line using OLS

The most popular algorithm for drawing the line of best fit is one that minimises the sum of squared deviations from the line to each observation:

n

iii yy

1

2)ˆ(minWhere:

yi = observed value of y

= predicted value of yi

= the value on the line of best fit corresponding to xi

iy

Page 17: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

17

Regression estimates of a, b:or Ordinary Least Squares (OLS):

This criterion yields estimates of the slope b and y-intercept a of the straight line:

22 )( xxn

yxxynb

xbya

Page 18: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

18

3. Inference in Regression: Hypothesis tests on the slope coefficient:

Regressions are usually run on samples, so what can we say about the population relationship between x and y?

Repeated samples would yield a range of values for estimates of b ~ N(, sb)

• I.e. b is normally distributed with mean = = population mean = value of b if regression run on population

If there is no relationship in the population between x and y, then = 0, & this is our H0

Page 19: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

19

What does the standard error mean?

Page 20: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

20

Hypothesis test on b: (1) H0: = 0

(I.e. slope coefficient, if regression run on population, would = 0)

H1: (2) = 0.05 or 0.01 etc.

(3) Reject H0 iff P < • (N.B. Rule of thumb: P < 0.05 if tc 2, and P < 0.01 if tc 2.6)

(4) Calculate P and conclude.

bbb s

b

s

b

s

bt

0 2ndf

Page 21: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

21

Example using SPSS output: (1) H0: no relationship between house price and floor area.

H1: there is a relationship (2), (3), (4):

P = 1- CDF.T(24.469,554) = 0.000000

Reject H0

Coefficientsa

-10029.0 3242.545 -3.093 .002

638.825 26.108 .721 24.469 .000

(Constant)

Floor Area (sq meters)

Model1

B Std. Error

UnstandardizedCoefficients

Beta

Standardized

Coefficients

t Sig.

Dependent Variable: Purchase Pricea.

Page 22: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

22

4. Ommitted Variables & R2 Q/ is floor area the only factor?How much of the variation in Price does it explain?

Floor Area (sq meters)

3002001000

Pu

rch

ase

Pri

ce300000

200000

100000

0

Page 23: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

23

R-square R-square tells you how much of the variation in y is explained by the

explanatory variable x– 0 < R2 < 1 (NB: you want R2 to be near 1).

– If more than one explanatory variable, use Adjusted R2

Model Summary

.721a .519 .519 26925.18Model1

R R SquareAdjustedR Square

Std. Errorof the

Estimate

Predictors: (Constant), Floor Area (sq meters)a.

Page 24: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

24

Example: 2 explanatory variablesCoefficientsa

-23817.3 3623.761 -6.573 .000

507.271 30.722 .572 16.511 .000

24951.769 3401.282 .254 7.336 .000

(Constant)

Floor Area (sq meters)

Number of Bathrooms

Model1

B Std. Error

UnstandardizedCoefficients

Beta

Standardized

Coefficients

t Sig.

Dependent Variable: Purchase Pricea.

Model Summary

.750a .562 .560 25726.74Model1

R R SquareAdjustedR Square

Std. Errorof the

Estimate

Predictors: (Constant), Number of Bathrooms ,Floor Area (sq meters)

a.

Page 25: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

25

Scatter plot (with floor spikes)

Purchase Price

300

100000

200000

300000

200

Floor Area (sq meters)3.53.0100 2.5

Number of Bathrooms2.01.51.0

Page 26: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

26

3D Surface Plots:Construction, Price & UnemploymentQ = -246 + 27P - 0.2P2 - 73U + 3U2

Q

Ut-1

P

020

4060

80

0

5

1015

-500

0

500

020

4060

80

0

5

1015

Page 27: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

27

Construction Equation in a Slump

Q = 315 + 4P - 73U + 5U2

020

4060

80

0

510

15

0

200

400

600

800

020

4060

80

0

510

15

Page 28: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

28

5. Types of regression analysis: Univariate regression: one explanatory variable

– what we’ve looked at so far in the above equations

Multivariate regression: >1 explanatory variable– more than one equation on the RHS

Log-linear regression & log-log regression:– taking logs of variables can deal with certain types of non-

linearities & useful properties (e.g. elasticities)

Categorical dependent variable regression:– dependent variable is dichotomous -- observation has an

attribute or not • e.g. MPPI take-up, unemployed or not etc.

Page 29: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

29

6. Properties of OLS estimators

OLS estimates of the slope and intercept parameters have been shown to be BLUE (provided certain assumptions are met):

• Best• Linear• Unbiased• Estimator

Page 30: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

30

“Best” in that they have the minimum variance compared with other estimators (i.e. given repeated samples, the OLS estimates for α and β vary less between samples than any other sample estimates for α and β).

“Linear” in that a straight line relationship is assumed.

“Unbiased” because, in repeated samples, the mean of all the estimates achieved will tend towards the population values for α and β.

“Estimates” in that the true values of α and β cannot be known, and so we are using statistical techniques to arrive at the best possible assessment of their values, given the information available.

Page 31: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

31

7. Assumptions of OLS:

For estimation of a and b to be BLUE and for regression inference to be correct:1. Equation is correctly specified:

– Linear in parameters (can still transform variables)– Contains all relevant variables– Contains no irrelevant variables– Contains no variables with measurement errors

2. Error Term has zero mean3. Error Term has constant variance

Page 32: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

32

4. Error Term is not autocorrelated– I.e. correlated with error term from previous time

periods

5. Explanatory variables are fixed– observe normal distribution of y for repeated fixed

values of x

6. No linear relationship between RHS

variables – I.e. no “multicolinearity”

Page 33: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

33

8. Doing Regression analysis in SPSS To run

regression analysis in SPSS, click on Analyse, Regression, Linear:

Page 34: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

34

Select your dependent (i.e. ‘explained’) variable and independent (i.e. ‘explanatory’) variables:

Page 35: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

35

e.g. Floor area and bathrooms:Floor area = + Number of bathrooms +

Model Summary

.584a .341 .340 35.58Model1

R R SquareAdjustedR Square

Std. Errorof the

Estimate

Predictors: (Constant), Number of Bathroomsa.

ANOVAb

362375.7 1 362375.7 286.292 .000a

701228.5 554 1265.755

1063604 555

Regression

Residual

Total

Model1

Sum ofSquares df

MeanSquare F Sig.

Predictors: (Constant), Number of Bathroomsa.

Dependent Variable: Floor Area (sq meters)b.

Page 36: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

36

Coefficientsa

40.928 4.700 8.708 .000

64.622 3.819 .584 16.920 .000

(Constant)

Number of Bathrooms

Model1

B Std. Error

UnstandardizedCoefficients

Beta

Standardized

Coefficients

t Sig.

Dependent Variable: Floor Area (sq meters)a.

Page 37: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

37

Confidence Intervals for regression coefficients Population slope coefficient CI:

Rule of thumb:

bi SEtb

bSEb 2

Page 38: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

38

e.g. regression of floor area on number of bathrooms, CI on slope:

= 64.6 2 3.8

= 7.6

95% CI = ( 57, 72)

Page 39: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

39

Confidence Intervals in SPSS:Analyse, Regression, Linear, click on Statistics and

select Confidence intervals:

Page 40: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

40

Our rule of thumb said 95% CI for slope = ( 57, 72). How does this compare?

Coefficientsa

40.928 4.700 8.708 .000 31.697 50.160

64.622 3.819 .584 16.920 .000 57.120 72.123

(Constant)

BATHROOM Numberof Bathrooms

B Std. Error

UnstandardizedCoefficients

Beta

Standardized

Coefficients

t Sig. Lower Bound Upper Bound

95% Confidence Interval for B

Dependent Variable: FLOORARE Floor Area (sq meters)a.

Page 41: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

41

Past Paper: (C2)Relationships (30%)

Suppose you have a theory that suggests that time watching TV is determined by gregariousness the less gregarious, the more time spent watching TV

Use a random sample of 60 observations from the TV watching data to run a statistical test for this relationship that also controls for the effects of age and gender.

Carefully interpret the output from this model and discuss the statistical robustness of the results.

Page 42: 1 Lecture 8 Relationships between Scale variables: Regression Analysis Graduate School Quantitative Research Methods Gwilym Pryce g.pryce@socsci.gla.ac.uk.

42

Reading:

Regression Analysis:– *Field, A. chapters on regression.– *Moore and McCabe Chapters on regression.– Kennedy, P. ‘A Guide to Econometrics’– Bryman, Alan, and Cramer, Duncan (1999)

“Quantitative Data Analysis with SPSS for Windows: A Guide for Social Scientists”, Chapters 9 and 10.

– Achen, Christopher H. Interpreting and Using Regression (London: Sage, 1982).