PPA 501 – Analytical Methods in Administration Lecture 8 – Linear Regression and Correlation.

Date posted: 19-Dec-2015
PPA 501 – Analytical Methods in Administration

Lecture 8 – Linear Regression and Correlation

Scattergrams

The first step in examining a relationship between interval-ratio variables is to prepare a scattergram.

A scattergram, like a bivariate table, has two dimensions. The scores on the independent variable (X) are arrayed along the horizontal axis. The scores on the dependent variable (Y) are arrayed along the vertical axis.

Each dot in the scattergram represents the X and Y scores for a case.

Scattergrams

The pattern can be made clearer by drawing a straight line through the data that comes as close as possible to all of the points in the scattergram.

Two variables are associated if the values of Y are conditional on the values of X (increase or decrease depending on X).

Scattergrams

The strength of the relationship can be visualized by examining how tightly the points fit around the line.

Positive relationships will slope up; negative relationships will slope down. Zero relationships will appear as a cloud of random points.
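The direction of a relationship can also be checked numerically. The sketch below is illustrative only: the function names and the scores are invented for this example, and it assumes the sign of the summed cross-products of deviations is used to label the direction in which a fitted line would slope.

```python
def covariation(xs, ys):
    """Sum of cross-products of deviations from the means of X and Y."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def direction(xs, ys):
    """Label the direction in which the best-fitting line would slope."""
    sp = covariation(xs, ys)
    if sp > 0:
        return "positive"
    if sp < 0:
        return "negative"
    return "none"


# Hypothetical scores, invented for illustration only.
x = [1, 2, 3, 4, 5]
print(direction(x, [2, 3, 5, 4, 6]))  # Y tends to rise as X rises
print(direction(x, [6, 4, 5, 3, 2]))  # Y tends to fall as X rises
```

With real data the sum is rarely exactly zero, so a value close to zero relative to the spread of the scores is what a "cloud of random points" would look like here.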

Scattergrams – Positive Relationship

[Figure: Republicanism by Conservatism. Scattergram of 7-point scale party identification (Y axis) by the liberal-conservative 7-point scale (X axis), with fitted line. Rsq = 0.2350.]

Scattergrams – Negative Relationship

[Figure: Support for Abortion by Conservatism. Scattergram of "when should abortion be allowed by law" (Y axis) by the liberal-conservative 7-point scale (X axis), with fitted line. Rsq = 0.1192.]

Scattergram – Little or No Relationship

[Figure: Conservatism by Age. Scattergram of the liberal-conservative thermometer index (Y axis) by respondent age (X axis), with fitted line. Rsq = 0.0050.]

Regression and Prediction

One key assumption underlying the statistical techniques to be discussed here is that the two variables have an essentially linear relationship. That is, the scattergram approximates a straight line.

Non-linearity requires adjustments to the model that will not be discussed in this course.

Regression and Prediction

[Figure: Mean Party ID by Level of Conservatism. Mean of party ID (Y axis, 1.0 to 7.0) for each category of the liberal-conservative 7-point scale (X axis): extremely liberal, liberal, slightly liberal, moderate, slightly conservative, conservative, extremely conservative.]

Regression and Prediction

The mean of any distribution of scores is the point around which the variation of the scores, measured as the sum of squared deviations, is at a minimum (i.e., the sum of squared deviations around any other point is larger).

The same is true of conditional means.

$$\sum (X_i - \bar{X})^2 = \text{minimum}$$
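The least-squares property of the mean can be checked directly. This is a minimal sketch with invented scores; `sum_sq_dev` is a hypothetical helper name, not something from the lecture.

```python
def sum_sq_dev(scores, point):
    """Sum of squared deviations of the scores around a given point."""
    return sum((x - point) ** 2 for x in scores)


# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 5, 10]
mean = sum(scores) / len(scores)

# Compare the mean with one point below it and one point above it.
for point in (mean - 1, mean, mean + 1):
    print(point, sum_sq_dev(scores, point))
```

The sum of squared deviations is smallest at the mean and grows as the reference point moves away from it in either direction.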

Regression and Prediction

Because the conditional means will rarely lie exactly in a straight line, the best-fitting line is the one that comes as close as possible to all of the conditional means.

The formula for the best fitting line is:

$$Y = a + bX$$

where

Y = score on the dependent variable

a = the Y intercept

b = the slope of the regression line

X = score on the independent variable

Regression and Prediction

The Y intercept is the point at which the regression line crosses the Y axis.

The slope of the least-squares regression line is the amount of change in the dependent variable (Y) that is produced by a unit change in the independent variable (X).

You can use the formula for the regression line to predict scores not in the data set.
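Prediction from the regression line amounts to plugging a score on X into Y = a + bX. The sketch below assumes the intercept (0.2346) and slope (1.0617) from the lecture's worked ideology and party ID example; the function name `predict` and the chosen X score are illustrative.

```python
# Intercept and slope as reported in the lecture's worked example.
A = 0.2346
B = 1.0617


def predict(x, a=A, b=B):
    """Predicted score on Y for a given score on X, using Y = a + bX."""
    return a + b * x


# Predicted party ID for a hypothetical respondent scoring 5 on ideology.
print(round(predict(5), 2))
```

For X = 5 the predicted Y is about 5.54. Predictions are most trustworthy for X scores inside the range covered by the data.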

Regression and Prediction

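$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \quad \text{(theoretical formula)}$$

$$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} \quad \text{(computational formula)}$$

$$a = \bar{Y} - b\bar{X}$$

where

b = the slope

a = the intercept

N = the number of cases

ΣXY = the summation of the crossproducts of the scores

ΣX = the summation of the scores on X

ΣY = the summation of the scores on Y

ΣX² = the summation of the squared scores on X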
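The theoretical and computational formulas for the slope are algebraically equivalent, which can be confirmed on a small data set. The scores and function names below are invented for illustration.

```python
def slope_theoretical(xs, ys):
    """Slope from deviations around the means of X and Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def slope_computational(xs, ys):
    """Slope from raw sums: N, sum of X, sum of Y, sum of XY, sum of X squared."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
print(slope_theoretical(xs, ys), slope_computational(xs, ys))
```

Both functions return a slope of 0.9 for these scores. The computational form is convenient for hand calculation because it needs only column totals.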

Regression and Prediction

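$$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{(10)(215) - (43)(48)}{(10)(193) - (43)^2} = \frac{2150 - 2064}{1930 - 1849} = \frac{86}{81} = 1.0617$$

$$a = \bar{Y} - b\bar{X} = 4.8 - (1.0617)(4.3) = 4.8 - 4.5654 = 0.2346$$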
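The worked example can be reproduced from its column totals. The sketch below assumes the sums shown in the lecture (N = 10, ΣX = 43, ΣY = 48, ΣXY = 215, ΣX² = 193); the function names are illustrative.

```python
def slope_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Computational formula for the slope b."""
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)


def intercept_from_sums(n, sum_x, sum_y, b):
    """Intercept a = mean of Y minus b times mean of X."""
    return sum_y / n - b * (sum_x / n)


# Column totals taken from the lecture's ideology and party ID example.
b = slope_from_sums(n=10, sum_x=43, sum_y=48, sum_xy=215, sum_x2=193)
a = intercept_from_sums(n=10, sum_x=43, sum_y=48, b=b)
print(round(b, 4), round(a, 4))  # 1.0617 0.2346
```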

The Correlation Coefficient (Pearson’s r)

The slope of the regression line is a measure of the effect of X on Y.

Because it is measured in the units of measurement of Y (per unit of X), it is not restricted to fall between zero and one.

As a result, researchers rely on Pearson’s r as a measure of interval-ratio association.

Pearson’s r varies from -1 to +1, with 0 indicating no linear association.

The Correlation Coefficient (Pearson’s r)

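$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}} \quad \text{(theoretical formula)}$$

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} \quad \text{(computational formula)}$$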

The Correlation Coefficient (Pearson’s r)

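$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} = \frac{(10)(215) - (43)(48)}{\sqrt{\left[(10)(193) - (43)^2\right]\left[(10)(266) - (48)^2\right]}}$$

$$r = \frac{2150 - 2064}{\sqrt{(81)(356)}} = \frac{86}{169.8117} = .5064$$

$$r^2 = .2565$$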

The r suggests a moderately strong, positive relationship.
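The same calculation can be sketched in code. It assumes the column totals from the lecture's example (N = 10, ΣX = 43, ΣY = 48, ΣXY = 215, ΣX² = 193, ΣY² = 266); the function name is illustrative.

```python
import math


def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Computational formula for Pearson's r."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Column totals taken from the lecture's ideology and party ID example.
r = pearson_r_from_sums(n=10, sum_x=43, sum_y=48,
                        sum_xy=215, sum_x2=193, sum_y2=266)
print(round(r, 4), round(r ** 2, 4))  # 0.5064 0.2565
```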

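Interpreting the Correlation Coefficient: r²

The interpretation of Pearson’s r focuses on r², the coefficient of determination.

The coefficient of determination is based on the following formula for r-squared.

$$r^2 = \frac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$

where Y' is the score on Y predicted by the regression line.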

Interpreting the Correlation Coefficient: r²

R-squared can be interpreted in two ways: as the percentage of total variation explained by the independent variable, or as the proportional reduction in error.

Thus, the r-squared in the ideology-party ID problem calculated on slide 19 is .2565. Ideology explains 25.65% of the variation in party identification. Knowing a person’s ideology reduces the error in predicting their party identification by 25.65%. The unexplained variation is represented by the scatter of points around the regression line.
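The explained-over-total reading of r-squared can be illustrated with raw scores. The lecture's raw data are not reproduced in this transcript, so the scores below are invented; the function names are also illustrative.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b


def r_squared(xs, ys):
    """Explained variation divided by total variation in Y."""
    a, b = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    explained = sum((a + b * x - mean_y) ** 2 for x in xs)
    total = sum((y - mean_y) ** 2 for y in ys)
    return explained / total


# Hypothetical scores, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 3, 5, 4, 6]
share = r_squared(xs, ys)
print(round(share, 2), round(1 - share, 2))  # explained, unexplained
```

For these invented scores about 81% of the variation in Y is explained by X, and the remaining 19% is the scatter around the line.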

The Five-Step Model for Testing Pearson’s r.

Step 1. Making assumptions: random sampling, interval-ratio measurement, bivariate normal distribution, linear relationship, homoscedasticity, and normal sampling distribution.

The Five-Step Model for Testing Pearson’s r.

Step 2. Stating the null hypothesis. H0: ρ = 0.0. H1: ρ > 0.0.

Step 3. Selecting the sampling distribution and establishing the critical region. Sampling distribution = t distribution. Alpha = .05, one-tailed. Degrees of freedom = N - 2 = 10 - 2 = 8. t(critical) = +1.860.

The Five-Step Model for Testing Pearson’s r.

Step 4. Computing the test statistic.

$$t(\text{obtained}) = r\sqrt{\frac{N - 2}{1 - r^2}} = (.5064)\sqrt{\frac{10 - 2}{1 - .2565}} = (.5064)\sqrt{\frac{8}{.7435}} = (.5064)\sqrt{10.7599} = 1.66$$

The Five-Step Model for Testing Pearson’s r.

Step 5. Making a decision. t(obtained) = 1.66 is less than t(critical) = 1.860. Therefore, we cannot reject the null hypothesis that the relationship between ideology and party ID is zero in the population.
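Steps 4 and 5 can be sketched as a short calculation. It assumes r = .5064 and N = 10 from the lecture's example and takes the one-tailed critical value of 1.860 as given in Step 3 rather than computing it; the names are illustrative.

```python
import math


def t_obtained(r, n):
    """Test statistic for Pearson's r with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


T_CRITICAL = 1.860  # alpha = .05, one-tailed, df = 8, as stated in Step 3

t = t_obtained(r=0.5064, n=10)
reject_null = t > T_CRITICAL
print(round(t, 2), reject_null)  # 1.66 False
```

Because the obtained t does not exceed the critical value, the sketch reaches the same decision as the lecture: the null hypothesis is not rejected.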

Multiple Regression

Multiple regression analysis has wide application in political, economic, social, and educational research.

It can be used with either continuous data or categorical data.

It is an extension of linear regression to several independent variables.

Multiple Regression

The basis for multiple regression is the correlation matrix among the independent and dependent variables (the matrix of intercorrelations among all variables).
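A matrix of intercorrelations can be built by applying Pearson's r to every pair of variables. The sketch below is illustrative only: the variable names `y`, `x1`, `x2` and their scores are invented, and it is not the disaster declaration data used in the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from deviations around the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mean_x) ** 2 for x in xs)
                    * sum((y - mean_y) ** 2 for y in ys))
    return num / den


def correlation_matrix(variables):
    """Nested dict of r for every pair of named variables."""
    names = list(variables)
    return {row: {col: pearson_r(variables[row], variables[col])
                  for col in names}
            for row in names}


# Hypothetical dependent (y) and independent (x1, x2) variables.
data = {
    "y": [2, 3, 5, 4, 6],
    "x1": [1, 2, 3, 4, 5],
    "x2": [5, 3, 4, 2, 1],
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {col: round(value, 2) for col, value in row.items()})
```

The diagonal is 1 because each variable correlates perfectly with itself, and the matrix is symmetric because r for X and Y equals r for Y and X.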

Multiple Regression – Disaster Declaration
