Post on 19-Dec-2015
Scattergrams
The first step in examining a relationship between interval-ratio variables is to prepare a scattergram.
A scattergram, like a bivariate table, has two dimensions. The scores on the independent variable (X) are arrayed along the horizontal axis. The scores on the dependent variable (Y) are arrayed along the vertical axis.
Each dot in the scattergram represents the X and Y scores for a case.
The pattern can be made clearer by drawing the straight line that comes as close as possible to all of the points in the scattergram.
Two variables are associated if the values of Y are conditional on the values of X (increase or decrease depending on X).
The strength of the relationship can be visualized by examining how tightly the points fit around the line.
Positive relationships will slope up; negative relationships will slope down. Zero relationships will appear as a cloud of random points.
Scattergrams – Positive Relationship
[Figure: scattergram of Republicanism by Conservatism. X axis: liberal-conservative 7-point scale. Y axis: 7-point scale party identification. Rsq = 0.2350.]
Scattergrams – Negative Relationship
[Figure: scattergram of Support for Abortion by Conservatism. X axis: liberal-conservative 7-point scale. Y axis: when should abortion be allowed by law. Rsq = 0.1192.]
Scattergram – Little or No Relationship
[Figure: scattergram of Conservatism by Age. X axis: respondent age. Y axis: liberal-conservative thermometer index. Rsq = 0.0050.]
Regression and Prediction
One key assumption underlying the statistical techniques to be discussed here is that the two variables have an essentially linear relationship. That is, the scattergram approximates a straight line.
Non-linearity requires adjustments to the model that will not be discussed in this course.
[Figure: Mean Party ID by Level of Conservatism. X axis: liberal-conservative 7-point scale. Y axis: mean of party ID, from 1.0 to 7.0.]
The mean of any distribution of scores is the point around which the variation of the scores, measured as the sum of squared deviations, is a minimum (i.e., no other point yields a smaller sum).
The same is true of conditional means.
$$\sum (X_i - \bar{X})^2 = \text{minimum}$$
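A quick numerical check of this property can be sketched as follows. The scores and the candidate points are made up for illustration and do not come from the slides; `sum_sq_dev` is a hypothetical helper name.

```python
from statistics import mean

def sum_sq_dev(scores, point):
    """Sum of squared deviations of the scores around a given point."""
    return sum((x - point) ** 2 for x in scores)

scores = [2, 4, 4, 5, 7, 8]   # hypothetical scores, not from the slides
center = mean(scores)         # the mean of these scores is 5

at_mean = sum_sq_dev(scores, center)

# Every other candidate point gives a larger sum of squared deviations.
for candidate in (3, 4, 4.5, 5.5, 6, 7):
    print(candidate, sum_sq_dev(scores, candidate), ">", at_mean)
```

The same check applies to a conditional mean: restrict `scores` to the Y scores of the cases that share one value of X.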
The best fitting line will go as close as possible to the conditional means since the conditional means will rarely lie in a straight line.
The formula for the best fitting line is:
$$Y = a + bX$$
where
Y = score on the dependent variable
a = the Y intercept
b = the slope of the regression line
X = score on the independent variable
The Y intercept is the point at which the regression line crosses the Y axis.
The slope of the least-squares regression line is the amount of change in the dependent variable (Y) that is produced by a unit change in the independent variable (X).
You can use the formula for the regression line to predict scores not in the data set.
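$$b = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$$ ; theoretical formula.
$$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$$ ; computational formula.
$$a = \bar{Y} - b\bar{X}$$
where
b = slope
a = the intercept
N = the number of cases
$\sum XY$ = the summation of the crossproducts of the scores
$\sum X$ = the summation of the scores on X
$\sum Y$ = the summation of the scores on Y
$\sum X^2$ = the summation of the squared scores on X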
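The theoretical and computational formulas for the slope should give the same answer. The sketch below checks this on a small made-up data set; the data and the function names are assumptions for illustration, not part of the slides.

```python
from statistics import mean

def slope_theoretical(xs, ys):
    """b = sum of crossproducts of deviations / sum of squared X deviations."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    return cross / ss_x

def slope_computational(xs, ys):
    """b = (N*sum(XY) - sum(X)*sum(Y)) / (N*sum(X^2) - sum(X)^2)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    return (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)

def intercept(xs, ys, b):
    """a = mean(Y) - b * mean(X)."""
    return mean(ys) - b * mean(xs)

xs = [1, 2, 3, 4, 5]   # hypothetical X scores
ys = [2, 4, 5, 4, 5]   # hypothetical Y scores

b_theo = slope_theoretical(xs, ys)
b_comp = slope_computational(xs, ys)
a = intercept(xs, ys, b_comp)
print(b_theo, b_comp, a)
```

The computational form is usually easier by hand because it needs only running totals, while the theoretical form shows more directly what the slope measures.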
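$$b = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{(10)(215) - (43)(48)}{(10)(193) - (43)^2} = \frac{2150 - 2064}{1930 - 1849} = \frac{86}{81} = 1.0617$$
$$a = \bar{Y} - b\bar{X} = 4.8 - (1.0617)(4.3) = 4.8 - 4.5654 = 0.2346$$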
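The worked example can be reproduced from its summary totals (N = 10, ΣX = 43, ΣY = 48, ΣXY = 215, ΣX² = 193), and the resulting line can then be used to predict a score. This is a minimal sketch: the function names are hypothetical, and the X value of 4 used for the prediction is an arbitrary illustration.

```python
def slope_intercept_from_sums(n, sum_x, sum_y, sum_xy, sum_x2):
    """Least-squares slope and intercept from the computational formula."""
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)   # a = mean(Y) - b * mean(X)
    return a, b

def predict(a, b, x):
    """Predicted Y for a given X on the regression line Y = a + bX."""
    return a + b * x

# Totals taken from the worked example.
a, b = slope_intercept_from_sums(n=10, sum_x=43, sum_y=48, sum_xy=215, sum_x2=193)
print(round(b, 4), round(a, 4))   # 1.0617 0.2346

# Predicted party ID for a hypothetical ideology score of 4.
y_hat = predict(a, b, 4)
print(round(y_hat, 2))            # 4.48
```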
The Correlation Coefficient (Pearson’s r)
The slope of the regression line is a measure of the effect of X on Y.
Because it is measured in units of Y per unit of X, it is not restricted to fall between zero and one.
As a result, researchers rely on Pearson’s r as a measure of interval-ratio association.
Pearson’s r varies from -1 to +1, with 0 indicating no association.
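$$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\left[\sum (X - \bar{X})^2\right]\left[\sum (Y - \bar{Y})^2\right]}}$$ ; theoretical formula.
$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}}$$ ; computational formula.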
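The theoretical formula translates into a few lines of code. The sketch below uses made-up data to show the two limits of r and one intermediate case; none of these scores come from the slides.

```python
import math
from statistics import mean

def pearson_r(xs, ys):
    """Pearson's r from the theoretical (deviation score) formula."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_x = sum((x - x_bar) ** 2 for x in xs)
    ss_y = sum((y - y_bar) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)

xs = [1, 2, 3, 4, 5]                 # hypothetical X scores

r_up = pearson_r(xs, [2, 4, 6, 8, 10])    # points on a rising line
r_down = pearson_r(xs, [10, 8, 6, 4, 2])  # points on a falling line
r_mid = pearson_r(xs, [2, 4, 5, 4, 5])    # scattered around a rising line

print(r_up, r_down, round(r_mid, 4))      # 1.0 -1.0 0.7746
```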
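$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[N\sum X^2 - (\sum X)^2\right]\left[N\sum Y^2 - (\sum Y)^2\right]}} = \frac{(10)(215) - (43)(48)}{\sqrt{\left[(10)(193) - (43)^2\right]\left[(10)(266) - (48)^2\right]}}$$
$$r = \frac{2150 - 2064}{\sqrt{(81)(356)}} = \frac{86}{169.8117} = .5064$$
$$r^2 = .2565$$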
An r of .5064 suggests a moderately strong, positive relationship.
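As a cross-check, r and r-squared can be recomputed from the summary totals in the worked example (N = 10, ΣX = 43, ΣY = 48, ΣXY = 215, ΣX² = 193, ΣY² = 266). This is a sketch, and the function name is hypothetical.

```python
import math

def pearson_r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson's r from the computational formula, given summary totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Totals taken from the worked example.
r = pearson_r_from_sums(n=10, sum_x=43, sum_y=48, sum_xy=215, sum_x2=193, sum_y2=266)
print(round(r, 4), round(r ** 2, 4))   # 0.5064 0.2565
```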
Interpreting the Correlation Coefficient: r²
The interpretation of Pearson’s r focuses on r², the coefficient of determination.
The coefficient of determination is based on the following formula for r-squared.
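$$r^2 = \frac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2} = \frac{\text{Explained Variation}}{\text{Total Variation}}$$
where Y' is the score predicted by the regression line.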
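The ratio of explained to total variation can be computed directly by fitting the line, predicting each Y, and comparing the two sums of squares. The data below are made up for illustration and are not the ideology and party ID scores from the slides.

```python
from statistics import mean

xs = [1, 2, 3, 4, 5]   # hypothetical X scores
ys = [2, 4, 5, 4, 5]   # hypothetical Y scores

x_bar, y_bar = mean(xs), mean(ys)

# Least-squares line Y' = a + bX (theoretical formula for b).
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar
predicted = [a + b * x for x in xs]

explained = sum((yp - y_bar) ** 2 for yp in predicted)   # variation in Y'
total = sum((y - y_bar) ** 2 for y in ys)                # variation in Y
unexplained = sum((y - yp) ** 2 for y, yp in zip(ys, predicted))

r_squared = explained / total
print(round(explained, 4), total, round(r_squared, 4))
```

For a least-squares line, the explained and unexplained variation add up to the total variation, which is why r-squared can also be read as a proportional reduction in error.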
R-squared can be interpreted in two ways:
Percentage of total variation explained by the independent variable.
Proportional reduction in error.
Thus, the r-squared in the ideology-party ID problem calculated above is .2565.
Ideology explains 25.65% of the variation in party identification.
Knowing a person’s ideology reduces the error in predicting their party identification by 25.65%.
The unexplained variation is represented by the scatter of points around the regression line.
The Five-Step Model for Testing Pearson’s r.
Step 1. Making assumptions: random sampling; interval-ratio measurement; bivariate normal distribution; linear relationship; homoscedasticity; normal sampling distribution.
Step 2. Stating the null hypothesis. H0: ρ = 0.0.
H1: ρ > 0.0.
Step 3. Selecting the sampling distribution and establishing the critical region. Sampling distribution = t distribution. Alpha = .05, one-tailed. Degrees of freedom = N - 2 = 10 - 2 = 8. t(critical) = +1.860.
Step 4. Computing the test statistic.
$$t(\text{obtained}) = r\sqrt{\frac{N - 2}{1 - r^2}} = .5064\sqrt{\frac{10 - 2}{1 - .2565}} = .5064\sqrt{\frac{8}{.7435}} = .5064\sqrt{10.7599} = 1.66$$
Step 5. Making a decision. t(obtained) = 1.66 is less than t(critical) = +1.860, so it does not fall in the critical region. Therefore, we fail to reject the null hypothesis that the relationship between ideology and party ID is zero in the population.
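Steps 3 through 5 can be sketched in code. The values of r, N, and t(critical) are the ones given in the slides; the function name is hypothetical, and the critical value is hard-coded from the slide rather than looked up from a t table.

```python
import math

def t_obtained(r, n):
    """Test statistic for Pearson's r: t = r * sqrt((N - 2) / (1 - r^2))."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

r = 0.5064          # Pearson's r from the worked example
n = 10              # number of cases
t_critical = 1.860  # alpha = .05, one-tailed, df = N - 2 = 8 (from the slide)

t = t_obtained(r, n)
reject_null = t > t_critical   # one-tailed test in the positive direction

print(round(t, 2), reject_null)   # 1.66 False
```

With only 10 cases, even an r of about .5 falls short of the critical region, which illustrates how strongly the test depends on sample size.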
Multiple Regression
Multiple regression analysis has wide application in political, economic, social, and education research.
It can be used with either continuous data or categorical data.
It is an extension of linear regression to several independent variables.
The basis for multiple regression is the correlation matrix among the independent and dependent variables (the matrix of intercorrelations among all variables).
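A correlation matrix is simply Pearson's r computed for every pair of variables. The sketch below builds one for three variables; the variable names and all of the scores are made up for illustration, and `correlation_matrix` is a hypothetical helper rather than a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

def correlation_matrix(variables):
    """Nested dict of r for every pair of variables (name -> list of scores)."""
    names = list(variables)
    return {
        row: {col: pearson_r(variables[row], variables[col]) for col in names}
        for row in names
    }

# Hypothetical scores for one dependent and two independent variables.
data = {
    "party_id": [2, 4, 5, 4, 5],
    "ideology": [1, 2, 3, 4, 5],
    "age": [30, 25, 40, 55, 50],
}

matrix = correlation_matrix(data)
for row in data:
    print(row, [round(matrix[row][col], 2) for col in data])
```

The diagonal is 1 (each variable correlates perfectly with itself) and the matrix is symmetric, so only the entries above or below the diagonal carry information.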