
SW388R6 Data Analysis and Computers I

Slide 1

General Linear Models

The theory of general linear models posits that many statistical tests, including t-tests and ANOVAs, can be solved as a regression analysis.

General linear models become even more useful when our analysis includes both numeric (interval level) and categorical variables (nominal level), since both can directly be entered into the analysis, and SPSS will do any needed dummy coding.

In this example, we will demonstrate the equivalence of regression and ANOVA. We will use the SPSS General Linear Models procedure for a variety of tests in the future.


Slide 2

This problem uses the data set GSS2000R.Sav to compare the average score on the variable "highest year of school completed" [educ] for groups of survey respondents defined by the variable "subjective class identification" [class]. Using a one-way analysis of variance and a post hoc test with an alpha of .05, is the following statement true, true with caution, false, or an incorrect application of a statistic?

Survey respondents who said they belonged in the working class completed fewer years of school (M = 12.58, SD = 2.50) than survey respondents who said they belonged in the middle class (M = 13.83, SD = 3.14).

o True
o True with caution
o False
o Incorrect application of a statistic

Homework problems: One-way Analysis of Variance – Specific Relationship Tested

In the PowerPoint for One-Way ANOVA, we solved this problem, using SPSS’ One-Way ANOVA command.

Applying the theory of general linear models, we will solve this problem with linear regression.


Slide 3


Converting the One-Way ANOVA problem to a Regression problem

To solve this problem with regression, we need to dummy code the independent variable.

Since the problem includes a specific comparison, we need to select the reference group that makes this comparison possible.

Specifically, we will use the working class category as the reference group, so that we can compare the difference between the middle class and the working class.

We could just as easily have chosen the middle class as the reference category.


Slide 4

Coding scheme for new variables

The class variable contained the four categories in the first column of the table below.

We will create three new dichotomous variables: lowerClass, middleClass, and upperClass. Each new variable will have a 1 in the matching category from the original variable and zeros for all of the other categories. The coding scheme for the new variables is shown in the table below.

Original Variable Coding    Coding for New Variables
                            lowerClass   middleClass   upperClass
1 = lower class                 1             0             0
2 = working class               0             0             0
3 = middle class                0             1             0
4 = upper class                 0             0             1
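The coding scheme can also be sketched outside SPSS. The following is a minimal Python illustration, not the SPSS procedure itself; the helper name dummy_code and the DUMMY_CODES mapping are hypothetical names introduced only for this sketch.

```python
# Hypothetical helper (not part of SPSS): dummy-code the four-category
# class variable, with working class (code 2) as the reference group.
DUMMY_CODES = {"lowerClass": 1, "middleClass": 3, "upperClass": 4}


def dummy_code(class_code):
    """Return the three dummy variables for one respondent's class code."""
    return {name: int(class_code == code) for name, code in DUMMY_CODES.items()}


# Print the coding for each of the four original categories.
for code in (1, 2, 3, 4):
    print(code, dummy_code(code))
```

The reference group (working class) is the only category that receives a 0 on all three new variables.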


Slide 5

Using Recoding in SPSS to Create New Variables

Select the Recode > Into Different Variables command from the Transform menu.


Slide 6

Creating the lowerClass variable

First, select the variable to be dummy-coded, class, from the list of variables and move it to the Numeric Variable -> Output Variable list box.

Second, type in the name for the new variable.

Third, click on the Change button to replace the ? with this new variable name.


Slide 7

Assigning values to new variable

Next, click on the Old and New Values button to assign values to the new variable.


Slide 8

Preserving missing values

First, mark the System- or user-missing option button on the Old Value panel.

Second, mark the System-missing option button on the New Value panel.

Third, click on the Add button to include this recoding for the variable.

If we forget to explicitly assign missing values, cases with missing data will be recoded with a 0 and become part of the reference group.
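The effect of omitting the missing-value rule can be sketched in a few lines. This is an illustration in Python rather than SPSS; it assumes None stands in for a missing value, and both function names are hypothetical.

```python
def recode_lower_naive(class_code):
    """Mirror '1 -> 1, all other values -> 0' with no rule for missing data."""
    return 1 if class_code == 1 else 0


def recode_lower(class_code):
    """Same recode, but a missing value (None here) stays missing."""
    if class_code is None:
        return None
    return 1 if class_code == 1 else 0


# A respondent with no class code is silently moved into the 0 group
# by the naive rule, but stays missing under the explicit rule.
print(recode_lower_naive(None))
print(recode_lower(None))
```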


Slide 9

Coding the lowerClass category

First, to recode the 1 = lower class category to the dummy variable, mark the Value option button and type a 1 in the text box on the Old Value panel.

Second, mark the Value option button and type a 1 in the text box on the New Value panel. This coding says: if they were originally in the lower class category, they are assigned a value of 1 for the lowerClass dummy variable.

Third, click on the Add button to include this recoding for the variable.


Slide 10

Coding the other categories

First, to identify subjects in the categories other than lower class, mark the All other values option button on the Old Value panel.

Second, mark the Value option button and type a 0 in the text box on the New Value panel. This coding says: if they were originally NOT in the lower class category, they are assigned a value of 0 for the lowerClass dummy variable.

Third, click on the Add button to include this recoding for the variable.


Slide 11

Completing the recoding

When we have completed the coding for the new variable, click on the Continue button.


Slide 12

Completing the lowerClass variable

Click on the OK button to create the new variable in the data editor.


Slide 13

Dummy variable coding for middleClass variable

Following the same steps, we create the dummy variable for subjects who were 3 = middle class on the original class variable.

The coding is similar to that for the lowerClass variable, except the category that was originally coded 3 = middle class is translated into a 1 on the new variable.


Slide 14

Dummy variable coding for upperClass variable

Following the same steps, we create the dummy variable for subjects who were 4 = upper class on the original class variable.

The coding is similar to that for the lowerClass variable, except the category that was originally coded 4 = upper class is translated into a 1 on the new variable.


Slide 15

Dummy-coded variables for class - 1

Subjects with a code value of 2 on the original class variable now have a 0 for all the new variables.

Subjects with a code value of 3 on the original class variable now have a 1 for middleClass and a 0 for the other new variables.


Slide 16

Dummy-coded variables for class - 2

Subjects with a code value of 4 on the original class variable now have a 1 for upperClass and a 0 for the other new variables.

Subjects with a code value of 1 on the original class variable now have a 1 for lowerClass and a 0 for the other new variables.

Since it is very easy to make a mistake in recoding, it is imperative that we check the results of our recoding.
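One way to check a recode is to cross-tabulate the original codes against the new variable. The sketch below does this in Python on a handful of made-up cases (not the GSS2000R data), with None standing in for a missing value; all names are illustrative.

```python
from collections import Counter

# Hypothetical class codes and the lowerClass dummy produced by a recode.
class_codes = [1, 2, 3, 4, 2, None, 3, 1]
lower_class = [1, 0, 0, 0, 0, None, 0, 1]

# Cross-tabulate the original codes against the recoded values.
table = Counter(zip(class_codes, lower_class))
for (old, new), count in sorted(table.items(), key=str):
    print(f"class={old!s:>4}  lowerClass={new!s:>4}  n={count}")

# Every original code should map to exactly one expected recoded value.
expected = {1: 1, 2: 0, 3: 0, 4: 0, None: None}
mismatches = [(old, new) for (old, new) in table if expected[old] != new]
print("mismatches:", mismatches)
```

An empty mismatch list means each original category landed on the intended dummy value.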


Slide 17

Regression of education on class variables - 1

Select the Regression > Linear command from the Analyze menu.


Slide 18

Regression of education on class variables - 2

First, we move the dependent variable to the Dependent Variable text box.

Second, we move the three dummy coded variables to the list of Independents.

Third, click on the OK button to produce the output.


Slide 19

Results of regression of education on class variables – overall relationship

The overall relationship is statistically significant, F(3, 264) = 4.97, p < .01.


Slide 20

Comparison to One-way ANOVA of education by class – overall relationship

The overall relationship is statistically significant, F(3, 264) = 4.97, p < .01.

Moreover, all of the statistical values in the ANOVA table are identical to the results from regression.
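The equivalence can be illustrated numerically. The sketch below is a plain-Python illustration on a small made-up data set (not the GSS2000R values): it computes the one-way ANOVA F ratio from the between- and within-group sums of squares, then fits a least-squares regression on the three dummy variables and computes the regression F ratio. The helper functions solve and ols are hypothetical names written for this sketch, and working class (code 2) is assumed to be the reference group.

```python
# Hypothetical data: years of education keyed by class code
# (1 = lower, 2 = working, 3 = middle, 4 = upper).
groups = {1: [10, 12, 11], 2: [12, 13, 11, 12], 3: [14, 13, 15, 14], 4: [16, 15, 17]}


def mean(values):
    return sum(values) / len(values)


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# One-way ANOVA: between-group and within-group sums of squares.
all_y = [value for ys in groups.values() for value in ys]
grand = mean(all_y)
ss_between = sum(len(ys) * (mean(ys) - grand) ** 2 for ys in groups.values())
ss_within = sum((value - mean(ys)) ** 2 for ys in groups.values() for value in ys)
df_between = len(groups) - 1
df_within = len(all_y) - len(groups)
f_anova = (ss_between / df_between) / (ss_within / df_within)

# Regression on dummy variables: constant, lowerClass, middleClass, upperClass.
X, y = [], []
for code, ys in groups.items():
    for value in ys:
        X.append([1, int(code == 1), int(code == 3), int(code == 4)])
        y.append(value)
b = ols(X, y)
fitted = [sum(bi * xi for bi, xi in zip(b, row)) for row in X]
ss_regression = sum((f - grand) ** 2 for f in fitted)
ss_residual = sum((yi - f) ** 2 for yi, f in zip(y, fitted))
f_regression = (ss_regression / 3) / (ss_residual / (len(y) - 4))

print("ANOVA F:     ", f_anova)
print("Regression F:", f_regression)
```

Because the fitted values of the dummy-variable regression are the group means, the regression and residual sums of squares coincide with the between- and within-group sums of squares, so the two F ratios agree up to floating-point rounding.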


Slide 21

Results of regression of education on class variables – individual relationships

The tests of individual relationships compare each group to the reference group.

The difference between the middle class group and the working class group is statistically significant.


Slide 22

Results of regression of education on class variables – individual relationships

Subjects in the middle class had, on average, 1.249 more years of education than the working class.

B coefficients are interpreted as the increase or decrease in the estimate of the dependent variable associated with the change from the reference group to the dummy-coded group.
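As a quick arithmetic check using the figures reported in this problem: with dummy coding, the regression constant is the mean of the reference group, so adding the B coefficient for middleClass to the working class mean should reproduce the middle class mean. The variable names in this Python sketch are illustrative only.

```python
working_mean = 12.58  # reference-group mean from the problem statement
b_middle = 1.249      # B coefficient for middleClass from the regression output

# Estimated mean for the middle class = reference mean + B coefficient.
predicted_middle = working_mean + b_middle
print(round(predicted_middle, 2))  # 13.83, the middle class mean in the problem
```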


Slide 23

Comparison to One-way ANOVA of education by class – individual relationship

In the post hoc test, the difference between the middle class and the working class was also 1.249 years of education, and was a statistically significant relationship.


Slide 24

Comparison to One-way ANOVA of education by class – individual relationship

However, the calculations for the post hoc test are completely different from the test of the b coefficient in the regression, which is reasonable since they are very different tests. The test of the b coefficient is a test of the hypothesis that b is not equal to 0.

Post hoc tests are not hypothesis tests. The only hypothesis tested in the One-Way ANOVA was that at least one of the group means was different from the others. The post hoc test provided additional information about the differences, but it is not a hypothesis test because no hypothesis was specified in advance of the statistical calculations.

The significance of the test of the b coefficient was .001, while the significance of the post hoc test was .005.

In this example we would make a similar interpretation, but that is not always the case.


Slide 25

Using linear contrasts to test specific group hypotheses - 1

It is possible to include a hypothesis test of differences between specific groups within the one-way ANOVA, using linear contrasts.

Using the notation from the text, we would specify the linear contrast as the difference between the working class and the middle class. Since the problem indicated that middle class respondents had more education than working class respondents, we would write the contrast as:

$l = \mu_{\text{middle class}} - \mu_{\text{working class}}$

where $l$ is a linear contrast and the $\mu$'s are group means.


Slide 26

Using linear contrasts to test specific group hypotheses - 2

If we explicitly include coefficients for the population means in the contrast equation,

$l = \mu_{\text{middle class}} - \mu_{\text{working class}}$

becomes

$l = +1 \times \mu_{\text{middle class}} - 1 \times \mu_{\text{working class}}$

and if we add in the means for the other groups,

$l = +1 \times \mu_{\text{middle class}} - 1 \times \mu_{\text{working class}} + 0 \times \mu_{\text{lower class}} + 0 \times \mu_{\text{upper class}}$

which is the contrast we will enter into SPSS.
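The contrast can be evaluated by hand once the group means are known. The sketch below is a Python illustration on a small made-up data set (not the GSS2000R values). It uses the pooled-variance form of the contrast test, which assumes equal group variances; the variable names are hypothetical.

```python
import math

# Hypothetical data: years of education keyed by class code
# (1 = lower, 2 = working, 3 = middle, 4 = upper).
groups = {1: [10, 12, 11], 2: [12, 13, 11, 12], 3: [14, 13, 15, 14], 4: [16, 15, 17]}

# Contrast coefficients: +1 middle, -1 working, 0 for the other groups.
weights = {1: 0, 2: -1, 3: +1, 4: 0}


def mean(values):
    return sum(values) / len(values)


# Pooled error variance (mean square within groups) and its degrees of freedom.
n_total = sum(len(ys) for ys in groups.values())
df_error = n_total - len(groups)
mse = sum((v - mean(ys)) ** 2 for ys in groups.values() for v in ys) / df_error

# Contrast estimate, its standard error, and the t statistic.
estimate = sum(weights[g] * mean(ys) for g, ys in groups.items())
std_error = math.sqrt(mse * sum(weights[g] ** 2 / len(ys) for g, ys in groups.items()))
t_stat = estimate / std_error

print("contrast estimate:", estimate)
print("df:", df_error)
print("t:", round(t_stat, 3))
```

With only two nonzero weights, the estimate is simply the difference between the middle class and working class means, which is the same quantity the B coefficient for middleClass estimates in the dummy-coded regression.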


Slide 27

Testing a hypothesis comparing groups within One-Way ANOVA - 1

Select the Compare Means > One-Way ANOVA command from the Analyze menu.


Slide 28

Testing a hypothesis comparing groups within One-Way ANOVA - 2

First, move the dependent variable educ and the independent variable class into the list boxes.

Second, click on the Contrasts button to add the linear contrast.


Slide 29

Testing a hypothesis comparing groups within One-Way ANOVA - 3

The contrasts must be entered in the same order that the variable is coded, i.e. from low to high codes for categories.

First, type the contrast coefficient for the lower class group, 0, into the Coefficients text box.

Second, click on the Add button to add the coefficient to the list box.

The contrast coefficients were:

• 0 for lower class
• -1 for working class
• +1 for middle class
• 0 for upper class
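The ordering rule can be sketched as follows: given contrast weights keyed by category code, sorting by code yields the order in which the coefficients are entered. This is a Python illustration with hypothetical variable names, not SPSS syntax.

```python
# Contrast weights keyed by class code
# (1 = lower, 2 = working, 3 = middle, 4 = upper).
weights = {3: +1, 2: -1, 1: 0, 4: 0}

# Coefficients are entered in ascending order of the category codes.
coefficients = [weights[code] for code in sorted(weights)]
print(coefficients)  # [0, -1, 1, 0]

# The coefficients of a contrast sum to zero.
assert sum(coefficients) == 0
```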


Slide 30

Testing a hypothesis comparing groups within One-Way ANOVA - 4

Add the contrast coefficients for the working class (-1), the middle class (+1), and the upper class (0) to the list box.

Click on the Continue button to close the dialog box.


Slide 31

Testing a hypothesis comparing groups within One-Way ANOVA - 5

Click on the OK button to request the output.


Slide 32

Testing a hypothesis comparing groups within One-Way ANOVA - 6

The value and significance of the F-test are identical to the results obtained in the regression, as well as the one-way ANOVA with the post hoc tests.

Moreover, the results for the contrast test match the test of the b coefficient in the regression analysis, t(264) = 3.372, p < .01.


Slide 33

SPSS’ general linear models procedure

SPSS has a command for directly computing general linear models that is much more versatile than the regression command that we just used. The procedure contains options and diagnostic statistics that are not available in its linear regression command.

The default for group comparisons with this command is to compare each group with the group that has the highest numeric code. Since we want the comparison to be with the working class group, we will first change the numeric code for that group from 2 to 5 so that it is the highest numeric value.
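The recode described here can be sketched in Python (an illustration only, not SPSS syntax). The function name is hypothetical, and None stands in for a missing value.

```python
def recode_working_last(class_code):
    """Move working class (2) to code 5, copy the other codes, keep missing as missing."""
    if class_code is None:
        return None
    return 5 if class_code == 2 else class_code


codes = [1, 2, 3, 4, None]
recoded = [recode_working_last(code) for code in codes]
print(recoded)  # [1, 5, 3, 4, None]

# After the recode, the highest code belongs to the working class group,
# so it becomes the comparison group under the default described above.
reference = max(code for code in recoded if code is not None)
print(reference)  # 5
```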


Slide 34

Recoding the class variable - 1

To change the numeric coding for the working class category so it is the highest numeric value, we again select the Recode > Into Different Variables command from the Transform menu.


Slide 35

Recoding the class variable - 2

First, select the variable to be recoded, class, from the list of variables and move it to the Numeric Variable -> Output Variable list box.

Second, type in the name for the new variable.

Third, click on the Change button to replace the ? with this new variable name.


Slide 36

Recoding the class variable - 3

Next, click on the Old and New Values button to assign values to the new variable.


Slide 37

Recoding the class variable - 4

First, mark the System- or user-missing option button on the Old Value panel.

Second, mark the System-missing option button on the New Value panel.

Third, click on the Add button to include this recoding for the variable.


Slide 38

Recoding the class variable - 5

First, to recode the 2 = working class category to the new variable, mark the Value option button and type a 2 in the text box on the Old Value panel.

Second, mark the Value option button and type a 5 in the text box on the New Value panel. This coding says: if they were originally in the working class category, they are assigned a value of 5 for the new variable.

Third, click on the Add button to include this recoding for the variable.


Slide 39

Recoding the class variable - 5

First, since we want all of the other codes to remain the same, we click on the All other values option button.

Second, mark the Copy old values option button to retain the codes for the remaining groups.

Third, click on the Add button to include this recoding for the variable.


Slide 40

Recoding the class variable - 6

When we have completed the coding for the new variable, click on the Continue button.


Slide 41

Recoding the class variable - 7

Click on the OK button to create the new variable in the data editor.


Slide 42

Recoding the class variable - 8

We check the values in the data editor to make sure the recode worked as anticipated. In this example, we see that the 2’s for class are correctly recoded as 5’s.


Slide 43

Using SPSS’ general linear models - 1

To solve the problem using SPSS’ General Linear Model command, select General Linear Model > Univariate from the Analyze menu.

The univariate command indicates that we have a single dependent variable.


Slide 44

Using SPSS’ general linear models - 2

First, we move the dependent variable to the Dependent Variable text box.

Second, we move the newly created independent variable to the Fixed Factors list box.

Fixed factors are those for which all possible codes are represented in the data set.

Third, click on the Options button to specify additional output. While the univariate GLM command has numerous specifications, we only need one request for this problem.

Random Factors are categorical variables which can take on values different from those in our data set.

Covariates are interval level variables or variables we wish to treat as interval level.


Slide 45

Using SPSS’ general linear models - 3

First, mark the check box for Parameter estimates. This will compute and test the coefficients.

Second, click on the Continue button to close the dialog box.


Slide 46

Using SPSS’ general linear models - 4

Click on the OK button to produce the output.


Slide 47

SPSS’ general linear models output

The value and significance of the F-test are identical to the results obtained in the regression and the one-way ANOVA with the post hoc tests.

Subjects in the middle class (code 3) had, on average, 1.249 more years of education than subjects in the working class. The difference is statistically significant and identical to the findings from the other comparisons, t(264) = 3.372, p < .01.