Slide 1 Stepwise Binary Logistic Regression. Slide 2 Stepwise Binary Logistic Regression - 1 ...


Slide 1

Stepwise Binary Logistic Regression

Slide 2

Stepwise Binary Logistic Regression - 1

Stepwise binary logistic regression is very similar to stepwise multiple regression in terms of its advantages and disadvantages.

Stepwise logistic regression is designed to find the most parsimonious set of predictors that are most effective in predicting the dependent variable.

Variables are added to the logistic regression equation one at a time, using the statistical criterion of reducing the -2 Log Likelihood error for the included variables.

After each variable is entered, each of the included variables is tested to see whether the model would be better off if the variable were excluded. This does not happen often.

The process of adding more variables stops when all of the available variables have been included or when it is not possible to make a statistically significant reduction in -2 Log Likelihood using any of the variables not yet included.

Nonmetric variables are added to the logistic regression as a group. It is possible, and often likely, that not all of the individual dummy-coded variables will have a statistically significant individual relationship with the dependent variable. We limit our interpretation to the dummy-coded variables that do have a statistically significant individual relationship.
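
The entry and stopping rule described on this slide can be sketched as a likelihood ratio test on the drop in -2 Log Likelihood. This is a simplified illustration, not the exact algorithm SPSS runs: the function names and all -2 Log Likelihood values are hypothetical, and the p-value formulas cover only 1 degree of freedom (a metric or dichotomous variable) and 2 degrees of freedom (a three-category variable entered as a group of two dummy-coded variables).

```python
import math

def lr_test_p_value(neg2ll_without, neg2ll_with, df):
    """p-value for the reduction in -2 Log Likelihood when a candidate is added."""
    chi_square = neg2ll_without - neg2ll_with
    if chi_square <= 0:
        return 1.0
    if df == 1:
        # Chi-square upper tail with 1 degree of freedom.
        return math.erfc(math.sqrt(chi_square / 2.0))
    if df == 2:
        # Chi-square upper tail with 2 degrees of freedom.
        return math.exp(-chi_square / 2.0)
    raise ValueError("this sketch only handles 1 or 2 degrees of freedom")

def best_candidate(current_neg2ll, candidates, alpha=0.05):
    """Return the candidate with the smallest significant p-value, or None.

    candidates maps a variable name to (-2LL if the variable is added, df).
    Returning None corresponds to the point where stepwise entry stops.
    """
    best_name, best_p = None, None
    for name, (neg2ll_with, df) in candidates.items():
        p = lr_test_p_value(current_neg2ll, neg2ll_with, df)
        if p <= alpha and (best_p is None or p < best_p):
            best_name, best_p = name, p
    return best_name

# Hypothetical -2LL values, for illustration only.
candidates = {
    "metric_x": (141.0, 1),        # one coefficient
    "three_category_z": (141.0, 2),  # two dummy-coded variables entered together
    "metric_w": (146.5, 1),
}
print(best_candidate(146.9, candidates))
```

The same reduction in -2 Log Likelihood is harder to justify for a group of dummy-coded variables than for a single metric variable, because the group uses more degrees of freedom.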

Slide 3

Stepwise Binary Logistic Regression - 2

SPSS provides a table of variables included in the analysis and a table of variables excluded from the analysis.  It is possible that none of the variables will be included.  It is possible that all of the variables will be included.

The order of entry of the variables can be used as a measure of relative importance.

Once a variable is included, its interpretation in stepwise logistic regression is the same as it would be using other methods for including variables.

The number of cases required for stepwise logistic regression is greater than the number for the other forms. We will use the norm of 20 cases for each independent variable, double the recommendation of Hosmer and Lemeshow.

Slide 4

Pros and Cons of Stepwise Logistic Regression

Stepwise logistic regression can be used when the goal is to produce a predictive model that is parsimonious and accurate because it excludes variables that do not contribute to explaining differences in the dependent variable.

Stepwise logistic regression is less useful for testing hypotheses about statistical relationships. It is widely regarded as atheoretical and its usage is not recommended.

Stepwise logistic regression can be useful in finding relationships that have not been tested before. Its findings invite one to speculate on why an unusual relationship makes sense.

It is not legitimate to do a stepwise logistic regression and present the results as though one were testing a hypothesis that included the variables found to be significant in the stepwise logistic regression.

Using statistical criteria to determine relationships is vulnerable to over-fitting the data set used to develop the model at the expense of generalizability.

When stepwise logistic regression is used, some form of validation analysis is a necessity. We will use 75/25% cross-validation.

Slide 5

75/25% Cross-validation

To do cross-validation, we randomly split the data set into a 75% training sample and a 25% validation sample. We use the training sample to develop the model, and then apply the model to the validation sample to test its applicability to cases not used to develop it.

For the validation to be successful, the following two questions must be answered affirmatively:

• Did the stepwise logistic regression of the training sample produce the same subset of predictors produced by the regression model of the full data set?

• If yes, compare the classification accuracy rate for the 25% validation sample to the classification accuracy rate for the 75% training sample. Is the shrinkage (accuracy for the 75% training sample - accuracy for the 25% validation sample) 2% (0.02) or less? If so, we conclude that validation was successful.

Note: shrinkage may be a negative value, indicating that the accuracy rate for the validation sample is larger than the accuracy rate for the training sample. Negative shrinkage (increase in accuracy) is evidence of a successful validation analysis.

If the validation is successful, we base our interpretation on the model that included all cases.
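
The two validation questions on this slide can be combined into one check. The sketch below is only an illustration: the function name, the predictor lists, and the accuracy rates in the example are hypothetical, not results from this problem.

```python
def validation_successful(full_predictors, training_predictors,
                          training_accuracy, validation_accuracy,
                          max_shrinkage=0.02):
    """Apply the two criteria for a successful 75/25% cross-validation.

    Returns (success, shrinkage). Shrinkage is the training accuracy minus
    the validation accuracy, so a negative value means the model was more
    accurate on the validation sample.
    """
    same_predictors = set(full_predictors) == set(training_predictors)
    shrinkage = training_accuracy - validation_accuracy
    # Small tolerance so a shrinkage of exactly 2% is not lost to rounding.
    within_limit = shrinkage <= max_shrinkage + 1e-9
    return same_predictors and within_limit, shrinkage

# Hypothetical results, for illustration only.
ok, shrinkage = validation_successful(
    full_predictors=["x1", "x2"],
    training_predictors=["x2", "x1"],
    training_accuracy=0.70,
    validation_accuracy=0.72,
)
print(ok, round(shrinkage, 2))
```

Order of entry is ignored in the comparison of predictor sets; only the membership of the final model is checked.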

Slide 6

The Problem in Blackboard

The problem statement tells us:

• the variables included in the analysis

• to make the assumption that it is not necessary to omit outliers

• whether each variable should be treated as metric or non-metric

• the type of dummy coding and reference category for non-metric variables

• the alpha for both the statistical relationships and for diagnostic tests

• the random number seed for the validation analysis

Slide 7

The Statement about Level of Measurement

SPSS Binary Logistic Regression will dummy-code categorical variables for us, provided it is useful to use either the first or last category as the reference category.

The first statement in the problem asks about level of measurement. Stepwise binary logistic regression requires that the dependent variable be dichotomous, the metric independent variables be interval level, and the non-metric independent variables be dummy-coded if they are not dichotomous. SPSS Binary Logistic Regression calls non-metric variables “categorical.”

Slide 8

Marking the Statement about Level of Measurement

• The dependent variable "attitude toward abortion when a woman wants one for any reason" [abany] is dichotomous level, satisfying the requirement for the dependent variable.

• The independent variable "age" [age] is interval level, satisfying the requirement for independent variables.

• The independent variable "highest year of school completed" [educ] is interval level, satisfying the requirement for independent variables.

• The independent variable "income" [rincom98] is ordinal level, but the problem calls for treating it as metric by applying the common convention of treating ordinal variables as interval level.

• The independent variable "socioeconomic index" [sei] is interval level, satisfying the requirement for independent variables.

• The independent variable "sex" [sex] is dichotomous level, satisfying the requirement for independent variables.

• The independent variable "respondent's degree of religious fundamentalism" [fund] is ordinal level, which the problem instructs us to dummy-code as a non-metric variable.

Mark the check box as a correct statement.

Slide 9

The statement about multicollinearity and other numerical problems

To check for multicollinearity, we run the binary logistic regression in SPSS and examine the standard errors in the output.

Multicollinearity in the logistic regression solution is detected by examining the standard errors for the b coefficients. A standard error larger than 2.0 indicates numerical problems, such as multicollinearity among the independent variables, cells with a zero count for a dummy-coded independent variable because all of the subjects have the same value for the variable, and 'complete separation', whereby the two groups in the dependent variable can be perfectly separated by scores on one of the independent variables. Analyses that indicate numerical problems should not be interpreted.

Slide 10

Running the Stepwise binary logistic regression

Select the Regression | Binary Logistic… command from the Analyze menu.

Slide 11

Selecting the dependent variable

First, highlight the dependent variable abany in the list of variables.

Second, click on the right arrow button to move the dependent variable to the Dependent text box.

Slide 12

Selecting the independent variables

First, move the independent variables stated in the problem ("age" [age], "highest year of school completed" [educ], "income" [rincom98], "socioeconomic index" [sei], "sex" [sex], and "respondent's degree of religious fundamentalism" [fund]) to the Covariates list box.

Slide 13

Declare the categorical variables - 1

To indicate that "sex" [sex] and "respondent's degree of religious fundamentalism" [fund] are categorical variables, we click on the Categorical button.

Slide 14

Declare the categorical variables - 2

Move the variables sex and fund to the Categorical Covariates list box.

SPSS assigns its default method for dummy-coding, Indicator coding, to each variable, placing the name of the coding scheme in parentheses after each variable name.

Slide 15

Declare the categorical variables - 3

We accept the default of using the Indicator method for dummy-coding variables.

We will also accept the default of using the last category as the reference category for each variable.

Click on the Continue button to close the dialog box.
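
To make the effect of Indicator coding with the last category as reference concrete, here is a minimal sketch. The function name and the text labels are assumptions for illustration; SPSS works from the numeric codes stored in the data set. The category order follows the later slides, where fundamentalist and moderate respondents are each compared to liberal respondents.

```python
def indicator_code(value, categories):
    """Return one 0/1 indicator per non-reference category.

    The last entry in categories is the reference category, so it is
    represented by zeros on every indicator.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories[:-1]]

# Assumed labels for the three categories of fund; liberal is the reference.
FUND_CATEGORIES = ["fundamentalist", "moderate", "liberal"]

for label in FUND_CATEGORIES:
    print(label, indicator_code(label, FUND_CATEGORIES))
```

A variable with three categories therefore contributes two dummy-coded variables to the model, and a dichotomous variable such as sex contributes one.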

Slide 16

Specifying the method for including variables

Since the problem calls for a stepwise binary logistic regression, we select the Forward: LR method for including variables.

Forward LR uses likelihood ratio tests to determine which variables are entered in what order.

Slide 17

Requesting the output

While optional statistical output is available, we do not need to request any optional statistics.

Click on the OK button to request the output.

Slide 18

Checking for multicollinearity

The standard errors for the variables included in the stepwise procedure were: the standard error for "highest year of school completed" [educ] was .09, the standard error for survey respondents who said they were religiously fundamentalist was .56, and the standard error for survey respondents who said they were religiously moderate was .48.

Slide 19

Marking the statement about multicollinearity and other numerical problems

Since none of the independent variables in this analysis had a standard error larger than 2.0, we mark the check box to indicate there was no evidence of multicollinearity.
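
The check against the 2.0 threshold can be written as a small filter. This is a sketch using the three standard errors reported in this example; the function name and dictionary keys are illustrative, not SPSS output labels.

```python
SE_THRESHOLD = 2.0  # rule of thumb used in these slides

def flag_numerical_problems(standard_errors, threshold=SE_THRESHOLD):
    """Return the names of coefficients whose standard error exceeds the threshold."""
    return [name for name, se in standard_errors.items() if se > threshold]

# Standard errors reported for the variables in the stepwise model.
reported = {
    "educ": 0.09,
    "fund: fundamentalist": 0.56,
    "fund: moderate": 0.48,
}

flagged = flag_numerical_problems(reported)
print(flagged)  # an empty list means no evidence of numerical problems
```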

Slide 20

The statement about sample size

Hosmer and Lemeshow, who wrote the widely used text on logistic regression, suggest that the sample size should be 10 cases for every independent variable. Because stepwise procedures tend to overfit the data at the expense of generalizability, we will double the requirement to 20 cases for every independent variable.

Slide 21

The output for sample size

The 106 cases available for the analysis did not satisfy the recommended sample size of 140 (7 independent variables times 20 cases per variable). This requirement doubles the 10 cases per independent variable recommended by Hosmer and Lemeshow for logistic regression, because stepwise methods are prone to over-fitting the data. The number of independent variables includes 4 metric variables and 3 dummy-coded variables. The failure to meet the sample size requirement should be mentioned as a limitation to the analysis.

We find the number of cases included in the analysis in the Case Processing Summary.
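
The sample size arithmetic can be sketched as follows, using the counts from this example (4 metric variables, 3 dummy-coded variables, 106 available cases). The function name is illustrative.

```python
def required_cases(n_metric, n_dummy, cases_per_variable=20):
    """Cases required under the 20-cases-per-independent-variable norm.

    Each dummy-coded variable counts as one independent variable.
    """
    return (n_metric + n_dummy) * cases_per_variable

needed = required_cases(n_metric=4, n_dummy=3)
available = 106
print(needed, available >= needed)  # 140 False
```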

Slide 22

Marking the statement for sample size

Since we do not satisfy the sample size requirement, we leave the check box unmarked.

We should consider including this as a limitation to the analysis.

Slide 23

The stepwise relationship between the dependent and independent variables

Three statements in the problem list different combinations of the variables included in the stepwise logistic regression.

To determine which is correct, we look at the table of Variables in the Equation for Block 1 in the SPSS output.

Slide 24

The output for the stepwise relationship

Two independent variables satisfied the statistical criteria for entry into the model. The variable "highest year of school completed" [educ] had the largest individual impact (entered on step 1) on the dependent variable "attitude toward abortion when a woman wants one for any reason" [abany]. The second variable included in the model at step 2 was "respondent's degree of religious fundamentalism" [fund].

Slide 25

Marking the statement for stepwise relationship

Two independent variables satisfied the statistical criteria for entry into the model. The variable "highest year of school completed" [educ] had the largest individual impact on the dependent variable "attitude toward abortion when a woman wants one for any reason" [abany]. The second variable included in the model was "respondent's degree of religious fundamentalism" [fund].

We mark the first check box in the set of three.

Note that in stepwise logistic regression, if any variables are entered, the overall relationship must be significant, since that is the criterion for including variables.

Slide 26

The statement about the relationship between education and abortion for any reason

Having satisfied the criteria for the stepwise relationship, we examine the findings for individual relationships with the dependent variable. If the overall relationship were not significant, we would not interpret the individual relationships.

The first two statements offer alternative interpretations for the relationship between education and abortion for any reason.

Slide 27

Output for the relationship between education and abortion for any reason

The probability of the Wald statistic for the independent variable "highest year of school completed" [educ] (χ²(1, N = 106) = 5.48, p = .019) was less than or equal to the level of significance of .05. The null hypothesis that the b coefficient for "highest year of school completed" [educ] was equal to zero was rejected. The value of Exp(B) for the variable "highest year of school completed" [educ] was 1.235 which implies an increase in the odds of 23.5% (1.235 - 1.000 = .235). The statement that 'For each unit increase in "highest year of school completed", survey respondents were 23.5% more likely to have thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason' is correct.
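
The conversion from Exp(B) to a percentage change in the odds is a one-line calculation. The sketch below applies it to the Exp(B) of 1.235 reported for educ, and to the value of .231 reported later in this walkthrough for religiously fundamentalist respondents, to show that values below 1 produce a decrease. The function name is illustrative.

```python
def percent_change_in_odds(exp_b):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (exp_b - 1.0) * 100.0

print(round(percent_change_in_odds(1.235), 1))  # 23.5
print(round(percent_change_in_odds(0.231), 1))  # -76.9
```

Note that the percentage refers to the odds of the outcome, which is not the same quantity as the probability of the outcome.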

Slide 28

Marking the statement for relationship between education and abortion for any reason

Since survey respondents were 23.5% more likely to have thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason for each unit increase in "highest year of school completed", we mark the check box for the second statement.

Slide 29

Statement for relationship between fundamentalism and abortion for any reason

The next two statements concern the relationship between the dummy-coded variable for religiously fundamentalist and abortion for any reason.

Slide 30

Output for relationship between fundamentalism and abortion for any reason

The probability of the Wald statistic for the independent variable survey respondents who said they were religiously fundamentalist (χ²(1, N = 106) = 6.80, p = .009) was less than or equal to the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who said they were religiously fundamentalist was equal to zero was rejected. The value of Exp(B) for the variable survey respondents who said they were religiously fundamentalist was .231 which implies a decrease in the odds of 76.9% (.231 - 1.000 = -.769). The statement that 'Survey respondents who said they were religiously fundamentalist were 76.9% less likely to have thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason compared to those who said they were religiously liberal' is correct.

Slide 31

Marking the relationship between fundamentalism and abortion for any reason

The statement that 'Survey respondents who said they were religiously fundamentalist were 76.9% less likely to have thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason compared to those who said they were religiously liberal' is correct. The first statement is marked.

Slide 32

Statement for relationship between fundamentalism and abortion for any reason

The next statement concerns the relationship between the dummy-coded variable for religious moderation and abortion for any reason.

Slide 33

Output for relationship between fundamentalism and abortion for any reason

The probability of the Wald statistic for the independent variable survey respondents who said they were religiously moderate (χ²(1, N = 106) = 2.87, p = .090) was greater than the level of significance of .05. The null hypothesis that the b coefficient for survey respondents who said they were religiously moderate was equal to zero was not rejected. Being religiously moderate does not have a statistically significant impact on the odds that survey respondents thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason. The analysis does not support the relationship that 'Survey respondents who said they were religiously moderate were 56.0% less likely to have thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason compared to those who said they were religiously liberal'.
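
The probabilities attached to the three Wald statistics in this walkthrough can be reproduced from the chi-square distribution with 1 degree of freedom. The sketch below uses the identity that the upper tail of a chi-square variable with 1 degree of freedom equals erfc(sqrt(x / 2)); the function name and dictionary keys are illustrative, and the Wald values are the ones reported on the slides.

```python
import math

ALPHA = 0.05  # level of significance stated in the problem

def wald_p_value(wald_chi_square):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald_chi_square / 2.0))

# Wald statistics reported for the variables in the stepwise model.
reported_wald = {
    "educ": 5.48,
    "fund: fundamentalist": 6.80,
    "fund: moderate": 2.87,
}

for name, wald in reported_wald.items():
    p = wald_p_value(wald)
    decision = "reject" if p <= ALPHA else "do not reject"
    print(f"{name}: p = {p:.3f}, {decision} the null hypothesis")
```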

Slide 34

Marking the relationship between fundamentalism and abortion for any reason

Since the relationship was not statistically significant, we do not mark the check box for the statement.

Slide 35

Statement for relationship between socioeconomic index and abortion for any reason

The next statement concerns the relationship between the metric variable socioeconomic index and abortion for any reason.

Slide 36

Output for relationship between socioeconomic index and abortion for any reason

The independent variable "socioeconomic index" [sei] was not included among the statistically significant predictors and should not be interpreted. The statement that 'For each unit increase in "socioeconomic index", survey respondents were 10.5% more likely to have thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason' is not correct.

Slide 37

Marking the relationship between socioeconomic index and abortion for any reason

Since the relationship was not statistically significant, the statement is not marked.

Slide 38

Statement about the usefulness of the model based on classification accuracy

The next statement concerns the usefulness of the logistic regression model. The independent variables could be characterized as useful predictors distinguishing survey respondents who thought it should be possible for a woman to obtain a legal abortion if she wants it for any reason from survey respondents who did not if the classification accuracy rate was substantially higher than the accuracy attainable by chance alone. Operationally, the classification accuracy rate should be 25% or more higher than the proportional by chance accuracy rate.

Slide 39

Computing proportional by-chance accuracy rate

The proportional by chance accuracy rate was computed by calculating the proportion of cases for each group based on the number of cases in each group in the classification table at Step 0, and then squaring and summing the proportion of cases in each group (.509² + .491² = .500).

The proportion in the largest group is 50.9% or .509. The proportion in the other group is 1.0 – 0.509 = .491.

At Block 0 with no independent variables in the model, all of the cases are predicted to be members of the modal group, 0=NO in this example.

Slide 40

Output for the usefulness of the model based on classification accuracy

To be characterized as a useful model, the accuracy rate should be 25% higher than the by chance accuracy rate.

The by chance accuracy criterion is computed by multiplying the by chance accuracy rate of .500 by 1.25, or 1.25 x .500 = .625 (62.5%).

The classification accuracy rate computed by SPSS was 67.9%, which was greater than or equal to the proportional by chance accuracy criterion of 62.5% (1.25 x 50.0% = 62.5%).

The criterion for classification accuracy is satisfied.
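
The by chance accuracy computation and the 25% improvement criterion can be sketched as below, using the group proportions (.509 and .491) and the classification accuracy rate (67.9%) from this example. Function and variable names are illustrative. Following the slides, the by chance rate is rounded to three decimals before it is multiplied by 1.25.

```python
def proportional_by_chance_accuracy(proportions):
    """Sum of the squared group proportions."""
    return sum(p ** 2 for p in proportions)

by_chance = round(proportional_by_chance_accuracy([0.509, 0.491]), 3)
criterion = 1.25 * by_chance
model_accuracy = 0.679  # classification accuracy rate reported by SPSS

print(by_chance)                    # 0.5
print(criterion)                    # 0.625
print(model_accuracy >= criterion)  # True
```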

Slide 41

Marking the statement for usefulness of the model

Since the criterion for classification accuracy was satisfied, the check box is marked.

Slide 42

Statement about Cross-validation

The final statement concerns the generalizability of our findings to the larger population. To answer this question, we will do a 75/25% cross-validation.

The findings from our analysis are generalizable to the extent that they are applicable to cases not included in the analysis. Since we cannot collect new cases, we will divide our sample into two subsets, using one subset to create the model and testing the findings on the second subset of cases, which were not included in the analysis that created the model.

Slide 43

Creating the Training Sample and the Validation Sample - 1

The 75/25% cross-validation requires that we randomly divide the cases for this analysis into two parts: 75% of the cases will be used to run the stepwise logistic regression (the training sample), which will be tested for accuracy on the remaining 25% of the cases (the validation sample).

To set the seed for the random number generator, select Random Number Generator from the Transform menu.

NOTE: you must use the random number seed that is stated in the problem in order to produce the same results that I found. Any other seed will generate a different random sequence that can produce results that are very different from mine.

Slide 44

Creating the Training Sample and the Validation Sample - 2

First, mark the check box for Set Starting Point.

Second, select the option button for a Fixed Value.

Third, type the seed number provided in the problem directions: 981982.

Fourth, click on the OK button to complete the action.

NOTE: SPSS does not provide any feedback that the seed has been set or changed. If you are in doubt, you can reopen the dialog box and see what it indicates.

Slide 45

Creating the Training Sample and the Validation Sample - 3

We will create a variable that will contain the information about whether a case is in the training sample or the validation sample. We will name this variable “split” and use a value of 1 to indicate the training sample and a value of 0 to indicate the validation sample.

To create the new variable, select Compute from the Transform menu.

Slide 46

Creating the Training Sample and the Validation Sample - 4

Type the name of the new variable, split, in the Target Variable text box.

Type the formula as shown in the Numeric Expression text box.

Click on the OK button to create the variable.

The formula uses the SPSS UNIFORM function to create a uniform distribution of decimal numbers between 0 and 1. If the generated number for a case is less than or equal to 0.75, the statement in the text box is True and the split variable will be assigned a 1 for that case. If the generated number is larger than 0.75, the statement is false and the case will be assigned a 0 for split.
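
The logic of the split variable can be illustrated outside SPSS. The sketch below is a Python analogue, not the SPSS formula itself: Python's random number generator differs from the SPSS generator, so the same seed will not reproduce the split obtained in SPSS. The function name is hypothetical, and the 106 cases are used only as an example size.

```python
import random

def make_split(n_cases, seed, training_share=0.75):
    """Assign 1 (training sample) or 0 (validation sample) to each case.

    A uniform random number between 0 and 1 is drawn for each case; the
    case goes to the training sample when the number is at most
    training_share.
    """
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    return [1 if rng.random() <= training_share else 0 for _ in range(n_cases)]

split = make_split(n_cases=106, seed=981982)
training_count = sum(split)
print(training_count, len(split) - training_count)
```

Because each case is assigned independently, the training share is only approximately 75%, which matches the behavior described for the SPSS split variable.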

Slide 47

Creating the Training Sample and the Validation Sample - 5

If we scroll the data editor window to the right, we see the split variable in a new column.

Slide 48

Creating the Training Sample and the Validation Sample - 6

If we create a frequency distribution for the split variable, we see that the breakdown is approximately, not exactly, correct. This is a consequence of generating random numbers: you have no control over the sequence that is generated beyond setting an initial seed.

Though I have done it to create specific results for homework problems, it is not acceptable to run repeated series of random numbers until one gets a sequence that has desirable properties.

Slide 49

An Additional Task before Running the Stepwise Logistic Regression on Training Sample

Before we run the regression on the training sample, we need an additional step that will enable us to compare the accuracy of the model for the training sample to the accuracy of the model for the validation sample, using the classification accuracy rate for each as our measure of accuracy.

We need to exclude from the analysis cases that are missing data for any of the variables that we have designated as candidates for inclusion. If we don’t specifically do this, SPSS may include different cases in predicting values for the dependent variable than it does in determining which variables to include in the model.

In model building, SPSS does listwise exclusion of missing data and omits any cases that have missing data for any variable. In predicting scores on the dependent variable, it excludes cases that are missing data for only the variables included in the stepwise model. Thus, when selecting variables, SPSS assumes that only respondents who answer all questions are valid cases; in predicting scores, it assumes that failing to answer a question on a variable that is not included has no importance in the analysis.
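The consequence of the two exclusion rules can be illustrated with a small Python sketch. The four cases and their values are hypothetical, `None` stands in for a missing value, and the list of model variables assumes a stepwise result that kept only educ and fund.

```python
# Hypothetical cases; None marks a missing value.
cases = [
    {"abany": 1, "educ": 12, "fund": 2, "age": 40, "sei": 50.1},
    {"abany": 0, "educ": 16, "fund": 1, "age": None, "sei": 62.0},
    {"abany": 1, "educ": None, "fund": 3, "age": 35, "sei": 33.3},
    {"abany": 0, "educ": 10, "fund": 2, "age": 58, "sei": None},
]

def complete_on(case, variables):
    """True when the case has a value for every listed variable."""
    return all(case[v] is not None for v in variables)

candidates = ["abany", "educ", "fund", "age", "sei"]  # all variables offered to the procedure
in_model = ["abany", "educ", "fund"]                  # assumed outcome of the stepwise selection

# Cases usable when selecting variables (complete on every candidate)
used_for_selection = [i for i, c in enumerate(cases) if complete_on(c, candidates)]
# Cases usable when predicting (complete on the model variables only)
used_for_prediction = [i for i, c in enumerate(cases) if complete_on(c, in_model)]

print(used_for_selection)
print(used_for_prediction)
```

The two lists differ, which is why the accuracy rates would be computed on different sets of cases unless incomplete cases are excluded beforehand.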

Slide 50

Selecting Cases with Valid Data for All Variables in the Analysis - 1

To include only those cases that have valid data for all variables in the analysis, choose the Select Cases command from the Data menu.

Slide 51

Selecting Cases with Valid Data for All Variables in the Analysis - 2

First, mark the option button for If condition is satisfied.

Second, click on the If button to add the condition.

Slide 52

Selecting Cases with Valid Data for All Variables in the Analysis - 3

Type

NMISS(abany,age,educ,sex,rincom98,fund,sei) = 0

in the condition textbox. In the parentheses, we type the names of the dependent variable and all of the independent variables.

The SPSS NMISS function counts the number of variables in the list that have missing data.

Telling SPSS to include cases for which this calculation results in 0 indicates that the case was not missing data for any of the variables.
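As a rough analogue of this condition, here is a Python sketch of a function that counts missing values and a filter that keeps only cases with a count of 0. The function `nmiss` is a stand-in written for this example, not the SPSS function, and the two records are hypothetical.

```python
def nmiss(case, variables):
    """Count how many of the listed variables are missing for a case.

    A variable counts as missing when its value is None or the key is absent.
    """
    return sum(1 for v in variables if case.get(v) is None)

variables = ["abany", "age", "educ", "sex", "rincom98", "fund", "sei"]

# Hypothetical records
complete_case = {"abany": 1, "age": 44, "educ": 14, "sex": 2,
                 "rincom98": 15, "fund": 2, "sei": 48.6}
partial_case = {"abany": 0, "age": 31, "educ": 12, "sex": 1,
                "rincom98": None, "fund": 3, "sei": None}

# Keep only cases for which the missing-value count is 0
selected = [c for c in (complete_case, partial_case) if nmiss(c, variables) == 0]
print(nmiss(complete_case, variables), nmiss(partial_case, variables), len(selected))
```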

Slide 53

Selecting Cases with Valid Data for All Variables in the Analysis - 4

Click on the Continue button to close the dialog box.

Slide 54

Selecting Cases with Valid Data for All Variables in the Analysis - 5

Click on the OK button to execute the command.

Slide 55

Selecting Cases with Valid Data for All Variables in the Analysis - 6

The excluded cases have a slash through the case number.

Slide 56

Run the Stepwise Logistic Regression on the Training Sample - 1

To run the logistic regression, select Regression > Binary Logistic from the Analyze menu.

Slide 57

Run the Stepwise Logistic Regression on the Training Sample - 2

Move the dependent variable:

• "attitude toward abortion when a woman wants one for any reason" [abany]

to the Dependent text box.

Move the independent variables stated in the problem:

• "age" [age]
• "highest year of school completed" [educ]
• "sex" [sex]
• "respondent's degree of religious fundamentalism" [fund]
• "income" [rincom98]
• "socioeconomic index" [sei]

to the Covariates list box.

Slide 58

Run the Stepwise Logistic Regression on the Training Sample - 3

To indicate that "sex" [sex] and "respondent's degree of religious fundamentalism" [fund] are categorical variables, we click on the Categorical button.

Slide 59

Run the Stepwise Logistic Regression on the Training Sample - 4

Move the variables sex and fund to the Categorical Covariates list box.

Click on the Continue button to close the dialog box.

Slide 60

Run the Stepwise Logistic Regression on the Training Sample - 5

Since the problem calls for a stepwise binary logistic regression, we select the Forward: LR method for including variables.

Forward LR uses likelihood ratio tests to determine which variables are entered in what order.
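The likelihood ratio test behind each entry decision can be sketched in Python. This is a simplified illustration under stated assumptions: it covers only a predictor that adds a single parameter (1 degree of freedom), whereas a categorical variable such as fund enters as a group of dummy variables with more degrees of freedom. The function name and the two -2 Log Likelihood values are made up for the example.

```python
import math

def lr_test_1df(neg2ll_without, neg2ll_with):
    """Likelihood ratio test for adding one single-parameter predictor.

    The test statistic is the reduction in -2 Log Likelihood, compared
    against a chi-square distribution with 1 degree of freedom.
    """
    # Guard against tiny negative differences caused by rounding
    chi_square = max(0.0, neg2ll_without - neg2ll_with)
    # Upper-tail probability of chi-square with 1 df: erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value

# Hypothetical -2LL values before and after adding a candidate variable
chi_square, p = lr_test_1df(neg2ll_without=210.0, neg2ll_with=203.5)
print(f"chi-square = {chi_square:.2f}, p = {p:.4f}")
```

At each step, the candidate with the smallest p-value is the one considered for entry, provided that p-value is below the entry criterion.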

Slide 61

Run the Stepwise Logistic Regression on the Training Sample - 6

To select the training sample, we move the split variable to the Selection Variable text box.

First, highlight the split variable.

Second, click on the right arrow button to the left of the Selection Variable text box.

Slide 62

Run the Stepwise Logistic Regression on the Training Sample - 7

Click on the Rule button to specify the value that we want split to use to select cases.

Slide 63

Run the Stepwise Logistic Regression on the Training Sample - 7

First, type 1 in the Value text box. Recall that this is the value of split indicating training cases.

Second, click on the Continue button to close the dialog box.

Slide 64

Run the Stepwise Logistic Regression on the Training Sample - 8

Click on the OK button to produce the output.

Slide 65

Validating the Model - 1

The stepwise binary logistic regression of the training sample resulted in the same number of steps as the full sample model (2).

If the number of steps were different, the validation would fail.

Slide 66

Validating the Model - 2

The stepwise logistic regression of the training sample selected the same variables as the stepwise logistic regression of the full sample: "highest year of school completed" [educ] and "respondent's degree of religious fundamentalism" [fund].

If the variables included were different, the validation would fail.

Slide 67

Validating the Model - 3

Third, we compare the accuracy of the model for the validation sample to the accuracy of the model for the training sample.

The classification accuracy rate for the model using the training sample was 67.9%, compared to 72.7% for the validation sample. The classification accuracy for the validation sample was actually larger than the classification accuracy for the training sample, implying a better fit than obtained for the training sample. This supports a conclusion that the logistic regression model based on this analysis would be effective in predicting scores for cases other than those included in the sample.
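The comparison can be written as a small shrinkage calculation. The Python sketch below assumes that shrinkage is defined as training accuracy minus validation accuracy, in percentage points, and uses the 2% limit from the cross-validation logic diagram in these slides. The function names are chosen for illustration.

```python
def accuracy_shrinkage(training_accuracy, validation_accuracy):
    """Shrinkage in percentage points (assumed: training minus validation)."""
    return training_accuracy - validation_accuracy

def shrinkage_acceptable(training_accuracy, validation_accuracy, max_shrinkage=2.0):
    """True when the drop from training to validation accuracy is within the limit."""
    return accuracy_shrinkage(training_accuracy, validation_accuracy) <= max_shrinkage

# Accuracy rates reported above: 67.9% (training) and 72.7% (validation)
shrinkage = accuracy_shrinkage(67.9, 72.7)
print(f"shrinkage = {shrinkage:.1f} points, acceptable = {shrinkage_acceptable(67.9, 72.7)}")
```

A negative shrinkage, as here, means the validation sample was classified more accurately than the training sample, so the criterion is satisfied.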

Slide 68

Marking the Check Box for the Cross-validation Statement

The validation analysis supported the generalizability of the findings of the analysis to the population represented by the sample in the data set.

We mark the check box for the validation.

Slide 69

Stepwise Binary Logistic Regression: Level of Measurement

Level of measurement ok?

• No: Do not mark check box for level of measurement. Mark: Inappropriate application of the statistic. Stop.

• Yes: Mark check box for level of measurement.

Ordinal level variable treated as metric?

• Yes: Consider limitation in discussion of findings.

• No: Continue to the next check.

Slide 70

Stepwise Binary Logistic Regression: Multicollinearity and Sample Size

Run Stepwise Binary Logistic Regression, assuming that it is not necessary to remove any outliers.

Multicollinearity/Numerical Problems (S.E. > 2.0)?

• Yes: Do not mark check box for no multicollinearity. Stop.

• No: Mark check box for no multicollinearity.

Adequate Sample Size (Number of IV’s x 20)?

• Yes: Mark check box for sample size.

• No: Do not mark check box for sample size. Consider limitation in discussion of findings.
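Both checks in this diagram are simple comparisons and can be sketched in Python. The sketch assumes that "Number of IV's x 20" is read as the minimum number of cases required, and that the standard errors come from the coefficient table of the regression output. The function names and all values are hypothetical.

```python
def numerical_problems(standard_errors, limit=2.0):
    """Return the names of coefficients whose standard error exceeds the limit."""
    return [name for name, se in standard_errors.items() if se > limit]

def adequate_sample_size(n_cases, n_independent_variables, cases_per_iv=20):
    """True when the sample has at least cases_per_iv cases per independent variable."""
    return n_cases >= n_independent_variables * cases_per_iv

# Hypothetical standard errors for the coefficients in a fitted model
standard_errors = {"educ": 0.05, "fund(1)": 0.41, "fund(2)": 0.38}

print(numerical_problems(standard_errors))  # an empty list means no S.E. above 2.0
print(adequate_sample_size(n_cases=150, n_independent_variables=6))
```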

Slide 71

Logic Diagram for Solving Homework Problems: Stepwise Relationship

1+ variables entered in model?

• No: Stop (no significant predictors).

• Yes: Continue. Note: model will be statistically significant if any variables entered.

Parsimonious subset of variables correctly identified?

• Yes: Mark check box for correct subset.

• No: Do not mark check box for correct subset.

Slide 72

Stepwise Binary Logistic Regression: Individual Relationships

For each of the variables included by the stepwise procedure:

Individual relationship (Wald Sig ≤ α)?

• No: Do not mark check box for individual relationship.

• Yes: Check the interpretation.

Correct interpretation of direction and strength of relationship?

• Yes: Mark check box for individual relationship.

• No: Do not mark check box for individual relationship.

Additional individual relationships to interpret?

• Yes: Repeat for the next variable.

• No: Continue to the next check.

Slide 73

Stepwise Binary Logistic Regression: Classification Accuracy

Classification accuracy = or > 1.25 x by chance accuracy rate?

• Yes: Mark check box for classification accuracy.

• No: Do not mark check box for classification accuracy. Stop (the model does not meet criteria for usefulness).
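The accuracy criterion can be sketched in Python. The sketch assumes the proportional by chance accuracy rate, computed as the sum of the squared proportions of cases in each group of the dependent variable. That definition is not stated in this part of the slides, so treat it as an assumption. The group counts and accuracy values are hypothetical.

```python
def proportional_by_chance_accuracy(group_counts):
    """Sum of squared group proportions (assumed definition of by chance accuracy)."""
    total = sum(group_counts)
    return sum((count / total) ** 2 for count in group_counts)

def meets_accuracy_criterion(model_accuracy, group_counts, factor=1.25):
    """True when model accuracy is at least factor times the by chance accuracy rate."""
    return model_accuracy >= factor * proportional_by_chance_accuracy(group_counts)

# Hypothetical split of the dependent variable: 60 cases in one group, 40 in the other
counts = [60, 40]
chance = proportional_by_chance_accuracy(counts)
print(f"by chance = {chance:.3f}, criterion = {1.25 * chance:.3f}")
print(meets_accuracy_criterion(0.70, counts))
```

Accuracy values are expressed as proportions here (0.70 rather than 70%), so the model accuracy and the by chance rate must be on the same scale before they are compared.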

Slide 74

Stepwise Binary Logistic Regression: Cross-validation

Create split variable using specified seed.

Select cases with no missing values for all variables.

Run stepwise logistic regression on training sample.

Same variables entered in full model?

• No: Do not mark check box for supporting validation.

• Yes: Check the shrinkage.

Shrinkage for accuracy rate < or = 2%?

• Yes: Mark check box for supporting validation.

• No: Do not mark check box for supporting validation.