
Slide 1

Illustration of Regression Analysis

This problem is the major problem for Chapter 4, "Multiple Regression Analysis," from the textbook.


Slide 2

Stage 1: Definition Of The Research Problem

In the first stage, we state the research problem, specify the variables to be used in the analysis, and specify the method for conducting the analysis: standard multiple regression, hierarchical regression, or stepwise regression.


Relationship to be Analyzed

HATCO management has long been interested in more accurately predicting the level of business obtained from its customers in the attempt to provide a better basis for production controls and marketing efforts. To this end, analysts at HATCO proposed that a multiple regression analysis should be attempted to predict the product usage levels of customers based on their perceptions of HATCO's performance. In addition to finding a way to accurately predict usage levels, the researchers were also interested in identifying the factors that led to increased product usage for application in differentiated marketing campaigns. (page 196)

Specifying the Dependent and Independent Variables

The dependent variable is Product Usage (x9).

The independent variables are Delivery Speed (x1), Price Level (x2), Price Flexibility (x3), Manufacturer's Image (x4), Service (x5), Salesforce Image (x6), and Product Quality (x7). (page 196)

Method for including independent variables: standard, hierarchical, stepwise

Since this is an exploratory analysis and we are interested in identifying the best subset of predictors, we will employ a stepwise regression.


Slide 3

Stage 2: Develop The Analysis Plan: Sample Size Issues

In stage 2, we examine sample size and measurement issues.


Power to Detect Relationships: Page 165 of Text

If the significance level is set to 0.05, then with a sample size of 100 we can identify relationships that explain about 13% of the variance (text, page 196; see the power table on page 165 of the text).

Missing data analysis


Minimum Sample Size Requirement: 15-20 Cases Per Independent Variable

With 100 cases in the sample and 7 independent variables (100 ÷ 7 ≈ 14.3 cases per variable), we are very close to satisfying the 15 cases per independent variable requirement.


Slide 4

Stage 2: Develop The Analysis Plan: Measurement Issues

In stage 2, we examine sample size and measurement issues.


Incorporating Nonmetric Data with Dummy Variables

All of the variables are metric, so no dummy coding is required.

Representing Curvilinear Effects with Polynomials

We do not have any evidence of curvilinear effects at this point in the analysis.

Representing Interaction or Moderator Effects

We do not have any evidence at this point in the analysis that we should add interaction or moderator variables.


Slide 5

Stage 3: Evaluate Underlying Assumptions

In this stage, we verify that all of the independent variables are metric or dummy-coded, test the metric variables for normality, test the relationships between the dependent and independent variables for linearity, and test for homogeneity of variance across the groups of any nonmetric independent variables.


Slide 6

Normality of metric variables

The null hypothesis in the K-S Lilliefors test of normality is that the data for the variable is normally distributed. The desirable outcome for this test is to fail to reject the null hypothesis.  When the probability value in the Sig. column is less than 0.05, we conclude that the variable is not normally distributed. 

If a variable is not normally distributed, we can try three transformations (logarithmic, square root, and inverse) to see if we can induce the distribution of cases to fit a normal distribution.  If one or more of the transformations induces normality, we have the option of substituting the transformed variable in the analysis to see if it improves the strength of the relationship.

To reduce the tedium of the sequence of tests and computations that are required by an analysis of normality, I have produced an SPSS script that produces the tests and creates transformed variables where necessary.  The use of this script is detailed below.

The results of the tests of normality indicate that the following variables are normally distributed: X1  'Delivery Speed', X5  'Service', and X9  'Usage Level' (the dependent variable). 

X2 'Price Level' and X7 'Product Quality' are induced to normality by either a log or a square root transformation. The other non-normal variables are not improved by a transformation.

Note that this finding does not agree with the text, which finds that X2 'Price Level', X4 'Manufacturer Image', and X6 'Salesforce Image' are correctable with a log transformation. I have no explanation for the discrepancy.
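For readers who prefer syntax to the script, the normality tests and the candidate transformations can also be produced with built-in SPSS commands. The lines below are only a sketch of that approach, not the script itself; they assume the HATCO variables keep their default names (x1 through x9), and the transformed variable names (lg_x2, sq_x2, in_x2) are illustrative choices.

EXAMINE VARIABLES=x1 x2 x3 x4 x5 x6 x7 x9
  /PLOT NPPLOT HISTOGRAM.

* Candidate transformations for a positively skewed variable such as X2.
COMPUTE lg_x2 = LG10(x2).
COMPUTE sq_x2 = SQRT(x2).
COMPUTE in_x2 = 1/x2.
EXECUTE.

* For a negatively skewed variable the scores are reflected first,
  for example LG10(7.1 - x1), using a constant slightly larger than the maximum value.

EXAMINE VARIABLES=lg_x2 sq_x2 in_x2
  /PLOT NPPLOT.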


Slide 7

Run the 'NormalityAssumptionAndTransformations' Script

First, select the 'Run Script...' command from the Utilities menu.

Second, navigate to the SW388R7 folder where we downloaded the script files.

Third, highlight the "NormalityAssumptionAndTransformation.SBS" script.

Fourth, click on the Run button.


Slide 8

Complete the 'Test for Assumption of Normality' Dialog Box


First, move the variables that will be used in the regression to the 'Variable to Test:' list box.

Second, click on the OK button to produce the output for the normality tests.


Slide 9

Output for a Variable that Passes the Test of Normality

If a variable passes the test of normality, the Sig value for the K-S Lilliefors test will be greater than the 0.05 alpha level.  The table of output will be followed by the histogram and normality plot.

Tests of Normality

                    Kolmogorov-Smirnov(a)            Shapiro-Wilk
                    Statistic   df    Sig.           Statistic   df    Sig.
Delivery Speed      .063        100   .200*          .985        100   .341

*  This is a lower bound of the true significance.
a  Lilliefors Significance Correction

[Figure: histogram and normal Q-Q plot of Delivery Speed (Mean = 3.52, Std. Dev = 1.32, N = 100)]


Slide 10

Output for a Variable that Fails the Test of Normality

If a variable has a Sig value of less than 0.05 for the K-S Lilliefors test, we conclude that the distribution of the variable is not normal.  After the histogram and normality plot for this variable, the script will calculate the three transformations to correct for problems of normality, as shown below.

Tests of Normality

                    Kolmogorov-Smirnov(a)            Shapiro-Wilk
                    Statistic   df    Sig.           Statistic   df    Sig.
Price Level         .095        100   .028           .969        100   .017

a  Lilliefors Significance Correction

[Figure: histogram and normal Q-Q plot of Price Level (Mean = 2.36, Std. Dev = 1.20, N = 100)]


Slide 11

Output for a Variable that Fails the Test of Normality


The test of normality for the transformed variables is shown in a table following the output for the original form of the variable. Each row of the table contains one of the possible transformations. The formula used to compute the transformation is shown in square brackets at the end of the row label. If the distribution is negatively skewed, the reflection formulas for the transformations will be shown and used in the computations.

In this example, two transformations induce normality, the log transformation and the square root transformation. Since log transformations are more commonly used, we might prefer that option in subsequent analyses.

Tests of Normality

                                  Kolmogorov-Smirnov(a)            Shapiro-Wilk
                                  Statistic   df    Sig.           Statistic   df    Sig.
Logarithm of X2 [LG10(X2)]        .083        100   .088           .946        100   .000
Square Root of X2 [SQRT(X2)]      .062        100   .200*          .990        100   .636
Inverse of X2 [1/(X2)]            .241        100   .000           .567        100   .000

*  This is a lower bound of the true significance.
a  Lilliefors Significance Correction


Slide 12

Linearity between metric independent variables and dependent variable

Another script, 'LinearityAssumptionAndTransformations' tests for linearity of relationships between the dependent variable and each of the independent variables.  Since there is no simple score that indicates whether or not a relationship is linear or nonlinear, a scatterplot matrix is created for the dependent variable, the independent variable, and transformations of the independent variable.  The user can visually inspect the scatterplot matrix for evidence of nonlinearity.  If nonlinearity is evident, but not corrected by a transformation of the independent variable, transformation of the dependent variable is available.  More detailed information is available by requesting a correlation matrix for the variables included in the scatterplot matrix.  If the scatterplot matrix does not provide sufficient detail, individual scatterplots overlaid with fit lines can be requested.

When we run the script as described below, we do not find any nonlinear relationships between the dependent variable and the metric independent variables.
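Although the illustration relies on the script, the same scatterplot matrix and supporting correlations can be requested directly with syntax. The sketch below assumes the transformed versions of X1 have been computed under illustrative names; the reflection constant 7.1 follows the formulas shown in the correlation matrix on a later slide.

COMPUTE lg_x1  = LG10(7.1 - x1).
COMPUTE sqr_x1 = x1**2.
COMPUTE srt_x1 = SQRT(7.1 - x1).
COMPUTE inv_x1 = -1/(7.1 - x1).
EXECUTE.

GRAPH
  /SCATTERPLOT(MATRIX)=x9 x1 lg_x1 sqr_x1 srt_x1 inv_x1.

CORRELATIONS
  /VARIABLES=x9 x1 lg_x1 sqr_x1 srt_x1 inv_x1
  /PRINT=TWOTAIL NOSIG.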


Slide 13

Run the 'LinearityAssumptionAndTransformations' Script

First, select the 'Run Script...' command from the Utilities menu.

Second, navigate to the SW388R7 folder where we downloaded the script files.

Third, highlight the "LinearityAssumptionAndTransformations.SBS" script.

Fourth, click on the Run button.


Slide 14

Complete the 'Check for Linear Relationships' Dialog Box


First, move the variable that will be used for the dependent variable in the regression to the 'Dependent (Y) Variable:' list box.

Second, move the variables that will be used as independent variables in the regression to the 'Independent (X) Variables: ' list box.

Third, click on the OK button to produce the output for the linearity tests.


Slide 15

The Scatterplot Matrix

Each scatterplot matrix examines the linearity and impact of transformations for the dependent variable and one of the independent variables.  Focusing our attention on the first column with a 'Y' in cell 1, we see the scatterplots of the dependent variable against the independent variable and each of its transformations.  In the second row of the first column we see that the relationship between X9 'Usage Level' and X1 'Delivery Speed' is linear, so we ignore the remaining scatterplots in column 1.  If the relationship were nonlinear, we would examine the remaining scatterplots in column 1 to visually determine if one of the transformations induced a linear relationship between the variables.


Slide 16

The Correlation Matrix

In the correlation matrix shown below the scatterplot matrix, we note that the correlation between the dependent and independent variable is strong (0.676) and that none of the correlations for the transformed versions of the independent variable offer a large increase in correlation over that for the original form of the variables.  The correlation matrix confirms the linearity of the relationship.


Correlations (Pearson correlations; N = 100 in every cell; all Sig. (2-tailed) values are .000)

                                      Usage     Delivery   Logarithm of X1   Square of X1   Square Root of X1   Inverse of X1
                                      Level     Speed      [LG10(7.1-X1)]    [(X1)**2]      [SQRT(7.1-X1)]      [-1/(7.1-X1)]
Usage Level                           1         .676**     -.684**           .682**         -.685**             -.648**
Delivery Speed                        .676**    1          -.972**           .975**         -.993**             -.886**
Logarithm of X1 [LG10(7.1-X1)]        -.684**   -.972**    1                 -.997**        .993**              .968**
Square of X1 [(X1)**2]                .682**    .975**     -.997**           1              -.993**             -.952**
Square Root of X1 [SQRT(7.1-X1)]      -.685**   -.993**    .993**            -.993**        1                   .931**
Inverse of X1 [-1/(7.1-X1)]           -.648**   -.886**    .968**            -.952**        .931**              1

**  Correlation is significant at the 0.01 level (2-tailed).


Slide 17

Output Demonstrating Nonlinearity

None of the relationships in the HATCO data set demonstrate nonlinear relationships, so we cannot demonstrate what output looks like when nonlinearity is detected.

The following example uses the World95.Sav data set and does demonstrate a nonlinear relationship between population increase and fertility.  Visual inspection shows a nonlinear relationship in the second row of the first column, which is corrected by the log transformation in row 3 and the inverse transformation in row 6.


Slide 18

Output Demonstrating Nonlinearity

The correlation matrix, shown below, indicates first of all that there is a very strong relationship between population increase and fertility, as we might expect from the nature of the variables.  The strength of this relationship (0.840) is enhanced by both the log transformation (0.884) and the inverse transformation (0.886).  The analysis would be strengthened by the use of the transformations, though a case could be made that the relationship is strongly linear without the transformation.


Slide 19

Constant variance across categories of nonmetric independent variables

While this particular analysis does not include any nonmetric variables, the nonmetric variable X8 'Firm Size' will be used in a subsequent analysis, so we will do the test for homogeneity of variance here.

Another script, 'HomoscedasticityAssumptionAndTransformations', tests for homogeneity of variance across groups designated by nonmetric independent variables.  The script uses the One-Way ANOVA procedure to produce a Levene test of the homogeneity of variance.

The null hypothesis of the Levene test is that the variance of all groups of the independent variable is equal.  If the Sig value associated with the test is greater than the alpha level, we fail to reject the null hypothesis and conclude that the variance for all groups is equivalent.

If the Sig value associated with the test is less than the alpha level, we reject the null hypothesis and conclude that the variance of at least one group is different.  If we fail the homogeneity of variance test, we can attempt to correct the problem by applying the transformations for normality to the dependent variable specified in the test.  The script computes the transformations and applies the Levene test to the transformed variables.  If one of the transformed variables corrects the problem, we can consider substituting it for the original form of the variable.

When we run the script for this problem, as described below, we find that the nonmetric X8 'Firm Size' variable passes the homogeneity of variance test so no transformation is required.
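Because the Levene test reported by the script comes from the One-Way ANOVA procedure, it can also be requested directly. A minimal syntax sketch, assuming X8 and X9 keep their default variable names:

ONEWAY x9 BY x8
  /STATISTICS HOMOGENEITY.

If the test fails, the normality transformations of the dependent variable can be computed and the command repeated on the transformed variable.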


Slide 20

Run the 'HomoscedasticityAssumptionAndTransformations' Script

First, select the 'Run Script...' command from the Utilities menu.

Second, navigate to the SW388R7 folder where we downloaded the script files.

Third, highlight the "HomoscedasticityAssumptionAndTransformations.SBS" script.

Fourth, click on the Run button.


Slide 21

Complete the Dialog Box


First, move the variable that will be used for the dependent variable in the regression to the 'Dependent (Y) Variable:' list box.

Second, move the variables that will be used as independent variables in the regression to the 'Nonmetric Independent (X) Variables: ' list box.

Third, click on the OK button to produce the output for the homogeneity of variance tests.


Slide 22

Output for the Test of Homogeneity of Variances

The Sig value in the Test of Homogeneity of Variances test guides our conclusion about meeting this assumption.  For the nonmetric variable X8 'Firm Size', the homogeneity of variance assumption is met when the X9 'Usage Level' is the dependent variable.


Slide 23

The output shown below is from the 'GSS93 Subset.Sav' data set.  It demonstrates what the script does when the relationship fails the Levene test.

In this example, males and females did not have the same variance for the variable on spanking as a form of discipline, as indicated by the Sig value for the Levene Statistic being below 0.05. 

A log transformation of the spanking variable does correct the heterogeneity of variance problem, as shown in the second Levene test, which has a non-significant result (Sig value greater than 0.05).

Output Failing to Pass the Homogeneity of Variance Test


Slide 24

Stage 4: Compute the Statistics And Test Model Fit: Computations

In this stage, we compute the actual statistics to be used in the analysis. Regression requires that we specify a variable selection method. The text specifies a stepwise regression procedure, which is appropriate to exploratory analyses where we are uncertain which variables are important predictors and we want the analysis to select the best subset of predictors.

The objective of the stepwise regression procedure is to find the smallest subset of independent variables that have the strongest R Square relationship to the dependent variable. At each step, the independent variable that will be added is the one that contributes the largest, statistically significant increase to the R square measure of relationship between the dependent and the independent variables. At the conclusion of each step, the variables already entered into the equation are tested to see if the overall relationship would be stronger if one or more variables were removed. The significance levels for adding a variable are specified as PIN, the probability for inclusion and POUT, the criterion probability for taking a variable out of the regression equation. We will accept the SPSS default of 0.05 for PIN and 0.10 for POUT.

The stepping procedure continues until none of the variables not included on a previous step have a significance for the increase in R square value that is less than the PIN value, or until the remaining variables have a tolerance less than that specified, or until all independent variables have been included.


Slide 25

Stage 4: Compute the Statistics And Test Model Fit: Computations

When SPSS is evaluating independent variables for possible inclusion in the regression equation, it also tests for multicollinearity among the independent variables. Multicollinearity exists when two or more independent variables are very highly intercorrelated, i.e., 0.90 or higher. Where there is high multicollinearity, two independent variables are attempting to explain the same variance in the dependent variable. Extreme multicollinearity causes the mathematical computations to fail, producing incorrect answers.

SPSS computes the tolerance for each variable as its test for multicollinearity. If the computed tolerance is less than the amount specified in the TOL option, the variable is excluded from consideration. The authors note that a TOL specification of 0.10 will eliminate variables correlated at the 0.95 level or higher (page 193).  The authors also suggest that we set the tolerance higher than the SPSS default of 0.0001, but I cannot find a place in the menu commands where this specification can be made.  If it is necessary to set the tolerance, the regression will have to be run using syntax commands.
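A sketch of what that syntax might look like follows; the variable names assume the HATCO defaults, and the TOLERANCE value of 0.10 simply illustrates where the setting is placed on the CRITERIA subcommand.

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10) TOLERANCE(.10)
  /DEPENDENT x9
  /METHOD=STEPWISE x1 x2 x3 x4 x5 x6 x7.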


Slide 26

Request the Regression Analysis

Choose 'Regression | Linear...' from the Analyze menu.


The first task in this stage is to request the initial regression model and all of the statistical output we require for the analysis.

Page 27: Slide 1 Illustration of Regression Analysis This problem is the major problem for Chapter 4, "Multiple Regression Analysis," from the textbook. Illustration.

Slide 27

Specify the Dependent and Independent Variables and the Variable Selection Method

First, in the Linear Regression dialog box, move the variable X9 'Usage Level' to the 'Dependent: ' text box.

Second, move the metric variables X1 'Delivery Speed', X2 'Price Level', X3 'Price Flexibility', X4 'Manufacturer Image', X5 'Service', X6 'Salesforce Image', X7 'Product Quality' to the 'Independent(s): ' list box.

Third, select 'Stepwise' from the 'Method: ' drop down menu.


Slide 28

Specify the Statistics Options

First, click on the 'Statistics...' button to access the 'Linear Regression: Statistics' dialog box.

Second, mark the 'Model fit' checkbox, the 'R squared change' checkbox, the 'Descriptives' checkbox, and the 'Collinearity diagnostics' checkbox.

Third, on the 'Residuals' panel, mark the 'Durbin-Watson' checkbox, the 'Casewise diagnostics' checkbox, the 'Outliers' option button, and fill in a '3' in the 'Outliers outside ... standard deviations' text box.

Fourth, click on the 'Continue' button to close the dialog box.


Slide 29

Specify the Plots to Include in the Output

First, click on the 'Plots...' button to access the dialog box.

Second, move the '*SRESID' variable name to the 'Y: ' text box for the residual scatter plot.

Third, move the '*ZPRED' variable name to the 'X: ' text box for the residual scatter plot.

Fourth, mark the 'Normal probability plot' checkbox on the 'Standardized Residual Plots' panel.

Fifth, mark the 'Produce all partial plots' checkbox.

Sixth, click on the Continue button.


Slide 30

Specify Diagnostic Statistics to Save to the Data Set

First, click on the 'Save...' button to access the dialog box.

Second, mark the 'Mahalanobis' and 'Cook's' checkboxes on the 'Distances' panel.

Third, click on the Continue button.


Slide 31

Complete the Regression Analysis Request

Click on the OK button to complete the request for the regression analysis.
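As a cross-check, the dialog selections described on the preceding slides correspond roughly to the pasted syntax shown below. This is a sketch that assumes the default HATCO variable names; the syntax your version of SPSS pastes may differ in minor details.

REGRESSION
  /DESCRIPTIVES MEAN STDDEV CORR SIG N
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE COLLIN TOL
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT x9
  /METHOD=STEPWISE x1 x2 x3 x4 x5 x6 x7
  /SCATTERPLOT=(*SRESID ,*ZPRED)
  /RESIDUALS DURBIN NORMPROB(ZRESID)
  /CASEWISE PLOT(ZRESID) OUTLIERS(3)
  /PARTIALPLOT ALL
  /SAVE MAHAL COOK.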


Slide 32

Stage 4: Compute the Statistics And Test Model Fit: Model Fit

In this stage, we examine the relationships between our independent variables and the dependent variable.

First, we look at the F test for R Square which represents the relationship between the dependent variable and the set of independent variables.  This analysis tests the hypothesis that there is no relationship between the dependent variable and the set of independent variables, i.e. the null hypothesis is: R2 = 0.  If we cannot reject this null hypothesis, then our analysis is concluded; there is no relationship between the dependent variable and the independent variables that we can interpret.

If we reject the null hypothesis and conclude that there is a relationship between the dependent variable and the set of independent variables, then we examine the table of coefficients to identify which independent variables have a statistically significant individual relationship with the dependent variable.  For each independent variable in the analysis, a t-test is computed of whether the slope of the regression line (B) between the independent variable and the dependent variable differs from zero.  The null hypothesis is that the slope is zero, i.e. B = 0, implying that the independent variable has no impact or relationship on scores on the dependent variable.  This part of the analysis is more important in standard multiple regression, where we enter all independent variables into the regression at one time, and in hierarchical multiple regression, where we specify the order of entry of independent variables, than it is in stepwise multiple regression, where the computer picks the order of entry and stops adding variables when some statistical limit is reached.  In stepwise regression, we would expect all of the individual variables that passed the statistical criteria for entry to have a significant individual relationship with the dependent variable.

When we are determining which independent variables have a significant relationship with the dependent variable, we often are interested in the question of the relative importance of the predictor variables to predicting the dependent variable.  To answer this question, we will examine the Beta coefficients, or standardized version of the coefficients of the individual independent variables.


Slide 33

Significance Test of the Coefficient of Determination R Square

Footnote C to the Model Summary table tells us that the regression model contained three independent variables when the stepwise criteria were satisfied:  X5 Service, X3 Price Flexibility, and X6 Salesforce Image.

The R² value for the three variable model is 0.768, which means that the three independent variables in the regression collectively explain 76.8% of the variance in the dependent variable. Using the descriptive framework described previously, we would describe this relationship as very strong.

The Model Summary table also indicates that the increase in predictive power or accuracy at each of the model's three steps was statistically significant. The 'Sig. F Change' column gives the significance of the test that the increase in R² at each step is zero; values below 0.05 indicate that the increase is statistically significant.


Slide 34

Significance Test of the Coefficient of Determination R Square

The 'Sig.' column of the ANOVA table for model 3 supports a conclusion that there is a statistically significant relationship between the dependent variable and the set of independent variables.


Slide 35

Significance Test of Individual Regression Coefficients

In addition to the issue of the statistical significance of the overall relationship between the dependent variable and the independent variables, SPSS provides us with the statistical tests of whether or not each of the individual regression coefficients are significantly different from 0, as shown in the table below:

The t test for the B coefficient is a test of the relationship between the dependent variable and a specific independent variable. The null hypothesis for this test is B = 0, i.e. the slope of the regression line for the two variables is zero, making it flat or parallel to the x-axis. A flat line implies that for any value of x, we would predict the same value for y. Therefore, knowledge of x does not improve our ability to predict y, and there is no relationship.

We also use the B coefficients to define the direction of the relationship between the independent and dependent variable.  If the sign of the coefficient is positive (positive signs are usually not shown in the table), the relationship between the variables is direct; scores on the two variables change in the same direction.  If the sign of the coefficient is negative, the relationship is inverse, meaning that increases in one variable correspond to decreases in the other variable.  Another way to interpret the B coefficient is that it represents the amount of change in the dependent variable for a one-unit change in the independent variable.  For example, it might tell us that starting salaries increase by $2,114 for each additional year of education.

It can be problematic to compare B coefficients because they are often measured in units of different magnitudes. For example, a problem may involve GPAs measured from 0 to 4.0 and GRE scores measured from 400 to 1600. Computing standard scores for both reduces them to a common metric that supports direct comparison of their values. Beta is the standard score equivalent of the B coefficient, used to compare the relative importance of different variables toward the prediction of the dependent variable.  We will discuss Beta coefficients in greater detail in stage 5 when we examine the importance of the predictors to use in our findings.


Slide 36

Stage 4: Compute the Statistics And Test Model Fit: Meeting Assumptions

Using output from the regression analysis to examine the conformity of the regression analysis to the regression assumptions is often referred to as "Residual Analysis" because it focuses on the component of the variance which our regression model cannot explain.  Using the regression equation, we can estimate the value of the dependent variable for each case in our sample.  This estimate will differ from the actual score for each case by an amount referred to as the residual.  Residuals are a measure of unexplained variance or error that remains in the dependent variable and cannot be explained or predicted by the regression equation.


Slide 37

Linearity and Constant Variance for the Dependent Variable: Residual Plot

When we specified plots for regression output, we made a specific request for a plot of studentized residuals versus standardized predicted values. This plot is often referred to as the 'Residual Plot.' It is used to evaluate whether or not the derived regression model violates the assumptions of linearity and constant variance in the dependent variable.

If the plot shows a curvilinear pattern, it indicates a violation of the linearity assumption. If the spread of the residuals varies substantially across the range of the predicted values, it indicates a violation of the constant variance assumption. To correct for these violations of assumptions, we would employ transformations. If we do not see a pattern of nonlinearity or restricted spread to the residuals, the residual plot is termed a "Null Plot" to indicate that it does not show any violations of assumptions. The author's interpretation of the residual plot for this analysis is that the residuals fall in a generally random pattern similar to the null plot shown on page 174.


Slide 38

Normal Distribution of Residuals: Normality Plot of Residuals


To check for meeting the assumption that the residuals or error terms are normally distributed, we look at the Normal p-p plot of Regression Standardized Residual as shown below:

Our criterion for normal distribution is the degree to which the plot for the actual values coincides with the green line of expected values. For this problem, the plot of residuals fits the expected pattern well enough to support a conclusion that the residuals are normally distributed. If a more exact computation is desired, we can instruct SPSS to save the residuals in our data file and do a test of normality on the residual values using the Explore command.


Slide 39

Linearity of Independent Variables: Partial Plots

A partial regression plot is a scatterplot of the partial correlation of each independent variable with the dependent variable after removing the linear effects of the other independent variables in the model. The values plotted on this chart are two sets of residuals. The residuals from regressing the dependent variable on the other independent variables are plotted on the vertical axis. The residuals from regressing the particular predictor variable on all other independent variables are plotted on the horizontal axis.

The partial regression, thus, shows the relationship between the dependent variable and a specific independent variable. We examine each plot to see if it shows a linear or nonlinear pattern. If the specific independent variable shows a linear relationship to the dependent variable, it meets the linearity assumption of multiple regression. If there is an obvious nonlinear pattern, we should consider a transformation of either the dependent or independent variable.

I like to add a total fit line to the scatterplot to make it easier to interpret. We added the fit line to scatterplots in previous examples when we examined scatterplots for linearity.

The partial regression plots for the three independent variables in the analysis are shown below. None of the plots demonstrates an obvious nonlinear pattern.


Slide 40

Linearity of Independent Variables: Partial Plots


Slide 41

Independence of Residuals: Durbin-Watson Statistic


The next assumption is that the residuals are not correlated serially from one observation to the next. This means the size of the residual for one case has no impact on the size of the residual for the next case. While this is particularly a problem for time-series data, SPSS provides a simple statistical measure for serial correlation for all regression problems. The Durbin-Watson Statistic is used to test for the presence of serial correlation among the residuals. Unfortunately, SPSS does not print the probability for accepting or rejecting the presence of serial correlation, though probability tables for the statistic are available in other texts.

The value of the Durbin-Watson statistic ranges from 0 to 4. As a general rule of thumb, the residuals are uncorrelated if the Durbin-Watson statistic is approximately 2. A value close to 0 indicates strong positive serial correlation, while a value close to 4 indicates strong negative serial correlation.

For our problem, the value of Durbin-Watson is 1.910, approximately equal to 2, indicating no serial correlation.


Slide 42

Identifying Dependent Variable Outliers: Casewise Plot of Standardized Residuals


As shown in the following table of Residual Statistics, all standardized residuals (Std. Residual) fell within +/- 3 standard deviations (actually, between -2.9 and 1.8). We do not have cases where the value of the dependent variable indicates an outlier.


Slide 43

Identifying Independent Variable Outliers - Mahalanobis Distance

While SPSS will save the Mahalanobis Distance score for each case to the data set, we must specifically request the probability to identify the outliers. 


Slide 44

Identifying Independent Variable Outliers: Mahalanobis Distance

First, we select the 'Compute...' command from the Transform menu.

Second, we type the name of the probability variable 'p_mahal' into the 'Target Variable: ' text box.

Third, we type the formula '1 - CDF.CHISQ(mah_1, 3)' into the 'Numeric Expression' text box. We use 3 degrees of freedom because that is the number of variables included in the stepwise regression.

Fourth, we click on the OK button to close the 'Compute Variable' dialog box.
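The same computation can be written as a single syntax statement. This sketch assumes SPSS saved the distance under its default name, mah_1; the 3 degrees of freedom correspond to the three predictors in the final stepwise model.

COMPUTE p_mahal = 1 - CDF.CHISQ(mah_1, 3).
EXECUTE.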


Slide 45

Identifying Statistically Significant Mahalanobis Distance Scores

First, we select the 'Descriptive Statistics | Explore...' command from the Analyze menu.

Second, we move the 'p_mahal' variable to the 'Dependent List: ' list box.

Third, we move the 'ID (id)' variable to the 'Label Cases by: ' text box.

Fourth, we click on the 'Statistics' option in the Display panel.

Fifth, we click on the 'Statistics...' button to specify statistics.

Sixth, we mark the 'Outliers' check box to get a list of largest and smallest values.

Seventh, we click on the 'Continue' box to close the 'Explore: Statistics' dialog.

Eighth, we click on the OK button to close the 'Explore' dialog.
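A syntax sketch equivalent to the Explore request above (the EXTREME keyword lists the largest and smallest values, and the id variable labels the cases):

EXAMINE VARIABLES=p_mahal
  /ID=id
  /PLOT NONE
  /STATISTICS DESCRIPTIVES EXTREME(5).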


Slide 46

Identifying Potential Outliers

The cases with a probability of Mahalanobis Distance smaller than 0.05 are shown in the half of the list titled 'Lowest.'  As we can see, cases 96, 82, 42, and 5 are outliers by this criterion.


Slide 47

Identifying Influential Cases - Cook's Distance

In addition to the request for Mahalanobis distance scores, we also requested that Cook's distance scores be saved to the data editor.

Cook's distance identifies cases that are influential or have a large effect on the regression solution and may be distorting the solution for the remaining cases in the analysis.  While we cannot associate a probability with Cook's distance, we can identify problematic cases that have a score larger than the criterion computed using the formula: 4/(n - k - 1), where n is the number of cases in the analysis and k is the number of independent variables.  For this problem, which has 100 subjects and 3 independent variables, the formula equates to: 4 / (100 - 3 - 1) = 0.042.

To identify the influential cases with large Cook's distances, we sort the data set by the Cook's distance variable, 'coo_1' that SPSS created in the data set.
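In syntax form, the cutoff and the sort look like the sketch below; coo_1 is the default name SPSS gives the saved Cook's distance, and cook_flag is an illustrative name for the indicator variable.

* Flag cases whose Cook's distance exceeds 4/(n - k - 1) = 4/(100 - 3 - 1) = 0.042.
COMPUTE cook_flag = coo_1 > 4/(100 - 3 - 1).
EXECUTE.
SORT CASES BY coo_1 (D).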


Slide 48

Sorting Cook's Distance Scores in Descending Order

First, we select the 'Sort Cases...' command from the Data menu.

Second, we move the 'Cook's Distance (coo_1)' variable to the 'Sort by: ' text box.

Third, mark the 'Descending' option in the 'Sort Order' panel.

Fourth, click on the OK button to commence sorting.


Slide 49

Cases with Large Cook's Distances

When the data set is sorted we see that there are four cases with Cook's distances above 0.042 in the first four rows of the data set. These four cases have id numbers 7, 100, 14, and 11. When we have completed this analysis, we will re-run the analysis without these four influential cases in the practice problems to see what impact they are having on the regression solution.


Slide 50

Stage 5: Interpret The Findings - Regression Coefficients

Interpreting the regression coefficients enables us to make statements about the direction of the relationship between the dependent variable and each of the independent variables, the size of the contribution of each independent variable to the dependent variable, and the relative importance of the independent variables as predictors of the dependent variable.


Slide 51

Direction of relationship and contribution to dependent variable

To determine the direction of the relationship between each independent variable and the dependent variable, we look at the sign of the B coefficient.  For all three of the independent variables, the sign of the B coefficient is positive or direct, meaning that as the magnitude of these independent variables increases, the magnitude of the dependent variable also increases.  Had the sign of any of these coefficients been negative, we would describe the relationship with the dependent variable as inverse, meaning that the score on the dependent variable would go down as the score on the independent variable went up.

In addition to interpreting the sign of the B coefficient, we can interpret the quantity of the coefficient which represents the amount of change in the dependent variable associated with a one unit change in the independent variable. For example, we would expect Usage Level to increase by 7.621 units for every unit change in the Service variable. 

Finally, we use the regression coefficients to state the regression equation.  For this analysis, the regression equation is:

Usage Level = -6.520 + 7.621 x Service + 3.376 x Price Flexibility + 1.406 x Sales Force Image
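To illustrate how the equation is used, consider a customer rated at Service = 2.9, Price Flexibility = 8.0, and Salesforce Image = 2.7 (hypothetical ratings chosen only for the arithmetic, not a case from the data set). The predicted usage level would be:

Usage Level = -6.520 + 7.621(2.9) + 3.376(8.0) + 1.406(2.7) = -6.520 + 22.101 + 27.008 + 3.796 ≈ 46.4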



Slide 52

Importance of Predictors

When we are examining variables with different measurement scales, we compute standard scores for the variables before making our comparisons.  Standardizing the variables converts all variables to a standard metric which can be used for comparisons.

Regression output contains a column of standardized coefficients, called Beta coefficients, which can be used to judge the relative importance of the variables in predicting the dependent variable.  The variable with the largest impact on the dependent variable is the one with the largest Beta, in this example the Service variable with a Beta of 0.637.  The second most important variable was Price Flexibility with a Beta of 0.521.  The least important predictor of these three variables is Salesforce Image with a Beta of 0.121.

The order of importance based on Beta coefficients is the same order determined by the stepwise regression, as we would expect.

In stepwise regression the importance of the predictors can be derived from the steps in which the variables entered the regression equation, because the criteria for selecting the next variable to include is based on its increase in the strength of the overall relationship between the dependent and the independent variables.

If we did not use stepwise regression, we could look to the B coefficients to indicate the relative impact of each variable on the dependent variable, since the B coefficients represent the amount of change associated with a one-unit change in each of the independent variables.  This does not work, however, because the measurement units differ for the various independent variables.


Slide 53

Stage 5: Interpret The Findings - Impact of Multicollinearity

Multicollinearity is a concern in our interpretation of the findings because it could lead us to mistakenly conclude that there was not a relationship between the dependent variable and one of the independent variables, when in fact a strong relationship between that independent variable and another independent variable in the analysis prevented it from demonstrating its relationship to the dependent variable.

SPSS supports the detection of this problem by providing Tolerance statistics and the Variance Inflation Factor or VIF statistic, which is the inverse of tolerance.  To detect problems of multicollinearity, we look for tolerance values less than 0.10 or VIF values greater than 10 (1/0.10 = 10).


Slide 54

Tolerance or VIF statistics

In a stepwise regression problem, SPSS will prevent a collinear variable from entering the solution by checking the tolerance of the variable at each step.  To identify problems with multicollinearity, we check the tolerance or VIF for variables not included in the analysis after the last step.

None of the variables which were not included in the analysis have a tolerance less than 0.10, so there is no indication that a variable was excluded from the regression equation because of a very strong relationship with one of the variables included in the analysis.


Slide 55

Stage 6: Validate The Model

If we wish to generalize our findings to populations beyond our sample, we need to accumulate evidence that our regression results are not limited to our sample. Since we do not usually have the resources available to replicate and validate our study, we employ statistical procedures to assure ourselves that we do not have a solution that fits our data sample but is unlikely to generalize.


Slide 56

Interpreting Adjusted R Square

Our first indicator of generalizability is the adjusted R Square value, which is adjusted for the number of variables included in the regression equation.  This is used to estimate the expected shrinkage in R Square that would not generalize to the population because our solution is over-fitted to the data set by including too many independent variables.

If the adjusted R Square value is much lower than the R Square value, it is an indication that our regression equation may be over-fitted to the sample, and of limited generalizability.

For the problem we are analyzing, R Square = .768 and the Adjusted R Square =.761. These values are very close, anticipating minimal shrinkage based on this indicator.
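As a check on these figures, the adjustment follows the standard formula, worked here for n = 100 cases and k = 3 predictors:

Adjusted R Square = 1 - (1 - R Square) x (n - 1) / (n - k - 1) = 1 - (1 - 0.768) x (99 / 96) ≈ 0.761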


Slide 57

Split-Sample Validation

A more elaborate strategy to validate our regression requires us to randomly divide our sample into two groups, a screening sample and a validation sample. The regression is computed for the screening sample and used to predict the values of the dependent variable in the validation sample. SPSS provides us with Multiple R statistics for both the screening and the validation sample. If the Multiple R value for the validation sample is close to the value for the screening sample, the model is validated. In the double cross- validation strategy, we reverse the designation of the screening and validation sample and re-run the analysis.

We can then compare the regression equations derived for both samples.  If the two regression equations contain a very different set of variables, it indicates that the variables might have achieved significance because of the sample size and not because of the strength of the relationship. Our findings about these individual variables would be that the predictive utility of these variables is not generalizable.
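A syntax sketch of the split-sample procedure that the following slides carry out through the menus; the variable names assume the HATCO defaults, and 'split' is simply the name this illustration gives the selection variable.

SET SEED=34567.
COMPUTE split = uniform(1) > 0.52.
EXECUTE.

REGRESSION
  /SELECT= split EQ 0
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /DEPENDENT x9
  /METHOD=STEPWISE x1 x2 x3 x4 x5 x6 x7.

* Re-run the REGRESSION command with /SELECT= split EQ 1 for the second screening sample.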


Slide 58

Set the Starting Point for Random Number Generation

First, select the 'Random Number Seed...' command from the 'Transform' menu.

Second, click on the 'Set seed to: ' option to access the text box for the seed number.

Third, type '34567' in the 'Set seed to: ' text box. (This is the same random number seed specified by the authors.)

Fourth, click on the OK button to complete this action.


Slide 59

Compute the Variable to Randomly Split the Sample into Two Halves

First, select the 'Compute...' command from the Transform menu.

Second, create a new variable named 'split' that has the values 1 and 0 to divide the sample into two parts. Type the name 'split' into the 'Target Variable: ' text box.

Third, type the formula 'uniform(1) > 0.52' in the 'Numeric Expression: ' text box. The uniform function will generate a random number between 0.0 and 1.0 for each case. If the generated random number is greater than 0.52, the numeric expression will result in a 1, since the numeric expression is true. If the generated random number is 0.52 or less, the numeric expression will produce a 0, since its value is false. In many computer programs, true is represented by the number 1 and false is represented by a 0.

Fourth, we click on the OK button to compute the split variable.


Slide 60

Specify the Cases to Include in the First Screening Sample

First, select 'Linear Regression' from the 'Dialog Recall' drop down menu.

Second, highlight the 'split' variable and click on the move button to put it into the 'Selection Variable: ' text box.


Slide 61

Define the Case Selection Rule

First, after 'split=?' appears in the 'Selection Variable: ' text box, click on the 'Rule...' button to specify which cases to include in the screening sample.

Second, accept the default relationship of 'equal to' in the drop down menu and type a '0' in the 'Value: ' text box.

Third, click on the 'Continue' button to complete setting the rule.


Slide 62

Complete the Regression Analysis Request for the First Screening Sample

Click on the OK button in the Linear Regression dialog to compute the regression for the first half of the sample.


Slide 63

Specify the Cases to Include in the Second Screening Sample

First, select 'Linear Regression' from the 'Dialog Recall' drop down menu.

Second, highlight 'split=0' in the 'Selection Variable: ' text box and click on the 'Rule...' button.

Third, replace the 0 in the 'Value: ' text box with a 1.

Fourth, click on the 'Continue' button to close the 'Set Rule' dialog box.


Slide 64

Complete the Regression Analysis Request for the Second Screening Sample

Click on the OK button in the Linear Regression dialog to compute the regression for the second half of the sample.


Slide 65

Summary Table for Validation Analysis

If we look at the variables with significant coefficients, we see that the stepwise procedure for the first validation learning sample (Split = 0) included only the first two variables, Service and Price Flexibility, that were entered in the full model. The variable Sales Force Image, the weakest of our predictors in the full model, was not included as a predictor in the first validation model.

The multiple R values for the validation samples (0.845 and 0.847) are of similar magnitude to the Multiple R for the Full Model and the Multiple R values for the two learning samples.  The Adjusted R2 values for each analysis are close to the R2 values, suggesting that the two or three variables are generalizable as predictors to the larger population.  The values for the second validation analysis appear optimistic when compared to the values for the full sample and the first validation analysis.


                                Full Model              Split = 0               Split = 1

R for Learning Sample           0.877                   0.861                   0.909

R for Validation Sample                                 0.845                   0.847

Significant Coefficients        Service                 Service                 Service
(p < 0.05)                      Price Flexibility       Price Flexibility       Price Flexibility
                                Salesforce Image                                Salesforce Image

R2                              0.768                   0.741                   0.826

Adjusted R2                     0.761                   0.730                   0.814


Slide 66

Summary Table for Validation Analysis

The validation analysis produced results that were very comparable to the results for the full model analysis, with the exception of the inclusion of the variable Sales Force Image. I would conclude that this data analysis supports the existence of a strong relationship between the dependent variable Usage Level and the predictors Service and Price Flexibility. The variable Sales Force Image probably has some relationship to Usage Level in some cases, but it is not consistently evident.

I would report the statistical results for the model with the two independent variables.  To obtain the correct R2 and other statistics for this model, I would re-run the regression, specifying only Service (x5) and Price Flexibility (x3) as the independent variables.

Though this is a business problem, the results should make sense to us, since we often hear that consumption and vendor selection are a function of price and service. We tend to consume more of a product as the price declines, and we tend to make purchases from vendors that offer greater service.
