Regression Equation


  • 7/31/2019 Regression Equation

    1/56


The least-squares estimates of the slope $b$ and intercept $a$ of the fitted line $\hat{y} = a + bx$ are:

$$b = \frac{n\sum_{i=1}^{n} x_i y_i - \left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2} = \frac{\mathrm{Cov}(x,y)}{\mathrm{Var}(x)}, \qquad a = \bar{y} - b\bar{x}$$
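The formulas above can be checked with a short Python sketch (pure standard library; the data points are made-up illustration values, not from the text):

```python
# Least-squares estimates a and b from the formulas above.
# The data lists are made-up illustration values.
def least_squares(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # (n-1)*Cov(x, y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)                        # (n-1)*Var(x)
    b = s_xy / s_xx        # slope = Cov(x, y) / Var(x)
    a = y_bar - b * x_bar  # intercept
    return a, b

a, b = least_squares([1, 2, 3, 4], [2, 4, 6, 8])  # data lies exactly on y = 2x
```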

Now for any fixed value of $x$, each observation in our sample satisfies the relation

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

where $\varepsilon_i$ is a random error representing the vertical deviation of the point from the population regression line (the parametric regression equation). From the previous assumptions on $y_i$, $\varepsilon_i$ must necessarily be a value of a random variable having a mean of zero and variance $\sigma^2$. In terms of the sample regression line, we can also write:

$$y_i = \hat{y}_i + e_i$$

An essential part of regression analysis involves the construction of confidence intervals for $\alpha$ and $\beta$ and testing hypotheses concerning these regression coefficients. The hypotheses for testing the coefficients are $\alpha = 0$ and $\beta = 0$. However, the unknown variance $\sigma^2$ must be estimated from the data. An unbiased estimate of $\sigma^2$ with $n-2$ degrees of freedom, denoted by $s_e^2$, is given by the formula:

$$s_e^2 = \frac{SSE}{n-2} = \frac{\sum_{i=1}^{n} e_i^2}{n-2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2}$$

In the usual sample variance formula we take away one degree of freedom, which provides an unbiased estimate of the population variance, since only $\mu$ is replaced by the sample mean in our calculations. Here it is necessary to take away 2 degrees of freedom in the formula for $s_e^2$, because 2 degrees of freedom are lost by replacing $\alpha$ and $\beta$ with $a$ and $b$ in our calculation of the $\hat{y}_i$'s. The simpler formulas for the calculation of SSE are as follows:

$$SSE = (n-1)\left(s_y^2 - b^2 s_x^2\right)$$

where

$$s_x^2 = \frac{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}{n(n-1)} \quad\text{and}\quad s_y^2 = \frac{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}{n(n-1)}$$
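A minimal sketch of the SSE shortcut, assuming nothing beyond the formulas above (the data values are illustrative). It computes SSE via $S_{yy} - bS_{xy}$, which is algebraically the same as $(n-1)(s_y^2 - b^2 s_x^2)$ since $S_{xx} = (n-1)s_x^2$ and $b\,S_{xx} = S_{xy}$:

```python
# SSE via the shortcut formula, then the unbiased variance estimate
# s_e^2 = SSE / (n - 2). Data values are illustrative.
def sse_and_variance(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = s_xy / s_xx
    sse = s_yy - b * s_xy  # shortcut for the sum of squared residuals
    return sse, sse / (n - 2)

sse, s_e2 = sse_and_variance([1, 2, 3, 4], [2, 3, 5, 6])
```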


Test for Linearity of the Regression Equation

OR

Validity of the Regression Model

We define the regression to be linear when all the means of y corresponding to each x fall on a straight line. One would always prefer a linear regression model over a nonlinear one. We can test the linearity of the regression equation using an ANOVA test. If linearity is confirmed, we can say that the regression model is valid, and we then develop the model.

Calculation of ANOVA:

Values of x                 50        55        65        70      Total
Values of y               74.893    79.378    88.348    92.833
corresponding to          74.893    79.378    88.348    92.833
each x                              79.378    88.348    92.833
                                    79.378
Sum                      149.786   317.512   265.044   278.499   1010.841
Square of sum           22435.85  100813.9  70248.32  77561.69  1021800
Square of sum / n_i     11217.92  25203.47  23416.11  25853.90  85149.96

(In the Total column the divisor is N = 12, the total number of observations.)

Regression sum of squares = (11217.92 + 25203.47 + 23416.11 + 25853.90) − 85149.96 = 541.69
(small differences from the SPSS value are due to rounding in the tabulated figures)

Residual sum of squares = SSE = 186.557 and $s_e^2$ = 18.656
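The grouped calculation can be replayed in a few lines of Python (the group values are copied from the table above; the small gap to SPSS's 541.693 is rounding in the tabulated values):

```python
# Between-group (regression) sum of squares from grouped values:
# SS = sum(T_i^2 / n_i) - T^2 / N, with T the grand total and N the total count.
groups = {
    50: [74.893, 74.893],
    55: [79.378, 79.378, 79.378, 79.378],
    65: [88.348, 88.348, 88.348],
    70: [92.833, 92.833, 92.833],
}
N = sum(len(v) for v in groups.values())
grand_total = sum(sum(v) for v in groups.values())
between_ss = sum(sum(v) ** 2 / len(v) for v in groups.values()) - grand_total ** 2 / N
# between_ss comes out near 541.4 (SPSS reports 541.693 from the unrounded data)
```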

ANOVAb

Model          Sum of Squares   df   Mean Square      F      Sig.
1  Regression       541.693      1      541.693    29.036   .000a
   Residual         186.557     10       18.656
   Total            728.250     11

a. Predictors: (Constant), TestScore
b. Dependent Variable: CheScore

Here the significance value 0.000 < 0.05, which means the H0 of the ANOVA test is rejected.

Inference: H0 states that all the means are equal. Since it is rejected, the regression line is not horizontal; this implies a line with some slope, and the slope reflects the correlation between the predictor and the response variable.

Inferences Concerning the Regression Coefficients: Confidence interval for $\alpha$

A $(1-\alpha)100\%$ confidence interval for the parameter $\alpha$ in the regression line is

$$a - t_{\alpha/2}\, s_e \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n(n-1)s_x^2}} \;<\; \alpha \;<\; a + t_{\alpha/2}\, s_e \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n(n-1)s_x^2}}$$

where $t_{\alpha/2}$ has $n-2$ degrees of freedom.

Note that the symbol $\alpha$ is being used here in two totally unrelated ways: first as the level of significance and then as the intercept of the regression line.

Confidence interval for $\beta$

A $(1-\alpha)100\%$ confidence interval for the parameter $\beta$ in the regression line is

$$b - \frac{t_{\alpha/2}\, s_e}{s_x\sqrt{n-1}} \;<\; \beta \;<\; b + \frac{t_{\alpha/2}\, s_e}{s_x\sqrt{n-1}}$$

where $t_{\alpha/2}$ has $n-2$ degrees of freedom.

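The interval for the slope can be evaluated with the numbers from the worked example later in this document; $t_{0.025,10} \approx 2.228$ is read from a t-table (not computed here), and the small difference from SPSS's interval is rounding in $b$ and its standard error:

```python
# Hedged sketch: (1 - alpha)100% CI for beta is b ± t_{alpha/2, n-2} * SE(b),
# where SE(b) = s_e / sqrt(S_xx). Values are from the worked example below.
b = 0.897        # estimated slope
se_b = 0.167     # standard error of b
t_crit = 2.228   # t_{0.025, 10} from a t-table

lower = b - t_crit * se_b
upper = b + t_crit * se_b
# (lower, upper) lands close to SPSS's reported interval (.526, 1.268)
```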

Predictions

The equation $\hat{y} = a + bx$ may be used to predict the mean response $\mu_{Y|x_0}$ at $x = x_0$, where $x_0$ is not necessarily one of the pre-chosen values, or it may be used to predict a single value $y_0$ of the variable when $x = x_0$. We would expect the error of prediction to be higher when a single value is predicted than when a mean is predicted. This, then, affects the width of our confidence intervals for the values being predicted.

Predictions for $\mu_{Y|x_0}$

A $(1-\alpha)100\%$ confidence interval for the mean response $\mu_{Y|x_0}$ is given by:

$$\hat{y}_0 - t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{(n-1)s_x^2}} \;<\; \mu_{Y|x_0} \;<\; \hat{y}_0 + t_{\alpha/2}\, s_e \sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{(n-1)s_x^2}}$$


Predictions for $y_0$

A $(1-\alpha)100\%$ prediction interval for the single value $y_0$ when $x = x_0$ is given by:

$$\hat{y}_0 - t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{(n-1)s_x^2}} \;<\; y_0 \;<\; \hat{y}_0 + t_{\alpha/2}\, s_e \sqrt{1 + \frac{1}{n} + \frac{(x_0-\bar{x})^2}{(n-1)s_x^2}}$$
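Both interval formulas can be sketched together in Python. The inputs here ($\hat{y}_0$, $x_0$, $S_{xx}$) are illustrative assumptions, not values taken from the SPSS output; $t_{0.025,10} \approx 2.228$ is from a t-table:

```python
# Sketch of the two interval formulas above, with S_xx = (n-1)*s_x^2
# passed in directly. All inputs are illustrative assumptions.
import math

def intervals(y0_hat, s_e, n, x0, x_bar, s_xx, t_crit):
    half_mean = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    half_single = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return ((y0_hat - half_mean, y0_hat + half_mean),
            (y0_hat - half_single, y0_hat + half_single))

ci, pi = intervals(y0_hat=84.0, s_e=4.319, n=12, x0=60.0,
                   x_bar=60.4167, s_xx=672.9, t_crit=2.228)
```

The prediction interval `pi` is always wider than the mean-response interval `ci`, because of the extra 1 under the square root.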

What is the necessary condition for simple linear regression?

The necessary condition for simple linear regression is that the test must be run between two scale variables, and the variables must be correlated with each other.

How to run the test?

For illustration we take the example from the book (Walpole, page 347). In this example the two variables are IQ test score and chemistry test score. Both are scale measurements and, theoretically, they are correlated with each other.


    Interpretation of output:

Descriptive Statistics

                        Mean      Std. Deviation    N
Chemistry test score   84.2500       8.13662       12
IQ test score          60.4167       7.82140       12

These are the descriptives of the variables.

Correlations

                                          Chemistry test score   IQ test score
Pearson Correlation  Chemistry test score        1.000               .862
                     IQ test score                .862              1.000
Sig. (1-tailed)      Chemistry test score          .                 .000
                     IQ test score                .000                .
N                    Chemistry test score          12                 12
                     IQ test score                 12                 12

The independent and dependent variables are correlated with each other, therefore the test can be run. The significance value is 0.000, which means the test is significant: the hypothesis that there is no correlation between the two study variables is rejected.

Model Summaryb

Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .862a      .744           .718                  4.31923

a. Predictors: (Constant), IQ test score
b. Dependent Variable: Chemistry test score

(18.656)^0.5 = 4.319 — the standard error of the estimate is the square root of the residual mean square.

R-square = 0.744, which means 74.4% of the variation is explained by the predictors of the model.
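As a quick sanity check, the Model Summary figures follow directly from the ANOVA table (values copied from the tables in this document):

```python
# R-square is the regression SS over the total SS, and the standard error
# of the estimate is the square root of the residual mean square.
import math

ss_regression = 541.693
ss_total = 728.250
ms_residual = 18.656

r_square = ss_regression / ss_total   # ≈ 0.744
std_error = math.sqrt(ms_residual)    # ≈ 4.319
```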


ANOVAb

Model          Sum of Squares   df                       Mean Square      F      Sig.
1  Regression       541.693      1 (no. of predictors)      541.693    29.036   .000a
   Residual         186.557     10 (N − 2)                   18.656
   Total            728.250     11 (N − 1)

a. Predictors: (Constant), IQ test score
b. Dependent Variable: Chemistry test score

The F-statistic is 29.036, which is very high, and the p-value (the Sig. value) is 0.000, which is less than 0.05 (the level of significance). This implies that the ANOVA test is significant and the model is valid for the given predictors. (See page 365 for further study.)

Coefficientsa

                Unstandardized    Standardized                   95.0% Confidence    Collinearity
                Coefficients      Coefficient                    Interval for B      Statistics
Model            B      Std. Error   Beta      t     Sig.       Lower     Upper     Tolerance   VIF
1 (Constant)   30.043    10.137              2.964   .014       7.458    52.629
  IQ test score  .897      .167      .862    5.389   .000        .526     1.268      1.000     1.000

a. Dependent Variable: Chemistry test score

1. t-values are calculated by taking the ratio of B to its standard error (e.g. 30.043/10.137 = 2.964).

2. As the standard error increases, the t-value decreases; as the t-value decreases, the significance value (p-value) increases; and if the p-value becomes larger than the level of significance (usually 0.05), the predictor becomes insignificant, i.e. less important for the model.

3. Here the significance values (p-values) are 0.014 and 0.000; both are less than 0.05, which means that the constant term as well as the coefficient of x are significant for the model.

4. Standardized coefficient (Beta): it can be calculated by standardizing the values of all the predictors and then running the regression analysis; the coefficient so obtained is the standardized coefficient Beta.

5. If there is more than one predictor, the standardized coefficients Beta rank the importance of the predictors: the predictor with the larger value is more important than one with a smaller value.


6. The 95% confidence interval falls between 7.458 and 52.629. It means the researcher is 95% confident that the minimum value of the parameter estimated by 30.043 may be 7.458 and the maximum may be 52.629. (See pages 358 to 360 of Walpole for further study.)

7. As this is a simple linear regression model with only one predictor, the Tolerance and VIF columns cannot be explained well here. They will be discussed with the multiple regression model.
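Point 1 above can be verified directly (numbers copied from the Coefficients table; the slight gap to SPSS's 5.389 comes from rounding in B and the standard error):

```python
# Each t-value is the unstandardized coefficient divided by its standard error.
t_constant = 30.043 / 10.137   # ≈ 2.964
t_iq = 0.897 / 0.167           # ≈ 5.37 (SPSS shows 5.389 from unrounded values)
```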

The variable saved during the run of the test is RES_1, which shows the residual value; we can check it by taking the difference between the predicted value (PRE_1) and the actual value (chemistry score).

The variables 7, 8 and 9, 10 show the 95% confidence intervals for the predicted value of y at a specific value of x, on the basis of the sample mean and on an individual basis. See pages 361 to 363 of Walpole for further understanding. It is important to understand that the interval for an individual value is wider than that for the mean, because the mean-based interval depends only on the sample and its mean.

Final Regression Model

$$\hat{Y} = 30.043 + 0.897\,x$$

The predictor explains 74.4% (R-square = 0.744) of the variation in the model.


    Exponential regression equation

    Or

Log Transformation

If a set of data appears to be best represented by a nonlinear regression curve, we must then try to determine the form of the curve and estimate the parameters. A nonlinear regression curve means the mean values of y corresponding to each value of x do not fall on a straight line, i.e. the curve is nonlinear. In that situation we mostly apply an exponential curve of the form:

$$\mu_{y|x} = \gamma\,\delta^x$$

where $\gamma$ and $\delta$ are parameters to be estimated from the data. Denoting these estimates by $c$ and $d$, respectively, we can estimate $\mu_{y|x}$ by $\hat{y}$ from the sample regression curve:

$$\hat{y} = c\,d^x$$

Taking log base 10 on both sides:

$$\log \hat{y} = \log c + x\,(\log d)$$

And each pair of observations in the sample satisfies the relation

$$\log y_i = \log c + x_i(\log d) + e_i = a + b x_i + e_i$$

where $a = \log c$ and $b = \log d$. Therefore, it is possible to find $a$ and $b$ by the formulas discussed above and then find $c$ and $d$ by taking antilogs.

Note: the log transformation is usually a good transformation technique for addressing nonlinearity of $\mu_{y|x}$.
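A sketch of this antilog recovery, assuming made-up data that follows $y = 2 \cdot 3^x$ exactly:

```python
# Fit a straight line to (x, log10 y), then recover c and d by antilogs.
import math

x = [1.0, 2.0, 3.0, 4.0]
y = [6.0, 18.0, 54.0, 162.0]          # exactly y = 2 * 3^x
log_y = [math.log10(v) for v in y]

n = len(x)
x_bar = sum(x) / n
ly_bar = sum(log_y) / n
b = sum((xi - x_bar) * (li - ly_bar) for xi, li in zip(x, log_y)) / \
    sum((xi - x_bar) ** 2 for xi in x)   # slope = estimate of log10 d
a = ly_bar - b * x_bar                   # intercept = estimate of log10 c

c = 10 ** a   # multiplicative constant
d = 10 ** b   # growth factor
```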


Multiple Regression Equation

What is a multiple regression equation?

A multiple regression equation is a linear regression model with one dependent variable and multiple independent variables. It is a dependence technique.

$$\mu_{Y|x_1,x_2,\ldots,x_r} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_r x_r$$

Why multiple regression analysis?

Multiple regression analysis is a statistical technique that can be used to analyze the relationship between a single dependent (criterion) variable and several independent (predictor) variables. The objective of multiple regression analysis is to use the independent variables, whose values are known, to predict the single dependent value selected by the researcher.
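The document fits these models in SPSS; as a language-neutral illustration of the same least-squares idea, here is a minimal NumPy sketch with two made-up predictors (not from car_sales.sav):

```python
# Fit Y = b0 + b1*x1 + b2*x2 by least squares. The data is constructed
# so that Y = 1 + 2*x1 + 3*x2 exactly, making the expected fit obvious.
import numpy as np

x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
Y = 1 + 2 * x1 + 3 * x2

X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
# coef recovers the intercept and slopes [1, 2, 3]
```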

When multiple regression analysis?

It is carried out when all the variables are scale in measurement. Sometimes a variable with ordinal measurement may also be used in MLR, but without changing the measurement.

SLR can be effective with a sample size of 20, but multiple regression requires a minimum sample of 50, and preferably 100 observations, for most research situations. The minimum ratio of observations to variables is 5:1, but the preferred ratio is 15:1 or 20:1, and it should increase when stepwise estimation is used.

As the structures of y and of the error term are the same, we study the structure of the error term instead of y, because it is easier to study. For the study of MLR, the following assumptions about the error term should not be violated:

- The error term must hold normality.
- The variables are identical (identically distributed).
- All predictors are independent (lack of multicollinearity).
- The error term structure holds linearity.


    Important terms to understand the discussion


How to perform Multiple Regression Analysis (MLR) in SPSS?

There are two steps involved in the analysis:

1. Run the test in SPSS.

2. Understand and interpret the output of the test.

How to run the test in SPSS?

For running the test we consider the file car_sales.sav from SPSS's sample files.


To begin, remove all the variables from variable no. 15 to the end (these are transformed or standardized variables, which are not useful at the initial stage of MLR).

Correct the measurement of the variable "type" from ordinal to nominal.


Out of the 11 scale variables, sales in thousands is the dependent variable, while all the other scale variables (10 in number) are independent. In other words, there are 10 predictors which estimate car sales.

Here we are using the Enter method.


Click Continue and then OK.


    How to understand the output of the test?

    Descriptive Statistics

    Mean Std. Deviation N

    Sales in thousands 59.11232 75.058933 117

    4-year resale value 18.03154 11.605632 117

    Price in thousands 25.96949 14.149699 117

    Engine size 3.049 1.0552 117

    Horsepower 181.28 58.592 117

    Wheelbase 107.326 8.0506 117

    Width 71.190 3.5302 117

    Length 187.718 13.8499 117

    Curb weight 3.32405 .597177 117

    Fuel capacity 17.813 3.7946 117

    Fuel efficiency 24.12 4.404 117

These are the descriptives of all the variables.


Correlations

(variables in order: 1 Sales in thousands, 2 4-year resale value, 3 Price in thousands, 4 Engine size, 5 Horsepower, 6 Wheelbase, 7 Width, 8 Length, 9 Curb weight, 10 Fuel capacity, 11 Fuel efficiency)

Pearson Correlation
                       1      2      3      4      5      6      7      8      9     10     11
1  Sales            1.000  -.275  -.252   .038  -.153   .407   .178   .272   .067   .138  -.067
2  Resale value     -.275  1.000   .955   .527   .773  -.054   .178   .025   .363   .325  -.399
3  Price            -.252   .955  1.000   .649   .853   .067   .301   .183   .511   .406  -.480
4  Engine size       .038   .527   .649  1.000   .862   .410   .672   .537   .743   .617  -.725
5  Horsepower       -.153   .773   .853   .862  1.000   .226   .507   .401   .599   .480  -.596
6  Wheelbase         .407  -.054   .067   .410   .226  1.000   .676   .854   .676   .659  -.471
7  Width             .178   .178   .301   .672   .507   .676  1.000   .743   .736   .672  -.600
8  Length            .272   .025   .183   .537   .401   .854   .743  1.000   .684   .563  -.466
9  Curb weight       .067   .363   .511   .743   .599   .676   .736   .684  1.000   .848  -.819
10 Fuel capacity     .138   .325   .406   .617   .480   .659   .672   .563   .848  1.000  -.809
11 Fuel efficiency  -.067  -.399  -.480  -.725  -.596  -.471  -.600  -.466  -.819  -.809  1.000

Sig. (1-tailed)
                       1      2      3      4      5      6      7      8      9     10     11
1  Sales              .    .001   .003   .342   .050   .000   .028   .001   .236   .069   .237
2  Resale value      .001    .    .000   .000   .000   .283   .027   .393   .000   .000   .000
3  Price             .003   .000    .    .000   .000   .236   .000   .024   .000   .000   .000
4  Engine size       .342   .000   .000    .    .000   .000   .000   .000   .000   .000   .000
5  Horsepower        .050   .000   .000   .000    .    .007   .000   .000   .000   .000   .000
6  Wheelbase         .000   .283   .236   .000   .007    .    .000   .000   .000   .000   .000
7  Width             .028   .027   .000   .000   .000   .000    .    .000   .000   .000   .000
8  Length            .001   .393   .024   .000   .000   .000   .000    .    .000   .000   .000
9  Curb weight       .236   .000   .000   .000   .000   .000   .000   .000    .    .000   .000
10 Fuel capacity     .069   .000   .000   .000   .000   .000   .000   .000   .000    .    .000
11 Fuel efficiency   .237   .000   .000   .000   .000   .000   .000   .000   .000   .000    .

(N = 117 for every cell.)


The first column shows the correlation between the dependent variable, sales in thousands, and the other, independent variables. A high correlation in this column, together with a low significance (p-value), shows that the variable is important for the model; otherwise the variable is insignificant and less important. On the other hand, high correlations in the other columns show that the independent variables are highly correlated with each other. This shows that multicollinearity exists between the independent variables. Multicollinearity violates the assumption of independence among predictors. One can observe that when we remove a variable which exhibits multicollinearity, the R-square of the model increases, which is a good sign for the model. One solution to this problem is to build factors or components by factor analysis before the regression analysis.
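The Tolerance/VIF diagnostic that SPSS reports for this problem can be sketched as follows: regress one predictor on the others, then Tolerance = 1 − R²ⱼ and VIF = 1/Tolerance. The `vif` helper and the synthetic data below are illustrative assumptions, not SPSS functionality:

```python
# VIF sketch: a predictor nearly duplicated by another gets a large VIF,
# while an unrelated predictor gets a VIF near 1. Data is synthetic.
import numpy as np

def vif(target, others):
    X = np.column_stack([np.ones(len(target))] + others)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    fitted = X @ coef
    ss_res = np.sum((target - fitted) ** 2)
    ss_tot = np.sum((target - target.mean()) ** 2)
    r2 = 1 - ss_res / ss_tot
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
a = rng.normal(size=50)
b = a + 0.1 * rng.normal(size=50)  # b is almost a copy of a -> collinear
c = rng.normal(size=50)            # c is unrelated to a and b

high = vif(a, [b])     # large: a is well explained by b
low = vif(c, [a, b])   # close to 1: c is independent of a and b
```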

Model Summaryb

Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .565a      .319           .255                 64.798014

a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in thousands
b. Dependent Variable: Sales in thousands

R-square (the coefficient of determination) equals 0.319, or 31.9%. It means that the 10 predictors together explain 31.9% of the variation in the dependent variable.

    ANOVAb

    Model Sum of Squares df Mean Square F Sig.

    1 Regression 208454.878 10 20845.488 4.965 .000a

    Residual 445070.963 106 4198.783

    Total 653525.841 116

    a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length, Width, Engine size, Fuel

    capacity, Wheelbase, Curb weight, Horsepower, Price in thousands

    b. Dependent Variable: Sales in thousands

The ANOVA test is significant, which shows that the model is valid.


Coefficientsa

Model                    B        Std. Error   Beta      t      Sig.   Lower Bound  Upper Bound  Tolerance    VIF
1 (Constant)          -225.116    191.901              -1.173   .243    -605.579      155.347
  4-year resale value     .011      2.253      .002      .005   .996      -4.456        4.478      .053     18.890
  Price in thousands     -.015      2.146     -.003     -.007   .994      -4.269        4.239      .039     25.464
  Engine size           37.640     15.588      .529     2.415   .017       6.735       68.545      .134      7.474
  Horsepower             -.612       .342     -.477    -1.788   .077      -1.290         .067      .090     11.095
  Wheelbase              6.391      1.787      .685     3.576   .001       2.848        9.934      .175      5.718
  Width                  -.375      3.120     -.018     -.120   .905      -6.561        5.811      .298      3.352
  Length                 -.437      1.077     -.081     -.406   .686      -2.573        1.698      .163      6.149
  Curb weight          -69.476     29.602     -.553    -2.347   .021    -128.165      -10.787      .116      8.633
  Fuel capacity          -.157      3.697     -.008     -.043   .966      -7.487        7.172      .184      5.437
  Fuel efficiency       -2.608      2.931     -.153     -.890   .376      -8.418        3.203      .217      4.602

a. Dependent Variable: Sales in thousands
(Columns: unstandardized coefficients B and Std. Error; standardized coefficient Beta; t; Sig.; 95.0% confidence interval for B; collinearity statistics Tolerance and VIF.)

All predictors which are insignificant should be removed from the model one by one, starting from the variable with the highest significance value.

NOTE:

One is advised to study the residual structure and apply an appropriate transformation before removing variables. The reason for this practice is to find the truly significant variables; otherwise it is quite possible that during the enter-and-remove process one removes a significant variable.

Study of the Residual Structure:

Usually we analyze the residual structure by drawing a scatter plot of the unstandardized predicted values against the standardized residuals (these variables are generated when the researcher runs the test with the Save option selected). If the predicted values are very large, the residuals will also be very large, and as a result the analysis of the residual structure becomes difficult. Therefore we use standardized or studentized residuals: the standardized residual is taken on the y-axis (dependent) and the unstandardized predicted value on the x-axis (independent).


Click OK.

The graph shows that the variation between the two variables is initially small but increases later on. Compare it with the following set of graphs.


Here the graph that we obtained resembles graph C, as shown below (Graph-1).


It means that the graph shows heteroscedasticity. Heteroscedasticity means that the variance pattern of the y values across the different predictor values is not the same, so a transformation is required to remedy the assumption violation.

Method of Transformations:

When is data transformation required?

Data transformation is required when the study of the residual structure tells us that the assumptions of the model are being violated. The assumptions are listed below:

- Linearity of the phenomenon measured (checked through a scatter plot of the error term).
- Constant variance of the error terms (homoscedasticity).
- Independence of the error terms; all predictors are independent (no multicollinearity).
- Normality of the error term distribution.

Why is data transformation required?

Data transformation gives us the following benefits:

1. To correct the violations of the statistical assumptions mentioned above for the multivariate technique.
2. To improve the relationship between the study variable and the predictors.
3. To show the correct picture of the significant variables. Sometimes we may find variables which appear insignificant, but after transformation it turns out that the variable was actually significant.

How to do data transformation? Criterion for the transformation:

Figure   Violation of assumption        Situation of violation            Remedy
a        Null plot                      All the assumptions of the        No remedy is required.
                                        model are met.
b        Non-normality                  Flat pattern                      Inverse transformation (1/y)
                                        Negatively skewed                 Square or cube (y^2 or y^3)
                                        Positively skewed                 Square root or log transformation
                                                                          (sqrt(y) or ln y)
c        Heteroscedasticity             Cone-shaped distribution          Inverse transformation (1/y)
                                        opening rightward
                                        Opening leftward                  Square root transformation (sqrt(y))
d, h     Heteroscedasticity with        A diamond-shaped pattern: high    Usually a log transformation is
         nonlinearity                   variance in the middle and less   advisable. It often happens that when
                                        variance at the ends, which       we address one of the violations, the
                                        means nonlinearity also exists.   other adjusts simultaneously.
e        Time-based dependence
f        Event-based dependence
g        Normal

Numerous procedures are available for achieving linearity between two variables, but most simple nonlinear relationships can be placed in one of the four categories in the figure below. If the relationship looks like figure (a), then either variable can be squared to achieve linearity. When multiple transformation possibilities are shown, start with the top method in each figure and move downward until linearity is achieved.
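As a tiny illustration of the log remedy for a positively skewed variable (the values below are made up, not taken from car_sales.sav):

```python
# Log10 transformation of a right-skewed variable: the long right tail
# is pulled in, so the transformed values span a much narrower range.
import math

sales = [9.3, 14.1, 39.4, 8.6, 145.5, 540.6, 20.5]  # made-up skewed values
transformed = [math.log10(v) for v in sales]
```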


How to perform the transformation:

The stepwise transformation is shown below.


The dependent variable sales has been transformed into a variable named TransformedSales.

Now see the effect of the transformation shown below. Before examining it, run the regression test again taking transformedSales as the dependent variable; then draw the scatter plot of the standardized residuals against the unstandardized predicted values.


Result of transformation

Before transformation:

Model Summaryb

Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .565a      .319           .255                 64.798014

a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in thousands
b. Dependent Variable: Sales in thousands

After transformation:

Model Summaryb

Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .634a      .402           .345                  1.08253

a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in thousands
b. Dependent Variable: TransformedSales


Conclusion:

1. R-square improved from 31.9% to 40.2%.

2. The scatter plot also shows that it now looks like the null plot.

Enter and Remove the Predictors (Before transformation)

Here we remove the insignificant variables from the model one by one; we perform this exercise without the transformation. The removal of variables takes place in descending order of significance values: the variable with the highest sig. value is removed first, and the procedure continues one variable at a time.

Note: It is quite possible that the researcher regards a particular variable as an important factor even though the statistics do not support this idea. That does not matter; if the researcher wants that variable to be part of the model, it can be included.
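The enter-and-remove procedure described above can be sketched as a loop. The p-values dict below is a stand-in copied from the Coefficients table; a real implementation would refit the model after each removal, which changes all remaining p-values:

```python
# Backward elimination sketch: repeatedly drop the predictor with the
# largest p-value until every remaining p-value is at or below alpha.
def backward_eliminate(p_values, alpha=0.05):
    remaining = dict(p_values)
    removed = []
    while remaining:
        worst = max(remaining, key=remaining.get)
        if remaining[worst] <= alpha:
            break  # everything left is significant
        removed.append(worst)
        del remaining[worst]
        # NOTE: in the real procedure the model is refitted here and all
        # p-values change; this sketch keeps them fixed for clarity.
    return removed, list(remaining)

removed, kept = backward_eliminate({"Width": .927, "Length": .863,
                                    "Fuel capacity": .866, "Engine size": .003})
```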

    0. Without removing any variable (Consider all predictors)

Model Summary

Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .565a      .319           .255                 64.798014

a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length, Width, Engine size, Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in thousands


1. Remove variable 4-year resale value

Coefficientsa

Model                    B        Std. Error   Beta      t      Sig.   Lower Bound  Upper Bound
1 (Constant)          -214.894    152.450              -1.410   .161    -516.258       86.471
  Price in thousands     -.441       .731     -.092     -.604   .547      -1.886        1.004
  Engine size           34.469     11.271      .525     3.058   .003      12.189       56.750
  Horsepower             -.563       .250     -.464    -2.254   .026      -1.056        -.069
  Wheelbase              4.729      1.317      .529     3.590   .000       2.125        7.333
  Width                  -.222      2.428     -.011     -.092   .927      -5.022        4.577
  Length                 -.129       .748     -.025     -.173   .863      -1.608        1.349
  Curb weight          -50.079     19.946     -.462    -2.511   .013     -89.508      -10.649
  Fuel capacity           .460      2.719      .026      .169   .866      -4.916        5.835
  Fuel efficiency       -1.172      2.266     -.073     -.517   .606      -5.653        3.308

a. Dependent Variable: Sales in thousands

After the removal of 4-year resale value, the significance values of the variables change: a few increase and a few decrease. The variable horsepower, which was initially insignificant, has now become significant.

    2. Remove variable width

Model Summaryb

Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .560a      .314           .275                 58.690836

a. Predictors: (Constant), Fuel efficiency, Length, Price in thousands, Fuel capacity, Engine size, Wheelbase, Curb weight, Horsepower
b. Dependent Variable: Sales in thousands

    This time there is no change in the value of R-square.

Coefficientsa

Model                    B        Std. Error   Beta      t      Sig.   Lower Bound  Upper Bound
1 (Constant)          -224.403    111.287              -2.016   .046    -444.383       -4.423
  Price in thousands     -.433       .722     -.090     -.599   .550      -1.860         .995
  Engine size           34.277     11.036      .522     3.106   .002      12.463       56.091
  Horsepower             -.565       .248     -.466    -2.283   .024      -1.054        -.076
  Wheelbase              4.712      1.299      .527     3.628   .000       2.144        7.279
  Length                 -.141       .734     -.028     -.192   .848      -1.592        1.310
  Curb weight          -50.374     19.616     -.465    -2.568   .011     -89.148      -11.600
  Fuel capacity           .447      2.706      .026      .165   .869      -4.902        5.796
  Fuel efficiency       -1.181      2.257     -.074     -.523   .602      -5.642        3.280

a. Dependent Variable: Sales in thousands

After this run only four insignificant variables remain, while the constant itself has become significant.

    3. Remove variable Fuel Capacity

Model Summaryb

Model    R      R Square   Adjusted R Square   Std. Error of the Estimate
1      .560a      .313           .280                 58.492268

a. Predictors: (Constant), Fuel efficiency, Length, Price in thousands, Engine size, Wheelbase, Curb weight, Horsepower
b. Dependent Variable: Sales in thousands

This time R-square dropped only slightly, by 0.001.

Coefficientsa

Model                    B        Std. Error   Beta      t      Sig.   Lower Bound  Upper Bound
1 (Constant)          -221.857    109.840              -2.020   .045    -438.963       -4.750
  Price in thousands     -.423       .718     -.089     -.590   .556      -1.842         .995
  Engine size           34.347     10.990      .523     3.125   .002      12.624       56.070
  Horsepower             -.569       .245     -.469    -2.320   .022      -1.054        -.084
  Wheelbase              4.767      1.250      .534     3.815   .000       2.297        7.238
  Length                 -.151       .729     -.030     -.207   .836      -1.592        1.290
  Curb weight          -48.971     17.621     -.452    -2.779   .006     -83.801      -14.142
  Fuel efficiency       -1.310      2.111     -.082     -.620   .536      -5.482        2.863

a. Dependent Variable: Sales in thousands

    4. Remove variable Length

    Model Summaryb

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .560a   .313       .285                58.298917

    a. Predictors: (Constant), Fuel efficiency, Price in thousands,

    Wheelbase, Engine size, Curb weight, Horsepower

    b. Dependent Variable: Sales in thousands

    No change in the value of R-square

    Coefficientsa

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound

    1 (Constant) -223.549 109.174 -2.048 .042 -439.327 -7.771

Price in thousands   -.384   .690   -.080   -.556   .579   -1.747   .979

    Engine size 34.172 10.922 .520 3.129 .002 12.586 55.759

    Horsepower -.582 .237 -.480 -2.455 .015 -1.050 -.113

    Wheelbase 4.585 .884 .513 5.189 .000 2.839 6.331

    Curb weight -49.726 17.184 -.459 -2.894 .004 -83.689 -15.762

    Fuel efficiency -1.421 2.035 -.089 -.698 .486 -5.443 2.602

    a. Dependent Variable: Sales in thousands

5. Remove variable Price in thousands

    Model Summaryb


Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .560a   .313       .290                57.965284

    a. Predictors: (Constant), Fuel efficiency, Wheelbase, Horsepower,

    Curb weight, Engine size

    b. Dependent Variable: Sales in thousands

    No change in the value of R-square

    Coefficientsa

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound

    1 (Constant) -224.038 108.178 -2.071 .040 -437.823 -10.253

    Engine size 36.113 10.310 .549 3.503 .001 15.737 56.489

    Horsepower -.682 .156 -.563 -4.382 .000 -.990 -.374

    Wheelbase 4.746 .830 .531 5.715 .000 3.105 6.386

    Curb weight -53.227 15.946 -.491 -3.338 .001 -84.739 -21.715

    Fuel efficiency -1.541 2.006 -.096 -.768 .444 -5.506 2.424

    a. Dependent Variable: Sales in thousands

    6. Remove variable Fuel efficiency

    Model Summaryb

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .556a   .309       .291                57.595036

    a. Predictors: (Constant), Curb weight, Horsepower, Wheelbase,

    Engine size

    b. Dependent Variable: Sales in thousands

This time R-square drops by 0.004, but the predictors still explain 30.9% of the variation in the dependent variable.

    Coefficientsa

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound


    1 (Constant) -288.879 73.316 -3.940 .000 -433.745 -144.012

    Engine size 36.856 9.985 .561 3.691 .000 17.127 56.585

    Horsepower -.669 .154 -.552 -4.354 .000 -.973 -.365

    Wheelbase 4.747 .820 .531 5.787 .000 3.126 6.367

    Curb weight -46.288 13.208 -.427 -3.505 .001 -72.386 -20.190

    a. Dependent Variable: Sales in thousands

    Now only significant variables are left.
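The B, Std. Error and t columns in all of the tables above come from ordinary least squares. A minimal sketch of that computation on synthetic data (the car-sales dataset itself is not reproduced here, so every variable name and value below is made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 150
X = np.column_stack([
    np.ones(n),               # constant term
    rng.normal(3.0, 1.0, n),  # stand-in for "engine size" (hypothetical)
    rng.normal(185, 50, n),   # stand-in for "horsepower" (hypothetical)
])
y = 5 + 30 * X[:, 1] - 0.5 * X[:, 2] + rng.normal(0, 40, n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)          # unstandardized B
resid = y - X @ beta
s2 = resid @ resid / (n - X.shape[1])                 # residual variance, n - k df
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))    # Std. Error column
t = beta / se                                         # t column; |t| near 2 marks Sig. near .05
for name, b, s, tv in zip(["(Constant)", "engine", "horsepower"], beta, se, t):
    print(f"{name:12s} B = {b:9.3f}  SE = {s:7.3f}  t = {tv:7.2f}")
```

Dropping a variable, refitting, and re-reading the t column is exactly the manual enter-and-remove loop performed above.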

    Enter and Remove the Predictors (After Transformation)

    0. First output with considering all the variables.

    Model Summaryb

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .634a   .402       .345                1.08253

    a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,

    Width, Engine size, Fuel capacity, Wheelbase, Curb weight,

    Horsepower, Price in thousands

    b. Dependent Variable: LogSale

    Coefficientsa

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

    1 (Constant) -1.097 3.206 -.342 .733 -7.453 5.259

4-year resale value   -.012   .038   -.101   -.311   .757   -.086   .063   .053   18.890

Price in thousands   -.036   .036   -.385   -1.015   .312   -.107   .035   .039   25.464

    Engine size .310 .260 .244 1.190 .237 -.206 .826 .134 7.474

    Horsepower -.003 .006 -.118 -.470 .639 -.014 .009 .090 11.095

    Wheelbase .093 .030 .559 3.111 .002 .034 .152 .175 5.718

    Width -.026 .052 -.068 -.492 .624 -.129 .078 .298 3.352


    Length -.018 .018 -.188 -1.008 .316 -.054 .018 .163 6.149

    Curb weight .262 .495 .117 .530 .597 -.718 1.242 .116 8.633

    Fuel capacity -.059 .062 -.166 -.949 .345 -.181 .064 .184 5.437

    Fuel efficiency .026 .049 .087 .538 .592 -.071 .123 .217 4.602

    a. Dependent Variable: LogSale

    1. Remove variable 4-year resale value

The variable 4-year resale value has the highest sig. value (0.757), so it is removed first.

    Model Summaryb

Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
1       .678a   .459       .425                1.01072

    a. Predictors: (Constant), Fuel efficiency, Length, Price in thousands,

    Width, Fuel capacity, Engine size, Wheelbase, Curb weight,

    Horsepower

    b. Dependent Variable: LogSale

    Coefficientsa

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

    1 (Constant) -.414 2.616 -.158 .874 -5.586 4.758

Price in thousands   -.057   .013   -.616   -4.542   .000   -.082   -.032   .207   4.835

    Engine size .408 .193 .321 2.110 .037 .026 .791 .164 6.095

    Horsepower -.004 .004 -.151 -.828 .409 -.012 .005 .114 8.760

    Wheelbase .061 .023 .355 2.711 .008 .017 .106 .222 4.499

    Width -.049 .042 -.127 -1.175 .242 -.131 .033 .325 3.080

    Length -.003 .013 -.028 -.217 .828 -.028 .023 .226 4.418

    Curb weight .424 .342 .202 1.238 .218 -.253 1.100 .142 7.019

    Fuel capacity -.026 .047 -.077 -.560 .576 -.118 .066 .200 4.991

    Fuel efficiency .047 .039 .153 1.218 .225 -.030 .124 .241 4.144

    2. Remove the constant term (its sig. value, .874, is now the highest; the model is refit as a regression through the origin)

    Coefficientsa,b

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

1   Price in thousands   -.057   .012   -.498   -4.603   .000   -.082   -.033   .045   22.048

    Engine size .413 .190 .376 2.174 .031 .038 .789 .018 56.259

    Horsepower -.004 .004 -.194 -.836 .405 -.012 .005 .010 101.939

    Wheelbase .061 .022 1.848 2.720 .007 .017 .105 .001 870.319

    Width -.053 .030 -1.072 -1.757 .081 -.114 .007 .001 702.034

    Length -.003 .013 -.141 -.209 .834 -.028 .023 .001 859.164

    Curb weight .424 .341 .410 1.241 .216 -.251 1.098 .005 205.739

    Fuel capacity -.027 .046 -.138 -.574 .567 -.118 .065 .009 109.013

    Fuel efficiency .044 .034 .304 1.299 .196 -.023 .112 .010 102.955

    a. Dependent Variable: LogSale

    b. Linear Regression through the Origin

    3. Remove variable Length

Model Summaryc,d

Model   R       R Squareb   Adjusted R Square   Std. Error of the Estimate
1       .961a   .924        .920                1.00392

    a. Predictors: Fuel efficiency, Price in thousands, Engine size, Fuel

    capacity, Horsepower, Curb weight, Wheelbase, Width

    b. For regression through the origin (the no-intercept model), R Square

    measures the proportion of the variability in the dependent variable about

    the origin explained by regression. This CANNOT be compared to R

    Square for models which include an intercept.

    c. Dependent Variable: LogSale

    d. Linear Regression through the Origin
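Footnote (b) can be made concrete with a small numeric sketch: in a no-intercept fit, the total sum of squares is taken about the origin (sum of y squared) rather than about the mean, so the resulting R Square is systematically larger and not comparable to the ordinary one. Toy data, not the car dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 50)
y = 2 * x + rng.normal(0, 1, 50)

b = (x @ y) / (x @ x)        # least-squares slope of the through-origin model
resid = y - b * x
sse = resid @ resid
r2_origin = 1 - sse / (y @ y)                      # R Square about the origin
r2_mean = 1 - sse / np.sum((y - y.mean()) ** 2)    # ordinary R Square definition
print(f"about the origin: {r2_origin:.3f}, about the mean: {r2_mean:.3f}")
```

This is why the jump from R Square near .31 (with intercept) to .92 (through the origin) in these tables does not mean the model suddenly fits far better.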

    Coefficientsa,b


                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

1   Price in thousands   -.057   .012   -.492   -4.703   .000   -.080   -.033   .048   20.811

    Engine size .411 .189 .374 2.173 .031 .037 .785 .018 56.089

    Horsepower -.004 .004 -.205 -.904 .367 -.012 .004 .010 97.349

    Wheelbase .058 .017 1.755 3.432 .001 .025 .091 .002 495.998

    Width -.055 .030 -1.096 -1.833 .069 -.114 .004 .001 678.380

    Curb weight .410 .334 .397 1.228 .221 -.250 1.070 .005 198.513

    Fuel capacity -.026 .046 -.133 -.558 .578 -.117 .065 .009 108.047

    Fuel efficiency .043 .033 .294 1.288 .200 -.023 .109 .010 99.044

    a. Dependent Variable: LogSale

    b. Linear Regression through the Origin

    4. Remove variable Fuel Capacity

Model Summaryc,d

Model   R       R Squareb   Adjusted R Square   Std. Error of the Estimate
1       .961a   .924        .920                1.00154

    a. Predictors: Fuel efficiency, Price in thousands, Engine size, Curb

    weight, Horsepower, Wheelbase, Width

    b. For regression through the origin (the no-intercept model), R Square

    measures the proportion of the variability in the dependent variable about

    the origin explained by regression. This CANNOT be compared to R

    Square for models which include an intercept.

    c. Dependent Variable: LogSale

    d. Linear Regression through the Origin

    Coefficientsa,b

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

1   Price in thousands   -.057   .012   -.499   -4.814   .000   -.081   -.034   .049   20.521

    Engine size .410 .189 .372 2.172 .031 .037 .783 .018 56.081

    Horsepower -.003 .004 -.188 -.841 .402 -.012 .005 .010 95.713

    Wheelbase .055 .016 1.680 3.413 .001 .023 .087 .002 461.716

    Width -.057 .030 -1.135 -1.916 .057 -.115 .002 .001 669.012

    Curb weight .334 .304 .323 1.098 .274 -.267 .934 .006 164.991

    Fuel efficiency .050 .031 .343 1.632 .105 -.011 .111 .012 84.344

    a. Dependent Variable: LogSale

    b. Linear Regression through the Origin

5. Remove variable Horsepower

Model Summaryc,d

Model   R       R Squareb   Adjusted R Square   Std. Error of the Estimate
1       .961a   .924        .920                1.00053

a. Predictors: Fuel efficiency, Price in thousands, Engine size, Curb weight, Wheelbase, Width

    b. For regression through the origin (the no-intercept model), R Square

    measures the proportion of the variability in the dependent variable about

    the origin explained by regression. This CANNOT be compared to R

    Square for models which include an intercept.

    c. Dependent Variable: LogSale

    d. Linear Regression through the Origin

Coefficientsa,b

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF


1   Price in thousands   -.065   .008   -.566   -8.443   .000   -.080   -.050   .117   8.584

    Engine size .308 .145 .280 2.128 .035 .022 .595 .030 33.125

    Wheelbase .055 .016 1.662 3.383 .001 .023 .087 .002 460.831

    Width -.062 .029 -1.235 -2.132 .035 -.119 -.004 .002 641.711

    Curb weight .403 .292 .390 1.381 .169 -.174 .980 .007 152.731

    Fuel efficiency .053 .031 .362 1.732 .085 -.007 .113 .012 83.408

    a. Dependent Variable: LogSale

    b. Linear Regression through the Origin

    6. Remove variable Curb weight

Model Summaryc,d

Model   R       R Squareb   Adjusted R Square   Std. Error of the Estimate
1       .961a   .923        .920                1.00022

    a. Predictors: Fuel efficiency, Price in thousands, Engine size,

    Wheelbase, Width

    b. For regression through the origin (the no-intercept model), R Square

    measures the proportion of the variability in the dependent variable about

    the origin explained by regression. This CANNOT be compared to R

    Square for models which include an intercept.

    c. Dependent Variable: LogSale

    d. Linear Regression through the Origin

    Coefficientsa,b

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

1   Price in thousands   -.062   .007   -.542   -8.407   .000   -.077   -.047   .125   7.975

    Engine size .332 .143 .303 2.317 .022 .049 .615 .030 32.794

    Wheelbase .063 .015 1.930 4.265 .000 .034 .093 .003 393.006

    Width -.049 .027 -.989 -1.793 .075 -.103 .005 .002 583.801

    Fuel efficiency .027 .024 .186 1.125 .262 -.021 .075 .019 52.739


    a. Dependent Variable: LogSale

    b. Linear Regression through the Origin

7. Remove variable Fuel efficiency

    Model Summaryc,d

    Model   R       R Squareb   Adjusted R Square   Std. Error of the Estimate
    1       .961a   .923        .921                .99628

    a. Predictors: Width, Price in thousands, Engine size, Wheelbase

    8. Remove variable Width

    Model Summaryc,d

    Model   R       R Squareb   Adjusted R Square   Std. Error of the Estimate
    1       .960a   .922        .921                .99940

    a. Predictors: Wheelbase, Price in thousands, Engine size

    b. For regression through the origin (the no-intercept model), R Square

    measures the proportion of the variability in the dependent variable about

    the origin explained by regression. This CANNOT be compared to R

    Square for models which include an intercept.

    c. Dependent Variable: LogSale

    d. Linear Regression through the Origin

    Coefficientsa,b

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

1   Price in thousands   -.062   .007   -.541   -8.480   .000   -.077   -.048   .125   8.005

    Engine size .223 .106 .203 2.094 .038 .013 .432 .054 18.427

    Wheelbase .061 .015 1.849 4.161 .000 .032 .090 .003 388.695

    Width -.031 .022 -.630 -1.398 .164 -.076 .013 .003 399.409

    a. Dependent Variable: LogSale

    b. Linear Regression through the Origin



    Coefficientsa,b

                    Unstandardized Coefficients    Standardized Coefficients                  95.0% Confidence Interval for B    Collinearity Statistics
Model               B         Std. Error           Beta                        t      Sig.    Lower Bound    Upper Bound         Tolerance   VIF

1   Price in thousands   -.064   .007   -.559   -8.913   .000   -.078   -.050   .130   7.684

    Engine size .221 .107 .202 2.078 .039 .011 .432 .054 18.426

    Wheelbase .041 .003 1.237 16.255 .000 .036 .046 .088 11.328

    a. Dependent Variable: LogSale

    b. Linear Regression through the Origin

    Conclusion

    After all these exercise we conclude that there are three predictors; (1) price in thousand, (2)

    engine size and (3) wheel base are the significant predictors but the VIF of engine size is veryhigh that is 18.426 so it shows the effect of multicollinearity exist here although it is significant

    coefficient. So it depends upon the researcher and theory, whether to include or not to include

    this variable in the regression model. On the other hand, when the test was run before the

    transformation we find the other four variables with constant term was (1) engine size (2) horse

    power (3) wheel base and (4) curb weight. After transformation we get theoretically more logical

    predictors as compare to the result of before transformation while the value of R-Square is also

    much better than before.

    Finally the regression equation will be:

    Logsale = - 0.064 (price in thousands) + 0.041(Wheelbase)
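As a usage sketch, the fitted equation can be applied directly. The price and wheelbase values below are hypothetical, and the source does not state which logarithm base was used for LogSale; base 10 is assumed here for the back-transformation:

```python
def predict_logsale(price_thousands, wheelbase):
    """Final fitted model (no intercept): LogSale = -0.064*price + 0.041*wheelbase."""
    return -0.064 * price_thousands + 0.041 * wheelbase

log_sale = predict_logsale(42.0, 107.0)   # hypothetical price and wheelbase
sales_thousands = 10 ** log_sale          # back-transform, assuming log base 10
print(round(log_sale, 3), round(sales_thousands, 1))
```

Note that because the model has no intercept, the prediction is forced through the origin in the log scale.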

    Enter & Remove by Backward Method

The same test, with approximately the same result, can be carried out using the backward method. We use it to validate the before- and after-transformation results.


    How to perform Backward Method? (Without transformation)


Variables Entered/Removedb

    Model   Variables Entered                                        Variables Removed     Method
    1       Fuel efficiency, 4-year resale value, Length, Width,     .                     Enter
            Engine size, Fuel capacity, Wheelbase, Curb weight,
            Horsepower, Price in thousandsa
    2       .                                                        4-year resale value   Backward (criterion: Probability of F-to-remove >= .100).
    3       .                                                        Price in thousands    Backward (criterion: Probability of F-to-remove >= .100).
    4       .                                                        Fuel capacity         Backward (criterion: Probability of F-to-remove >= .100).
    5       .                                                        Width                 Backward (criterion: Probability of F-to-remove >= .100).
    6       .                                                        Length                Backward (criterion: Probability of F-to-remove >= .100).
    7       .                                                        Fuel efficiency       Backward (criterion: Probability of F-to-remove >= .100).

    a. All requested variables entered.

    b. Dependent Variable: Sales in thousands

Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .565a   .319       .255                64.798014
    2       .565b   .319       .262                64.494517
    3       .565c   .319       .269                64.195246
    4       .565d   .319       .275                63.900650
    5       .565e   .319       .282                63.614616
    6       .563f   .317       .287                63.398826
    7       .556g   .309       .285                63.486236

    a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,

    Width, Engine size, Fuel capacity, Wheelbase, Curb weight,

    Horsepower, Price in thousands

    b. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,

    Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in

    thousands

    c. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,

    Fuel capacity, Wheelbase, Curb weight, Horsepower

    d. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,

    Wheelbase, Curb weight, Horsepower

    e. Predictors: (Constant), Fuel efficiency, Length, Engine size,

    Wheelbase, Curb weight, Horsepower


    f. Predictors: (Constant), Fuel efficiency, Engine size, Wheelbase, Curb

    weight, Horsepower

    g. Predictors: (Constant), Engine size, Wheelbase, Curb weight,

    Horsepower

    Coefficientsa

                    Unstandardized Coefficients    Standardized Coefficients
Model               B         Std. Error           Beta                        t      Sig.

    1 (Constant) -225.116 191.901 -1.173 .243

    4-year resale value .011 2.253 .002 .005 .996

    Price in thousands -.015 2.146 -.003 -.007 .994

    Engine size 37.640 15.588 .529 2.415 .017

    Horsepower -.612 .342 -.477 -1.788 .077

    Wheelbase 6.391 1.787 .685 3.576 .001

    Width -.375 3.120 -.018 -.120 .905

    Length -.437 1.077 -.081 -.406 .686

    Curb weight -69.476 29.602 -.553 -2.347 .021

    Fuel capacity -.157 3.697 -.008 -.043 .966

    Fuel efficiency -2.608 2.931 -.153 -.890 .376

    2 (Constant) -224.919 186.969 -1.203 .232

    Price in thousands -.006 .991 -.001 -.006 .996

    Engine size 37.627 15.300 .529 2.459 .016

    Horsepower -.611 .339 -.477 -1.801 .074

    Wheelbase 6.392 1.769 .686 3.613 .000

    Width -.374 3.099 -.018 -.121 .904

    Length -.438 1.046 -.081 -.419 .676

    Curb weight -69.529 27.528 -.553 -2.526 .013

    Fuel capacity -.154 3.619 -.008 -.043 .966

    Fuel efficiency -2.610 2.878 -.153 -.907 .367

    3 (Constant) -225.048 184.706 -1.218 .226


    Engine size 37.654 14.496 .529 2.597 .011

    Horsepower -.613 .213 -.478 -2.881 .005

    Wheelbase 6.392 1.758 .686 3.636 .000

    Width -.371 3.057 -.017 -.121 .904

    Length -.437 1.019 -.081 -.429 .669

    Curb weight -69.586 25.500 -.554 -2.729 .007

    Fuel capacity -.156 3.590 -.008 -.043 .966

    Fuel efficiency -2.613 2.827 -.153 -.924 .357

    4 (Constant) -225.414 183.665 -1.227 .222

    Engine size 37.728 14.328 .530 2.633 .010

    Horsepower -.614 .211 -.479 -2.906 .004

    Wheelbase 6.364 1.619 .683 3.929 .000

    Width -.395 2.993 -.019 -.132 .895

    Length -.424 .967 -.078 -.438 .662

    Curb weight -70.015 23.394 -.557 -2.993 .003

    Fuel efficiency -2.561 2.557 -.150 -1.002 .319

    5 (Constant) -242.535 129.494 -1.873 .064

    Engine size 37.238 13.775 .523 2.703 .008

    Horsepower -.611 .209 -.477 -2.919 .004

    Wheelbase 6.346 1.607 .681 3.949 .000

    Length -.460 .924 -.085 -.498 .619

    Curb weight -70.466 23.039 -.561 -3.059 .003

    Fuel efficiency -2.554 2.545 -.150 -1.004 .318

    6 (Constant) -247.393 128.688 -1.922 .057

    Engine size 36.371 13.619 .511 2.671 .009

    Horsepower -.626 .207 -.489 -3.031 .003

    Wheelbase 5.745 1.057 .616 5.436 .000

    Curb weight -71.956 22.767 -.572 -3.161 .002

    Fuel efficiency -2.831 2.475 -.166 -1.144 .255

    7 (Constant) -353.714 89.146 -3.968 .000

    Engine size 39.752 13.313 .559 2.986 .003

    Horsepower -.638 .207 -.498 -3.087 .003


    Wheelbase 5.556 1.045 .596 5.315 .000

    Curb weight -56.888 18.597 -.453 -3.059 .003

    a. Dependent Variable: Sales in thousands

In the end we find the same significant variables as with the enter-and-remove method, with approximately the same significance levels.

    How to perform Backward Method? (After transformation)


Variables Entered/Removedb

    Model   Variables Entered                                        Variables Removed     Method
    1       Fuel efficiency, 4-year resale value, Length, Width,     .                     Enter
            Engine size, Fuel capacity, Wheelbase, Curb weight,
            Horsepower, Price in thousandsa
    2       .                                                        4-year resale value   Backward (criterion: Probability of F-to-remove >= .100).
    3       .                                                        Horsepower            Backward (criterion: Probability of F-to-remove >= .100).
    4       .                                                        Width                 Backward (criterion: Probability of F-to-remove >= .100).
    5       .                                                        Fuel efficiency       Backward (criterion: Probability of F-to-remove >= .100).
    6       .                                                        Curb weight           Backward (criterion: Probability of F-to-remove >= .100).
    7       .                                                        Length                Backward (criterion: Probability of F-to-remove >= .100).
    8       .                                                        Engine size           Backward (criterion: Probability of F-to-remove >= .100).
    9       .                                                        Fuel capacity         Backward (criterion: Probability of F-to-remove >= .100).

    a. All requested variables entered.

    b. Dependent Variable: LogSale


Model Summary

    Model   R       R Square   Adjusted R Square   Std. Error of the Estimate
    1       .634a   .402       .345                1.08253
    2       .633b   .401       .351                1.07796
    3       .632c   .400       .355                1.07419
    4       .631d   .398       .360                1.07068
    5       .629e   .396       .363                1.06766
    6       .628f   .394       .367                1.06480
    7       .621g   .386       .364                1.06719
    8       .618h   .382       .365                1.06590
    9       .613i   .376       .365                1.06600

    a. Predictors: (Constant), Fuel efficiency, 4-year resale value, Length,

    Width, Engine size, Fuel capacity, Wheelbase, Curb weight,

    Horsepower, Price in thousands

    b. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,

    Fuel capacity, Wheelbase, Curb weight, Horsepower, Price in

    thousands

    c. Predictors: (Constant), Fuel efficiency, Length, Width, Engine size,

    Fuel capacity, Wheelbase, Curb weight, Price in thousands

    d. Predictors: (Constant), Fuel efficiency, Length, Engine size, Fuel

    capacity, Wheelbase, Curb weight, Price in thousands

    e. Predictors: (Constant), Length, Engine size, Fuel capacity,

    Wheelbase, Curb weight, Price in thousands

    f. Predictors: (Constant), Length, Engine size, Fuel capacity,

    Wheelbase, Price in thousands

    g. Predictors: (Constant), Engine size, Fuel capacity, Wheelbase, Price

    in thousands

    h. Predictors: (Constant), Fuel capacity, Wheelbase, Price in thousands

    i. Predictors: (Constant), Wheelbase, Price in thousands


    Coefficientsa

                    Unstandardized Coefficients    Standardized Coefficients
Model               B         Std. Error           Beta                        t      Sig.

    1 (Constant) -1.097 3.206 -.342 .733

    4-year resale value -.012 .038 -.101 -.311 .757

    Price in thousands -.036 .036 -.385 -1.015 .312

    Engine size .310 .260 .244 1.190 .237

    Horsepower -.003 .006 -.118 -.470 .639

    Wheelbase .093 .030 .559 3.111 .002

    Width -.026 .052 -.068 -.492 .624

    Length -.018 .018 -.188 -1.008 .316

    Curb weight .262 .495 .117 .530 .597

    Fuel capacity -.059 .062 -.166 -.949 .345

    Fuel efficiency .026 .049 .087 .538 .592

    2 (Constant) -1.301 3.125 -.416 .678

    Price in thousands -.046 .017 -.489 -2.793 .006

    Engine size .323 .256 .255 1.264 .209

    Horsepower -.003 .006 -.124 -.497 .620

    Wheelbase .092 .030 .553 3.108 .002

    Width -.027 .052 -.071 -.516 .607

    Length -.017 .017 -.175 -.968 .335

    Curb weight .317 .460 .141 .689 .493

    Fuel capacity -.062 .060 -.176 -1.027 .307

    Fuel efficiency .029 .048 .095 .599 .551

    3 (Constant) -1.344 3.113 -.432 .667

    Price in thousands -.053 .010 -.557 -5.065 .000

    Engine size .238 .188 .187 1.262 .210

    Wheelbase .094 .029 .564 3.210 .002

    Width -.028 .052 -.073 -.537 .592

    Length -.019 .017 -.199 -1.147 .254


    Curb weight .377 .442 .168 .853 .395

    Fuel capacity -.062 .060 -.175 -1.024 .308

    Fuel efficiency .031 .048 .103 .653 .515

    4 (Constant) -2.502 2.239 -1.117 .266

    Price in thousands -.052 .010 -.547 -5.062 .000

    Engine size .204 .177 .161 1.153 .251

    Wheelbase .094 .029 .565 3.224 .002

    Length -.022 .016 -.228 -1.380 .170

    Curb weight .354 .439 .158 .806 .422

    Fuel capacity -.068 .059 -.192 -1.150 .253

    Fuel efficiency .029 .047 .096 .617 .539

    5 (Constant) -1.553 1.622 -.958 .340

    Price in thousands -.051 .010 -.539 -5.039 .000

    Engine size .167 .166 .132 1.006 .316

    Wheelbase .096 .029 .579 3.340 .001

    Length -.021 .016 -.218 -1.331 .186

    Curb weight .262 .411 .117 .637 .526

    Fuel capacity -.083 .053 -.236 -1.556 .123

    6 (Constant) -1.771 1.581 -1.120 .265

    Price in thousands -.050 .010 -.524 -5.037 .000

    Engine size .199 .158 .157 1.256 .212

    Wheelbase .098 .029 .589 3.425 .001

    Length -.019 .015 -.196 -1.226 .223

    Fuel capacity -.063 .042 -.177 -1.473 .143

    7 (Constant) -2.338 1.515 -1.542 .126

    Price in thousands -.050 .010 -.525 -5.043 .000

    Engine size .125 .147 .099 .852 .396

    Wheelbase .070 .017 .422 4.011 .000

    Fuel capacity -.050 .041 -.141 -1.205 .231

    8 (Constant) -2.593 1.484 -1.747 .083

    Price in thousands -.045 .008 -.474 -5.595 .000

    Wheelbase .073 .017 .441 4.292 .000


    Fuel capacity -.040 .040 -.113 -1.011 .314

    9 (Constant) -1.920 1.326 -1.448 .150

    Price in thousands -.049 .007 -.515 -6.945 .000

    Wheelbase .061 .012 .369 4.980 .000

    a. Dependent Variable: LogSale

Here only two variables, price in thousands and wheelbase, are significant, while the constant term is insignificant for the model.

    Finally, the regression equation will be:

    logSale = - 0.049(price in thousands) + 0.061(Wheelbase)

The result generated by the enter-and-remove method was:

    Logsale = - 0.064 (price in thousands) + 0.041(Wheelbase)

Comparing the two regression models, we find that both methods give the same variables with slightly different coefficients, so a minor difference in the beta coefficients is to be expected.
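The backward procedure that produced the tables above can be sketched as a loop that refits the model and drops the weakest predictor until all survivors pass the criterion. SPSS removes on probability of F-to-remove >= .100; as a stand-in, this sketch drops the predictor with the smallest |t| until every |t| clears a cutoff, on synthetic data (all names and thresholds here are illustrative, not from the source):

```python
import numpy as np

def ols_t(X, y):
    """Least-squares fit of y on the columns of X; returns (coefficients, t values)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return beta, beta / se

def backward_eliminate(X, y, names, t_cutoff=1.66):
    """Repeatedly drop the predictor with the smallest |t| until all clear the cutoff.

    The |t| cutoff stands in for SPSS's probability-of-F-to-remove >= .100 rule.
    """
    names = list(names)
    while X.shape[1] > 1:
        _, t = ols_t(X, y)
        weakest = int(np.argmin(np.abs(t)))
        if abs(t[weakest]) >= t_cutoff:
            break
        X = np.delete(X, weakest, axis=1)
        del names[weakest]
    return names

# Synthetic example: two real predictors plus pure noise.
rng = np.random.default_rng(2)
n = 120
x1, x2, noise = rng.normal(size=(3, n))
X = np.column_stack([x1, x2, noise])
y = 1.5 * x1 - 2.0 * x2 + rng.normal(0, 1, n)
print(backward_eliminate(X, y, ["x1", "x2", "noise"]))
```

The strongly related predictors survive; the noise column is the one at risk of removal, mirroring how Fuel capacity, Length and the others were pruned above.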

Tolerance and Variance Inflation Factor (VIF)

    Tolerance and VIF are reciprocals of each other. They measure how independent the independent variables are of one another; in other words, they measure the multicollinearity among the independent variables. Multicollinearity expresses the degree to which the variation of one independent variable in the model is explained by the other independent variables. It can be calculated by taking one of the independent variables as the dependent variable, using the rest of the independent variables as predictors, and regressing them. That regression gives an R-square, here denoted R2*: the proportion of the chosen independent variable that is explained by the other independent variables.

    Tolerance is calculated as 1 - R2*. The higher the tolerance, the lower the multicollinearity.

    The variance inflation factor (VIF) is calculated as 1/Tolerance. VIF is the factor by which the variance of a coefficient estimate (and hence its standard error) is inflated due to multicollinearity. As R2* increases, tolerance decreases, so VIF increases, which means multicollinearity increases.
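The recipe above translates directly to code: regress each predictor on the others, record that fit's R-square as R2*, then Tolerance = 1 - R2* and VIF = 1/Tolerance. A sketch on toy data (hypothetical variables, not the car dataset):

```python
import numpy as np

def tolerance_and_vif(X):
    """For each column of X: regress it on the remaining columns (plus an
    intercept), take that fit's R-square (R2*), and return (1 - R2*, 1/(1 - R2*))."""
    n, k = X.shape
    out = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2_star = 1 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        tol = 1 - r2_star
        out.append((tol, 1 / tol))
    return out

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.2 * rng.normal(size=n)   # strongly collinear with x1
x3 = rng.normal(size=n)                    # unrelated to x1 and x2
X = np.column_stack([x1, x2, x3])

for name, (tol, vif) in zip(["x1", "x2", "x3"], tolerance_and_vif(X)):
    print(f"{name}: Tolerance = {tol:.3f}, VIF = {vif:.2f}")
```

The collinear pair comes out with a large VIF, like the engine size value of 18.426 flagged in the conclusion, while the independent predictor stays near 1.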