Building the Regression Model I: Selection and Validation (KNN Ch. 9, pp. 343-375)
Model selection and model building
Model selection
Selection of predictor variables
Statement of problem
• A common problem is that there is a large set of candidate predictor variables.
• The goal is to choose a small subset from the larger set so that the resulting regression model is simple yet has good predictive ability.
Example: Cement data
• Response y: heat evolved in calories during hardening of cement on a per gram basis
• Predictor x1: % of tricalcium aluminate
• Predictor x2: % of tricalcium silicate
• Predictor x3: % of tetracalcium alumino ferrite
• Predictor x4: % of dicalcium silicate
Example: Cement data
[Figure: scatterplot matrix of y, x1, x2, x3, x4 for the cement data]
Two basic methods of selecting predictors
• Stepwise regression: Enter and remove variables, in a stepwise manner, until no justifiable reason to enter or remove more.
• Best subsets regression: Select the subset of variables that does best at meeting some well-defined objective criterion.
Stepwise regression: the idea
• Start with no predictors in the model.
• At each step, enter or remove a variable based on partial F-tests.
• Stop when no more variables can be justifiably entered or removed.
Stepwise regression: the steps
• Specify an Alpha-to-Enter level (here 0.15) and an Alpha-to-Remove level (here 0.15).
• Start with no predictors in the model.
• Put the predictor with the smallest P-value based on the partial F-statistic (a t-statistic) in the model. If that P-value > 0.15, then stop: none of the predictors has good predictive ability. Otherwise ...
Stepwise regression: the steps
• Add the predictor with the smallest P-value (below 0.15) based on the partial F-statistic (a t-statistic). If none of the remaining predictors yields a P-value < 0.15, stop.
• If the P-value > 0.15 for any of the partial F-statistics in the current model, remove the violating predictor.
• Repeat these two steps until no more predictors can be entered or removed (a code sketch follows).
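For concreteness, here is a minimal sketch of this procedure in Python with statsmodels (the slides use Minitab); the DataFrame `df` and the column names are assumptions for illustration, not part of the slides:

```python
import statsmodels.api as sm

def stepwise(df, response, candidates, alpha_enter=0.15, alpha_remove=0.15):
    """Enter/remove predictors based on partial F-test (t-test) P-values."""
    selected = []
    while True:
        changed = False
        # Entry step: among predictors not yet in the model, find the one
        # with the smallest P-value when added to the current model.
        remaining = [c for c in candidates if c not in selected]
        entry_p = {}
        for c in remaining:
            X = sm.add_constant(df[selected + [c]])
            entry_p[c] = sm.OLS(df[response], X).fit().pvalues[c]
        if entry_p:
            best = min(entry_p, key=entry_p.get)
            if entry_p[best] < alpha_enter:
                selected.append(best)
                changed = True
        # Removal step: refit the current model and drop the predictor with
        # the largest P-value if it exceeds the Alpha-to-Remove threshold.
        if selected:
            X = sm.add_constant(df[selected])
            pvals = sm.OLS(df[response], X).fit().pvalues.drop("const")
            worst = pvals.idxmax()
            if pvals[worst] > alpha_remove:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected

# Hypothetical usage on the cement data:
# stepwise(cement, "y", ["x1", "x2", "x3", "x4"])
```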
Example: Cement data

Step 1: regress y on each predictor separately. x4 gives the largest |T| (smallest P-value) and enters first.

Predictor  Coef      SE Coef  T       P
Constant   81.479    4.927    16.54   0.000
x1         1.8687    0.5264   3.55    0.005

Predictor  Coef      SE Coef  T       P
Constant   57.424    8.491    6.76    0.000
x2         0.7891    0.1684   4.69    0.001

Predictor  Coef      SE Coef  T       P
Constant   110.203   7.948    13.87   0.000
x3         -1.2558   0.5984   -2.10   0.060

Predictor  Coef      SE Coef  T       P
Constant   117.568   5.262    22.34   0.000
x4         -0.7382   0.1546   -4.77   0.001
Step 2: regress y on x4 plus each remaining predictor. x1 has the smallest P-value and enters.

Predictor  Coef      SE Coef  T       P
Constant   103.097   2.124    48.54   0.000
x4         -0.61395  0.04864  -12.62  0.000
x1         1.4400    0.1384   10.40   0.000

Predictor  Coef      SE Coef  T       P
Constant   94.16     56.63    1.66    0.127
x4         -0.4569   0.6960   -0.66   0.526
x2         0.3109    0.7486   0.42    0.687

Predictor  Coef      SE Coef  T       P
Constant   131.282   3.275    40.09   0.000
x4         -0.72460  0.07233  -10.02  0.000
x3         -1.1999   0.1890   -6.35   0.000
Step 3: regress y on (x4, x1) plus each remaining predictor. x2 has the smaller P-value (0.052) and enters, but x4's P-value (0.205) now exceeds 0.15.

Predictor  Coef      SE Coef  T       P
Constant   71.65     14.14    5.07    0.001
x4         -0.2365   0.1733   -1.37   0.205
x1         1.4519    0.1170   12.41   0.000
x2         0.4161    0.1856   2.24    0.052

Predictor  Coef      SE Coef  T       P
Constant   111.684   4.562    24.48   0.000
x4         -0.64280  0.04454  -14.43  0.000
x1         1.0519    0.2237   4.70    0.001
x3         -0.4100   0.1992   -2.06   0.070
So x4 is removed, leaving the model with x1 and x2:

Predictor  Coef      SE Coef  T       P
Constant   52.577    2.286    23.00   0.000
x1         1.4683    0.1213   12.10   0.000
x2         0.66225   0.04585  14.44   0.000
Step 4: try entering x4 or x3 alongside (x1, x2). Neither P-value (0.205, 0.209) is below 0.15, so the procedure stops.

Predictor  Coef      SE Coef  T       P
Constant   71.65     14.14    5.07    0.001
x1         1.4519    0.1170   12.41   0.000
x2         0.4161    0.1856   2.24    0.052
x4         -0.2365   0.1733   -1.37   0.205

Predictor  Coef      SE Coef  T       P
Constant   48.194    3.913    12.32   0.000
x1         1.6959    0.2046   8.29    0.000
x2         0.65691   0.04423  14.85   0.000
x3         0.2500    0.1847   1.35    0.209
The final stepwise model contains x1 and x2:

Predictor  Coef      SE Coef  T       P
Constant   52.577    2.286    23.00   0.000
x1         1.4683    0.1213   12.10   0.000
x2         0.66225   0.04585  14.44   0.000
Stepwise Regression: y versus x1, x2, x3, x4

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is y on 4 predictors, with N = 13

Step          1        2        3        4
Constant   117.57   103.10    71.65    52.58

x4         -0.738   -0.614   -0.237
T-Value     -4.77   -12.62    -1.37
P-Value     0.001    0.000    0.205

x1                    1.44     1.45     1.47
T-Value              10.40    12.41    12.10
P-Value               0.000    0.000    0.000

x2                             0.416    0.662
T-Value                        2.24    14.44
P-Value                        0.052    0.000

S            8.96     2.73     2.31     2.41
R-Sq        67.45    97.25    98.23    97.87
R-Sq(adj)   64.50    96.70    97.64    97.44
C-p         138.7      5.5      3.0      2.7
Drawbacks of stepwise regression
• The final model is not guaranteed to be optimal in any specified sense.
• The procedure yields a single final model, although in practice there are often several almost equally good models.
Best subsets regression
• If there are p-1 candidate predictors, then there are 2^(p-1) possible regression models containing subsets of them.
• For example, 10 predictors yield 2^10 = 1024 possible regression models.
• A best subsets algorithm determines the best subsets of each size, so that the choice of the final model can be made by the researcher (a code sketch follows).
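A sketch of an exhaustive search in Python with statsmodels, feasible when the 2^(p-1) fits are few; `df` and the column names are illustrative assumptions:

```python
from itertools import combinations
import statsmodels.api as sm

def best_subsets(df, response, candidates):
    """Fit every subset; keep the best model of each size by R-squared."""
    best = {}
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            X = sm.add_constant(df[list(subset)])
            fit = sm.OLS(df[response], X).fit()
            if k not in best or fit.rsquared > best[k][1]:
                best[k] = (subset, fit.rsquared, fit.rsquared_adj, fit.mse_resid)
    return best  # {size: (subset, R-sq, adjusted R-sq, MSE)}
```

Production best-subsets routines use branch-and-bound tricks to avoid fitting every subset; this brute-force version is only a teaching sketch.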
What is used to judge “best”?
• R-square
• Adjusted R-square
• MSE (or S = square root of MSE)
• Mallows' Cp
R-squared
$$R^2 = \frac{SSR}{SSTO} = 1 - \frac{SSE}{SSTO}$$
Use the R-squared values to find the point where adding more predictors is not worthwhile because it leads to a very small increase in R-squared.
Adjusted R-squared or MSE
$$R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SSTO} = 1 - \frac{MSE}{SSTO/(n-1)}$$
Adjusted R-squared increases only if MSE decreases, so adjusted R-squared and MSE provide equivalent information.
Find a few subsets for which MSE is smallest (or adjusted R-squared is largest) or so close to the smallest (largest) that adding more predictors is not worthwhile.
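These criteria translate directly into code; a minimal sketch, where `sse` and `ssto` are assumed to come from a fitted model, `n` is the sample size, and `p` counts regression parameters including the intercept:

```python
def r_squared(sse, ssto):
    # R^2 = 1 - SSE/SSTO
    return 1 - sse / ssto

def mse(sse, n, p):
    # MSE = SSE / (n - p)
    return sse / (n - p)

def adjusted_r_squared(sse, ssto, n, p):
    # R^2_a = 1 - ((n-1)/(n-p)) * SSE/SSTO = 1 - MSE / (SSTO/(n-1)),
    # so ranking models by adjusted R^2 is the same as ranking by MSE.
    return 1 - (n - 1) / (n - p) * sse / ssto
```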
Mallows' Cp criterion

Mallows' Cp statistic:

$$C_p = \frac{SSE_p}{MSE(X_1, \ldots, X_{p-1})} - (n - 2p)$$

is an estimator of the total standardized mean square error of prediction:

$$\Gamma_p = \frac{1}{\sigma^2} \sum_{i=1}^{n} E\left[\hat{Y}_{ip} - E(Y_i)\right]^2$$

which equals:

$$\Gamma_p = \frac{1}{\sigma^2}\left[\sum_{i=1}^{n}\left(E(\hat{Y}_{ip}) - E(Y_i)\right)^2 + \sum_{i=1}^{n}\operatorname{Var}(\hat{Y}_{ip})\right]$$

Here SSE_p is the error sum of squares for the subset model with p parameters, and MSE(X_1, ..., X_{p-1}) is the MSE of the full model containing all candidate predictors.
Using the Cp criterion
• Subsets with small Cp values have a small total (standardized) mean square error of prediction.
• When the Cp value is also near p, the bias of the regression model is small.
Using the Cp criterion (cont’d)
• So, identify subsets of predictors for which:
– the Cp value is smallest, and
– the Cp value is near p (if possible).
• Note, though, that for the full model Cp = p, so the full model is always judged a good candidate by this criterion (a one-line computation is sketched below).
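The computation itself is one line; a sketch, where `sse_p` is the subset model's SSE and `mse_full` is the full model's MSE:

```python
def mallows_cp(sse_p, mse_full, n, p):
    # C_p = SSE_p / MSE(full) - (n - 2p), with p parameters in the
    # subset model (including the intercept). For the full model,
    # SSE = (n - p) * MSE, so C_p reduces to p, as noted above.
    return sse_p / mse_full - (n - 2 * p)
```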
Best Subsets Regression: y versus x1, x2, x3, x4
Response is y
                                        x  x  x  x
Vars  R-Sq  R-Sq(adj)    C-p       S    1  2  3  4
   1  67.5       64.5  138.7  8.9639             X
   1  66.6       63.6  142.5  9.0771      X
   2  97.9       97.4    2.7  2.4063   X  X
   2  97.2       96.7    5.5  2.7343   X         X
   3  98.2       97.6    3.0  2.3087   X  X      X
   3  98.2       97.6    3.0  2.3121   X  X  X
   4  98.2       97.4    5.0  2.4460   X  X  X  X
Example: Modeling PIQ
[Figure: scatterplot matrix of PIQ, MRI, Height, Weight]
Stepwise Regression: PIQ versus MRI, Height, Weight

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is PIQ on 3 predictors, with N = 38

Step           1         2
Constant   4.652   111.276

MRI         1.18      2.06
T-Value     2.45      3.77
P-Value    0.019     0.001

Height               -2.73
T-Value              -2.75
P-Value              0.009

S           21.2      19.5
R-Sq       14.27     29.49
R-Sq(adj)  11.89     25.46
C-p          7.3       2.0
Best Subsets Regression: PIQ versus MRI, Height, Weight
Response is PIQ

Vars  R-Sq  R-Sq(adj)   C-p       S  MRI  Height  Weight
   1  14.3       11.9   7.3  21.212    X
   1   0.9        0.0  13.8  22.810          X
   2  29.5       25.5   2.0  19.510    X     X
   2  19.3       14.6   6.9  20.878    X             X
   3  29.5       23.3   4.0  19.794    X     X       X
The regression equation is
PIQ = 111 + 2.06 MRI - 2.73 Height

Predictor  Coef     SE Coef  T      P
Constant   111.28   55.87    1.99   0.054
MRI        2.0606   0.5466   3.77   0.001
Height     -2.7299  0.9932   -2.75  0.009

S = 19.51   R-Sq = 29.5%   R-Sq(adj) = 25.5%

Analysis of Variance

Source      DF  SS       MS      F     P
Regression   2   5572.7  2786.4  7.32  0.002
Error       35  13321.8   380.6
Total       37  18894.6

Source  DF  Seq SS
MRI      1  2697.1
Height   1  2875.6
Example: Modeling BP
[Figure: scatterplot matrix of BP, Age, Weight, BSA, Duration, Pulse, Stress]
Stepwise Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress

Alpha-to-Enter: 0.15  Alpha-to-Remove: 0.15
Response is BP on 6 predictors, with N = 20

Step           1         2         3
Constant   2.205   -16.579   -13.667

Weight     1.201     1.033     0.906
T-Value    12.92     33.15     18.49
P-Value    0.000     0.000     0.000

Age                  0.708     0.702
T-Value             13.23     15.96
P-Value              0.000     0.000

BSA                            4.6
T-Value                        3.04
P-Value                        0.008

S           1.74     0.533     0.437
R-Sq       90.26     99.14     99.45
R-Sq(adj)  89.72     99.04     99.35
C-p        312.8      15.1       6.4
Best Subsets Regression: BP versus Age, Weight, BSA, Duration, Pulse, Stress
Response is BP

Vars  R-Sq  R-Sq(adj)    C-p        S  Age  Weight  BSA  Duration  Pulse  Stress
   1  90.3       89.7  312.8   1.7405         X
   1  75.0       73.6  829.1   2.7903                 X
   2  99.1       99.0   15.1  0.53269   X     X
   2  92.0       91.0  256.6   1.6246         X       X
   3  99.5       99.4    6.4  0.43705   X     X       X
   3  99.2       99.1   14.1  0.52012   X     X                       X
   4  99.5       99.4    6.4  0.42591   X     X       X       X
   4  99.5       99.4    7.1  0.43500   X     X       X               X
   5  99.6       99.4    7.0  0.42142   X     X       X       X       X
   5  99.5       99.4    7.7  0.43078   X     X       X       X              X
   6  99.6       99.4    7.0  0.40723   X     X       X       X       X      X
The regression equation is
BP = -13.7 + 0.702 Age + 0.906 Weight + 4.63 BSA

Predictor  Coef     SE Coef  T      P
Constant   -13.667  2.647    -5.16  0.000
Age        0.70162  0.04396  15.96  0.000
Weight     0.90582  0.04899  18.49  0.000
BSA        4.627    1.521    3.04   0.008

S = 0.4370   R-Sq = 99.5%   R-Sq(adj) = 99.4%

Analysis of Variance

Source      DF  SS      MS      F       P
Regression   3  556.94  185.65  971.93  0.000
Error       16    3.06    0.19
Total       19  560.00

Source  DF  Seq SS
Age      1  243.27
Weight   1  311.91
BSA      1    1.77
Stepwise regression in Minitab
• Stat >> Regression >> Stepwise …
• Specify response and all possible predictors.
• If desired, specify predictors that must be included in every model.
• Select OK. Results appear in session window.
Best subsets regression
• Stat >> Regression >> Best subsets …
• Specify response and all possible predictors.
• If desired, specify predictors that must be included in every model.
• Select OK. Results appear in session window.
Model building strategy
The first step
• Decide on the type of model needed:
– Predictive: model used to predict the response variable from a chosen set of predictors.
– Theoretical: model based on a theoretical relationship between response and predictors.
– Control: model used to control a response variable by manipulating predictor variables.
The first step (cont’d)
• Decide on the type of model needed:
– Inferential: model used to explore the strength of relationships between response and predictors.
– Data summary: model used merely as a way to summarize a large set of data by a single equation.
The second step
• Decide on the predictor variables and the response variable for which to collect data.
• Collect the data.
The third step
• Explore the data (a code sketch follows):
– Check for outliers, gross data errors, and missing values on a univariate basis.
– Study bivariate relationships to reveal other outliers, to suggest possible transformations, and to identify possible multicollinearities.
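A minimal sketch of these checks with pandas; `df` is an assumed DataFrame holding the response and candidate predictors:

```python
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

def explore(df: pd.DataFrame) -> None:
    print(df.describe())    # univariate summaries: gross errors, outliers
    print(df.isna().sum())  # missing values per variable
    print(df.corr())        # high pairwise correlations hint at multicollinearity
    scatter_matrix(df)      # bivariate plots: outliers, possible transformations
    plt.show()
```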
The fourth step
• Randomly divide the data into a training set and a test set (a sketch of the split follows):
– The training set, with at least 15-20 error degrees of freedom, is used to fit the model.
– The test set is used for cross-validation of the fitted model.
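A sketch of the random split with pandas; the 50/50 fraction and the seed are illustrative choices, not from the slides:

```python
import pandas as pd

def split_data(df: pd.DataFrame, train_frac=0.5, seed=1):
    """Randomly divide the data into a training set and a test set."""
    train = df.sample(frac=train_frac, random_state=seed)
    test = df.drop(train.index)
    return train, test
```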
The fifth step
• Using the training set, fit several candidate models:
– Use best subsets regression.
– Use stepwise regression (it yields only one model unless different alpha-to-remove and alpha-to-enter values are specified).
The sixth step
• Select and evaluate a few "good" models:
– Select based on adjusted R2, Mallows' Cp, and the number and nature of the predictors.
– Evaluate the selected models for violations of model assumptions (one such check is sketched below).
– If none of the models provides a satisfactory fit, try something else, such as more data, different predictors, or a different class of model.
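One common assumption check is a residuals-versus-fitted plot; a sketch, where `fit` is assumed to be a fitted statsmodels OLS results object:

```python
import matplotlib.pyplot as plt

def residual_plot(fit):
    """Plot residuals against fitted values to check for nonconstant
    variance or curvature in a candidate model."""
    plt.scatter(fit.fittedvalues, fit.resid)
    plt.axhline(0, linestyle="--")
    plt.xlabel("Fitted values")
    plt.ylabel("Residuals")
    plt.show()
```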
The final step
• Select the final model:
– Compare competing models by cross-validating them against the test data (a sketch follows).
– The model with the larger cross-validation R2 is the better predictive model.
– Consider residual plots, outliers, parsimony, relevance, and ease of measurement of predictors.
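A sketch of the cross-validation comparison, using one common definition of the cross-validation R2 (one minus test-set SSE over test-set SSTO); the names are illustrative assumptions:

```python
import statsmodels.api as sm

def cross_validation_r2(train, test, response, predictors):
    """Fit on the training set, then score predictions on the test set."""
    fit = sm.OLS(train[response], sm.add_constant(train[predictors])).fit()
    pred = fit.predict(sm.add_constant(test[predictors]))
    sse = ((test[response] - pred) ** 2).sum()
    ssto = ((test[response] - test[response].mean()) ** 2).sum()
    return 1 - sse / ssto

# Hypothetical usage: compare two candidate models for the BP example
# cross_validation_r2(train, test, "BP", ["Age", "Weight", "BSA"])
# cross_validation_r2(train, test, "BP", ["Age", "Weight"])
```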