Polynomial regression models Possible models for when the response function is “curved”
![Page 1: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/1.jpg)
Polynomial regression models
Possible models for when the response function is “curved”
![Page 2: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/2.jpg)
Uses of polynomial models
• When the true response function really is a polynomial function.
• (Very common!) When the true response function is unknown or complex, but a polynomial function approximates the true function well.
![Page 3: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/3.jpg)
Example
• What is the impact of exercise on the human immune system?
• Is the amount of immunoglobin in blood (y) related, in a curved manner, to maximal oxygen uptake (x)?
![Page 4: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/4.jpg)
[Figure: Scatter plot of immunoglobin (mg) and maximal oxygen uptake (ml/kg).]
![Page 5: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/5.jpg)
A quadratic polynomial regression function
$$Y_i = \beta_0 + \beta_1 X_i + \beta_{11} X_i^2 + \varepsilon_i$$
where:
• Yi = amount of immunoglobin in blood (mg)
• Xi = maximal oxygen uptake (ml/kg)
• εi = error terms satisfying the typical assumptions (“INE”: independent, normally distributed, equal variance)
![Page 6: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/6.jpg)
Estimated quadratic function
[Figure: Regression plot of igg against oxygen with the fitted quadratic curve.]
igg = -1464.40 + 88.3071 oxygen - 0.536247 oxygen**2
S = 106.427 R-Sq = 93.8 % R-Sq(adj) = 93.3 %
![Page 7: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/7.jpg)
Interpretation of the regression coefficients
• If 0 is a possible x value, then b0 is the predicted response at x = 0. Otherwise, b0 has no meaningful interpretation.
• b1 does not have a very helpful interpretation. It is the slope of the tangent line at x = 0.
• b11 (the coefficient on the squared term) indicates the up/down direction of the curve
– b11 < 0 means curve is concave down
– b11 > 0 means curve is concave up
![Page 8: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/8.jpg)
The regression equation is
igg = - 1464 + 88.3 oxygen - 0.536 oxygensq

| Predictor | Coef | SE Coef | T | P | VIF |
|---|---|---|---|---|---|
| Constant | -1464.4 | 411.4 | -3.56 | 0.001 | |
| oxygen | 88.31 | 16.47 | 5.36 | 0.000 | 99.9 |
| oxygensq | -0.5362 | 0.1582 | -3.39 | 0.002 | 99.9 |

S = 106.4 R-Sq = 93.8% R-Sq(adj) = 93.3%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 4602211 | 2301105 | 203.16 | 0.000 |
| Residual Error | 27 | 305818 | 11327 | | |
| Total | 29 | 4908029 | | | |

| Source | DF | Seq SS |
|---|---|---|
| oxygen | 1 | 4472047 |
| oxygensq | 1 | 130164 |
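The summary statistics in this output follow from the analysis of variance table. As a minimal sketch (the variable names are illustrative and not part of the original slides), the following Python recomputes S, R-Sq, R-Sq(adj), and the overall F statistic from the sums of squares and degrees of freedom shown above; the results should agree with the reported values up to rounding.

```python
import math

# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_regression, df_regression = 4602211, 2
ss_error, df_error = 305818, 27
ss_total, df_total = 4908029, 29

ms_regression = ss_regression / df_regression      # mean square for regression
ms_error = ss_error / df_error                     # mean square error (MSE)

s = math.sqrt(ms_error)                            # residual standard error S
r_sq = ss_regression / ss_total                    # R-Sq
r_sq_adj = 1 - ms_error / (ss_total / df_total)    # R-Sq(adj)
f_stat = ms_regression / ms_error                  # overall F statistic

print(f"S = {s:.1f}, R-Sq = {r_sq:.1%}, R-Sq(adj) = {r_sq_adj:.1%}, F = {f_stat:.2f}")
```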
![Page 9: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/9.jpg)
A multicollinearity problem
[Figure: Scatter plot of oxygensq against oxygen.]
Pearson correlation of oxygen and oxygensq = 0.995
![Page 10: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/10.jpg)
“Center” the predictors
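$$\text{OxCent} = \text{Oxygen} - 50.637$$
$$\text{OxCentSq} = (\text{Oxygen} - 50.637)^2$$
Mean of oxygen = 50.637

| oxygen | oxcent | oxcentsq |
|---|---|---|
| 34.6 | -16.037 | 257.185 |
| 45.0 | -5.637 | 31.776 |
| 62.3 | 11.663 | 136.026 |
| 58.9 | 8.263 | 68.277 |
| 42.5 | -8.137 | 66.211 |
| 44.3 | -6.337 | 40.158 |
| 67.9 | 17.263 | 298.011 |
| 58.5 | 7.863 | 61.827 |
| 35.6 | -15.037 | 226.111 |
| 49.6 | -1.037 | 1.075 |
| 33.0 | -17.637 | 311.064 |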
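The centering step can be sketched in a few lines of Python. The list below holds only the eleven oxygen values shown on this slide (a subset of the data), and the mean is the full-data value reported on the slide rather than the mean of these eleven values; the variable names are illustrative.

```python
# The eleven oxygen values listed on this slide (a subset of the data).
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]

# Mean of oxygen over the full data set, as reported on the slide.
OXYGEN_MEAN = 50.637

oxcent = [x - OXYGEN_MEAN for x in oxygen]   # centered predictor
oxcentsq = [xc ** 2 for xc in oxcent]        # squared centered predictor

for x, xc, xc2 in zip(oxygen, oxcent, oxcentsq):
    print(f"{x:6.1f} {xc:9.3f} {xc2:9.3f}")
```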
![Page 11: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/11.jpg)
Does it really work?
[Figure: Scatter plot of oxcentsq against oxcent.]
Pearson correlation of oxcent and oxcentsq = 0.219
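The effect of centering on the correlation between a predictor and its square can be checked directly. The sketch below is an illustration only: it uses the eleven oxygen values listed on the centering slide, not the full 30 observations, so its correlations will not match 0.995 and 0.219 exactly. The helper `pearson` is written here for the example and is not part of any statistical package.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

# Subset of the data: the eleven oxygen values listed on the centering slide.
oxygen = [34.6, 45.0, 62.3, 58.9, 42.5, 44.3, 67.9, 58.5, 35.6, 49.6, 33.0]
OXYGEN_MEAN = 50.637  # full-data mean reported on the slides

oxygensq = [x ** 2 for x in oxygen]
oxcent = [x - OXYGEN_MEAN for x in oxygen]
oxcentsq = [xc ** 2 for xc in oxcent]

r_raw = pearson(oxygen, oxygensq)        # predictor vs. its square
r_centered = pearson(oxcent, oxcentsq)   # centered predictor vs. its square
print(f"raw: {r_raw:.3f}, centered: {r_centered:.3f}")
```

Even on this subset, the raw correlation should be close to 1 while the centered correlation is much smaller in magnitude.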
![Page 12: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/12.jpg)
A better quadratic polynomial regression function
$$Y_i = \beta_0^* + \beta_1^* x_i + \beta_{11}^* x_i^2 + \varepsilon_i$$
where $x_i = X_i - \bar{X}$ denotes the centered predictor, and
β*0 = mean response at the predictor mean
β*1 = “linear effect coefficient”
β*11 = “quadratic effect coefficient”
![Page 13: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/13.jpg)
The regression equation is
igg = 1632 + 34.0 oxcent - 0.536 oxcentsq

| Predictor | Coef | SE Coef | T | P | VIF |
|---|---|---|---|---|---|
| Constant | 1632.20 | 29.35 | 55.61 | 0.000 | |
| oxcent | 34.000 | 1.689 | 20.13 | 0.000 | 1.1 |
| oxcentsq | -0.5362 | 0.1582 | -3.39 | 0.002 | 1.1 |

S = 106.4 R-Sq = 93.8% R-Sq(adj) = 93.3%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 2 | 4602211 | 2301105 | 203.16 | 0.000 |
| Residual Error | 27 | 305818 | 11327 | | |
| Total | 29 | 4908029 | | | |

| Source | DF | Seq SS |
|---|---|---|
| oxcent | 1 | 4472047 |
| oxcentsq | 1 | 130164 |
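The drop in VIF from 99.9 to 1.1 is consistent with the drop in correlation between the two predictors. With exactly two predictors, the variance inflation factor of each one is 1 / (1 - r²), where r is their pairwise correlation. A minimal sketch, assuming the rounded correlations 0.995 and 0.219 reported on the slides (so the results only approximate the VIF column):

```python
def vif_two_predictors(r):
    """VIF of either predictor in a two-predictor model with pairwise correlation r."""
    return 1.0 / (1.0 - r ** 2)

vif_raw = vif_two_predictors(0.995)        # oxygen vs. oxygensq
vif_centered = vif_two_predictors(0.219)   # oxcent vs. oxcentsq

print(f"raw VIF is about {vif_raw:.1f}, centered VIF is about {vif_centered:.2f}")
```

The raw value comes out near 100 and the centered value near 1.05, in line with the reported 99.9 and 1.1 given that the correlations are rounded to three decimals.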
![Page 14: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/14.jpg)
Interpretation of the regression coefficients
• b*0 is the predicted response at the predictor mean.
• b*1 is the estimated slope of the tangent line at the predictor mean; it is also, typically, close to the estimated slope in the simple linear model.
• b*11 indicates the up/down direction of the curve
– b*11 < 0 means curve is concave down
– b*11 > 0 means curve is concave up
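The "slope of the tangent line" reading can be checked numerically. For a fitted quadratic b0 + b1 X + b11 X², the slope at X is the first derivative, b1 + 2 b11 X. The sketch below uses the original-scale coefficients from the earlier regression plot (88.3071 and -0.536247) and the reported mean of oxygen; the function name is illustrative.

```python
# Original-scale (uncentered) coefficients from the earlier regression plot.
b1, b11 = 88.3071, -0.536247
OXYGEN_MEAN = 50.637  # mean of oxygen reported on the slides

def tangent_slope(x):
    """Slope of the fitted curve b0 + b1*x + b11*x**2 at x (first derivative)."""
    return b1 + 2 * b11 * x

slope_at_zero = tangent_slope(0.0)           # equals b1 of the original model
slope_at_mean = tangent_slope(OXYGEN_MEAN)   # should be close to the centered slope

print(f"slope at 0: {slope_at_zero:.2f}, slope at mean: {slope_at_mean:.2f}")
```

The slope at the predictor mean should come out very close to 34.0, the linear coefficient of the centered fit.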
![Page 15: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/15.jpg)
Estimated regression function
[Figure: Regression plot of igg against oxcent with the fitted quadratic curve.]
igg = 1632.20 + 33.9995 oxcent - 0.536247 oxcent**2
S = 106.427 R-Sq = 93.8 % R-Sq(adj) = 93.3 %
![Page 16: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/16.jpg)
Similar estimates
[Figure: Regression plot of igg against oxcent with the fitted straight line.]
igg = 1557.63 + 32.7427 oxcent
S = 124.783 R-Sq = 91.1 % R-Sq(adj) = 90.8 %
![Page 17: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/17.jpg)
The relationship between the two forms of the model
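Centered model:
$$\hat{Y}_i = b_0^* + b_1^* x_i + b_{11}^* x_i^2$$
Original model:
$$\hat{Y}_i = b_0 + b_1 X_i + b_{11} X_i^2$$
Where:
$$b_0 = b_0^* - b_1^* \bar{X} + b_{11}^* \bar{X}^2$$
$$b_1 = b_1^* - 2 b_{11}^* \bar{X}$$
$$b_{11} = b_{11}^*$$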
![Page 18: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/18.jpg)
$$\hat{Y}_i = 1632.2 + 34.0\,x_i - 0.5362\,x_i^2$$
$$b_0 = 1632.2 - 34(50.637) - 0.5362(50.637)^2 = -1464.3$$
$$b_1 = 34 - 2(-0.5362)(50.637) = 88.3$$
$$b_{11} = -0.5362$$
$$\hat{Y}_i = -1464.4 + 88.31\,X_i - 0.536\,X_i^2$$
Mean of oxygen = 50.637
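The conversion between the two forms can be sketched as a small function. It expands b*0 + b*1 (X - X̄) + b*11 (X - X̄)² and collects powers of X. The function name is illustrative; the inputs are the centered coefficients from the regression plot and the reported mean of oxygen.

```python
def centered_to_original(b0_star, b1_star, b11_star, x_bar):
    """Convert centered quadratic coefficients to original-scale coefficients.

    Expands b0* + b1*(X - x_bar) + b11*(X - x_bar)**2 and collects powers of X.
    """
    b0 = b0_star - b1_star * x_bar + b11_star * x_bar ** 2
    b1 = b1_star - 2 * b11_star * x_bar
    b11 = b11_star
    return b0, b1, b11

# Centered coefficients and predictor mean as reported on the slides.
b0, b1, b11 = centered_to_original(1632.20, 33.9995, -0.536247, 50.637)
print(f"b0 = {b0:.1f}, b1 = {b1:.2f}, b11 = {b11:.4f}")
```

Up to rounding, the result should match the original-scale fit, igg = -1464.40 + 88.3071 oxygen - 0.536247 oxygen**2.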
![Page 19: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/19.jpg)
[Figure: Residuals versus the fitted values (response is igg).]
![Page 20: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/20.jpg)
[Figure: Normal probability plot of the residuals (response is igg).]
![Page 21: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/21.jpg)
What is predicted IgG if maximal oxygen uptake is 90?
There is an even greater danger in extrapolation when modeling data with a polynomial function, because of changes in direction.
Predicted Values for New Observations

| New Obs | Fit | SE Fit | 95.0% CI | 95.0% PI | |
|---|---|---|---|---|---|
| 1 | 2139.6 | 219.2 | (1689.8, 2589.5) | (1639.6, 2639.7) | XX |

X denotes a row with X values away from the center
XX denotes a row with very extreme X values

Values of Predictors for New Observations

| New Obs | oxcent | oxcentsq |
|---|---|---|
| 1 | 39.4 | 1549 |
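The fitted value above, and the "change in direction" that makes extrapolation risky, can both be reproduced from the centered fit. This is a sketch under the assumption that the coefficients from the regression plot (1632.20, 33.9995, -0.536247) and the reported mean of oxygen are used as given; the function and variable names are illustrative.

```python
# Centered-model coefficients and predictor mean reported earlier on the slides.
b0_star, b1_star, b11_star = 1632.20, 33.9995, -0.536247
OXYGEN_MEAN = 50.637

def predict_igg(oxygen):
    """Predicted IgG from the centered quadratic fit."""
    x = oxygen - OXYGEN_MEAN
    return b0_star + b1_star * x + b11_star * x ** 2

fit_at_90 = predict_igg(90.0)

# The fitted parabola opens downward (b11* < 0), so it peaks where the slope
# b1* + 2*b11*x is zero on the centered scale.
peak_oxygen = OXYGEN_MEAN - b1_star / (2 * b11_star)

print(f"fit at 90: {fit_at_90:.1f}, fitted curve peaks near oxygen = {peak_oxygen:.1f}")
```

Under this fit the curve turns downward at an oxygen uptake a little above 82, so a prediction at 90 sits on the descending branch of the parabola, well beyond the plotted data range of roughly 30 to 70.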
![Page 22: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/22.jpg)
It is possible to “overfit” the data with polynomial models.
[Figure: Regression plot of y against x with the fitted cubic curve.]
y = -38.4 + 34.9762 x - 8.64286 x**2 + 0.666667 x**3
S = 2.62950 R-Sq = 64.0 % R-Sq(adj) = 0.0 %
![Page 23: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/23.jpg)
It is even theoretically possible to fit the data perfectly.
If you have n data points, then a polynomial of order n-1 will fit the data perfectly, that is, it will pass through each data point.
But good statistical software will keep an unsuspecting user from fitting such a model:
** Error ** Not enough non-missing observations to fit a polynomial of this order; execution aborted
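The claim that a polynomial of order n-1 passes through all n data points can be illustrated with Lagrange interpolation, which constructs that polynomial directly (a least-squares fit of the same order would give the same curve when the x values are distinct). The data below are made up for illustration and do not come from the slides; the function name is also illustrative.

```python
def lagrange_polynomial(xs, ys):
    """Return a function evaluating the polynomial of degree at most n-1 through n points."""
    def evaluate(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    # Basis factor: equals 1 at x = xi and 0 at x = xj.
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return evaluate

# Hypothetical data, made up for illustration only (not from the slides).
xs = [2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 7.0, 4.0, 6.0, 5.0]

poly = lagrange_polynomial(xs, ys)   # order n-1 = 4 for n = 5 points
residuals = [y - poly(x) for x, y in zip(xs, ys)]
print(residuals)
```

Every residual is zero, which leaves no degrees of freedom for estimating the error variance; that is why such a fit says nothing about how well the model would predict new data.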
![Page 24: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/24.jpg)
The hierarchical approach to model fitting
A widely accepted approach is to fit a higher-order model and then explore whether a lower-order (simpler) model is adequate.
$$Y_i = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2 + \beta_{111} x_i^3 + \varepsilon_i$$
Is a first-order linear model (“line”) adequate?
$$H_0: \beta_{11} = \beta_{111} = 0$$
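A hypothesis of this kind is typically tested with a general linear F test that compares the error sum of squares of the reduced model with that of the full model. The slides do not report a cubic fit, so the sketch below illustrates the same calculation one step down, comparing the straight line (reduced) with the quadratic (full) for the immunoglobin example, using sums of squares from the earlier output. The function name is illustrative.

```python
def partial_f(sse_reduced, df_reduced, sse_full, df_full):
    """General linear F statistic comparing a reduced model with a full model."""
    numerator = (sse_reduced - sse_full) / (df_reduced - df_full)
    denominator = sse_full / df_full
    return numerator / denominator

# Immunoglobin example: line (reduced) vs. quadratic (full).
sse_full, df_full = 305818, 27        # Residual Error row of the quadratic fit
sse_reduced = 4908029 - 4472047       # Total SS minus Seq SS for the linear term
df_reduced = 28                       # n - 2, with n = 30 observations

f_quadratic = partial_f(sse_reduced, df_reduced, sse_full, df_full)
print(f"F = {f_quadratic:.2f}")
```

With a single term dropped, this F statistic equals the square of that term's t statistic, here (-3.39)², up to rounding.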
![Page 25: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/25.jpg)
The hierarchical approach to model fitting
But then … if a polynomial term of a given order is retained, then all related lower-order terms are also retained.
That is, if a quadratic term was significant, you would use this regression function:
$$E(Y_i) = \beta_0 + \beta_1 x_i + \beta_{11} x_i^2$$
and not this one:
$$E(Y_i) = \beta_0 + \beta_{11} x_i^2$$
![Page 26: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/26.jpg)
Example
• Quality of a product (y) – a score between 0 and 100
• Temperature (x1) – degrees Fahrenheit
• Pressure (x2) – pounds per square inch
![Page 27: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/27.jpg)
[Figure: Scatter plot matrix of quality, temp, and pressure.]
![Page 28: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/28.jpg)
A two-predictor, second-order polynomial regression function
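$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \beta_{11} X_{i1}^2 + \beta_{22} X_{i2}^2 + \beta_{12} X_{i1} X_{i2} + \varepsilon_i$$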
where:
• Yi = quality
• Xi1 = temperature
• Xi2 = pressure
• β12 = “interaction effect coefficient”
![Page 29: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/29.jpg)
The regression equation is
quality = - 5128 + 31.1 temp + 140 pressure - 0.133 tempsq - 1.14 presssq - 0.145 tp

| Predictor | Coef | SE Coef | T | P | VIF |
|---|---|---|---|---|---|
| Constant | -5127.9 | 110.3 | -46.49 | 0.000 | |
| temp | 31.096 | 1.344 | 23.13 | 0.000 | 1154.5 |
| pressure | 139.747 | 3.140 | 44.50 | 0.000 | 1574.5 |
| tempsq | -0.133389 | 0.006853 | -19.46 | 0.000 | 973.0 |
| presssq | -1.14422 | 0.02741 | -41.74 | 0.000 | 1453.0 |
| tp | -0.145500 | 0.009692 | -15.01 | 0.000 | 304.0 |

S = 1.679 R-Sq = 99.3% R-Sq(adj) = 99.1%
![Page 30: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/30.jpg)
Again, some correlation
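
| | quality | temp | pressure | tempsq | presssq |
|---|---|---|---|---|---|
| temp | -0.423 | | | | |
| pressure | 0.182 | 0.000 | | | |
| tempsq | -0.434 | 0.999 | 0.000 | | |
| presssq | 0.162 | 0.000 | 1.000 | -0.000 | |
| tp | -0.227 | 0.773 | 0.632 | 0.772 | 0.632 |
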
Cell Contents: Pearson correlation
![Page 31: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/31.jpg)
A better two-predictor, second-order polynomial regression function
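$$Y_i = \beta_0^* + \beta_1^* x_{i1} + \beta_2^* x_{i2} + \beta_{11}^* x_{i1}^2 + \beta_{22}^* x_{i2}^2 + \beta_{12}^* x_{i1} x_{i2} + \varepsilon_i$$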
where:
• Yi = quality
• xi1 = centered temperature
• xi2 = centered pressure
• β*12 = “interaction effect coefficient”
![Page 32: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/32.jpg)
Reduced correlation
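
| | quality | tcent | pcent | tpcent | tcentsq |
|---|---|---|---|---|---|
| tcent | -0.423 | | | | |
| pcent | 0.182 | 0.000 | | | |
| tpcent | -0.274 | 0.000 | 0.000 | | |
| tcentsq | -0.355 | -0.000 | 0.000 | 0.000 | |
| pcentsq | -0.762 | 0.000 | 0.000 | 0.000 | -0.000 |
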
Cell Contents: Pearson correlation
![Page 33: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/33.jpg)
The regression equation is
quality = 94.9 - 0.916 tcent + 0.788 pcent - 0.146 tpcent - 0.133 tcentsq - 1.14 pcentsq

| Predictor | Coef | SE Coef | T | P | VIF |
|---|---|---|---|---|---|
| Constant | 94.9259 | 0.7224 | 131.40 | 0.000 | |
| tcent | -0.91611 | 0.03957 | -23.15 | 0.000 | 1.0 |
| pcent | 0.78778 | 0.07913 | 9.95 | 0.000 | 1.0 |
| tpcent | -0.145500 | 0.009692 | -15.01 | 0.000 | 1.0 |
| tcentsq | -0.133389 | 0.006853 | -19.46 | 0.000 | 1.0 |
| pcentsq | -1.14422 | 0.02741 | -41.74 | 0.000 | 1.0 |

S = 1.679 R-Sq = 99.3% R-Sq(adj) = 99.1%
![Page 34: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/34.jpg)
[Figure: Residuals versus the fitted values (response is quality).]
![Page 35: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/35.jpg)
[Figure: Normal probability plot of the residuals (response is quality).]
![Page 36: Polynomial regression models Possible models for when the response function is “curved”](https://reader036.fdocuments.net/reader036/viewer/2022062516/56649d995503460f94a8413a/html5/thumbnails/36.jpg)
Predicted Values for New Observations

| New Obs | Fit | SE Fit | 95.0% CI | 95.0% PI |
|---|---|---|---|---|
| 1 | 94.926 | 0.722 | (93.424, 96.428) | (91.125, 98.726) |

Values of Predictors for New Observations

| New Obs | tcent | pcent | tpcent | tcentsq | pcentsq |
|---|---|---|---|---|---|
| 1 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
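Because every centered term is zero at the predictor means, the fitted value for this new observation is simply the constant of the centered fit. A minimal sketch of the fitted surface, assuming the coefficients from the centered output are used as printed (the dictionary and function names are illustrative):

```python
# Coefficients of the centered two-predictor fit, copied from the earlier output.
COEF = {
    "constant": 94.9259,
    "tcent": -0.91611,
    "pcent": 0.78778,
    "tpcent": -0.145500,
    "tcentsq": -0.133389,
    "pcentsq": -1.14422,
}

def predict_quality(tcent, pcent):
    """Predicted quality from the centered second-order fit with interaction."""
    return (
        COEF["constant"]
        + COEF["tcent"] * tcent
        + COEF["pcent"] * pcent
        + COEF["tpcent"] * tcent * pcent
        + COEF["tcentsq"] * tcent ** 2
        + COEF["pcentsq"] * pcent ** 2
    )

fit_at_means = predict_quality(0.0, 0.0)   # both predictors at their means
print(f"{fit_at_means:.3f}")
```

Evaluating the same function at other centered values gives the fitted quality away from the means, with the usual caution about staying inside the observed range of temperature and pressure.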